PORTFOLIO

Data Pipeline, Python

This project is a web scraper that collects data from the Ofsted website for primary and secondary public schools. The idea came from combining data mining and data pipelines with education: the goal is to automate data collection and updates (school information, rating, snapshot), storage, and remote monitoring. The pipeline uses several tools and engines to collect, store, process, and display both the scraped data and system metrics.

Technologies Used:

Python 3.9.7
Chromedriver: latest
Chrome: latest
AWS S3
AWS RDS and pgAdmin 4 / PostgreSQL
AWS EC2 instance
Docker and Dockerd set up on the EC2 instance
Prometheus
Node-Exporter
Grafana Dashboard

Football Match Outcome Prediction, Python

The Football Match Outcome Prediction project processes a large number of files containing information about football matches played since 1990. The data is cleaned so that it can be fed to a model. Several models are then trained on the dataset, and the best-performing one is selected. Finally, the hyperparameters of that model are tuned to improve its performance.

Technologies Used:

Pandas
Seaborn
Selenium
Webdriver / Chrome
scikit-learn

Trained Models:

Logistic Regression
Random Forest
Decision Tree
SVM
AdaBoost on Decision Tree
AdaBoost on Logistic Regression
Gradient Boost
MLP
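
The select-then-tune workflow described above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data is synthetic (generated with make_classification as a stand-in for the cleaned match data), only three of the listed model types are compared, and the parameter grids are hypothetical rather than the ones used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned match data (assumption):
# three classes, e.g. home win / draw / away win.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A subset of the model types listed above.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Hypothetical hyperparameter grids, chosen only for illustration.
param_grids = {
    "logistic_regression": {"C": [0.1, 1.0, 10.0]},
    "decision_tree": {"max_depth": [3, 5, None]},
    "random_forest": {"max_depth": [5, None], "n_estimators": [50, 100]},
}

# Step 1: compare the candidates by mean 5-fold cross-validated accuracy.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Step 2: tune only the best-performing model with a grid search.
search = GridSearchCV(candidates[best_name], param_grids[best_name], cv=5)
search.fit(X_train, y_train)

# Step 3: report accuracy on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(best_name, search.best_params_, round(test_accuracy, 3))
```

Keeping the test set out of both the comparison and the tuning step means the final accuracy is measured on data the selected model has not seen.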

Human Activity Recognition Project (ML) using R

The goal of the project is to predict the manner in which the participants did the exercise. The data is split into two sets, one for training and one for testing. After an initial data overview and preparation, machine learning algorithms are applied to reach a satisfactory level of performance, first on the training set and then on the test set.

Trained Models:

Decision Tree (rpart)
Random Forest (rf)
Stochastic Gradient Boosting Models

Confusion matrices and accuracy levels are used for evaluation.
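
To make the evaluation step concrete, here is a minimal sketch of how a confusion matrix and accuracy are computed from predictions. The project itself is written in R; this plain-Python version is only illustrative, and the function names and the class labels "A", "B", "C" are made up for the example.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix); rows are actual classes, columns predicted."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(actual, predicted):
    """Share of observations whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for illustration only.
actual = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]

labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(actual, predicted))
```

The diagonal of the matrix holds the correctly classified observations, so accuracy equals the diagonal sum divided by the total count.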

A/B Testing, conversion rate test with Z-stat

The goal of the project: to identify whether a change to the web page increases the outcome of interest.

Task: suppose you are working for an e-commerce company and the marketing team is trying to decide if they should launch a new webpage.

They ran an A/B test and need help analyzing the results.

They provided you with this dataset, which contains the following fields:

user_id: the user_id of the person visiting the website
timestamp: the time at which the user visited the website
group: treatment vs control; treatment saw the new landing page, control saw the old landing page
landing_page: new vs old landing page, labeled 'new_page'/'old_page'
converted: 0/1 flag denoting whether the user visiting the page ended up converting

Given this information, you're asked to come up with a recommendation for the marketing team -- should the marketing team adopt the new landing page?

The team wants the landing page with the highest conversion rate.

Project summary:

data preparation: remove NA values, align values

define statistics: sample size, alpha value

visualisation: sns.kdeplot to show the probability distribution of the two samples

apply the z-statistic to compute the p-value

RESULT: the calculated p-value is above the alpha value, so it is recommended not to proceed with the new_page launch.
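
The z-statistic step in the summary above can be sketched as a two-proportion z-test. This is a minimal illustration, not the project's code: the function name, the two-sided alternative, alpha = 0.05, and the conversion counts in the usage lines are all assumptions made for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups convert at the same rate.

    conv_a / n_a: conversions and visitors in the control group,
    conv_b / n_b: conversions and visitors in the treatment group.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, not the project's data.
alpha = 0.05  # assumed significance level
z, p_value = two_proportion_z_test(conv_a=600, n_a=5000, conv_b=580, n_b=5000)
if p_value < alpha:
    print(f"z={z:.3f}, p={p_value:.3f}: reject H0, the rates differ")
else:
    print(f"z={z:.3f}, p={p_value:.3f}: no significant difference")
```

When the p-value is above alpha, as in the project's result, the test gives no evidence that the new page converts at a different rate, which is why the launch is not recommended.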
