This repository demonstrates my ability to carry out data analysis and data science projects. It focuses mainly on machine learning, since my PhD already demonstrates my data analysis skills.
The projects I did as part of Codecademy are in their own directory. There is also the repository I created to complete the IBM data science course. Together these demonstrate a range of skills needed for data science.
What I view as the best projects are:
- Regression: HousingPrices has the best coding but LaptopPrice is the most original
- Classification: Titanic Survial has all the descriptions but pokemonClassification is the most original
- Clustering: customer-segmentation-clustering
- Data Cleaning: NuclearMissle
- EDA: netflix-eda
Explanations Included - HousingPrice
Using regression to predict housing prices. I demonstrate tuning of hyperparameters, data cleaning, pipelines and simple stacking.
MedicalCost
Using regression, based on the HousingPrice project, to estimate the medical costs submitted to insurance firms.
LaptopPrice
Using simple and multiple linear regression to predict the price of laptops.
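As a rough illustration of the simple linear regression idea, here is a minimal sketch in plain Python. It is not the code from the LaptopPrice project: the `fit_line` helper and the toy numbers are hypothetical, and a real project would more likely rely on a library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalised, the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data, not taken from the LaptopPrice dataset.
ram_gb = [4, 8, 16, 32]
price = [300, 500, 900, 1700]

intercept, slope = fit_line(ram_gb, price)
predicted = intercept + slope * 24  # predicted price for a 24 GB laptop
print(intercept, slope, predicted)
```

Multiple linear regression extends the same idea to several features at once, for example RAM, screen size and storage together.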
Explanations Included - Titanic Survial
Classification project on the Titanic survival rate, with explanations included.
pokemonClassification
Using a Pokemon dataset to classify whether a Pokemon is legendary or not.
Iris
Using the Iris dataset to get experience with classification. It also includes a first look at kMeans clustering.
mnist
Using the MNIST dataset to learn classification.
Explanations Included - customer-segmentation-clustering
Using kMeans clustering to group customers by various properties.
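For readers unfamiliar with kMeans, the sketch below shows the basic assign-and-update loop in plain Python. It is an illustration only, not the project's code: the `kmeans` function, the toy customer data and the naive "first k points" initialisation are all assumptions made for this example (libraries typically use smarter initialisation such as k-means++).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, max_iterations=100):
    # Naive initialisation (assumption for this sketch): first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers: (annual spend, visits per month), arbitrary units.
customers = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

In this toy data the low-spend and high-spend customers end up in separate clusters; with real data the features would usually be scaled first so that no single property dominates the distance.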
netflix-eda
Using a Netflix dataset to do exploratory data analysis.
NuclearMissle
Using the Wikipedia package to extract nuclear missile information from Wikipedia, then importing it into a data frame, plotting it, and making a matplotlib movie showing the progression.
LearnCVXPY
A file rewriting the CVXPY tutorial in my own words to help me remember it.