This repository houses all code, data, and files related to personal projects and my work in the Springboard Data Science Program. The following acts as a table of contents for the whole repository.
Key Skills
- Correlation Plots
- Geospatial Analysis (Geopandas, QGIS, geopy)
- Feature Engineering
- Data Wrangling
- SHAP & Permutation Feature Importance
- Ensemble Modeling
WINNING ENTRY: A social-good hackathon hosted by Booz Allen Hamilton and the Chesapeake Monitoring Cooperative (CMC) to explore data on the health of the Chesapeake Bay watershed. The challenge was to build a predictive model and/or correlation analysis, using CMC monitoring data, that explains patterns in water quality indicator assessments (pollution) for a section of the Chesapeake Bay watershed. Entries were judged on robustness, scalability, and creativity by 16 expert judges representing leadership from 7 organizations, with expertise across environmental science and modeling, data science, machine learning, and human-centered design.
See the Devpost submission here
Key Skills
- Bokeh Plot Visuals / Dashboard Creation
- Imbalanced Dataset Handling
- Bootstrap Statistical Analysis
- Multi Class Classification
Proof-of-concept modeling to determine whether the race of a stopped subject can be predicted from the race of the stopping officer, and whether a frisk can be predicted from the demographics of the officer and subject.
Key Skills
- K-Means
- PCA - Principal Component Analysis
- Silhouette Method
- Elbow Sum of Squares Method
Mini project on customer segmentation and identifying unknown relationships among customers. The more you know about your customers, the more you can personalize your service! The dataset contains information on marketing newsletters/e-mail campaigns (e-mail offers sent) and transaction-level data from customers (which offers customers responded to and what they bought).
Several EDAs performed on varying data categories.
Hospital Readmittance statistically re-examines a previously done analysis to critique its validity. The Centers for Medicare & Medicaid Services (CMS) began reducing Medicare payments for Inpatient Prospective Payment System hospitals with excess readmissions; the analysis focuses on this hospital readmission data.
Human Temperature EDA uses bootstrap statistics to estimate the true average human body temperature for both males and females.
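As a rough illustration of the bootstrap idea used in the Human Temperature EDA, the sketch below resamples a sample with replacement and takes percentiles of the resampled means as a confidence interval. The function name `bootstrap_mean_ci`, its parameters, and the temperature values are all hypothetical placeholders, not the project's code or data.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    # Clamp the percentile positions to valid indices.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Made-up readings in degrees Fahrenheit, for illustration only.
temps = [98.2, 98.6, 97.9, 98.4, 98.8, 98.0, 98.3, 98.1, 98.7, 98.5]
low, high = bootstrap_mean_ci(temps)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Running the same function separately on male and female subsets would give one interval per group.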
Racial Discrimination performs a statistical analysis of whether race has a meaningful impact on the callback rate for resumes.
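One common way to compare two callback rates is a two-proportion z-test. The sketch below is an assumption about how such a test could be set up, not necessarily the method used in the project; the function name and the counts in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 callbacks of 100 resumes vs. 40 of 100.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```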
Key Skills
- Central Limit Theorem
- Statistical Analysis
- Data Visualization
- z-test
- t-test
- Margin of Error (MOE)
- Chi-Squared Test
- Bootstrap Statistics
- Hypothesis Testing
Key Skills
- Logistic Regression
- Hyperparameter Tuning
- K-Fold CV
- Linear Regression
- Metric Evaluation
- Residual Plot
- Influence Plot
- Naive Bayes
- NLP
- Tokenization
- TF-IDF
- n-grams
Several machine learning algorithms applied in mini projects, such as: labeling an observation as either male or female based on height and weight data (Logistic Regression), estimating prices on the Boston Housing data (Linear Regression), and predicting rotten/fresh from critic reviews (Naive Bayes).
Several exercises utilizing MapReduce in PySpark (RDD) with a touch of MLlib.
Key Skills
- PySpark
- RDD
- Spark Dataframes
Key Skills
- SQL
- Time Series Analysis
- SQLAlchemy
A SQL project that uses SQLAlchemy and PyMySQL to connect to a MySQL server and import data using SQL in Python.
Key Skills
- JSON Manipulation and Extraction
- Applied Plotting and Charting
An exercise in data extraction and exploration using a JSON data source.
Defining an "adopted user" as a user who has logged into a product on three separate days in at least one seven-day period, identify which factors predict future user adoption.
- Exploratory Data Analysis
- Experiment and metrics design
- Predictive modeling and recommendations
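The "adopted user" definition above (logins on three separate days within at least one seven-day period) can be sketched as a sliding check over a user's distinct login dates. This is a minimal sketch; `is_adopted` and its parameters are hypothetical names, and the project itself may compute the label differently.

```python
from datetime import date, timedelta


def is_adopted(login_dates, window_days=7, min_days=3):
    """Return True if logins fall on `min_days` distinct days within some `window_days`-day period."""
    days = sorted(set(login_dates))  # distinct calendar days, in order
    # Check every run of `min_days` consecutive distinct login days.
    for i in range(len(days) - min_days + 1):
        # A seven-day period covers day d through d + 6, so the gap must be under 7 days.
        if days[i + min_days - 1] - days[i] < timedelta(days=window_days):
            return True
    return False


# Hypothetical login history: three distinct days inside one seven-day period.
logins = [date(2014, 1, 1), date(2014, 1, 3), date(2014, 1, 7)]
print(is_adopted(logins))
```

The resulting boolean can then serve as the target label when modeling which factors predict adoption.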
Key Skills
- Full Stack Data Scientist