This GitHub is my data science playground. Think of it as a messy workbench overflowing with cool experiments (mostly Jupyter notebooks) I built while learning and expanding my workflow.
Each project follows a clear path: wrangling the data, cleaning it up, analyzing it extensively to understand the problem (think model prep!), and building production-ready machine learning models for both classification and regression.
Some projects prioritize education, skipping steps to highlight the core concepts. It's a win-win: it solidifies my knowledge and lets others learn from my tinkering.
Feel free to hit me up on LinkedIn to share feedback, knowledge, and thoughts. I'm eager to learn from others!
Repository | Project Type | Description |
---|---|---|
Ecommerce Purchase Prediction | End-to-End | Advanced predictive analytics for e-commerce conversion, featuring EBM and LightGBM models, SMOTE oversampling, and a tailored InterpretML dashboard for stakeholder insights. |
Analyzing Crime Data | End-to-End | In this ML Engineering lab, we clean and explore Chicago crime data, then build an XGBoost model tuned with Hyperopt, with features selected by Recursive Feature Elimination, that reaches 89% precision. |
Logistic Regression: Part 2 | Fundamentals (Educational) | Applying logistic regression and random forest to optimize revenue. Thorough data preprocessing, insightful EDA, and comprehensive model evaluation. Builds upon prior knowledge in logistic regression fundamentals. |
Logistic Regression: Part 1 | Fundamentals (Educational) | Delve into model mathematics, error analysis, and performance. Focus on statistical fundamentals with statsmodels for understanding and analyzing logistic regression. Linear regression background advised. |
Linear Regression: Part 2 | Fundamentals (Educational) | A supervised ML study of California housing using EDA and OLS evaluation. Applies polynomial transformation, Ridge and Lasso regularization, and quantile regression with scikit-learn for in-depth insights. |
Linear Regression: Part 1 | Fundamentals (Educational) | An entry point into machine learning, focused on building and evaluating basic linear regression models. Applies feature selection and model performance analysis as a foundational approach to predictive modeling, blending algorithms with statistical insights for accuracy and interpretability. |
Predicting Insurance Charges | End-to-End | Using Random Forest and XGBoost for regression to predict health insurance charges based on patient data. Features EDA, preprocessing, and in-depth insights. |
Decoding Titanic | End-to-End | Comprehensive Titanic survival prediction using machine learning models like Logistic Regression and ensemble techniques for classification. Includes EDA, feature engineering, and model interpretation insights. Achieved 83.5% accuracy. |
Bike Store Analysis | Business Intelligence | Analyzing a European bicycle retail business to enhance growth and profitability. Features in-depth EDA, business performance analysis, and strategic insights based on comprehensive sales data. |
Outside of work, I keep my coding skills sharp by engaging with coding challenges on Codewars: