Recent graduate with a Master's in Data Science, seeking a data science role on a diverse team that supports innovation and data-driven business decisions. Motivated to apply analytical, statistical, and programming skills to collect, analyze, and interpret large datasets.
- Worked with raw, unscaled data containing both numerical and categorical variables.
- Performed exploratory data analysis to visualize the characteristics of the variables.
- Trained several models on the data, using Optuna hyperparameter tuning to find parameters that maximize model accuracy.
- Used feature engineering techniques to build new variables that help improve model accuracy.
- Using the strategies above, built the final model and generated forest cover type predictions for the test dataset.
- Created an interactive data visualization to raise awareness of climate change.
- Explored the data interactively to reach insights faster and inform key project decisions.
- Created visualizations from large datasets for the following supporting points:
- Economic Development
- Human Influence Factors
- Energy Consumption
- Performed feature selection, feature elimination, and feature importance analysis using techniques such as Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Random Forest.
- Developed models using supervised, unsupervised, and semi-supervised learning techniques such as decision trees, regression trees, neural networks, and support vector machines.
- Tuned model parameters, estimated prediction errors, and validated models.
- Compared and ensembled multiple models in a pipeline and automatically selected the best model.
- Using a custom TRAIN dataset, built a model to predict whether a data scientist will remain a member.
- Performed data cleaning and preprocessing.
- Performed PCA and correlation analysis to understand the relationships between variables.
- Trained the following models: Logistic Regression, SVM, Decision Tree, and Random Forest.
- Used Recursive Feature Elimination (RFE) for feature selection.
- Compared models on test accuracy and training convergence to identify the best one.
- Used PySpark ML and created a SparkSession object in Databricks.
- Explored and analyzed multiple datasets to build insights into the Lahman Baseball database.