ML-DS-Playground

This repository collects personal machine learning and data science projects built while learning and experimenting with different techniques. The projects cover supervised and unsupervised learning, feature engineering, deep learning, natural language processing and audio analysis, data engineering with PySpark, and recommendation systems.

Projects

Here is a list of some projects included in this repository:

Supervised Learning
- Classify Song Genres from Audio Data
  - Techniques: Exploratory Data Visualization, Feature Scaling, Feature Reduction, Cross-Validation, Logistic Regression, Decision Trees
  - Tasks: Classification of song genres based on audio features, model evaluation using classification reports.
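
To make the evaluation step concrete, here is a minimal, dependency-free sketch of what a classification report computes: per-class precision, recall, F1, and support. It is not the notebook's code. The function name `classification_report` mirrors scikit-learn's naming but is a plain-Python stand-in written for this example, and the genre labels are hypothetical.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1", "support"}} per label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # predicted label matches the true label
    fp = Counter()  # label was predicted, but the truth was another label
    fn = Counter()  # label was the truth, but another label was predicted
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report


# Hypothetical labels, not taken from the notebook's data.
y_true = ["rock", "rock", "rock", "hiphop", "hiphop", "rock"]
y_pred = ["rock", "hiphop", "rock", "hiphop", "hiphop", "rock"]

report = classification_report(y_true, y_pred)
for genre, scores in report.items():
    print(genre, scores)
```

Reading precision and recall per class, rather than overall accuracy alone, shows whether a model favors the more frequent genre.
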
Unsupervised Learning
- Clustering Antarctic Penguin Species
  - Techniques: Data Manipulation with Pandas, Dummy Variable Creation, Elbow Method, K-Means Clustering, Feature Scaling
  - Tasks: Clustering penguin species based on features like flipper length and body mass.
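
The elbow method can be sketched without any libraries: run K-Means for several values of k and compare the inertia (within-cluster sum of squared distances). The code below is a plain-Python stand-in for a library implementation such as scikit-learn's `KMeans`, not the notebook's code. The points are hypothetical, already-scaled two-feature measurements, and the deterministic farthest-point initialization is a simplification chosen for this example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (a simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm; returns (centroids, inertia)."""
    centroids = init_centroids(points, k)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Hypothetical scaled features forming three well-separated groups.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
    (10.0, 0.0), (10.1, 0.2), (10.2, 0.1),
]

inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, inertia in inertias.items():
    print(k, round(inertia, 3))
```

On this toy data the inertia drops sharply up to k = 3 and then flattens, which is the "elbow" that suggests three clusters.
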
Feature Engineering
- Exploring NYC Public School Test Result Scores
  - Techniques: Sorting and Subsetting, Grouped Summary Statistics
  - Tasks: Feature engineering to understand NYC public school test results.
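
As an illustration of the techniques listed for this project, the sketch below derives a total score, sorts and subsets the rows, and computes a grouped summary. It uses only the standard library rather than Pandas, and the school names, column names, and scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; names and values are illustrative only.
schools = [
    {"school": "A", "borough": "Brooklyn", "math": 520, "reading": 500, "writing": 490},
    {"school": "B", "borough": "Brooklyn", "math": 480, "reading": 470, "writing": 460},
    {"school": "C", "borough": "Queens", "math": 600, "reading": 580, "writing": 570},
    {"school": "D", "borough": "Queens", "math": 550, "reading": 540, "writing": 530},
    {"school": "E", "borough": "Bronx", "math": 450, "reading": 440, "writing": 430},
]

# Feature engineering: combine the section scores into one total.
for row in schools:
    row["total"] = row["math"] + row["reading"] + row["writing"]

# Sorting and subsetting: the two schools with the highest total.
ranked = sorted(schools, key=lambda r: r["total"], reverse=True)
top_two = [row["school"] for row in ranked[:2]]

# Grouped summary statistics: school count and mean total per borough.
totals_by_borough = defaultdict(list)
for row in schools:
    totals_by_borough[row["borough"]].append(row["total"])

summary = {
    borough: {"n_schools": len(totals), "mean_total": mean(totals)}
    for borough, totals in totals_by_borough.items()
}

print(top_two)
print(summary)
```
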
Deep Learning with Keras
- Building an E-Commerce Clothing Classifier Model with Keras
  - Techniques: Neural Networks in Keras, Compiling and Training Models, CNNs, Image Classification
  - Tasks: Classification of clothing items in e-commerce datasets using convolutional neural networks.
Natural Language Processing and Audio Analysis Techniques
- Customer Support Calls
  - Techniques: Speech Recognition, Audio Feature Extraction, Sentiment Analysis, Named Entity Recognition (NER), Text Similarity/Embedding
  - Tasks: Analyzing customer support calls to identify sentiment, entities, and key topics from audio.
Data Engineering for E-commerce Orders and Demand Forecasting with PySpark
- Cleaning an Orders Dataset with PySpark
  - Techniques: Data Cleaning, PySpark, Data Transformation
  - Tasks:
    - Cleaned and preprocessed an e-commerce orders dataset.
    - Removed orders placed between 12 am and 5 am.
    - Created a time_of_day column.
    - Filtered out products that are no longer sold.
    - Converted columns to lowercase and extracted relevant address data.
    - Exported cleaned data for use in demand forecasting.
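
The cleaning steps above can be sketched in plain Python to show the logic without a Spark session. This is not the notebook's PySpark code. The field names, the hour boundaries for `time_of_day`, the address format, and the `discontinued` set are all assumptions made for illustration.

```python
from datetime import datetime


def time_of_day(ts):
    """Bucket a timestamp by hour; the boundaries are an assumption."""
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 18:
        return "afternoon"
    return "evening"  # 18:00 onward; rows before 5 am are dropped earlier


def clean_orders(orders, discontinued):
    cleaned = []
    for order in orders:
        ts = datetime.fromisoformat(order["order_date"])
        if ts.hour < 5:
            continue  # drop orders placed between 12 am and 5 am
        product = order["product"].lower()
        if product in discontinued:
            continue  # drop products that are no longer sold
        # Assumes addresses end in ", <STATE> <ZIP>".
        state = order["address"].split(",")[-1].split()[0]
        cleaned.append(
            {
                "order_date": ts,
                "time_of_day": time_of_day(ts),
                "product": product,
                "state": state,
            }
        )
    return cleaned


# Hypothetical orders, not taken from the dataset.
orders = [
    {"order_date": "2023-01-22 00:30:00", "product": "Phone",
     "address": "1 Main St, Boston, MA 02215"},
    {"order_date": "2023-01-22 09:15:00", "product": "USB-C Charging Cable",
     "address": "136 Church St, New York City, NY 10001"},
    {"order_date": "2023-01-22 14:05:00", "product": "Old TV",
     "address": "9 Lake St, Austin, TX 73301"},
    {"order_date": "2023-01-22 19:45:00", "product": "Wired Headphones",
     "address": "562 2nd St, San Francisco, CA 94016"},
]

cleaned = clean_orders(orders, discontinued={"old tv"})
for row in cleaned:
    print(row)
```

In PySpark the same filters and derived columns would typically be written as column expressions, for example with `hour` and `when` from `pyspark.sql.functions`, so that they run across the cluster instead of in a Python loop.
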
- Building a Demand Forecasting Model
  - Techniques: Random Forest Regression, Feature Engineering with PySpark, Time-Series Forecasting
  - Tasks:
    - Forecast sales and inventory needs for promotional planning.
    - Cleaned and aggregated sales data at daily intervals.
    - Built and trained a Random Forest model to predict future product sales.
    - Evaluated model performance using Mean Absolute Error (MAE).
    - Predicted sales for the promotional week to support inventory planning.
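
Two of the steps above, daily aggregation and evaluation with Mean Absolute Error, are simple enough to sketch in plain Python. This is an illustration only: the sales records are hypothetical, and the `predicted` values are made-up numbers standing in for model output, not forecasts produced by a Random Forest.

```python
from collections import defaultdict


def daily_totals(sales):
    """Sum quantities per day from (date_string, quantity) records."""
    totals = defaultdict(int)
    for day, quantity in sales:
        totals[day] += quantity
    return dict(sorted(totals.items()))


def mean_absolute_error(actual, predicted):
    """MAE = mean of |actual - predicted| over all observations."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical transaction-level sales records.
sales = [
    ("2024-01-01", 3),
    ("2024-01-01", 2),
    ("2024-01-02", 4),
    ("2024-01-03", 6),
    ("2024-01-03", 1),
]

totals = daily_totals(sales)
actual = list(totals.values())
predicted = [6, 4, 5]  # placeholder forecasts for the same three days

mae = mean_absolute_error(actual, predicted)
print(totals)
print(mae)
```

MAE is expressed in the same unit as the target (units sold per day here), which makes it straightforward to compare against typical daily volumes.
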
Recommendation Systems and Data Visualization
- Comparing Cosmetics by Ingredients
  - Techniques: Content-Based Filtering, Ingredient Tokenization, Document-Term Matrix (DTM), Dimensionality Reduction with t-SNE, Interactive Visualization with Bokeh
  - Tasks: Built a recommendation system for moisturizers targeting dry skin by analyzing ingredient similarities. Used t-SNE for dimensionality reduction and Bokeh for interactive product comparison visualization.
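
The core of content-based filtering on ingredients can be sketched as follows: tokenize each ingredient list, build a binary document-term matrix, and compare products. This is a minimal standard-library illustration, not the notebook's code. The product names and ingredient lists are hypothetical, and cosine similarity on the raw matrix is a substitution chosen here for brevity; the t-SNE projection and the Bokeh plot are left out.

```python
import math


def tokenize(ingredients):
    """Split a comma-separated ingredient string into lowercase tokens."""
    return [tok.strip().lower() for tok in ingredients.split(",") if tok.strip()]


def document_term_matrix(products):
    """Return (vocabulary, {product: binary row}) for the given products."""
    vocab = sorted({tok for text in products.values() for tok in tokenize(text)})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = {}
    for name, text in products.items():
        row = [0] * len(vocab)
        for tok in tokenize(text):
            row[index[tok]] = 1  # 1 if the ingredient is present
        matrix[name] = row
    return vocab, matrix


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(target, matrix):
    """Name of the product whose ingredient row is closest to the target's."""
    scored = [
        (cosine_similarity(matrix[target], row), name)
        for name, row in matrix.items()
        if name != target
    ]
    return max(scored)[1]


# Hypothetical products and ingredient lists.
products = {
    "Cream A": "Water, Glycerin, Shea Butter, Tocopherol",
    "Cream B": "Water, Glycerin, Shea Butter, Fragrance",
    "Gel C": "Water, Alcohol Denat, Fragrance",
}

vocab, matrix = document_term_matrix(products)
print(vocab)
print(most_similar("Cream A", matrix))
```
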

Technologies Used

The projects in this repository are implemented using the following technologies:

Programming Language: Python
Interactive Environment: Jupyter Notebook
Libraries for Data Manipulation and Analysis: Pandas, NumPy
Machine Learning Frameworks: scikit-learn, TensorFlow/Keras (for deep learning projects), PySpark (for large-scale data processing and engineering)
Visualization Tools: Matplotlib, Seaborn, Bokeh
NLP Libraries: NLTK, spaCy, SpeechRecognition (for speech-to-text)
Clustering, Dimensionality Reduction, and Feature Engineering Techniques: K-Means Clustering, t-SNE, Document-Term Matrix (DTM)
Specialized Techniques: Random Forest Regression for forecasting, Content-Based Filtering for recommendation systems
