This repository serves as a personal playground to understand, learn, and experiment with different machine learning algorithms and techniques. The goal is to implement various ML algorithms on different datasets and document the process.
Different sources were used to create this repository, such as Udemy and YouTube courses. For some algorithms, exercise notebooks and datasets from those courses were used; for others, datasets were downloaded online and used for further exploration.
The project is organized into directories, each representing a different machine learning algorithm. Each algorithm directory contains the following:
- Data: A collection of datasets used for training and testing the algorithm.
- Notebooks: Jupyter notebooks that demonstrate the implementation and experimentation with the algorithm.
- Documentation: A markdown file explaining the algorithm, its implementation, and any relevant details.

The repository covers the following algorithms and topics:

- Linear Regression: is a linear approach to modeling the relationship between a dependent variable and one or more independent variables.
- Logistic Regression: is a statistical method for modeling the probability of an outcome from one or more independent variables, where the outcome is a dichotomous variable (one with only two possible values).
- K-Nearest Neighbors: is an algorithm that predicts the value for a data point from its k nearest neighbors in the feature space, using the majority class for classification or the average target value for regression.
- Decision Trees and Random Forest: are tree-based algorithms where decision trees make predictions by learning decision rules from features, while random forests combine multiple trees to create a more robust model for both classification and regression tasks.
- Support Vector Machines: is a supervised learning algorithm that finds the hyperplane separating the classes with the largest possible margin in the feature space, making it effective for complex, high-dimensional problems.
- Principal Component Analysis: is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional form while preserving as much variance as possible.
- K-Means Clustering: is an unsupervised learning algorithm that partitions data into K distinct, non-overlapping clusters based on feature similarity, assigning each point to the cluster with the nearest centroid.
- Bias-Variance Tradeoff: explains the fundamental tradeoff in machine learning between two sources of error that affect model performance: bias and variance.
- Recommender Systems: explores algorithms designed to suggest relevant items to users based on patterns in data, analyzing past interactions and preferences to predict future interests.
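
For the Linear Regression entry, here is a minimal sketch with a single independent variable, using the closed-form ordinary least squares solution (slope = covariance / variance). The function name and toy data are hypothetical and not taken from the repository's notebooks.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated exactly by y = 2x + 1
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

With several independent variables the same least-squares objective is usually solved with matrix methods rather than this one-feature formula.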
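
For the Logistic Regression entry, a minimal sketch of the usual formulation: the probability of the positive class is `sigmoid(w * x + b)`, and `w`, `b` are fitted by batch gradient descent on the mean log-loss. The learning rate, epoch count, names, and data are illustrative assumptions.

```python
import math

def sigmoid(z):
    # Maps any real score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, ys, lr=0.1, epochs=1000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_class(w, b, x, threshold=0.5):
    # Dichotomous outcome: 1 if the predicted probability reaches the threshold, else 0
    return 1 if sigmoid(w * x + b) >= threshold else 0

xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(xs, ys)
print([predict_class(w, b, x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```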
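
For the K-Nearest Neighbors entry, the sketch below shows both uses described above: a majority vote for classification and an average for regression, using Euclidean distance. The helper names and toy data are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def k_nearest(train_points, query, k):
    """Indices of the k training points closest to the query (Euclidean distance)."""
    order = sorted(range(len(train_points)), key=lambda i: math.dist(train_points[i], query))
    return order[:k]

def knn_classify(train_points, train_labels, query, k=3):
    # Classification: majority vote among the labels of the k nearest neighbors
    votes = Counter(train_labels[i] for i in k_nearest(train_points, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_points, train_targets, query, k=3):
    # Regression: mean target value of the k nearest neighbors
    nearest = k_nearest(train_points, query, k)
    return sum(train_targets[i] for i in nearest) / len(nearest)

points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(knn_classify(points, labels, (1.5, 1.5)))  # a
print(knn_regress(points, targets, (1.5, 1.5)))  # 2.0
```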
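
For the Decision Trees and Random Forest entry, a decision tree is grown by repeatedly choosing the split that makes the child nodes purest. The sketch below shows only a single split on one numeric feature (a "decision stump"), scored with Gini impurity, which is one common criterion; the function names and data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, score) for the rule 'x <= threshold' with the lowest weighted impurity."""
    n = len(xs)
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Impurity of the two child nodes, weighted by how many samples each receives
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

xs = [1, 2, 3, 10, 11, 12]
ys = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(xs, ys))  # (3, 0.0)
```

A full tree would apply `best_split` recursively to each child until the nodes are pure or a depth limit is reached; a random forest trains many such trees on bootstrap samples (with random feature subsets) and combines their votes. Neither step is shown here.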
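
For the Support Vector Machines entry, the "largest possible margin" idea can be written as an optimization problem. Assuming labels $y_i \in \{-1, +1\}$ and a linear decision function $\mathbf{w}^\top \mathbf{x} + b$, the hard-margin SVM solves:

```math
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2
\quad\text{subject to}\quad y_i\,(\mathbf{w}^\top\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n
```

The distance between the two margin hyperplanes $\mathbf{w}^\top \mathbf{x} + b = \pm 1$ is $2 / \lVert\mathbf{w}\rVert$, so minimizing $\lVert\mathbf{w}\rVert$ maximizes the margin. The soft-margin variant used in practice allows some constraints to be violated through slack variables, with a penalty parameter $C$ controlling the tradeoff, and kernels extend the same idea to non-linear boundaries.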
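
For the Principal Component Analysis entry, a minimal sketch that finds only the first principal component: center the data, build the sample covariance matrix, and use power iteration to approximate its top eigenvector (the direction of maximum variance). Power iteration is a swapped-in simplification; library implementations typically use an SVD instead. It assumes a unique largest eigenvalue and a non-zero covariance matrix. Names and data are hypothetical.

```python
import math

def first_principal_component(points, iterations=100):
    n, dims = len(points), len(points[0])
    # Center each feature at zero mean
    means = [sum(p[j] for p in points) / n for j in range(dims)]
    centered = [[p[j] - means[j] for j in range(dims)] for p in points]
    # Sample covariance matrix (dims x dims)
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    # Power iteration: repeatedly multiply by cov and normalize; this converges to
    # the eigenvector with the largest eigenvalue, i.e. the direction of maximum variance
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centered points onto that direction: a 1-D representation
    projections = [sum(row[j] * v[j] for j in range(dims)) for row in centered]
    return v, projections

points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # all on the line y = 2x
component, projected = first_principal_component(points)
print([round(c, 4) for c in component])  # [0.4472, 0.8944]
```

Because these toy points are collinear, the single component keeps all of their variance; real data would lose some.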
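
For the K-Means Clustering entry, a minimal sketch of Lloyd's algorithm, which alternates an assignment step and an update step until the centroids stop moving. Initializing with the first `k` points is an assumption made to keep the example deterministic; k-means++ initialization is a common alternative. Names and data are hypothetical.

```python
import math

def k_means(points, k, iterations=100):
    # Simple deterministic initialization: use the first k points as centroids
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append([sum(coord) / len(cluster) for coord in zip(*cluster)])
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the centroids no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (8, 8), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print([[round(c, 2) for c in centroid] for centroid in centroids])  # [[1.33, 1.33], [8.33, 8.33]]
```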
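
For the Bias-Variance Tradeoff entry, the standard decomposition for squared-error loss makes the two sources of error explicit. It assumes $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, where $\hat{f}$ is the model learned from a random training set and the expectations are taken over training sets and noise at a fixed input $x$:

```math
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
```

More flexible models tend to lower the bias term and raise the variance term, while $\sigma^2$ cannot be reduced by any model.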
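
For the Recommender Systems entry, one classic approach (not necessarily the one used in the notebooks) is user-based collaborative filtering: predict a user's rating for an item as a similarity-weighted average of other users' ratings for it. The sketch uses cosine similarity over co-rated items; the rating data, user and item names, and helper functions are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts, over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item (None if nobody rated it)."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

ratings = {
    "alice": {"matrix": 5, "titanic": 1},
    "bob": {"matrix": 5, "titanic": 1, "inception": 5},
    "carol": {"matrix": 1, "titanic": 5, "inception": 1},
}
print(round(predict_rating(ratings, "alice", "inception"), 2))  # 3.89
```

Because all raw ratings are positive, cosine similarity still gives `carol` (whose tastes are opposite to `alice`'s) a positive weight; mean-centering each user's ratings first is a common refinement.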