This is a one-day introductory machine learning course for beginners. It covers the basics of supervised and unsupervised learning, including regression, classification, clustering, dimensionality reduction, and anomaly detection. It also includes hands-on exercises and examples using popular Machine Learning (ML) libraries such as Scikit-learn.
The presentation guides the instructor through the course and provides a structured outline of the topics to be covered.
- What is AI and ML?
- AI: The ability of machines to simulate intelligent behavior.
- ML: A branch of AI where models are trained to learn from data and improve over time.
- Applications:
- ChatGPT, Netflix recommendations, fraud detection, self-driving cars, etc.
- Types of ML:
- Supervised Learning, Unsupervised Learning, Reinforcement Learning.
- Key Terminology:
- Dataset, Features, Labels, Model, Training, Testing, Hyperparameters, Overfitting, Underfitting.
Highlights
- Comparison between supervised and unsupervised learning using Linear Regression and K-Means examples.
- Basic visualizations of regression and clustering tasks.
Related notebook: Introduction to Machine Learning
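The contrast between the two learning types can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the synthetic data, the true slope of 3.0, and the two cluster centres are assumptions chosen only to make the difference visible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Supervised: every sample has a label y, and the model learns X -> y.
# Assumed ground truth for this sketch: y = 3x + 2 plus noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)
reg = LinearRegression().fit(X, y)
slope, intercept = reg.coef_[0], reg.intercept_
print(f"Learned line: y = {slope:.2f}x + {intercept:.2f}")

# Unsupervised: only features are given, and the model looks for structure.
# Two synthetic blobs centred at (0, 0) and (5, 5); no labels are passed in.
points = np.vstack([
    rng.normal(0, 0.5, size=(50, 2)),
    rng.normal(5, 0.5, size=(50, 2)),
])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
cluster_labels = kmeans.labels_
print("Cluster sizes:", np.bincount(cluster_labels))
```

Note that `fit` receives `y` only in the supervised case; K-Means has to infer the grouping from the features alone.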
- Define the Problem: Set objectives (e.g., regression, classification).
- Collect and Clean Data: Handle missing values, duplicates, outliers.
- Explore and Visualize Data: Use summary statistics and visual tools like histograms and scatterplots.
- Feature Engineering:
- Selection: Remove irrelevant features.
- Transformation: Normalize and encode data.
- Creation: Generate new features.
- Split Data: Divide into training, validation, and test sets.
- Choose and Train a Model: Select an algorithm based on the task.
- Evaluate the Model: Use metrics like RMSE, Accuracy, and Silhouette Score.
- Hyperparameter Optimization: Use `GridSearchCV` or `RandomizedSearchCV` for fine-tuning.
Highlights
- End-to-end example of an ML pipeline using Scikit-learn.
- Visualization of preprocessing and evaluation results.
Related notebook: Machine Learning Workflow
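The workflow steps above can be sketched end to end with Scikit-learn. This is a hedged example rather than the notebook's pipeline: the Iris dataset, the logistic regression model, and the grid of `C` values are assumptions, and 5-fold cross-validation stands in for a separate validation set.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Collect data (a small built-in dataset, chosen for illustration).
X, y = load_iris(return_X_y=True)

# Split data: hold out a test set that is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Transformation + model in one pipeline, so scaling is fit on training folds only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter optimization: try each C value with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out test set.
best_C = search.best_params_["clf__C"]
test_accuracy = search.score(X_test, y_test)
print(f"Best C: {best_C}, test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler inside the pipeline keeps test data from leaking into preprocessing. `RandomizedSearchCV` accepts the same pipeline when the parameter grid grows too large to search exhaustively.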
- Goal: Predict continuous outputs (e.g., house prices, temperature).
- Common Algorithms:
- Linear Regression, Polynomial Regression, Ridge, and Lasso.
- Evaluation Metrics:
- MAE, MSE, RMSE, $R^2$.
Highlights
- Hands-on example of Linear Regression with visualization of results.
- Analysis of regression coefficients.
Related notebook: Supervised Learning - Regression
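As a rough illustration of the metrics listed above, the sketch below fits a linear regression and computes each one. The synthetic data and its coefficients (4.0 and -1.5) are assumptions made for this example, not values from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic target: y = 4.0 * x0 - 1.5 * x1 + 10 plus unit-variance noise.
X = rng.uniform(0, 10, size=(200, 2))
y = 4.0 * X[:, 0] - 1.5 * X[:, 1] + 10.0 + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)  # average absolute error
mse = mean_squared_error(y_test, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as the target
r2 = r2_score(y_test, y_pred)              # share of variance explained

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
```

The fitted coefficients should land close to the values used to generate the data, which is the kind of check the coefficient analysis in this section is about.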
- Goal: Predict discrete categories (e.g., spam detection, disease diagnosis).
- Types:
- Binary, Multi-Class, Multi-Label Classification.
- Evaluation Metrics:
- Accuracy, Precision, Recall, F1-Score, Confusion Matrix.
Highlights
- Logistic Regression example for binary classification.
- Hands-on exercise with Random Forest Classifier.
- Visualization of confusion matrix results.
Related notebook: Supervised Learning - Classification
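The evaluation metrics above can be computed as in the following sketch. It assumes the built-in breast cancer dataset and a Random Forest with default-style settings; the notebook may use different data and parameters.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Binary classification data (0 = malignant, 1 = benign in this dataset).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)    # correct predictions / all predictions
precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_test, y_pred)                # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} "
      f"Recall={recall:.3f} F1={f1:.3f}")
print(cm)
```

Accuracy alone can mislead on imbalanced data, which is why precision, recall, and the confusion matrix are reported alongside it.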
- Goal: Group data points into clusters based on similarity without labels.
- Types:
- Partition-Based: K-Means.
- Hierarchical: Agglomerative Clustering.
- Density-Based: DBSCAN.
- Evaluation Metrics:
- Silhouette Score, Inertia, Visualization.
Highlights
- K-Means Clustering example with synthetic data.
- Visualizing clusters and centroids.
Related notebook: Unsupervised Learning - Clustering
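A minimal K-Means sketch on synthetic data is shown below. The three blob centres and the choice of `n_clusters=3` are assumptions for illustration; in practice the number of clusters is usually unknown and is explored with metrics such as the ones computed here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (centres chosen for this sketch).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, 0], [0, 5], [5, 0]],
    cluster_std=0.6,
    random_state=0,
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Inertia: sum of squared distances to the nearest centroid (lower is tighter).
inertia = kmeans.inertia_
# Silhouette: between -1 and 1; values near 1 mean compact, well-separated clusters.
silhouette = silhouette_score(X, labels)

print("Centroids:\n", centroids)
print(f"Inertia={inertia:.1f}  Silhouette={silhouette:.3f}")
```

Inertia always decreases as clusters are added, so it is typically read from an elbow plot, while the silhouette score can be compared directly across different values of `n_clusters`.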
- Dimensionality Reduction:
- Reduces input features while preserving patterns.
- Example: PCA (Principal Component Analysis).
- Anomaly Detection:
- Identifies outliers or unusual patterns.
- Examples: Isolation Forest, Z-scores.
Highlights
- PCA visualization of high-dimensional data projected into 2D.
- Hands-on example of Isolation Forest for anomaly detection.
Related notebook: Unsupervised Learning - Other Techniques
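Both techniques can be sketched briefly. The example below is an illustration under assumptions: Iris stands in for "high-dimensional" data, the five injected outlier points are made up, and `contamination=0.05` and the z-score cutoff of 3 are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Dimensionality reduction: project 4 standardized features onto 2 components.
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()
print(f"Variance kept by 2 components: {explained:.3f}")

# Anomaly detection: 200 normal points plus 5 hand-placed outliers.
rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8, 8], [-8, 8], [8, -8], [-8, -8], [0, 9]], dtype=float)
data = np.vstack([normal, outliers])

# Isolation Forest: predict() returns 1 for inliers and -1 for outliers.
iso = IsolationForest(contamination=0.05, random_state=0).fit(data)
pred = iso.predict(data)

# Z-scores: flag a point if any feature is more than 3 standard deviations out.
z = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
z_flags = (z > 3).any(axis=1)

print("Isolation Forest flagged:", int((pred == -1).sum()))
print("Z-score flagged:", int(z_flags.sum()))
```

Z-scores treat each feature independently and assume roughly bell-shaped data, whereas Isolation Forest makes no distributional assumption, so the two methods can disagree on borderline points.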
- Objective:
- Develop a classification model using a dataset of your choice.
- Save the model as a pickle file.
- Steps:
- Preprocess the data (handle missing values, encode categorical variables).
- Train, evaluate, and optimize the model.
- Submit the pickle file of the trained model.
Related notebook: In-Class Assignment
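Saving a trained model as a pickle file might look like the sketch below. The dataset, the model, and the file name `model.pkl` (written to the system temp directory) are placeholders rather than assignment requirements, and preprocessing is skipped because this dataset has no missing or categorical values.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small classifier (placeholder dataset and model).
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Save the fitted model; the path is hypothetical, so adjust it as needed.
model_path = Path(tempfile.gettempdir()) / "model.pkl"
with model_path.open("wb") as f:
    pickle.dump(model, f)

# Load it back and check that it predicts exactly like the original.
with model_path.open("rb") as f:
    restored = pickle.load(f)

same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(f"Saved to {model_path}; predictions match: {same_predictions}")
```

If the preprocessing steps are wrapped together with the model in a Scikit-learn `Pipeline`, pickling the pipeline saves both in one file. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.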
You can run this repository directly in GitHub Codespaces without needing to set up anything locally.
Steps:
- Click the Code button in the repository.
- Select Open with Codespaces.
- If you don’t see the option, click Create codespace on main.
Once the Codespace environment loads:
- All dependencies (e.g., Python packages) specified in `requirements.txt` will be automatically installed.
- You can open and run the Jupyter Notebooks directly in the Codespace.
Thanks to Leon Boschman for contributing his ideas, slides, and feedback to this course material.