Introduction to Machine Learning: One-Day Course


This is a one-day introductory machine learning course for beginners. The course covers the basics of supervised and unsupervised learning, including regression, classification, clustering, dimensionality reduction and anomaly detection. It also includes hands-on exercises and examples using popular Machine Learning (ML) libraries like Scikit-learn.

The accompanying presentation guides the instructor through the course, providing a structured outline of the topics covered.

Machine Learning (image created using ChatGPT)

Table of Contents

  1. Introduction to Machine Learning
  2. Understanding the Machine Learning Workflow
  3. Supervised Learning
  4. Unsupervised Learning
  5. In-Class Assignment
  • Using with GitHub Codespaces
  • Acknowledgements


1. Introduction to Machine Learning

  • What are AI and ML?
    • AI: The ability of machines to simulate intelligent behavior.
    • ML: A branch of AI where models are trained to learn from data and improve over time.
  • Applications:
    • ChatGPT, Netflix recommendations, fraud detection, self-driving cars, etc.
  • Types of ML:
    • Supervised Learning, Unsupervised Learning, Reinforcement Learning.
  • Key Terminology:
    • Dataset, Features, Labels, Model, Training, Testing, Hyperparameters, Overfitting, Underfitting.

Highlights

  • Comparison between supervised and unsupervised learning using Linear Regression and K-Means examples.
  • Basic visualizations of regression and clustering tasks.
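In rough outline, the comparison highlighted above might look like the sketch below: the same data, once with labels (supervised) and once without (unsupervised). The synthetic data and parameter values are illustrative, not taken from the course notebook.

```python
# A minimal sketch contrasting supervised and unsupervised learning.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))

# Supervised: features X and labels y are both given; the model learns the mapping.
y = 3.0 * X.ravel() + rng.normal(0, 1, size=100)
reg = LinearRegression().fit(X, y)
print("Learned slope:", reg.coef_[0])

# Unsupervised: only X is given; the model discovers structure (clusters) on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print("Cluster centers:", km.cluster_centers_.ravel())
```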

Related notebook: Introduction to Machine Learning


2. Understanding the Machine Learning Workflow

Steps in the Workflow

  1. Define the Problem: Set objectives (e.g., regression, classification).
  2. Collect and Clean Data: Handle missing values, duplicates, outliers.
  3. Explore and Visualize Data: Use summary statistics and visual tools like histograms and scatterplots.
  4. Feature Engineering:
    • Selection: Remove irrelevant features.
    • Transformation: Normalize and encode data.
    • Creation: Generate new features.
  5. Split Data: Divide into training, validation, and test sets.
  6. Choose and Train a Model: Select an algorithm based on the task.
  7. Evaluate the Model: Use metrics like RMSE, Accuracy, and Silhouette Score.
  8. Hyperparameter Optimization: Use GridSearchCV or RandomizedSearchCV for fine-tuning.

Highlights

  • End-to-end example of an ML pipeline using Scikit-learn.
  • Visualization of preprocessing and evaluation results.
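A rough sketch of such a pipeline, covering steps 5 through 8 above, is shown below. The dataset, model choice and parameter grid are illustrative placeholders, not the notebook's exact example.

```python
# A minimal end-to-end sketch of steps 5-8 on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 5: split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 6: choose a model, here wrapped in a pipeline with feature scaling.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Step 8: tune hyperparameters with cross-validated grid search.
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Step 7: evaluate the tuned model on the held-out test set.
print("Best C:", grid.best_params_["clf__C"])
print("Test accuracy:", grid.score(X_test, y_test))
```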

Related notebook: Machine Learning Workflow


3. Supervised Learning

3.1 Regression

  • Goal: Predict continuous outputs (e.g., house prices, temperature).
  • Common Algorithms:
    • Linear Regression, Polynomial Regression, Ridge, and Lasso.
  • Evaluation Metrics:
    • MAE, MSE, RMSE, R².

Highlights

  • Hands-on example of Linear Regression with visualization of results.
  • Analysis of regression coefficients.
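A minimal sketch of such a regression example, computing the metrics listed above, might look like this (synthetic data; not the notebook's exact code):

```python
# Fit a Linear Regression on synthetic data and report MAE, MSE, RMSE, R².
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.5 * X.ravel() + 1.0 + rng.normal(0, 2, size=200)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

mse = mean_squared_error(y, y_pred)
print("MAE: ", mean_absolute_error(y, y_pred))
print("MSE: ", mse)
print("RMSE:", np.sqrt(mse))        # RMSE is the square root of MSE
print("R^2: ", r2_score(y, y_pred))
```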

Related notebook: Supervised Learning - Regression


3.2 Classification

  • Goal: Predict discrete categories (e.g., spam detection, disease diagnosis).
  • Types:
    • Binary, Multi-Class, Multi-Label Classification.
  • Evaluation Metrics:
    • Accuracy, Precision, Recall, F1-Score, Confusion Matrix.

Highlights

  • Logistic Regression example for binary classification.
  • Hands-on exercise with Random Forest Classifier.
  • Visualization of confusion matrix results.
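A minimal sketch of binary classification with Logistic Regression and the metrics above (synthetic data; not the notebook's exact code):

```python
# Train a binary classifier and report the standard classification metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1-score: ", f1_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```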

Related notebook: Supervised Learning - Classification


4. Unsupervised Learning

4.1 Clustering

  • Goal: Group data points into clusters based on similarity without labels.
  • Types:
    • Partition-Based: K-Means.
    • Hierarchical: Agglomerative Clustering.
    • Density-Based: DBSCAN.
  • Evaluation Metrics:
    • Silhouette Score, Inertia, Visualization.

Highlights

  • K-Means Clustering example with synthetic data.
  • Visualizing clusters and centroids.
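A minimal K-Means sketch along these lines, including the evaluation metrics listed above (synthetic blobs; the cluster count and data parameters are illustrative):

```python
# Cluster synthetic blobs with K-Means and report inertia and silhouette score.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

km = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)

print("Inertia:         ", km.inertia_)   # within-cluster sum of squared distances
print("Silhouette score:", silhouette_score(X, km.labels_))
print("Centroids:\n", km.cluster_centers_)
```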

Related notebook: Unsupervised Learning - Clustering


4.2 Other Unsupervised Learning Techniques

  • Dimensionality Reduction:
    • Reduces input features while preserving patterns.
    • Example: PCA (Principal Component Analysis).
  • Anomaly Detection:
    • Identifies outliers or unusual patterns.
    • Examples: Isolation Forest, Z-scores.

Highlights

  • PCA visualization of high-dimensional data projected into 2D.
  • Hands-on example of Isolation Forest for anomaly detection.
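A minimal sketch combining both techniques (synthetic data; the choices of n_components and contamination are illustrative assumptions, not course-prescribed values):

```python
# PCA for dimensionality reduction, Isolation Forest for anomaly detection.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

X, _ = make_classification(n_samples=300, n_features=10, random_state=3)

# Dimensionality reduction: project 10 features down to 2 components.
X_2d = PCA(n_components=2).fit_transform(X)
print("Reduced shape:", X_2d.shape)  # (300, 2)

# Anomaly detection: Isolation Forest labels outliers -1 and inliers +1.
iso = IsolationForest(contamination=0.05, random_state=3).fit(X)
labels = iso.predict(X)
print("Anomalies found:", (labels == -1).sum())
```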

Related notebook: Unsupervised Learning - Other Techniques


5. In-Class Assignment

  • Objective:
    • Develop a classification model using a dataset of your choice.
    • Save the trained model as a pickle file.
  • Steps:
    • Preprocess the data (handle missing values, encode categorical variables).
    • Train, evaluate and optimize the model.
    • Submit the pickle file of the trained model (a minimal save/load sketch follows below).
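A minimal sketch of the pickling step (the model and filename here are placeholders; use whatever your own assignment pipeline produces):

```python
# Train a placeholder model, save it with pickle, then load it back.
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Save the trained model to disk.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g., when grading): load the model back and predict.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.predict(X[:5]))
```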

Related notebook: In-Class Assignment


Using with GitHub Codespaces

You can run this repository directly in GitHub Codespaces without needing to set up anything locally.

Steps:

  1. Click the Code button in the repository.
  2. Select Open with Codespaces.
  3. If you don’t see the option, click Create codespace on main.

Once the Codespace environment loads:

  • All dependencies (e.g., Python packages) specified in requirements.txt will be automatically installed.
  • You can open and run the Jupyter Notebooks directly in the Codespace.

Acknowledgements

Thanks to Leon Boschman for contributing his ideas, slides and feedback to this course material.