---
layout: page
title: The Mathematics of AI
description: 10 lectures
header-img: img/boat-waves.jpg
---
This series consists of 10 sessions of 2 hours each on the mathematics of machine learning. It presents the main concepts without delving into the details of the proofs. Clicking the title of a session gives access to its transcript, and short notes are also available to guide you through the structure and progression of the content.
Course #1 - Smooth Optimization
Content:
- Introduction and motivation
- Gradients, Jacobians, Hessians
- Gradient descent and acceleration (see the sketch after this list)
- Stochastic Gradient Descent (SGD)
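To make the gradient step concrete, here is a minimal NumPy sketch of gradient descent on a least-squares objective; the objective, the random data, and the step-size choice 1/L are illustrative assumptions, not course material.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * ||A x - b||^2, whose gradient is
# grad f(x) = A^T (A x - b). With step size 1/L, where L is the
# Lipschitz constant of the gradient, the iterates converge.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))   # illustrative random problem
b = rng.standard_normal(50)

L = np.linalg.norm(A, 2) ** 2       # L = ||A||^2, largest singular value squared
x = np.zeros(10)
for _ in range(500):
    x = x - (1.0 / L) * A.T @ (A @ x - b)

print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm, close to 0
```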
Materials:
- Notebook on Regression
- Notebook on Classification
- My course notes: Optimization for Machine Learning
- Exercises Sheet
Bibliography:
- Convex Optimization, by Boyd and Vandenberghe
- Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications, by Amir Beck
Course #2 - From Smooth to Non-Smooth Optimization
Content:
- Proofs of gradient descent and acceleration
- Linear models and regularization
- Ridge versus Lasso
- ISTA algorithm (sketched below)
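A minimal sketch of ISTA for the Lasso, assuming a generic design matrix A and observations b; the step size 1/L and the iteration count are illustrative choices.

```python
import numpy as np

# ISTA for the Lasso: min_x 0.5*||A x - b||^2 + lam*||x||_1.
# Each iteration is a gradient step on the smooth part followed by
# soft-thresholding, the proximal operator of the l1 norm.
def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - (1.0 / L) * A.T @ (A @ x - b), lam / L)
    return x
```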
Materials:
- Notebook on Linear Regression (specifically the Lasso part)
- Notebook on Interior Point Methods
- My course notes: The Mathematical Tours of Signal Processing
Bibliography:
- Course Notes on Convexity by Vincent Duval
- Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications, by Amir Beck
Course #3 - Lasso and Compressed Sensing
Content:
- Examples of non-smooth functionals (Lasso, TV regularization, constraints)
- Subgradient and proximal operators
- Forward-backward splitting, connection with FISTA
- ADMM, Douglas-Rachford (DR), Primal-Dual (see the DR sketch after this list)
- Compressive sensing theory
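As one example among these splitting schemes, here is a Douglas-Rachford sketch for basis pursuit, min ||x||_1 subject to A x = b; the choice of problem, the parameter gamma, and the pseudoinverse-based projection are illustrative assumptions.

```python
import numpy as np

# Douglas-Rachford splitting for basis pursuit: min ||x||_1 s.t. A x = b,
# with f = ||.||_1 (prox = soft-thresholding) and g = indicator of
# {x : A x = b} (prox = projection onto the affine constraint set).
def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def dr_basis_pursuit(A, b, gamma=1.0, n_iter=500):
    pinvA = np.linalg.pinv(A)
    proj = lambda x: x - pinvA @ (A @ x - b)   # projection onto {A x = b}
    y = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(y, gamma)           # prox of gamma*||.||_1
        y = y + proj(2 * x - y) - x            # Douglas-Rachford update
    return soft_threshold(y, gamma)
```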
Materials:
- My course notes: The Mathematical Tours of Signal Processing
- Notebook on Douglas-Rachford Proximal Method
- Proximal Operators Repository (including Python code)
- Non-Smooth Optimization Slides
- Compressed Sensing Slides
Bibliography:
- A Mathematical Introduction to Compressive Sensing, by Simon Foucart and Holger Rauhut (advanced)
- Convex Optimization, by Boyd and Vandenberghe
- Proximal Algorithms, by N. Parikh and S. Boyd
Course #4 - Kernel, Perceptron, CNN, and Transformers
Content:
- Transition from ridge regression to kernels (see the sketch after this list)
- Multilayer Perceptron (MLP)
- Convolutional Neural Networks (CNN)
- ResNet architecture
- Transformer models
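To illustrate the passage from ridge regression to kernels, here is a minimal kernel ridge regression sketch with a Gaussian kernel; the kernel width, regularization weight, and toy data are illustrative assumptions.

```python
import numpy as np

# Kernel ridge regression: the ridge solution w = (X^T X + lam I)^{-1} X^T y
# is replaced by its kernelized form alpha = (K + lam I)^{-1} y, with
# predictions f(x) = sum_i alpha_i k(x, x_i).
def gaussian_kernel(X, Z, sigma):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 1))                    # illustrative 1D data
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(100)

sigma, lam = 0.3, 1e-2                              # illustrative hyperparameters
K = gaussian_kernel(X, X, sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.linspace(-1, 1, 200)[:, None]
y_pred = gaussian_kernel(X_test, X, sigma) @ alpha  # out-of-sample prediction
```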
Materials:
- Slides on deep learning
- My course notes on Optimization for Machine Learning
- Notebook on Multilayer Perceptron and Autograd
Bibliography:
- The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- Machine Learning: A Probabilistic Perspective, by Kevin Patrick Murphy (covers the theory of ML)
Course #5 - Deep Learning: Theory and Numerics
Content:
- Review of MLP and its variants (CNN, ResNet)
- Theoretical framework of two-layer MLPs
- Gradient and Jacobians in neural networks
- Introduction to backpropagation (sketched below)
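A minimal sketch of backpropagation through a two-layer MLP, with the chain rule written out by hand; the layer sizes, ReLU nonlinearity, and squared loss are illustrative assumptions.

```python
import numpy as np

# Backpropagation through a two-layer MLP f(x) = W2 @ relu(W1 @ x):
# the forward pass stores intermediates, the backward pass applies
# the chain rule in reverse order.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((1, 8))
y_target = np.array([1.0])

# forward pass
h = W1 @ x
a = np.maximum(h, 0.0)            # ReLU
y = W2 @ a
loss = 0.5 * np.sum((y - y_target) ** 2)

# backward pass (reverse-mode chain rule)
g_y = y - y_target                # d loss / d y
g_W2 = np.outer(g_y, a)           # d loss / d W2
g_a = W2.T @ g_y                  # d loss / d a
g_h = g_a * (h > 0)               # ReLU derivative
g_W1 = np.outer(g_h, x)           # d loss / d W1
```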
Materials:
- Slides on deep learning
- Slides on automatic differentiation
- My course notes on Optimization for Machine Learning
- Notebook on deep learning
- Notebook on texture synthesis with deep networks
Course #6 - Differential Programming
Content:
- Recap on Gradient and Jacobian
- Forward and reverse mode automatic differentiation
- Introduction to PyTorch (see the autograd sketch after this list)
- The adjoint method in computational mathematics
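A minimal PyTorch autograd sketch of reverse-mode differentiation; the least-squares loss and random data are illustrative assumptions.

```python
import torch

# Reverse-mode automatic differentiation in PyTorch: the forward pass
# builds a computation graph, and .backward() traverses it in reverse,
# accumulating gradients into the .grad field of each leaf tensor.
A = torch.randn(5, 10)
b = torch.randn(5)
x = torch.randn(10, requires_grad=True)

loss = 0.5 * torch.sum((A @ x - b) ** 2)
loss.backward()

# x.grad now holds A^T (A x - b), the gradient of the loss at x.
with torch.no_grad():
    print(torch.allclose(x.grad, A.T @ (A @ x - b)))
```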
Materials:
- Slides on automatic differentiation
- My course notes on Optimization for Machine Learning
- Code example: Multilayer perceptron and autograd
Course #7 - Sampling and Diffusion Models
Content:
- Refresher on Stochastic Gradient Descent (SGD)
- Introduction to Langevin dynamics (sketched below)
- Overview of diffusion models
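A minimal sketch of the unadjusted Langevin algorithm, assuming a standard Gaussian target (f(x) = ||x||^2 / 2); the step size and iteration count are illustrative.

```python
import numpy as np

# Unadjusted Langevin algorithm: x <- x - tau * grad f(x) + sqrt(2 tau) * noise
# samples (approximately) from p(x) proportional to exp(-f(x)). Here
# f(x) = ||x||^2 / 2, so the target is a standard Gaussian and grad f(x) = x.
rng = np.random.default_rng(0)
tau = 1e-2                          # illustrative step size
x = np.zeros(2)
samples = []
for _ in range(10000):
    x = x - tau * x + np.sqrt(2 * tau) * rng.standard_normal(2)
    samples.append(x.copy())
samples = np.array(samples)
print(samples.mean(0), samples.var(0))  # roughly zero mean, unit variance
```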
Materials:
- Numerical tour on diffusion models
- Course notes on Diffusion Models
Course #8 - LLM and Generative AI
Content:
- Overview of different generative model concepts
- Introduction to generative models (VAE, GANs, U-Net, diffusion)
- Self-supervised learning and next-token prediction
- Tokenizers
- Transformer architectures, Flash attention (see the attention sketch after this list)
- State space models
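A minimal NumPy sketch of scaled dot-product attention, the quantity that Flash attention computes blockwise; the shapes and random inputs are illustrative assumptions.

```python
import numpy as np

# Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V: the core of
# a transformer layer. This naive version materializes the full n-by-n
# attention matrix; Flash attention computes the same result blockwise
# to avoid that memory cost.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # (n, n) attention logits
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=-1, keepdims=True)            # row-wise softmax
    return P @ V

rng = np.random.default_rng(0)
n, d = 6, 4                                       # illustrative sizes
Q, K, V = rng.standard_normal((3, n, d))
print(attention(Q, K, V).shape)                   # (6, 4)
```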
Bibliography:
- Andrej Karpathy's video on tokenization
- Byte Pair Encoding (Wikipedia)
- Online tokenizer demo by OpenAI
- Rotary Position Embedding paper
- Code: Flash attention, xFormers, Triton
- Theory paper on Flash Attention
- Mamba paper, Blog on Mamba (SSM), Parallel Prefix Sum algorithm
Course #9 - Generative Models
Content:
- Understanding generative models as density fitting techniques
- Basics of Maximum Likelihood Estimation and f-divergences
- Gaussian mixtures and the Expectation-Maximization algorithm (see the sketch after this list)
- Variational Autoencoders (VAE)
- Introduction to Normalizing Flows
- Generative Adversarial Networks (GANs), Wasserstein GANs (WGANs)
- Diffusion Models
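A minimal sketch of the EM algorithm for a two-component 1D Gaussian mixture; the data, initialization, and iteration count are illustrative assumptions.

```python
import numpy as np

# EM for a two-component 1D Gaussian mixture: the E step computes the
# posterior responsibilities r[i, k], the M step re-estimates the
# weights, means, and variances from them.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 1.0, 200)])

w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E step: r[i, k] proportional to w_k * N(x_i; mu_k, var_k)
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M step: closed-form updates from the responsibilities
    Nk = r.sum(axis=0)
    w = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
print(w, mu, var)  # recovers roughly (0.6, 0.4), (-2, 2), (0.25, 1)
```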
Course #10 - Optimal Transport
Content:
- Introduction to the Monge and Kantorovich formulations
- The Sinkhorn algorithm (sketched below)
- Training of generative models
- Duality and Wasserstein GANs
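A minimal sketch of the Sinkhorn algorithm for entropically regularized optimal transport between two discrete measures; the cost, the regularization eps, and the iteration count are illustrative assumptions.

```python
import numpy as np

# Sinkhorn iterations for entropically regularized optimal transport:
# alternately rescale the rows and columns of the Gibbs kernel
# K = exp(-C / eps) so that the coupling diag(u) K diag(v) has the
# prescribed marginals a and b.
def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    K = np.exp(-C / eps)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)        # match the row marginal a
        v = b / (K.T @ u)      # match the column marginal b
    return u[:, None] * K * v[None, :]   # the transport plan

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 1, 5), rng.uniform(0, 1, 7)
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost
a, b = np.ones(5) / 5, np.ones(7) / 7    # uniform marginals
P = sinkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))      # approximately a and b
```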
Materials:
- Slides on Optimal Transport
- Notebook on Linear Programming for Optimal Transport
- Notebook on the Sinkhorn algorithm
Bibliography:
- Computational Optimal Transport, by Gabriel Peyré and Marco Cuturi
- Optimal Transport for Applied Mathematicians, by Filippo Santambrogio (advanced)
- Python POT (Python Optimal Transport) toolbox