Notes for the 2022 edition of the course.
- "How do I get this data into my model?" is way more important than tweaking neural network architectures
- Datablocks -> DataLoaders -> Learner (check out the Data block tutorial and the high level data loader classes)
- Check out PyTorch Image Models (timm)
- For tabular data we usually can't fine-tune models and thus use `fit_one_cycle()` instead of `fine_tune()`
- At a high level, neural networks work as follows: multiply inputs by weights -> put them through the model -> calculate the loss -> update the weights
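A minimal sketch of that loop in plain PyTorch (the data and the linear "model" here are made up for illustration):

```python
import torch

# made-up data: 100 samples, 3 features, 1 target
x = torch.randn(100, 3)
y = torch.randn(100, 1)

# weights to learn
w = torch.randn(3, 1, requires_grad=True)

for epoch in range(10):
    preds = x @ w                      # multiply inputs by weights / put them through the model
    loss = ((preds - y) ** 2).mean()   # calculate the loss
    loss.backward()                    # calculate the gradients
    with torch.no_grad():
        w -= 0.01 * w.grad             # update the weights
        w.grad.zero_()
```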
- Check out AI Quizzes
- Train your model before you clean the data and use `ImageClassifierCleaner` to clean it
- Use `RandomResizedCrop` to get different, cropped variations of an image
- Check out Gradio and HuggingFace Spaces (HF Spaces can be used as a model endpoint as well)
- Check out nbdev and notebook2script
- Write your own UI using TypeScript and hook it up to HF spaces API!
- Bonus: Write an iOS App that does the same thing
- Check out Paperspace
- timm models can be used in fastai by specifying their name as a string in `vision_learner`
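For example (a rough sketch: dataset and model name are arbitrary choices, and passing timm names as strings needs a reasonably recent fastai with timm installed):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=lambda f: f.name[0].isupper(),   # cat images have capitalised names in this dataset
    item_tfms=Resize(224))

learn = vision_learner(dls, 'convnext_tiny', metrics=error_rate)  # timm model name as a string
learn.fine_tune(1)
```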
- Inspect the contents of a model by accessing `learner.model`
- Use `learner.get_submodule()` to get information on a submodule
- Get the categories of a multi-class prediction by accessing `learner.dls.vocab`
- Use `interact` from ipywidgets to get UI widgets to control the input to a function
- Gradient descent is just calculating the loss of a function, calculating its gradients, and adjusting the parameters slightly in the direction that decreases the loss
- ReLU returns 0 if the linear function's output is <= 0 and the actual output of the function otherwise
- ReLU means "replace negatives with zeros"
- Deep Learning is using gradient descent to set some parameters to make a wiggly function (which is just the sum of many ReLUs - or something similar) match your data
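A toy illustration of that idea, with arbitrary parameters:

```python
import torch

def relu(x): return torch.clamp(x, min=0.)      # "replace negatives with zeros"

x = torch.linspace(-3, 3, 100)
ws, bs = torch.randn(5), torch.randn(5)         # arbitrary weights and biases
y = sum(relu(w*x + b) for w, b in zip(ws, bs))  # a sum of ReLUs is already a wiggly function
```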
- Start your project with simple, small models and spend time on the data - trying better architectures is the very last step
- Check out matrixmultiplication.xyz for a visual walkthrough of matrix multiplication
- GPUs are great at matrix multiplication
- Read chapter 4 of the Deep Learning for Coders with fastai & PyTorch book to get an even deeper understanding of the content of lesson 3
- Transformers take good advantage of modern TPUs
- Check out Python for Data Analysis 3rd Edition
- deberta-v3 is a good base model for NLP
- Try ULMFiT for long documents (>= 2000 words) instead of transformers
- Review the "How and why to create a good validation set" and "The problem with metrics[...]" articles
- Tokenization transforms words in documents into a numeric representation
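For example with a Hugging Face tokenizer (model name picked to match the deberta-v3 note above; any tokenizer would do):

```python
from transformers import AutoTokenizer

tokz = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')
print(tokz.tokenize("Tokenization turns words into a numeric representation"))      # sub-word pieces
print(tokz("Tokenization turns words into a numeric representation")['input_ids'])  # their numeric ids
```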
- Always check your inputs (training data) and outputs (predictions)
- Take the log of things which can grow exponentially, e.g. money
- Add 1 before taking the log so zeros don't turn into NaN/-inf
- A tensor is just a matrix generalized to any number of dimensions
- The rank of a tensor refers to its number of dimensions
- Use mean absolute value to get started with a loss function
- Methods with an underscore at the end are executed in-place in PyTorch
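For example:

```python
import torch

t = torch.ones(3)
t.add(1)    # returns a new tensor; t is unchanged
t.add_(1)   # trailing underscore: modifies t in place
print(t)    # tensor([2., 2., 2.])
```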
- Use the sigmoid function to keep prediction between 0 and 1 if dealing with a binary target
- You need to know what happens to the inputs in the first layer and what happens to the outputs in the last layer
- Use the @ operator to multiply matrices in PyTorch
- You can slice a vector with None to turn it into a matrix: `vec[:, None]`
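A quick demo of the last few points (shapes are arbitrary):

```python
import torch

x = torch.randn(10, 3)        # 10 samples, 3 features
w = torch.randn(3)            # a plain vector of weights

preds = x @ w[:, None]        # w[:, None] is a 3x1 matrix, so the result is 10x1
probs = torch.sigmoid(preds)  # squash the predictions into (0, 1) for a binary target
print(preds.shape, probs.min(), probs.max())
```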
- Dive into hand-crafted deep learning code in PyTorch!
- In 2023, tabular data still requires careful feature engineering and works well with tree-based models
- Check out fastai's improved learning rate finder and choose between `slide` and `valley`
- Calling `test_dl` on the dataloaders will give you a pipeline containing all preprocessing steps from the original learner (which can then be used on the test set)
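A sketch, assuming a trained learner `learn` and a dataframe `test_df` of raw test rows:

```python
# `learn` and `test_df` are assumed to exist already
test_dl = learn.dls.test_dl(test_df)     # applies the learner's preprocessing pipeline to the test set
preds, _ = learn.get_preds(dl=test_dl)   # predictions using that pipeline
```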
- Use pandas categorical dtype when feeding categorical data into tree-based models
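For example:

```python
import pandas as pd

df = pd.DataFrame({'size': ['small', 'large', 'medium', 'small']})
df['size'] = pd.Categorical(df['size'], categories=['small', 'medium', 'large'], ordered=True)
print(df['size'].cat.codes)   # integer codes a tree-based model can split on
```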
- A binary split is something that turns the rows of your data into two groups
- Always get a baseline with a "OneR" model (a decision tree with a single binary split)
- A way to think about Gini: how likely is it, if pulling an item from a sample, that you pick the same item twice in a row?
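One rough way to turn that intuition into code (Gini impurity is then 1 minus that probability):

```python
import pandas as pd

def gini(labels):
    p = pd.Series(labels).value_counts(normalize=True)
    same_twice = (p**2).sum()   # probability of picking the same class twice in a row
    return 1 - same_twice       # Gini impurity

print(gini(['cat', 'cat', 'dog', 'dog']))  # 0.5: maximally mixed for two classes
```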
- Check out RandomForest feature importances to get a sense of the relevant features of a large data set
- Using 100 trees is a good rule of thumb
- Out-of-bag (OOB) error allows you to check if a RandomForest is overfitting without using a separate validation set
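A minimal sketch with scikit-learn, using made-up data:

```python
import numpy as np, pandas as pd
from sklearn.ensemble import RandomForestRegressor

# made-up data just to show the API
xs = pd.DataFrame(np.random.rand(500, 4), columns=['a', 'b', 'c', 'd'])
y = xs['a']*3 + np.random.rand(500)

rf = RandomForestRegressor(n_estimators=100, oob_score=True, n_jobs=-1).fit(xs, y)
print(rf.oob_score_)  # OOB score: an overfitting check without a separate validation set
print(pd.Series(rf.feature_importances_, index=xs.columns).sort_values(ascending=False))
```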
- Partial Dependence Plots are great, use them!
- Treeinterpreter creates feature importance plots for a single prediction
- Check out explained.ai
- Check out best vision models for fine-tuning to select proper models for computer vision
- Check out test time augmentation (TTA)
- Gradient accumulation is a technique to run models requiring larger batch sizes on small GPUs
- It works by calculating the loss and gradients for every sub-batch, but delaying the update of the weights until enough items have accumulated
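Roughly, in plain PyTorch (`model`, `loss_func`, `optimizer` and `train_loader` are placeholders; fastai also provides a GradientAccumulation callback):

```python
accum_steps = 4   # effective batch size = loader batch size * accum_steps

optimizer.zero_grad()
for i, (xb, yb) in enumerate(train_loader):
    loss = loss_func(model(xb), yb) / accum_steps  # scale so the accumulated gradients average out
    loss.backward()                                # gradients add up in the .grad buffers
    if (i + 1) % accum_steps == 0:
        optimizer.step()                           # only now update the weights
        optimizer.zero_grad()
```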
- Use the DataBlock API to create DataLoaders having 2 targets
- In order to create a multi-target model (one that predicts two - or more - targets), create corresponding DataLoaders and adapt the error and loss functions (pick out the correct columns manually and create a combined loss function that adds up the results of the individual loss functions)
- Use cross-entropy loss when predicting multiple targets
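A sketch of such a combined loss, assuming the model emits 10 activations for the first target and 20 for the second (numbers made up):

```python
import torch.nn.functional as F

def combined_loss(preds, t1, t2):
    # the first 10 activations predict target 1, the remaining 20 predict target 2
    return F.cross_entropy(preds[:, :10], t1) + F.cross_entropy(preds[:, 10:], t2)
```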
- Check out Things that confused me about cross-entropy article
- All of the loss functions in PyTorch have two versions: they come as classes (including params to tune) and as plain functions
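For example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

preds, targets = torch.randn(4, 10), torch.randint(0, 10, (4,))

loss_cls = nn.CrossEntropyLoss(label_smoothing=0.1)   # class version, with parameters to tune
print(loss_cls(preds, targets))
print(F.cross_entropy(preds, targets))                 # plain function version
```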
- An embedding is just looking something up in an array
- Think of an embedding as a computational shortcut for multiplying something by a one-hot encoded vector
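A small demonstration of that equivalence:

```python
import torch
import torch.nn.functional as F

weights = torch.randn(5, 3)                # 5 items, 3 latent factors
idx = torch.tensor(2)

one_hot = F.one_hot(idx, num_classes=5).float()
print(one_hot @ weights)                   # multiplying by a one-hot vector...
print(weights[idx])                        # ...gives the same result as a plain array lookup
```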
- Calculate latent factors (e.g. things that people like about movies):
- start off with random weights for latent factors and users
- calculate the dot product between the user and movie latent factors
- calculate root mean squared error between actuals and predictions
- optimize using SGD
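Those steps in plain PyTorch, with made-up sizes (5 users, 4 movies, 3 latent factors):

```python
import torch

ratings = torch.randint(1, 6, (5, 4)).float()    # made-up user x movie ratings

users  = torch.randn(5, 3, requires_grad=True)   # random latent factors for users
movies = torch.randn(4, 3, requires_grad=True)   # ...and for movies

for _ in range(100):
    preds = users @ movies.T                     # dot product of user and movie factors
    loss = ((preds - ratings)**2).mean().sqrt()  # root mean squared error
    loss.backward()
    with torch.no_grad():
        for p in (users, movies):
            p -= 0.05 * p.grad                   # SGD step
            p.grad.zero_()
```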
- Use `CollabDataLoaders` and `collab_learner` (optionally with a neural network via `use_nn=True` - useful if you have metadata about your items and users) when doing collaborative filtering
- In collaborative filtering, items (movies) with a low/high bias are the ones that are particularly unpopular/popular with an audience (even one that loves the category of the item)
- Use get_emb_sz to let fastai figure out the size of the embeddings you should use
- In order to create a model in PyTorch, create a class inheriting from Module and define the forward method - it will be called automatically when doing calculations on the model class
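A minimal example of that pattern (a toy dot-product model; sizes are arbitrary):

```python
import torch
from torch import nn

class DotProduct(nn.Module):
    def __init__(self, n_users, n_movies, n_factors):
        super().__init__()
        self.user_factors  = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)

    def forward(self, x):                      # called automatically via model(batch)
        users  = self.user_factors(x[:, 0])
        movies = self.movie_factors(x[:, 1])
        return (users * movies).sum(dim=1)

model = DotProduct(100, 50, 5)
batch = torch.tensor([[0, 3], [7, 12]])        # (user index, movie index) pairs
print(model(batch))
```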
- Weight decay (a.k.a. L2 regularization) is adding the sum of all the weights squared (multiplied by some small number) to your loss function - it encourages the weights to be as small as possible
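In code, the idea is roughly (a sketch with a toy model):

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
x, y = torch.randn(8, 3), torch.randn(8, 1)

wd = 0.01                                                  # the "small number"
mse = ((model(x) - y)**2).mean()
l2  = wd * sum((p**2).sum() for p in model.parameters())   # sum of all the weights squared
(mse + l2).backward()                                      # encourages the weights to stay small
```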
- Try a few multiples of 10 for weight decay parameter in collaborative filtering
- Embeddings are not only useful for collaborative filtering, they are also used in NLP, and when working with tabular data
- Check out fastai legacy notes for a discussion of convolutions, kernels, MaxPooling, Dropout, etc.
- Nowadays, we don't do MaxPooling anymore; we use strided convolutions and do average pooling
- Think of dropout as data augmentation for activations
- It still makes sense to use MaxPooling if you don't have a good intuition about what's on an image
- Check out Meta Learning book