Notes for the 2022 edition of the course.
- "How do I get this data into my model?" is way more important than tweaking neural network architectures
- Datablocks -> DataLoaders -> Learner (check out the Data block tutorial and the high level data loader classes)
- Check out PyTorch Image Models (timm)
- For tabular data we usually can't fine-tune models and thus use `fit_one_cycle()` instead of `fine_tune()`
- At a high level, neural networks work as follows: multiply inputs by weights -> put them through the model -> calculate the loss -> update the weights
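A minimal sketch of that loop in plain PyTorch (the data and the linear "model" here are made up for illustration):

```python
import torch

# made-up data: 100 samples, 3 features, 1 target
x = torch.randn(100, 3)
y = torch.randn(100, 1)

# weights to learn
w = torch.randn(3, 1, requires_grad=True)

for epoch in range(10):
    preds = x @ w                      # multiply inputs by weights / put them through the model
    loss = ((preds - y) ** 2).mean()   # calculate the loss
    loss.backward()                    # calculate the gradients
    with torch.no_grad():
        w -= 0.01 * w.grad             # update the weights
        w.grad.zero_()
```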
- Check out AI Quizzes
- Train your model before you clean the data and use `ImageClassifierCleaner` to clean it
- Use `RandomResizedCrop` to get different, cropped variations of an image
- Check out Gradio and HuggingFace Spaces (HF Spaces can be used as a model endpoint as well)
- Check out nbdev and notebook2script
- Write your own UI using TypeScript and hook it up to HF spaces API!
- Bonus: Write an iOS App that does the same thing
- Check out Paperspace
- timm models can be used in fastai by specifying their name as a string in `vision_learner`
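For example (a rough sketch: dataset and model name are arbitrary choices, and passing timm names as strings needs a reasonably recent fastai with timm installed):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=lambda f: f.name[0].isupper(),   # cat images have capitalised names in this dataset
    item_tfms=Resize(224))

learn = vision_learner(dls, 'convnext_tiny', metrics=error_rate)  # timm model name as a string
learn.fine_tune(1)
```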
- Inspect the contents of a model by accessing `learner.model`
- Use `learner.get_submodule()` to get information on a submodule
- Get the categories of a multi-class prediction by accessing `learner.dls.vocab`
- Use `interact` from ipywidgets to get UI widgets to control the input to a function
- Gradient descent is just calculating the loss of a function, calculating its gradients, and adjusting the parameters slightly in the direction that decreases the loss
- ReLU returns 0 if the linear function's output is <= 0 and the actual output of the function otherwise
- ReLU means "replace negatives with zeros"
- Deep Learning is using gradient descent to set some parameters to make a wiggly function (which is just the sum of many ReLUs - or something similar) match your data
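A toy illustration of that idea, with arbitrary parameters:

```python
import torch

def relu(x): return torch.clamp(x, min=0.)      # "replace negatives with zeros"

x = torch.linspace(-3, 3, 100)
ws, bs = torch.randn(5), torch.randn(5)         # arbitrary weights and biases
y = sum(relu(w*x + b) for w, b in zip(ws, bs))  # a sum of ReLUs is already a wiggly function
```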
- Start your project with simple, small models and spend time on the data - trying better architectures is the very last step
- Check out matrixmultiplication.xyz for a visual walkthrough of matrix multiplication
- GPUs are great at matrix multiplication
- Read chapter 4 of the Deep Learning for Coders with fastai & PyTorch book to get an even deeper understanding of the content of lesson 3
- Transformers take good advantage of modern TPUs
- Check out Python for Data Analysis 3rd Edition
- deberta-v3 is a good base model for NLP
- Try ULMFiT for long documents (>= 2000 words) instead of transformers
- Review the "How and why to create a good validation set" and "The problem with metrics[...]" articles
- Tokenization transforms words in documents into a numeric representation
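For example with a Hugging Face tokenizer (model name picked to match the deberta-v3 note above; any tokenizer would do):

```python
from transformers import AutoTokenizer

tokz = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')
print(tokz.tokenize("Tokenization turns words into a numeric representation"))      # sub-word pieces
print(tokz("Tokenization turns words into a numeric representation")['input_ids'])  # their numeric ids
```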
- Always check your inputs (training data) and outputs (predictions)
- Take the log of things which can grow exponentially, e.g. money
- Add 1 before taking the log so zeros don't turn into NaN/-inf
- A tensor is just a matrix generalized to any number of dimensions
- The rank of a tensor refers to its number of dimensions
- Use mean absolute value to get started with a loss function
- Methods with an underscore at the end are executed in-place in PyTorch
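For example:

```python
import torch

t = torch.ones(3)
t.add(1)    # returns a new tensor; t is unchanged
t.add_(1)   # trailing underscore: modifies t in place
print(t)    # tensor([2., 2., 2.])
```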
- Use the sigmoid function to keep prediction between 0 and 1 if dealing with a binary target
- You need to know what happens to the inputs in the first layer and what happens to the outputs in the last layer
- Use the @ operator to multiply matrices in PyTorch
- You can slice a vector with None to turn it into a matrix: `vec[:, None]`
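A quick demo of the last few points (shapes are arbitrary):

```python
import torch

x = torch.randn(10, 3)        # 10 samples, 3 features
w = torch.randn(3)            # a plain vector of weights

preds = x @ w[:, None]        # w[:, None] is a 3x1 matrix, so the result is 10x1
probs = torch.sigmoid(preds)  # squash the predictions into (0, 1) for a binary target
print(preds.shape, probs.min(), probs.max())
```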
- Dive into hand-crafted deep learning code in PyTorch!
- In 2023, tabular data still requires careful feature engineering and works well with tree-based models
- Check out fastai's improved learning rate finder and choose between `slide` and `valley`
- Calling `test_dl` on the dataloaders will give you a pipeline containing all preprocessing steps from the original learner (which can then be used on the test set)
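A sketch, assuming a trained learner `learn` and a dataframe `test_df` of raw test rows:

```python
# `learn` and `test_df` are assumed to exist already
test_dl = learn.dls.test_dl(test_df)     # applies the learner's preprocessing pipeline to the test set
preds, _ = learn.get_preds(dl=test_dl)   # predictions using that pipeline
```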
- Use pandas categorical dtype when feeding categorical data into tree-based models
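For example:

```python
import pandas as pd

df = pd.DataFrame({'size': ['small', 'large', 'medium', 'small']})
df['size'] = pd.Categorical(df['size'], categories=['small', 'medium', 'large'], ordered=True)
print(df['size'].cat.codes)   # integer codes a tree-based model can split on
```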
- A binary split is something that turns the rows of your data into two groups
- Always get a baseline with a "OneR" model (a decision tree with a single binary split)
- A way to think about Gini: how likely is it, if pulling an item from a sample, that you pick the same item twice in a row?
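One rough way to turn that intuition into code (Gini impurity is then 1 minus that probability):

```python
import pandas as pd

def gini(labels):
    p = pd.Series(labels).value_counts(normalize=True)
    same_twice = (p**2).sum()   # probability of picking the same class twice in a row
    return 1 - same_twice       # Gini impurity

print(gini(['cat', 'cat', 'dog', 'dog']))  # 0.5: maximally mixed for two classes
```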
- Check out RandomForest feature importances to get a sense of the relevant features of a large data set
- Using 100 trees is a good rule of thumb
- Out-of-bag (OOB) error allows you to check if a RandomForest is overfitting without using a separate validation set
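A minimal sketch with scikit-learn, using made-up data:

```python
import numpy as np, pandas as pd
from sklearn.ensemble import RandomForestRegressor

# made-up data just to show the API
xs = pd.DataFrame(np.random.rand(500, 4), columns=['a', 'b', 'c', 'd'])
y = xs['a']*3 + np.random.rand(500)

rf = RandomForestRegressor(n_estimators=100, oob_score=True, n_jobs=-1).fit(xs, y)
print(rf.oob_score_)  # OOB score: an overfitting check without a separate validation set
print(pd.Series(rf.feature_importances_, index=xs.columns).sort_values(ascending=False))
```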
- Partial Dependence Plots are great, use them!
- Treeinterpreter creates feature importance plots for a single prediction
- Check out explained.ai
- Check out best vision models for fine-tuning to select proper models for computer vision
- Check out test time augmentation (TTA)
- Gradient accumulation is a technique to run models requiring larger batch sizes on small GPUs
- It works by calculating the loss and gradients for every sub-batch, but delaying the update of the weights until enough items have accumulated
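Roughly, in plain PyTorch (`model`, `loss_func`, `optimizer` and `train_loader` are placeholders; fastai also provides a GradientAccumulation callback):

```python
accum_steps = 4   # effective batch size = loader batch size * accum_steps

optimizer.zero_grad()
for i, (xb, yb) in enumerate(train_loader):
    loss = loss_func(model(xb), yb) / accum_steps  # scale so the accumulated gradients average out
    loss.backward()                                # gradients add up in the .grad buffers
    if (i + 1) % accum_steps == 0:
        optimizer.step()                           # only now update the weights
        optimizer.zero_grad()
```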
- Use the DataBlock API to create DataLoaders having 2 targets
- In order to create a multi-target model (one that predicts two - or more - targets), create corresponding DataLoaders and adapt the error and loss functions (pick out the correct columns manually and create a combined loss function that adds up the results of the individual loss functions)
- Use cross-entropy loss when predicting multiple targets
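A sketch of such a combined loss, assuming the model emits 10 activations for the first target and 20 for the second (numbers made up):

```python
import torch.nn.functional as F

def combined_loss(preds, t1, t2):
    # the first 10 activations predict target 1, the remaining 20 predict target 2
    return F.cross_entropy(preds[:, :10], t1) + F.cross_entropy(preds[:, 10:], t2)
```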
- Check out Things that confused me about cross-entropy article
- All of the loss functions in PyTorch have two versions: they come as classes (including params to tune) and as plain functions
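For example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

preds, targets = torch.randn(4, 10), torch.randint(0, 10, (4,))

loss_cls = nn.CrossEntropyLoss(label_smoothing=0.1)   # class version, with parameters to tune
print(loss_cls(preds, targets))
print(F.cross_entropy(preds, targets))                 # plain function version
```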
- An embedding is just looking something up in an array
- Think of an embedding as a computational shortcut for multiplying something by a one-hot encoded vector
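A small demonstration of that equivalence:

```python
import torch
import torch.nn.functional as F

weights = torch.randn(5, 3)                # 5 items, 3 latent factors
idx = torch.tensor(2)

one_hot = F.one_hot(idx, num_classes=5).float()
print(one_hot @ weights)                   # multiplying by a one-hot vector...
print(weights[idx])                        # ...gives the same result as a plain array lookup
```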
- Calculate latent factors (e.g. things that people like about movies):
- start off with random weights for latent factors and users
- calculate the dot product between the user and movie latent factors
- calculate root mean squared error between actuals and predictions
- optimize using SGD
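Those steps in plain PyTorch, with made-up sizes (5 users, 4 movies, 3 latent factors):

```python
import torch

ratings = torch.randint(1, 6, (5, 4)).float()    # made-up user x movie ratings

users  = torch.randn(5, 3, requires_grad=True)   # random latent factors for users
movies = torch.randn(4, 3, requires_grad=True)   # ...and for movies

for _ in range(100):
    preds = users @ movies.T                     # dot product of user and movie factors
    loss = ((preds - ratings)**2).mean().sqrt()  # root mean squared error
    loss.backward()
    with torch.no_grad():
        for p in (users, movies):
            p -= 0.05 * p.grad                   # SGD step
            p.grad.zero_()
```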
- Use `CollabDataLoaders` and `collab_learner` (optionally with a neural network via `use_nn=True` - useful if you have metadata about your items and users) when doing collaborative filtering
- In collaborative filtering, items (movies) with a low/high bias are the ones that are particularly unpopular/popular with an audience (even one that loves the category of the item)
- Use get_emb_sz to let fastai figure out the size of the embeddings you should use
- In order to create a model in PyTorch, create a class inheriting from Module and define the forward method - it will be called automatically when doing calculations on the model class
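A minimal example of that pattern (a toy dot-product model; sizes are arbitrary):

```python
import torch
from torch import nn

class DotProduct(nn.Module):
    def __init__(self, n_users, n_movies, n_factors):
        super().__init__()
        self.user_factors  = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)

    def forward(self, x):                      # called automatically via model(batch)
        users  = self.user_factors(x[:, 0])
        movies = self.movie_factors(x[:, 1])
        return (users * movies).sum(dim=1)

model = DotProduct(100, 50, 5)
batch = torch.tensor([[0, 3], [7, 12]])        # (user index, movie index) pairs
print(model(batch))
```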
- Weight decay (a.k.a. L2 regularization) is adding the sum of all the weights squared (multiplied by some small number) to your loss function - it encourages the weights to be as small as possible
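In code, the idea is roughly (a sketch with a toy model):

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
x, y = torch.randn(8, 3), torch.randn(8, 1)

wd = 0.01                                                  # the "small number"
mse = ((model(x) - y)**2).mean()
l2  = wd * sum((p**2).sum() for p in model.parameters())   # sum of all the weights squared
(mse + l2).backward()                                      # encourages the weights to stay small
```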
- Try a few multiples of 10 for weight decay parameter in collaborative filtering
- Embeddings are not only useful for collaborative filtering, they are also used in NLP, and when working with tabular data
- Check out fastai legacy notes for a discussion of convolutions, kernels, MaxPooling, Dropout, etc.
- Nowadays, we don't do MaxPooling anymore; we use strided convolutions and do average pooling
- Think of dropout as data augmentation for activations
- It still makes sense to use MaxPooling if you don't have a good intuition about what's on an image
- Check out Meta Learning book