Practical Deep Learning for Coders

Notes for the 2022 edition of the course.

Lesson 1

  • "How do I get this data into my model?" is way more important than tweaking neural network architectures
  • Datablocks -> DataLoaders -> Learner (check out the Data block tutorial and the high level data loader classes)
  • Check out PyTorch Image Models (timm)
  • For tabular data there is usually no pretrained model to fine-tune, so we use fit_one_cycle() instead of fine_tune()
  • At a high level, neural networks work as follows: multiply inputs by weights -> put them through the model -> calculate the loss -> update the weights
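
The predict -> loss -> update cycle from the last bullet can be sketched in plain Python. This is a toy illustration, not how fastai implements it: the "model" is a single weight, and the data, learning rate, and helper names (predict, mse, grad) are made up for the example.

```python
# Toy data generated from y = 3 * x; the whole "model" is one weight w.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def predict(w, x):
    # Multiply the input by the weight.
    return w * x

def mse(w):
    # Loss: mean squared error between predictions and targets.
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    # Derivative of the loss with respect to w: mean(2 * x * (w*x - y)).
    return sum(2 * x * (predict(w, x) - y) for x, y in zip(xs, ys)) / len(xs)

w = 0.0      # start from an arbitrary weight
lr = 0.01    # learning rate (assumed value for this toy problem)
losses = []
for step in range(200):
    losses.append(mse(w))
    w -= lr * grad(w)  # update the weight against the gradient
```

After the loop, w ends up close to 3 and the recorded losses shrink from step to step.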

Lesson 2

  • Check out AI Quizzes
  • Train your model before you clean the data, then use ImageClassifierCleaner to clean it
  • Use RandomResizedCrop to get different, cropped variations of an image
  • Check out Gradio and HuggingFace Spaces (HF spaces can be used as a model endpoint as well)
  • Check out nbdev and notebook2script
  • Write your own UI using TypeScript and hook it up to HF spaces API!
  • Bonus: Write an iOS App that does the same thing

Lesson 3

  • Check out Paperspace
  • timm models can be used in fastai by specifying their name as a string in vision_learner
  • Inspect the contents of a model by accessing learner.model - use learner.get_submodule() to get information on a sub module
  • Get the categories of a multi-class prediction by accessing learner.dls.vocab
  • Use interact from ipywidgets to get UI widgets to control input to a function
  • Gradient descent is just calculating the loss of a function, calculating the gradients of that loss with respect to the parameters, and nudging the parameters slightly in the direction that decreases the loss
  • ReLU returns 0 if the output of the linear function is <= 0 and the actual output of the function otherwise
  • ReLU means replace negatives with zeros

  • Deep Learning is using gradient descent to set some parameters to make a wiggly function (which is just the addition of many ReLUs - or something similar) to match your data
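
To make the ReLU bullets concrete, here is a minimal sketch in plain Python. The helper names (relu, rectified_line, wiggly) and the three (slope, intercept) pairs are hand-picked for illustration; in a real model those parameters would be set by gradient descent.

```python
def relu(x):
    # Replace negatives with zeros.
    return max(0.0, x)

def rectified_line(m, b, x):
    # A linear function m*x + b passed through ReLU.
    return relu(m * x + b)

# Hypothetical, hand-picked (slope, intercept) pairs.
PARAMS = [(1.0, 1.0), (-2.0, 2.0), (1.0, -3.0)]

def wiggly(x):
    # Adding several rectified lines gives a piecewise-linear "wiggly" function.
    return sum(rectified_line(m, b, x) for m, b in PARAMS)

values = [wiggly(x) for x in (-2.0, 0.0, 2.0, 4.0)]
```

With more rectified lines the function can bend in more places, which is what lets it match arbitrary data.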

  • Start your project with simple, small models and spend time on the data - trying better architectures is the very last step
  • Check out for a visual walkthrough of matrix multiplication
  • GPUs are great at matrix multiplication
  • Read chapter 4 of the book Deep Learning for Coders with fastai & PyTorch to get an even deeper understanding of the content in lesson 3

Lesson 4

Lesson 5

Neural Nets

  • Take the log of things which can grow exponentially, e.g. money

  • Add 1 before taking the log if the values can be 0, since log(0) is undefined
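
A small sketch of the log transform, assuming a column of made-up fare values that includes a zero. Because log(0) is undefined, 1 is added before taking the log; math.log1p(x) computes log(1 + x) directly.

```python
import math

# Made-up, long-tailed values (e.g. ticket prices), including a zero.
fares = [0.0, 7.25, 71.28, 512.33]

# log1p(x) == log(1 + x), so a fare of 0 maps to 0 instead of failing.
log_fares = [math.log1p(f) for f in fares]
```

The transformed values stay in a narrow range, so the largest fare no longer dominates the others.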

  • A tensor is just an array with any number of dimensions (a matrix is a rank-2 tensor)
  • Rank of a tensor refers to its number of dimensions
  • Use mean absolute error to get started with a loss function
  • Methods with an underscore at the end are executed in-place in PyTorch
  • Use the sigmoid function to keep predictions between 0 and 1 if dealing with a binary target
  • You need to know about what happens to the inputs in the first layer and what happens to the outputs in the last layer
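
The sigmoid and mean absolute error bullets can be sketched together in plain Python. The raw outputs and targets are made up, and the helper names are illustrative rather than taken from the course notebooks.

```python
import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    # Fine for moderate inputs; very large negative x would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-x))

def mean_abs_error(preds, targets):
    # Average absolute difference between predictions and targets.
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

raw_outputs = [-4.0, 0.0, 3.0]   # made-up outputs of a last linear layer
targets = [0.0, 1.0, 1.0]        # binary target

preds = [sigmoid(r) for r in raw_outputs]
loss = mean_abs_error(preds, targets)
```

Since every prediction is between 0 and 1 and every target is 0 or 1, the loss is also bounded by 0 and 1.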

  • Use the @ operator to multiply matrices in PyTorch
  • You can slice a vector with None to turn it into a matrix: vec[:, None]
  • Dive into hand-crafted deep learning code in PyTorch!
  • In 2023, tabular data still requires careful feature engineering and works well with tree-based models
  • Check out fastai's improved learning rate finder and choose between slide and valley
  • Calling test_dl on the DataLoaders gives you a DataLoader that applies all preprocessing steps from the original learner (so it can be used on the test set)

Random Forests

  • Use pandas categorical dtype when feeding categorical data into tree-based models
  • A binary split is something that turns the rows of your data into two groups

  • Always get a baseline with a "OneR" model (decision tree with a single binary split)
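
A rough sketch of a "OneR" baseline in plain Python: try every binary split on every column and keep the one with the fewest misclassified rows. The tiny dataset, the column names, and the error-count scoring are assumptions for illustration; the course notebooks may score splits differently.

```python
def split_errors(values, labels, threshold):
    """Misclassified rows when each side of the split predicts its majority label."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    errors = 0
    for group in (left, right):
        if group:
            ones = sum(group)
            errors += min(ones, len(group) - ones)
    return errors

def one_r(columns, labels):
    """Return (column name, threshold, errors) of the best single binary split."""
    best = None
    for name, values in columns.items():
        for threshold in sorted(set(values)):
            errors = split_errors(values, labels, threshold)
            if best is None or errors < best[2]:
                best = (name, threshold, errors)
    return best

# Made-up rows: two feature columns and a binary label.
columns = {
    "age": [22, 38, 26, 35, 54, 2],
    "is_female": [0, 1, 1, 1, 0, 0],
}
survived = [0, 1, 1, 1, 0, 0]

best_split = one_r(columns, survived)
```

Here the best split is on is_female at threshold 0, which separates the two groups of this toy data without errors.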

Lesson 6

  • A way to think about Gini: How likely is it, if pulling an item from a sample, that you pick the same item twice in a row?
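
The intuition above maps directly onto a few lines of Python. This sketch assumes items are drawn with replacement: the probability of picking the same class twice is the sum of squared class proportions, and Gini impurity is 1 minus that value. The function names are illustrative.

```python
from collections import Counter

def same_twice_probability(labels):
    # Probability that two draws (with replacement) return the same class.
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())

def gini_impurity(labels):
    # 0.0 for a pure group; larger values for more mixed groups.
    return 1.0 - same_twice_probability(labels)

pure = gini_impurity(["a", "a", "a", "a"])
mixed = gini_impurity(["a", "a", "b", "b"])
```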

  • Check out RandomForest feature importances to get a sense of the relevant features of a large data set
  • Using 100 trees is a good rule of thumb
  • Out-of-bag (OOB) error lets you check if a RandomForest is overfitting without using a separate validation set
  • Partial Dependence Plots are great, use them!
  • Treeinterpreter creates feature importance plots for a single prediction
  • Check out
  • Check out best vision models for fine-tuning to select proper models for computer vision
  • Check out test time augmentation (TTA)

Lesson 7

  • Gradient accumulation is a technique to run models requiring larger batch sizes on small GPUs
  • It works by computing the gradients for several small batches and adding them up, delaying the update of the weights until enough items have been processed (up to a certain threshold)
  • Use the DataBlock API to create DataLoaders having 2 targets
  • In order to create a multi-target model (that predicts two - or more - targets), create corresponding DataLoaders and adapt the error and loss functions (define the correct columns manually and create a combined loss function that adds up the results of the individual loss functions)
  • Use cross-entropy loss for each categorical target
  • Check out the article "Things that confused me about cross-entropy"
  • All of the loss functions in PyTorch have two versions: they come as classes (including params to tune) and as functions
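
Gradient accumulation, described at the top of this lesson, can be sketched in plain Python. The one-weight model, the data, and the function names are made up; the point is only that summing gradients over small batches and updating once gives the same step as one large batch.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # toy targets from y = 2 * x

def grad_sum(w, batch_x, batch_y):
    """Sum over the batch of d/dw (w*x - y)^2."""
    return sum(2 * x * (w * x - y) for x, y in zip(batch_x, batch_y))

def full_batch_step(w, lr):
    # One update using all items at once (needs the whole batch in memory).
    return w - lr * grad_sum(w, xs, ys) / len(xs)

def accumulated_step(w, lr, micro_batch_size):
    # Same update, but the gradient is built up from small batches.
    accumulated = 0.0
    seen = 0
    for start in range(0, len(xs), micro_batch_size):
        bx = xs[start:start + micro_batch_size]
        by = ys[start:start + micro_batch_size]
        accumulated += grad_sum(w, bx, by)  # w is NOT updated here
        seen += len(bx)
    # Update only once, after all small batches have been processed.
    return w - lr * accumulated / seen

w_full = full_batch_step(0.0, lr=0.01)
w_accum = accumulated_step(0.0, lr=0.01, micro_batch_size=2)
```

Both calls produce the same new weight, while the accumulated version only ever holds two items at a time.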

Collaborative Filtering

  • An embedding is just looking something up in an array

  • Think of an embedding as being a computational shortcut for multiplying something with a one-hot encoded vector
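
A minimal sketch of that equivalence in plain Python, using a made-up 4 x 3 embedding matrix. Looking up row 2 gives the same result as multiplying a one-hot vector by the matrix, without doing all the multiplications by zero.

```python
# Made-up embedding matrix: one row of 3 factors per item.
embedding = [
    [0.1, 0.2, 0.3],  # item 0
    [0.4, 0.5, 0.6],  # item 1
    [0.7, 0.8, 0.9],  # item 2
    [1.0, 1.1, 1.2],  # item 3
]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat(vec, mat):
    """Row vector times matrix, written out with plain loops."""
    n_cols = len(mat[0])
    return [sum(vec[r] * mat[r][c] for r in range(len(mat))) for c in range(n_cols)]

index = 2
via_lookup = embedding[index]                                     # array lookup
via_one_hot = vec_mat(one_hot(index, len(embedding)), embedding)  # matrix product
```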

  • Calculate latent factors (e.g. things that people like about movies):
    • start off with random weights for the latent factors of users and movies
    • calculate the dot product between user factors and movie factors
    • calculate root mean squared error between actuals and predictions
    • optimize using SGD
  • Use CollabDataLoaders and collab_learner when doing collaborative filtering (optionally with a neural network via use_nn=True - useful if you have metadata about your items and users)
  • In collaborative filtering, items (movies) with a low bias are the ones an audience dislikes even though it usually likes that category of item; items with a high bias are the ones it likes even though it usually doesn't like the category
  • Use get_emb_sz to let fastai figure out the size of the embeddings you should use
  • In order to create a model in PyTorch, create a class inheriting from Module and define the forward method - it will be called automatically when you call an instance of the model
  • Weight decay (a.k.a. L2 regularization) is adding the sum of all the weights squared (multiplied by some small number) to your loss function - it encourages the weights to be as small as possible
  • Try a few values a factor of 10 apart (e.g. 0.1, 0.01, 0.001) for the weight decay parameter in collaborative filtering
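
The latent-factor steps and the weight decay idea from this section can be sketched from scratch in plain Python. This is a toy under stated assumptions, not fastai's collab_learner: the ratings, the number of factors, the learning rate, and the weight decay value are all made up, and biases are left out.

```python
import random

random.seed(0)

# Made-up (user, movie, rating) triples.
ratings = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0),
    (1, 0, 4.0), (1, 2, 1.0), (1, 3, 2.0),
    (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0),
    (3, 0, 1.0), (3, 2, 4.0), (3, 3, 5.0),
]
n_users, n_movies, n_factors = 4, 4, 2

def random_factors(n_rows):
    # Start off with small random weights.
    return [[random.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_rows)]

user_factors = random_factors(n_users)
movie_factors = random_factors(n_movies)

def predict(u, m):
    # Dot product between a user's factors and a movie's factors.
    return sum(a * b for a, b in zip(user_factors[u], movie_factors[m]))

def rmse():
    # Root mean squared error between actuals and predictions.
    return (sum((predict(u, m) - r) ** 2 for u, m, r in ratings) / len(ratings)) ** 0.5

def sgd_epoch(lr, wd):
    # One SGD pass; the loss per rating is err^2 + wd * (sum of squared factors).
    for u, m, r in ratings:
        err = predict(u, m) - r
        for k in range(n_factors):
            uf, mf = user_factors[u][k], movie_factors[m][k]
            user_factors[u][k] -= lr * (2 * err * mf + 2 * wd * uf)
            movie_factors[m][k] -= lr * (2 * err * uf + 2 * wd * mf)

rmse_before = rmse()
for _ in range(500):
    sgd_epoch(lr=0.01, wd=0.01)
rmse_after = rmse()
```

Comparing rmse_before with rmse_after shows how far the random starting factors have moved towards the observed ratings; rerunning with a different wd is a quick way to see the effect of weight decay.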

Lesson 8

  • Embeddings are not only useful for collaborative filtering, they are also used in NLP, and when working with tabular data
  • Check out the fastai legacy notes for a discussion of convolutions, kernels, MaxPooling, Dropout, etc.
  • Nowadays, we don't do MaxPooling anymore, we use strided convolutions and do average pooling

  • Think of dropout as data augmentation for activations
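
A minimal sketch of dropout in plain Python, to show what "augmenting activations" means in practice. It uses the common "inverted dropout" convention (surviving activations are rescaled by 1 / (1 - p) during training); the function name and the example activations are made up.

```python
import random

def dropout(activations, p, rng):
    """Zero each activation with probability p and rescale the survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * keep_scale for a in activations]

rng = random.Random(42)              # seeded so the run is repeatable
activations = [0.5, 1.0, 1.5, 2.0]   # made-up outputs of a hidden layer
noisy = dropout(activations, p=0.5, rng=rng)
```

Each training pass sees a different randomly thinned version of the same activations, much like each epoch sees a different crop of the same image.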

  • It still makes sense to use MaxPooling if you don't have a good intuition about what's on an image
  • Check out the Meta Learning book