The goal of this practical work is to build convolutional neural networks to complete two image classification tasks: MNIST, which we worked on during the previous course, and CIFAR-10, a 32x32 RGB image classification dataset.
During the last practical we did the data loading and normalization
ourselves in order to get familiar with tensor manipulations. During
this course, we will do it the "right" way by using torchvision,
a PyTorch library specialized in image dataset manipulation.
First, let's start with the usual imports plus a few new ones for torchvision.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
We then download, standardize and create loaders for the MNIST dataset with the following code:
mnist_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(
    root='../data',
    train=True,
    download=True,
    transform=mnist_transform
)
test_dataset = datasets.MNIST(
    root='../data',
    train=False,
    download=True,
    transform=mnist_transform
)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=True)
transforms.Normalize performs the mean subtraction and standard deviation division that we coded last week.
Using code from the previous practical work, create training and evaluation functions. The evaluation function takes the number of batches used for evaluation as an optional argument. If no value is given, the whole dataloader is used; otherwise, we stop the evaluation after the specified number of batches.
You can use the following code as inspiration
>>> s = 'who you gonna call ?'.split()
>>> s
['who', 'you', 'gonna', 'call', '?']
>>> for word_index, word in enumerate(s):
...     if word_index == 3:
...         break
...     print(word_index, word)
...
0 who
1 you
2 gonna
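One possible shape for these two functions is sketched below. The names `train_epoch`, `evaluate` and `n_batches` are placeholders, not imposed by the subject; adapt them to your code from the previous practical.

```python
import torch
import torch.nn.functional as F

def train_epoch(model, optimizer, loader, device="cpu"):
    # One pass over the training loader, as in the previous practical.
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        # nll_loss pairs with a log_softmax output layer.
        loss = F.nll_loss(model(images), labels)
        loss.backward()
        optimizer.step()

def evaluate(model, loader, n_batches=None, device="cpu"):
    # If n_batches is None, iterate over the whole loader;
    # otherwise stop after n_batches batches (cf. the enumerate example).
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for batch_index, (images, labels) in enumerate(loader):
            if n_batches is not None and batch_index == n_batches:
                break
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total
```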
Create a convolutional neural network using the one we created during the last course and the convolutional layers example of this course.
The following architecture choice is reasonable; you are not forced to use it:
- Convolution layer (convolution with a padding parameter in order not to lose a band of pixels on the outside of the picture) with 32 filters of size 3x3 and a padding of 1,
- ReLU activation,
- Convolution layer with 64 filters of size 3x3 and a padding of 1,
- ReLU activation,
- 2x2 max pooling,
- Tensor flattening in order to be able to use linear layers. You can either use tensor.view or torch.flatten to do this,
- Linear layer with 128 outputs,
- ReLU activation,
- Linear layer (output) with 10 outputs,
- log_softmax activation.
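The architecture above can be sketched as the following module. The class name `ConvNet` is a placeholder; the spatial sizes in the comments assume 28x28 MNIST inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Padding of 1 keeps the 28x28 spatial size through the 3x3 convolutions.
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # After one 2x2 max pooling: 64 channels of 14x14.
        self.fc1 = nn.Linear(64 * 14 * 14, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        # Flatten everything but the batch dimension.
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)
```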
This model architecture is essentially a miniature version of the VGG16 network.
Write a main function that instantiates the model, the optimizer and the loss, and calls the training and evaluation methods.
Try to change the optimizer in your code from SGD to torch.optim.Adagrad, torch.optim.RMSprop or torch.optim.Adam and analyze the impact.
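A possible sketch of such a main function is given below; it takes the model and loaders as parameters so that swapping the optimizer is a one-argument change. The function name, parameters and learning rates are assumptions, not part of the subject.

```python
import torch
import torch.nn.functional as F

def main(model, train_loader, test_loader, optimizer_class=torch.optim.SGD, lr=0.01, epochs=2):
    # nll_loss is the matching loss for a model ending with log_softmax.
    optimizer = optimizer_class(model.parameters(), lr=lr)
    for epoch in range(epochs):
        # Training pass.
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            F.nll_loss(model(images), labels).backward()
            optimizer.step()
        # Evaluation pass.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in test_loader:
                correct += (model(images).argmax(dim=1) == labels).sum().item()
                total += labels.size(0)
        print(f"epoch {epoch}: test accuracy {correct / total:.3f}")

# Changing the optimizer then amounts to e.g.:
# main(model, train_loader, test_loader, optimizer_class=torch.optim.Adam, lr=1e-3)
```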
Repeat the whole process with the CIFAR-10 dataset. You should only
have to change a few lines of the loading code and of the architecture
of your model. This dataset is also available through the
torchvision.datasets interface. You have to adapt the normalization
code to take into account the three channels of CIFAR-10 images (RGB).