The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
1. Here is a link to my project code
- The size of the training set is 34,799.
- The size of the validation set is 4,410.
- The size of the test set is 12,630.
- The shape of a traffic sign image is (32, 32, 3).
- The number of unique classes/labels in the data set is 43.
2. Here is an exploratory visualization of the data set. It is a bar chart showing how the data is distributed among the 43 sign classes.
As a first step, I decided to shuffle the images to make sure the ordering of the data does not affect the training of the network.
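A minimal sketch of that shuffling step, using sklearn's `shuffle` so images and labels stay paired (the array names and shapes below are stand-ins, not the notebook's exact code):

```python
import numpy as np
from sklearn.utils import shuffle

# Small random arrays standing in for the loaded training images and labels.
X_train = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
y_train = np.random.randint(0, 43, size=8, dtype=np.int32)

# Shuffle images and labels together so their pairing is preserved while
# any ordering in the original files no longer influences the mini-batches.
X_train, y_train = shuffle(X_train, y_train)
```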
I experimented with converting the images to grayscale but found no significant difference in the results compared with keeping the color images. I therefore skipped grayscaling and used the color images when proceeding to normalization, avoiding the information loss that converting to grayscale introduces.
Here is an example of the accuracy results before and after grayscaling.
As a last step, I normalized the images because it keeps the input features on a similar scale and helps the network train faster.
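A minimal sketch of that normalization, along with the grayscale conversion that was tried and dropped (the array name is a stand-in):

```python
import numpy as np

# Stand-in batch of color images with pixel values in [0, 255].
X_train = np.random.randint(0, 256, size=(8, 32, 32, 3)).astype(np.float32)

# Scale pixels from [0, 255] to roughly [-1, 1] so all inputs share a similar
# range, which helps gradient descent converge faster.
X_train_norm = (X_train - 128.0) / 128.0

# The grayscale variant that was tried and dropped: average the color channels.
X_train_gray = np.mean(X_train, axis=3, keepdims=True)
```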
I decided not to generate additional data because the validation accuracy had already reached 93.1%.
My final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x3 RGB image |
Convolution 5x5 | 1x1 stride, valid padding, outputs 28x28x6 |
RELU | |
Max pooling | 2x2 stride, outputs 14x14x6 |
Convolution 5x5 | 1x1 stride, valid padding, outputs 10x10x16 |
RELU | |
Max pooling | 2x2 stride, outputs 5x5x16 |
Flatten | input 5x5x16, outputs 400 |
Fully connected | input 400, outputs 120 |
RELU | |
Dropout | keep_prob 0.5 |
Fully connected | input 120, outputs 84 |
RELU | |
Dropout | keep_prob 0.5 |
Fully connected | input 84, outputs 43 |
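A hedged sketch of a LeNet-style network matching the table above, written for TensorFlow 1.x (the function and variable names and the initializer values are assumptions, not necessarily the exact ones used in the notebook):

```python
import tensorflow as tf

def lenet_traffic(x, keep_prob, mu=0.0, sigma=0.1):
    """LeNet-style classifier for 32x32x3 traffic sign images (sketch)."""
    # Convolution 5x5, 1x1 stride, VALID padding: 32x32x3 -> 28x28x6
    w1 = tf.Variable(tf.truncated_normal([5, 5, 3, 6], mean=mu, stddev=sigma))
    b1 = tf.Variable(tf.zeros(6))
    conv1 = tf.nn.relu(tf.nn.conv2d(x, w1, [1, 1, 1, 1], 'VALID') + b1)
    # Max pooling 2x2: 28x28x6 -> 14x14x6
    conv1 = tf.nn.max_pool(conv1, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

    # Convolution 5x5: 14x14x6 -> 10x10x16, then max pooling to 5x5x16
    w2 = tf.Variable(tf.truncated_normal([5, 5, 6, 16], mean=mu, stddev=sigma))
    b2 = tf.Variable(tf.zeros(16))
    conv2 = tf.nn.relu(tf.nn.conv2d(conv1, w2, [1, 1, 1, 1], 'VALID') + b2)
    conv2 = tf.nn.max_pool(conv2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

    # Flatten 5x5x16 -> 400
    flat = tf.reshape(conv2, [-1, 400])

    # Fully connected 400 -> 120, RELU, dropout
    w3 = tf.Variable(tf.truncated_normal([400, 120], mean=mu, stddev=sigma))
    b3 = tf.Variable(tf.zeros(120))
    fc1 = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, w3) + b3), keep_prob)

    # Fully connected 120 -> 84, RELU, dropout
    w4 = tf.Variable(tf.truncated_normal([120, 84], mean=mu, stddev=sigma))
    b4 = tf.Variable(tf.zeros(84))
    fc2 = tf.nn.dropout(tf.nn.relu(tf.matmul(fc1, w4) + b4), keep_prob)

    # Fully connected 84 -> 43 logits, one per sign class
    w5 = tf.Variable(tf.truncated_normal([84, 43], mean=mu, stddev=sigma))
    b5 = tf.Variable(tf.zeros(43))
    return tf.matmul(fc2, w5) + b5
```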
To train the model, I used the following (see the sketch after this list):
- the softmax_cross_entropy_with_logits function to compute the cross entropy, taking one_hot_y and the logits as its parameters.
- both the AdamOptimizer and plain stochastic gradient descent, settling on the AdamOptimizer because it works well with a dataset of this size.
- 0.001 as the initial learning rate, after several trials with different learning rates.
- 128 as the batch size, which proved best after several trials as well.
- 10, 15 and 20 epochs, settling on 10 as the ideal number after experimenting with the different values.
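A sketch of a training setup consistent with those choices, for TensorFlow 1.x (placeholder and variable names, and `lenet_traffic` from the architecture sketch above, are assumptions):

```python
import tensorflow as tf
from sklearn.utils import shuffle

# X_train, y_train are assumed to be the pre-processed training arrays from earlier.
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None,))
keep_prob = tf.placeholder(tf.float32)
one_hot_y = tf.one_hot(y, 43)

# Network from the architecture sketch above.
logits = lenet_traffic(x, keep_prob)

# Cross entropy between the one-hot labels and the logits, averaged over the batch.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss = tf.reduce_mean(cross_entropy)

# Adam optimizer with the chosen initial learning rate of 0.001.
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

EPOCHS, BATCH_SIZE = 10, 128
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, len(X_train), BATCH_SIZE):
            batch_x = X_train[offset:offset + BATCH_SIZE]
            batch_y = y_train[offset:offset + BATCH_SIZE]
            # keep_prob of 0.5 activates the two dropout layers during training.
            sess.run(train_op, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})
```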
4. Approach taken for finding a solution and getting the validation set accuracy to be at least 0.93.
The LeNet architecture, originally used to recognize handwritten digits in the MNIST dataset, was chosen as the starting point because it is a proven architecture for classifying small images. A few tweaks to adapt it to the traffic sign dataset save the time of designing a network from scratch. These tweaks account for the differences between the datasets: from grayscale digits to color raster images, and from a few thousand images to ten times that number, or even more if augmentation is employed. My model adds normalization, max pooling, dropout, and the extra input channels needed for color images.
Initially, with no pre-processing and only the basic LeNet architecture, the highest validation accuracy my model reached was 86.6%, even after tuning the learning rate, number of epochs and batch size.
The resulting loss graph showed significant overfitting, so I added a dropout layer right after the RELU of the second fully connected layer. The training accuracy reached 95% and the validation accuracy reached 87.1% after 10 epochs.
Since this only partially resolved the overfitting, I added another dropout layer, this time after the RELU of the first fully connected layer. The training accuracy reached only 89.1% and the validation accuracy only 84%, but the loss plot showed a much tighter curve.
I then worked on normalizing the images during pre-processing. I tried normalizing both the grayscale version of the images (see Pre-processing above) and the color version, and dropped grayscaling since it made no significant difference. The training accuracy reached 99.1% and the validation accuracy reached 94.8% after 20 epochs.
My final model results were:
- training set accuracy of 98%.
- validation set accuracy of 93%.
- test set accuracy of 92.7%.
The first image looks like it should be fairly easy to classify because its class has a fair number of training samples, yet the model gets it wrong. The slight rotation of the image to the right may be the cause.
The second image may be difficult to classify because it is rotated -90 degrees and likely resembles another sign.
The third image may be difficult to classify because its class has only a few samples in the training data.
The fourth image should be easy to classify because it closely resembles the training samples and its class has a significant number of images, and indeed the model gets it right.
The fifth image should be difficult to classify because its class has few samples in the training data, yet the model gets it right.
Here are the results of the prediction:
Image | Prediction |
---|---|
50 km/h | 30 km/h |
80 km/h | 120 km/h |
Pedestrians | 30 km/h |
Stop sign | Stop sign |
Slippery Road | Slippery Road |
The model was able to correctly guess 2 of the 5 traffic signs, which gives an accuracy of 40%. The model's accuracy on the test set is 92.7%, more than twice its accuracy on these new images.
The code for making predictions on my final model is located in the 25th cell of the IPython notebook.
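A sketch of how those top softmax probabilities could be obtained with tf.nn.top_k (the tensor names carry over from the training sketch above and are assumptions):

```python
import tensorflow as tf

# Turn the logits into softmax probabilities, then take the k largest
# probabilities and their class indices for each of the new images.
softmax_probs = tf.nn.softmax(logits)
top_k_op = tf.nn.top_k(softmax_probs, k=5)

# Inside a session, dropout is disabled when predicting:
# values, indices = sess.run(top_k_op,
#                            feed_dict={x: new_images, keep_prob: 1.0})
```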
Probability | Prediction |
---|---|
.505 | 50 km/h |
.202 | 80 km/h |
.992 | Pedestrians |
.999 | Stop sign |
.925 | Slippery Road |