Bird Classification on Caltech-UCSD dataset using CNNs

Problem Statement: Write code to implement simple bird image classification with a customized CNN. The accuracy has to be above 95%. You can use either TensorFlow or PyTorch. You have to use the dataset below.


Download it and split it into train, validation, and test sets in the ratio 70:10:20.
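A 70:10:20 split can be sketched as below. This is only a minimal illustration with NumPy, not necessarily how the notebook does it; the function name `split_indices`, the fixed seed, and the choice to give the rounding remainder to the test set are all assumptions.

```python
import numpy as np

def split_indices(n_samples, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle sample indices and cut them into train/val/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test

# 6033 is the number of images stated for the dataset.
train_idx, val_idx, test_idx = split_indices(6033)
print(len(train_idx), len(val_idx), len(test_idx))
```

The returned index arrays can be used to slice the image array and the label array in the same way, so images and labels stay aligned.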

Caltech-UCSD Birds 200: The dataset was released by Caltech. It is an image dataset with photos of 200 bird species (mostly North American). For detailed information about the dataset, please see the technical report linked below.

Number of categories: 200

Number of images: 6033

Annotations: Bounding Box, Rough Segmentation, Attributes

Size of the images folder is 648 MB in .tgz format.

  • For this classification task we only need the images and lists folders.

    ● Images

The images are organized in subdirectories based on species.

● Lists

 classes.txt : list of categories (species)

 files.txt : list of all image files (including subdirectories)

 train.txt : list of all images used for training

 test.txt : list of all images used for testing

 splits.mat : training/testing splits in MATLAB .mat format
  • Size after augmentation : 12066 Images (64X64)

  • Augmentation : Image Flip

    ● X_train : 16892

    ● X_test : 4826

    ● X_valid : 3016
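Flip augmentation on a NumPy image batch can be sketched as follows. The list above only says "Image Flip", so the horizontal (left-right) direction, the `(N, H, W, C)` array layout, and the helper name `augment_with_flips` are assumptions made for illustration.

```python
import numpy as np

def augment_with_flips(images, labels):
    """Append a left-right mirrored copy of every image.

    images: array of shape (N, H, W, C)
    labels: array whose first axis has length N
    """
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    aug_images = np.concatenate([images, flipped], axis=0)
    aug_labels = np.concatenate([labels, labels], axis=0)  # a flip keeps the species
    return aug_images, aug_labels

# Tiny random batch standing in for real 64X64 RGB images.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
batch_labels = np.array([1, 2, 3, 4])
aug_x, aug_y = augment_with_flips(batch, batch_labels)
print(aug_x.shape, aug_y.shape)
```

Adding one mirrored copy per image doubles the dataset, which matches 6033 images growing to 12066.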

Images were converted into NumPy arrays, and the classes.txt file was used to label the respective images.
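The labelling step might look roughly like this. It assumes that each line of classes.txt is a directory name with a numeric prefix (for example `001.Black_footed_Albatross`) and that image paths start with that directory name; the helper names and the sample file name are hypothetical, so check the notebook for the actual logic.

```python
def parse_class_ids(lines):
    """Map each class directory name to the integer in its numeric prefix."""
    mapping = {}
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip empty lines
        prefix, _, _ = name.partition(".")
        mapping[name] = int(prefix)
    return mapping

def label_for(image_path, class_ids):
    """Look up the label of an image from the directory it lives in."""
    class_dir = image_path.split("/")[0]
    return class_ids[class_dir]

# Assumed sample lines; with the real file use open("lists/classes.txt").
sample_classes = ["001.Black_footed_Albatross", "002.Laysan_Albatross"]
class_ids = parse_class_ids(sample_classes)
print(label_for("002.Laysan_Albatross/example_image.jpg", class_ids))
```

If the ids are kept 1-based like this, a one-hot encoding needs 201 columns with index 0 unused. That would be consistent with the dummy label mentioned in the architecture description, but whether the notebook does exactly this is an assumption.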

CNN Architecture image

Brief Explanation :

We start with 64 X 64 RGB (3-channel) images and feed them to our 11-layer neural network. The following diagrams will give an even better idea.


64@21X21 undergoes 3X3 max pooling to give 64@7X7, as shown in the next picture. It is followed by a 256@7X7 layer and another 3X3 max pooling, which gives 256@2X2. This is flattened into a 1024-element vector.


This vector undergoes a matrix multiplication to give a 512-element vector, followed by another matrix multiplication to give 201 numbers representing a dummy label and the 200 species. To know more about the dummy label, go through the notebook.
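The shape arithmetic in this explanation can be checked with a few lines of plain Python. The sketch assumes non-overlapping pooling (stride equal to the window size) with no padding, which is the default behaviour of Keras `MaxPooling2D`, and it assumes the 256@7X7 layer keeps the 7X7 spatial size. The helper name `pool_out` is made up for the example.

```python
def pool_out(size, window, stride=None):
    """Output side length of a pooling layer with no padding."""
    stride = window if stride is None else stride
    return (size - window) // stride + 1

side = 21
side = pool_out(side, 3)        # 64@21X21 -> 64@7X7
after_first_pool = side
side = pool_out(side, 3)        # 256@7X7  -> 256@2X2
flat = 256 * side * side        # flattened vector length

# Weights plus biases of the two fully connected layers.
dense1_params = flat * 512 + 512
dense2_params = 512 * 201 + 201

print(after_first_pool, side, flat, dense1_params, dense2_params)
```

Most of the trainable parameters in this tail of the network sit in the 1024-to-512 layer, which is one reason dropout is applied there.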


Model Summary



  • Library Used: Keras with Tensorflow

  • Loss Function: Categorical cross-entropy, since the classes are mutually exclusive (a bird can be of only one species).

  • Optimizer Used: Adam

  • Learning Rate: 0.001, with β1=0.9, β2=0.999

  • Dropout: 0.5, that is, half of the units are dropped at random, but only at training time.

  • Total Number of epochs: 5X25 = 125
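To make the loss and dropout settings above concrete, here is a small NumPy sketch of what they compute. It is an illustration of the two techniques, not the Keras internals or the notebook code; the function names, the clipping constant `eps`, and the random seed are assumptions.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest.

    At inference time (training=False) the input is returned unchanged.
    """
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

# A model that guesses uniformly over the 201 outputs.
n_outputs = 201
y_true = np.eye(n_outputs)[[1, 5, 200, 42]]
uniform = np.full((4, n_outputs), 1.0 / n_outputs)
print(categorical_cross_entropy(y_true, uniform))

activations = np.ones(1000)
dropped = dropout(activations, rate=0.5, rng=np.random.default_rng(0))
print((dropped == 0).mean())
```

A uniform guess gives a loss of ln(201), which is a useful reference point: a training loss that stays near that value means the network has not learned anything yet.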


  • Best Results from CNN

Data Type         Accuracy (%)
Training Set      90.16
Test Set          90.64
Validation Set    88.56

  • Other Architecture Performances

As the required accuracy was above 95%, I tried a few existing architectures such as Resnet18, Resnet 50, and Vgg16. Unfortunately, none of them reached the desired accuracy; in fact they scored lower, maybe because the dataset we were dealing with was too small for such big architectures. For these architectures a library implementation was used. Data augmentation was done using Imagelist to avoid overfitting as much as possible.

  • Results are as follows:

Vgg16 with batchnorm: Maximum accuracy of 39%. The accuracy kept oscillating but did not move above 39% in any of the epochs.


Resnet 18: It also did not perform well. It reached an accuracy of only 20% and did not converge any further. The image size was changed to the one described in the paper to fit the model.


Snapshot From Notebook


Future Work

  1. I think this performance could be increased by using 256X256 pixel inputs with a deeper model than our current architecture. The problem is the RAM available in Google Colab: the kernel crashes when I try to augment the 256X256 pictures and convert them.

  2. I also tried without augmentation, but the RAM is still not sufficient and the kernel crashes. With a GPU machine that has more RAM, we can expect better accuracy by carefully redesigning the architecture and applying good regularization techniques to avoid overfitting and generalize better.

  3. I also tried 128X128 inputs, but training was not stable, that is, it gave different results every time, and its best result was only sometimes equal to the result produced with 64X64 inputs.

  4. Maybe Resnet 18 can also give a good result if we use more data through better augmentation and pre-processing, as given in the paper. The low accuracy we got may be due to incorrect normalization.

  5. Whether a pretrained network will improve accuracy is debatable, because images in this dataset overlap with images in ImageNet. We need to take extreme caution when using pretrained networks, as the test set of CUB may overlap with the training set of the original network.
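The RAM problem in points 1 and 2 can be estimated with simple arithmetic. The sketch below computes the size of a single in-memory image array at different resolutions and dtypes; the image count of 12066 is the augmented size stated earlier, while the dtypes and the helper name `array_bytes` are assumptions, since the source does not say how the arrays were stored.

```python
def array_bytes(n_images, side, bytes_per_value, channels=3):
    """Bytes needed to hold n_images square RGB images in one array."""
    return n_images * side * side * channels * bytes_per_value

GIB = 1024 ** 3
N_IMAGES = 12066  # augmented dataset size from the section above

for side in (64, 128, 256):
    for dtype_name, width in (("uint8", 1), ("float32", 4), ("float64", 8)):
        size_gib = array_bytes(N_IMAGES, side, width) / GIB
        print(f"{side}X{side} {dtype_name}: {size_gib:.2f} GiB")
```

Going from 64X64 to 256X256 multiplies the footprint by 16, and any intermediate copy made during augmentation or normalization adds to it. Comparing these figures with the RAM of your runtime shows whether the whole dataset can be held in memory at once.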

Instructions to run it on your Local Machine

If you are using Google Colaboratory, you do not need to install any packages because they are all preinstalled; you only need to import them, which is step 2. If you are using a local machine, these are the packages you have to install.

Open a terminal and type the following commands to install the respective packages.

 Keras     : pip install keras

 Tensorflow: pip3 install --user --upgrade tensorflow

 glob is preinstalled in python


Download the latest OpenCV release from the SourceForge site and double-click to extract it.

Go to the opencv/build/python/2.7 folder.

Copy cv2.pyd to C:/Python27/lib/site-packages.

Open Python IDLE and type the following code in the Python terminal.

           >>> import cv2
           >>> print(cv2.__version__)

NumPy and Matplotlib are preinstalled in Colab; on a local machine they can be installed with pip install numpy matplotlib.

After installing all the packages you can import them easily. For the detailed imports, please go through the .ipynb/.py file.
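Before opening the notebook, a quick check like the one below can tell you which packages are still missing on a local machine. The list of module names is my assumption based on the packages named above; adjust it to the imports actually used in the .ipynb/.py file.

```python
import importlib.util

# Import names of the packages mentioned in the installation steps (assumed list).
REQUIRED = ["tensorflow", "keras", "cv2", "numpy", "matplotlib"]

def missing_packages(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Using `find_spec` only looks the modules up without importing them, so the check stays fast even for heavy packages such as TensorFlow.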


  Images:  (648MB .tgz file)

  Lists :   

The Caltech servers are a bit slow; the download takes more than an hour even on a good internet connection. I have already downloaded the dataset and stored it in my Drive under the name 'UCSD 200'. You can use it as well.

  Link :

Add it to your Drive account and follow the instructions below.


After placing the dataset in your Google Drive, open Colaboratory, mount your Drive, and execute the Import Block followed by the Extracting All Files code block from my ipynb file.

The notebook now has different blocks, which makes it flexible to execute. Each block is a separate function. Execute the blocks one by one, or independently of the other blocks. Keep in mind that some blocks depend on other blocks.

If you want to check the performance of the model, follow this procedure:

Load the folder into your Drive.

 Link :

The folder has the following:

1. Best Model(best_model.h5)

2. Best Model Weights (best_weights.h5)

3. X_test,y_test ( X_test.npy , y_test.npy )

Execute the Import Block, Custom CNN Block which contains the 'create model function' and unzip

The code is as follows






Now you can execute this code in the Evaluation Block to check the model's performance.
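As a rough idea of what such an evaluation involves, here is a minimal sketch. It assumes that best_model.h5 is a full Keras model that `load_model` can read, that X_test.npy is already preprocessed the same way as the training data, and that y_test.npy holds either one-hot rows or integer class ids. The function names are made up for this example and are not the notebook's Evaluation Block.

```python
import numpy as np

def top1_accuracy(probs, y_true):
    """Fraction of samples whose highest-scoring class matches the label."""
    pred = np.argmax(probs, axis=1)
    true = np.argmax(y_true, axis=1) if y_true.ndim == 2 else y_true
    return float(np.mean(pred == true))

def evaluate_saved_model(model_path="best_model.h5",
                         x_path="X_test.npy",
                         y_path="y_test.npy"):
    """Load the saved model and test arrays, then report top-1 accuracy."""
    # Imported here so top1_accuracy can be used without TensorFlow installed.
    from tensorflow.keras.models import load_model

    model = load_model(model_path)
    x_test = np.load(x_path)
    y_test = np.load(y_path)
    return top1_accuracy(model.predict(x_test), y_test)

# Toy check of the accuracy helper with made-up scores.
toy_probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
toy_labels = np.array([1, 1, 1, 0])
print(top1_accuracy(toy_probs, toy_labels))
```

With the three files from the shared folder in the working directory, calling `evaluate_saved_model()` would return the test accuracy as a fraction between 0 and 1.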

UPDATE: I have also added a .py file in case you have trouble viewing the .ipynb notebook. :)

To run it execute the following code

 !python  or  python


UCSD 200 Dataset Classification Using Custom CNN





