This is the final project for the Data Science course at the Holon Institute of Technology.
The Dog Breed Classification project was made under the guidelines of the 'Introduction to Data Science' course as part of our Computer Science bachelor's studies. The idea for the project came out of brainstorming: since one of the main requirements was to scrape/crawl the data ourselves, using pre-made datasets was not allowed.
The goal of the project is to create a pipeline that first cleans the image data by removing images without dogs or with high intensity, and then classifies the breed of the dog in each remaining image.
-- Crawlers: We created 5 different crawlers, but mainly used the Pond5 and iStock crawlers.
-- Data cleaning:
- ResNet50
- IQR cleaning
-- Classifier
- Data acquisition (using scraping, crawling and API)
We started by detecting images without dogs using ResNet50:
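A minimal sketch of such a dog detector, assuming the common approach of running an ImageNet-pretrained ResNet50 (from `tf.keras.applications`) and treating a top-1 prediction in ImageNet classes 151-268 (the dog breeds, Chihuahua through Mexican hairless) as "dog". The helper names are hypothetical and this is not necessarily the project's exact code; TensorFlow is imported lazily so the pure helpers work on their own.

```python
import numpy as np

# ImageNet-1k class indices 151..268 (inclusive) are dog breeds.
DOG_CLASS_MIN, DOG_CLASS_MAX = 151, 268


def is_dog_class(class_index: int) -> bool:
    """True if an ImageNet class index belongs to the dog-breed range."""
    return DOG_CLASS_MIN <= class_index <= DOG_CLASS_MAX


def contains_dog(probs: np.ndarray) -> bool:
    """probs: 1000-dim ImageNet softmax output for a single image."""
    return is_dog_class(int(np.argmax(probs)))


def build_resnet50_dog_detector():
    """Return a function path -> bool (hypothetical helper, downloads weights)."""
    import tensorflow as tf  # lazy import: only needed to actually run the model

    model = tf.keras.applications.ResNet50(weights="imagenet")

    def detect(path: str) -> bool:
        img = tf.keras.utils.load_img(path, target_size=(224, 224))
        x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]  # add batch axis
        x = tf.keras.applications.resnet50.preprocess_input(x)
        return contains_dog(model.predict(x, verbose=0)[0])

    return detect
```

Images for which `detect(path)` returns `False` would then be dropped from the dataset.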
After running ResNet50, we found that our data contains many drawings, which may cause problems during classification, so we chose to detect the drawings using entropy. As soon as we started looking for an entropy threshold below which an image is treated as a drawing, we faced another problem: images with a uniform background tend to have entropy values as low as those of drawings. Therefore, when calculating the entropy, we excluded the most common pixel value of each image (which, in most cases, represents the background in images with a uniform background).
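A minimal sketch of this background-excluding entropy, assuming a 2-D 8-bit grayscale image and Shannon entropy (in bits) over the pixel-value histogram. The function names and the `threshold` parameter are hypothetical; the README does not state the threshold that was actually used.

```python
import numpy as np


def entropy_without_background(gray: np.ndarray) -> float:
    """Shannon entropy of pixel values, ignoring the most common value."""
    values = np.asarray(gray, dtype=np.uint8).ravel()
    counts = np.bincount(values, minlength=256)
    counts[np.argmax(counts)] = 0  # drop the mode (assumed to be the background)
    total = counts.sum()
    if total == 0:  # image consisted of a single value only
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())


def looks_like_drawing(gray: np.ndarray, threshold: float) -> bool:
    """Low remaining entropy suggests a drawing (threshold is hypothetical)."""
    return entropy_without_background(gray) < threshold
```

Removing the mode before computing the entropy means a photo on a plain backdrop keeps the entropy of its subject, while a flat-colored drawing stays low.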
In the third step, we calculated the average amount of outliers for each class, as follows:
1. Compute the mean of the image and flatten it.
2. Compute the 40-60 IQR of that image (the range between its 40th and 60th percentiles).
3. Sum the IQR values of all images in the class and divide by the number of images in that class.
Once we had the average outliers of each class, we decided to eliminate some of the classes due to processing-power limitations, taking into account both the distribution of these averages and the distribution of image counts per class.
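The three steps can be sketched as follows. This assumes that "the mean of the image" means averaging over the color channels, and that "IQR 40-60" is the difference between the 40th and 60th percentiles of the flattened pixel values; both readings and the function names are hypothetical.

```python
import numpy as np


def image_iqr_40_60(image: np.ndarray) -> float:
    """Steps 1-2: channel mean, flatten, then 60th minus 40th percentile."""
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 3:  # assumption: H x W x C color image, average the channels
        img = img.mean(axis=-1)
    flat = img.ravel()
    q40, q60 = np.percentile(flat, [40, 60])
    return float(q60 - q40)


def class_average_iqr(images) -> float:
    """Step 3: sum the per-image IQRs and divide by the number of images."""
    images = list(images)
    return sum(image_iqr_40_60(im) for im in images) / len(images)
```

Computing `class_average_iqr` for every class gives the per-class averages whose distribution, together with the per-class image counts, guided which classes were dropped.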
In the sixth step, we calculated and plotted the intensity distribution of the images for each class, and for the whole dataset after cleaning, to check whether the cleaning did a good job (the overall intensity decreased).
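A minimal sketch of this check, assuming that an image's "intensity" is its mean pixel value on a 0-255 scale. The helper names and the bin count are hypothetical; the resulting histogram can then be plotted, for example with matplotlib's `plt.stairs(hist, edges)`.

```python
import numpy as np


def mean_intensities(images) -> np.ndarray:
    """Mean pixel value of each image (assumed definition of intensity)."""
    return np.array([np.asarray(im, dtype=np.float64).mean() for im in images])


def intensity_histogram(images, bins: int = 32):
    """Histogram of per-image mean intensities over the 0-255 range."""
    hist, edges = np.histogram(mean_intensities(images), bins=bins, range=(0.0, 255.0))
    return hist, edges


def intensity_decreased(before, after) -> bool:
    """True if the average intensity after cleaning is lower than before."""
    return mean_intensities(after).mean() < mean_intensities(before).mean()
```

Calling `intensity_histogram` once per class and once on all images together gives both the per-class and the overall distributions.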
At first, we reached 67% accuracy, but the model was overfitting:
Then, we trained for more epochs and added some data augmentation:
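To illustrate what "some augmentation" can mean, here is a NumPy-only sketch of two common augmentations, a random horizontal flip and a random crop. The specific transforms and the `crop_frac` parameter are assumptions, not necessarily the ones the project used; in TensorFlow the same idea is usually expressed with Keras preprocessing layers such as `tf.keras.layers.RandomFlip`, `RandomRotation` and `RandomZoom`.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator, crop_frac: float = 0.9) -> np.ndarray:
    """Randomly flip an H x W (x C) image horizontally, then take a random crop."""
    img = np.asarray(image)
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip: reverse the width axis
    h, w = img.shape[:2]
    ch, cw = max(1, int(h * crop_frac)), max(1, int(w * crop_frac))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return img[top:top + ch, left:left + cw]
```

Because each epoch sees a slightly different version of every image, the network has a harder time memorizing the training set, which is why augmentation helps against overfitting. The cropped images would be resized back to the model's input size before training.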
- Machine Learning
- Deep learning
- CNN
- Image classification (using TensorFlow)
Project status: Finished.
Authors: adids1221, yuvalnsn
All the data in this project came from scraping and crawling, and is used for educational purposes only.
This data is NOT used for distribution of any kind or for any purpose other than education.
Copyright for the images belongs to Flickr, Pond5 and iStock.
This project is licensed under the terms of the MIT license and protected by Udacity Honor Code and Community Code of Conduct. See license and disclaimer.