adids1221/Dogs-Breed-Classification

Dog Breed Classification

This is the final project for the data science course at Holon Institute of Technology.

About

The Dog Breed Classification project was made under the guidelines of the ‘Introduction to Data Science’ course, as part of our computer science bachelor's studies. The idea came out of brainstorming: one of the main requirements was to scrape or crawl the data ourselves, and the use of pre-made datasets was restricted.

The goal of the project is to create a pipeline that first cleans the image data by removing images without dogs and images with high intensity, and then classifies the breed of the dog in each image.

Contents of notebooks:

-- Crawlers: We created 5 different crawlers, but we mainly used the Pond5 and iStock crawlers.

-- Data cleaning:

  1. ResNet 50
  2. IQR cleaning

-- Classifier

Project planning

  • Data acquisition (using scraping, crawling and API)

Data Cleaning

Data cleaning pipeline: Screen Shot 2022-02-12 at 14 19 17

We started by detecting images without dogs using ResNet50:

Screen Shot 2022-02-12 at 14 22 49
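
A minimal sketch of how such a dog/no-dog filter could look, assuming the stock ImageNet weights of ResNet50 are used. In the ImageNet-1k label set, class indices 151 to 268 are dog breeds, so the probability assigned to that range can serve as a "contains a dog" score. The function names and the 0.5 threshold are illustrative assumptions, not taken from the project notebooks.

```python
import numpy as np

# ImageNet-1k class indices 151..268 (inclusive) are dog breeds.
DOG_START, DOG_STOP = 151, 269


def dog_probability(probs):
    """Total probability on the dog classes of one 1000-way softmax vector."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(probs[DOG_START:DOG_STOP].sum())


def contains_dog(probs, threshold=0.5):
    """Hypothetical decision rule: keep the image if the dog score is high enough."""
    return dog_probability(probs) >= threshold


def predict_probs(model, image_path):
    """Run a Keras ResNet50 on one image file and return its softmax vector.

    TensorFlow is imported lazily so the helpers above work without it.
    """
    import tensorflow as tf
    from tensorflow.keras.applications.resnet50 import preprocess_input

    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    batch = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    return model.predict(preprocess_input(batch))[0]
```

With TensorFlow installed, the model would be created once with `tensorflow.keras.applications.resnet50.ResNet50(weights="imagenet")` and reused for every image, then `contains_dog(predict_probs(model, path))` decides whether to keep the file.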

After the ResNet step, we found that our data contained many drawings, which could cause problems during classification, so we chose to detect drawings using entropy. As soon as we started looking for an entropy threshold for detecting drawings, we hit another problem: images with a uniform background tend to give entropy values as low as those of drawings. To handle this, we excluded the most common value of each image from the entropy calculation (in most cases, this value represents the background in images with a uniform background).

Screen Shot 2022-02-12 at 14 20 32
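
The entropy idea described above can be sketched roughly as follows. This assumes an 8-bit grayscale image and Shannon entropy in bits; the function name is hypothetical, and the actual threshold used in the project is not stated here, so none is hard-coded.

```python
import numpy as np


def entropy_without_mode(gray):
    """Shannon entropy (bits) of 8-bit gray levels, ignoring the most common level."""
    levels = np.asarray(gray, dtype=np.uint8).ravel()
    counts = np.bincount(levels, minlength=256).astype(np.float64)

    # Drop the dominant value, which is often a uniform background.
    counts[np.argmax(counts)] = 0.0

    total = counts.sum()
    if total == 0:
        return 0.0  # the image had a single gray level

    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())
```

An image whose score falls below a chosen threshold would then be treated as a likely drawing.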

In the third step, we calculated the average amount of outliers for each class, as follows:

  1. Calculate the mean of an image and flatten it.
  2. Calculate the IQR 40-60 of that image.
  3. Sum the IQR values of all images in the class and divide by the number of images in that class.

Once we had the average outliers of each class, we decided to eliminate some of the classes because of processing power limitations. We took into account the distribution of the averages and the distribution of image counts per class.

Screen Shot 2022-02-12 at 14 20 49
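
One possible reading of the per-class calculation above, written as a sketch. It assumes that "the mean of an image" means averaging over the color channels, and that "IQR 40-60" means the distance between the 40th and 60th percentiles of the flattened pixel values; both are interpretations, and the function names are made up for illustration.

```python
import numpy as np


def image_iqr_40_60(image):
    """Spread between the 40th and 60th percentiles of one image's pixel values."""
    pixels = np.asarray(image, dtype=np.float64)
    if pixels.ndim == 3:
        pixels = pixels.mean(axis=2)  # assumption: average over color channels
    p40, p60 = np.percentile(pixels.ravel(), [40, 60])
    return float(p60 - p40)


def class_average_iqr(images):
    """Sum the per-image values of a class and divide by the number of images."""
    values = [image_iqr_40_60(img) for img in images]
    return sum(values) / len(values)
```

Calling `class_average_iqr` once per breed gives one number per class, which can then be compared against the number of images in each class when deciding which classes to drop.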

In the sixth step, we calculated and plotted the intensity distribution of the images for each class, and for the whole dataset after cleaning, to check whether the cleaning worked (the overall intensity decreased).

Screen Shot 2022-02-12 at 14 21 08

Screen Shot 2022-02-12 at 14 21 18

Screen Shot 2022-02-12 at 14 21 29
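
As a rough illustration of this check, the sketch below summarizes each image by its mean pixel value and bins those values into a histogram. Treating "intensity" as the mean pixel value on a 0-255 scale is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def mean_intensities(images):
    """Mean pixel value of each image, as a 1-D array."""
    return np.array([np.asarray(img, dtype=np.float64).mean() for img in images])


def intensity_histogram(images, bins=32):
    """Histogram of per-image mean intensities over the 0-255 range."""
    counts, edges = np.histogram(mean_intensities(images), bins=bins, range=(0, 255))
    return counts, edges
```

Computing `mean_intensities(...).mean()` on the data before and after cleaning gives a single number for comparing overall intensity, and the `counts` and `edges` arrays can be passed to any plotting library to draw the distribution per class.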

Classification

At first, we hit 67%, with overfitting:

Screen Shot 2022-02-12 at 14 36 10

Then we trained for more epochs and added some augmentation:

Screen Shot 2022-02-12 at 14 37 17

Screen Shot 2022-02-12 at 15 03 00

Project's log: Screen Shot 2022-02-12 at 14 39 55

  • Machine Learning
  • Deep learning
  • CNN
  • Image classification (using TensorFlow)

Contributing

Project status: Finished.
Authors: adids1221, yuvalnsn

In this project all the data came from scraping and crawling, which were used for educational purposes only.
This data is NOT used for distribution of any kind or for any purpose other than education.
Copyright for the images belongs to Flickr, Pond5, and iStock.

This project is licensed under the terms of the MIT license and protected by Udacity Honor Code and Community Code of Conduct. See license and disclaimer.
