A data preprocessing script for creating your own custom dataset from OpenImages
OpenImages is an extensive dataset made by Google. I have put together this simple script for anyone interested in assembling their own dataset as a subset of OpenImages. By default it builds a subset called BodyParts (BP), but the classes and the train/test split can easily be swapped out for whatever you want.
0- Install the required libraries (OpenCV, NumPy, Jupyter Notebook, Pandas)
1- Download train-annotations-bbox.csv into this repository
2- Edit the Jupyter Notebook script's parameters to suit your needs (a sketch of a typical parameter cell follows these steps)
3- Run the script
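For step 2, the parameters live in a cell near the top of the notebook. Below is a minimal sketch of what such a cell might look like; the variable names here are illustrative assumptions, not the notebook's exact identifiers, so adapt them to whatever the notebook actually defines.

```python
# Illustrative parameter cell (names are assumptions, not the notebook's exact identifiers).

# Name of the subset folder the script will create (default: BodyParts).
subset_name = "BP"

# OpenImages classes to include; swap these for any boxable classes you want.
classes = ["Arm", "Beard"]  # add more class names here

# Fraction of each class's images that goes into the train split;
# the remainder becomes the test split.
train_ratio = 0.8

# OpenImages metadata files expected in the repository root.
annotations_csv = "train-annotations-bbox.csv"
images_csv = "train-images-boxable.csv"
class_descriptions_csv = "class-descriptions-boxable2.csv"
```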
Assuming you use the default parameters, you'll end up with your directories and files looking like this:
- Object_Detection_DataPreprocessing.ipynb
- train-annotations-bbox.csv
- train-images-boxable.csv
- class-descriptions-boxable2.csv
- BP
  - classCSVs
    - arm_img_url.csv
    - beard_img_url.csv
    - ...
  - data
    - arm
      - (arm images)
    - beard
      - (beard images)
    - ...
  - arm
    - test
      - (test images)
    - train
      - (train images)
  - annotations.txt
  - test.csv
  - train.csv
- classCSVs
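For context, the classCSVs URL files and the per-class image folders under data are produced by joining the OpenImages metadata CSVs and downloading the matching images. The snippet below is only a rough sketch of that step, assuming the standard OpenImages boxable column layout (ImageID/LabelName in the bbox CSV, image_name/image_url in the boxable images CSV, and a header-less label-code/name pair in the class descriptions CSV); the notebook's own code differs in the details.

```python
# Rough sketch (not the notebook's exact code) of how BP/classCSVs and BP/data
# could be filled. Column names assume the standard OpenImages boxable layout.
import os
import urllib.request

import pandas as pd

annotations = pd.read_csv("train-annotations-bbox.csv")
images = pd.read_csv("train-images-boxable.csv")  # image_name, image_url
class_desc = pd.read_csv("class-descriptions-boxable2.csv",
                         names=["label_code", "class_name"])

os.makedirs("BP/classCSVs", exist_ok=True)

for cls in ["Arm", "Beard"]:  # default BodyParts classes shown in the listing
    # Map the human-readable class name to its OpenImages label code.
    code = class_desc.loc[class_desc["class_name"] == cls, "label_code"].iloc[0]

    # Collect the image IDs annotated with that label and join them to their URLs.
    ids = annotations.loc[annotations["LabelName"] == code, "ImageID"].unique()
    urls = images[images["image_name"].isin([i + ".jpg" for i in ids])]
    urls.to_csv(f"BP/classCSVs/{cls.lower()}_img_url.csv", index=False)

    # Download the images into BP/data/<class>/.
    out_dir = f"BP/data/{cls.lower()}"
    os.makedirs(out_dir, exist_ok=True)
    for _, row in urls.iterrows():
        dest = os.path.join(out_dir, row["image_name"])
        if not os.path.exists(dest):
            urllib.request.urlretrieve(row["image_url"], dest)
```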
annotations.txt is what you will want to use for training your model(s) on the custom dataset. Each line is in the format (img_name, x1, y1, x2, y2, label), which is a more accessible format for training.
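For example, assuming annotations.txt is written as header-less comma-separated lines in that order and lives at BP/annotations.txt with the default parameters, it can be loaded with pandas like this:

```python
# Load the generated annotations for training (column names are illustrative;
# the file itself is plain comma-separated lines in the order shown above).
import pandas as pd

cols = ["img_name", "x1", "y1", "x2", "y2", "label"]
annotations = pd.read_csv("BP/annotations.txt", header=None, names=cols)

# Each row is one bounding box: top-left corner (x1, y1),
# bottom-right corner (x2, y2), and the class label.
print(annotations.head())
print(annotations["label"].value_counts())
```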
Credit to RockyXu66 for his Faster RCNN repo for Open Images, on which this script is based.