Skip to content

Object Recognition Datasets and Challenges: A Review

Notifications You must be signed in to change notification settings

AbtinDjavadifar/ORDC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 

Repository files navigation

ORDC

Object Recognition Datasets and Challenges: A Review

Table 1 - Early Milestone object recognition datasets

Dataset # of classes # of images Annotation type Year Description
COIL-100 100 7,200 Classification 1996 Single-object images with black background – 72 poses for each object.
FERET 1,199 14,126 Classification 1997 Large-Scale face recognition dataset and testing framework.
BSDS - 500 Segmentation 2001 Category agnostic segmentation of natural context images.
Caltech-101 102 9,144 Bounding Box 2003 101 common object categories.
LabelMe 182 62,197 Polygons 2005 Public Online Annotation Tool Polygons instead of classification annotation.
Caltech-256 257 30,307 Bounding Box 2006 An extension for Caltech-101.
Tiny Images 75,062 80 m Classification 2009 32×32 images hierarchically annotated based on the Wordnet Lexical database.

Table 2 - Dataset statistic for PASCAL VOC, ImageNet, MS COCO, and Open Images

Dataset # of Classes # of Images Average Objects Per Image First Introduced
PASCAL VOC 20 22,591 2.3 2005
ImageNet 21,841 14,197,122 3 2009
Microsoft COCO 91 328,000 7.7 2014
Open Images 600 9,178,275 8.1 2017

Table 3 - Challenge Description for PASCAL VOC, ILSVRC, MS COCO, and Open Images

Challenge Tasks Covered # of Classes # of Images # of Annotated Objects Years active Task Description Evaluation Metric
PASCAL VOC Image Classification 20 11,540 27,450 2005 - 2012 Absence/presence prediction of at least one instance of every class in each image Average Precision
Detection 20 11,540 27,450 2005 - 2012 Bounding box prediction for every instance of the challenge classes present in images Average Precision with IoU > 0.5
Segmentation 20 2913 6929 2007 - 2012 Semantic segmentation for the object classes IoU
Action Classification 10 4588 6278 2010 - 2012 bounding box prediction or single points for persons performing an action and annotate with the corresponding action label AP over action class classification
Person Layout Taster 3 609 850 2007 - 2012 Body part (hands, head, feet) detection with bounding boxes AP calculated separately for parts, with IoU > 0.5
ILSVRC Image Classification 1000 1,331,167 1,331,167 2010 - 2014 Classification for one annotated class per image Binary class error over the top 5 predictions per image
Object Localization 1000 573,966 657,231 2011 - 2017 Bounding box detection for only one object per image Binary class and bounding box IoU error over the top 5 predictions
Object Detection 200 476,688 534,309 2013 - 2017 Bounding box prediction for all instances per image AP flexible recall threshold varied proportional to bounding box size
Object Detection from Video 30 5,314 (video snippets) - 2015 - 2017 Continuous bounding box prediction throughout video sequences AP flexible recall threshold varied proportional to bounding box size
Microsoft COCO Detection 80 123,000+ 500,000+ 2015 - present Instance Segmentation over object classes (things) AP at IoU = [0.5:0.05:0.95]
Keypoints 17 123,000+ 250,000+ 2017 - present Simultaneous object detection and keypoint localization AP based on Object Keypoint Similarity (OKS)
Stuff 91 123,000+ - 2017 – present Pixelwise segmentation of background categories Mean IoU
Panoptic 171 123,000+ 500,000+ 2018 - present Full segmentation of images (stuff and things) Panoptic Quality
DensePose - 39,000 56,000 2019-present Human body segmentation and mapping all the pixels of the body to a template 3D model AP based on Geodesic Point Similarity (GPS)
Open Images Object Detection 500 1,743,042 12,421,955 2018 - present Hierarchical-based bounding box detection mAP
Instance Segmentation 300 ~ 848,000 2,148,896 2018 - present Instance Segmentation over object classes, negative labels included to refine training mAP at IoU>0.5
Visual Relationship Detection 57 1,743,042 380,000 relationship triplets 2018 - present Labeling images with relationship triplets containing the interacting objects and the action class A weighted sum of mAP and recall of number of relationships at IoU>0.5

Table 4 – Generic object detection datasets

Dataset # of Images # of Classes # of Bounding Boxes Year
Caltech 101 9,144 102 9144 2003
MIT CSAIL 2,500 21 2500 2004
Caltech 256 30,307 257 30,307 2006
Visual Genome 108,000 76,340 4,102,818 2016
YouTube BB 5.6 m 23 5.6 m 2017
Objects 365 638,000 365 10.1 m 2019

Table 5 – Object Segmentation datasets

Dataset # of Images # of Classes # of Objects Year Challenge Description
SUN 130,519 3819 313,884 2010 No The main purpose of the dataset is scene recognition, however instance-level segmentation masks have also been provided
SBD 10,000 20 20,000 2011 No Object contours on the train/validation images of PASCAL VOC
Pascal Part 11,540 191 27,450 2014 No Object part segmentations for all the 20 class in the PASCAL VOC dataset
DAVIS 150 (videos) 4 449 2016 Yes A video object segmentation dataset and challenge focused on semi-supervised and unsupervised segmentation tasks
YouTube-VOS 4,453 (videos) 94 7,755 2018 Yes videos object segmentation dataset collected of short (3s-6s) video snippets
LVIS 164,000 1000 2 m 2019 Yes Instance segmentation annotations for a long-tail of classes with few samples
LabelMe 62,197 182 250,250 2005 No Instance-level segmentations, some of the background classes have also been annotated

Table 6 – Popular scene recognition datasets

Dataset # of Images # of Classes Additional Annotations Year Description
15-Scene 4,485 15 - 2006 One of the earliest major scene classification datasets
MIT Indoor67 15,620 67 - 2009 Indoor scene classification in 5 main groups: Store, Home, Public Space, Leisure, and Working Place
SUN 130,519 899 313,844 SM (Objects) 2010 Classification dataset of navigable scenes with additional object recognition annotations
SUN Attribute 14,000 700 102 binary attributes per image 2012 attribute-based representation of scenes for a subset of the original SUN database
Open Surfaces 25,357 160 71,460 SM (Surfaces) 2013 Segmented surfaces in interior scenes with texture and material information
Places2 10 m 476 - 2017 Classification of scenes bounded by spaces a human body would fit, with binary attributes

Table 7 – Scene parsing datasets

Dataset # of Images Stuff Classes Object Classes Year Challenge Highlights
MSRC 21 591 6 15 2006 No One of the earliest semantic scene parsing datasets, Images were later used in [71], [101]
Stanford Background 715 7 1 2009 No Outdoor scene parsing dataset collected from LabelMe, MSRC, and PASCAL VOC. Geometric features also included
SiftFlow 2688 18 15 2009 No An early dataset on outdoor environment scene parsing labeled using LabelMe
Barcelona 15,150 31 139 2010 No A subset of the LabelMe dataset
NYU Depth V2 1,449 26 893 2012 No Parsing of 464 cluttered indoor scenes, depth maps also included. Semantic segmentation for objects
SUN+LM 45,676 52 180 2013 No A fully annotated subset of LabelMe and SUN datasets with both indoor and outdoor images
PASCAL Context 10,103 152 388 2014 No Pixel-wise semantic segmentation on the PASCAL VOC dataset. 520 new object and stuff categories were added to the original dataset.
SUN RGB-D 10,335 47 800 2015 Yes Indoor scene parsing dataset and benchmark, 3D bounding boxes also provided
Cityscapes 25,000 14 13 2016 No Images captured from a vehicle driving in urban environments across 50 cities in different weather conditions in Europe. Instance-level segmentations
ADE20K 25,210 1,242 1,451 2017 Yes Includes object part labels, and attributes. Instance-level segmentations
Synscapes 25,000 14 13 2018 No Photo-realistic synthetic scene parsing of urban environments. Annotation categories are the same as Cityscapes. Instance-level segmentations
MS COCO Stuff 163,957 91 80 2018 Yes Pixel-wise semantic segmentation for the entire MS COCO dataset

Table 8 – Popular Street-view autonomous driving datasets

Dataset Year Location Annotated frames ** # of Classes ** Object Annotations Highlights
KITTI 2012 Karlsruhe, Germany 15k 8 200k 3D BB Pioneer benchmark dataset for 3D object detection, multimodal
Cityscapes 2016 50 cities in EU 25k 27 65k SM annotation richness, scene variability and complexity Provided with depth information with stereo image and sensors
BDD 100k 2017 NY, SF 100k 40 Objects 8 Lanes 1.8M BB Diversified in location and weather conditions, Instance segmentation masks provided for 10k images of the dataset
KAIST 2018 Seoul 8.9k 3 308k BB All-day capture conditions (e.g., sunrise, morning, noon, etc.), multimodal
ApolloScape 2018 4x China 144k 25 Object28 Lanes 70k 3D BB Contains lane markings based on the lane colours and styles, Instance level annotations are available , Tricycles are also annotated
A*3D 2019 Singapore 39k 7 230k 3D BB Focused on pedestrian detection High driving speed and low annotation speed
Argoverse 2019 Miami, Pittsburgh 22k 15 993k 3D BB Focused on 3D object tracking and motion forecasting, Annotated HD semantic maps included
Automative RADAR 2019 Germany 500 7 3000 3D BB RADAR data and object detection based on RADAR data
H3D 2019 SF 27k 8 1.1M 3D BB to stimulate research on full-surround 3D multi object detection and tracking
nuScenes 2019 Boston, SG 40k 23 1.4M 3D BB First dataset provided 3D dataset with attribute annotations, first to provide RADAR data, rich multimodal information
Waymo 2019 3x USA 200k 4 9.9M BB, 12M 3D BB 15 times diverse than any available data, First dataset- such low-level synchronized info available, making it easier to conduct research on LiDAR input representation other than the popular 3D point set format
Mapillary Vistas 2017 Global 25k 152 8M SM Scene-parsing with instance-level object segmentation with a diverse geographic, weather, season and daytime extent
Lyft L5 2019 Paolo Alto 46k 9 1.3M 3D BB Multimodal captured by a fleet of vehicles, an annotated LiDAR semantic map is provided,
D²-City 2019 China 700k 12 50k BB Sampled from dashcam video sequences, Bounding cube annotations, Tricycles are also annotated

Table 9 – Pedestrian Detection Datasets. Number of images does not include unannotated images. Unique pedestrians are considered for the number of pedestrians.

Dataset Year # of Cities # of Images # of Pedestrians Highlights
CityPersons 2017 27 cities in EU 5000 35016 Built on top of the Cityscapes dataset
INRIA 2005 - 614 902 Occlusion labels included
Caltech 2009 1 250,000 2300 Temporal correspondence and occlusion labels included, Sampled from 10 hours of video
MIT Ped. 2000 - 1800 1800 Labelled using the LabelMe annotation tool
EuroCity 2018 31 cities in EU 47,000 238,000 Largest pedestrian detection dataset to date
NightOwls 2018 7 32 55,000 Pedestrian detection at night time, detailed annotations attributes: pose, occlusion, and height
Daimler 2009 1 21,790 56,492 Occlusion attributes provided, monocular images

Table 10 – Bird’s eye view datasets

Dataset Year Location Road span/Area Size of data Highlights
NGSIM 2005 USA 500-640m Span of road 90 min Video cameras attached to the adjacent buildings Speed levels more than 75km/h are not included in the dataset Very less amount of truck class
HighD 2017 Germany 420m Span of road 16.5 hours Drone based dataset with five scenario description layers, the first 3 layers include static scenario description, 4th layer includes dynamic description,5th layer includes environment conditions
The inD (Intersection Drone Dataset) 2017 4 locations in Aachen, Germany Altitude 100m 80x40 meters to 140x70 meters 10 hours Of video recording dataset contains more than 11500 road users including vehicles, bicyclists and pedestrians at intersections
INTERACTION 2019 USA, China, Bulgaria, Germany n/a 365min+ 433min+ 133min+ 60min Data collected from drones and traffic cameras Multimodal, driving behavior
AU-AIR 2019 Aarhus, Denmark Flight altitude (5m to 30m) and camera angle 45 to 90 degree 2 hours multi-modal sensor data (i.e., visual, time, location, altitude, IMU, velocity) differences between natural and aerial images in the context of object detection task

Table 11 - AV-related object recognition and scene understanding challenges

Challenge/Benchmark Year Task Dataset Metric Highlight
CVPR 2018 - Video Segmentation Challenge 2018 Video Segmentation - mAP & IoU Segmentation of movable object from video frames.
CVPR 2018 - Berkeley DeepDrive challenges 2018 Road Object Detection & Drivable Area Segmentation & Domain adaptation BDD 100K dataset AP & IoU Multi-tasks.
nuScenecs 3D detection challenge 2019 3D model generation nuScenes dataset mAP & TP Generate 3D model of the environment. Using sensor data retrieved from camera, lidar, and radar.
Lyft 3D Detection for Autonomous Vehicle 2019 Object detection Lyft Level 5 dataset IoU 3D object detection over semantic maps.
NightOwls Pedestrian Detection Challenge 2019 Pedestrian detection NightOwls dataset Standard average missing rate RGB pictures of pedestrians in dim environment.
D²-City Detection Domain Adaptation Challenge 2019 Object detection & Domain adaptation Image-Net & BDD 100K datasets AP & IoU Transfer learning. Domain adaptation for datasets from two different countries.
WIDER Face & Person Challenge 2019 Pedestrian detection WIDER dataset mAP & IoU Detection of pedestrians and cyclist in unconstrained environment.
CVPR 2019 - Beyond Single-frame Perception 2019 3D object detection - mAP & IoU Using 3D lidar scanned point clouds. High quality dataset with different environment conditions.
The KITTI 3D Object Evaluation Benchmark 2017 Object detection KITTI dataset precision-recall curve & AP Dataset consists of images with their point clouds.
GM-ATCI Rear-view pedestrians dataset Benchmark 2016 Pedestrian detection GM-ATCI Rear-view pedestrians dataset IoU Study of position and occlusion pattern of pedestrian
Caltech Pedestrian Detection Benchmark 2012 Pedestrian detection Caltech Pedestrian dataset IoU -
The KITTI 2D Object Evaluation Benchmark 2012 Object detection & Object Orientation KITTI dataset precision-recall curve & AP & average orientation similarity Objection detection from 2D RGB images

Table 12 - Medical imaging datasets

Dataset Size Year Target disease/organ Content Challenge/ Benchmark Description
NLM's MedPix Database 59000 images - - Integrated images no A free online dataset contains more than 12000 patient cases
STARE Database ~400 cases - Eye retinal images no Blood vessel segmentation images
SMIR 350425 images - - CT scans yes 51 subjects of whole-body postmortem CT scans
EchoNet-Dynamic 10030 images 2020 Heart Echocardiographic video frames yes An expert labeled dataset for the study of cardiac motion and chamber size.
Atlas of Digital Pathology 17668 images 2020 Radiological diagnosis Histological patch images yes Images of different organs with 57 types of hierarchical tissue annotated
COVID-CT Dataset 349 images 2020 COVID19 CT scans no Specifically targeting the worldwide pandemic virus.
SARAS-ESAD Dataset 22601 frames 2020 Prostatectomy procedure Video frames yes A dataset of videos showing the full prostatectomy procedure by surgeons
The StructSeg 2019 Dataset 120 cases 2019 Radiotherapy planning CT scans yes A dataset for the treatment of cancers
ODIR-5K 5000 images 2019 Eye fundus photographs yes Fundus images taken by various cameras with different resolutions
DRIVE 400 cases 2019 Eye Retinal images yes Images of 400 different patients between 25-90 years of age.
The RSNA Brain Hemorrhage CT Dataset 874035 images 2019 Brain Hemorrhage CT scans yes Images gathered from 2 medical societies and 60 neuroradiologists
The KiTs19 Challenge Dataset 300 cases 2019 Kidney tumor CT scans yes A dataset of multi-phase CT imaging with segmentation masks
SegTHOR 60 scans 2019 Lung CT scans No A dataset focused on the segmentation of organs at risk in the thorax
The EAD Challenge Dataset 2700 images 2019 Hollow organs Endoscopic video frames yes Images collected from 6 different data centers
Oasis Brains Dataset ~1000 cases 2019 Brain MRI & PET images no A dataset collected over 30 years
CheXpert 224316 images 2019 Chest Chest radiographs yes A dataset labeled by an automatic labeler
LERA 182 patients 2019 Musculoskeletal disorder Radiographs yes Images of hip, foot, ankle and knee of patients for the study of musculoskeletal disorders
CAMEL colorectal adenoma Dataset 177 cases 2019 Cancer Histology images no A dataset for segmentation of cancerous parts in organ
BACH Dataset 430 images 2019 Breast cancer Microscopy & whole-slide images yes Microscopy images labelled by 2 experts
MRNet 1370 patients 2018 Knee MRI yes A dataset for autonomous MRI diagnosis
The REFUGE Challenge Dataset 1200 images 2018 Glaucoma Fundus photographs yes The dataset was collected using two types of devices.
MURA 40561 images 2018 Bone musculoskeletal radiographs yes A manually labeled dataset by board-certificated Stanford radiologists, containing 7 body types: finger, hand, elbow, forearm, humerus, wrist and shoulder
Calgary-Campinas Public Brain MR Dataset 167 scans 2018 Brain MRI no A dataset for analysis of brain MRI
HAM 10000 Dataset 10015 images 2018 Skin lesions Dermatoscopic images yes A multi-modal and multi-population dataset
NIH Chest X-ray Dataset 100000 images 2017 Chest X-ray images no A dataset of x-ray images
RESECT 23 patients 2017 Cerebral Tumor MRI & intra-operative ultrasound yes A dataset of homologous landmarks
Cancer Digital Slide Archive - 2017 Cancers Glass slides of histologic images no High resolution detailed images of tissue microenvironments and cytologic details
609 Spinal Anterior-posterior X-ray Dataset 609 images 2017 Spine X-ray images No Each vertebra was located by a landmark and the landmark is used to calculate Cobb angles.
Cholec80 80 videos 2016 Surgery Video frames no A dataset containing 80 videos of surgeries performed by 13 different surgeons
CRCHistoPhenotypes - Labeled Cell Nuclei Data 100 images 2016 Cell Histology images no 100 H&E stained histology images of colorectal adenocarcinomas
CSI 2014 Vertebra Segmentation Challenge Dataset 10 scans 2016 Spine CT scan yes Entire thoracic and lumbar spine were covered by the images. The in-plane resolution is from 0.31 to 0.45mm. The slice-thickness is 1mm or 2mm.
Multi-Modality Vertebra Dataset 20 cases 2015 Vertebra MRI & CT scan no The 3D vertebra centre location and orientation are annotated.
CVC colon DB 1200 frames 2012 colon & rectum Colonoscopy video frames no The dataset's region of interest has been annotated. The video frames were specifically chosen for maximum visual distinction among them.
LIDC-IDRI Database 1018 cases 2011 Lung nodule CT scans yes A database created by 7 academic centers and 8 medical imaging companies
Computed Tomography Emphysema Dataset 115 slices 2010 COPD CT scans no High-resolution CT scans
DIARETDB1 89 images 2007 Diabetic retinopathy fundus photographs no A database for benchmarking the detection of diabetic retinography
ELCAP Public Lung Image Database 50 sets 2003 Lung CT scans no 50 low-dose documented CT scans for lungs containing nodules
The Digital Database for Screening Mammography 2620 cases 1998 Breast Mammography images no The database has the function for user to search classes among normal, benign and cancer.

Table 13 - Medical Imaging challenges

Challenge Year Task Dataset Metric Highlight
SARAS 2020 Detection SARAS-ESAD Dataset mAP Promote the AI integrated minimally invasive surgery. It starts with the detection of surgeons' actions.
REFUGE 2020 Detection & Segmentation REFUGE Challenge Dataset - Promote automated segmentation and detection for glaucoma.
StructSeg 2019 Segmentation The StructSeg 2019 Dataset DSC & 95% Hausdoff Distance Targeting both lung cancer and nasopharynx cancer. Evaluation of gross target volume and organs at risk.
DRIVE 2019 Segmentation DRIVE Overall prediction accuracy and s score Promote the implementation of screening programs for diabetic retinopathy, Promote the diagnosis of hypertension and computer-assisted laser surgery
ODIR 2019 Classification ODIR-5K Precision, accuracy and dice similarity Promote the implementation of AI in retinal image analysis,
RSNA Intracranial Hemorrhage Detection 2019 Detection The RSNA Brain Hemorrhage CT Dataset Weighted multi-label log loss Promote the detection of acute intracranial hemorrhage and respective subtypes.
KiTS 2019 Segmentation The KiTs19 Challenge Dataset FROC Promote kidney tumor semantic segmentation.
EAD 2019 Classification Detection Segmentation The EAD Challenge Dataset average Dice coefficient Promote the diagnosis and treatment for diseases in hollow organs.
AASCE 2019 Regression AASCE Challenge Dataset Symmetric mean absolute percentage error Promote methologies for automated spinal curvature estimation and correction of error from x-ray images
CuRIOUS 2019 Registration RESECT Threshold Jaccard Index and normalized multi-class accuracy Promote the implementation of AI to surgery.
ISIC 2018 Classification The HAM 10000 mAP, IoU, Dice coefficient, Jaccard Index, F2 score and deviation score. Promote the automated diagnosis of melanoma.
Data Science Bowl - Find the nuclei in divergent images to advance medical discovery 2018 Classification - IoU Promote the detection of nucleus. Further drive the development of cures for various diseases.
ICIAR 2018 Classification Segmentation BACH Mean target registration errors Promote the early diagnosis of breast cancer to increase the cure rate significantly.
LUNA 2016 Classification Detection LIDC-IDRI Kappa score, F1 score and AUC The challenge focuses on large-scale evaluation of automatic detection of lung nodule algorithms.

Table 14 – Well-known face recognition datasets. Abbreviations in the table: Oclusion (O), Pose (P), Age (A), Expression (E), Skin color (S), Gender (G), Bounding Boxes (BB), Keypoints (KP), V (video)

Dataset Year # of Subjects # of Images Additional Information Highlights
VGGFace 2015 2,622 2.6 M A Large-scale celebrity recognition with high intra-class variations
VGGFace2 2018 9,131 3.31M A, P Diversified pose, age, and ethnicity of celebrity faces
LFW 2007 5,749 13,233 - The first unconstrained FR dataset
MegaFace 2016 672,052 4.7 M - Raised difficulty by including 1 M distractors, non-celebrity subjects
YTF 2011 1,595 3,425 V - Designed for face verification in videos; same format as LFW
CASIA-WebFace 2014 10,577 494,414 - First publicly available large-scale FR dataset
IJB-A 2015 500 5,712 BB, KP Manually verified bounding boxes for face detection, nose and eye keypoints included
MS-Celeb-1M 2016 100,000 10 M - Celebrity identification dataset and benchmark with a linked celebrity knowledge base
Pubfig 2009 200 60,000 A, E, G, P, BB 73 automatically generated attributes provided, same format as LFW
CelebA 2015 10,177 202,599 KP Designed for face attribute prediction in the wild, 40 binary attributes included
DiF 2019 - 0.97 M A, P, S, BB, KP Quantitative facial features included to reduce recognition bias across different demographics
IMDB-Face 2015 100,000 460,723 A, G Age and gender prediction on a set of celebrities collected from IMDB
UMDFaces 2016 8,501 367,920 A, P, G, BB, KP Detailed human-verified attributes and annotations
IJB-B 2015 1,845 21,798 A, G, P, S A superset of IJB-A with additional occlusion, illumination
IJB-C 2018 3,531 31,334 A, G, P, S An improvement upon IJB-B with a focus on diversifying the geographic coverage of subjects
FaceScrub 2014 695 141,130 G A broad dataset of movie celebrities gathered from IMDB
CACD 2014 2,000 163,446 A Images include age variations for each subject for cross-age face recognition and retrieval, only 200 subjects are manually annotated.

Table 15 – Remote sensing object detection datasets. Dataset size is the number of images unless states otherwise

Dataset Year Annotation Size Spatial Resolution (cm per pixel) Description
SpaceNet C.1&C.2 2019 685,000 buildings 5,555 30-50 building footprints annotated using polygons, 5 cities
SpaceNet C.3 2019 8,676 km road 5,555 30-50 Road centerlines labeled based on the OpenStreetMap scheme
COWC 2016 32,716 vehicles - 15 Car detection dataset gathered from 6 cities in North America and Europe, cars annotated with points on centroids
xView 2018 1M objects 1,400 30 Large-scale object overhead object detection dataset with bounding box annotations
FMoW 2017 132,700 objects 1M - Temporal image sequences from over 200 countries with the purpose visual reasoning about location, time, and sun angles. Bounding box annotations
NWPU-RESISC45 2017 31,500 scenes 31,500 20-3000 Aerial scene classification dataset with variations in spatial resolution, illumination, object pose, occlusion
TorontoCity 2016 400,000 buildings 712 10 RGB and LiDAR Aerial imagery of the greater Toronto area augmented with and street-view stereo and LiDAR
DOTA 2018 188,282 objects 2,806 - Rotated bounding box annotations verified by expert annotators, 15 common classes
TAS 2008 1,319 vehicles 30 - An early annotated remote sensing dataset from collected from google earth, bounding boxes
DLR3K 2013 3,472 vehicles 20 13 Rotated bounding boxes with additional orientation annotations
NWPU VHR-10 2016 3,775 objects 715 50-200 generic object detection dataset with 10 classes, bounding box annotations
LEVIR 2018 11,000 objects 22,000 20-100 Bounding boxes, annotations provided for airplanes, ships, and oilpots
VEDAI 2016 3,600 vehicles 1,210 12.5 Small vehicle detection consisting of 9 vehicle classes, rotated bounding boxes
UCAS-AOD 2015 6,000 objects 910 - Rotated bounding box annotations, vehicle and airplane detection, taken from Google Earth
AID 2016 10,000 scenes 10,000 50-800 Aerial scene classification with 30 classes

Table 16 – Remote sensing challenges. * the number of classes for the land cover classification task.

Challenge Year Dataset size # of Classes Evaluation Metric Task
SpaceNet C.1&C.2 2019 5,555 2 score Building footprint detection in 5 cities
SpaceNet C.3 2019 5,555 1 Average Path Length Similarity Road network extraction
FMoW 2017 1 M 63 score Object Classification
DSTL 2017 57 10 IoU Semantic Segmentation
NWPU- RESISC45 2017 31,500 45 Accuracy Scene Classification
DIUx xView 2018 1,400 60 IoU Object detection
DeepGlobe 2018 10,000 7 * IoU, score Building segmentation, road extraction, land cover classification

Table 17 – Species Recognition Datasets

Dataset # of Images # of Classes # of Annotations Year Challenge Description
Flower 102 8,189 103 8,189 SM 2008 No Flower recognition dataset of 103 flower categories common in the United Kingdom
Caltech-Birds 2011 11,788 200 11,788 BB 2011 No 15 part locations and 28 attributes for each bird
Stanford Dogs 22,000 120 22,000 BB 2011 No Single-object per image dataset for dog breed recognition
F4K 27,370 23 27,370 CL 2012 No Fish recognition dataset annotated by following marine biologists
Snapshot Serengeti 1.2 m 61 406,433 CL, 150,000 BB 2014 No Wild animal classification dataset gathered using 225 camera-traps in Serengeti National Park in Africa
NABirds 48,562 555 48,562 BB 2015 No Expert-curated dataset of North American birds, 11 bird parts annotated in every image
PlantCLEF 434,251 10,000 10,000 CL 2015 Yes Plant classification dataset gathered in the Amazon rainforest
iNat 675,175 5,089 561,767 BB 2017 Yes Manually collected dataset of 13 super-class and 5k sub-class species, organized in a hierarchical taxonomy, highly imbalanced
Dogs-in-the-Wild 300,000 362 300,000 CL 2018 No A large dataset for dog breed classification in natural environments
AnimalWeb 21,900 334 198k KP 2019 No Hierarchically categorized dataset for animal face recognition with 9 keypoint annotations per face
IP102 75,000 102 75,000 CL, 19,000 BB 2019 No Hierarchically categorized dataset for insect pest recognition

Table 18 - Clothing Detection Datasets

Dataset # of Images # of Classes # of Annotated Clothing instances Annotation Type Year Challenge/Benchmark # of Attributes
DARN 182,000 20 182,000 BB 2015 No 9
Street2Shop 404,000 11 20,357 BB 2015 No -
DeepFashion 800,000 50 180,000 KP 2016 Yes 5
ModaNet 55,000 13 240,000 BB, SM 2018 No -
FashionAI 324,000 41 324,000 KP 2018 No 68
Deepfashion2 491,000 13 801,000 BB, SM, KP 2019 Yes 4

About

Object Recognition Datasets and Challenges: A Review

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published