This repository contains the code, configuration files and dataset statistics used for the paper *Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark*, submitted to IROS 2021.
The repository is organized as follows:
- datasets (dir): Contains the COCO annotation files used for each dataset.
- detection (dir): Contains the configuration files for detection, the log files and the scripts used to set up the detection datasets.
Requirements:

- PyTorch 1.7.1
- OpenCV 4.5
- MMCV 1.2.4
- MMDet 2.8.0
- MMTrack 0.5.1
Detection and tracking were carried out using the OpenMMLab frameworks for each task. In this section, we give a summary of how to set up the framework for each task.
1. Install MMDetection using the Getting Started guide.
2. Create a directory under the `configs` folder (e.g., `configs/uavbenchmark`) and copy the config files into it.
3. Create the `data` directory in the root folder of the project and create the dataset folders (you can also use symbolic links):
   - anti-uav
   - anti-uav/images
   - drone-vs-bird
   - drone-vs-bird/images
   - mav-vid
   - mav-vid/images
4. Copy the annotation files for each dataset to its corresponding folder.
5. Copy all the images for each dataset to `<dataset-folder>/images` (see below for details on each dataset).
6. Create a `checkpoints` folder under the root of the project and download the weight files (see below) to this folder.
7. Run the following script:

   ```
   python tools/test.py <PATH TO CONFIG FILE> <PATH TO WEIGHT FILE>
   ```
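For example, to evaluate one of the models and report the COCO metrics, MMDetection's test script can also be given the `--eval bbox` option. The config and checkpoint names below are placeholders for the files you copied and downloaded in the previous steps:

```
python tools/test.py configs/uavbenchmark/<config>.py checkpoints/<weights>.pth --eval bbox
```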
Three datasets were used in our benchmark. An example of each dataset is shown next, with (a) MAV-VID, (b) Drone-vs-Bird, (c) Anti-UAV Visual and (d) Anti-UAV Infrared.
The MAV-VID dataset consists of videos of a single UAV in different setups: captured from other drones, from ground-based surveillance cameras and from handheld mobile devices. It can be downloaded from its Kaggle site.
This dataset is composed of images with YOLO annotations, divided into two directories: train and val. To use it in this benchmark kit, create the COCO annotation files for each data partition using the convert_mav_vid_to_coco.py script, rename them to train.json and val.json, and move them to the data/mav-vid directory created in the installation steps. Then copy all the images of both partitions to the data/mav-vid/images directory.
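For reference, the YOLO-to-COCO conversion is conceptually simple. The sketch below is illustrative only and is not the repository's convert_mav_vid_to_coco.py; it assumes standard normalised YOLO labels (`class cx cy w h`) stored in a .txt file next to each image and a single `drone` category:

```python
import json
from pathlib import Path
import cv2

def yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h):
    """Normalised YOLO box (centre x, centre y, width, height) -> COCO [x, y, width, height] in pixels."""
    bw, bh = w * img_w, h * img_h
    return [cx * img_w - bw / 2, cy * img_h - bh / 2, bw, bh]

coco = {"images": [], "annotations": [], "categories": [{"id": 1, "name": "drone"}]}
ann_id = 1
for img_id, img_path in enumerate(sorted(Path("train").glob("*.jpg")), start=1):
    img_h, img_w = cv2.imread(str(img_path)).shape[:2]
    coco["images"].append({"id": img_id, "file_name": img_path.name,
                           "width": img_w, "height": img_h})
    label_path = img_path.with_suffix(".txt")  # YOLO label assumed to sit next to the image
    if not label_path.exists():
        continue
    for line in label_path.read_text().splitlines():
        _, cx, cy, w, h = map(float, line.split())
        bbox = yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h)
        coco["annotations"].append({"id": ann_id, "image_id": img_id, "category_id": 1,
                                    "bbox": bbox, "area": bbox[2] * bbox[3], "iscrowd": 0})
        ann_id += 1

with open("train.json", "w") as f:
    json.dump(coco, f)
```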
The Drone-vs-Bird dataset was released as part of the International Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques at IEEE AVSS 2020. The main goal of this challenge is to reduce the high false positive rates that vision-based methods usually suffer. The dataset comprises videos of UAVs captured at long distances and often surrounded by small objects, such as birds.
The videos can be downloaded upon request and the annotations can be downloaded from the challenge's GitHub site.
The annotations follow a custom format, where a .txt file is given for each video. Each annotation file has one line per video frame, in the format `<Frame number> <Number of Objects> <x> <y> <width> <height> [<x> <y> <width> <height> ...]`.
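A minimal sketch of parsing one such line is shown below. It is purely illustrative (not part of the repository scripts) and assumes the box coordinates are given in pixels:

```python
def parse_annotation_line(line):
    """Parse one Drone-vs-Bird annotation line:
    <frame> <num_objects> followed by <x> <y> <width> <height> for each object."""
    values = line.split()
    frame, num_objects = int(values[0]), int(values[1])
    boxes = []
    for i in range(num_objects):
        x, y, w, h = map(int, values[2 + 4 * i:6 + 4 * i])
        boxes.append((x, y, w, h))
    return frame, boxes

# Hypothetical example line with two objects
frame, boxes = parse_annotation_line("12 2 100 150 30 20 400 220 25 18")
print(frame, boxes)  # 12 [(100, 150, 30, 20), (400, 220, 25, 18)]
```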
To use this dataset in this benchmark, first convert the videos to images with the video_to_images.py script and then create the COCO annotations using the convert_drone_vs_bird_to_coco.py script. Just as with the MAV-VID dataset, copy the images to the data/drone-vs-bird/images directory and the annotations to data/drone-vs-bird.
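Conceptually, the frame extraction step works as in the sketch below. This is illustrative only; the actual video_to_images.py script and its output naming scheme may differ. It uses OpenCV, which is already one of the requirements:

```python
import cv2
from pathlib import Path

def video_to_images(video_path, out_dir):
    """Dump every frame of a video as a numbered JPEG (illustrative sketch)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(str(out_dir / f"{Path(video_path).stem}_{frame_idx:06d}.jpg"), frame)
        frame_idx += 1
    cap.release()

# Hypothetical file name; point it at one of the downloaded videos
video_to_images("gopro_001.mp4", "data/drone-vs-bird/images")
```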
This multi-modal dataset comprises fully-annotated, unaligned RGB and IR videos. The Anti-UAV dataset is intended to provide a real-case benchmark for evaluating object tracking algorithms in the context of UAVs. It contains recordings of 6 UAV models flying under different lighting and background conditions. This dataset can be downloaded from its website.
This dataset also comprises videos with custom annotations. Once downloaded and extracted, the videos are organised in folders containing the RGB and IR versions, with their corresponding JSON annotations. To convert this dataset to images and COCO annotations, use the convert_anti_uav_to_coco.py script, then copy the generated annotations to data/anti-uav and the images to data/anti-uav/images. The images folder will contain the images for both modalities, and three sets of annotations will be generated: full (both modalities), RGB-only and IR-only.
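As an illustration of how the full annotation set relates to the per-modality ones, the sketch below merges two COCO-style dictionaries while re-indexing image and annotation ids. It is not the repository script, the file names are assumptions, and it assumes both modalities share the same single category; the actual convert_anti_uav_to_coco.py generates these files directly:

```python
import json

def merge_coco(coco_a, coco_b):
    """Merge two COCO-style dicts (e.g. the RGB and IR annotations) into a single
    'full' annotation dict, re-indexing image and annotation ids (illustrative sketch)."""
    merged = {"images": [], "annotations": [], "categories": coco_a["categories"]}
    next_img_id, next_ann_id = 1, 1
    for coco in (coco_a, coco_b):
        id_map = {}
        for img in coco["images"]:
            id_map[img["id"]] = next_img_id
            merged["images"].append({**img, "id": next_img_id})
            next_img_id += 1
        for ann in coco["annotations"]:
            merged["annotations"].append({**ann, "id": next_ann_id,
                                          "image_id": id_map[ann["image_id"]]})
            next_ann_id += 1
    return merged

# Hypothetical file names for the per-modality annotations
rgb = json.load(open("data/anti-uav/train_rgb.json"))
ir = json.load(open("data/anti-uav/train_ir.json"))
json.dump(merge_coco(rgb, ir), open("data/anti-uav/train_full.json", "w"))
```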
Dataset object size

| Dataset | Size | Average Object Size |
|---|---|---|
| MAV-VID | Training: 53 videos (29,500 images)<br>Validation: 11 videos (10,732 images) | 215 x 128 pxs (3.28% of image size) |
| Drone-vs-Bird | Training: 61 videos (85,904 images)<br>Validation: 16 videos (18,856 images) | 34 x 23 pxs (0.10% of image size) |
| Anti-UAV | Training: 60 videos (149,478 images)<br>Validation: 40 videos (37,016 images) | RGB: 125 x 59 pxs (0.40% of image size)<br>IR: 52 x 29 pxs (0.50% of image size) |
Location, size and image composition statistics
Four detection architectures were used for our analysis: Faster RCNN, SSD512, YOLOv3 and DETR. For the implementation details, refer to our paper. The results are as follows:
| Dataset | Model | AP | AP0.5 | AP0.75 | APS | APM | APL | AR | ARS | ARM | ARL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MAV-VID | Faster RCNN log weights | 0.592 | 0.978 | 0.672 | 0.154 | 0.541 | 0.656 | 0.659 | 0.369 | 0.621 | 0.721 |
| | SSD512 log weights | 0.535 | 0.967 | 0.536 | 0.083 | 0.499 | 0.587 | 0.612 | 0.377 | 0.578 | 0.666 |
| | YOLOv3 log weights | 0.537 | 0.963 | 0.542 | 0.066 | 0.471 | 0.636 | 0.612 | 0.208 | 0.559 | 0.696 |
| | DETR log weights | 0.545 | 0.971 | 0.560 | 0.044 | 0.490 | 0.612 | 0.692 | 0.346 | 0.661 | 0.742 |
| Drone-vs-Bird | Faster RCNN log weights | 0.283 | 0.632 | 0.197 | 0.218 | 0.473 | 0.506 | 0.356 | 0.298 | 0.546 | 0.512 |
| | SSD512 log weights | | 0.629 | 0.134 | 0.199 | 0.422 | 0.052 | 0.379 | 0.327 | 0.549 | 0.556 |
| | YOLOv3 log weights | 0.210 | 0.546 | 0.105 | 0.158 | 0.395 | 0.356 | 0.302 | 0.238 | 0.512 | 0.637 |
| | DETR log weights | 0.251 | 0.667 | 0.123 | 0.190 | 0.444 | 0.533 | 0.473 | 0.425 | 0.631 | 0.550 |
| Anti-UAV-Full | Faster RCNN log weights | 0.612 | 0.974 | 0.701 | 0.517 | 0.619 | 0.737 | 0.666 | 0.601 | 0.670 | 0.778 |
| | SSD512 log weights | 0.613 | 0.982 | 0.697 | 0.527 | 0.619 | 0.712 | 0.678 | 0.616 | 0.682 | 0.780 |
| | YOLOv3 log weights | 0.604 | 0.977 | 0.676 | 0.529 | 0.619 | 0.708 | 0.667 | 0.618 | 0.668 | 0.760 |
| | DETR log weights | 0.586 | 0.977 | 0.648 | 0.509 | 0.589 | 0.692 | 0.649 | 0.598 | 0.649 | 0.752 |
| Anti-UAV-RGB | Faster RCNN log weights | 0.642 | 0.982 | 0.770 | 0.134 | 0.615 | 0.718 | 0.694 | 0.135 | 0.677 | 0.760 |
| | SSD512 log weights | 0.627 | 0.979 | 0.747 | 0.124 | 0.593 | 0.718 | 0.703 | 0.156 | 0.682 | 0.785 |
| | YOLOv3 log weights | 0.617 | 0.986 | 0.717 | 0.143 | 0.595 | 0.702 | 0.684 | 0.181 | 0.664 | 0.758 |
| | DETR log weights | 0.628 | 0.978 | 0.740 | 0.129 | 0.590 | 0.734 | 0.700 | 0.144 | 0.675 | 0.794 |
| Anti-UAV-IR | Faster RCNN log weights | 0.581 | 0.977 | 0.641 | 0.523 | 0.623 | - | 0.636 | 0.602 | 0.663 | - |
| | SSD512 log weights | 0.590 | 0.975 | 0.639 | 0.518 | 0.636 | - | 0.649 | 0.609 | 0.681 | - |
| | YOLOv3 log weights | 0.591 | 0.976 | 0.643 | 0.533 | 0.638 | - | 0.651 | 0.620 | 0.675 | - |
| | DETR log weights | 0.599 | 0.980 | 0.655 | 0.525 | 0.642 | - | 0.671 | 0.633 | 0.701 | - |
If you use this benchmark in your work, please cite:

```
@article{uavbenchmark,
    title={Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark},
    author={Isaac-Medina, Brian K. S. and Poyser, Matt and Organisciak, Daniel and Willcocks, Chris G. and Breckon, Toby P. and Shum, Hubert P. H.},
    journal={arXiv},
    year={2021}
}
```