This is the PyTorch implementation of our paper "Detection in Crowded Scenes: One Proposal, Multiple Predictions" (https://arxiv.org/abs/2003.09163), published at CVPR 2020.
Our method aims to detect highly overlapped instances in crowded scenes.
The key idea of our approach is to let each proposal predict a set of instances that might be highly overlapped, rather than a single instance as in previous proposal-based frameworks. With this scheme, the predictions of nearby proposals are expected to infer the same set of instances rather than distinguish individuals, which is much easier to learn. Equipped with new techniques such as EMD Loss and Set NMS, our detector can effectively handle the difficulty of detecting highly overlapped objects.
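The two techniques named above can be illustrated in plain Python. This is a minimal sketch, not the repository's implementation: it assumes boxes are `(x1, y1, x2, y2)` tuples, the EMD loss keeps only a smooth-L1 regression term (the paper's loss also has a classification term and pads each set with background), and the helper names `smooth_l1`, `emd_loss`, `iou` and `set_nms` are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def smooth_l1(pred: Box, target: Box, beta: float = 1.0) -> float:
    """Smooth-L1 distance summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def emd_loss(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """EMD-style set loss for ONE proposal: try every one-to-one assignment
    of its K predictions to its K targets and keep the cheapest one.
    Regression only; no classification term, no background padding."""
    if len(preds) != len(targets):
        raise ValueError("this sketch assumes equally sized sets")
    return min(
        sum(smooth_l1(p, t) for p, t in zip(preds, perm))
        for perm in itertools.permutations(targets)
    )


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def set_nms(boxes: Sequence[Box], scores: Sequence[float],
            proposal_ids: Sequence[int], thresh: float = 0.5) -> List[int]:
    """Greedy NMS with one extra check: a kept box never suppresses a box
    predicted by the same proposal. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = any(
            proposal_ids[k] != proposal_ids[i] and iou(boxes[i], boxes[k]) > thresh
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two overlapping people predicted by the same proposal (id 0),
    # plus a duplicate of the first person from another proposal (id 1).
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (0, 0, 10, 10)]
    print(set_nms(boxes, [0.9, 0.8, 0.7], [0, 0, 1]))  # [0, 1]
    # With plain NMS semantics (every box from its own proposal) only one survives.
    print(set_nms(boxes, [0.9, 0.8, 0.7], [0, 1, 2]))  # [0]
    # Predictions listed in the "wrong" order still cost nothing under EMD.
    print(emd_loss([(0, 0, 10, 10), (5, 0, 15, 10)],
                   [(5, 0, 15, 10), (0, 0, 10, 10)]))  # 0.0
```

The usage lines show why the extra same-proposal check matters: plain NMS would drop the second, genuinely distinct person because it overlaps the first with IoU of about 0.82.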
The network structure and results are shown here:
If you use the code in your research, please cite:
@InProceedings{Chu_2020_CVPR,
author = {Chu, Xuangeng and Zheng, Anlin and Zhang, Xiangyu and Sun, Jian},
title = {Detection in Crowded Scenes: One Proposal, Multiple Predictions},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
Requirements:
- Python 3.6.8, PyTorch 1.5.0, torchvision 0.6.0, CUDA 10.1
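A small sketch for checking an environment against the versions listed above. It assumes the packages expose `__version__` (torch and torchvision do); the names `installed_version` and `version_report` are hypothetical and not part of the repository. CUDA is not checked here; with PyTorch installed, `torch.version.cuda` reports the CUDA version it was built with.

```python
import importlib
import sys

# Versions listed under Requirements.
TESTED = {"torch": "1.5.0", "torchvision": "0.6.0"}
TESTED_PYTHON = (3, 6, 8)


def installed_version(package: str):
    """Return the package's __version__ string, or None if it is missing."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_report(tested=TESTED):
    """Map each package name to (installed, tested, matches)."""
    report = {}
    for name, wanted in tested.items():
        found = installed_version(name)
        # Wheels may carry a local suffix such as "1.5.0+cu101"; compare the public part.
        public = found.split("+")[0] if found else None
        report[name] = (found, wanted, public == wanted)
    return report


if __name__ == "__main__":
    print("python", sys.version_info[:3], "tested with", TESTED_PYTHON)
    for name, (found, wanted, ok) in version_report().items():
        print(f"{name}: installed={found} tested={wanted} match={ok}")
```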
CrowdHuman data:
- CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The dataset can be downloaded from http://www.crowdhuman.org/. The path of the dataset is set in `config.py`.
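CrowdHuman annotations are distributed, to our knowledge, as `.odgt` files (for example `annotation_train.odgt` and `annotation_val.odgt`), with one JSON record per image on each line. The sketch below reads such a file and extracts full-body boxes. The field layout (`gtboxes`, `tag`, `fbox` as `[x, y, w, h]`, `extra.ignore`) is an assumption based on the public release, so check it against your copy; the function names are hypothetical and are not the loader used by this repository.

```python
import json
from typing import List


def load_odgt(path: str) -> List[dict]:
    """Read a CrowdHuman .odgt annotation file: one JSON record per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def full_body_boxes(record: dict, skip_ignored: bool = True) -> List[List[float]]:
    """Return the full-body boxes of one image as [x1, y1, x2, y2].

    Assumed layout: record["gtboxes"] holds dicts with "tag" ("person" or
    "mask"), "fbox" = [x, y, w, h], and an optional "extra" dict whose
    "ignore" flag marks boxes to skip.
    """
    boxes = []
    for gt in record.get("gtboxes", []):
        if gt.get("tag") != "person":
            continue
        if skip_ignored and gt.get("extra", {}).get("ignore", 0) == 1:
            continue
        x, y, w, h = gt["fbox"]
        boxes.append([x, y, x + w, y + h])  # convert width/height to corners
    return boxes


if __name__ == "__main__":
    record = {"ID": "example", "gtboxes": [
        {"tag": "person", "fbox": [10, 20, 30, 40], "extra": {}},
        {"tag": "mask", "fbox": [0, 0, 5, 5]},
        {"tag": "person", "fbox": [1, 1, 2, 2], "extra": {"ignore": 1}},
    ]}
    print(full_body_boxes(record))  # [[10, 20, 40, 60]]
```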
Steps to run:
- Step 1: training. More training and testing settings can be set in `config.py`.
  - `cd tools && python3 train.py -md rcnn_fpn_baseline`
- Step 2: testing. If you have four GPUs, you can pass `-d 0-3` to use all of them. The resulting JSON file is evaluated automatically.
  - `cd tools && python3 test.py -md rcnn_fpn_baseline -r 40`
- Step 3: evaluate a JSON result file, run inference on a single image, and visualize a JSON result file. `-r` is the epoch to resume from, and `-n` is the number of images to visualize. From the `tools` directory:
  - `python3 eval_json.py -f your_json_path.json`
  - `python3 inference.py -md rcnn_fpn_baseline -r 40 -i your_image_path.png`
  - `python3 visulize_json.py -f your_json_path.json -n 3`
We used MegEngine for the research (https://github.com/megvii-model/CrowdDetection); this project is a re-implementation in PyTorch.
We use a pre-trained model from the MegEngine Model Hub and convert it to PyTorch. You can get this model from here.
Model | Top-1 acc (%) | Top-5 acc (%) |
---|---|---|
ResNet50 | 76.254 | 93.056 |
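All models are based on ResNet-50 FPN. Higher AP and JI (Jaccard index) are better, while MR (log-average miss rate) is better when lower.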
Method | AP | MR | JI | Model |
---|---|---|---|---|
RCNN FPN Baseline (convert from MegEngine) | 0.8718 | 0.4239 | 0.7949 | rcnn_fpn_baseline_mge.pth |
RCNN EMD Simple (convert from MegEngine) | 0.9052 | 0.4196 | 0.8209 | rcnn_emd_simple_mge.pth |
RCNN EMD with RM (convert from MegEngine) | 0.9097 | 0.4102 | 0.8271 | rcnn_emd_refine_mge.pth |
RCNN FPN Baseline (trained with PyTorch) | 0.8665 | 0.4243 | 0.7949 | rcnn_fpn_baseline.pth |
RCNN EMD Simple (trained with PyTorch) | 0.8997 | 0.4167 | 0.8225 | rcnn_emd_simple.pth |
RCNN EMD with RM (trained with PyTorch) | 0.9030 | 0.4128 | 0.8263 | rcnn_emd_refine.pth |
RetinaNet FPN Baseline | 0.8188 | 0.5644 | 0.7316 | retina_fpn_baseline.pth |
RetinaNet EMD Simple | 0.8292 | 0.5481 | 0.7393 | retina_emd_simple.pth |
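To help read the MR and JI columns, here is a minimal sketch of both metrics under their common definitions: MR as the Caltech-style log-average miss rate sampled at nine FPPI values spaced evenly in log space over [1e-2, 1e0], and JI as the per-image set IoU |M| / (|D| + |G| - |M|), where M is the set of one-to-one matched detection/ground-truth pairs. The matching step itself (IoU threshold, ignore regions) is omitted, the function names are hypothetical, and the repository's `eval_json.py` may differ in edge cases and in how per-image JI values are aggregated.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float],
                          num_points: int = 9) -> float:
    """Geometric mean of the miss rate sampled at num_points FPPI values
    spaced evenly in log space over [1e-2, 1e0].

    fppi must be ascending, with miss_rate aligned to it (the curve traced
    by lowering the score threshold)."""
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Miss rate at the largest FPPI not exceeding ref; if the curve never
        # gets that low, fall back to 1.0 (the "no detections" end of the curve).
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m
        sampled.append(max(mr, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


def jaccard_index(num_matched: int, num_dets: int, num_gts: int) -> float:
    """Per-image JI: |M| / (|D| + |G| - |M|)."""
    union = num_dets + num_gts - num_matched
    # Both sets empty: treated as a perfect match (an assumption of this sketch).
    return num_matched / union if union else 1.0


if __name__ == "__main__":
    print(jaccard_index(num_matched=2, num_dets=3, num_gts=3))  # 0.5
```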
If you have any questions, please do not hesitate to contact Xuangeng Chu.