This repo hosts the code for implementing UMOP (Unified Multi-level Optimization Paradigm).
In object detection, multi-level prediction (e.g., FPN, YOLO) and resampling strategies (e.g., focal loss, ATSS) have drastically improved one-stage detector performance. However, how to improve performance by optimizing the feature pyramid level by level remains unexplored. We find that, during training, the ratio of positive to negative samples varies across pyramid levels (Level Imbalance), which is not addressed by current one-stage detectors.
To mitigate the influence of level imbalance, we propose a Unified Multi-level Optimization Paradigm (UMOP) consisting of two components: 1) an independent classification loss that supervises each pyramid level with individual resampling considerations; 2) a progressive hard-case mining loss that defines all the losses across the pyramid levels without extra level-wise settings. As a plug-and-play scheme, UMOP brings modern one-stage detectors a ~1.5 AP improvement with fewer training iterations and no additional computational overhead. Our best model achieves 55.1 AP on COCO test-dev.
- 2021.09.22 Bugfix for `num_class` not equal to 80; thanks to CarryHJR.
- 2021.09.17 We released the COCO pretrained models and training logs.
- 2021.09.14 Initial commit, based on MMDetection and Swin-Transformer-Object-Detection.
- The installation is the same as for MMDetection and Swin-Transformer-Object-Detection.
- Please check get_started.md for installation; our recommended environment is `PyTorch==1.7.1`, `torchvision==0.8.2`, and `mmcv==1.3.11`.
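Under these pins, a minimal environment-setup sketch (the CUDA 11.0 wheel index for `mmcv-full` is an assumption; adjust it to your CUDA/PyTorch combination):

```shell
# Assumes CUDA 11.0; pick the mmcv wheel index matching your CUDA and PyTorch versions.
pip install torch==1.7.1 torchvision==0.8.2
pip install mmcv-full==1.3.11 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```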
- Please install `timm` and `apex` for Swin backbones; `timm==0.4.12` is recommended.
- To install `apex`, please run:

```shell
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
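The pinned `timm` version mentioned above installs directly from PyPI:

```shell
pip install timm==0.4.12
```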
- The pretrained `Swin` backbones can be obtained from Swin-Transformer. We also provide a third-party pack (Google Drive, Baidu with access code `io5i`). Please download and unzip it.
- If you download from the official link, please place the models according to the config file (you may need to rename them).
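As a sketch of that placement, assuming the official checkpoint `swin_small_patch4_window7_224.pth` and a `pretrained/` directory (the directory name is an assumption; check the backbone path expected by the config you use):

```shell
# Hypothetical layout; match the path in your config file and rename if needed.
mkdir -p pretrained
mv swin_small_patch4_window7_224.pth pretrained/
```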
For your convenience, we provide the following trained models. These models are trained with a mini-batch size of 16 images on 8 NVIDIA V100 GPUs (2 images per GPU), except that the largest backbone (Swin-L) is trained on 8 NVIDIA P40 GPUs with apex.
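Training follows the standard MMDetection entry points; a sketch with a hypothetical config name (substitute an actual config from this repo):

```shell
# 8-GPU distributed training; the config path is a placeholder, not a file name from this repo.
./tools/dist_train.sh configs/umop/umop_r50_fpn_1.5x_coco.py 8
```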
| Backbone | DCN | MS train | MS test | Lr schd | box AP (val) | box AP (test-dev) | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R-50 | N | N | N | 1.5x | 40.4 | 40.5 | model \| log \| JSON |
| R-101 | N | N | N | 1.5x | 42.1 | 42.3 | model \| log \| JSON |
| R-101 | Y | N | N | 1.5x | 45.2 | 45.4 | model \| log \| JSON |
| R-101 | Y | Y | N | 1.5x | 47.6 | 47.7 | model \| log \| JSON |
| X-101-64x4d | Y | Y | N | 1.5x | 48.8 | 49.1 | model \| log \| JSON |
| R2-101 | Y | Y | N | 2x | 50.0 | 50.3 | model \| log \| JSON |
| Swin-S | N | Y | N | 2x | 49.9 | 50.3 | model \| log \| JSON |
| Swin-S | N | Y | Y | 2x | 51.9 | 52.3 | model \| - \| JSON |
| Swin-B | N | Y | N | 2x | 51.6 | 51.9 | model \| log \| JSON |
| Swin-B | N | Y | Y | 2x | 53.4 | 53.9 | model \| - \| JSON |
| Swin-L | N | Y | N | 2x | 52.8 | 53.1 | model \| log \| JSON |
| Swin-L | N | Y | Y | 2x | 54.7 | 55.1 | model \| - \| JSON |
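Evaluation with the released checkpoints uses the standard MMDetection test script; the config and checkpoint paths below are placeholders:

```shell
# 8-GPU distributed COCO evaluation; substitute real config/checkpoint paths.
./tools/dist_test.sh configs/umop/umop_r50_fpn_1.5x_coco.py umop_r50_fpn_1.5x_coco.pth 8 --eval bbox
```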