Unified Multi-level Optimization Paradigm

This repo hosts the code for implementing UMOP (Unified Multi-level Optimization Paradigm).

Introduction

In object detection, multi-level prediction (e.g., FPN, YOLO) and resampling strategies (e.g., focal loss, ATSS) have drastically improved one-stage detector performance. However, how to improve performance by optimizing the feature pyramid level by level remains unexplored. We find that, during training, the ratio of positive to negative samples varies across pyramid levels (Level Imbalance), which current one-stage detectors do not address.
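To make the notion of level imbalance concrete, the ratio can be measured per pyramid level as sketched below. This is only an illustration and is not taken from this repo: the function name `level_pos_neg_ratio`, the flat `level_ids`/`labels` layout, and the toy data are all assumptions.

```python
from collections import Counter


def level_pos_neg_ratio(level_ids, labels, num_levels):
    """Return the positive/negative sample ratio for each pyramid level.

    Hypothetical layout (an assumption, not the repo's data format):
    level_ids[i] is the 0-based pyramid level of sample i, and
    labels[i] is 1 for a positive (foreground) sample, 0 for a negative one.
    """
    pos = Counter()
    neg = Counter()
    for lvl, lab in zip(level_ids, labels):
        if lab == 1:
            pos[lvl] += 1
        else:
            neg[lvl] += 1
    # A level without negatives is reported as infinity.
    return [
        pos[lvl] / neg[lvl] if neg[lvl] else float("inf")
        for lvl in range(num_levels)
    ]


# Toy data: three levels with 8, 4 and 2 samples, one positive on each level.
toy_level_ids = [0] * 8 + [1] * 4 + [2] * 2
toy_labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
ratios = level_pos_neg_ratio(toy_level_ids, toy_labels, num_levels=3)
```

In the toy data each level holds one positive sample, but the number of negatives differs, so the ratio changes from level to level.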

To mitigate the influence of level imbalance, we propose a Unified Multi-level Optimization Paradigm (UMOP) consisting of two components: 1) an independent classification loss supervising each pyramid level with individual resampling considerations; 2) a progressive hard-case mining loss defining all losses across the pyramid levels without extra level-wise settings. With UMOP as a plug-and-play scheme, modern one-stage detectors can attain a ~1.5 AP improvement with fewer training iterations and no additional computation overhead. Our best model achieves 55.1 AP on COCO test-dev.
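The first component can be pictured roughly as follows: each pyramid level gets its own classification loss with its own resampling parameters, and the per-level losses are summed. This is a minimal sketch under stated assumptions, not the implementation in this repo. It uses the standard binary focal loss as the per-level loss; the per-level `(alpha, gamma)` table, the normalization by each level's own positive count, and all function names are hypothetical. The progressive hard-case mining loss is not shown.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard binary focal loss for one sample.

    p is the predicted foreground probability, y is the label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp p_t to avoid log(0).
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


def per_level_classification_loss(probs, labels, level_ids, level_params):
    """Sum of focal losses computed independently on each pyramid level.

    level_params maps a level index to its own (alpha, gamma) pair.
    Both the table and the per-level normalization are assumptions
    made for illustration only.
    """
    total = 0.0
    for lvl, (alpha, gamma) in level_params.items():
        idx = [i for i, l in enumerate(level_ids) if l == lvl]
        if not idx:
            continue
        level_loss = sum(focal_loss(probs[i], labels[i], alpha, gamma) for i in idx)
        # Normalize by this level's positives only (at least 1).
        num_pos = max(1, sum(labels[i] for i in idx))
        total += level_loss / num_pos
    return total
```

Because each level is normalized separately, a level with few positives is not drowned out by a level that holds most of the samples, which is the intuition the paragraph above describes.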

Updates

Installation

  • Installation is the same as for MMDetection and Swin-Transformer-Object-Detection.
  • Please see get_started.md for installation instructions. We recommend PyTorch=1.7.1, torchvision=0.8.2, and mmcv=1.3.11.
  • Please install timm and apex for the Swin backbones; timm=0.4.12 is recommended.
  • To install apex, please run:
    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    
  • The pretrained Swin backbones can be obtained from Swin-Transformer. We also provide a third-party pack (Google Drive, Baidu with access code io5i). Please download and unzip it.
  • If you download from the official link, please place the models at the paths given in the config files (you may need to rename them).
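After installing, the recommended versions listed above can be compared with what is actually present in the environment. The helper below is a hypothetical sketch and is not part of this repo. The distribution names in `RECOMMENDED` are assumptions; in particular, mmcv may be installed under a different distribution name (for example `mmcv-full`), in which case the entry needs adjusting.

```python
from importlib import metadata

# Recommended versions from the list above. The keys are assumed
# distribution names and may differ in your environment.
RECOMMENDED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "mmcv": "1.3.11",
    "timm": "0.4.12",
}


def check_versions(recommended, get_version=metadata.version):
    """Return {name: (installed_version_or_None, matches_recommendation)}."""
    report = {}
    for name, wanted in recommended.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore a local build tag such as "+cu110" when comparing.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions(RECOMMENDED).items():
        status = "ok" if ok else "differs from recommendation"
        print(f"{name}: installed={found}, recommended={RECOMMENDED[name]} ({status})")
```

A mismatch is only a hint: other version combinations may work, but these are the ones the list above recommends.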

Main Results

For your convenience, we provide the following trained models. These models are trained with a mini-batch size of 16 images on 8 Nvidia V100 GPUs (2 images per GPU), except that the largest backbone (Swin-L) is trained on 8 Nvidia P40 GPUs with apex.

Backbone     DCN  MS train  MS test  Lr schd  box AP (val)  box AP (test-dev)  Download
R-50         N    N         N        1.5x     40.4          40.5               model | log | JSON
R-101        N    N         N        1.5x     42.1          42.3               model | log | JSON
R-101        Y    N         N        1.5x     45.2          45.4               model | log | JSON
R-101        Y    Y         N        1.5x     47.6          47.7               model | log | JSON
X-101-64x4d  Y    Y         N        1.5x     48.8          49.1               model | log | JSON
R2-101       Y    Y         N        2x       50.0          50.3               model | log | JSON
Swin-S       N    Y         N        2x       49.9          50.3               model | log | JSON
Swin-S       N    Y         Y        2x       51.9          52.3               model | - | JSON
Swin-B       N    Y         N        2x       51.6          51.9               model | log | JSON
Swin-B       N    Y         Y        2x       53.4          53.9               model | - | JSON
Swin-L       N    Y         N        2x       52.8          53.1               model | log | JSON
Swin-L       N    Y         Y        2x       54.7          55.1               model | - | JSON