Unsupervised Domain Adaptation for Object Detection

Updates

  • 04/2022: Provide CycleGAN translated datasets.

Installation

Our code is based on the latest version of Detectron2 (v0.6); please install it before use.

The following example is based on PyTorch 1.9.0 with CUDA 11.1. For other versions, please refer to the official websites of PyTorch and Detectron2.

# create environment
conda create -n detection python=3.8.3
# activate environment
conda activate detection
# install pytorch 
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# install detectron2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
# install other requirements
pip install -r requirements.txt
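
After installation, it can help to confirm that the installed versions match the ones used in the example above. The following is a minimal sketch and not part of the repository; it assumes only the distribution names torch, torchvision and detectron2, and the expected versions are the ones from the commands above.

```python
from importlib import metadata

# Versions from the example installation above; adjust if you install others.
EXPECTED = {"torch": "1.9.0", "torchvision": "0.10.0", "detectron2": "0.6"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu111" come after the base version,
        # so a simple prefix comparison is used here.
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "check"
        print(f"{name}: {installed or 'not installed'} ({status})")
```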

Dataset

The following datasets can be downloaded automatically:

  • PASCAL VOC 2007 and 2012
  • Clipart
  • WaterColor
  • Comic

You need to prepare the following datasets manually if you want to use them:

Cityscapes, Foggy Cityscapes

  • Download the Cityscapes and Foggy Cityscapes datasets from the link. In particular, we use leftImg8bit_trainvaltest.zip for Cityscapes and leftImg8bit_trainvaltest_foggy.zip for Foggy Cityscapes.
  • Unzip them so that the directory looks like this:
object_detection/datasets/cityscapes
├── gtFine
├── leftImg8bit
├── leftImg8bit_foggy
└── ...

Then run

python prepare_cityscapes_to_voc.py

This automatically generates both datasets in VOC format:

object_detection/datasets/cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
object_detection/datasets/foggy_cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages

Sim10k

  • Download the Sim10k dataset from the following link: Sim10k. In particular, we use repro_10k_images.tgz, repro_image_sets.tgz and repro_10k_annotations.tgz.
  • Extract the training set from repro_10k_images.tgz, repro_image_sets.tgz and repro_10k_annotations.tgz, then rename the directory VOC2012/ to sim10k/.
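
The extract-and-rename step can be sketched in Python as follows. This is an illustration, not a script from the repository. It assumes that each archive unpacks into a top-level VOC2012/ directory (as the rename step above implies) and that the archives come from a trusted source. The scratch directory name _sim10k_tmp is hypothetical; it is used so that an existing datasets/VOC2012 is not touched.

```python
import tarfile
from pathlib import Path


def extract_sim10k(archives, datasets_dir):
    """Unpack the Sim10k archives and rename the resulting VOC2012/ to sim10k/."""
    datasets_dir = Path(datasets_dir)
    staging = datasets_dir / "_sim10k_tmp"  # hypothetical scratch directory
    staging.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(staging)  # assumes trusted archives
    target = datasets_dir / "sim10k"
    (staging / "VOC2012").rename(target)
    try:
        staging.rmdir()  # only succeeds if nothing else was unpacked
    except OSError:
        pass
    return target
```

A call might look like `extract_sim10k(["repro_10k_images.tgz", "repro_image_sets.tgz", "repro_10k_annotations.tgz"], "datasets")`, run from the directory that contains the downloaded archives.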

After preparation, the following directories should exist:

object_detection/datasets/
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── VOC2012
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── clipart
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── watercolor
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── comic
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── foggy_cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
└── sim10k
    ├── Annotations
    ├── ImageSets
    └── JPEGImages
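
A quick way to confirm the layout above before training is to check each dataset for the three VOC sub-directories. This is a minimal sketch rather than a tool shipped with the code; the function name is hypothetical.

```python
from pathlib import Path

VOC_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")


def missing_voc_dirs(root, names):
    """Return 'dataset/subdir' entries that are absent under root."""
    root = Path(root)
    return [
        f"{name}/{sub}"
        for name in names
        for sub in VOC_SUBDIRS
        if not (root / name / sub).is_dir()
    ]
```

For example, `missing_voc_dirs("datasets", ["VOC2007", "VOC2012", "clipart"])` returns an empty list when all three datasets are laid out as shown.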

Note: The above is a tutorial for the standard datasets. To use your own datasets, you need to convert them into the corresponding format.
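
As a rough illustration of such a conversion, the sketch below builds one annotation file using the standard PASCAL VOC XML fields (filename, size, object, name, difficult, bndbox). Which of these fields the data loaders in this repository actually read is an assumption, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, boxes):
    """Build a VOC-style annotation; boxes are (label, xmin, ymin, xmax, ymax)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    # depth=3 assumes RGB images
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.ElementTree(root)
```

The returned tree can be saved with `tree.write(path)`, one XML file per image under the dataset's Annotations directory.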

CycleGAN translated dataset

The following commands use CycleGAN to translate VOC (in directories datasets/VOC2007 and datasets/VOC2012) to Clipart (written to directories datasets/VOC2007_to_clipart and datasets/VOC2012_to_clipart).

mkdir datasets/VOC2007_to_clipart
cp -r datasets/VOC2007/* datasets/VOC2007_to_clipart
mkdir datasets/VOC2012_to_clipart
cp -r datasets/VOC2012/* datasets/VOC2012_to_clipart

CUDA_VISIBLE_DEVICES=0 python cycle_gan.py \
  -s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 -t Clipart datasets/clipart \
  --translated-source datasets/VOC2007_to_clipart datasets/VOC2012_to_clipart \
  --log logs/cyclegan_resnet9/translation/voc2clipart --netG resnet_9

You can also download and use the datasets that we have already translated.

  • PASCAL_VOC to Clipart [07]+[12] (with directory datasets/VOC2007_to_clipart and datasets/VOC2012_to_clipart)
  • PASCAL_VOC to Comic [07]+[12] (with directory datasets/VOC2007_to_comic and datasets/VOC2012_to_comic)
  • PASCAL_VOC to WaterColor [07]+[12] (with directory datasets/VOC2007_to_watercolor and datasets/VOC2012_to_watercolor)
  • Cityscapes to Foggy Cityscapes [Part1] [Part2] [Part3] [Part4] (with directory datasets/cityscapes_to_foggy_cityscapes). Note that you need to use cat to merge the downloaded files.
  • Sim10k to Cityscapes (Car) [Download] (with directory datasets/sim10k2cityscapes_car).
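
For the multi-part Cityscapes to Foggy Cityscapes download, merging with cat amounts to concatenating the parts in order. Where cat is not available, the same can be sketched in Python; the function name is hypothetical and the actual part file names depend on the download.

```python
import shutil
from pathlib import Path


def merge_parts(parts, output):
    """Concatenate split archive parts, in the given order, into one file."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)
```

The parts must be passed in their original order (Part1 to Part4); the merged file can then be unpacked as a normal archive.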

Supported Methods

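Supported methods include:

  • CycleGAN (Cycle-Consistent Adversarial Networks)
  • D-adapt (Decoupled Adaptation for Cross-Domain Object Detection)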

Experiment and Results

The provided shell scripts reproduce the benchmarks with the specified hyper-parameters. The basic training pipeline is as follows.

The following command trains a Faster-RCNN detector on the VOC->Clipart task, using only source (VOC) data.

CUDA_VISIBLE_DEVICES=0 python source_only.py \
  --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  -s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 -t Clipart datasets/clipart \
  --test VOC2007Test datasets/VOC2007 Clipart datasets/clipart --finetune \
  OUTPUT_DIR logs/source_only/faster_rcnn_R_101_C4/voc2clipart

Explanation of some arguments

  • --config-file: path to config file that specifies training hyper-parameters.
  • -s: a list that specifies the source datasets. For each dataset, pass a (name, path) pair; the command above uses two source datasets, VOC2007 and VOC2012.
  • -t: a list that specifies the target datasets, in the same format as above.
  • --test: a list that specifies the test datasets, in the same format as above.
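
To make the (name, path) convention concrete, here is a minimal argparse sketch that accepts the same flat list and groups it into pairs. It is not the parser used by source_only.py; the long option names --sources and --targets and the helper pairs are assumptions made for illustration.

```python
import argparse


def pairs(values):
    """Group a flat [name1, path1, name2, path2, ...] list into (name, path) tuples."""
    if len(values) % 2:
        raise ValueError("expected NAME PATH [NAME PATH ...]")
    return list(zip(values[0::2], values[1::2]))


parser = argparse.ArgumentParser()
parser.add_argument("-s", "--sources", nargs="+", default=[])  # hypothetical long name
parser.add_argument("-t", "--targets", nargs="+", default=[])  # hypothetical long name
parser.add_argument("--test", nargs="+", default=[])

args = parser.parse_args(
    "-s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 "
    "-t Clipart datasets/clipart".split()
)
print(pairs(args.sources))
# [('VOC2007', 'datasets/VOC2007'), ('VOC2012', 'datasets/VOC2012')]
```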

VOC->Clipart

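| Detector | Method | AP | AP50 | AP75 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (ResNet101) | Source | 14.9 | 29.3 | 12.6 | 29.6 | 38.0 | 24.7 | 21.7 | 31.9 | 48.0 | 30.8 | 15.9 | 32.0 | 19.2 | 18.2 | 12.1 | 28.2 | 48.8 | 38.3 | 34.6 | 3.8 | 22.5 | 43.7 | 44.0 |
| | CycleGAN | 20.0 | 37.7 | 18.3 | 37.1 | 41.9 | 29.9 | 26.5 | 40.9 | 65.1 | 37.8 | 23.8 | 40.7 | 48.9 | 12.7 | 14.4 | 27.8 | 63.0 | 55.1 | 40.1 | 8.0 | 30.7 | 54.1 | 55.7 |
| | D-adapt | 24.8 | 49.0 | 21.5 | 56.4 | 63.2 | 42.3 | 40.9 | 45.3 | 77.0 | 48.7 | 25.4 | 44.3 | 58.4 | 31.4 | 24.5 | 47.1 | 75.3 | 69.3 | 43.5 | 27.9 | 34.1 | 60.7 | 64.0 |
| RetinaNet | Source | 18.3 | 32.2 | 17.6 | 34.2 | 42.4 | 27.0 | 21.6 | 36.8 | 48.4 | 35.9 | 16.4 | 38.9 | 22.6 | 27.0 | 15.1 | 27.1 | 46.7 | 42.1 | 36.2 | 8.3 | 29.5 | 42.1 | 46.2 |
| | D-adapt | 25.1 | 46.3 | 23.9 | 47.4 | 65.0 | 33.1 | 37.5 | 56.8 | 61.2 | 55.1 | 27.3 | 45.5 | 51.8 | 29.1 | 29.6 | 38.0 | 74.5 | 66.7 | 46.0 | 24.2 | 29.3 | 54.2 | 53.8 |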
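
The per-class columns appear to be per-class AP50 values. This is an inference and is not stated explicitly: averaging the 20 class values of a row reproduces that row's AP50 entry. The short check below uses the Faster RCNN (ResNet101) Source and D-adapt rows of the VOC->Clipart table.

```python
from statistics import mean

# Per-class values copied from the VOC->Clipart table (Faster RCNN, ResNet101).
source = [29.6, 38.0, 24.7, 21.7, 31.9, 48.0, 30.8, 15.9, 32.0, 19.2,
          18.2, 12.1, 28.2, 48.8, 38.3, 34.6, 3.8, 22.5, 43.7, 44.0]
d_adapt = [56.4, 63.2, 42.3, 40.9, 45.3, 77.0, 48.7, 25.4, 44.3, 58.4,
           31.4, 24.5, 47.1, 75.3, 69.3, 43.5, 27.9, 34.1, 60.7, 64.0]

print(round(mean(source), 1))   # 29.3, matching the AP50 column for Source
print(round(mean(d_adapt), 1))  # 49.0, matching the AP50 column for D-adapt
```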

VOC->WaterColor

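| Method | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (ResNet101) | 23.0 | 45.9 | 18.5 | 71.1 | 48.3 | 48.6 | 23.7 | 23.3 | 60.3 |
| CycleGAN | 24.9 | 50.8 | 22.4 | 75.8 | 52.1 | 49.8 | 30.1 | 33.4 | 63.6 |
| D-adapt | 28.5 | 57.5 | 23.6 | 77.4 | 54.0 | 52.8 | 43.9 | 48.1 | 68.9 |
| Target | 23.8 | 51.3 | 17.4 | 48.5 | 54.7 | 41.3 | 36.2 | 52.6 | 74.6 |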

VOC->Comic

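| Method | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (ResNet101) | 13.0 | 25.5 | 11.4 | 33.0 | 15.8 | 28.9 | 16.8 | 19.6 | 39.0 |
| CycleGAN | 16.9 | 34.6 | 14.2 | 28.1 | 25.7 | 37.7 | 28.0 | 33.8 | 54.1 |
| D-adapt | 20.8 | 41.1 | 18.5 | 49.4 | 25.7 | 43.3 | 36.9 | 32.7 | 58.5 |
| Target | 21.9 | 44.6 | 16.0 | 40.7 | 32.3 | 38.3 | 43.9 | 41.3 | 71.0 |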

Cityscapes->Foggy Cityscapes

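| Detector | Method | AP | AP50 | AP75 | bicycle | bus | car | motorcycle | person | rider | train | truck |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster RCNN (VGG16) | Source | 14.3 | 25.9 | 13.2 | 33.6 | 27.0 | 40.0 | 22.3 | 31.3 | 38.5 | 2.3 | 12.2 |
| | CycleGAN | 22.5 | 41.6 | 20.7 | 46.5 | 41.5 | 62.0 | 33.8 | 45.0 | 54.5 | 21.7 | 27.7 |
| | D-adapt | 19.4 | 38.1 | 17.5 | 42.0 | 36.8 | 58.1 | 32.2 | 43.1 | 51.8 | 14.6 | 26.3 |
| | Target | 24.0 | 45.3 | 21.3 | 45.9 | 47.4 | 67.3 | 39.7 | 49.0 | 53.2 | 30.0 | 29.6 |
| Faster RCNN (ResNet101) | Source | 18.8 | 33.3 | 19.0 | 36.1 | 34.5 | 43.8 | 24.0 | 36.3 | 39.9 | 29.1 | 22.8 |
| | CycleGAN | 22.9 | 41.8 | 21.9 | 42.0 | 44.5 | 57.6 | 36.3 | 40.9 | 48.0 | 30.8 | 34.3 |
| | D-adapt | 22.7 | 42.4 | 21.6 | 41.8 | 44.4 | 56.6 | 31.4 | 41.8 | 48.6 | 42.3 | 32.4 |
| | Target | 25.5 | 45.3 | 24.3 | 41.9 | 53.2 | 63.4 | 36.1 | 42.6 | 47.9 | 42.4 | 35.3 |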

Sim10k->Cityscapes Car

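| Detector | Method | AP | AP50 | AP75 |
|---|---|---|---|---|
| Faster RCNN (VGG16) | Source | 24.8 | 43.4 | 23.6 |
| | CycleGAN | 29.3 | 51.9 | 28.6 |
| | D-adapt | 23.6 | 48.5 | 18.7 |
| | Target | 24.8 | 43.4 | 23.6 |
| Faster RCNN (ResNet101) | Source | 24.6 | 44.4 | 23.0 |
| | CycleGAN | 26.5 | 47.4 | 24.0 |
| | D-adapt | 27.4 | 51.9 | 25.7 |
| | Target | 24.6 | 44.4 | 23.0 |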

Visualization

We provide visualization code in visualize.py. For example, suppose you have trained the source-only model for the VOC->Clipart task using the provided scripts. The following command visualizes the detector's predictions on Clipart.

CUDA_VISIBLE_DEVICES=0 python visualize.py --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  --test Clipart datasets/clipart --save-path visualizations/source_only/voc2clipart \
  MODEL.WEIGHTS logs/source_only/faster_rcnn_R_101_C4/voc2clipart/model_final.pth

Explanation of some arguments

  • --test: a list that specifies the test datasets for visualization.
  • --save-path: where to save visualization results.
  • MODEL.WEIGHTS: path to the model.

TODO

Methods to be supported: SWDA, Global/Local Alignment

Citation

If you use these methods in your research, please consider citing:

@inproceedings{jiang2021decoupled,
  title     = {Decoupled Adaptation for Cross-Domain Object Detection},
  author    = {Junguang Jiang and Baixu Chen and Jianmin Wang and Mingsheng Long},
  booktitle = {ICLR},
  year      = {2022}
}

@inproceedings{CycleGAN,
  title     = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author    = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle = {ICCV},
  year      = {2017}
}