diff --git a/.gitignore b/.gitignore
index 892731d7..c9a653c6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -116,9 +116,18 @@ data
*.log.json
docs/modelzoo_statistics.md
mmdet/.mim
+
work_dirs/
+work_dirs
+tmp/
+bstool/
+wwtool/
+
# Pytorch
*.pth
*.py~
*.sh~
+
+# local history
+.history
\ No newline at end of file
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
deleted file mode 100644
index 6ea250c8..00000000
--- a/.pre-commit-config.yaml
+++ /dev/null
@@ -1,50 +0,0 @@
-repos:
- - repo: https://github.com/PyCQA/flake8
- rev: 5.0.4
- hooks:
- - id: flake8
- - repo: https://github.com/PyCQA/isort
- rev: 5.11.5
- hooks:
- - id: isort
- - repo: https://github.com/pre-commit/mirrors-yapf
- rev: v0.32.0
- hooks:
- - id: yapf
- - repo: https://github.com/pre-commit/pre-commit-hooks
- rev: v4.3.0
- hooks:
- - id: trailing-whitespace
- - id: check-yaml
- - id: end-of-file-fixer
- - id: requirements-txt-fixer
- - id: double-quote-string-fixer
- - id: check-merge-conflict
- - id: fix-encoding-pragma
- args: ["--remove"]
- - id: mixed-line-ending
- args: ["--fix=lf"]
- - repo: https://github.com/codespell-project/codespell
- rev: v2.2.1
- hooks:
- - id: codespell
- - repo: https://github.com/executablebooks/mdformat
- rev: 0.7.9
- hooks:
- - id: mdformat
- args: ["--number"]
- additional_dependencies:
- - mdformat-openmmlab
- - mdformat_frontmatter
- - linkify-it-py
- - repo: https://github.com/myint/docformatter
- rev: v1.3.1
- hooks:
- - id: docformatter
- args: ["--in-place", "--wrap-descriptions", "79"]
- - repo: https://github.com/open-mmlab/pre-commit-hooks
- rev: v0.2.0 # Use the ref you want to point at
- hooks:
- - id: check-algo-readme
- - id: check-copyright
- args: ["mmdet"] # replace the dir_to_check with your expected directory to check
diff --git a/.readthedocs.yml b/.readthedocs.yml
deleted file mode 100644
index 6cfbf5d3..00000000
--- a/.readthedocs.yml
+++ /dev/null
@@ -1,9 +0,0 @@
-version: 2
-
-formats: all
-
-python:
- version: 3.7
- install:
- - requirements: requirements/docs.txt
- - requirements: requirements/readthedocs.txt
diff --git a/README.md b/README.md
index e0cd4a15..6b53c08b 100644
--- a/README.md
+++ b/README.md
@@ -1,395 +1,92 @@
-
-
-
-
-
-[![PyPI](https://img.shields.io/pypi/v/mmdet)](https://pypi.org/project/mmdet)
-[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmdetection.readthedocs.io/en/latest/)
-[![badge](https://github.com/open-mmlab/mmdetection/workflows/build/badge.svg)](https://github.com/open-mmlab/mmdetection/actions)
-[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection)
-[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/master/LICENSE)
-[![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues)
-[![issue resolution](https://isitmaintained.com/badge/resolution/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues)
-
-[📘Documentation](https://mmdetection.readthedocs.io/en/stable/) |
-[🛠️Installation](https://mmdetection.readthedocs.io/en/stable/get_started.html) |
-[👀Model Zoo](https://mmdetection.readthedocs.io/en/stable/model_zoo.html) |
-[🆕Update News](https://mmdetection.readthedocs.io/en/stable/changelog.html) |
-[🚀Ongoing Projects](https://github.com/open-mmlab/mmdetection/projects) |
-[🤔Reporting Issues](https://github.com/open-mmlab/mmdetection/issues/new/choose)
-
-
-
-
-
-English | [简体中文](README_zh-CN.md)
-
-
-
-## Introduction
-
-MMDetection is an open source object detection toolbox based on PyTorch. It is
-a part of the [OpenMMLab](https://openmmlab.com/) project.
-
-The master branch works with **PyTorch 1.5+**.
-
-
-
-
-Major features
-
-- **Modular Design**
-
- We decompose the detection framework into different components and one can easily construct a customized object detection framework by combining different modules.
-
-- **Support of multiple frameworks out of box**
-
- The toolbox directly supports popular and contemporary detection frameworks, *e.g.* Faster RCNN, Mask RCNN, RetinaNet, etc.
-
-- **High efficiency**
-
- All basic bbox and mask operations run on GPUs. The training speed is faster than or comparable to other codebases, including [Detectron2](https://github.com/facebookresearch/detectron2), [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) and [SimpleDet](https://github.com/TuSimple/simpledet).
-
-- **State of the art**
-
- The toolbox stems from the codebase developed by the *MMDet* team, who won [COCO Detection Challenge](http://cocodataset.org/#detection-leaderboard) in 2018, and we keep pushing it forward.
-
-
-
-Apart from MMDetection, we also released a library [mmcv](https://github.com/open-mmlab/mmcv) for computer vision research, which is heavily depended on by this toolbox.
-
-## What's New
-
-### 💎 Stable version
-
-**2.28.1** was released in 1/2/2023:
-
-- Support Objects365 Dataset, and Separated and Occluded COCO metric
-- Support acceleration of RetinaNet and SSD on Ascend
-- Deprecate the support of Python 3.6 and fix some bugs of 2.28.0
-
-Please refer to [changelog.md](docs/en/changelog.md) for details and release history.
-
-For compatibility changes between different versions of MMDetection, please refer to [compatibility.md](docs/en/compatibility.md).
-
-### 🌟 Preview of 3.x version
-
-#### Highlight
-
-We are excited to announce our latest work on real-time object recognition tasks, **RTMDet**, a family of fully convolutional single-stage detectors. RTMDet not only achieves the best parameter-accuracy trade-off on object detection from tiny to extra-large model sizes but also obtains new state-of-the-art performance on instance segmentation and rotated object detection tasks. Details can be found in the [technical report](https://arxiv.org/abs/2212.07784). Pre-trained models are [here](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet).
-
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/real-time-instance-segmentation-on-mscoco)](https://paperswithcode.com/sota/real-time-instance-segmentation-on-mscoco?p=rtmdet-an-empirical-study-of-designing-real)
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-dota-1)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-dota-1?p=rtmdet-an-empirical-study-of-designing-real)
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-hrsc2016)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-hrsc2016?p=rtmdet-an-empirical-study-of-designing-real)
-
-| Task | Dataset | AP | FPS(TRT FP16 BS1 3090) |
-| ------------------------ | ------- | ------------------------------------ | ---------------------- |
-| Object Detection | COCO | 52.8 | 322 |
-| Instance Segmentation | COCO | 44.6 | 188 |
-| Rotated Object Detection | DOTA | 78.9(single-scale)/81.3(multi-scale) | 121 |
-
-
-
+
+# 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
+
+
+
-A brand new version of **MMDetection v3.0.0rc5** was released in 26/12/2022:
+
-- Support [RTMDet](https://arxiv.org/abs/2212.07784) instance segmentation models. The technical report of RTMDet is on [arxiv](https://arxiv.org/abs/2212.07784)
-- Support SSHContextModule in paper [SSH: Single Stage Headless Face Detector](https://arxiv.org/abs/1708.03979)
+## 📜 Introduction
-Find more new features in [3.x branch](https://github.com/open-mmlab/mmdetection/tree/3.x). Issues and PRs are welcome!
+This repository contains the official code for MLS-BRN (CVPR 2024), our multi-level supervised building reconstruction network, which can flexibly utilize training samples with different annotation levels.
-## Installation
+- We design MLS-BRN, a multi-level supervised building reconstruction network, which introduces new tasks and modules that strengthen the relations between the different components of a building instance and reduce the demand for 3D annotations.
+- We propose a multi-level training strategy that enables training MLS-BRN under different supervision levels, further improving its 3D reconstruction performance.
+- We extend the monocular building reconstruction datasets to more cities. Comprehensive experiments under different settings demonstrate the potential of MLS-BRN in large-scale cross-city scenarios.
-Please refer to [Installation](docs/en/get_started.md/#Installation) for installation instructions.
+Please check out our [paper](https://openaccess.thecvf.com/content/CVPR2024/html/Li_3D_Building_Reconstruction_from_Monocular_Remote_Sensing_Images_with_Multi-level_CVPR_2024_paper.html) for further details.
-## Getting Started
+## 🔧 Installation
-Please see [get_started.md](docs/en/get_started.md) for the basic usage of MMDetection. We provide [colab tutorial](demo/MMDet_Tutorial.ipynb) and [instance segmentation colab tutorial](demo/MMDet_InstanceSeg_Tutorial.ipynb), and other tutorials for:
+We inherit the environment of [BONAI](https://github.com/jwwangchn/BONAI/tree/master); the following steps can be used as a reference for deploying it:
-- [with existing dataset](docs/en/1_exist_data_model.md)
-- [with new dataset](docs/en/2_new_data_model.md)
-- [with existing dataset_new_model](docs/en/3_exist_data_new_model.md)
-- [learn about configs](docs/en/tutorials/config.md)
-- [customize_datasets](docs/en/tutorials/customize_dataset.md)
-- [customize data pipelines](docs/en/tutorials/data_pipeline.md)
-- [customize_models](docs/en/tutorials/customize_models.md)
-- [customize runtime settings](docs/en/tutorials/customize_runtime.md)
-- [customize_losses](docs/en/tutorials/customize_losses.md)
-- [finetuning models](docs/en/tutorials/finetune.md)
-- [export a model to ONNX](docs/en/tutorials/pytorch2onnx.md)
-- [export ONNX to TRT](docs/en/tutorials/onnx2tensorrt.md)
-- [weight initialization](docs/en/tutorials/init_cfg.md)
-- [how to xxx](docs/en/tutorials/how_to.md)
+```bash
+# create & activate environment
+conda create -n mlsbrn python=3.8
+conda activate mlsbrn
-## Overview of Benchmark and Model Zoo
+# install PyTorch 1.11.0 (CUDA 11.3 build)
+pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
-Results and models are available in the [model zoo](docs/en/model_zoo.md).
+# install dependency packages
+pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
+cd MLS-BRN/
+pip install -v -e .
+pip install yapf==0.40.1
-
- Architectures
-
-
-
-
-
- Object Detection
-
-
- Instance Segmentation
-
-
- Panoptic Segmentation
-
-
- Other
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Contrastive Learning
-
-
- Distillation
-
-
- Receptive Field Search
-
-
-
-
-
-
-
-
+# install the wwtool package for the evaluation code
+git clone https://github.com/jwwangchn/wwtool.git
+cd wwtool
+python setup.py develop
+# install the bstool package for the evaluation code
+git clone https://github.com/Hoteryoung/bstool.git
+cd bstool
+git pull origin modify_for_loft-foa-fro
+git checkout modify_for_loft-foa-fro
+python setup.py develop
+```
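+
+As a quick sanity check (a minimal sketch that only assumes the versions pinned above), you can verify the environment from Python:
+
+```python
+# Environment sanity check for the MLS-BRN setup described above.
+import torch
+import mmcv
+import mmdet
+
+print(torch.__version__)          # expected: 1.11.0+cu113
+print(mmcv.__version__)           # expected: 1.7.0
+print(mmdet.__version__)          # installed in editable mode via `pip install -v -e .`
+print(torch.cuda.is_available())  # should be True for GPU training/testing
+```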
-
- Components
-
-
-
-
-
- Backbones
-
-
- Necks
-
-
- Loss
-
-
- Common
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+## ⬇️ Data Preparation
-Some other methods are also supported in [projects using MMDetection](./docs/en/projects.md).
+Please download [BONAI](https://github.com/jwwangchn/BONAI/tree/master) and our proposed [dataset](https://opendatalab.com/OpenDataLab/MLS-BRN), then put the datasets into one directory and set that directory as the `data_root` variable in `configs/_base_/datasets/bonai_instance_hfm_ssl.py`.
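+
+For reference, here is a sketch of the expected layout, inferred from the annotation and image paths assembled in the dataset configs (adjust it to the archives you actually download):
+
+```
+data/                       # <- point `data_root` here
+├── BONAI/
+│   ├── coco/               # COCO-style annotations for the fully supervised setting
+│   │   ├── bonai_shanghai_trainval.json
+│   │   ├── bonai_beijing_trainval.json
+│   │   └── ...
+│   └── trainval/
+│       └── images/
+├── OmniCityView3WithOffset/
+└── hongkong/
+```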
-## FAQ
+## 🔥 Train & Test
-Please refer to [FAQ](docs/en/faq.md) for frequently asked questions.
+The config files are defined in `configs/_base_/models/bonai_loft_foahfm_r50_fpn_basic.py` and `configs/_base_/schedules/schedule_2x_bonai.py`. We provide shell scripts for training and testing in `tools/`.
-## Contributing
+To train or test the model in different environments, modify the given shell script and config files accordingly.
-We appreciate all contributions to improve MMDetection. Ongoing projects can be found in out [GitHub Projects](https://github.com/open-mmlab/mmdetection/projects). Welcome community users to participate in these projects. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline.
+Note: when testing, you need to specify the target dataset via the `CITY` variable in `tools/dist_test.sh`.
-## Acknowledgement
+```bash
+cd MLS-BRN/
+# for non-slurm system
+# train
+./tools/dist_train.sh loft_foahfm_ssl loft_foahfm_r50_fpn_2x_bonai_ssl
+# resume training from a checkpoint
+./tools/dist_train.sh loft_foahfm loft_foahfm_r50_fpn_2x_bonai --resume-from='path to checkpoint'
+# test & evaluate (refers to the timestamped training-results folder in ./work_dirs/)
+./tools/dist_test.sh loft_foahfm_r50_fpn_2x_bonai_ssl
-MMDetection is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedbacks.
-We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors.
+# for slurm system
+# train
+./tools/slurm_train.sh loft_foahfm loft_foahfm_r50_fpn_2x_bonai
+# test & evaluate
+./tools/slurm_test.sh loft_foahfm_r50_fpn_2x_bonai_ssl
+```
-## Citation
+## 😊 Citation
-If you use this toolbox or benchmark in your research, please cite this project.
+If you use our dataset, codebase or models in your research, please consider citing:
```
-@article{mmdetection,
- title = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark},
- author = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and
- Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and
- Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and
- Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and
- Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong
- and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua},
- journal= {arXiv preprint arXiv:1906.07155},
- year={2019}
+@InProceedings{Li_2024_CVPR,
+ author = {Li, Weijia and Yang, Haote and Hu, Zhenghao and Zheng, Juepeng and Xia, Gui-Song and He, Conghui},
+ title = {3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions},
+ booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+ month = {June},
+ year = {2024},
+ pages = {27728-27737}
}
```
-
-## License
-
-This project is released under the [Apache 2.0 license](LICENSE).
-
-## Projects in OpenMMLab
-
-- [MMEngine](https://github.com/open-mmlab/mmengine): OpenMMLab foundational library for training deep learning models.
-- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision.
-- [MMEval](https://github.com/open-mmlab/mmeval): A unified evaluation library for multiple machine learning libraries.
-- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages.
-- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark.
-- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark.
-- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection.
-- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark.
-- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark.
-- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox.
-- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark.
-- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark.
-- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark.
-- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark.
-- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark.
-- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark.
-- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark.
-- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark.
-- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox.
-- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox.
-- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework.
diff --git a/README_assets/benchmark-explanation.png b/README_assets/benchmark-explanation.png
new file mode 100644
index 00000000..81fb4d54
Binary files /dev/null and b/README_assets/benchmark-explanation.png differ
diff --git a/README_assets/loft-foa-eval-bbox.png b/README_assets/loft-foa-eval-bbox.png
new file mode 100644
index 00000000..31841761
Binary files /dev/null and b/README_assets/loft-foa-eval-bbox.png differ
diff --git a/README_assets/loft-foa-eval-segm.png b/README_assets/loft-foa-eval-segm.png
new file mode 100644
index 00000000..9f38e639
Binary files /dev/null and b/README_assets/loft-foa-eval-segm.png differ
diff --git a/README_zh-CN.md b/README_zh-CN.md
deleted file mode 100644
index a7e0c844..00000000
--- a/README_zh-CN.md
+++ /dev/null
@@ -1,413 +0,0 @@
-
-
-
-
-
-
-[![PyPI](https://img.shields.io/pypi/v/mmdet)](https://pypi.org/project/mmdet)
-[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmdetection.readthedocs.io/en/latest/)
-[![badge](https://github.com/open-mmlab/mmdetection/workflows/build/badge.svg)](https://github.com/open-mmlab/mmdetection/actions)
-[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection)
-[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/master/LICENSE)
-[![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues)
-[![issue resolution](https://isitmaintained.com/badge/resolution/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues)
-
-[📘使用文档](https://mmdetection.readthedocs.io/zh_CN/stable/) |
-[🛠️安装教程](https://mmdetection.readthedocs.io/zh_CN/stable/get_started.html) |
-[👀模型库](https://mmdetection.readthedocs.io/zh_CN/stable/model_zoo.html) |
-[🆕更新日志](https://mmdetection.readthedocs.io/en/stable/changelog.html) |
-[🚀进行中的项目](https://github.com/open-mmlab/mmdetection/projects) |
-[🤔报告问题](https://github.com/open-mmlab/mmdetection/issues/new/choose)
-
-
-
-
-
-[English](README.md) | 简体中文
-
-
-
-## 简介
-
-MMDetection 是一个基于 PyTorch 的目标检测开源工具箱。它是 [OpenMMLab](https://openmmlab.com/) 项目的一部分。
-
-主分支代码目前支持 PyTorch 1.5 以上的版本。
-
-
-
-
-主要特性
-
-- **模块化设计**
-
- MMDetection 将检测框架解耦成不同的模块组件,通过组合不同的模块组件,用户可以便捷地构建自定义的检测模型
-
-- **丰富的即插即用的算法和模型**
-
- MMDetection 支持了众多主流的和最新的检测算法,例如 Faster R-CNN,Mask R-CNN,RetinaNet 等。
-
-- **速度快**
-
- 基本的框和 mask 操作都实现了 GPU 版本,训练速度比其他代码库更快或者相当,包括 [Detectron2](https://github.com/facebookresearch/detectron2), [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) 和 [SimpleDet](https://github.com/TuSimple/simpledet)。
-
-- **性能高**
-
- MMDetection 这个算法库源自于 COCO 2018 目标检测竞赛的冠军团队 *MMDet* 团队开发的代码,我们在之后持续进行了改进和提升。
-
-
-
-除了 MMDetection 之外,我们还开源了计算机视觉基础库 [MMCV](https://github.com/open-mmlab/mmcv),MMCV 是 MMDetection 的主要依赖。
-
-## 最新进展
-
-### 💎 稳定版本
-
-最新的 **2.28.1** 版本已经在 2023.2.1 发布:
-
-- 支持 Object365 数据集和遮挡物检测的 benchmark
-- 支持 SSD 和 RetinaNet 算法在昇腾芯片上的加速
-- 不再保证对 Python 3.6 的支持并修复了 2.28.0 的一些 bug
-
-如果想了解更多版本更新细节和历史信息,请阅读[更新日志](docs/en/changelog.md)。
-
-如果想了解 MMDetection 不同版本之间的兼容性, 请参考[兼容性说明文档](docs/zh_cn/compatibility.md)。
-
-### 🌟 3.x 预览版本
-
-#### 亮点
-
-我们很高兴向大家介绍我们在实时目标识别任务方面的最新成果 RTMDet,包含了一系列的全卷积单阶段检测模型。 RTMDet 不仅在从 tiny 到 extra-large 尺寸的目标检测模型上实现了最佳的参数量和精度的平衡,而且在实时实例分割和旋转目标检测任务上取得了最先进的成果。 更多细节请参阅[技术报告](https://arxiv.org/abs/2212.07784)。 预训练模型可以在[这里](https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet)找到。
-
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/real-time-instance-segmentation-on-mscoco)](https://paperswithcode.com/sota/real-time-instance-segmentation-on-mscoco?p=rtmdet-an-empirical-study-of-designing-real)
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-dota-1)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-dota-1?p=rtmdet-an-empirical-study-of-designing-real)
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-hrsc2016)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-hrsc2016?p=rtmdet-an-empirical-study-of-designing-real)
-
-| Task | Dataset | AP | FPS(TRT FP16 BS1 3090) |
-| ------------------------ | ------- | ------------------------------------ | ---------------------- |
-| Object Detection | COCO | 52.8 | 322 |
-| Instance Segmentation | COCO | 44.6 | 188 |
-| Rotated Object Detection | DOTA | 78.9(single-scale)/81.3(multi-scale) | 121 |
-
-
-
-
-
-全新的 **v3.0.0rc5** 版本已经在 2022.12.26 发布:
-
-- 支持了 [RTMDet](https://arxiv.org/abs/2212.07784) 的实例分割模型。RTMDet 的技术报告发布在了 [arxiv](https://arxiv.org/abs/2212.07784) 上。
-- 支持了 [SSH: Single Stage Headless Face Detector](https://arxiv.org/abs/1708.03979) 论文中的 SSHContextModule
-
-## 安装
-
-请参考[安装指令](docs/zh_cn/get_started.md/#Installation)进行安装。
-
-## 教程
-
-请参考[快速入门文档](docs/zh_cn/get_started.md)学习 MMDetection 的基本使用。
-我们提供了 [检测的 colab 教程](demo/MMDet_Tutorial.ipynb) 和 [实例分割的 colab 教程](demo/MMDet_InstanceSeg_Tutorial.ipynb),也为新手提供了完整的运行教程,其他教程如下
-
-- [使用已有模型在标准数据集上进行推理](docs/zh_cn/1_exist_data_model.md)
-- [在自定义数据集上进行训练](docs/zh_cn/2_new_data_model.md)
-- [在标准数据集上训练自定义模型](docs/zh_cn/3_exist_data_new_model.md)
-- [学习配置文件](docs/zh_cn/tutorials/config.md)
-- [自定义数据集](docs/zh_cn/tutorials/customize_dataset.md)
-- [自定义数据预处理流程](docs/zh_cn/tutorials/data_pipeline.md)
-- [自定义模型](docs/zh_cn/tutorials/customize_models.md)
-- [自定义训练配置](docs/zh_cn/tutorials/customize_runtime.md)
-- [自定义损失函数](docs/zh_cn/tutorials/customize_losses.md)
-- [模型微调](docs/zh_cn/tutorials/finetune.md)
-- [Pytorch 到 ONNX 的模型转换](docs/zh_cn/tutorials/pytorch2onnx.md)
-- [ONNX 到 TensorRT 的模型转换](docs/zh_cn/tutorials/onnx2tensorrt.md)
-- [权重初始化](docs/zh_cn/tutorials/init_cfg.md)
-- [how to xxx](docs/zh_cn/tutorials/how_to.md)
-
-同时,我们还提供了 [MMDetection 中文解读文案汇总](docs/zh_cn/article.md)
-
-## 基准测试和模型库
-
-测试结果和模型可以在[模型库](docs/zh_cn/model_zoo.md)中找到。
-
-
- 算法架构
-
-
-
-
-
- Object Detection
-
-
- Instance Segmentation
-
-
- Panoptic Segmentation
-
-
- Other
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Contrastive Learning
-
-
- Distillation
-
-
- Receptive Field Search
-
-
-
-
-
-
-
-
-
-
- 模块组件
-
-
-
-
-
- Backbones
-
-
- Necks
-
-
- Loss
-
-
- Common
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-我们在[基于 MMDetection 的项目](./docs/zh_cn/projects.md)中列举了一些其他的支持的算法。
-
-## 常见问题
-
-请参考 [FAQ](docs/zh_cn/faq.md) 了解其他用户的常见问题。
-
-## 贡献指南
-
-我们感谢所有的贡献者为改进和提升 MMDetection 所作出的努力。我们将正在进行中的项目添加进了[GitHub Projects](https://github.com/open-mmlab/mmdetection/projects)页面,非常欢迎社区用户能参与进这些项目中来。请参考[贡献指南](.github/CONTRIBUTING.md)来了解参与项目贡献的相关指引。
-
-## 致谢
-
-MMDetection 是一款由来自不同高校和企业的研发人员共同参与贡献的开源项目。我们感谢所有为项目提供算法复现和新功能支持的贡献者,以及提供宝贵反馈的用户。 我们希望这个工具箱和基准测试可以为社区提供灵活的代码工具,供用户复现已有算法并开发自己的新模型,从而不断为开源社区提供贡献。
-
-## 引用
-
-如果你在研究中使用了本项目的代码或者性能基准,请参考如下 bibtex 引用 MMDetection。
-
-```
-@article{mmdetection,
- title = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark},
- author = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and
- Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and
- Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and
- Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and
- Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong
- and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua},
- journal= {arXiv preprint arXiv:1906.07155},
- year={2019}
-}
-```
-
-## 开源许可证
-
-该项目采用 [Apache 2.0 开源许可证](LICENSE)。
-
-## OpenMMLab 的其他项目
-
-- [MMEngine](https://github.com/open-mmlab/mmengine): OpenMMLab 深度学习模型训练基础库
-- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab 计算机视觉基础库
-- [MMEval](https://github.com/open-mmlab/mmeval): 统一开放的跨框架算法评测库
-- [MIM](https://github.com/open-mmlab/mim): MIM 是 OpenMMlab 项目、算法、模型的统一入口
-- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab 图像分类工具箱
-- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab 目标检测工具箱
-- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab 新一代通用 3D 目标检测平台
-- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab 旋转框检测工具箱与测试基准
-- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab 语义分割工具箱
-- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab 全流程文字检测识别理解工具包
-- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab 姿态估计工具箱
-- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 人体参数化模型工具箱与测试基准
-- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab 自监督学习工具箱与测试基准
-- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab 模型压缩工具箱与测试基准
-- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab 少样本学习工具箱与测试基准
-- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab 新一代视频理解工具箱
-- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab 一体化视频目标感知平台
-- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab 光流估计工具箱与测试基准
-- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab 图像视频编辑工具箱
-- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab 图片视频生成模型工具箱
-- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab 模型部署框架
-
-## 欢迎加入 OpenMMLab 社区
-
-扫描下方的二维码可关注 OpenMMLab 团队的 [知乎官方账号](https://www.zhihu.com/people/openmmlab),加入 OpenMMLab 团队的官方交流 QQ 群
-
-
-
-
-
-我们会在 OpenMMLab 社区为大家
-
-- 📢 分享 AI 框架的前沿核心技术
-- 💻 解读 PyTorch 常用模块源码
-- 📰 发布 OpenMMLab 的相关新闻
-- 🚀 介绍 OpenMMLab 开发的前沿算法
-- 🏃 获取更高效的问题答疑和意见反馈
-- 🔥 提供与各行各业开发者充分交流的平台
-
-干货满满 📘,等你来撩 💗,OpenMMLab 社区期待您的加入 👬
diff --git a/configs/_base_/datasets/bonai_instance.py b/configs/_base_/datasets/bonai_instance.py
new file mode 100644
index 00000000..19bc0fb4
--- /dev/null
+++ b/configs/_base_/datasets/bonai_instance.py
@@ -0,0 +1,73 @@
+dataset_type = "BONAI"
+data_root = "data/BONAI/"
+img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+train_pipeline = [
+ dict(type="LoadImageFromFile"),
+ dict(
+ type="LoadAnnotations",
+ with_bbox=True,
+ with_mask=True,
+ with_offset=True,
+ with_height=True,
+ ),
+ dict(type="Resize", img_scale=(1024, 1024), keep_ratio=True),
+ dict(type="RandomFlip", flip_ratio=0.5, direction=["horizontal", "vertical"]),
+ dict(type="Normalize", **img_norm_cfg),
+ dict(type="Pad", size_divisor=32),
+ dict(type="DefaultFormatBundle"),
+ dict(
+ type="Collect",
+ keys=["img", "gt_bboxes", "gt_labels", "gt_masks", "gt_offsets"],
+ ),
+]
+test_pipeline = [
+ dict(type="LoadImageFromFile"),
+ dict(
+ type="MultiScaleFlipAug",
+ img_scale=(1024, 1024),
+ flip=False,
+ transforms=[
+ dict(type="Resize", keep_ratio=True),
+ dict(type="RandomFlip", flip_ratio=0.5),
+ dict(type="Normalize", **img_norm_cfg),
+ dict(type="Pad", size_divisor=32),
+ dict(type="ImageToTensor", keys=["img"]),
+ dict(type="Collect", keys=["img"]),
+ ],
+ ),
+]
+cities = ["shanghai", "beijing", "jinan", "haerbin", "chengdu"]
+train_ann_file = []
+img_prefix = []
+for city in cities:
+ train_ann_file.append(data_root + "coco/bonai_{}_trainval.json".format(city))
+ img_prefix.append(data_root + "trainval/images/")
+data = dict(
+ samples_per_gpu=2,
+ workers_per_gpu=2,
+ train=dict(
+ type=dataset_type,
+ ann_file=train_ann_file,
+ img_prefix=img_prefix,
+ bbox_type="building",
+ mask_type="roof",
+ pipeline=train_pipeline,
+ ),
+ val=dict(
+ type=dataset_type,
+ ann_file=train_ann_file,
+ img_prefix=img_prefix,
+ bbox_type="building",
+ gt_footprint_csv_file="",
+ pipeline=test_pipeline,
+ ),
+ test=dict(
+ type=dataset_type,
+ ann_file=train_ann_file,
+ img_prefix=img_prefix,
+ bbox_type="building",
+ gt_footprint_csv_file="",
+ pipeline=test_pipeline,
+ ),
+)
+evaluation = dict(interval=1, metric=["bbox", "segm"])
diff --git a/configs/_base_/datasets/bonai_instance_hfm_ssl.py b/configs/_base_/datasets/bonai_instance_hfm_ssl.py
new file mode 100644
index 00000000..c53733a7
--- /dev/null
+++ b/configs/_base_/datasets/bonai_instance_hfm_ssl.py
@@ -0,0 +1,175 @@
+dataset_type = "BONAI_SSL"
+data_root = "data"
+img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+train_pipeline = [
+ dict(type="LoadImageFromFile"),
+ dict(
+ type="LoadAnnotations",
+ with_bbox=True,
+ with_mask=True,
+ with_offset=True,
+ with_height=True,
+ with_height_mask=True,
+ with_footprint_mask=True,
+ with_image_scale_footprint_mask=True,
+ with_footprint_bbox=True,
+ with_offset_angle=True,
+ with_nadir_angle=True,
+ with_semi_supervised_learning=True,
+ with_valid_height_flag=True,
+ ),
+ dict(type="Resize", img_scale=(1024, 1024), keep_ratio=True),
+ dict(type="RandomFlip", flip_ratio=0.5, direction=["horizontal", "vertical"]),
+ dict(type="Normalize", **img_norm_cfg),
+ dict(type="Pad", size_divisor=32),
+ dict(type="LOFTFormatBundle"),
+ dict(
+ type="Collect",
+ keys=[
+ "img",
+ "gt_bboxes",
+ "gt_labels",
+ "gt_masks",
+ "gt_offsets",
+ "gt_heights",
+ "gt_height_masks",
+ "gt_footprint_masks",
+ "gt_image_scale_footprint_masks",
+ "gt_footprint_bboxes",
+ "gt_is_semi_supervised_sample",
+ "gt_is_valid_height_sample",
+ "gt_offset_angles",
+ "gt_nadir_angles",
+ "height_mask_shape",
+ "image_scale_footprint_mask_shape",
+ ],
+ ),
+]
+test_pipeline = [
+ dict(type="LoadImageFromFile"),
+ dict(
+ type="MultiScaleFlipAug",
+ img_scale=(1024, 1024),
+ flip=False,
+ transforms=[
+ dict(type="Resize", keep_ratio=True),
+ dict(type="RandomFlip", flip_ratio=0.5),
+ dict(type="Normalize", **img_norm_cfg),
+ dict(type="Pad", size_divisor=32),
+ dict(type="ImageToTensor", keys=["img"]),
+ dict(type="Collect", keys=["img"]),
+ ],
+ ),
+]
+train_ann_file = []
+img_prefix = []
+
+dataset_dirs = {
+ "bonai_shanghai": "BONAI",
+ "bonai_beijing": "BONAI",
+ "bonai_jinan": "BONAI",
+ "bonai_haerbin": "BONAI",
+ "bonai_chengdu": "BONAI",
+ "OmniCityView3WithOffset": "OmniCityView3WithOffset",
+ "hongkong": "hongkong",
+}
+
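+# Each supervision-level tag (e.g. "30oh", "30oh/70h") maps to the sub-directory
+# that holds the corresponding COCO-style annotation files.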
+versions_to_ann_dirs = {
+ "30oh": "coco_30oh",
+ "30oh/30h": "coco_30oh_30h",
+ "30oh/30h/40n": "coco_30oh_30h_40n",
+ "30oh/70h": "coco_30oh_70h",
+ "100oh": "coco",
+ "30oh/70n": "coco_30oh_70n",
+}
+
+# ==================== control chosen datasets ====================
+datasets = {
+ # "bonai_shanghai": "30oh/70h",
+ # "bonai_beijing": "30oh/70h",
+ # "bonai_jinan": "30oh/70h",
+ # "bonai_haerbin": "30oh/70h",
+ # "bonai_chengdu": "30oh/70h",
+ #
+ # "bonai_shanghai": "30oh",
+ # "bonai_beijing": "30oh",
+ # "bonai_jinan": "30oh",
+ # "bonai_haerbin": "30oh",
+ # "bonai_chengdu": "30oh",
+ #
+ "bonai_shanghai": "100oh",
+ "bonai_beijing": "100oh",
+ "bonai_jinan": "100oh",
+ "bonai_haerbin": "100oh",
+ "bonai_chengdu": "100oh",
+ #
+ # "OmniCityView3WithOffset": "100oh",
+ # "OmniCityView3WithOffset": "30oh",
+ # "OmniCityView3WithOffset": "30oh/30h",
+ # "OmniCityView3WithOffset": "30oh/70h",
+ # "OmniCityView3WithOffset": "30oh/30h/40n",
+ #
+ # "hongkong": "100oh",
+ # "hongkong": "30oh/70n",
+ # "hongkong": "30oh",
+}
+# =================================================================
+
+for dataset, version in datasets.items():
+ dataset_dir = f"{data_root}/{dataset_dirs[dataset]}"
+ ann_path = f"{dataset_dir}/{versions_to_ann_dirs[version]}/{dataset}_trainval.json"
+ train_ann_file.append(ann_path)
+ img_prefix.append(f"{dataset_dir}/trainval/images")
+
+# Should be aligned with one output layer of FPN.
+height_mask_shape = [256, 256]
+image_scale_footprint_mask_shape = [256, 256]
+resolution = 0.6
+data = dict(
+ train=dict(
+ type=dataset_type,
+ ann_file=train_ann_file,
+ img_prefix=img_prefix,
+ bbox_type="building",
+ mask_type="roof",
+ height_mask_shape=height_mask_shape,
+ image_scale_footprint_mask_shape=image_scale_footprint_mask_shape,
+ pipeline=train_pipeline,
+ ),
+ train_dataloader=dict(
+ samples_per_gpu=2,
+ workers_per_gpu=2,
+ ),
+ val=dict(
+ type=dataset_type,
+ ann_file=train_ann_file,
+ img_prefix=img_prefix,
+ bbox_type="building",
+ mask_type="footprint",
+ height_mask_shape=height_mask_shape,
+ image_scale_footprint_mask_shape=image_scale_footprint_mask_shape,
+ gt_footprint_csv_file="",
+ pipeline=test_pipeline,
+ ),
+ val_dataloader=dict(
+ samples_per_gpu=1,
+ workers_per_gpu=2,
+ ),
+ test=dict(
+ type=dataset_type,
+ ann_file=train_ann_file,
+ img_prefix=img_prefix,
+ bbox_type="building",
+ mask_type="footprint",
+ gt_footprint_csv_file="",
+ height_mask_shape=height_mask_shape,
+ image_scale_footprint_mask_shape=image_scale_footprint_mask_shape,
+ pipeline=test_pipeline,
+ ),
+ test_dataloader=dict(
+ samples_per_gpu=1,
+ workers_per_gpu=2,
+ ),
+)
+evaluation = dict(interval=1, metric=["segm"])
+# evaluation = dict(start=20, interval=1, metric=["bbox", "segm"])
diff --git a/configs/_base_/default_runtime.py b/configs/_base_/default_runtime.py
index 5b0b1452..97594eac 100644
--- a/configs/_base_/default_runtime.py
+++ b/configs/_base_/default_runtime.py
@@ -1,24 +1,25 @@
-checkpoint_config = dict(interval=1)
+checkpoint_config = dict(interval=10)
# yapf:disable
log_config = dict(
- interval=50,
+ interval=10,
hooks=[
- dict(type='TextLoggerHook'),
+ dict(type="TextLoggerHook"),
# dict(type='TensorboardLoggerHook')
- ])
+ ],
+)
# yapf:enable
-custom_hooks = [dict(type='NumClassCheckHook')]
+custom_hooks = [dict(type="NumClassCheckHook")]
-dist_params = dict(backend='nccl')
-log_level = 'INFO'
+dist_params = dict(backend="nccl")
+log_level = "INFO"
load_from = None
resume_from = None
-workflow = [('train', 1)]
+workflow = [("train", 1)]
# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
-mp_start_method = 'fork'
+mp_start_method = "fork"
# Default setting for scaling LR automatically
# - `enable` means enable scaling LR automatically
diff --git a/configs/_base_/models/bonai_loft_foa_r50_fpn_basic.py b/configs/_base_/models/bonai_loft_foa_r50_fpn_basic.py
new file mode 100644
index 00000000..ecdd5ba6
--- /dev/null
+++ b/configs/_base_/models/bonai_loft_foa_r50_fpn_basic.py
@@ -0,0 +1,161 @@
+# model settings
+model = dict(
+ type="LOFT",
+ backbone=dict(
+ type="ResNet",
+ depth=50,
+ num_stages=4,
+ out_indices=(0, 1, 2, 3),
+ frozen_stages=1,
+ norm_cfg=dict(type="BN", requires_grad=True),
+ norm_eval=True,
+ style="pytorch",
+ init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
+ ),
+ neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5),
+ rpn_head=dict(
+ type="RPNHead",
+ in_channels=256,
+ feat_channels=256,
+ anchor_generator=dict(
+ type="AnchorGenerator",
+ scales=[8],
+ ratios=[0.5, 1.0, 2.0],
+ strides=[4, 8, 16, 32, 64],
+ ),
+ bbox_coder=dict(
+ type="DeltaXYWHBBoxCoder",
+ target_means=[0.0, 0.0, 0.0, 0.0],
+ target_stds=[1.0, 1.0, 1.0, 1.0],
+ ),
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+ loss_bbox=dict(type="L1Loss", loss_weight=1.0),
+ ),
+ roi_head=dict(
+ type="LoftRoIHead",
+ bbox_roi_extractor=dict(
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16, 32],
+ ),
+ bbox_head=dict(
+ type="Shared2FCBBoxHead",
+ in_channels=256,
+ fc_out_channels=1024,
+ roi_feat_size=7,
+ num_classes=1,
+ bbox_coder=dict(
+ type="DeltaXYWHBBoxCoder",
+ target_means=[0.0, 0.0, 0.0, 0.0],
+ target_stds=[0.1, 0.1, 0.2, 0.2],
+ ),
+ reg_class_agnostic=False,
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+ loss_bbox=dict(type="L1Loss", loss_weight=1.0),
+ ),
+ mask_roi_extractor=dict(
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16, 32],
+ ),
+ mask_head=dict(
+ type="FCNMaskHead",
+ num_convs=4,
+ in_channels=256,
+ conv_out_channels=256,
+ num_classes=1,
+ loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+ ),
+ offset_roi_extractor=dict(
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16, 32],
+ ),
+ offset_head=dict(
+ type="OffsetHeadExpandFeature",
+ expand_feature_num=4,
+ share_expand_fc=True,
+ rotations=[0, 90, 180, 270],
+ num_fcs=2,
+ fc_out_channels=1024,
+ num_convs=10,
+ loss_offset=dict(type="SmoothL1Loss", loss_weight=8 * 2.0),
+ ),
+ ),
+)
+# model training and testing settings
+train_cfg = dict(
+ rpn=dict(
+ assigner=dict(
+ type="MaxIoUAssigner",
+ pos_iou_thr=0.7,
+ neg_iou_thr=0.3,
+ min_pos_iou=0.3,
+ match_low_quality=True,
+ ignore_iof_thr=-1,
+ gpu_assign_thr=512,
+ ),
+ sampler=dict(
+ type="RandomSampler",
+ num=512,
+ pos_fraction=0.5,
+ neg_pos_ub=-1,
+ add_gt_as_proposals=False,
+ ),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False,
+ ),
+ rpn_proposal=dict(
+ nms_pre=3000,
+ # max_num=3000,
+ max_per_img=3000,
+ nms=dict(type="nms", iou_threshold=0.7),
+ min_bbox_size=0,
+ nms_across_levels=False,
+ nms_post=3000,
+ # nms_thr=0.7,
+ ),
+ rcnn=dict(
+ assigner=dict(
+ type="MaxIoUAssigner",
+ pos_iou_thr=0.5,
+ neg_iou_thr=0.5,
+ min_pos_iou=0.5,
+ match_low_quality=True,
+ ignore_iof_thr=-1,
+ gpu_assign_thr=512,
+ ),
+ sampler=dict(
+ type="RandomSampler",
+ num=1024,
+ pos_fraction=0.25,
+ neg_pos_ub=-1,
+ add_gt_as_proposals=True,
+ ),
+ mask_size=28,
+ pos_weight=-1,
+ debug=False,
+ ),
+)
+test_cfg = dict(
+ rpn=dict(
+ nms_pre=3000,
+ # max_num=3000,
+ max_per_img=3000,
+ nms=dict(type="nms", iou_threshold=0.7),
+ min_bbox_size=0,
+ nms_across_levels=False,
+ nms_post=3000,
+ # nms_thr=0.7,
+ ),
+ rcnn=dict(
+ score_thr=0.05,
+ nms=dict(type="soft_nms", iou_threshold=0.5),
+ max_per_img=2000,
+ mask_thr_binary=0.5,
+ ),
+)
diff --git a/configs/_base_/models/bonai_loft_foahfm_r50_fpn_basic.py b/configs/_base_/models/bonai_loft_foahfm_r50_fpn_basic.py
new file mode 100644
index 00000000..217e02b6
--- /dev/null
+++ b/configs/_base_/models/bonai_loft_foahfm_r50_fpn_basic.py
@@ -0,0 +1,237 @@
+# model settings
+model = dict(
+ type="LOFT",
+ backbone=dict(
+ type="ResNet",
+ depth=50,
+ num_stages=4,
+ out_indices=(0, 1, 2, 3),
+ frozen_stages=1,
+ norm_cfg=dict(type="BN", requires_grad=True),
+ norm_eval=True,
+ style="pytorch",
+ init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
+ ),
+ neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5),
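+    # New image-level tasks introduced by MLS-BRN: per-image offset-angle and
+    # nadir-angle regression heads.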
+ offset_angle_head=dict(
+ type="OffsetAngleHead",
+ gt_footprint_mask_as_condition=False,
+ # gt_footprint_mask_as_condition=True,
+ # gt_footprint_mask_repeat_num=32,
+ num_convs=4,
+ conv_out_channels=[256, 256, 256, 256],
+ strides=[2, 2, 2, 2],
+ kernel_size=3,
+ num_fcs=2,
+ fc_out_channels=[64, 16],
+ in_size=256,
+ in_channels=256,
+ with_tanh=True,
+ regular_lambda=0.1,
+ loss_angle=dict(type="SmoothL1Loss", loss_weight=1.0),
+ loss_method="loss",
+ ),
+ nadir_angle_head=dict(
+ type="NadirAngleHead",
+ gt_height_mask_as_condition=False,
+ # gt_height_mask_as_condition=True,
+ # gt_height_mask_repeat_num=32,
+ num_convs=8,
+ conv_out_channels=[512, 512, 512, 512, 1024, 1024, 2048, 2048],
+ strides=[1, 2, 1, 2, 1, 2, 1, 2],
+ kernel_size=3,
+ num_fcs=5,
+ fc_out_channels=[512, 256, 128, 64, 32],
+ in_size=256,
+ in_channels=256,
+ reg_num=1,
+ loss_angle=dict(type="SmoothL1Loss", loss_weight=4 * 2.0),
+ ),
+ # loss_offset_angle_consistency=dict(
+ # type="SmoothL1Loss",
+ # loss_weight=0.05,
+ # regular_lambda=(0.01, 100.0),
+ # ),
+ rpn_head=dict(
+ type="RPNHead",
+ in_channels=256,
+ feat_channels=256,
+ anchor_generator=dict(
+ type="AnchorGenerator",
+ scales=[8],
+ ratios=[0.5, 1.0, 2.0],
+ strides=[4, 8, 16, 32, 64],
+ ),
+ bbox_coder=dict(
+ type="DeltaXYWHBBoxCoder",
+ target_means=[0.0, 0.0, 0.0, 0.0],
+ target_stds=[1.0, 1.0, 1.0, 1.0],
+ ),
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+ loss_bbox=dict(type="L1Loss", loss_weight=1.0),
+ ),
+ roi_head=dict(
+ type="LoftHFMRoIHead",
+ bbox_roi_extractor=dict(
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16, 32],
+ ),
+ bbox_head=dict(
+ type="Shared2FCBBoxHead",
+ in_channels=256,
+ fc_out_channels=1024,
+ roi_feat_size=7,
+ num_classes=1,
+ bbox_coder=dict(
+ type="DeltaXYWHBBoxCoder",
+ target_means=[0.0, 0.0, 0.0, 0.0],
+ target_stds=[0.1, 0.1, 0.2, 0.2],
+ ),
+ reg_class_agnostic=False,
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+ loss_bbox=dict(type="L1Loss", loss_weight=1.0),
+ ),
+ mask_roi_extractor=dict(
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16, 32],
+ ),
+ mask_head=dict(
+ type="FCNMaskHead",
+ num_convs=4,
+ roi_feat_size=14,
+ in_channels=256,
+ conv_out_channels=256,
+ num_classes=1,
+ loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+ ),
+ offset_roi_extractor=dict(
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16, 32],
+ ),
+ offset_head=dict(
+ type="OffsetHeadExpandFeature",
+ expand_feature_num=4,
+ share_expand_fc=True,
+ rotations=[0, 90, 180, 270],
+ num_fcs=2,
+ fc_out_channels=1024,
+ num_convs=10,
+ loss_offset=dict(type="SmoothL1Loss", loss_weight=8 * 2.0),
+ ),
+ footprint_mask_from_roof_offset_head=dict(
+ type="FootprintMaskFromRoofOffsetHead",
+ num_convs=8,
+ roi_feat_size=28,
+ upsample_cfg=dict(type="deconv", scale_factor=1),
+ in_channels=3,
+ conv_out_channels=256,
+ num_classes=1,
+ loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+ ),
+ height_roi_extractor=dict(
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16, 32],
+ ),
+ height_head=dict(
+ type="HeightHead",
+ num_fcs=2,
+ fc_out_channels=1024,
+ num_convs=10,
+ loss_height=dict(type="SmoothL1Loss", loss_weight=16 * 2.0),
+ ),
+ ),
+)
+train_cfg = dict(
+ rpn=dict(
+ assigner=dict(
+ type="MaxIoUAssigner",
+ pos_iou_thr=0.7,
+ neg_iou_thr=0.3,
+ min_pos_iou=0.3,
+ match_low_quality=True,
+ ignore_iof_thr=-1,
+ gpu_assign_thr=512,
+ ),
+ sampler=dict(
+ type="RandomSampler",
+ num=512,
+ pos_fraction=0.5,
+ neg_pos_ub=-1,
+ add_gt_as_proposals=False,
+ ),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False,
+ ),
+ rpn_proposal=dict(
+ nms_pre=3000,
+ max_per_img=3000,
+ nms=dict(type="nms", iou_threshold=0.7),
+ min_bbox_size=0,
+ nms_across_levels=False,
+ nms_post=3000,
+ ),
+ rcnn=dict(
+ assigner=dict(
+ type="MaxIoUAssigner",
+ pos_iou_thr=0.5,
+ neg_iou_thr=0.5,
+ min_pos_iou=0.5,
+ match_low_quality=True,
+ ignore_iof_thr=-1,
+ gpu_assign_thr=512,
+ ),
+ sampler=dict(
+ type="RandomSampler",
+ num=1024,
+ pos_fraction=0.25,
+ neg_pos_ub=-1,
+ add_gt_as_proposals=True,
+ ),
+ mask_size=28,
+ pos_weight=-1,
+ debug=False,
+ ),
+ # pseudo_rpn_bboxes_wh_ratio=[1.5, 3],
+ # pseudo_rpn_bboxes_wh_ratio=[1.5, 2],
+ pseudo_rpn_bboxes_wh_ratio=[1.2, 2],
+ offset_scale=1.0,
+ pseudo_rpn_bbox_scale=1.0,
+ resolution=0.6,
+ shrunk_losses={
+ # "rpn_cls",
+ # "rpn_bbox",
+ "bbox",
+ "cls",
+ "mask",
+ "offset",
+ },
+ shrunk_factor=0.0,
+ # use_pred_for_offset_angle_consistency=True,
+ use_pred_for_offset_angle_consistency=False,
+ footprint_mask_fro_loss_lambda=1,
+)
+test_cfg = dict(
+ rpn=dict(
+ nms_pre=3000,
+ max_per_img=3000,
+ nms=dict(type="nms", iou_threshold=0.7),
+ min_bbox_size=0,
+ nms_across_levels=False,
+ nms_post=3000,
+ ),
+ rcnn=dict(
+ score_thr=0.05,
+ nms=dict(type="soft_nms", iou_threshold=0.5),
+ max_per_img=2000,
+ mask_thr_binary=0.5,
+ ),
+)
diff --git a/configs/_base_/models/mask_rcnn_r50_fpn.py b/configs/_base_/models/mask_rcnn_r50_fpn.py
index d903e55e..3115b544 100644
--- a/configs/_base_/models/mask_rcnn_r50_fpn.py
+++ b/configs/_base_/models/mask_rcnn_r50_fpn.py
@@ -1,120 +1,137 @@
# model settings
model = dict(
- type='MaskRCNN',
+ type="MaskRCNN",
backbone=dict(
- type='ResNet',
+ type="ResNet",
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
- norm_cfg=dict(type='BN', requires_grad=True),
+ norm_cfg=dict(type="BN", requires_grad=True),
norm_eval=True,
- style='pytorch',
- init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
+ style="pytorch",
+ init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
+ ),
neck=dict(
- type='FPN',
- in_channels=[256, 512, 1024, 2048],
- out_channels=256,
- num_outs=5),
+ type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5
+ ),
rpn_head=dict(
- type='RPNHead',
+ type="RPNHead",
in_channels=256,
feat_channels=256,
anchor_generator=dict(
- type='AnchorGenerator',
+ type="AnchorGenerator",
scales=[8],
ratios=[0.5, 1.0, 2.0],
- strides=[4, 8, 16, 32, 64]),
+ strides=[4, 8, 16, 32, 64],
+ ),
bbox_coder=dict(
- type='DeltaXYWHBBoxCoder',
- target_means=[.0, .0, .0, .0],
- target_stds=[1.0, 1.0, 1.0, 1.0]),
- loss_cls=dict(
- type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
- loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
+ type="DeltaXYWHBBoxCoder",
+ target_means=[0.0, 0.0, 0.0, 0.0],
+ target_stds=[1.0, 1.0, 1.0, 1.0],
+ ),
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+ loss_bbox=dict(type="L1Loss", loss_weight=1.0),
+ ),
roi_head=dict(
- type='StandardRoIHead',
+ type="StandardRoIHead",
bbox_roi_extractor=dict(
- type='SingleRoIExtractor',
- roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
out_channels=256,
- featmap_strides=[4, 8, 16, 32]),
+ featmap_strides=[4, 8, 16, 32],
+ ),
bbox_head=dict(
- type='Shared2FCBBoxHead',
+ type="Shared2FCBBoxHead",
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
- type='DeltaXYWHBBoxCoder',
- target_means=[0., 0., 0., 0.],
- target_stds=[0.1, 0.1, 0.2, 0.2]),
+ type="DeltaXYWHBBoxCoder",
+ target_means=[0.0, 0.0, 0.0, 0.0],
+ target_stds=[0.1, 0.1, 0.2, 0.2],
+ ),
reg_class_agnostic=False,
- loss_cls=dict(
- type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
- loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+ loss_bbox=dict(type="L1Loss", loss_weight=1.0),
+ ),
mask_roi_extractor=dict(
- type='SingleRoIExtractor',
- roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
+ type="SingleRoIExtractor",
+ roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
out_channels=256,
- featmap_strides=[4, 8, 16, 32]),
+ featmap_strides=[4, 8, 16, 32],
+ ),
mask_head=dict(
- type='FCNMaskHead',
+ type="FCNMaskHead",
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=80,
- loss_mask=dict(
- type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
+ loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+ ),
+ ),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
- type='MaxIoUAssigner',
+ type="MaxIoUAssigner",
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
- ignore_iof_thr=-1),
+ ignore_iof_thr=-1,
+ ),
sampler=dict(
- type='RandomSampler',
+ type="RandomSampler",
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
- add_gt_as_proposals=False),
+ add_gt_as_proposals=False,
+ ),
allowed_border=-1,
pos_weight=-1,
- debug=False),
+ debug=False,
+ ),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
- nms=dict(type='nms', iou_threshold=0.7),
- min_bbox_size=0),
+ nms=dict(type="nms", iou_threshold=0.7),
+ min_bbox_size=0,
+ ),
rcnn=dict(
assigner=dict(
- type='MaxIoUAssigner',
+ type="MaxIoUAssigner",
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=True,
- ignore_iof_thr=-1),
+ ignore_iof_thr=-1,
+ ),
sampler=dict(
- type='RandomSampler',
+ type="RandomSampler",
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
- add_gt_as_proposals=True),
+ add_gt_as_proposals=True,
+ ),
mask_size=28,
pos_weight=-1,
- debug=False)),
+ debug=False,
+ ),
+ ),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
- nms=dict(type='nms', iou_threshold=0.7),
- min_bbox_size=0),
+ nms=dict(type="nms", iou_threshold=0.7),
+ min_bbox_size=0,
+ ),
rcnn=dict(
score_thr=0.05,
- nms=dict(type='nms', iou_threshold=0.5),
+ nms=dict(type="nms", iou_threshold=0.5),
max_per_img=100,
- mask_thr_binary=0.5)))
+ mask_thr_binary=0.5,
+ ),
+ ),
+)
diff --git a/configs/_base_/schedules/schedule_2x_bonai.py b/configs/_base_/schedules/schedule_2x_bonai.py
new file mode 100644
index 00000000..3b9a75ed
--- /dev/null
+++ b/configs/_base_/schedules/schedule_2x_bonai.py
@@ -0,0 +1,26 @@
+# optimizer for 4 GPUs
+optimizer = dict(
+ type="SGD",
+ lr=0.01,
+ momentum=0.9,
+ weight_decay=0.0001,
+)
+optimizer_config = dict(
+ grad_clip=dict(
+ max_norm=35,
+ norm_type=2,
+ ),
+)
+# learning policy
+lr_config = dict(
+ policy="step",
+ warmup="linear",
+ warmup_iters=300,
+ warmup_ratio=0.001,
+ step=[16, 22],
+)
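+# StagedEpochBasedRunner is this repo's custom runner; judging by its name and the
+# multi-level training strategy, supervised_epochs likely controls how many initial
+# epochs train on fully supervised samples only (0 disables the staged phase).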
+runner = dict(
+ type="StagedEpochBasedRunner",
+ max_epochs=24,
+ supervised_epochs=0,
+)
diff --git a/configs/_base_/schedules/schedule_4x.py b/configs/_base_/schedules/schedule_4x.py
new file mode 100644
index 00000000..2bda2e61
--- /dev/null
+++ b/configs/_base_/schedules/schedule_4x.py
@@ -0,0 +1,8 @@
+# optimizer
+optimizer = dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)
+optimizer_config = dict(grad_clip=None)
+# learning policy
+lr_config = dict(
+ policy="step", warmup="linear", warmup_iters=500, warmup_ratio=0.001, step=[32, 44]
+)
+total_epochs = 48
diff --git a/configs/loft_foa/loft_foa_r50_fpn_2x_bonai.py b/configs/loft_foa/loft_foa_r50_fpn_2x_bonai.py
new file mode 100644
index 00000000..5d6e218a
--- /dev/null
+++ b/configs/loft_foa/loft_foa_r50_fpn_2x_bonai.py
@@ -0,0 +1,6 @@
+_base_ = [
+ "../_base_/models/bonai_loft_foa_r50_fpn_basic.py",
+ "../_base_/datasets/bonai_instance.py",
+ "../_base_/schedules/schedule_2x_bonai.py",
+ "../_base_/default_runtime.py",
+]
diff --git a/configs/loft_foah/loft_foah_r50_fpn_2x_bonai.py b/configs/loft_foah/loft_foah_r50_fpn_2x_bonai.py
new file mode 100644
index 00000000..00ab8af4
--- /dev/null
+++ b/configs/loft_foah/loft_foah_r50_fpn_2x_bonai.py
@@ -0,0 +1,6 @@
+_base_ = [
+ "../_base_/models/bonai_loft_foah_r50_fpn_basic.py",
+ "../_base_/datasets/bonai_instance_h.py",
+ "../_base_/schedules/schedule_2x_bonai.py",
+ "../_base_/default_runtime.py",
+]
diff --git a/configs/loft_foahfm/loft_foahfm_r50_fpn_2x_bonai.py b/configs/loft_foahfm/loft_foahfm_r50_fpn_2x_bonai.py
new file mode 100644
index 00000000..8b314fc2
--- /dev/null
+++ b/configs/loft_foahfm/loft_foahfm_r50_fpn_2x_bonai.py
@@ -0,0 +1,6 @@
+_base_ = [
+ "../_base_/models/bonai_loft_foahfm_r50_fpn_basic.py",
+ "../_base_/datasets/bonai_instance_hfm.py",
+ "../_base_/schedules/schedule_2x_bonai.py",
+ "../_base_/default_runtime.py",
+]
diff --git a/configs/loft_foahfm_ssl/loft_foahfm_r50_fpn_2x_bonai_ssl.py b/configs/loft_foahfm_ssl/loft_foahfm_r50_fpn_2x_bonai_ssl.py
new file mode 100644
index 00000000..6c7086f2
--- /dev/null
+++ b/configs/loft_foahfm_ssl/loft_foahfm_r50_fpn_2x_bonai_ssl.py
@@ -0,0 +1,6 @@
+_base_ = [
+ "../_base_/models/bonai_loft_foahfm_r50_fpn_basic.py",
+ "../_base_/datasets/bonai_instance_hfm_ssl.py",
+ "../_base_/schedules/schedule_2x_bonai.py",
+ "../_base_/default_runtime.py",
+]
diff --git a/docs/fig-overview.jpg b/docs/fig-overview.jpg
new file mode 100644
index 00000000..b2377121
Binary files /dev/null and b/docs/fig-overview.jpg differ
diff --git a/mmdet/apis/inference.py b/mmdet/apis/inference.py
index f0858a7c..effcdcd0 100644
--- a/mmdet/apis/inference.py
+++ b/mmdet/apis/inference.py
@@ -156,10 +156,7 @@ def inference_detector(model, imgs):
with torch.no_grad():
results = model(return_loss=False, rescale=True, **data)
- if not is_batch:
- return results[0]
- else:
- return results
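+    # Always return a list of per-image results, even for a single input image.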
+ return results
async def async_inference_detector(model, imgs):
diff --git a/mmdet/apis/test.py b/mmdet/apis/test.py
index 973d3623..4ae109d7 100644
--- a/mmdet/apis/test.py
+++ b/mmdet/apis/test.py
@@ -14,39 +14,36 @@
from mmdet.core import encode_mask_results
-def single_gpu_test(model,
- data_loader,
- show=False,
- out_dir=None,
- show_score_thr=0.3):
+def single_gpu_test(model, data_loader, show=False, out_dir=None, show_score_thr=0.3):
model.eval()
results = []
dataset = data_loader.dataset
- PALETTE = getattr(dataset, 'PALETTE', None)
+ PALETTE = getattr(dataset, "PALETTE", None)
prog_bar = mmcv.ProgressBar(len(dataset))
for i, data in enumerate(data_loader):
with torch.no_grad():
- result = model(return_loss=False, rescale=True, **data)
+ result = _align_result(model(return_loss=False, rescale=True, **data))
batch_size = len(result)
+
if show or out_dir:
- if batch_size == 1 and isinstance(data['img'][0], torch.Tensor):
- img_tensor = data['img'][0]
+ if batch_size == 1 and isinstance(data["img"][0], torch.Tensor):
+ img_tensor = data["img"][0]
else:
- img_tensor = data['img'][0].data[0]
- img_metas = data['img_metas'][0].data[0]
- imgs = tensor2imgs(img_tensor, **img_metas[0]['img_norm_cfg'])
+ img_tensor = data["img"][0].data[0]
+ img_metas = data["img_metas"][0].data[0]
+ imgs = tensor2imgs(img_tensor, **img_metas[0]["img_norm_cfg"])
assert len(imgs) == len(img_metas)
for i, (img, img_meta) in enumerate(zip(imgs, img_metas)):
- h, w, _ = img_meta['img_shape']
+ h, w, _ = img_meta["img_shape"]
img_show = img[:h, :w, :]
- ori_h, ori_w = img_meta['ori_shape'][:-1]
+ ori_h, ori_w = img_meta["ori_shape"][:-1]
img_show = mmcv.imresize(img_show, (ori_w, ori_h))
if out_dir:
- out_file = osp.join(out_dir, img_meta['ori_filename'])
+ out_file = osp.join(out_dir, img_meta["ori_filename"])
else:
out_file = None
@@ -58,26 +55,65 @@ def single_gpu_test(model,
mask_color=PALETTE,
show=show,
out_file=out_file,
- score_thr=show_score_thr)
-
- # encode mask results
- if isinstance(result[0], tuple):
- result = [(bbox_results, encode_mask_results(mask_results))
- for bbox_results, mask_results in result]
- # This logic is only used in panoptic segmentation test.
- elif isinstance(result[0], dict) and 'ins_results' in result[0]:
- for j in range(len(result)):
- bbox_results, mask_results = result[j]['ins_results']
- result[j]['ins_results'] = (bbox_results,
- encode_mask_results(mask_results))
+ score_thr=show_score_thr,
+ )
- results.extend(result)
+ results += result
for _ in range(batch_size):
prog_bar.update()
+
return results
+def _align_result(result):
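+    """Regroup the model's per-task output tuples into per-image result tuples,
+    RLE-encoding each mask-style output via ``encode_mask_results``."""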
+ (
+ bbox_results,
+ offset_results,
+ mask_results,
+ height_results,
+ footprint_mask_results,
+ footprint_mask_fro_results,
+ offset_angle,
+ nadir_angle,
+ ) = result
+ encoded_mask_results = (
+ [encode_mask_results(mask_result) for mask_result in mask_results]
+ if mask_results[0] is not None
+        else mask_results
+ )
+ encoded_footprint_mask_results = (
+ [
+ encode_mask_results(footprint_mask_result)
+ for footprint_mask_result in footprint_mask_results
+ ]
+ if footprint_mask_results[0] is not None
+ else footprint_mask_results
+ )
+ encoded_footprint_mask_fro_results = (
+ [
+ encode_mask_results(footprint_mask_fro_result)
+ for footprint_mask_fro_result in footprint_mask_fro_results
+ ]
+ if footprint_mask_fro_results[0] is not None
+ else footprint_mask_fro_results
+ )
+
+ result = list(
+ zip(
+ bbox_results,
+ encoded_mask_results,
+ offset_results,
+ height_results,
+ encoded_footprint_mask_results,
+ encoded_footprint_mask_fro_results,
+ offset_angle,
+ nadir_angle,
+ )
+ )
+ return result
+
+
def multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
"""Test model with multiple gpus.
@@ -104,28 +140,18 @@ def multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
if rank == 0:
prog_bar = mmcv.ProgressBar(len(dataset))
time.sleep(2) # This line can prevent deadlock problem in some cases.
+
for i, data in enumerate(data_loader):
with torch.no_grad():
- result = model(return_loss=False, rescale=True, **data)
- # encode mask results
- if isinstance(result[0], tuple):
- result = [(bbox_results, encode_mask_results(mask_results))
- for bbox_results, mask_results in result]
- # This logic is only used in panoptic segmentation test.
- elif isinstance(result[0], dict) and 'ins_results' in result[0]:
- for j in range(len(result)):
- bbox_results, mask_results = result[j]['ins_results']
- result[j]['ins_results'] = (
- bbox_results, encode_mask_results(mask_results))
-
- results.extend(result)
+ result = _align_result(model(return_loss=False, rescale=True, **data))
+
+ results += result
if rank == 0:
batch_size = len(result)
for _ in range(batch_size * world_size):
prog_bar.update()
- # collect results from all ranks
if gpu_collect:
results = collect_results_gpu(results, len(dataset))
else:
@@ -139,22 +165,18 @@ def collect_results_cpu(result_part, size, tmpdir=None):
if tmpdir is None:
MAX_LEN = 512
# 32 is whitespace
- dir_tensor = torch.full((MAX_LEN, ),
- 32,
- dtype=torch.uint8,
- device='cuda')
+ dir_tensor = torch.full((MAX_LEN,), 32, dtype=torch.uint8, device="cuda")
if rank == 0:
- mmcv.mkdir_or_exist('.dist_test')
- tmpdir = tempfile.mkdtemp(dir='.dist_test')
- tmpdir = torch.tensor(
- bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
- dir_tensor[:len(tmpdir)] = tmpdir
+ mmcv.mkdir_or_exist(".dist_test")
+ tmpdir = tempfile.mkdtemp(dir=".dist_test")
+ tmpdir = torch.tensor(bytearray(tmpdir.encode()), dtype=torch.uint8, device="cuda")
+ dir_tensor[: len(tmpdir)] = tmpdir
dist.broadcast(dir_tensor, 0)
tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
else:
mmcv.mkdir_or_exist(tmpdir)
# dump the part result to the dir
- mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
+ mmcv.dump(result_part, osp.join(tmpdir, f"part_{rank}.pkl"))
dist.barrier()
# collect all parts
if rank != 0:
@@ -163,7 +185,7 @@ def collect_results_cpu(result_part, size, tmpdir=None):
# load results of all parts from tmp dir
part_list = []
for i in range(world_size):
- part_file = osp.join(tmpdir, f'part_{i}.pkl')
+ part_file = osp.join(tmpdir, f"part_{i}.pkl")
part_list.append(mmcv.load(part_file))
# sort the results
ordered_results = []
@@ -180,26 +202,24 @@ def collect_results_gpu(result_part, size):
rank, world_size = get_dist_info()
# dump result part to tensor with pickle
part_tensor = torch.tensor(
- bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device='cuda')
+ bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device="cuda"
+ )
# gather all result part tensor shape
- shape_tensor = torch.tensor(part_tensor.shape, device='cuda')
+ shape_tensor = torch.tensor(part_tensor.shape, device="cuda")
shape_list = [shape_tensor.clone() for _ in range(world_size)]
dist.all_gather(shape_list, shape_tensor)
# padding result part tensor to max length
shape_max = torch.tensor(shape_list).max()
- part_send = torch.zeros(shape_max, dtype=torch.uint8, device='cuda')
- part_send[:shape_tensor[0]] = part_tensor
- part_recv_list = [
- part_tensor.new_zeros(shape_max) for _ in range(world_size)
- ]
+ part_send = torch.zeros(shape_max, dtype=torch.uint8, device="cuda")
+ part_send[: shape_tensor[0]] = part_tensor
+ part_recv_list = [part_tensor.new_zeros(shape_max) for _ in range(world_size)]
# gather all result part
dist.all_gather(part_recv_list, part_send)
if rank == 0:
part_list = []
for recv, shape in zip(part_recv_list, shape_list):
- part_list.append(
- pickle.loads(recv[:shape[0]].cpu().numpy().tobytes()))
+ part_list.append(pickle.loads(recv[: shape[0]].cpu().numpy().tobytes()))
# sort the results
ordered_results = []
for res in zip(*part_list):
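The `_align_result` helper above turns eight task-wise lists (each of length `batch_size`) into one tuple per image. A minimal, self-contained sketch of that zip-based regrouping, with toy strings standing in for the real tensors:

```python
# Toy per-task outputs for a batch of two images (placeholders for real tensors).
bbox_results = ["bbox_img0", "bbox_img1"]
offset_results = ["offset_img0", "offset_img1"]
height_results = ["height_img0", "height_img1"]

# zip(*task_lists) regroups task-major lists into image-major tuples,
# which `results += result` then accumulates across batches.
per_image = list(zip(bbox_results, offset_results, height_results))
assert per_image[0] == ("bbox_img0", "offset_img0", "height_img0")
```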
diff --git a/mmdet/apis/train.py b/mmdet/apis/train.py
index f51f862a..11465c3b 100644
--- a/mmdet/apis/train.py
+++ b/mmdet/apis/train.py
@@ -5,18 +5,21 @@
import numpy as np
import torch
import torch.distributed as dist
-from mmcv.runner import (DistSamplerSeedHook, EpochBasedRunner,
- Fp16OptimizerHook, OptimizerHook, build_runner,
- get_dist_info)
+from mmcv.runner import (
+ DistSamplerSeedHook,
+ EpochBasedRunner,
+ Fp16OptimizerHook,
+ OptimizerHook,
+ build_runner,
+ get_dist_info,
+)
from mmdet.core import DistEvalHook, EvalHook, build_optimizer
-from mmdet.datasets import (build_dataloader, build_dataset,
- replace_ImageToTensor)
-from mmdet.utils import (build_ddp, build_dp, compat_cfg,
- find_latest_checkpoint, get_root_logger)
+from mmdet.datasets import build_dataloader, build_dataset, replace_ImageToTensor
+from mmdet.utils import build_ddp, build_dp, compat_cfg, find_latest_checkpoint, get_root_logger
-def init_random_seed(seed=None, device='cuda'):
+def init_random_seed(seed=None, device="cuda"):
"""Initialize random seed.
If the seed is not set, the seed will be automatically randomized,
@@ -77,14 +80,12 @@ def auto_scale_lr(cfg, distributed, logger):
logger (logging.Logger): Logger.
"""
# Get flag from config
- if ('auto_scale_lr' not in cfg) or \
- (not cfg.auto_scale_lr.get('enable', False)):
- logger.info('Automatic scaling of learning rate (LR)'
- ' has been disabled.')
+ if ("auto_scale_lr" not in cfg) or (not cfg.auto_scale_lr.get("enable", False)):
+ logger.info("Automatic scaling of learning rate (LR)" " has been disabled.")
return
# Get base batch size from config
- base_batch_size = cfg.auto_scale_lr.get('base_batch_size', None)
+ base_batch_size = cfg.auto_scale_lr.get("base_batch_size", None)
if base_batch_size is None:
return
@@ -98,38 +99,35 @@ def auto_scale_lr(cfg, distributed, logger):
# calculate the batch size
samples_per_gpu = cfg.data.train_dataloader.samples_per_gpu
batch_size = num_gpus * samples_per_gpu
- logger.info(f'Training with {num_gpus} GPU(s) with {samples_per_gpu} '
- f'samples per GPU. The total batch size is {batch_size}.')
+ logger.info(
+ f"Training with {num_gpus} GPU(s) with {samples_per_gpu} "
+ f"samples per GPU. The total batch size is {batch_size}."
+ )
if batch_size != base_batch_size:
# scale LR with
# [linear scaling rule](https://arxiv.org/abs/1706.02677)
scaled_lr = (batch_size / base_batch_size) * cfg.optimizer.lr
- logger.info('LR has been automatically scaled '
- f'from {cfg.optimizer.lr} to {scaled_lr}')
+ logger.info("LR has been automatically scaled " f"from {cfg.optimizer.lr} to {scaled_lr}")
cfg.optimizer.lr = scaled_lr
else:
- logger.info('The batch size match the '
- f'base batch size: {base_batch_size}, '
- f'will not scaling the LR ({cfg.optimizer.lr}).')
+ logger.info(
+ "The batch size match the "
+ f"base batch size: {base_batch_size}, "
+ f"will not scaling the LR ({cfg.optimizer.lr})."
+ )
-def train_detector(model,
- dataset,
- cfg,
- distributed=False,
- validate=False,
- timestamp=None,
- meta=None):
-
+def train_detector(
+ model, dataset, cfg, distributed=False, validate=False, timestamp=None, meta=None
+):
cfg = compat_cfg(cfg)
logger = get_root_logger(log_level=cfg.log_level)
# prepare data loaders
dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
- runner_type = 'EpochBasedRunner' if 'runner' not in cfg else cfg.runner[
- 'type']
+ runner_type = "EpochBasedRunner" if "runner" not in cfg else cfg.runner["type"]
train_dataloader_default_args = dict(
samples_per_gpu=2,
@@ -139,26 +137,25 @@ def train_detector(model,
dist=distributed,
seed=cfg.seed,
runner_type=runner_type,
- persistent_workers=False)
+ persistent_workers=False,
+ )
- train_loader_cfg = {
- **train_dataloader_default_args,
- **cfg.data.get('train_dataloader', {})
- }
+ train_loader_cfg = {**train_dataloader_default_args, **cfg.data.get("train_dataloader", {})}
data_loaders = [build_dataloader(ds, **train_loader_cfg) for ds in dataset]
# put model on gpus
if distributed:
- find_unused_parameters = cfg.get('find_unused_parameters', False)
+ find_unused_parameters = cfg.get("find_unused_parameters", False)
# Sets the `find_unused_parameters` parameter in
# torch.nn.parallel.DistributedDataParallel
model = build_ddp(
model,
cfg.device,
- device_ids=[int(os.environ['LOCAL_RANK'])],
+ device_ids=[int(os.environ["LOCAL_RANK"])],
broadcast_buffers=False,
- find_unused_parameters=find_unused_parameters)
+ find_unused_parameters=find_unused_parameters,
+ )
else:
model = build_dp(model, cfg.device, device_ids=cfg.gpu_ids)
@@ -169,23 +166,22 @@ def train_detector(model,
runner = build_runner(
cfg.runner,
default_args=dict(
- model=model,
- optimizer=optimizer,
- work_dir=cfg.work_dir,
- logger=logger,
- meta=meta))
+ model=model, optimizer=optimizer, work_dir=cfg.work_dir, logger=logger, meta=meta
+ ),
+ )
# an ugly workaround to make .log and .log.json filenames the same
runner.timestamp = timestamp
# fp16 setting
- fp16_cfg = cfg.get('fp16', None)
- if fp16_cfg is None and cfg.get('device', None) == 'npu':
- fp16_cfg = dict(loss_scale='dynamic')
+ fp16_cfg = cfg.get("fp16", None)
+ if fp16_cfg is None and cfg.get("device", None) == "npu":
+ fp16_cfg = dict(loss_scale="dynamic")
if fp16_cfg is not None:
optimizer_config = Fp16OptimizerHook(
- **cfg.optimizer_config, **fp16_cfg, distributed=distributed)
- elif distributed and 'type' not in cfg.optimizer_config:
+ **cfg.optimizer_config, **fp16_cfg, distributed=distributed
+ )
+ elif distributed and "type" not in cfg.optimizer_config:
optimizer_config = OptimizerHook(**cfg.optimizer_config)
else:
optimizer_config = cfg.optimizer_config
@@ -196,8 +192,9 @@ def train_detector(model,
optimizer_config,
cfg.checkpoint_config,
cfg.log_config,
- cfg.get('momentum_config', None),
- custom_hooks_config=cfg.get('custom_hooks', None))
+ cfg.get("momentum_config", None),
+ custom_hooks_config=cfg.get("custom_hooks", None),
+ )
if distributed:
if isinstance(runner, EpochBasedRunner):
@@ -210,31 +207,27 @@ def train_detector(model,
workers_per_gpu=2,
dist=distributed,
shuffle=False,
- persistent_workers=False)
+ persistent_workers=False,
+ )
- val_dataloader_args = {
- **val_dataloader_default_args,
- **cfg.data.get('val_dataloader', {})
- }
+ val_dataloader_args = {**val_dataloader_default_args, **cfg.data.get("val_dataloader", {})}
# Support batch_size > 1 in validation
- if val_dataloader_args['samples_per_gpu'] > 1:
+ if val_dataloader_args["samples_per_gpu"] > 1:
# Replace 'ImageToTensor' to 'DefaultFormatBundle'
- cfg.data.val.pipeline = replace_ImageToTensor(
- cfg.data.val.pipeline)
+ cfg.data.val.pipeline = replace_ImageToTensor(cfg.data.val.pipeline)
val_dataset = build_dataset(cfg.data.val, dict(test_mode=True))
val_dataloader = build_dataloader(val_dataset, **val_dataloader_args)
- eval_cfg = cfg.get('evaluation', {})
- eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
+ eval_cfg = cfg.get("evaluation", {})
+ eval_cfg["by_epoch"] = cfg.runner["type"] != "IterBasedRunner"
eval_hook = DistEvalHook if distributed else EvalHook
# In this PR (https://github.com/open-mmlab/mmcv/pull/1193), the
# priority of IterTimerHook has been modified from 'NORMAL' to 'LOW'.
- runner.register_hook(
- eval_hook(val_dataloader, **eval_cfg), priority='LOW')
+ runner.register_hook(eval_hook(val_dataloader, **eval_cfg), priority="LOW")
resume_from = None
- if cfg.resume_from is None and cfg.get('auto_resume'):
+ if cfg.resume_from is None and cfg.get("auto_resume"):
resume_from = find_latest_checkpoint(cfg.work_dir)
if resume_from is not None:
cfg.resume_from = resume_from
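For reference, the linear scaling rule applied by `auto_scale_lr` above reduces to one line of arithmetic; a minimal sketch with made-up numbers:

```python
# Linear scaling rule (https://arxiv.org/abs/1706.02677): the LR grows
# proportionally with the effective batch size.
base_batch_size = 16  # e.g. 8 GPUs x 2 samples per GPU in the base config
base_lr = 0.02

num_gpus, samples_per_gpu = 4, 2  # hypothetical current setup
batch_size = num_gpus * samples_per_gpu  # 8

scaled_lr = (batch_size / base_batch_size) * base_lr
assert scaled_lr == 0.01
```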
diff --git a/mmdet/core/__init__.py b/mmdet/core/__init__.py
index 2a620387..847f94cb 100644
--- a/mmdet/core/__init__.py
+++ b/mmdet/core/__init__.py
@@ -7,4 +7,5 @@
from .mask import * # noqa: F401, F403
from .optimizers import * # noqa: F401, F403
from .post_processing import * # noqa: F401, F403
+from .runner import * # noqa: F401, F403
from .utils import * # noqa: F401, F403
diff --git a/mmdet/core/anchor/__init__.py b/mmdet/core/anchor/__init__.py
index fcc7e4af..a0cb8efc 100644
--- a/mmdet/core/anchor/__init__.py
+++ b/mmdet/core/anchor/__init__.py
@@ -1,14 +1,28 @@
# Copyright (c) OpenMMLab. All rights reserved.
-from .anchor_generator import (AnchorGenerator, LegacyAnchorGenerator,
- YOLOAnchorGenerator)
-from .builder import (ANCHOR_GENERATORS, PRIOR_GENERATORS,
- build_anchor_generator, build_prior_generator)
+from .anchor_generator import AnchorGenerator, LegacyAnchorGenerator, YOLOAnchorGenerator
+from .builder import (
+ ANCHOR_GENERATORS,
+ PRIOR_GENERATORS,
+ build_anchor_generator,
+ build_prior_generator,
+)
from .point_generator import MlvlPointGenerator, PointGenerator
-from .utils import anchor_inside_flags, calc_region, images_to_levels
+from .ranchor_generator import RAnchorGenerator
+from .utils import anchor_inside_flags, calc_region, images_to_levels, ranchor_inside_flags
__all__ = [
- 'AnchorGenerator', 'LegacyAnchorGenerator', 'anchor_inside_flags',
- 'PointGenerator', 'images_to_levels', 'calc_region',
- 'build_anchor_generator', 'ANCHOR_GENERATORS', 'YOLOAnchorGenerator',
- 'build_prior_generator', 'PRIOR_GENERATORS', 'MlvlPointGenerator'
+ "AnchorGenerator",
+ "LegacyAnchorGenerator",
+ "anchor_inside_flags",
+ "PointGenerator",
+ "images_to_levels",
+ "calc_region",
+ "build_anchor_generator",
+ "ANCHOR_GENERATORS",
+ "YOLOAnchorGenerator",
+ "build_prior_generator",
+ "PRIOR_GENERATORS",
+ "MlvlPointGenerator",
+ "RAnchorGenerator",
+ "ranchor_inside_flags",
]
diff --git a/mmdet/core/anchor/ranchor_generator.py b/mmdet/core/anchor/ranchor_generator.py
new file mode 100644
index 00000000..2f62ac63
--- /dev/null
+++ b/mmdet/core/anchor/ranchor_generator.py
@@ -0,0 +1,200 @@
+import torch
+
+from .anchor_generator import AnchorGenerator
+from .builder import ANCHOR_GENERATORS
+
+
+@ANCHOR_GENERATORS.register_module()
+class RAnchorGenerator(AnchorGenerator):
+ """Standard anchor generator for 2D anchor-based detectors.
+
+ Args:
+ strides (list[int] | list[tuple[int, int]]): Strides of anchors
+ in multiple feature levels.
+ ratios (list[float]): The list of ratios between the height and width
+ of anchors in a single level.
+ scales (list[int] | None): Anchor scales for anchors in a single level.
+ It cannot be set at the same time as `octave_base_scale`
+ and `scales_per_octave`.
+ angles (list[int] | None): Anchor angles (in degrees) for anchors in
+ a single level. If None, a single angle of 0 is used.
+ base_sizes (list[int] | None): The basic sizes
+ of anchors in multiple levels.
+ If None is given, strides will be used as base_sizes.
+ (If strides are non square, the shortest stride is taken.)
+ scale_major (bool): Whether to multiply scales first when generating
+ base anchors. If true, the anchors in the same row will have the
+ same scales. By default it is True in V2.0
+ octave_base_scale (int): The base scale of octave.
+ scales_per_octave (int): Number of scales for each octave.
+ `octave_base_scale` and `scales_per_octave` are usually used in
+ retinanet and the `scales` should be None when they are set.
+ centers (list[tuple[float, float]] | None): The centers of the anchor
+ relative to the feature grid center in multiple feature levels.
+ By default it is set to be None and not used. If a list of tuple of
+ float is given, they will be used to shift the centers of anchors.
+ center_offset (float): The offset of center in proportion to anchors'
+ width and height. By default it is 0 in V2.0.
+
+ Examples:
+ >>> from mmdet.core import RAnchorGenerator
+ >>> self = RAnchorGenerator(strides=[16], ratios=[1.], scales=[1.], angles=[0, 45, 90, 135], base_sizes=[9])
+ >>> all_anchors = self.grid_anchors([(2, 2)], device='cpu')
+ >>> print(all_anchors)
+ [tensor([[-4.5000, -4.5000, 4.5000, 4.5000],
+ [11.5000, -4.5000, 20.5000, 4.5000],
+ [-4.5000, 11.5000, 4.5000, 20.5000],
+ [11.5000, 11.5000, 20.5000, 20.5000]])]
+ >>> self = AnchorGenerator([16, 32], [1.], [1.], [9, 18])
+ >>> all_anchors = self.grid_anchors([(2, 2), (1, 1)], device='cpu')
+ >>> print(all_anchors)
+ [tensor([[-4.5000, -4.5000, 4.5000, 4.5000],
+ [11.5000, -4.5000, 20.5000, 4.5000],
+ [-4.5000, 11.5000, 4.5000, 20.5000],
+ [11.5000, 11.5000, 20.5000, 20.5000]]), \
+ tensor([[-9., -9., 9., 9.]])]
+ """
+
+ def __init__(self, angles=None, *args, **kwargs):
+ self.angles = torch.Tensor(angles) if angles is not None else torch.Tensor([0.0])
+
+ super(RAnchorGenerator, self).__init__(*args, **kwargs)
+
+ def gen_base_anchors(self):
+ """Generate base anchors.
+
+ Returns:
+ list(torch.Tensor): Base anchors of a feature grid in multiple \
+ feature levels.
+ """
+ multi_level_base_anchors = []
+ for i, base_size in enumerate(self.base_sizes):
+ center = None
+ if self.centers is not None:
+ center = self.centers[i]
+ multi_level_base_anchors.append(
+ self.gen_single_level_base_anchors(
+ base_size,
+ scales=self.scales,
+ ratios=self.ratios,
+ angles=self.angles,
+ center=center,
+ )
+ )
+ return multi_level_base_anchors
+
+ def gen_single_level_base_anchors(self, base_size, scales, ratios, angles=None, center=None):
+ """Generate base anchors of a single level.
+
+ Args:
+ base_size (int | float): Basic size of an anchor.
+ scales (torch.Tensor): Scales of the anchor.
+ ratios (torch.Tensor): The ratio between the height
+ and width of anchors in a single level.
+ angles (torch.Tensor, optional): Angles (in degrees) of the
+ anchors in a single level. Defaults to None.
+ center (tuple[float], optional): The center of the base anchor
+ related to a single feature grid. Defaults to None.
+
+ Returns:
+ torch.Tensor: Anchors in a single-level feature maps.
+ """
+ w = base_size
+ h = base_size
+ if center is None:
+ x_center = self.center_offset * w
+ y_center = self.center_offset * h
+ else:
+ x_center, y_center = center
+
+ h_ratios = torch.sqrt(ratios)
+ w_ratios = 1 / h_ratios
+ if self.scale_major:
+ ws = (w * w_ratios[:, None] * scales[None, :]).view(-1)
+ hs = (h * h_ratios[:, None] * scales[None, :]).view(-1)
+ else:
+ ws = (w * scales[:, None] * w_ratios[None, :]).view(-1)
+ hs = (h * scales[:, None] * h_ratios[None, :]).view(-1)
+
+ # use float anchor and the anchor's center is aligned with the
+ # pixel center
+ if angles is not None:
+ ws, _ = self._meshgrid(ws, angles)
+ hs, angles = self._meshgrid(hs, angles)
+ base_anchors = [
+ x_center - 0.5 * ws,
+ y_center - 0.5 * hs,
+ x_center + 0.5 * ws,
+ y_center + 0.5 * hs,
+ angles,
+ ]
+ base_anchors = torch.stack(base_anchors, dim=-1)
+ w = base_anchors[:, 2] - base_anchors[:, 0]
+ h = base_anchors[:, 3] - base_anchors[:, 1]
+ cx = base_anchors[:, 0] + 0.5 * w
+ cy = base_anchors[:, 1] + 0.5 * h
+ angle = base_anchors[:, 4]
+ base_anchors = torch.stack([cx, cy, w, h, angle], dim=-1)
+ else:
+ base_anchors = [
+ x_center - 0.5 * ws,
+ y_center - 0.5 * hs,
+ x_center + 0.5 * ws,
+ y_center + 0.5 * hs,
+ ]
+ base_anchors = torch.stack(base_anchors, dim=-1)
+
+ return base_anchors
+
+ def single_level_grid_anchors(self, base_anchors, featmap_size, stride=(16, 16), device="cuda"):
+ """Generate grid anchors of a single level.
+
+ Note:
+ This function is usually called by method ``self.grid_anchors``.
+
+ Args:
+ base_anchors (torch.Tensor): The base anchors of a feature grid.
+ featmap_size (tuple[int]): Size of the feature maps.
+ stride (tuple[int], optional): Stride of the feature map.
+ Defaults to (16, 16).
+ device (str, optional): Device the tensor will be put on.
+ Defaults to 'cuda'.
+
+ Returns:
+ torch.Tensor: Anchors in the overall feature maps.
+ """
+ feat_h, feat_w = featmap_size
+ shift_x = torch.arange(0, feat_w, device=device) * stride[0]
+ shift_y = torch.arange(0, feat_h, device=device) * stride[1]
+ shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
+ if self.angles is not None:
+ zero_shift = torch.zeros_like(shift_xx, device=device)
+ shifts = torch.stack([shift_xx, shift_yy, zero_shift, zero_shift, zero_shift], dim=-1)
+ else:
+ shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
+ shifts = shifts.type_as(base_anchors)
+ # first feat_w elements correspond to the first row of shifts
+ # add A anchors (1, A, C) to K shifts (K, 1, C) to get
+ # shifted anchors (K, A, C), reshape to (K*A, C),
+ # where C is 5 when angles are used and 4 otherwise
+
+ all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
+ all_anchors = all_anchors.view(-1, 5 if self.angles is not None else 4)
+ # first A rows correspond to A anchors of (0, 0) in feature map,
+ # then (0, 1), (0, 2), ...
+ return all_anchors
+
+ def __repr__(self):
+ """str: a string that describes the module"""
+ indent_str = " "
+ repr_str = self.__class__.__name__ + "(\n"
+ repr_str += f"{indent_str}strides={self.strides},\n"
+ repr_str += f"{indent_str}ratios={self.ratios},\n"
+ repr_str += f"{indent_str}scales={self.scales},\n"
+ repr_str += f"{indent_str}angles={self.angles},\n"
+ repr_str += f"{indent_str}base_sizes={self.base_sizes},\n"
+ repr_str += f"{indent_str}scale_major={self.scale_major},\n"
+ repr_str += f"{indent_str}octave_base_scale="
+ repr_str += f"{self.octave_base_scale},\n"
+ repr_str += f"{indent_str}scales_per_octave="
+ repr_str += f"{self.scales_per_octave},\n"
+ repr_str += f"{indent_str}num_levels={self.num_levels}\n"
+ repr_str += f"{indent_str}centers={self.centers},\n"
+ repr_str += f"{indent_str}center_offset={self.center_offset})"
+ return repr_str
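Assuming the fork installs like stock MMDetection, the generator can be exercised standalone; a small sketch (the feature-map size and stride are arbitrary):

```python
from mmdet.core import RAnchorGenerator

# One level, four angles; each anchor comes back as (cx, cy, w, h, angle).
gen = RAnchorGenerator(
    strides=[16], ratios=[1.0], scales=[1.0],
    angles=[0, 45, 90, 135], base_sizes=[9],
)
anchors = gen.grid_anchors([(2, 2)], device="cpu")[0]
assert anchors.shape == (2 * 2 * 4, 5)  # 4 grid cells x 4 angles, 5 params each
```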
diff --git a/mmdet/core/anchor/utils.py b/mmdet/core/anchor/utils.py
index c2f20247..6bb498fa 100644
--- a/mmdet/core/anchor/utils.py
+++ b/mmdet/core/anchor/utils.py
@@ -18,10 +18,7 @@ def images_to_levels(target, num_levels):
return level_targets
-def anchor_inside_flags(flat_anchors,
- valid_flags,
- img_shape,
- allowed_border=0):
+def anchor_inside_flags(flat_anchors, valid_flags, img_shape, allowed_border=0):
"""Check whether the anchors are inside the border.
Args:
@@ -37,11 +34,42 @@ def anchor_inside_flags(flat_anchors,
"""
img_h, img_w = img_shape[:2]
if allowed_border >= 0:
- inside_flags = valid_flags & \
- (flat_anchors[:, 0] >= -allowed_border) & \
- (flat_anchors[:, 1] >= -allowed_border) & \
- (flat_anchors[:, 2] < img_w + allowed_border) & \
- (flat_anchors[:, 3] < img_h + allowed_border)
+ inside_flags = (
+ valid_flags
+ & (flat_anchors[:, 0] >= -allowed_border)
+ & (flat_anchors[:, 1] >= -allowed_border)
+ & (flat_anchors[:, 2] < img_w + allowed_border)
+ & (flat_anchors[:, 3] < img_h + allowed_border)
+ )
+ else:
+ inside_flags = valid_flags
+ return inside_flags
+
+
+def ranchor_inside_flags(flat_ranchors, valid_flags, img_shape, allowed_border=0):
+ """Check whether the ranchors are inside the border.
+
+ Args:
+ flat_ranchors (torch.Tensor): Flattened anchors with shape (n, 5),
+ in (cx, cy, w, h, angle) format.
+ valid_flags (torch.Tensor): An existing valid flags of anchors.
+ img_shape (tuple(int)): Shape of current image.
+ allowed_border (int, optional): The border to allow the valid anchor.
+ Defaults to 0.
+
+ Returns:
+ torch.Tensor: Flags indicating whether the anchors are inside a \
+ valid range.
+ """
+ img_h, img_w = img_shape[:2]
+ if allowed_border >= 0:
+ cx, cy = (flat_ranchors[:, i] for i in range(2))
+ inside_flags = (
+ valid_flags
+ & (cx >= -allowed_border)
+ & (cy >= -allowed_border)
+ & (cx < img_w + allowed_border)
+ & (cy < img_h + allowed_border)
+ )
else:
inside_flags = valid_flags
return inside_flags
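Because `ranchor_inside_flags` validates rotated anchors by their center point only, an anchor whose extent crosses the border still counts as inside; a minimal sketch:

```python
import torch

from mmdet.core import ranchor_inside_flags

# Three (cx, cy, w, h, angle) anchors on a 100x100 image.
ranchors = torch.tensor([
    [50.0, 50.0, 20.0, 10.0, 45.0],   # center well inside
    [98.0, 50.0, 20.0, 10.0, 0.0],    # extent crosses the border, center inside
    [120.0, 50.0, 20.0, 10.0, 0.0],   # center outside
])
valid = torch.ones(3, dtype=torch.bool)
flags = ranchor_inside_flags(ranchors, valid, img_shape=(100, 100))
assert flags.tolist() == [True, True, False]
```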
diff --git a/mmdet/core/bbox/coder/__init__.py b/mmdet/core/bbox/coder/__init__.py
index e12fd64e..2dfc7926 100644
--- a/mmdet/core/bbox/coder/__init__.py
+++ b/mmdet/core/bbox/coder/__init__.py
@@ -1,6 +1,10 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .base_bbox_coder import BaseBBoxCoder
from .bucketing_bbox_coder import BucketingBBoxCoder
+from .delta_height_coder import DeltaHeightCoder
+from .delta_polar_offset_coder import DeltaPolarOffsetCoder
+from .delta_rbbox_coder import DeltaRBBoxCoder
+from .delta_xy_offset_coder import DeltaXYOffsetCoder
from .delta_xywh_bbox_coder import DeltaXYWHBBoxCoder
from .distance_point_bbox_coder import DistancePointBBoxCoder
from .legacy_delta_xywh_bbox_coder import LegacyDeltaXYWHBBoxCoder
@@ -9,7 +13,16 @@
from .yolo_bbox_coder import YOLOBBoxCoder
__all__ = [
- 'BaseBBoxCoder', 'PseudoBBoxCoder', 'DeltaXYWHBBoxCoder',
- 'LegacyDeltaXYWHBBoxCoder', 'TBLRBBoxCoder', 'YOLOBBoxCoder',
- 'BucketingBBoxCoder', 'DistancePointBBoxCoder'
+ "BaseBBoxCoder",
+ "PseudoBBoxCoder",
+ "DeltaXYWHBBoxCoder",
+ "LegacyDeltaXYWHBBoxCoder",
+ "TBLRBBoxCoder",
+ "YOLOBBoxCoder",
+ "BucketingBBoxCoder",
+ "DistancePointBBoxCoder",
+ "DeltaXYOffsetCoder",
+ "DeltaPolarOffsetCoder",
+ "DeltaRBBoxCoder",
+ "DeltaHeightCoder",
]
diff --git a/mmdet/core/bbox/coder/delta_height_coder.py b/mmdet/core/bbox/coder/delta_height_coder.py
new file mode 100644
index 00000000..e76d5ee7
--- /dev/null
+++ b/mmdet/core/bbox/coder/delta_height_coder.py
@@ -0,0 +1,60 @@
+import torch
+
+from ..builder import BBOX_CODERS
+from .base_bbox_coder import BaseBBoxCoder
+
+
+@BBOX_CODERS.register_module()
+class DeltaHeightCoder(BaseBBoxCoder):
+ def __init__(self, target_means=(0.0,), target_stds=(0.5,)):
+ super(BaseBBoxCoder, self).__init__()
+ self.means = target_means
+ self.stds = target_stds
+
+ def encode(self, bboxes, gt_heights):
+ assert bboxes.size(0) == gt_heights.size(0)
+ assert gt_heights.size(-1) == 1
+ encoded_offsets = height2delta(bboxes, gt_heights, self.means, self.stds)
+ return encoded_offsets
+
+ def decode(self, bboxes, pred_heights):
+ assert pred_heights.size(0) == bboxes.size(0)
+ decoded_heights = delta2height(bboxes, pred_heights, self.means, self.stds)
+
+ return decoded_heights
+
+
+def height2delta(proposals, gt, means=(0.0,), stds=(0.5,)):
+ assert proposals.size()[0] == gt.size()[0]
+
+ proposals = proposals.float()
+ gt = gt.float()
+ pw = proposals[..., 2] - proposals[..., 0]
+ ph = proposals[..., 3] - proposals[..., 1]
+
+ gh = gt[..., 0]
+
+ pl = torch.sqrt(pw * pw + ph * ph)
+ dh = gh / pl
+ deltas = torch.stack([dh], dim=-1)
+
+ means = deltas.new_tensor(means).unsqueeze(0)
+ stds = deltas.new_tensor(stds).unsqueeze(0)
+ deltas = deltas.sub_(means).div_(stds)
+
+ return deltas
+
+
+def delta2height(rois, deltas, means=(0.0,), stds=(1.0,)):
+ means = deltas.new_tensor(means).unsqueeze(0)
+ stds = deltas.new_tensor(stds).unsqueeze(0)
+ dh = deltas * stds + means
+
+ # Compute width/height of each roi
+ pw = (rois[:, 2] - rois[:, 0]).unsqueeze(1).expand_as(dh)
+ ph = (rois[:, 3] - rois[:, 1]).unsqueeze(1).expand_as(dh)
+
+ pl = torch.sqrt(pw * pw + ph * ph)
+ heights = dh * pl
+
+ return heights
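`DeltaHeightCoder` normalizes a ground-truth height by the diagonal of the proposal box; a round-trip sketch with arbitrary values:

```python
import torch

from mmdet.core.bbox.coder import DeltaHeightCoder

coder = DeltaHeightCoder(target_means=(0.0,), target_stds=(0.5,))

# One 30x40 proposal (diagonal 50) and a ground-truth height of 25.
bboxes = torch.tensor([[0.0, 0.0, 30.0, 40.0]])
gt_heights = torch.tensor([[25.0]])

deltas = coder.encode(bboxes, gt_heights)  # (25 / 50 - 0.0) / 0.5 = 1.0
decoded = coder.decode(bboxes, deltas)
assert torch.allclose(decoded, gt_heights)
```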
diff --git a/mmdet/core/bbox/coder/delta_polar_offset_coder.py b/mmdet/core/bbox/coder/delta_polar_offset_coder.py
new file mode 100644
index 00000000..b254ecc2
--- /dev/null
+++ b/mmdet/core/bbox/coder/delta_polar_offset_coder.py
@@ -0,0 +1,96 @@
+# -*- encoding: utf-8 -*-
+"""
+@File : delta_polar_offset_coder.py
+@Time : 2021/01/17 17:30:31
+@Author : Jinwang Wang
+@Version : 1.0
+@Contact : jwwangchn@163.com
+@License : (C)Copyright 2017-2021
+@Desc : encode offset in polar coordinate
+"""
+
+import torch
+
+from ..builder import BBOX_CODERS
+from .base_bbox_coder import BaseBBoxCoder
+
+
+@BBOX_CODERS.register_module()
+class DeltaPolarOffsetCoder(BaseBBoxCoder):
+ def __init__(self, target_means=(0.0, 0.0), target_stds=(0.5, 0.5), with_bbox=True):
+ super(BaseBBoxCoder, self).__init__()
+ self.means = target_means
+ self.stds = target_stds
+ self.with_bbox = with_bbox
+
+ def encode(self, bboxes, gt_offsets):
+ assert bboxes.size(0) == gt_offsets.size(0)
+ assert gt_offsets.size(-1) == 2
+ encoded_offsets = offset2delta(bboxes, gt_offsets, self.means, self.stds, self.with_bbox)
+ return encoded_offsets
+
+ def decode(self, bboxes, pred_offsets, max_shape=None, wh_ratio_clip=16 / 1000):
+ assert pred_offsets.size(0) == bboxes.size(0)
+ decoded_offsets = delta2offset(
+ bboxes, pred_offsets, self.means, self.stds, max_shape, wh_ratio_clip, self.with_bbox
+ )
+
+ return decoded_offsets
+
+
+def offset2delta(proposals, gt, means=(0.0, 0.0), stds=(0.5, 0.5), with_bbox=True):
+ assert proposals.size()[0] == gt.size()[0]
+
+ proposals = proposals.float()
+ gt = gt.float()
+ proposal_w = proposals[..., 2] - proposals[..., 0]
+ proposal_h = proposals[..., 3] - proposals[..., 1]
+
+ gt_length = gt[..., 0]
+ gt_angle = gt[..., 1]
+
+ proposal_length = torch.sqrt(proposal_w**2 + proposal_h**2)
+
+ if with_bbox:
+ delta_length = gt_length / proposal_length
+ else:
+ delta_length = gt_length
+ delta_angle = gt_angle
+ deltas = torch.stack([delta_length, delta_angle], dim=-1)
+
+ means = deltas.new_tensor(means).unsqueeze(0)
+ stds = deltas.new_tensor(stds).unsqueeze(0)
+ deltas = deltas.sub_(means).div_(stds)
+
+ return deltas
+
+
+def delta2offset(
+ rois,
+ deltas,
+ means=(0.0, 0.0),
+ stds=(1.0, 1.0),
+ max_shape=None,
+ wh_ratio_clip=16 / 1000,
+ with_bbox=True,
+):
+ means = deltas.new_tensor(means).repeat(1, deltas.size(1) // 2)
+ stds = deltas.new_tensor(stds).repeat(1, deltas.size(1) // 2)
+ denorm_deltas = deltas * stds + means
+ delta_length = denorm_deltas[:, 0::2]
+ delta_angle = denorm_deltas[:, 1::2]
+ # Compute width/height of each roi
+ proposal_w = (rois[:, 2] - rois[:, 0]).unsqueeze(1).expand_as(delta_length)
+ proposal_h = (rois[:, 3] - rois[:, 1]).unsqueeze(1).expand_as(delta_angle)
+ # The proposal diagonal is the normalization factor for the offset length
+ proposal_length = torch.sqrt(proposal_w**2 + proposal_h**2)
+ if with_bbox:
+ gt_length = proposal_length * delta_length
+ else:
+ gt_length = delta_length
+ gt_angle = delta_angle
+ if max_shape is not None:
+ gt_length = gt_length.clamp(min=-max_shape[1], max=max_shape[1])
+ gt_angle = gt_angle.clamp(min=-max_shape[0], max=max_shape[0])
+ bboxes = torch.stack([gt_length, gt_angle], dim=-1).view_as(deltas)
+ return bboxes
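`DeltaPolarOffsetCoder` normalizes only the offset length by the proposal diagonal (when `with_bbox=True`) and passes the angle through; a round-trip sketch with arbitrary values:

```python
import torch

from mmdet.core.bbox.coder import DeltaPolarOffsetCoder

coder = DeltaPolarOffsetCoder(target_means=(0.0, 0.0), target_stds=(0.5, 0.5))

# One 30x40 proposal (diagonal 50); an offset of length 10 at 90 degrees.
bboxes = torch.tensor([[0.0, 0.0, 30.0, 40.0]])
gt_offsets = torch.tensor([[10.0, 90.0]])

deltas = coder.encode(bboxes, gt_offsets)  # length: (10/50)/0.5, angle: 90/0.5
decoded = coder.decode(bboxes, deltas)
assert torch.allclose(decoded, gt_offsets)
```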
diff --git a/mmdet/core/bbox/coder/delta_rbbox_coder.py b/mmdet/core/bbox/coder/delta_rbbox_coder.py
new file mode 100644
index 00000000..f080189e
--- /dev/null
+++ b/mmdet/core/bbox/coder/delta_rbbox_coder.py
@@ -0,0 +1,130 @@
+# -*- encoding: utf-8 -*-
+"""
+@File : delta_rbbox_coder.py
+@Time : 2021/01/17 17:31:03
+@Author : Jinwang Wang
+@Version : 1.0
+@Contact : jwwangchn@163.com
+@License : (C)Copyright 2017-2021
+@Desc : rbbox encoder
+"""
+
+import numpy as np
+import torch
+
+from ..builder import BBOX_CODERS
+from .base_bbox_coder import BaseBBoxCoder
+
+
+@BBOX_CODERS.register_module()
+class DeltaRBBoxCoder(BaseBBoxCoder):
+ def __init__(
+ self,
+ target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
+ target_stds=(1.0, 1.0, 1.0, 1.0, 1.0),
+ encode_method="thetaobb",
+ ):
+ super(BaseBBoxCoder, self).__init__()
+ self.means = target_means
+ self.stds = target_stds
+ self.encode_method = encode_method
+
+ def encode(self, rbboxes, gt_rbboxes):
+ assert rbboxes.size(0) == gt_rbboxes.size(0)
+ if self.encode_method == "thetaobb":
+ assert rbboxes.size(-1) == gt_rbboxes.size(-1) == 5
+ encoded_rbboxes = thetaobb2delta(rbboxes, gt_rbboxes, self.means, self.stds)
+ else:
+ raise (RuntimeError("do not support the encode mthod: {}".format(self.encode_method)))
+
+ return encoded_rbboxes
+
+ def decode(self, rbboxes, pred_rbboxes, max_shape=None, wh_ratio_clip=16 / 1000):
+ assert pred_rbboxes.size(0) == rbboxes.size(0)
+ if self.encode_method == "thetaobb":
+ decoded_rbboxes = delta2thetaobb(
+ rbboxes, pred_rbboxes, self.means, self.stds, max_shape, wh_ratio_clip
+ )
+ else:
+ raise (RuntimeError("do not support the encode mthod: {}".format(self.encode_method)))
+
+ return decoded_rbboxes
+
+
+def thetaobb2delta(proposals, gt, means=(0.0, 0.0, 0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0, 1.0, 1.0)):
+ # proposals: (cx, cy, w, h, theta)
+ # gt: (cx, cy, w, h, theta)
+ assert proposals.size(0) == gt.size(0)
+
+ proposals = proposals.float()
+ gt = gt.float()
+
+ px = proposals[..., 0]
+ py = proposals[..., 1]
+ pw = proposals[..., 2]
+ ph = proposals[..., 3]
+ pa = proposals[..., 4]
+
+ gx = gt[..., 0]
+ gy = gt[..., 1]
+ gw = gt[..., 2]
+ gh = gt[..., 3]
+ ga = gt[..., 4]
+
+ dx = (gx - px) / pw
+ dy = (gy - py) / ph
+ dw = torch.log(gw / pw)
+ dh = torch.log(gh / ph)
+ da = (ga - pa) * np.pi / 180
+
+ deltas = torch.stack([dx, dy, dw, dh, da], dim=-1)
+
+ means = deltas.new_tensor(means).unsqueeze(0)
+ stds = deltas.new_tensor(stds).unsqueeze(0)
+ deltas = deltas.sub_(means).div_(stds)
+
+ return deltas
+
+
+def delta2thetaobb(
+ rois,
+ deltas,
+ means=[0.0, 0.0, 0.0, 0.0, 0.0],
+ stds=[1.0, 1.0, 1.0, 1.0, 1.0],
+ max_shape=None,
+ wh_ratio_clip=16 / 1000,
+):
+ means = deltas.new_tensor(means).repeat(1, deltas.size(1) // 5)
+ stds = deltas.new_tensor(stds).repeat(1, deltas.size(1) // 5)
+ denorm_deltas = deltas * stds + means
+
+ dx = denorm_deltas[:, 0::5]
+ dy = denorm_deltas[:, 1::5]
+ dw = denorm_deltas[:, 2::5]
+ dh = denorm_deltas[:, 3::5]
+ da = denorm_deltas[:, 4::5] * 180.0 / np.pi
+
+ max_ratio = np.abs(np.log(wh_ratio_clip))
+
+ dw = dw.clamp(min=-max_ratio, max=max_ratio)
+ dh = dh.clamp(min=-max_ratio, max=max_ratio)
+
+ px = (rois[:, 0]).unsqueeze(1).expand_as(dx)
+ py = (rois[:, 1]).unsqueeze(1).expand_as(dy)
+ pw = (rois[:, 2]).unsqueeze(1).expand_as(dw)
+ ph = (rois[:, 3]).unsqueeze(1).expand_as(dh)
+ pa = (rois[:, 4]).unsqueeze(1).expand_as(da)
+
+ gw = pw * dw.exp()
+ gh = ph * dh.exp()
+ gx = px + pw * dx
+ gy = py + ph * dy
+ ga = da + pa
+
+ if max_shape is not None:
+ gx = gx.clamp(min=0, max=max_shape[1])
+ gy = gy.clamp(min=0, max=max_shape[0])
+ gw = gw.clamp(min=0, max=max_shape[1])
+ gh = gh.clamp(min=0, max=max_shape[0])
+ thetaobbs = torch.stack([gx, gy, gw, gh, ga], dim=-1).view_as(deltas)
+ return thetaobbs
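`DeltaRBBoxCoder` expects both boxes in (cx, cy, w, h, theta) form, converting the angle difference from degrees to radians before normalization; a round-trip sketch with arbitrary values:

```python
import torch

from mmdet.core.bbox.coder import DeltaRBBoxCoder

coder = DeltaRBBoxCoder()  # default "thetaobb" encoding

# Proposal and ground truth as (cx, cy, w, h, theta-in-degrees).
rbboxes = torch.tensor([[50.0, 50.0, 20.0, 10.0, 0.0]])
gt_rbboxes = torch.tensor([[55.0, 52.0, 24.0, 12.0, 30.0]])

deltas = coder.encode(rbboxes, gt_rbboxes)
decoded = coder.decode(rbboxes, deltas)
assert torch.allclose(decoded, gt_rbboxes, atol=1e-4)
```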
diff --git a/mmdet/core/bbox/coder/delta_xy_offset_coder.py b/mmdet/core/bbox/coder/delta_xy_offset_coder.py
new file mode 100644
index 00000000..f50bdda0
--- /dev/null
+++ b/mmdet/core/bbox/coder/delta_xy_offset_coder.py
@@ -0,0 +1,80 @@
+# -*- encoding: utf-8 -*-
+"""
+@File : delta_xy_offset_coder.py
+@Time : 2021/01/17 17:31:16
+@Author : Jinwang Wang
+@Version : 1.0
+@Contact : jwwangchn@163.com
+@License : (C)Copyright 2017-2021
+@Desc : encode offset in (x, y) coordinate
+"""
+
+import torch
+
+from ..builder import BBOX_CODERS
+from .base_bbox_coder import BaseBBoxCoder
+
+
+@BBOX_CODERS.register_module()
+class DeltaXYOffsetCoder(BaseBBoxCoder):
+ def __init__(self, target_means=(0.0, 0.0), target_stds=(0.5, 0.5)):
+ super(BaseBBoxCoder, self).__init__()
+ self.means = target_means
+ self.stds = target_stds
+
+ def encode(self, bboxes, gt_offsets):
+ assert bboxes.size(0) == gt_offsets.size(0)
+ assert gt_offsets.size(-1) == 2
+ encoded_offsets = offset2delta(bboxes, gt_offsets, self.means, self.stds)
+ return encoded_offsets
+
+ def decode(self, bboxes, pred_offsets, max_shape=None, wh_ratio_clip=16 / 1000):
+ assert pred_offsets.size(0) == bboxes.size(0)
+ decoded_offsets = delta2offset(
+ bboxes, pred_offsets, self.means, self.stds, max_shape, wh_ratio_clip
+ )
+
+ return decoded_offsets
+
+
+def offset2delta(proposals, gt, means=(0.0, 0.0), stds=(0.5, 0.5)):
+ assert proposals.size()[0] == gt.size()[0]
+
+ proposals = proposals.float()
+ gt = gt.float()
+ pw = proposals[..., 2] - proposals[..., 0]
+ ph = proposals[..., 3] - proposals[..., 1]
+
+ gx = gt[..., 0]
+ gy = gt[..., 1]
+
+ dx = gx / pw
+ dy = gy / ph
+ deltas = torch.stack([dx, dy], dim=-1)
+
+ means = deltas.new_tensor(means).unsqueeze(0)
+ stds = deltas.new_tensor(stds).unsqueeze(0)
+ deltas = deltas.sub_(means).div_(stds)
+
+ return deltas
+
+
+def delta2offset(
+ rois, deltas, means=(0.0, 0.0), stds=(1.0, 1.0), max_shape=None, wh_ratio_clip=16 / 1000
+):
+ means = deltas.new_tensor(means).repeat(1, deltas.size(1) // 2)
+ stds = deltas.new_tensor(stds).repeat(1, deltas.size(1) // 2)
+ denorm_deltas = deltas * stds + means
+ dx = denorm_deltas[:, 0::2]
+ dy = denorm_deltas[:, 1::2]
+ # Compute width/height of each roi
+ pw = (rois[:, 2] - rois[:, 0]).unsqueeze(1).expand_as(dx)
+ ph = (rois[:, 3] - rois[:, 1]).unsqueeze(1).expand_as(dy)
+ # Recover absolute offsets by scaling the deltas with the roi size
+ gx = pw * dx
+ gy = ph * dy
+ if max_shape is not None:
+ gx = gx.clamp(min=-max_shape[1], max=max_shape[1])
+ gy = gy.clamp(min=-max_shape[0], max=max_shape[0])
+ offsets = torch.stack([gx, gy], dim=-1).view_as(deltas)
+ return offsets
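`DeltaXYOffsetCoder` scales offsets by the proposal width and height; when `max_shape` is given to `decode`, the recovered offsets are clamped to the image extent. A sketch with arbitrary values:

```python
import torch

from mmdet.core.bbox.coder import DeltaXYOffsetCoder

coder = DeltaXYOffsetCoder(target_means=(0.0, 0.0), target_stds=(0.5, 0.5))

bboxes = torch.tensor([[0.0, 0.0, 20.0, 10.0]])  # w=20, h=10
gt_offsets = torch.tensor([[4.0, -2.0]])

deltas = coder.encode(bboxes, gt_offsets)  # (4/20)/0.5 = 0.4, (-2/10)/0.5 = -0.4
decoded = coder.decode(bboxes, deltas, max_shape=(100, 100))
assert torch.allclose(decoded, gt_offsets)
```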
diff --git a/mmdet/core/bbox/samplers/sampling_result.py b/mmdet/core/bbox/samplers/sampling_result.py
index 11a02c5d..075aee03 100644
--- a/mmdet/core/bbox/samplers/sampling_result.py
+++ b/mmdet/core/bbox/samplers/sampling_result.py
@@ -23,8 +23,7 @@ class SamplingResult(util_mixins.NiceRepr):
})>
"""
- def __init__(self, pos_inds, neg_inds, bboxes, gt_bboxes, assign_result,
- gt_flags):
+ def __init__(self, pos_inds, neg_inds, bboxes, gt_bboxes, assign_result, gt_flags):
self.pos_inds = pos_inds
self.neg_inds = neg_inds
self.pos_bboxes = bboxes[pos_inds]
@@ -71,23 +70,23 @@ def to(self, device):
def __nice__(self):
data = self.info.copy()
- data['pos_bboxes'] = data.pop('pos_bboxes').shape
- data['neg_bboxes'] = data.pop('neg_bboxes').shape
+ data["pos_bboxes"] = data.pop("pos_bboxes").shape
+ data["neg_bboxes"] = data.pop("neg_bboxes").shape
parts = [f"'{k}': {v!r}" for k, v in sorted(data.items())]
- body = ' ' + ',\n '.join(parts)
- return '{\n' + body + '\n}'
+ body = " " + ",\n ".join(parts)
+ return "{\n" + body + "\n}"
@property
def info(self):
"""Returns a dictionary of info about the object."""
return {
- 'pos_inds': self.pos_inds,
- 'neg_inds': self.neg_inds,
- 'pos_bboxes': self.pos_bboxes,
- 'neg_bboxes': self.neg_bboxes,
- 'pos_is_gt': self.pos_is_gt,
- 'num_gts': self.num_gts,
- 'pos_assigned_gt_inds': self.pos_assigned_gt_inds,
+ "pos_inds": self.pos_inds,
+ "neg_inds": self.neg_inds,
+ "pos_bboxes": self.pos_bboxes,
+ "neg_bboxes": self.neg_bboxes,
+ "pos_is_gt": self.pos_is_gt,
+ "num_gts": self.num_gts,
+ "pos_assigned_gt_inds": self.pos_assigned_gt_inds,
}
@classmethod
@@ -115,6 +114,7 @@ def random(cls, rng=None, **kwargs):
from mmdet.core.bbox import demodata
from mmdet.core.bbox.assigners.assign_result import AssignResult
from mmdet.core.bbox.samplers.random_sampler import RandomSampler
+
rng = demodata.ensure_rng(rng)
# make probabilistic?
@@ -148,6 +148,18 @@ def random(cls, rng=None, **kwargs):
pos_fraction,
neg_pos_ub=neg_pos_ub,
add_gt_as_proposals=add_gt_as_proposals,
- rng=rng)
+ rng=rng,
+ )
self = sampler.sample(assign_result, bboxes, gt_bboxes, gt_labels)
return self
+
+ def refresh_gt_bboxes(self, gt_bboxes):
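+ """Recompute ``pos_gt_bboxes`` from updated ``gt_bboxes`` using the stored positive assignment indices."""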
+ if gt_bboxes.numel() == 0:
+ # hack for index error case
+ assert self.pos_assigned_gt_inds.numel() == 0
+ self.pos_gt_bboxes = torch.empty_like(gt_bboxes).view(-1, 4)
+ else:
+ if len(gt_bboxes.shape) < 2:
+ gt_bboxes = gt_bboxes.view(-1, 4)
+
+ self.pos_gt_bboxes = gt_bboxes[self.pos_assigned_gt_inds.long(), :]
diff --git a/mmdet/core/mask/structures.py b/mmdet/core/mask/structures.py
index 7e730dc5..c3e83e69 100644
--- a/mmdet/core/mask/structures.py
+++ b/mmdet/core/mask/structures.py
@@ -13,7 +13,7 @@ class BaseInstanceMasks(metaclass=ABCMeta):
"""Base class for instance masks."""
@abstractmethod
- def rescale(self, scale, interpolation='nearest'):
+ def rescale(self, scale, interpolation="nearest"):
"""Rescale masks as large as possible while keeping the aspect ratio.
For details can refer to `mmcv.imrescale`.
@@ -26,7 +26,7 @@ def rescale(self, scale, interpolation='nearest'):
"""
@abstractmethod
- def resize(self, out_shape, interpolation='nearest'):
+ def resize(self, out_shape, interpolation="nearest"):
"""Resize masks to the given out_shape.
Args:
@@ -38,7 +38,7 @@ def resize(self, out_shape, interpolation='nearest'):
"""
@abstractmethod
- def flip(self, flip_direction='horizontal'):
+ def flip(self, flip_direction="horizontal"):
"""Flip masks alone the given direction.
Args:
@@ -72,13 +72,9 @@ def crop(self, bbox):
"""
@abstractmethod
- def crop_and_resize(self,
- bboxes,
- out_shape,
- inds,
- device,
- interpolation='bilinear',
- binarize=True):
+ def crop_and_resize(
+ self, bboxes, out_shape, inds, device, interpolation="bilinear", binarize=True
+ ):
"""Crop and resize masks by the given bboxes.
This function is mainly used in mask targets computation.
@@ -104,6 +100,10 @@ def crop_and_resize(self,
def expand(self, expanded_h, expanded_w, top, left):
"""see :class:`Expand`."""
+ def translation(self, offsets, inds):
+ """Translate masks by per-instance offsets; see :meth:`BitmapMasks.translation`."""
+ raise NotImplementedError
+
@property
@abstractmethod
def areas(self):
@@ -130,12 +130,9 @@ def to_tensor(self, dtype, device):
"""
@abstractmethod
- def translate(self,
- out_shape,
- offset,
- direction='horizontal',
- fill_val=0,
- interpolation='bilinear'):
+ def translate(
+ self, out_shape, offset, direction="horizontal", fill_val=0, interpolation="bilinear"
+ ):
"""Translate the masks.
Args:
@@ -150,12 +147,9 @@ def translate(self,
Translated masks.
"""
- def shear(self,
- out_shape,
- magnitude,
- direction='horizontal',
- border_value=0,
- interpolation='bilinear'):
+ def shear(
+ self, out_shape, magnitude, direction="horizontal", border_value=0, interpolation="bilinear"
+ ):
"""Shear the masks.
Args:
@@ -252,52 +246,51 @@ def __iter__(self):
return iter(self.masks)
def __repr__(self):
- s = self.__class__.__name__ + '('
- s += f'num_masks={len(self.masks)}, '
- s += f'height={self.height}, '
- s += f'width={self.width})'
+ s = self.__class__.__name__ + "("
+ s += f"num_masks={len(self.masks)}, "
+ s += f"height={self.height}, "
+ s += f"width={self.width})"
return s
def __len__(self):
"""Number of masks."""
return len(self.masks)
- def rescale(self, scale, interpolation='nearest'):
+ def rescale(self, scale, interpolation="nearest"):
"""See :func:`BaseInstanceMasks.rescale`."""
if len(self.masks) == 0:
new_w, new_h = mmcv.rescale_size((self.width, self.height), scale)
rescaled_masks = np.empty((0, new_h, new_w), dtype=np.uint8)
else:
- rescaled_masks = np.stack([
- mmcv.imrescale(mask, scale, interpolation=interpolation)
- for mask in self.masks
- ])
+ rescaled_masks = np.stack(
+ [mmcv.imrescale(mask, scale, interpolation=interpolation) for mask in self.masks]
+ )
height, width = rescaled_masks.shape[1:]
return BitmapMasks(rescaled_masks, height, width)
- def resize(self, out_shape, interpolation='nearest'):
+ def resize(self, out_shape, interpolation="nearest"):
"""See :func:`BaseInstanceMasks.resize`."""
if len(self.masks) == 0:
resized_masks = np.empty((0, *out_shape), dtype=np.uint8)
else:
- resized_masks = np.stack([
- mmcv.imresize(
- mask, out_shape[::-1], interpolation=interpolation)
- for mask in self.masks
- ])
+ resized_masks = np.stack(
+ [
+ mmcv.imresize(mask, out_shape[::-1], interpolation=interpolation)
+ for mask in self.masks
+ ]
+ )
return BitmapMasks(resized_masks, *out_shape)
- def flip(self, flip_direction='horizontal'):
+ def flip(self, flip_direction="horizontal"):
"""See :func:`BaseInstanceMasks.flip`."""
- assert flip_direction in ('horizontal', 'vertical', 'diagonal')
+ assert flip_direction in ("horizontal", "vertical", "diagonal")
if len(self.masks) == 0:
flipped_masks = self.masks
else:
- flipped_masks = np.stack([
- mmcv.imflip(mask, direction=flip_direction)
- for mask in self.masks
- ])
+ flipped_masks = np.stack(
+ [mmcv.imflip(mask, direction=flip_direction) for mask in self.masks]
+ )
return BitmapMasks(flipped_masks, self.height, self.width)
def pad(self, out_shape, pad_val=0):
@@ -305,10 +298,9 @@ def pad(self, out_shape, pad_val=0):
if len(self.masks) == 0:
padded_masks = np.empty((0, *out_shape), dtype=np.uint8)
else:
- padded_masks = np.stack([
- mmcv.impad(mask, shape=out_shape, pad_val=pad_val)
- for mask in self.masks
- ])
+ padded_masks = np.stack(
+ [mmcv.impad(mask, shape=out_shape, pad_val=pad_val) for mask in self.masks]
+ )
return BitmapMasks(padded_masks, *out_shape)
def crop(self, bbox):
@@ -327,16 +319,12 @@ def crop(self, bbox):
if len(self.masks) == 0:
cropped_masks = np.empty((0, h, w), dtype=np.uint8)
else:
- cropped_masks = self.masks[:, y1:y1 + h, x1:x1 + w]
+ cropped_masks = self.masks[:, y1 : y1 + h, x1 : x1 + w]
return BitmapMasks(cropped_masks, h, w)
- def crop_and_resize(self,
- bboxes,
- out_shape,
- inds,
- device='cpu',
- interpolation='bilinear',
- binarize=True):
+ def crop_and_resize(
+ self, bboxes, out_shape, inds, device="cpu", interpolation="bilinear", binarize=True
+ ):
"""See :func:`BaseInstanceMasks.crop_and_resize`."""
if len(self.masks) == 0:
empty_masks = np.empty((0, *out_shape), dtype=np.uint8)
@@ -349,15 +337,16 @@ def crop_and_resize(self,
inds = torch.from_numpy(inds).to(device=device)
num_bbox = bboxes.shape[0]
- fake_inds = torch.arange(
- num_bbox, device=device).to(dtype=bboxes.dtype)[:, None]
+ fake_inds = torch.arange(num_bbox, device=device).to(dtype=bboxes.dtype)[:, None]
rois = torch.cat([fake_inds, bboxes], dim=1) # Nx5
rois = rois.to(device=device)
if num_bbox > 0:
- gt_masks_th = torch.from_numpy(self.masks).to(device).index_select(
- 0, inds).to(dtype=rois.dtype)
- targets = roi_align(gt_masks_th[:, None, :, :], rois, out_shape,
- 1.0, 0, 'avg', True).squeeze(1)
+ gt_masks_th = (
+ torch.from_numpy(self.masks).to(device).index_select(0, inds).to(dtype=rois.dtype)
+ )
+ targets = roi_align(
+ gt_masks_th[:, None, :, :], rois, out_shape, 1.0, 0, "avg", True
+ ).squeeze(1)
if binarize:
resized_masks = (targets >= 0.5).cpu().numpy()
else:
@@ -369,21 +358,42 @@ def crop_and_resize(self,
def expand(self, expanded_h, expanded_w, top, left):
"""See :func:`BaseInstanceMasks.expand`."""
if len(self.masks) == 0:
- expanded_mask = np.empty((0, expanded_h, expanded_w),
- dtype=np.uint8)
+ expanded_mask = np.empty((0, expanded_h, expanded_w), dtype=np.uint8)
else:
- expanded_mask = np.zeros((len(self), expanded_h, expanded_w),
- dtype=np.uint8)
- expanded_mask[:, top:top + self.height,
- left:left + self.width] = self.masks
+ expanded_mask = np.zeros((len(self), expanded_h, expanded_w), dtype=np.uint8)
+ expanded_mask[:, top : top + self.height, left : left + self.width] = self.masks
return BitmapMasks(expanded_mask, expanded_h, expanded_w)
- def translate(self,
- out_shape,
- offset,
- direction='horizontal',
- fill_val=0,
- interpolation='bilinear'):
+ def translation(self, offsets, inds):
+ """translation mask by offset value (used in semi-supervised learning)
+
+ Args:
+ offsets (np.array): input offset value
+ inds (list): positive index
+
+ Returns:
+ BitmapMasks: translated mask
+ """
+ if len(self.masks) == 0:
+ translated_masks = np.empty((0, self.height, self.width), dtype=np.uint8)
+ else:
+ translated_masks = []
+ for idx, mask in enumerate(self.masks):
+ if idx in inds:
+ offset = offsets[inds.tolist().index(idx)]
+ else:
+ offset = [0, 0]
+ translated_mask = image_translation(mask, -offset[0], -offset[1])
+ translated_mask = cv2.threshold(translated_mask, 0.5, 1, cv2.THRESH_BINARY)[1]
+ translated_masks.append(translated_mask)
+
+ translated_masks = np.stack(translated_masks).astype(np.uint8)
+
+ return BitmapMasks(translated_masks, self.height, self.width)
+
+ def translate(
+ self, out_shape, offset, direction="horizontal", fill_val=0, interpolation="bilinear"
+ ):
"""Translate the BitmapMasks.
Args:
@@ -421,19 +431,16 @@ def translate(self,
offset,
direction,
border_value=fill_val,
- interpolation=interpolation)
+ interpolation=interpolation,
+ )
if translated_masks.ndim == 2:
translated_masks = translated_masks[:, :, None]
- translated_masks = translated_masks.transpose(
- (2, 0, 1)).astype(self.masks.dtype)
+ translated_masks = translated_masks.transpose((2, 0, 1)).astype(self.masks.dtype)
return BitmapMasks(translated_masks, *out_shape)
- def shear(self,
- out_shape,
- magnitude,
- direction='horizontal',
- border_value=0,
- interpolation='bilinear'):
+ def shear(
+ self, out_shape, magnitude, direction="horizontal", border_value=0, interpolation="bilinear"
+ ):
"""Shear the BitmapMasks.
Args:
@@ -456,11 +463,11 @@ def shear(self,
magnitude,
direction,
border_value=border_value,
- interpolation=interpolation)
+ interpolation=interpolation,
+ )
if sheared_masks.ndim == 2:
sheared_masks = sheared_masks[:, :, None]
- sheared_masks = sheared_masks.transpose(
- (2, 0, 1)).astype(self.masks.dtype)
+ sheared_masks = sheared_masks.transpose((2, 0, 1)).astype(self.masks.dtype)
return BitmapMasks(sheared_masks, *out_shape)
def rotate(self, out_shape, angle, center=None, scale=1.0, fill_val=0):
@@ -487,12 +494,12 @@ def rotate(self, out_shape, angle, center=None, scale=1.0, fill_val=0):
angle,
center=center,
scale=scale,
- border_value=fill_val)
+ border_value=fill_val,
+ )
if rotated_masks.ndim == 2:
# case when only one mask, (h, w)
rotated_masks = rotated_masks[:, :, None] # (h, w, 1)
- rotated_masks = rotated_masks.transpose(
- (2, 0, 1)).astype(self.masks.dtype)
+ rotated_masks = rotated_masks.transpose((2, 0, 1)).astype(self.masks.dtype)
return BitmapMasks(rotated_masks, *out_shape)
@property
@@ -509,12 +516,7 @@ def to_tensor(self, dtype, device):
return torch.tensor(self.masks, dtype=dtype, device=device)
@classmethod
- def random(cls,
- num_masks=3,
- height=32,
- width=32,
- dtype=np.uint8,
- rng=None):
+ def random(cls, num_masks=3, height=32, width=32, dtype=np.uint8, rng=None):
"""Generate random bitmap masks for demo / testing purposes.
Example:
@@ -524,6 +526,7 @@ def random(cls,
self = BitmapMasks(num_masks=3, height=32, width=32)
"""
from mmdet.utils.util_random import ensure_rng
+
rng = ensure_rng(rng)
masks = (rng.rand(num_masks, height, width) > 0.1).astype(dtype)
self = cls(masks, height=height, width=width)
@@ -540,8 +543,7 @@ def get_bboxes(self):
if len(x) > 0 and len(y) > 0:
# use +1 for x_max and y_max so that the right and bottom
# boundary of instance masks are fully included by the box
- boxes[idx, :] = np.array([x[0], y[0], x[-1] + 1, y[-1] + 1],
- dtype=np.float32)
+ boxes[idx, :] = np.array([x[0], y[0], x[-1] + 1, y[-1] + 1], dtype=np.float32)
return boxes
@@ -612,8 +614,7 @@ def __getitem__(self, index):
try:
masks = self.masks[index]
except Exception:
- raise ValueError(
- f'Unsupported input of type {type(index)} for indexing!')
+ raise ValueError(f"Unsupported input of type {type(index)} for indexing!")
if len(masks) and isinstance(masks[0], np.ndarray):
masks = [masks] # ensure a list of three levels
return PolygonMasks(masks, self.height, self.width)
@@ -622,10 +623,10 @@ def __iter__(self):
return iter(self.masks)
def __repr__(self):
- s = self.__class__.__name__ + '('
- s += f'num_masks={len(self.masks)}, '
- s += f'height={self.height}, '
- s += f'width={self.width})'
+ s = self.__class__.__name__ + "("
+ s += f"num_masks={len(self.masks)}, "
+ s += f"height={self.height}, "
+ s += f"width={self.width})"
return s
def __len__(self):
@@ -660,9 +661,9 @@ def resize(self, out_shape, interpolation=None):
resized_masks = PolygonMasks(resized_masks, *out_shape)
return resized_masks
- def flip(self, flip_direction='horizontal'):
+ def flip(self, flip_direction="horizontal"):
"""see :func:`BaseInstanceMasks.flip`"""
- assert flip_direction in ('horizontal', 'vertical', 'diagonal')
+ assert flip_direction in ("horizontal", "vertical", "diagonal")
if len(self.masks) == 0:
flipped_masks = PolygonMasks([], self.height, self.width)
else:
@@ -671,17 +672,16 @@ def flip(self, flip_direction='horizontal'):
flipped_poly_per_obj = []
for p in poly_per_obj:
p = p.copy()
- if flip_direction == 'horizontal':
+ if flip_direction == "horizontal":
p[0::2] = self.width - p[0::2]
- elif flip_direction == 'vertical':
+ elif flip_direction == "vertical":
p[1::2] = self.height - p[1::2]
else:
p[0::2] = self.width - p[0::2]
p[1::2] = self.height - p[1::2]
flipped_poly_per_obj.append(p)
flipped_masks.append(flipped_poly_per_obj)
- flipped_masks = PolygonMasks(flipped_masks, self.height,
- self.width)
+ flipped_masks = PolygonMasks(flipped_masks, self.height, self.width)
return flipped_masks
def crop(self, bbox):
@@ -721,21 +721,16 @@ def expand(self, *args, **kwargs):
"""TODO: Add expand for polygon"""
raise NotImplementedError
- def crop_and_resize(self,
- bboxes,
- out_shape,
- inds,
- device='cpu',
- interpolation='bilinear',
- binarize=True):
+ def crop_and_resize(
+ self, bboxes, out_shape, inds, device="cpu", interpolation="bilinear", binarize=True
+ ):
"""see :func:`BaseInstanceMasks.crop_and_resize`"""
out_h, out_w = out_shape
if len(self.masks) == 0:
return PolygonMasks([], out_h, out_w)
if not binarize:
- raise ValueError('Polygons are always binary, '
- 'setting binarize=False is unsupported')
+ raise ValueError("Polygons are always binary, " "setting binarize=False is unsupported")
resized_masks = []
for i in range(len(bboxes)):
@@ -762,12 +757,9 @@ def crop_and_resize(self,
resized_masks.append(resized_mask)
return PolygonMasks(resized_masks, *out_shape)
- def translate(self,
- out_shape,
- offset,
- direction='horizontal',
- fill_val=None,
- interpolation=None):
+ def translate(
+ self, out_shape, offset, direction="horizontal", fill_val=None, interpolation=None
+ ):
"""Translate the PolygonMasks.
Example:
@@ -777,8 +769,9 @@ def translate(self,
>>> assert np.all(new.masks[0][0][1::2] == self.masks[0][0][1::2])
>>> assert np.all(new.masks[0][0][0::2] == self.masks[0][0][0::2] + 4) # noqa: E501
"""
- assert fill_val is None or fill_val == 0, 'Here fill_val is not '\
- f'used, and defaultly should be None or 0. got {fill_val}.'
+ assert fill_val is None or fill_val == 0, (
+ "Here fill_val is not " f"used, and defaultly should be None or 0. got {fill_val}."
+ )
if len(self.masks) == 0:
translated_masks = PolygonMasks([], *out_shape)
else:
@@ -787,43 +780,35 @@ def translate(self,
translated_poly_per_obj = []
for p in poly_per_obj:
p = p.copy()
- if direction == 'horizontal':
+ if direction == "horizontal":
p[0::2] = np.clip(p[0::2] + offset, 0, out_shape[1])
- elif direction == 'vertical':
+ elif direction == "vertical":
p[1::2] = np.clip(p[1::2] + offset, 0, out_shape[0])
translated_poly_per_obj.append(p)
translated_masks.append(translated_poly_per_obj)
translated_masks = PolygonMasks(translated_masks, *out_shape)
return translated_masks
- def shear(self,
- out_shape,
- magnitude,
- direction='horizontal',
- border_value=0,
- interpolation='bilinear'):
+ def shear(
+ self, out_shape, magnitude, direction="horizontal", border_value=0, interpolation="bilinear"
+ ):
"""See :func:`BaseInstanceMasks.shear`."""
if len(self.masks) == 0:
sheared_masks = PolygonMasks([], *out_shape)
else:
sheared_masks = []
- if direction == 'horizontal':
- shear_matrix = np.stack([[1, magnitude],
- [0, 1]]).astype(np.float32)
- elif direction == 'vertical':
- shear_matrix = np.stack([[1, 0], [magnitude,
- 1]]).astype(np.float32)
+ if direction == "horizontal":
+ shear_matrix = np.stack([[1, magnitude], [0, 1]]).astype(np.float32)
+ elif direction == "vertical":
+ shear_matrix = np.stack([[1, 0], [magnitude, 1]]).astype(np.float32)
for poly_per_obj in self.masks:
sheared_poly = []
for p in poly_per_obj:
p = np.stack([p[0::2], p[1::2]], axis=0) # [2, n]
new_coords = np.matmul(shear_matrix, p) # [2, n]
- new_coords[0, :] = np.clip(new_coords[0, :], 0,
- out_shape[1])
- new_coords[1, :] = np.clip(new_coords[1, :], 0,
- out_shape[0])
- sheared_poly.append(
- new_coords.transpose((1, 0)).reshape(-1))
+ new_coords[0, :] = np.clip(new_coords[0, :], 0, out_shape[1])
+ new_coords[1, :] = np.clip(new_coords[1, :], 0, out_shape[0])
+ sheared_poly.append(new_coords.transpose((1, 0)).reshape(-1))
sheared_masks.append(sheared_poly)
sheared_masks = PolygonMasks(sheared_masks, *out_shape)
return sheared_masks
@@ -843,15 +828,13 @@ def rotate(self, out_shape, angle, center=None, scale=1.0, fill_val=0):
# pad 1 to convert from format [x, y] to homogeneous
# coordinates format [x, y, 1]
coords = np.concatenate(
- (coords, np.ones((coords.shape[0], 1), coords.dtype)),
- axis=1) # [n, 3]
- rotated_coords = np.matmul(
- rotate_matrix[None, :, :],
- coords[:, :, None])[..., 0] # [n, 2, 1] -> [n, 2]
- rotated_coords[:, 0] = np.clip(rotated_coords[:, 0], 0,
- out_shape[1])
- rotated_coords[:, 1] = np.clip(rotated_coords[:, 1], 0,
- out_shape[0])
+ (coords, np.ones((coords.shape[0], 1), coords.dtype)), axis=1
+ ) # [n, 3]
+ rotated_coords = np.matmul(rotate_matrix[None, :, :], coords[:, :, None])[
+ ..., 0
+ ] # [n, 2, 1] -> [n, 2]
+ rotated_coords[:, 0] = np.clip(rotated_coords[:, 0], 0, out_shape[1])
+ rotated_coords[:, 1] = np.clip(rotated_coords[:, 1], 0, out_shape[0])
rotated_poly.append(rotated_coords.reshape(-1))
rotated_masks.append(rotated_poly)
rotated_masks = PolygonMasks(rotated_masks, *out_shape)
@@ -894,8 +877,7 @@ def _polygon_area(self, x, y):
Return:
float: the are of the component
""" # noqa: 501
- return 0.5 * np.abs(
- np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
+ return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
def to_ndarray(self):
"""Convert masks to the format of ndarray."""
@@ -903,27 +885,18 @@ def to_ndarray(self):
return np.empty((0, self.height, self.width), dtype=np.uint8)
bitmap_masks = []
for poly_per_obj in self.masks:
- bitmap_masks.append(
- polygon_to_bitmap(poly_per_obj, self.height, self.width))
+ bitmap_masks.append(polygon_to_bitmap(poly_per_obj, self.height, self.width))
return np.stack(bitmap_masks)
def to_tensor(self, dtype, device):
"""See :func:`BaseInstanceMasks.to_tensor`."""
if len(self.masks) == 0:
- return torch.empty((0, self.height, self.width),
- dtype=dtype,
- device=device)
+ return torch.empty((0, self.height, self.width), dtype=dtype, device=device)
ndarray_masks = self.to_ndarray()
return torch.tensor(ndarray_masks, dtype=dtype, device=device)
@classmethod
- def random(cls,
- num_masks=3,
- height=32,
- width=32,
- n_verts=5,
- dtype=np.float32,
- rng=None):
+ def random(cls, num_masks=3, height=32, width=32, n_verts=5, dtype=np.float32, rng=None):
"""Generate random polygon masks for demo / testing purposes.
Adapted from [1]_
@@ -937,6 +910,7 @@ def random(cls,
>>> print('self = {}'.format(self))
"""
from mmdet.utils.util_random import ensure_rng
+
rng = ensure_rng(rng)
def _gen_polygon(n, irregularity, spikeyness):
@@ -1001,12 +975,12 @@ def _gen_polygon(n, irregularity, spikeyness):
points = points / points.max(axis=0)
# Randomly place within 0-1 space
- points = points * (rng.rand() * .8 + .2)
+ points = points * (rng.rand() * 0.8 + 0.2)
min_pt = points.min(axis=0)
max_pt = points.max(axis=0)
- high = (1 - max_pt)
- low = (0 - min_pt)
+ high = 1 - max_pt
+ low = 0 - min_pt
offset = (rng.rand(2) * (high - low)) + low
points = points + offset
return points
@@ -1020,8 +994,7 @@ def _order_vertices(verts):
mlng = verts.T[1].sum() / len(verts)
tau = np.pi * 2
- angle = (np.arctan2(mlat - verts.T[0], verts.T[1] - mlng) +
- tau) % tau
+ angle = (np.arctan2(mlat - verts.T[0], verts.T[1] - mlng) + tau) % tau
sortx = angle.argsort()
verts = verts.take(sortx, axis=0)
return verts
@@ -1042,8 +1015,7 @@ def get_bboxes(self):
for idx, poly_per_obj in enumerate(self.masks):
# simply use a number that is big enough for comparison with
# coordinates
- xy_min = np.array([self.width * 2, self.height * 2],
- dtype=np.float32)
+ xy_min = np.array([self.width * 2, self.height * 2], dtype=np.float32)
xy_max = np.zeros(2, dtype=np.float32)
for p in poly_per_obj:
xy = np.array(p).reshape(-1, 2).astype(np.float32)
@@ -1100,3 +1072,19 @@ def bitmap_to_polygon(bitmap):
with_hole = (hierarchy.reshape(-1, 4)[:, 3] >= 0).any()
contours = [c.reshape(-1, 2) for c in contours]
return contours, with_hole
+
+
+def image_translation(img, offset_x, offset_y, border_value=0):
+ """translate image
+
+ Args:
+ img (np.array): input image
+ offset_x (int or float): translation distance in the x
+ offset_y (int or float): translation distance in the y
+ border_value (int, optional): [description]. Defaults to 0.
+ """
+ h, w = img.shape[:2]
+ matrix = np.float32([[1, 0, offset_x], [0, 1, offset_y]])
+ translated = cv2.warpAffine(img, matrix, (w, h), borderValue=border_value)
+
+ return translated
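
A quick usage sketch for the new helper above. This is illustrative only: it assumes the patched mmdet is importable and builds a synthetic image rather than loading real data.

    # Minimal, hypothetical usage of image_translation (not part of the patch).
    import numpy as np
    from mmdet.core.mask.structures import image_translation

    img = np.zeros((100, 100, 3), dtype=np.uint8)
    img[40:60, 40:60] = 255  # white square in the center
    shifted = image_translation(img, offset_x=10, offset_y=-5, border_value=0)
    assert shifted.shape == img.shape  # size is preserved; content moves right and up
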
diff --git a/mmdet/core/mask/utils.py b/mmdet/core/mask/utils.py
index 90544b34..3251eb75 100644
--- a/mmdet/core/mask/utils.py
+++ b/mmdet/core/mask/utils.py
@@ -55,10 +55,8 @@ def encode_mask_results(mask_results):
for i in range(len(cls_segms)):
for cls_segm in cls_segms[i]:
encoded_mask_results[i].append(
- mask_util.encode(
- np.array(
- cls_segm[:, :, np.newaxis], order='F',
- dtype='uint8'))[0]) # encoded with RLE
+ mask_util.encode(np.array(cls_segm[:, :, np.newaxis], order="F", dtype="uint8"))[0]
+ ) # encoded with RLE
if isinstance(mask_results, tuple):
return encoded_mask_results, cls_mask_scores
else:
@@ -83,7 +81,6 @@ def mask2bbox(masks):
x = torch.where(x_any[i, :])[0]
y = torch.where(y_any[i, :])[0]
if len(x) > 0 and len(y) > 0:
- bboxes[i, :] = bboxes.new_tensor(
- [x[0], y[0], x[-1] + 1, y[-1] + 1])
+ bboxes[i, :] = bboxes.new_tensor([x[0], y[0], x[-1] + 1, y[-1] + 1])
return bboxes
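
The mask2bbox reflow above keeps the same logic: for each binary mask, the box spans the first and last occupied column/row, with +1 on the far edge. A tiny hypothetical check of that convention:

    # Illustrative check of the first/last-index box convention (not in the patch).
    import torch

    mask = torch.zeros((1, 8, 8), dtype=torch.bool)
    mask[0, 2:5, 3:6] = True  # occupied rows 2-4, cols 3-5
    x_any, y_any = mask.any(dim=1), mask.any(dim=2)
    x = torch.where(x_any[0, :])[0]
    y = torch.where(y_any[0, :])[0]
    box = [x[0].item(), y[0].item(), x[-1].item() + 1, y[-1].item() + 1]
    assert box == [3, 2, 6, 5]  # x1, y1, x2, y2
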
diff --git a/mmdet/core/runner/__init__.py b/mmdet/core/runner/__init__.py
new file mode 100644
index 00000000..7ad65cc7
--- /dev/null
+++ b/mmdet/core/runner/__init__.py
@@ -0,0 +1,7 @@
+"""Customized runner for supporting semi-supervised learning"""
+
+from .staged_epoch_based_runner import StagedEpochBasedRunner
+
+__all__ = [
+ "StagedEpochBasedRunner",
+]
diff --git a/mmdet/core/runner/staged_epoch_based_runner.py b/mmdet/core/runner/staged_epoch_based_runner.py
new file mode 100644
index 00000000..6c72d815
--- /dev/null
+++ b/mmdet/core/runner/staged_epoch_based_runner.py
@@ -0,0 +1,46 @@
+"""The StagedEpochBasedRunner is used for supporting semi-supervised learning."""
+
+import time
+
+import numpy as np
+import torch
+from mmcv.parallel import DataContainer as DC
+from mmcv.runner import RUNNERS, EpochBasedRunner
+
+
+@RUNNERS.register_module()
+class StagedEpochBasedRunner(EpochBasedRunner):
+ """The runner class supports semi-supervised learning."""
+
+ def __init__(self, *args, **kwargs):
+ assert "supervised_epochs" in kwargs
+ self._supervised_epochs = kwargs.pop("supervised_epochs")
+ super().__init__(*args, **kwargs)
+
+ @property
+ def supervised_epochs(self):
+ """int: Supervised training epochs."""
+ return self._supervised_epochs
+
+ def train(self, data_loader, **kwargs):
+ self.model.train()
+ self.mode = "train"
+ self.data_loader = data_loader
+ self._max_iters = self._max_epochs * len(self.data_loader)
+ ssl_flag = int(self.epoch >= self.supervised_epochs)
+ self.call_hook("before_train_epoch")
+ time.sleep(2) # Prevent possible deadlock during epoch transition
+ for i, data_batch in enumerate(self.data_loader):
+ batch_size = data_batch["img"].data[0].shape[0]
+ is_semi_supervised_stage = DC([[torch.from_numpy(np.array([ssl_flag]))] * batch_size])
+ data_batch["is_semi_supervised_stage"] = is_semi_supervised_stage
+ self.data_batch = data_batch
+ self._inner_iter = i
+ self.call_hook("before_train_iter")
+ self.run_iter(data_batch, train_mode=True, **kwargs)
+ self.call_hook("after_train_iter")
+ del self.data_batch
+ self._iter += 1
+
+ self.call_hook("after_train_epoch")
+ self._epoch += 1
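
For context, a config fragment along these lines would select the new runner; the fields other than supervised_epochs follow the standard EpochBasedRunner convention, and the epoch counts are illustrative rather than taken from the patch.

    # Hypothetical config fragment (values are placeholders).
    runner = dict(
        type="StagedEpochBasedRunner",
        max_epochs=24,
        supervised_epochs=12,  # epochs 0-11 fully supervised; SSL flag set from epoch 12 on
    )
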
diff --git a/mmdet/datasets/__init__.py b/mmdet/datasets/__init__.py
index 46c49fd4..720cd938 100644
--- a/mmdet/datasets/__init__.py
+++ b/mmdet/datasets/__init__.py
@@ -1,31 +1,59 @@
# Copyright (c) OpenMMLab. All rights reserved.
+from .bonai import BONAI
+from .bonai_ssl import BONAI_SSL
from .builder import DATASETS, PIPELINES, build_dataloader, build_dataset
from .cityscapes import CityscapesDataset
from .coco import CocoDataset
from .coco_occluded import OccludedSeparatedCocoDataset
from .coco_panoptic import CocoPanopticDataset
from .custom import CustomDataset
-from .dataset_wrappers import (ClassBalancedDataset, ConcatDataset,
- MultiImageMixDataset, RepeatDataset)
+from .dataset_wrappers import (
+ ClassBalancedDataset,
+ ConcatDataset,
+ MultiImageMixDataset,
+ RepeatDataset,
+)
from .deepfashion import DeepFashionDataset
from .lvis import LVISDataset, LVISV1Dataset, LVISV05Dataset
from .objects365 import Objects365V1Dataset, Objects365V2Dataset
from .openimages import OpenImagesChallengeDataset, OpenImagesDataset
from .samplers import DistributedGroupSampler, DistributedSampler, GroupSampler
-from .utils import (NumClassCheckHook, get_loading_pipeline,
- replace_ImageToTensor)
+from .utils import NumClassCheckHook, get_loading_pipeline, replace_ImageToTensor
from .voc import VOCDataset
from .wider_face import WIDERFaceDataset
from .xml_style import XMLDataset
__all__ = [
- 'CustomDataset', 'XMLDataset', 'CocoDataset', 'DeepFashionDataset',
- 'VOCDataset', 'CityscapesDataset', 'LVISDataset', 'LVISV05Dataset',
- 'LVISV1Dataset', 'GroupSampler', 'DistributedGroupSampler',
- 'DistributedSampler', 'build_dataloader', 'ConcatDataset', 'RepeatDataset',
- 'ClassBalancedDataset', 'WIDERFaceDataset', 'DATASETS', 'PIPELINES',
- 'build_dataset', 'replace_ImageToTensor', 'get_loading_pipeline',
- 'NumClassCheckHook', 'CocoPanopticDataset', 'MultiImageMixDataset',
- 'OpenImagesDataset', 'OpenImagesChallengeDataset', 'Objects365V1Dataset',
- 'Objects365V2Dataset', 'OccludedSeparatedCocoDataset'
+ "CustomDataset",
+ "XMLDataset",
+ "CocoDataset",
+ "DeepFashionDataset",
+ "VOCDataset",
+ "CityscapesDataset",
+ "LVISDataset",
+ "LVISV05Dataset",
+ "LVISV1Dataset",
+ "GroupSampler",
+ "DistributedGroupSampler",
+ "DistributedSampler",
+ "build_dataloader",
+ "ConcatDataset",
+ "RepeatDataset",
+ "ClassBalancedDataset",
+ "WIDERFaceDataset",
+ "DATASETS",
+ "PIPELINES",
+ "build_dataset",
+ "replace_ImageToTensor",
+ "get_loading_pipeline",
+ "NumClassCheckHook",
+ "CocoPanopticDataset",
+ "MultiImageMixDataset",
+ "OpenImagesDataset",
+ "OpenImagesChallengeDataset",
+ "Objects365V1Dataset",
+ "Objects365V2Dataset",
+ "OccludedSeparatedCocoDataset",
+ "BONAI",
+ "BONAI_SSL",
]
diff --git a/mmdet/datasets/bonai.py b/mmdet/datasets/bonai.py
new file mode 100644
index 00000000..8c1df2b8
--- /dev/null
+++ b/mmdet/datasets/bonai.py
@@ -0,0 +1,329 @@
+import csv
+import math
+import os.path as osp
+
+import numpy as np
+
+from .builder import DATASETS
+from .coco import CocoDataset
+
+
+@DATASETS.register_module()
+class BONAI(CocoDataset):
+ CLASSES = ("building",)
+
+ def __init__(
+ self,
+ ann_file,
+ pipeline,
+ classes=None,
+ data_root=None,
+ img_prefix="",
+ seg_prefix=None,
+ edge_prefix=None,
+ side_face_prefix=None,
+ offset_field_prefix=None,
+ proposal_file=None,
+ test_mode=False,
+ filter_empty_gt=True,
+ gt_footprint_csv_file=None,
+ bbox_type="roof",
+ mask_type="roof",
+ offset_coordinate="rectangle",
+ resolution=0.6,
+ ignore_buildings=True,
+ ):
+ super().__init__(
+ ann_file=ann_file,
+ pipeline=pipeline,
+ classes=classes,
+ data_root=data_root,
+ img_prefix=img_prefix,
+ seg_prefix=seg_prefix,
+ proposal_file=proposal_file,
+ test_mode=test_mode,
+ filter_empty_gt=filter_empty_gt,
+ )
+ self.ann_file = ann_file
+ self.bbox_type = bbox_type
+ self.mask_type = mask_type
+ self.offset_coordinate = offset_coordinate
+ self.resolution = resolution
+ self.ignore_buildings = ignore_buildings
+ self.gt_footprint_csv_file = gt_footprint_csv_file
+
+ self.edge_prefix = edge_prefix
+ self.side_face_prefix = side_face_prefix
+ self.offset_field_prefix = offset_field_prefix
+
+ if self.data_root is not None:
+ if not (self.edge_prefix is None or osp.isabs(self.edge_prefix)):
+ self.edge_prefix = osp.join(self.data_root, self.edge_prefix)
+ if not (self.side_face_prefix is None or osp.isabs(self.side_face_prefix)):
+ self.side_face_prefix = osp.join(self.data_root, self.side_face_prefix)
+ if not (self.offset_field_prefix is None or osp.isabs(self.offset_field_prefix)):
+ self.offset_field_prefix = osp.join(self.data_root, self.offset_field_prefix)
+
+ def pre_pipeline(self, results):
+ super(BONAI, self).pre_pipeline(results)
+ results["edge_prefix"] = self.edge_prefix
+ results["edge_fields"] = []
+
+ results["side_face_prefix"] = self.side_face_prefix
+ results["side_face_fields"] = []
+
+ results["offset_field_prefix"] = self.offset_field_prefix
+ results["offset_field_fields"] = []
+
+ def get_properties(self, idx):
+ img_id = self.data_infos[idx]["id"]
+ ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
+ ann_info = self.coco.load_anns(ann_ids)
+
+ return ann_info[0].keys()
+
+ def _filter_imgs(self, min_size=32):
+ """Filter images too small or without ground truths."""
+ valid_inds = []
+ ids_with_ann = set(_["image_id"] for _ in self.coco.anns.values())
+ for i, img_info in enumerate(self.data_infos):
+ img_id = img_info["id"]
+ ann_ids = self.coco.getAnnIds(imgIds=[img_id])
+ ann_info = self.coco.loadAnns(ann_ids)
+ all_iscrowd = all(_["iscrowd"] for _ in ann_info)
+ if self.filter_empty_gt and (self.img_ids[i] not in ids_with_ann or all_iscrowd):
+ continue
+ if min(img_info["width"], img_info["height"]) >= min_size:
+ valid_inds.append(i)
+ return valid_inds
+
+ def _parse_ann_info(self, img_info, ann_info):
+ """Parse bbox and mask annotation.
+
+ Args:
+ img_info (dict): Info of the image the annotations belong to.
+ ann_info (list[dict]): Annotation info of an image.
+
+ Returns:
+ dict: A dict containing the following keys: bboxes, bboxes_ignore,
+ labels, masks, seg_map. "masks" are raw annotations and not
+ decoded into binary masks.
+ """
+ gt_bboxes = []
+ gt_labels = []
+ gt_bboxes_ignore = []
+ gt_masks = []
+ gt_roof_masks = []
+ gt_footprint_masks = []
+ gt_offsets = []
+ gt_heights = []
+ gt_angles = []
+ gt_mean_angle = 0.0
+ gt_roof_bboxes = []
+ gt_footprint_bboxes = []
+ gt_only_footprint_flag = 0
+
+ for ann in ann_info:
+ if ann.get("ignore", False):
+ continue
+
+ # The bbox type may be roof, building, or footprint; set it in the config file.
+ if self.bbox_type == "roof":
+ x1, y1, w, h = ann["roof_bbox"]
+ elif self.bbox_type == "building":
+ x1, y1, w, h = ann["building_bbox"]
+ elif self.bbox_type == "footprint":
+ x1, y1, w, h = ann["footprint_bbox"]
+ else:
+ raise (TypeError(f"unsupported bbox_type={self.bbox_type}"))
+
+ inter_w = max(0, min(x1 + w, img_info["width"]) - max(x1, 0))
+ inter_h = max(0, min(y1 + h, img_info["height"]) - max(y1, 0))
+ if inter_w * inter_h == 0:
+ continue
+
+ if ann["area"] <= 0 or w < 1 or h < 1:
+ continue
+ if ann["category_id"] not in self.cat_ids:
+ continue
+ bbox = [x1, y1, x1 + w, y1 + h]
+
+ if ann.get("iscrowd", False) and self.ignore_buildings:
+ gt_bboxes_ignore.append(bbox)
+ else:
+ gt_bboxes.append(bbox)
+ gt_labels.append(self.cat2label[ann["category_id"]])
+ gt_roof_masks.append(ann["segmentation"])
+ gt_footprint_masks.append([ann["footprint_mask"]])
+
+ if "roof_bbox" in ann:
+ x1, y1, w, h = ann["roof_bbox"]
+ gt_roof_bboxes.append([x1, y1, x1 + w, y1 + h])
+
+ if "footprint_bbox" in ann:
+ x1, y1, w, h = ann["footprint_bbox"]
+ gt_footprint_bboxes.append([x1, y1, x1 + w, y1 + h])
+
+ gt_only_footprint_flag = ann.get("only_footprint", 0)
+ if gt_only_footprint_flag == 0:
+ if self.mask_type == "roof":
+ gt_masks.append(ann["segmentation"])
+ elif self.mask_type == "footprint":
+ gt_masks.append([ann["footprint_mask"]])
+ else:
+ raise (TypeError(f"unsupported mask_type={self.mask_type}"))
+ else:
+ gt_masks.append([ann["footprint_mask"]])
+
+ # rectangle coordinate -> offset = (x, y)
+ # polar coordinate -> offset = (length, theta)
+ if "offset" in ann:
+ if self.offset_coordinate == "rectangle":
+ gt_offsets.append(ann["offset"])
+ elif self.offset_coordinate == "polar":
+ offset_x, offset_y = ann["offset"]
+ length = math.sqrt(offset_x**2 + offset_y**2)
+ angle = math.atan2(offset_y, offset_x)
+ gt_offsets.append([length, angle])
+ else:
+ raise (RuntimeError(f"unsupported coordinate={self.offset_coordinate}"))
+ else:
+ gt_offsets.append([0, 0])
+
+ if "building_height" in ann:
+ gt_heights.append([ann["building_height"]])
+ else:
+ gt_heights.append([0.0])
+
+ if "offset" in ann and "building_height" in ann:
+ offset_x, offset_y = ann["offset"]
+ height = ann["building_height"]
+ angle = math.atan2(
+ math.sqrt(offset_x**2 + offset_y**2) * self.resolution, height
+ )
+
+ gt_angles.append(angle)
+
+ if gt_bboxes:
+ gt_bboxes = np.array(gt_bboxes, dtype=np.float32)
+ gt_roof_bboxes = np.array(gt_roof_bboxes, dtype=np.float32)
+ gt_footprint_bboxes = np.array(gt_footprint_bboxes, dtype=np.float32)
+ gt_labels = np.array(gt_labels, dtype=np.int64)
+ gt_offsets = np.array(gt_offsets, dtype=np.float32)
+ gt_heights = np.array(gt_heights, dtype=np.float32)
+ gt_mean_angle = float(np.array(gt_angles, dtype=np.float32).mean())
+ gt_only_footprint_flag = float(gt_only_footprint_flag)
+ else:
+ gt_bboxes = np.zeros((0, 4), dtype=np.float32)
+ gt_roof_bboxes = np.zeros((0, 4), dtype=np.float32)
+ gt_footprint_bboxes = np.zeros((0, 4), dtype=np.float32)
+ gt_labels = np.array([], dtype=np.int64)
+ gt_offsets = np.zeros((0, 2), dtype=np.float32)
+ gt_heights = np.zeros((0, 1), dtype=np.float32)
+ gt_mean_angle = 0.0001
+ gt_only_footprint_flag = 0
+
+ if gt_bboxes_ignore:
+ gt_bboxes_ignore = np.array(gt_bboxes_ignore, dtype=np.float32)
+ else:
+ gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32)
+
+ seg_map = img_info["filename"].replace("jpg", "png")
+ edge_map = img_info["filename"].replace("jpg", "png")
+ side_face_map = img_info["filename"].replace("jpg", "png")
+ offset_field = img_info["filename"].replace("png", "npy")
+
+ ann = dict(
+ bboxes=gt_bboxes,
+ labels=gt_labels,
+ bboxes_ignore=gt_bboxes_ignore,
+ masks=gt_masks,
+ roof_masks=gt_roof_masks,
+ footprint_masks=gt_footprint_masks,
+ seg_map=seg_map,
+ offsets=gt_offsets,
+ building_heights=gt_heights,
+ angle=gt_mean_angle,
+ edge_map=edge_map,
+ side_face_map=side_face_map,
+ roof_bboxes=gt_roof_bboxes,
+ footprint_bboxes=gt_footprint_bboxes,
+ offset_field=offset_field,
+ only_footprint_flag=gt_only_footprint_flag,
+ )
+
+ return ann
+
+ def _segm2json(self, results):
+ bbox_json_results = []
+ segm_json_results = []
+ for idx in range(len(self)):
+ img_id = self.img_ids[idx]
+ if len(results[idx]) == 2:
+ det, seg = results[idx]
+ elif len(results[idx]) == 3:
+ det, seg, offset = results[idx]
+ elif len(results[idx]) == 4:
+ det, seg, offset, building_height = results[idx]
+ elif len(results[idx]) == 5:
+ det, seg, offset, building_height, footprint_seg = results[idx]
+ else:
+ raise (RuntimeError("unsupported len(results[idx])=", len(results[idx])))
+ for label in range(len(det)):
+ # bbox results
+ bboxes = det[label]
+ for i in range(bboxes.shape[0]):
+ data = dict()
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[i])
+ data["score"] = float(bboxes[i][4])
+ data["category_id"] = self.cat_ids[label]
+ bbox_json_results.append(data)
+
+ # segm results
+ # some detectors use different scores for bbox and mask
+ if isinstance(seg, tuple):
+ segms = seg[0][label]
+ mask_score = seg[1][label]
+ else:
+ segms = seg[label]
+ mask_score = [bbox[4] for bbox in bboxes]
+ for i in range(bboxes.shape[0]):
+ data = dict()
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[i])
+ data["score"] = float(mask_score[i])
+ data["category_id"] = self.cat_ids[label]
+ if isinstance(segms[i]["counts"], bytes):
+ segms[i]["counts"] = segms[i]["counts"].decode()
+ data["segmentation"] = segms[i]
+ segm_json_results.append(data)
+
+ return bbox_json_results, segm_json_results
+
+ def write_results2csv(self, results, meta_info=None):
+ print("meta_info: ", meta_info)
+ segmentation_eval_results = results[0]
+ with open(meta_info["summary_file"], "w") as summary:
+ csv_writer = csv.writer(summary, delimiter=",")
+ csv_writer.writerow(["Meta Info"])
+ csv_writer.writerow(["model", meta_info["model"]])
+ csv_writer.writerow(["anno_file", meta_info["anno_file"]])
+ csv_writer.writerow(["gt_roof_csv_file", meta_info["gt_roof_csv_file"]])
+ csv_writer.writerow(["gt_footprint_csv_file", meta_info["gt_footprint_csv_file"]])
+ csv_writer.writerow(["vis_dir", meta_info["vis_dir"]])
+ csv_writer.writerow([""])
+ for mask_type in ["roof", "footprint"]:
+ csv_writer.writerow([mask_type])
+ csv_writer.writerow([segmentation_eval_results[mask_type]])
+ csv_writer.writerow(["F1 Score", segmentation_eval_results[mask_type]["F1_score"]])
+ csv_writer.writerow(
+ ["Precision", segmentation_eval_results[mask_type]["Precision"]]
+ )
+ csv_writer.writerow(["Recall", segmentation_eval_results[mask_type]["Recall"]])
+ csv_writer.writerow(["True Positive", segmentation_eval_results[mask_type]["TP"]])
+ csv_writer.writerow(["False Positive", segmentation_eval_results[mask_type]["FP"]])
+ csv_writer.writerow(["False Negative", segmentation_eval_results[mask_type]["FN"]])
+ csv_writer.writerow([""])
+
+ csv_writer.writerow([""])
diff --git a/mmdet/datasets/bonai_ssl.py b/mmdet/datasets/bonai_ssl.py
new file mode 100644
index 00000000..9b7bb11f
--- /dev/null
+++ b/mmdet/datasets/bonai_ssl.py
@@ -0,0 +1,409 @@
+import csv
+import math
+import os.path as osp
+
+import numpy as np
+
+from .builder import DATASETS
+from .coco import CocoDataset
+
+
+@DATASETS.register_module()
+class BONAI_SSL(CocoDataset):
+ CLASSES = ("building",)
+
+ def __init__(
+ self,
+ ann_file,
+ pipeline,
+ classes=None,
+ data_root=None,
+ img_prefix="",
+ seg_prefix=None,
+ edge_prefix=None,
+ side_face_prefix=None,
+ offset_field_prefix=None,
+ proposal_file=None,
+ test_mode=False,
+ filter_empty_gt=True,
+ gt_footprint_csv_file=None,
+ bbox_type="roof",
+ mask_type="roof",
+ offset_coordinate="rectangle",
+ resolution=0.6,
+ ignore_buildings=True,
+ height_mask_shape=[256, 256],
+ image_scale_footprint_mask_shape=[256, 256],
+ ):
+ super().__init__(
+ ann_file=ann_file,
+ pipeline=pipeline,
+ classes=classes,
+ data_root=data_root,
+ img_prefix=img_prefix,
+ seg_prefix=seg_prefix,
+ proposal_file=proposal_file,
+ test_mode=test_mode,
+ filter_empty_gt=filter_empty_gt,
+ )
+ self.ann_file = ann_file
+ self.bbox_type = bbox_type
+ self.mask_type = mask_type
+ self.offset_coordinate = offset_coordinate
+ self.resolution = resolution
+ self.height_mask_shape = height_mask_shape
+ self.image_scale_footprint_mask_shape = image_scale_footprint_mask_shape
+ self.ignore_buildings = ignore_buildings
+ self.gt_footprint_csv_file = gt_footprint_csv_file
+
+ self.edge_prefix = edge_prefix
+ self.side_face_prefix = side_face_prefix
+ self.offset_field_prefix = offset_field_prefix
+
+ if self.data_root is not None:
+ if not (self.edge_prefix is None or osp.isabs(self.edge_prefix)):
+ self.edge_prefix = osp.join(self.data_root, self.edge_prefix)
+ if not (self.side_face_prefix is None or osp.isabs(self.side_face_prefix)):
+ self.side_face_prefix = osp.join(self.data_root, self.side_face_prefix)
+ if not (self.offset_field_prefix is None or osp.isabs(self.offset_field_prefix)):
+ self.offset_field_prefix = osp.join(self.data_root, self.offset_field_prefix)
+
+ def pre_pipeline(self, results):
+ super().pre_pipeline(results)
+ results["edge_prefix"] = self.edge_prefix
+ results["edge_fields"] = []
+
+ results["side_face_prefix"] = self.side_face_prefix
+ results["side_face_fields"] = []
+
+ results["offset_field_prefix"] = self.offset_field_prefix
+ results["offset_field_fields"] = []
+
+ def get_properties(self, idx):
+ """Get ann dict keys."""
+ img_id = self.data_infos[idx]["id"]
+ ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
+ ann_info = self.coco.load_anns(ann_ids)
+
+ return ann_info[0].keys()
+
+ def _filter_imgs(self, min_size=32):
+ """Filter images too small or without ground truths."""
+ valid_inds = []
+ ids_with_ann = set(_["image_id"] for _ in self.coco.anns.values())
+ for i, img_info in enumerate(self.data_infos):
+ img_id = img_info["id"]
+ ann_ids = self.coco.getAnnIds(imgIds=[img_id])
+ ann_info = self.coco.loadAnns(ann_ids)
+ all_iscrowd = all(_["iscrowd"] for _ in ann_info)
+ if self.filter_empty_gt and (self.img_ids[i] not in ids_with_ann or all_iscrowd):
+ continue
+ if min(img_info["width"], img_info["height"]) >= min_size:
+ valid_inds.append(i)
+ return valid_inds
+
+ def _parse_ann_info(self, img_info, ann_info):
+ """Parse bbox and mask annotation.
+
+ Args:
+ img_info (dict): Info of the image the annotations belong to.
+ ann_info (list[dict]): Annotation info of an image.
+
+ Returns:
+ dict: A dict containing the following keys: bboxes, bboxes_ignore,
+ labels, masks, seg_map. "masks" are raw annotations and not
+ decoded into binary masks.
+ """
+ gt_bboxes = []
+ gt_labels = []
+ gt_bboxes_ignore = []
+ gt_masks = []
+ gt_roof_masks = []
+ gt_footprint_masks = []
+ gt_offsets = []
+ gt_heights = []
+ gt_nadir_angles = []
+ gt_mean_nadir_angle = 0.0
+ gt_offset_angles = []
+ gt_mean_offset_angle = 0.0
+ gt_roof_bboxes = []
+ gt_footprint_bboxes = []
+ gt_only_footprint_flag = 0
+
+ # is_semi_supervised_sample = 1 if (self.bbox_type + "_bbox") not in ann_info[0] else 0
+ is_semi_supervised_sample = 1 if "offset" not in ann_info[0] else 0
+ is_valid_height_sample = 1 if "building_height" in ann_info[0] else 0
+ # fake_wh is only used to pass the validity checks below.
+ fake_wh = 500
+
+ for ann in ann_info:
+ if ann.get("ignore", False):
+ continue
+
+ # TODO: remove this patch once the wrong field names in the annotations are fixed.
+ if "footprint_bbox" not in ann:
+ ann["footprint_bbox"] = ann["bbox"]
+ if "footprint_mask" not in ann:
+ ann["footprint_mask"] = ann["segmentation"][0]
+
+ ann_category_id = ann["category_id"]
+ if ann_category_id not in self.cat_ids:
+ continue
+
+ # The bbox type may be roof, building, or footprint; set it in the config file.
+ if self.bbox_type == "roof":
+ x1, y1, w, h = ann.get( # pylint: disable=invalid-name
+ "roof_bbox", [0, 0, fake_wh, fake_wh]
+ )
+ elif self.bbox_type == "building":
+ x1, y1, w, h = ann.get( # pylint: disable=invalid-name
+ "building_bbox", [0, 0, fake_wh, fake_wh]
+ )
+ elif self.bbox_type == "footprint":
+ x1, y1, w, h = ann.get( # pylint: disable=invalid-name
+ "footprint_bbox", [0, 0, fake_wh, fake_wh]
+ )
+ else:
+ raise TypeError(f"unsupported bbox_type={self.bbox_type}")
+
+ if ("area" in ann and ann["area"] <= 0) or w < 1 or h < 1:
+ continue
+
+ inter_w = max(0, min(x1 + w, img_info["width"]) - max(x1, 0))
+ inter_h = max(0, min(y1 + h, img_info["height"]) - max(y1, 0))
+ if inter_w * inter_h == 0:
+ continue
+
+ bbox = [x1, y1, x1 + w, y1 + h]
+
+ if ann.get("iscrowd", False) and self.ignore_buildings:
+ gt_bboxes_ignore.append(bbox)
+ continue
+
+ gt_bboxes.append(bbox)
+ gt_labels.append(self.cat2label[ann["category_id"]])
+ gt_roof_masks.append(ann.get("segmentation", [[0, 0, 0, 0, 0, 0]]))
+ gt_footprint_masks.append([ann["footprint_mask"]])
+
+ if "roof_bbox" in ann:
+ x1, y1, w, h = ann["roof_bbox"] # pylint: disable=invalid-name
+ gt_roof_bboxes.append([x1, y1, x1 + w, y1 + h])
+
+ if "footprint_bbox" in ann:
+ x1, y1, w, h = ann["footprint_bbox"] # pylint: disable=invalid-name
+ gt_footprint_bboxes.append([x1, y1, x1 + w, y1 + h])
+
+ if self.mask_type == "roof":
+ gt_masks.append(
+ ann.get(
+ "segmentation", [[0, 0, fake_wh, 0, fake_wh, fake_wh, 0, fake_wh, 0, 0]]
+ )
+ )
+ elif self.mask_type == "footprint":
+ gt_masks.append([ann["footprint_mask"]])
+ else:
+ raise TypeError(f"unsupported mask_type={self.mask_type}")
+
+ # rectangle coordinate -> offset = (x, y)
+ # polar coordinate -> offset = (length, theta)
+ if "offset" in ann:
+ if self.offset_coordinate == "rectangle":
+ gt_offsets.append(ann["offset"])
+ elif self.offset_coordinate == "polar":
+ offset_x, offset_y = ann["offset"]
+ length = math.sqrt(offset_x**2 + offset_y**2)
+ angle = math.atan2(offset_y, offset_x)
+ gt_offsets.append([length, angle])
+ else:
+ raise RuntimeError(f"unsupported coordinate={self.offset_coordinate}")
+ else:
+ gt_offsets.append([5, 5])
+
+ gt_height = ann.get("building_height", 0.0)
+ gt_heights.append([gt_height])
+
+ if "offset" in ann and "building_height" in ann:
+ offset_x, offset_y = ann["offset"]
+ norm = offset_x**2 + offset_y**2
+ height = ann["building_height"]
+ if height != 0 and norm != 0:
+ angle = math.sqrt(norm) * self.resolution / float(height)
+ gt_nadir_angles.append(angle)
+
+ if "offset" in ann:
+ offset = ann["offset"]
+ if offset != [0, 0]:
+ z = math.sqrt(offset[0] ** 2 + offset[1] ** 2)
+ gt_offset_angles.append([float(offset[1]) / z, float(offset[0]) / z])
+
+ if gt_bboxes:
+ gt_bboxes = np.array(gt_bboxes, dtype=np.float32)
+ gt_roof_bboxes = np.array(gt_roof_bboxes, dtype=np.float32)
+ gt_footprint_bboxes = np.array(gt_footprint_bboxes, dtype=np.float32)
+ gt_labels = np.array(gt_labels, dtype=np.int64)
+ gt_offsets = np.array(gt_offsets, dtype=np.float32)
+ gt_heights = np.array(gt_heights, dtype=np.float32)
+ gt_only_footprint_flag = float(gt_only_footprint_flag)
+ else:
+ gt_bboxes = np.zeros((0, 4), dtype=np.float32)
+ gt_roof_bboxes = np.zeros((0, 4), dtype=np.float32)
+ gt_footprint_bboxes = np.zeros((0, 4), dtype=np.float32)
+ gt_labels = np.array([], dtype=np.int64)
+ gt_offsets = np.zeros((0, 2), dtype=np.float32)
+ gt_heights = np.zeros((0, 1), dtype=np.float32)
+ gt_only_footprint_flag = 0
+
+ if len(gt_nadir_angles) > 0:
+ gt_mean_nadir_angle = float(np.array(gt_nadir_angles, dtype=np.float32).mean())
+ else:
+ gt_mean_nadir_angle = 1
+
+ if len(gt_offset_angles) > 0:
+ gt_mean_offset_angle = [list(np.array(gt_offset_angles, dtype=np.float32).mean(axis=0))]
+ else:
+ gt_mean_offset_angle = [[1.0, 0.0]]
+
+ if gt_bboxes_ignore:
+ gt_bboxes_ignore = np.array(gt_bboxes_ignore, dtype=np.float32)
+ else:
+ gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32)
+
+ seg_map = img_info["filename"].replace("jpg", "png")
+ edge_map = img_info["filename"].replace("jpg", "png")
+ side_face_map = img_info["filename"].replace("jpg", "png")
+ offset_field = img_info["filename"].replace("png", "npy")
+
+ ann = dict(
+ bboxes=gt_bboxes,
+ labels=gt_labels,
+ bboxes_ignore=gt_bboxes_ignore,
+ masks=gt_masks,
+ roof_masks=gt_roof_masks,
+ footprint_masks=gt_footprint_masks,
+ seg_map=seg_map,
+ offsets=gt_offsets,
+ building_heights=gt_heights,
+ nadir_angles=gt_mean_nadir_angle,
+ offset_angles=gt_mean_offset_angle,
+ edge_map=edge_map,
+ side_face_map=side_face_map,
+ roof_bboxes=gt_roof_bboxes,
+ footprint_bboxes=gt_footprint_bboxes,
+ offset_field=offset_field,
+ only_footprint_flag=gt_only_footprint_flag,
+ is_semi_supervised_sample=is_semi_supervised_sample,
+ is_valid_height_sample=is_valid_height_sample,
+ resolution=self.resolution,
+ height_mask_shape=np.array(self.height_mask_shape, dtype=np.int32),
+ image_scale_footprint_mask_shape=np.array(
+ self.image_scale_footprint_mask_shape, dtype=np.int32
+ ),
+ )
+
+ return ann
+
+ def _segm2json(self, results, mask_type=None):
+ assert mask_type is not None
+ bbox_json_results = []
+ segm_json_results = []
+ for idx in range(len(self)):
+ img_id = self.img_ids[idx]
+ result = results[idx]
+ det = result[0]
+ mask_type_idx = {
+ "roof": 1,
+ "offset_footprint": 4,
+ "direct_footprint": 5,
+ }
+ try:
+ seg = result[mask_type_idx[mask_type]]
+ except KeyError as e:
+ raise KeyError("wrong mask type to evaluate") from e
+
+ if seg is None:
+ break
+
+ for i, bboxes in enumerate(det):
+ # bbox results
+ for j in range(bboxes.shape[0]):
+ data = dict()
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[j])
+ data["score"] = float(bboxes[j][4])
+ data["category_id"] = self.cat_ids[i]
+ bbox_json_results.append(data)
+
+ # segm results
+ # some detectors use different scores for bbox and mask
+ if isinstance(seg, tuple):
+ segms = seg[0][i]
+ mask_score = seg[1][i]
+ else:
+ segms = seg[i]
+ mask_score = [bbox[4] for bbox in bboxes]
+ for j in range(bboxes.shape[0]):
+ data = dict()
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[j])
+ data["score"] = float(mask_score[j])
+ data["category_id"] = self.cat_ids[i]
+ if isinstance(segms[j]["counts"], bytes):
+ segms[j]["counts"] = segms[j]["counts"].decode()
+ data["segmentation"] = segms[j]
+ segm_json_results.append(data)
+
+ return bbox_json_results, segm_json_results
+
+ def write_results2csv(self, results, meta_info=None):
+ """Write results to csv file."""
+ print("meta_info: ", meta_info)
+ segmentation_eval_results = results[0]
+ with open(meta_info["summary_file"], "w", encoding="utf-8") as summary:
+ csv_writer = csv.writer(summary, delimiter=",")
+ csv_writer.writerow(["Meta Info"])
+ csv_writer.writerow(["model", meta_info["model"]])
+ csv_writer.writerow(["anno_file", meta_info["anno_file"]])
+ csv_writer.writerow(["gt_roof_csv_file", meta_info["gt_roof_csv_file"]])
+ csv_writer.writerow(["gt_footprint_csv_file", meta_info["gt_footprint_csv_file"]])
+ csv_writer.writerow(["vis_dir", meta_info["vis_dir"]])
+ csv_writer.writerow([""])
+ for mask_type in ["roof", "footprint"]:
+ csv_writer.writerow([mask_type])
+ csv_writer.writerow([segmentation_eval_results[mask_type]])
+ csv_writer.writerow(["F1 Score", segmentation_eval_results[mask_type]["F1_score"]])
+ csv_writer.writerow(
+ ["Precision", segmentation_eval_results[mask_type]["Precision"]]
+ )
+ csv_writer.writerow(["Recall", segmentation_eval_results[mask_type]["Recall"]])
+ csv_writer.writerow(["True Positive", segmentation_eval_results[mask_type]["TP"]])
+ csv_writer.writerow(["False Positive", segmentation_eval_results[mask_type]["FP"]])
+ csv_writer.writerow(["False Negative", segmentation_eval_results[mask_type]["FN"]])
+ csv_writer.writerow([""])
+
+ csv_writer.writerow([""])
+
+ def _set_group_flag(self):
+ """Set flag according to image aspect ratio and whether the image is fully annotated.
+
+ Images with aspect ratio greater than 1 will be set as group 1,
+ otherwise group 0.
+ """
+ self.flag = np.zeros(len(self), dtype=np.uint8)
+ for i in range(len(self)):
+ img_info = self.data_infos[i]
+ if img_info["width"] / img_info["height"] > 1:
+ if "offset" in self.get_properties(i):
+ self.flag[i] = 0
+ else:
+ if "building_height" in self.get_properties(i):
+ self.flag[i] = 1
+ else:
+ self.flag[i] = 2
+ else:
+ if "offset" in self.get_properties(i):
+ self.flag[i] = 3
+ else:
+ if "building_height" in self.get_properties(i):
+ self.flag[i] = 4
+ else:
+ self.flag[i] = 5
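
The grouping above can be restated as a small standalone function; this sketch mirrors the branch structure of _set_group_flag and is for illustration only.

    # Standalone restatement of the six-way grouping (hypothetical helper).
    def group_flag(width, height, ann_keys):
        if "offset" in ann_keys:             # fully annotated sample
            base = 0
        elif "building_height" in ann_keys:  # height-only sample
            base = 1
        else:                                # offset-free (SSL) sample
            base = 2
        return base if width / height > 1 else base + 3

    assert group_flag(512, 256, {"offset", "bbox"}) == 0
    assert group_flag(256, 512, {"bbox"}) == 5
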
diff --git a/mmdet/datasets/builder.py b/mmdet/datasets/builder.py
index 1936296a..47882d92 100644
--- a/mmdet/datasets/builder.py
+++ b/mmdet/datasets/builder.py
@@ -12,71 +12,93 @@
from mmcv.utils import TORCH_VERSION, Registry, build_from_cfg, digit_version
from torch.utils.data import DataLoader
-from .samplers import (ClassAwareSampler, DistributedGroupSampler,
- DistributedSampler, GroupSampler, InfiniteBatchSampler,
- InfiniteGroupBatchSampler)
-
-if platform.system() != 'Windows':
+from .samplers import (
+ ClassAwareSampler,
+ DistributedGroupSampler,
+ DistributedSampler,
+ GroupSampler,
+ InfiniteBatchSampler,
+ InfiniteGroupBatchSampler,
+)
+
+if platform.system() != "Windows":
# https://github.com/pytorch/pytorch/issues/973
import resource
+
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
base_soft_limit = rlimit[0]
hard_limit = rlimit[1]
soft_limit = min(max(4096, base_soft_limit), hard_limit)
resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard_limit))
-DATASETS = Registry('dataset')
-PIPELINES = Registry('pipeline')
+DATASETS = Registry("dataset")
+PIPELINES = Registry("pipeline")
def _concat_dataset(cfg, default_args=None):
from .dataset_wrappers import ConcatDataset
- ann_files = cfg['ann_file']
- img_prefixes = cfg.get('img_prefix', None)
- seg_prefixes = cfg.get('seg_prefix', None)
- proposal_files = cfg.get('proposal_file', None)
- separate_eval = cfg.get('separate_eval', True)
+
+ ann_files = cfg["ann_file"]
+ img_prefixes = cfg.get("img_prefix", None)
+ seg_prefixes = cfg.get("seg_prefix", None)
+ edge_prefixes = cfg.get("edge_prefix", None)
+ side_face_prefixes = cfg.get("side_face_prefix", None)
+ offset_field_prefixes = cfg.get("offset_field_prefix", None)
+ proposal_files = cfg.get("proposal_file", None)
+ separate_eval = cfg.get("separate_eval", True)
datasets = []
num_dset = len(ann_files)
for i in range(num_dset):
data_cfg = copy.deepcopy(cfg)
# pop 'separate_eval' since it is not a valid key for common datasets.
- if 'separate_eval' in data_cfg:
- data_cfg.pop('separate_eval')
- data_cfg['ann_file'] = ann_files[i]
+ if "separate_eval" in data_cfg:
+ data_cfg.pop("separate_eval")
+ data_cfg["ann_file"] = ann_files[i]
if isinstance(img_prefixes, (list, tuple)):
- data_cfg['img_prefix'] = img_prefixes[i]
+ data_cfg["img_prefix"] = img_prefixes[i]
if isinstance(seg_prefixes, (list, tuple)):
- data_cfg['seg_prefix'] = seg_prefixes[i]
+ data_cfg["seg_prefix"] = seg_prefixes[i]
+ if isinstance(edge_prefixes, (list, tuple)):
+ data_cfg["edge_prefix"] = edge_prefixes[i]
+ if isinstance(side_face_prefixes, (list, tuple)):
+ data_cfg["side_face_prefix"] = side_face_prefixes[i]
+ if isinstance(offset_field_prefixes, (list, tuple)):
+ data_cfg["offset_field_prefix"] = offset_field_prefixes[i]
if isinstance(proposal_files, (list, tuple)):
- data_cfg['proposal_file'] = proposal_files[i]
+ data_cfg["proposal_file"] = proposal_files[i]
datasets.append(build_dataset(data_cfg, default_args))
return ConcatDataset(datasets, separate_eval)
def build_dataset(cfg, default_args=None):
- from .dataset_wrappers import (ClassBalancedDataset, ConcatDataset,
- MultiImageMixDataset, RepeatDataset)
+ from .dataset_wrappers import (
+ ClassBalancedDataset,
+ ConcatDataset,
+ MultiImageMixDataset,
+ RepeatDataset,
+ )
+
if isinstance(cfg, (list, tuple)):
dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg])
- elif cfg['type'] == 'ConcatDataset':
+ elif cfg["type"] == "ConcatDataset":
dataset = ConcatDataset(
- [build_dataset(c, default_args) for c in cfg['datasets']],
- cfg.get('separate_eval', True))
- elif cfg['type'] == 'RepeatDataset':
- dataset = RepeatDataset(
- build_dataset(cfg['dataset'], default_args), cfg['times'])
- elif cfg['type'] == 'ClassBalancedDataset':
+ [build_dataset(c, default_args) for c in cfg["datasets"]],
+ cfg.get("separate_eval", True),
+ )
+ elif cfg["type"] == "RepeatDataset":
+ dataset = RepeatDataset(build_dataset(cfg["dataset"], default_args), cfg["times"])
+ elif cfg["type"] == "ClassBalancedDataset":
dataset = ClassBalancedDataset(
- build_dataset(cfg['dataset'], default_args), cfg['oversample_thr'])
- elif cfg['type'] == 'MultiImageMixDataset':
+ build_dataset(cfg["dataset"], default_args), cfg["oversample_thr"]
+ )
+ elif cfg["type"] == "MultiImageMixDataset":
cp_cfg = copy.deepcopy(cfg)
- cp_cfg['dataset'] = build_dataset(cp_cfg['dataset'])
- cp_cfg.pop('type')
+ cp_cfg["dataset"] = build_dataset(cp_cfg["dataset"])
+ cp_cfg.pop("type")
dataset = MultiImageMixDataset(**cp_cfg)
- elif isinstance(cfg.get('ann_file'), (list, tuple)):
+ elif isinstance(cfg.get("ann_file"), (list, tuple)):
dataset = _concat_dataset(cfg, default_args)
else:
dataset = build_from_cfg(cfg, DATASETS, default_args)
@@ -84,17 +106,19 @@ def build_dataset(cfg, default_args=None):
return dataset
-def build_dataloader(dataset,
- samples_per_gpu,
- workers_per_gpu,
- num_gpus=1,
- dist=True,
- shuffle=True,
- seed=None,
- runner_type='EpochBasedRunner',
- persistent_workers=False,
- class_aware_sampler=None,
- **kwargs):
+def build_dataloader(
+ dataset,
+ samples_per_gpu,
+ workers_per_gpu,
+ num_gpus=1,
+ dist=True,
+ shuffle=True,
+ seed=None,
+ runner_type="EpochBasedRunner",
+ persistent_workers=False,
+ class_aware_sampler=None,
+ **kwargs
+):
"""Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader.
@@ -137,60 +161,59 @@ def build_dataloader(dataset,
batch_size = num_gpus * samples_per_gpu
num_workers = num_gpus * workers_per_gpu
- if runner_type == 'IterBasedRunner':
+ if runner_type == "IterBasedRunner":
# this is a batch sampler, which can yield
# a mini-batch indices each time.
# it can be used in both `DataParallel` and
# `DistributedDataParallel`
if shuffle:
batch_sampler = InfiniteGroupBatchSampler(
- dataset, batch_size, world_size, rank, seed=seed)
+ dataset, batch_size, world_size, rank, seed=seed
+ )
else:
batch_sampler = InfiniteBatchSampler(
- dataset,
- batch_size,
- world_size,
- rank,
- seed=seed,
- shuffle=False)
+ dataset, batch_size, world_size, rank, seed=seed, shuffle=False
+ )
batch_size = 1
sampler = None
else:
if class_aware_sampler is not None:
# ClassAwareSampler can be used in both distributed and
# non-distributed training.
- num_sample_class = class_aware_sampler.get('num_sample_class', 1)
+ num_sample_class = class_aware_sampler.get("num_sample_class", 1)
sampler = ClassAwareSampler(
dataset,
samples_per_gpu,
world_size,
rank,
seed=seed,
- num_sample_class=num_sample_class)
+ num_sample_class=num_sample_class,
+ )
elif dist:
# DistributedGroupSampler will definitely shuffle the data to
# satisfy that images on each GPU are in the same group
if shuffle:
sampler = DistributedGroupSampler(
- dataset, samples_per_gpu, world_size, rank, seed=seed)
+ dataset, samples_per_gpu, world_size, rank, seed=seed
+ )
else:
- sampler = DistributedSampler(
- dataset, world_size, rank, shuffle=False, seed=seed)
+ sampler = DistributedSampler(dataset, world_size, rank, shuffle=False, seed=seed)
else:
- sampler = GroupSampler(dataset,
- samples_per_gpu) if shuffle else None
+ sampler = GroupSampler(dataset, samples_per_gpu) if shuffle else None
batch_sampler = None
- init_fn = partial(
- worker_init_fn, num_workers=num_workers, rank=rank,
- seed=seed) if seed is not None else None
+ init_fn = (
+ partial(worker_init_fn, num_workers=num_workers, rank=rank, seed=seed)
+ if seed is not None
+ else None
+ )
- if (TORCH_VERSION != 'parrots'
- and digit_version(TORCH_VERSION) >= digit_version('1.7.0')):
- kwargs['persistent_workers'] = persistent_workers
+ if TORCH_VERSION != "parrots" and digit_version(TORCH_VERSION) >= digit_version("1.7.0"):
+ kwargs["persistent_workers"] = persistent_workers
elif persistent_workers is True:
- warnings.warn('persistent_workers is invalid because your pytorch '
- 'version is lower than 1.7.0')
+ warnings.warn(
+ "persistent_workers is invalid because your pytorch " "version is lower than 1.7.0"
+ )
data_loader = DataLoader(
dataset,
@@ -199,9 +222,10 @@ def build_dataloader(dataset,
num_workers=num_workers,
batch_sampler=batch_sampler,
collate_fn=partial(collate, samples_per_gpu=samples_per_gpu),
- pin_memory=kwargs.pop('pin_memory', False),
+ pin_memory=kwargs.pop("pin_memory", False),
worker_init_fn=init_fn,
- **kwargs)
+ **kwargs
+ )
return data_loader
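
With the extra prefixes threaded through _concat_dataset, a config whose ann_file is a list can carry matching per-split prefixes. The sketch below is illustrative and the paths are placeholders.

    # Hypothetical multi-split config exercising the new prefix keys.
    train = dict(
        type="BONAI",
        ann_file=["data/bonai/anns/city_a.json", "data/bonai/anns/city_b.json"],
        img_prefix=["data/bonai/city_a/images/", "data/bonai/city_b/images/"],
        edge_prefix=["data/bonai/city_a/edges/", "data/bonai/city_b/edges/"],
        side_face_prefix=["data/bonai/city_a/side/", "data/bonai/city_b/side/"],
        offset_field_prefix=["data/bonai/city_a/field/", "data/bonai/city_b/field/"],
        pipeline=[],
    )
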
diff --git a/mmdet/datasets/coco.py b/mmdet/datasets/coco.py
index d20a121c..3e0ac927 100644
--- a/mmdet/datasets/coco.py
+++ b/mmdet/datasets/coco.py
@@ -21,43 +21,171 @@
@DATASETS.register_module()
class CocoDataset(CustomDataset):
-
- CLASSES = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
- 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
- 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
- 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
- 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
- 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
- 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
- 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
- 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
- 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
- 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
- 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
- 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
- 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')
-
- PALETTE = [(220, 20, 60), (119, 11, 32), (0, 0, 142), (0, 0, 230),
- (106, 0, 228), (0, 60, 100), (0, 80, 100), (0, 0, 70),
- (0, 0, 192), (250, 170, 30), (100, 170, 30), (220, 220, 0),
- (175, 116, 175), (250, 0, 30), (165, 42, 42), (255, 77, 255),
- (0, 226, 252), (182, 182, 255), (0, 82, 0), (120, 166, 157),
- (110, 76, 0), (174, 57, 255), (199, 100, 0), (72, 0, 118),
- (255, 179, 240), (0, 125, 92), (209, 0, 151), (188, 208, 182),
- (0, 220, 176), (255, 99, 164), (92, 0, 73), (133, 129, 255),
- (78, 180, 255), (0, 228, 0), (174, 255, 243), (45, 89, 255),
- (134, 134, 103), (145, 148, 174), (255, 208, 186),
- (197, 226, 255), (171, 134, 1), (109, 63, 54), (207, 138, 255),
- (151, 0, 95), (9, 80, 61), (84, 105, 51), (74, 65, 105),
- (166, 196, 102), (208, 195, 210), (255, 109, 65), (0, 143, 149),
- (179, 0, 194), (209, 99, 106), (5, 121, 0), (227, 255, 205),
- (147, 186, 208), (153, 69, 1), (3, 95, 161), (163, 255, 0),
- (119, 0, 170), (0, 182, 199), (0, 165, 120), (183, 130, 88),
- (95, 32, 0), (130, 114, 135), (110, 129, 133), (166, 74, 118),
- (219, 142, 185), (79, 210, 114), (178, 90, 62), (65, 70, 15),
- (127, 167, 115), (59, 105, 106), (142, 108, 45), (196, 172, 0),
- (95, 54, 80), (128, 76, 255), (201, 57, 1), (246, 0, 122),
- (191, 162, 208)]
+ CLASSES = (
+ "person",
+ "bicycle",
+ "car",
+ "motorcycle",
+ "airplane",
+ "bus",
+ "train",
+ "truck",
+ "boat",
+ "traffic light",
+ "fire hydrant",
+ "stop sign",
+ "parking meter",
+ "bench",
+ "bird",
+ "cat",
+ "dog",
+ "horse",
+ "sheep",
+ "cow",
+ "elephant",
+ "bear",
+ "zebra",
+ "giraffe",
+ "backpack",
+ "umbrella",
+ "handbag",
+ "tie",
+ "suitcase",
+ "frisbee",
+ "skis",
+ "snowboard",
+ "sports ball",
+ "kite",
+ "baseball bat",
+ "baseball glove",
+ "skateboard",
+ "surfboard",
+ "tennis racket",
+ "bottle",
+ "wine glass",
+ "cup",
+ "fork",
+ "knife",
+ "spoon",
+ "bowl",
+ "banana",
+ "apple",
+ "sandwich",
+ "orange",
+ "broccoli",
+ "carrot",
+ "hot dog",
+ "pizza",
+ "donut",
+ "cake",
+ "chair",
+ "couch",
+ "potted plant",
+ "bed",
+ "dining table",
+ "toilet",
+ "tv",
+ "laptop",
+ "mouse",
+ "remote",
+ "keyboard",
+ "cell phone",
+ "microwave",
+ "oven",
+ "toaster",
+ "sink",
+ "refrigerator",
+ "book",
+ "clock",
+ "vase",
+ "scissors",
+ "teddy bear",
+ "hair drier",
+ "toothbrush",
+ )
+
+ PALETTE = [
+ (220, 20, 60),
+ (119, 11, 32),
+ (0, 0, 142),
+ (0, 0, 230),
+ (106, 0, 228),
+ (0, 60, 100),
+ (0, 80, 100),
+ (0, 0, 70),
+ (0, 0, 192),
+ (250, 170, 30),
+ (100, 170, 30),
+ (220, 220, 0),
+ (175, 116, 175),
+ (250, 0, 30),
+ (165, 42, 42),
+ (255, 77, 255),
+ (0, 226, 252),
+ (182, 182, 255),
+ (0, 82, 0),
+ (120, 166, 157),
+ (110, 76, 0),
+ (174, 57, 255),
+ (199, 100, 0),
+ (72, 0, 118),
+ (255, 179, 240),
+ (0, 125, 92),
+ (209, 0, 151),
+ (188, 208, 182),
+ (0, 220, 176),
+ (255, 99, 164),
+ (92, 0, 73),
+ (133, 129, 255),
+ (78, 180, 255),
+ (0, 228, 0),
+ (174, 255, 243),
+ (45, 89, 255),
+ (134, 134, 103),
+ (145, 148, 174),
+ (255, 208, 186),
+ (197, 226, 255),
+ (171, 134, 1),
+ (109, 63, 54),
+ (207, 138, 255),
+ (151, 0, 95),
+ (9, 80, 61),
+ (84, 105, 51),
+ (74, 65, 105),
+ (166, 196, 102),
+ (208, 195, 210),
+ (255, 109, 65),
+ (0, 143, 149),
+ (179, 0, 194),
+ (209, 99, 106),
+ (5, 121, 0),
+ (227, 255, 205),
+ (147, 186, 208),
+ (153, 69, 1),
+ (3, 95, 161),
+ (163, 255, 0),
+ (119, 0, 170),
+ (0, 182, 199),
+ (0, 165, 120),
+ (183, 130, 88),
+ (95, 32, 0),
+ (130, 114, 135),
+ (110, 129, 133),
+ (166, 74, 118),
+ (219, 142, 185),
+ (79, 210, 114),
+ (178, 90, 62),
+ (65, 70, 15),
+ (127, 167, 115),
+ (59, 105, 106),
+ (142, 108, 45),
+ (196, 172, 0),
+ (95, 54, 80),
+ (128, 76, 255),
+ (201, 57, 1),
+ (246, 0, 122),
+ (191, 162, 208),
+ ]
def load_annotations(self, ann_file):
"""Load annotation from COCO style annotation file.
@@ -80,12 +208,13 @@ def load_annotations(self, ann_file):
total_ann_ids = []
for i in self.img_ids:
info = self.coco.load_imgs([i])[0]
- info['filename'] = info['file_name']
+ info["filename"] = info["file_name"]
data_infos.append(info)
ann_ids = self.coco.get_ann_ids(img_ids=[i])
total_ann_ids.extend(ann_ids)
assert len(set(total_ann_ids)) == len(
- total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
+ total_ann_ids
+ ), f"Annotation ids in '{ann_file}' are not unique!"
return data_infos
def get_ann_info(self, idx):
@@ -98,7 +227,7 @@ def get_ann_info(self, idx):
dict: Annotation info of specified index.
"""
- img_id = self.data_infos[idx]['id']
+ img_id = self.data_infos[idx]["id"]
ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
ann_info = self.coco.load_anns(ann_ids)
return self._parse_ann_info(self.data_infos[idx], ann_info)
@@ -113,16 +242,16 @@ def get_cat_ids(self, idx):
list[int]: All categories in the image of specified index.
"""
- img_id = self.data_infos[idx]['id']
+ img_id = self.data_infos[idx]["id"]
ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
ann_info = self.coco.load_anns(ann_ids)
- return [ann['category_id'] for ann in ann_info]
+ return [ann["category_id"] for ann in ann_info]
def _filter_imgs(self, min_size=32):
"""Filter images too small or without ground truths."""
valid_inds = []
# obtain images that contain annotation
- ids_with_ann = set(_['image_id'] for _ in self.coco.anns.values())
+ ids_with_ann = set(_["image_id"] for _ in self.coco.anns.values())
# obtain images that contain annotations of the required categories
ids_in_cat = set()
for i, class_id in enumerate(self.cat_ids):
@@ -136,7 +265,7 @@ def _filter_imgs(self, min_size=32):
img_id = self.img_ids[i]
if self.filter_empty_gt and img_id not in ids_in_cat:
continue
- if min(img_info['width'], img_info['height']) >= min_size:
+ if min(img_info["width"], img_info["height"]) >= min_size:
valid_inds.append(i)
valid_img_ids.append(img_id)
self.img_ids = valid_img_ids
@@ -159,24 +288,24 @@ def _parse_ann_info(self, img_info, ann_info):
gt_bboxes_ignore = []
gt_masks_ann = []
for i, ann in enumerate(ann_info):
- if ann.get('ignore', False):
+ if ann.get("ignore", False):
continue
- x1, y1, w, h = ann['bbox']
- inter_w = max(0, min(x1 + w, img_info['width']) - max(x1, 0))
- inter_h = max(0, min(y1 + h, img_info['height']) - max(y1, 0))
+ x1, y1, w, h = ann["bbox"]
+ inter_w = max(0, min(x1 + w, img_info["width"]) - max(x1, 0))
+ inter_h = max(0, min(y1 + h, img_info["height"]) - max(y1, 0))
if inter_w * inter_h == 0:
continue
- if ann['area'] <= 0 or w < 1 or h < 1:
+ if ann["area"] <= 0 or w < 1 or h < 1:
continue
- if ann['category_id'] not in self.cat_ids:
+ if ann["category_id"] not in self.cat_ids:
continue
bbox = [x1, y1, x1 + w, y1 + h]
- if ann.get('iscrowd', False):
+ if ann.get("iscrowd", False):
gt_bboxes_ignore.append(bbox)
else:
gt_bboxes.append(bbox)
- gt_labels.append(self.cat2label[ann['category_id']])
- gt_masks_ann.append(ann.get('segmentation', None))
+ gt_labels.append(self.cat2label[ann["category_id"]])
+ gt_masks_ann.append(ann.get("segmentation", None))
if gt_bboxes:
gt_bboxes = np.array(gt_bboxes, dtype=np.float32)
@@ -190,14 +319,15 @@ def _parse_ann_info(self, img_info, ann_info):
else:
gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32)
- seg_map = img_info['filename'].rsplit('.', 1)[0] + self.seg_suffix
+ seg_map = img_info["filename"].rsplit(".", 1)[0] + self.seg_suffix
ann = dict(
bboxes=gt_bboxes,
labels=gt_labels,
bboxes_ignore=gt_bboxes_ignore,
masks=gt_masks_ann,
- seg_map=seg_map)
+ seg_map=seg_map,
+ )
return ann
@@ -229,10 +359,10 @@ def _proposal2json(self, results):
bboxes = results[idx]
for i in range(bboxes.shape[0]):
data = dict()
- data['image_id'] = img_id
- data['bbox'] = self.xyxy2xywh(bboxes[i])
- data['score'] = float(bboxes[i][4])
- data['category_id'] = 1
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[i])
+ data["score"] = float(bboxes[i][4])
+ data["category_id"] = 1
json_results.append(data)
return json_results
@@ -246,10 +376,10 @@ def _det2json(self, results):
bboxes = result[label]
for i in range(bboxes.shape[0]):
data = dict()
- data['image_id'] = img_id
- data['bbox'] = self.xyxy2xywh(bboxes[i])
- data['score'] = float(bboxes[i][4])
- data['category_id'] = self.cat_ids[label]
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[i])
+ data["score"] = float(bboxes[i][4])
+ data["category_id"] = self.cat_ids[label]
json_results.append(data)
return json_results
@@ -265,10 +395,10 @@ def _segm2json(self, results):
bboxes = det[label]
for i in range(bboxes.shape[0]):
data = dict()
- data['image_id'] = img_id
- data['bbox'] = self.xyxy2xywh(bboxes[i])
- data['score'] = float(bboxes[i][4])
- data['category_id'] = self.cat_ids[label]
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[i])
+ data["score"] = float(bboxes[i][4])
+ data["category_id"] = self.cat_ids[label]
bbox_json_results.append(data)
# segm results
@@ -281,17 +411,17 @@ def _segm2json(self, results):
mask_score = [bbox[4] for bbox in bboxes]
for i in range(bboxes.shape[0]):
data = dict()
- data['image_id'] = img_id
- data['bbox'] = self.xyxy2xywh(bboxes[i])
- data['score'] = float(mask_score[i])
- data['category_id'] = self.cat_ids[label]
- if isinstance(segms[i]['counts'], bytes):
- segms[i]['counts'] = segms[i]['counts'].decode()
- data['segmentation'] = segms[i]
+ data["image_id"] = img_id
+ data["bbox"] = self.xyxy2xywh(bboxes[i])
+ data["score"] = float(mask_score[i])
+ data["category_id"] = self.cat_ids[label]
+ if isinstance(segms[i]["counts"], bytes):
+ segms[i]["counts"] = segms[i]["counts"].decode()
+ data["segmentation"] = segms[i]
segm_json_results.append(data)
return bbox_json_results, segm_json_results
- def results2json(self, results, outfile_prefix):
+ def results2json(self, results, outfile_prefix, mask_type=None):
"""Dump the detection results to a COCO style json file.
There are 3 types of results: proposals, bbox predictions, mask
@@ -313,22 +443,22 @@ def results2json(self, results, outfile_prefix):
result_files = dict()
if isinstance(results[0], list):
json_results = self._det2json(results)
- result_files['bbox'] = f'{outfile_prefix}.bbox.json'
- result_files['proposal'] = f'{outfile_prefix}.bbox.json'
- mmcv.dump(json_results, result_files['bbox'])
+ result_files["bbox"] = f"{outfile_prefix}.bbox.json"
+ result_files["proposal"] = f"{outfile_prefix}.bbox.json"
+ mmcv.dump(json_results, result_files["bbox"])
elif isinstance(results[0], tuple):
- json_results = self._segm2json(results)
- result_files['bbox'] = f'{outfile_prefix}.bbox.json'
- result_files['proposal'] = f'{outfile_prefix}.bbox.json'
- result_files['segm'] = f'{outfile_prefix}.segm.json'
- mmcv.dump(json_results[0], result_files['bbox'])
- mmcv.dump(json_results[1], result_files['segm'])
+ json_results = self._segm2json(results, mask_type)
+ result_files["bbox"] = f"{outfile_prefix}.bbox.json"
+ result_files["proposal"] = f"{outfile_prefix}.bbox.json"
+ result_files["segm"] = f"{outfile_prefix}.segm.json"
+ mmcv.dump(json_results[0], result_files["bbox"])
+ mmcv.dump(json_results[1], result_files["segm"])
elif isinstance(results[0], np.ndarray):
json_results = self._proposal2json(results)
- result_files['proposal'] = f'{outfile_prefix}.proposal.json'
- mmcv.dump(json_results, result_files['proposal'])
+ result_files["proposal"] = f"{outfile_prefix}.proposal.json"
+ mmcv.dump(json_results, result_files["proposal"])
else:
- raise TypeError('invalid type of results')
+ raise TypeError("invalid type of results")
return result_files
def fast_eval_recall(self, results, proposal_nums, iou_thrs, logger=None):
@@ -341,21 +471,20 @@ def fast_eval_recall(self, results, proposal_nums, iou_thrs, logger=None):
continue
bboxes = []
for ann in ann_info:
- if ann.get('ignore', False) or ann['iscrowd']:
+ if ann.get("ignore", False) or ann["iscrowd"]:
continue
- x1, y1, w, h = ann['bbox']
+ x1, y1, w, h = ann["bbox"]
bboxes.append([x1, y1, x1 + w, y1 + h])
bboxes = np.array(bboxes, dtype=np.float32)
if bboxes.shape[0] == 0:
bboxes = np.zeros((0, 4))
gt_bboxes.append(bboxes)
- recalls = eval_recalls(
- gt_bboxes, results, proposal_nums, iou_thrs, logger=logger)
+ recalls = eval_recalls(gt_bboxes, results, proposal_nums, iou_thrs, logger=logger)
ar = recalls.mean(axis=1)
return ar
- def format_results(self, results, jsonfile_prefix=None, **kwargs):
+ def format_results(self, results, jsonfile_prefix=None, mask_type=None, **kwargs):
"""Format the results to json (standard format for COCO evaluation).
Args:
@@ -370,29 +499,33 @@ def format_results(self, results, jsonfile_prefix=None, **kwargs):
the json filepaths, tmp_dir is the temporal directory created \
for saving json files when jsonfile_prefix is not specified.
"""
- assert isinstance(results, list), 'results must be a list'
- assert len(results) == len(self), (
- 'The length of results is not equal to the dataset len: {} != {}'.
- format(len(results), len(self)))
+ assert isinstance(results, list), "results must be a list"
+ assert len(results) == len(
+ self
+ ), "The length of results is not equal to the dataset len: {} != {}".format(
+ len(results), len(self)
+ )
if jsonfile_prefix is None:
tmp_dir = tempfile.TemporaryDirectory()
- jsonfile_prefix = osp.join(tmp_dir.name, 'results')
+ jsonfile_prefix = osp.join(tmp_dir.name, "results")
else:
tmp_dir = None
- result_files = self.results2json(results, jsonfile_prefix)
+ result_files = self.results2json(results, jsonfile_prefix, mask_type)
return result_files, tmp_dir
- def evaluate_det_segm(self,
- results,
- result_files,
- coco_gt,
- metrics,
- logger=None,
- classwise=False,
- proposal_nums=(100, 300, 1000),
- iou_thrs=None,
- metric_items=None):
+ def evaluate_det_segm(
+ self,
+ results,
+ result_files,
+ coco_gt,
+ metrics,
+ logger=None,
+ classwise=False,
+ proposal_nums=(100, 300, 1000),
+ iou_thrs=None,
+ metric_items=None,
+ ):
"""Instance segmentation and object detection evaluation in COCO
protocol.
@@ -425,39 +558,38 @@ def evaluate_det_segm(self,
dict[str, float]: COCO style evaluation metric.
"""
if iou_thrs is None:
- iou_thrs = np.linspace(
- .5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
+ iou_thrs = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)
if metric_items is not None:
if not isinstance(metric_items, list):
metric_items = [metric_items]
eval_results = OrderedDict()
for metric in metrics:
- msg = f'Evaluating {metric}...'
+ msg = f"Evaluating {metric}..."
if logger is None:
- msg = '\n' + msg
+ msg = "\n" + msg
print_log(msg, logger=logger)
- if metric == 'proposal_fast':
+ if metric == "proposal_fast":
if isinstance(results[0], tuple):
- raise KeyError('proposal_fast is not supported for '
- 'instance segmentation result.')
- ar = self.fast_eval_recall(
- results, proposal_nums, iou_thrs, logger='silent')
+ raise KeyError(
+ "proposal_fast is not supported for " "instance segmentation result."
+ )
+ ar = self.fast_eval_recall(results, proposal_nums, iou_thrs, logger="silent")
log_msg = []
for i, num in enumerate(proposal_nums):
- eval_results[f'AR@{num}'] = ar[i]
- log_msg.append(f'\nAR@{num}\t{ar[i]:.4f}')
- log_msg = ''.join(log_msg)
+ eval_results[f"AR@{num}"] = ar[i]
+ log_msg.append(f"\nAR@{num}\t{ar[i]:.4f}")
+ log_msg = "".join(log_msg)
print_log(log_msg, logger=logger)
continue
- iou_type = 'bbox' if metric == 'proposal' else metric
+ iou_type = "bbox" if metric == "proposal" else metric
if metric not in result_files:
- raise KeyError(f'{metric} is not in results')
+ raise KeyError(f"{metric} is not in results")
try:
predictions = mmcv.load(result_files[metric])
- if iou_type == 'segm':
+ if iou_type == "segm":
# Refer to https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/coco.py#L331 # noqa
# When evaluating mask AP, if the results contain bbox,
# cocoapi will use the box area instead of the mask area
@@ -465,19 +597,21 @@ def evaluate_det_segm(self,
# is not affected, this leads to different
# small/medium/large mask AP results.
for x in predictions:
- x.pop('bbox')
- warnings.simplefilter('once')
+ x.pop("bbox")
+ warnings.simplefilter("once")
warnings.warn(
'The key "bbox" is deleted for more accurate mask AP '
- 'of small/medium/large instances since v2.12.0. This '
- 'does not change the overall mAP calculation.',
- UserWarning)
+ "of small/medium/large instances since v2.12.0. This "
+ "does not change the overall mAP calculation.",
+ UserWarning,
+ )
coco_det = coco_gt.loadRes(predictions)
except IndexError:
print_log(
- 'The testing results of the whole dataset is empty.',
+ "The testing results of the whole dataset is empty.",
logger=logger,
- level=logging.ERROR)
+ level=logging.ERROR,
+ )
break
cocoEval = COCOeval(coco_gt, coco_det, iou_type)
@@ -487,26 +621,25 @@ def evaluate_det_segm(self,
cocoEval.params.iouThrs = iou_thrs
# mapping of cocoEval.stats
coco_metric_names = {
- 'mAP': 0,
- 'mAP_50': 1,
- 'mAP_75': 2,
- 'mAP_s': 3,
- 'mAP_m': 4,
- 'mAP_l': 5,
- 'AR@100': 6,
- 'AR@300': 7,
- 'AR@1000': 8,
- 'AR_s@1000': 9,
- 'AR_m@1000': 10,
- 'AR_l@1000': 11
+ "mAP": 0,
+ "mAP_50": 1,
+ "mAP_75": 2,
+ "mAP_s": 3,
+ "mAP_m": 4,
+ "mAP_l": 5,
+ "AR@100": 6,
+ "AR@300": 7,
+ "AR@1000": 8,
+ "AR_s@1000": 9,
+ "AR_m@1000": 10,
+ "AR_l@1000": 11,
}
if metric_items is not None:
for metric_item in metric_items:
if metric_item not in coco_metric_names:
- raise KeyError(
- f'metric item {metric_item} is not supported')
+ raise KeyError(f"metric item {metric_item} is not supported")
- if metric == 'proposal':
+ if metric == "proposal":
cocoEval.params.useCats = 0
cocoEval.evaluate()
cocoEval.accumulate()
@@ -515,17 +648,20 @@ def evaluate_det_segm(self,
redirect_string = io.StringIO()
with contextlib.redirect_stdout(redirect_string):
cocoEval.summarize()
- print_log('\n' + redirect_string.getvalue(), logger=logger)
+ print_log("\n" + redirect_string.getvalue(), logger=logger)
if metric_items is None:
metric_items = [
- 'AR@100', 'AR@300', 'AR@1000', 'AR_s@1000',
- 'AR_m@1000', 'AR_l@1000'
+ "AR@100",
+ "AR@300",
+ "AR@1000",
+ "AR_s@1000",
+ "AR_m@1000",
+ "AR_l@1000",
]
for item in metric_items:
- val = float(
- f'{cocoEval.stats[coco_metric_names[item]]:.4f}')
+ val = float(f"{cocoEval.stats[coco_metric_names[item]]:.4f}")
eval_results[item] = val
else:
cocoEval.evaluate()
@@ -535,12 +671,12 @@ def evaluate_det_segm(self,
redirect_string = io.StringIO()
with contextlib.redirect_stdout(redirect_string):
cocoEval.summarize()
- print_log('\n' + redirect_string.getvalue(), logger=logger)
+ print_log("\n" + redirect_string.getvalue(), logger=logger)
if classwise: # Compute per-category AP
# Compute per-category AP
# from https://github.com/facebookresearch/detectron2/
- precisions = cocoEval.eval['precision']
+ precisions = cocoEval.eval["precision"]
# precision: (iou, recall, cls, area range, max dets)
assert len(self.cat_ids) == precisions.shape[2]
@@ -554,50 +690,46 @@ def evaluate_det_segm(self,
if precision.size:
ap = np.mean(precision)
else:
- ap = float('nan')
- results_per_category.append(
- (f'{nm["name"]}', f'{float(ap):0.3f}'))
+ ap = float("nan")
+ results_per_category.append((f'{nm["name"]}', f"{float(ap):0.3f}"))
num_columns = min(6, len(results_per_category) * 2)
- results_flatten = list(
- itertools.chain(*results_per_category))
- headers = ['category', 'AP'] * (num_columns // 2)
- results_2d = itertools.zip_longest(*[
- results_flatten[i::num_columns]
- for i in range(num_columns)
- ])
+ results_flatten = list(itertools.chain(*results_per_category))
+ headers = ["category", "AP"] * (num_columns // 2)
+ results_2d = itertools.zip_longest(
+ *[results_flatten[i::num_columns] for i in range(num_columns)]
+ )
table_data = [headers]
table_data += [result for result in results_2d]
table = AsciiTable(table_data)
- print_log('\n' + table.table, logger=logger)
+ print_log("\n" + table.table, logger=logger)
if metric_items is None:
- metric_items = [
- 'mAP', 'mAP_50', 'mAP_75', 'mAP_s', 'mAP_m', 'mAP_l'
- ]
+ metric_items = ["mAP", "mAP_50", "mAP_75", "mAP_s", "mAP_m", "mAP_l"]
for metric_item in metric_items:
- key = f'{metric}_{metric_item}'
- val = float(
- f'{cocoEval.stats[coco_metric_names[metric_item]]:.4f}'
- )
+ key = f"{metric}_{metric_item}"
+ val = float(f"{cocoEval.stats[coco_metric_names[metric_item]]:.4f}")
eval_results[key] = val
ap = cocoEval.stats[:6]
- eval_results[f'{metric}_mAP_copypaste'] = (
- f'{ap[0]:.4f} {ap[1]:.4f} {ap[2]:.4f} {ap[3]:.4f} '
- f'{ap[4]:.4f} {ap[5]:.4f}')
+ eval_results[f"{metric}_mAP_copypaste"] = (
+ f"{ap[0]:.4f} {ap[1]:.4f} {ap[2]:.4f} {ap[3]:.4f} " f"{ap[4]:.4f} {ap[5]:.4f}"
+ )
return eval_results
- def evaluate(self,
- results,
- metric='bbox',
- logger=None,
- jsonfile_prefix=None,
- classwise=False,
- proposal_nums=(100, 300, 1000),
- iou_thrs=None,
- metric_items=None):
+ def evaluate(
+ self,
+ results,
+ metric="bbox",
+ mask_type=None,
+ logger=None,
+ jsonfile_prefix=None,
+ classwise=False,
+ proposal_nums=(100, 300, 1000),
+ iou_thrs=None,
+ metric_items=None,
+ ):
"""Evaluation in COCO protocol.
Args:
@@ -630,19 +762,29 @@ def evaluate(self,
"""
metrics = metric if isinstance(metric, list) else [metric]
- allowed_metrics = ['bbox', 'segm', 'proposal', 'proposal_fast']
+ allowed_metrics = ["bbox", "segm", "proposal", "proposal_fast"]
for metric in metrics:
if metric not in allowed_metrics:
- raise KeyError(f'metric {metric} is not supported')
+ raise KeyError(f"metric {metric} is not supported")
coco_gt = self.coco
self.cat_ids = coco_gt.get_cat_ids(cat_names=self.CLASSES)
- result_files, tmp_dir = self.format_results(results, jsonfile_prefix)
- eval_results = self.evaluate_det_segm(results, result_files, coco_gt,
- metrics, logger, classwise,
- proposal_nums, iou_thrs,
- metric_items)
+ result_files, tmp_dir = self.format_results(results, jsonfile_prefix, mask_type)
+ if not result_files.get("bbox") and not result_files.get("segm"):
+ print_log(f"Wrong or missing mask type for evaluation ---> {mask_type}", logger=logger)
+ return None
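+ # Hypothetical usage sketch; ``mask_type`` is introduced by this patch and is
+ # not part of upstream mmdet:
+ #   dataset.evaluate(results, metric=["bbox", "segm"], mask_type="footprint")
+ # The guard above assumes results2json() reports "bbox"/"segm" result files.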
+ eval_results = self.evaluate_det_segm(
+ results,
+ result_files,
+ coco_gt,
+ metrics,
+ logger,
+ classwise,
+ proposal_nums,
+ iou_thrs,
+ metric_items,
+ )
if tmp_dir is not None:
tmp_dir.cleanup()
diff --git a/mmdet/datasets/custom.py b/mmdet/datasets/custom.py
index 3b97685b..ff82cf01 100644
--- a/mmdet/datasets/custom.py
+++ b/mmdet/datasets/custom.py
@@ -1,3 +1,5 @@
+# bonai changes:
+# 1. modify pre_pipeline
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import warnings
@@ -56,18 +58,20 @@ class CustomDataset(Dataset):
PALETTE = None
- def __init__(self,
- ann_file,
- pipeline,
- classes=None,
- data_root=None,
- img_prefix='',
- seg_prefix=None,
- seg_suffix='.png',
- proposal_file=None,
- test_mode=False,
- filter_empty_gt=True,
- file_client_args=dict(backend='disk')):
+ def __init__(
+ self,
+ ann_file,
+ pipeline,
+ classes=None,
+ data_root=None,
+ img_prefix="",
+ seg_prefix=None,
+ seg_suffix=".png",
+ proposal_file=None,
+ test_mode=False,
+ filter_empty_gt=True,
+ file_client_args=dict(backend="disk"),
+ ):
self.ann_file = ann_file
self.data_root = data_root
self.img_prefix = img_prefix
@@ -87,33 +91,32 @@ def __init__(self,
self.img_prefix = osp.join(self.data_root, self.img_prefix)
if not (self.seg_prefix is None or osp.isabs(self.seg_prefix)):
self.seg_prefix = osp.join(self.data_root, self.seg_prefix)
- if not (self.proposal_file is None
- or osp.isabs(self.proposal_file)):
- self.proposal_file = osp.join(self.data_root,
- self.proposal_file)
+ if not (self.proposal_file is None or osp.isabs(self.proposal_file)):
+ self.proposal_file = osp.join(self.data_root, self.proposal_file)
# load annotations (and proposals)
- if hasattr(self.file_client, 'get_local_path'):
+ if hasattr(self.file_client, "get_local_path"):
with self.file_client.get_local_path(self.ann_file) as local_path:
self.data_infos = self.load_annotations(local_path)
else:
warnings.warn(
- 'The used MMCV version does not have get_local_path. '
- f'We treat the {self.ann_file} as local paths and it '
- 'might cause errors if the path is not a local path. '
- 'Please use MMCV>= 1.3.16 if you meet errors.')
+ "The used MMCV version does not have get_local_path. "
+ f"We treat the {self.ann_file} as local paths and it "
+ "might cause errors if the path is not a local path. "
+ "Please use MMCV>= 1.3.16 if you meet errors."
+ )
self.data_infos = self.load_annotations(self.ann_file)
if self.proposal_file is not None:
- if hasattr(self.file_client, 'get_local_path'):
- with self.file_client.get_local_path(
- self.proposal_file) as local_path:
+ if hasattr(self.file_client, "get_local_path"):
+ with self.file_client.get_local_path(self.proposal_file) as local_path:
self.proposals = self.load_proposals(local_path)
else:
warnings.warn(
- 'The used MMCV version does not have get_local_path. '
- f'We treat the {self.ann_file} as local paths and it '
- 'might cause errors if the path is not a local path. '
- 'Please use MMCV>= 1.3.16 if you meet errors.')
+ "The used MMCV version does not have get_local_path. "
+ f"We treat the {self.ann_file} as local paths and it "
+ "might cause errors if the path is not a local path. "
+ "Please use MMCV>= 1.3.16 if you meet errors."
+ )
self.proposals = self.load_proposals(self.proposal_file)
else:
self.proposals = None
@@ -152,7 +155,7 @@ def get_ann_info(self, idx):
dict: Annotation info of specified index.
"""
- return self.data_infos[idx]['ann']
+ return self.data_infos[idx]["ann"]
def get_cat_ids(self, idx):
"""Get category ids by index.
@@ -164,25 +167,29 @@ def get_cat_ids(self, idx):
list[int]: All categories in the image of specified index.
"""
- return self.data_infos[idx]['ann']['labels'].astype(np.int).tolist()
+ return self.data_infos[idx]["ann"]["labels"].astype(np.int).tolist()
def pre_pipeline(self, results):
"""Prepare results dict for pipeline."""
- results['img_prefix'] = self.img_prefix
- results['seg_prefix'] = self.seg_prefix
- results['proposal_file'] = self.proposal_file
- results['bbox_fields'] = []
- results['mask_fields'] = []
- results['seg_fields'] = []
+ results["img_prefix"] = self.img_prefix
+ results["seg_prefix"] = self.seg_prefix
+ results["proposal_file"] = self.proposal_file
+ results["bbox_fields"] = []
+ results["mask_fields"] = []
+ results["seg_fields"] = []
+ # add additional fields
+ results["offset_fields"] = []
+ results["height_fields"] = []
+ results["rbbox_fields"] = []
+ results["angle_fields"] = []
def _filter_imgs(self, min_size=32):
"""Filter images too small."""
if self.filter_empty_gt:
- warnings.warn(
- 'CustomDataset does not support filtering empty gt images.')
+ warnings.warn("CustomDataset does not support filtering empty gt images.")
valid_inds = []
for i, img_info in enumerate(self.data_infos):
- if min(img_info['width'], img_info['height']) >= min_size:
+ if min(img_info["width"], img_info["height"]) >= min_size:
valid_inds.append(i)
return valid_inds
@@ -195,7 +202,7 @@ def _set_group_flag(self):
self.flag = np.zeros(len(self), dtype=np.uint8)
for i in range(len(self)):
img_info = self.data_infos[i]
- if img_info['width'] / img_info['height'] > 1:
+ if img_info["width"] / img_info["height"] > 1:
self.flag[i] = 1
def _rand_another(self, idx):
@@ -234,12 +241,16 @@ def prepare_train_img(self, idx):
introduced by pipeline.
"""
+ # print("========== before img info ==========")
img_info = self.data_infos[idx]
+ # print("========== before ann info ==========")
ann_info = self.get_ann_info(idx)
results = dict(img_info=img_info, ann_info=ann_info)
if self.proposals is not None:
- results['proposals'] = self.proposals[idx]
+ results["proposals"] = self.proposals[idx]
+ # print("========== before pipeline ==========")
self.pre_pipeline(results)
+ # print("========== after pipeline ==========")
return self.pipeline(results)
def prepare_test_img(self, idx):
@@ -256,7 +267,7 @@ def prepare_test_img(self, idx):
img_info = self.data_infos[idx]
results = dict(img_info=img_info)
if self.proposals is not None:
- results['proposals'] = self.proposals[idx]
+ results["proposals"] = self.proposals[idx]
self.pre_pipeline(results)
return self.pipeline(results)
@@ -283,7 +294,7 @@ def get_classes(cls, classes=None):
elif isinstance(classes, (tuple, list)):
class_names = classes
else:
- raise ValueError(f'Unsupported type {type(classes)} of classes.')
+ raise ValueError(f"Unsupported type {type(classes)} of classes.")
return class_names
@@ -297,7 +308,7 @@ def get_cat2imgs(self):
corresponds to the image index that contains the label.
"""
if self.CLASSES is None:
- raise ValueError('self.CLASSES can not be None')
+ raise ValueError("self.CLASSES can not be None")
# sort the label index
cat2imgs = {i: [] for i in range(len(self.CLASSES))}
for i in range(len(self)):
@@ -309,13 +320,15 @@ def get_cat2imgs(self):
def format_results(self, results, **kwargs):
"""Place holder to format result to dataset specific output."""
- def evaluate(self,
- results,
- metric='mAP',
- logger=None,
- proposal_nums=(100, 300, 1000),
- iou_thr=0.5,
- scale_ranges=None):
+ def evaluate(
+ self,
+ results,
+ metric="mAP",
+ logger=None,
+ proposal_nums=(100, 300, 1000),
+ iou_thr=0.5,
+ scale_ranges=None,
+ ):
"""Evaluate the dataset.
Args:
@@ -334,13 +347,13 @@ def evaluate(self,
if not isinstance(metric, str):
assert len(metric) == 1
metric = metric[0]
- allowed_metrics = ['mAP', 'recall']
+ allowed_metrics = ["mAP", "recall"]
if metric not in allowed_metrics:
- raise KeyError(f'metric {metric} is not supported')
+ raise KeyError(f"metric {metric} is not supported")
annotations = [self.get_ann_info(i) for i in range(len(self))]
eval_results = OrderedDict()
iou_thrs = [iou_thr] if isinstance(iou_thr, float) else iou_thr
- if metric == 'mAP':
+ if metric == "mAP":
assert isinstance(iou_thrs, list)
mean_aps = []
for iou_thr in iou_thrs:
@@ -351,36 +364,38 @@ def evaluate(self,
scale_ranges=scale_ranges,
iou_thr=iou_thr,
dataset=self.CLASSES,
- logger=logger)
+ logger=logger,
+ )
mean_aps.append(mean_ap)
- eval_results[f'AP{int(iou_thr * 100):02d}'] = round(mean_ap, 3)
- eval_results['mAP'] = sum(mean_aps) / len(mean_aps)
- elif metric == 'recall':
- gt_bboxes = [ann['bboxes'] for ann in annotations]
- recalls = eval_recalls(
- gt_bboxes, results, proposal_nums, iou_thr, logger=logger)
+ eval_results[f"AP{int(iou_thr * 100):02d}"] = round(mean_ap, 3)
+ eval_results["mAP"] = sum(mean_aps) / len(mean_aps)
+ elif metric == "recall":
+ gt_bboxes = [ann["bboxes"] for ann in annotations]
+ recalls = eval_recalls(gt_bboxes, results, proposal_nums, iou_thr, logger=logger)
for i, num in enumerate(proposal_nums):
for j, iou in enumerate(iou_thrs):
- eval_results[f'recall@{num}@{iou}'] = recalls[i, j]
+ eval_results[f"recall@{num}@{iou}"] = recalls[i, j]
if recalls.shape[1] > 1:
ar = recalls.mean(axis=1)
for i, num in enumerate(proposal_nums):
- eval_results[f'AR@{num}'] = ar[i]
+ eval_results[f"AR@{num}"] = ar[i]
return eval_results
def __repr__(self):
"""Print the number of instance number."""
- dataset_type = 'Test' if self.test_mode else 'Train'
- result = (f'\n{self.__class__.__name__} {dataset_type} dataset '
- f'with number of images {len(self)}, '
- f'and instance counts: \n')
+ dataset_type = "Test" if self.test_mode else "Train"
+ result = (
+ f"\n{self.__class__.__name__} {dataset_type} dataset "
+ f"with number of images {len(self)}, "
+ f"and instance counts: \n"
+ )
if self.CLASSES is None:
- result += 'Category names are not provided. \n'
+ result += "Category names are not provided. \n"
return result
instance_count = np.zeros(len(self.CLASSES) + 1).astype(int)
# count the instance number in each image
for idx in range(len(self)):
- label = self.get_ann_info(idx)['labels']
+ label = self.get_ann_info(idx)["labels"]
unique, counts = np.unique(label, return_counts=True)
if len(unique) > 0:
# add the occurrence number to each class
@@ -389,19 +404,19 @@ def __repr__(self):
# background is the last index
instance_count[-1] += 1
# create a table with category count
- table_data = [['category', 'count'] * 5]
+ table_data = [["category", "count"] * 5]
row_data = []
for cls, count in enumerate(instance_count):
if cls < len(self.CLASSES):
- row_data += [f'{cls} [{self.CLASSES[cls]}]', f'{count}']
+ row_data += [f"{cls} [{self.CLASSES[cls]}]", f"{count}"]
else:
# add the background number
- row_data += ['-1 background', f'{count}']
+ row_data += ["-1 background", f"{count}"]
if len(row_data) == 10:
table_data.append(row_data)
row_data = []
if len(row_data) >= 2:
- if row_data[-1] == '0':
+ if row_data[-1] == "0":
row_data = row_data[:-2]
if len(row_data) >= 2:
table_data.append([])
diff --git a/mmdet/datasets/pipelines/__init__.py b/mmdet/datasets/pipelines/__init__.py
index 8260da64..ab5b9ed1 100644
--- a/mmdet/datasets/pipelines/__init__.py
+++ b/mmdet/datasets/pipelines/__init__.py
@@ -1,31 +1,106 @@
# Copyright (c) OpenMMLab. All rights reserved.
-from .auto_augment import (AutoAugment, BrightnessTransform, ColorTransform,
- ContrastTransform, EqualizeTransform, Rotate, Shear,
- Translate)
+from .auto_augment import (
+ AutoAugment,
+ BrightnessTransform,
+ ColorTransform,
+ ContrastTransform,
+ EqualizeTransform,
+ Rotate,
+ Shear,
+ Translate,
+)
from .compose import Compose
-from .formatting import (Collect, DefaultFormatBundle, ImageToTensor,
- ToDataContainer, ToTensor, Transpose, to_tensor)
+from .formatting import (
+ Collect,
+ DefaultFormatBundle,
+ ImageToTensor,
+ LOFTFormatBundle,
+ ToDataContainer,
+ ToTensor,
+ Transpose,
+ to_tensor,
+)
from .instaboost import InstaBoost
-from .loading import (FilterAnnotations, LoadAnnotations, LoadImageFromFile,
- LoadImageFromWebcam, LoadMultiChannelImageFromFiles,
- LoadPanopticAnnotations, LoadProposals)
+from .loading import (
+ FilterAnnotations,
+ LoadAnnotations,
+ LoadImageFromFile,
+ LoadImageFromWebcam,
+ LoadMultiChannelImageFromFiles,
+ LoadPanopticAnnotations,
+ LoadProposals,
+)
from .test_time_aug import MultiScaleFlipAug
-from .transforms import (Albu, CopyPaste, CutOut, Expand, MinIoURandomCrop,
- MixUp, Mosaic, Normalize, Pad, PhotoMetricDistortion,
- RandomAffine, RandomCenterCropPad, RandomCrop,
- RandomFlip, RandomShift, Resize, SegRescale,
- YOLOXHSVRandomAug)
+from .transforms import (
+ Albu,
+ CopyPaste,
+ CutOut,
+ Expand,
+ MinIoURandomCrop,
+ MixUp,
+ Mosaic,
+ Normalize,
+ OffsetTransform,
+ Pad,
+ PhotoMetricDistortion,
+ Pointobb2RBBox,
+ RandomAffine,
+ RandomCenterCropPad,
+ RandomCrop,
+ RandomFlip,
+ RandomRotate,
+ RandomShift,
+ Resize,
+ SegRescale,
+ YOLOXHSVRandomAug,
+)
__all__ = [
- 'Compose', 'to_tensor', 'ToTensor', 'ImageToTensor', 'ToDataContainer',
- 'Transpose', 'Collect', 'DefaultFormatBundle', 'LoadAnnotations',
- 'LoadImageFromFile', 'LoadImageFromWebcam', 'LoadPanopticAnnotations',
- 'LoadMultiChannelImageFromFiles', 'LoadProposals', 'FilterAnnotations',
- 'MultiScaleFlipAug', 'Resize', 'RandomFlip', 'Pad', 'RandomCrop',
- 'Normalize', 'SegRescale', 'MinIoURandomCrop', 'Expand',
- 'PhotoMetricDistortion', 'Albu', 'InstaBoost', 'RandomCenterCropPad',
- 'AutoAugment', 'CutOut', 'Shear', 'Rotate', 'ColorTransform',
- 'EqualizeTransform', 'BrightnessTransform', 'ContrastTransform',
- 'Translate', 'RandomShift', 'Mosaic', 'MixUp', 'RandomAffine',
- 'YOLOXHSVRandomAug', 'CopyPaste'
+ "Compose",
+ "to_tensor",
+ "ToTensor",
+ "ImageToTensor",
+ "ToDataContainer",
+ "Transpose",
+ "Collect",
+ "DefaultFormatBundle",
+ "LOFTFormatBundle",
+ "LoadAnnotations",
+ "LoadImageFromFile",
+ "LoadImageFromWebcam",
+ "LoadPanopticAnnotations",
+ "LoadMultiChannelImageFromFiles",
+ "LoadProposals",
+ "FilterAnnotations",
+ "MultiScaleFlipAug",
+ "Resize",
+ "RandomFlip",
+ "Pad",
+ "RandomCrop",
+ "Normalize",
+ "SegRescale",
+ "MinIoURandomCrop",
+ "Expand",
+ "PhotoMetricDistortion",
+ "Albu",
+ "InstaBoost",
+ "RandomCenterCropPad",
+ "AutoAugment",
+ "CutOut",
+ "Shear",
+ "Rotate",
+ "ColorTransform",
+ "EqualizeTransform",
+ "BrightnessTransform",
+ "ContrastTransform",
+ "Translate",
+ "RandomShift",
+ "Mosaic",
+ "MixUp",
+ "RandomAffine",
+ "YOLOXHSVRandomAug",
+ "CopyPaste",
+ "OffsetTransform",
+ "Pointobb2RBBox",
+ "RandomRotate",
]
diff --git a/mmdet/datasets/pipelines/compose.py b/mmdet/datasets/pipelines/compose.py
index d7592200..5a7d6768 100644
--- a/mmdet/datasets/pipelines/compose.py
+++ b/mmdet/datasets/pipelines/compose.py
@@ -1,6 +1,8 @@
# Copyright (c) OpenMMLab. All rights reserved.
import collections
+import numpy as np
+from mmcv.parallel import DataContainer as DC
from mmcv.utils import build_from_cfg
from ..builder import PIPELINES
@@ -25,7 +27,7 @@ def __init__(self, transforms):
elif callable(transform):
self.transforms.append(transform)
else:
- raise TypeError('transform must be callable or a dict')
+ raise TypeError("transform must be callable or a dict")
def __call__(self, data):
"""Call function to apply transforms sequentially.
@@ -44,12 +46,12 @@ def __call__(self, data):
return data
def __repr__(self):
- format_string = self.__class__.__name__ + '('
+ format_string = self.__class__.__name__ + "("
for t in self.transforms:
str_ = t.__repr__()
- if 'Compose(' in str_:
- str_ = str_.replace('\n', '\n ')
- format_string += '\n'
- format_string += f' {str_}'
- format_string += '\n)'
+ if "Compose(" in str_:
+ str_ = str_.replace("\n", "\n ")
+ format_string += "\n"
+ format_string += f" {str_}"
+ format_string += "\n)"
return format_string
diff --git a/mmdet/datasets/pipelines/formatting.py b/mmdet/datasets/pipelines/formatting.py
index 2e07f389..e6815259 100644
--- a/mmdet/datasets/pipelines/formatting.py
+++ b/mmdet/datasets/pipelines/formatting.py
@@ -31,7 +31,7 @@ def to_tensor(data):
elif isinstance(data, float):
return torch.FloatTensor([data])
else:
- raise TypeError(f'type {type(data)} cannot be converted to tensor.')
+ raise TypeError(f"type {type(data)} cannot be converted to tensor.")
@PIPELINES.register_module()
@@ -60,7 +60,7 @@ def __call__(self, results):
return results
def __repr__(self):
- return self.__class__.__name__ + f'(keys={self.keys})'
+ return self.__class__.__name__ + f"(keys={self.keys})"
@PIPELINES.register_module()
@@ -97,7 +97,7 @@ def __call__(self, results):
return results
def __repr__(self):
- return self.__class__.__name__ + f'(keys={self.keys})'
+ return self.__class__.__name__ + f"(keys={self.keys})"
@PIPELINES.register_module()
@@ -128,8 +128,7 @@ def __call__(self, results):
return results
def __repr__(self):
- return self.__class__.__name__ + \
- f'(keys={self.keys}, order={self.order})'
+ return self.__class__.__name__ + f"(keys={self.keys}, order={self.order})"
@PIPELINES.register_module()
@@ -144,9 +143,9 @@ class ToDataContainer:
dict(key='gt_labels'))``.
"""
- def __init__(self,
- fields=(dict(key='img', stack=True), dict(key='gt_bboxes'),
- dict(key='gt_labels'))):
+ def __init__(
+ self, fields=(dict(key="img", stack=True), dict(key="gt_bboxes"), dict(key="gt_labels"))
+ ):
self.fields = fields
def __call__(self, results):
@@ -163,12 +162,12 @@ def __call__(self, results):
for field in self.fields:
field = field.copy()
- key = field.pop('key')
+ key = field.pop("key")
results[key] = DC(results[key], **field)
return results
def __repr__(self):
- return self.__class__.__name__ + f'(fields={self.fields})'
+ return self.__class__.__name__ + f"(fields={self.fields})"
@PIPELINES.register_module()
@@ -197,9 +196,7 @@ class DefaultFormatBundle:
will be set to 0 by default, which should be 255.
"""
- def __init__(self,
- img_to_float=True,
- pad_val=dict(img=0, masks=0, seg=255)):
+ def __init__(self, img_to_float=True, pad_val=dict(img=0, masks=0, seg=255)):
self.img_to_float = img_to_float
self.pad_val = pad_val
@@ -214,8 +211,8 @@ def __call__(self, results):
default bundle.
"""
- if 'img' in results:
- img = results['img']
+ if "img" in results:
+ img = results["img"]
if self.img_to_float is True and img.dtype == np.uint8:
# Normally, image is of uint8 type without normalization.
# At this time, it needs to be forced to be converted to
@@ -238,22 +235,21 @@ def __call__(self, results):
img = to_tensor(img)
else:
img = to_tensor(img).permute(2, 0, 1).contiguous()
- results['img'] = DC(
- img, padding_value=self.pad_val['img'], stack=True)
- for key in ['proposals', 'gt_bboxes', 'gt_bboxes_ignore', 'gt_labels']:
+ results["img"] = DC(img, padding_value=self.pad_val["img"], stack=True)
+ for key in ["proposals", "gt_bboxes", "gt_bboxes_ignore", "gt_labels"]:
if key not in results:
continue
results[key] = DC(to_tensor(results[key]))
- if 'gt_masks' in results:
- results['gt_masks'] = DC(
- results['gt_masks'],
- padding_value=self.pad_val['masks'],
- cpu_only=True)
- if 'gt_semantic_seg' in results:
- results['gt_semantic_seg'] = DC(
- to_tensor(results['gt_semantic_seg'][None, ...]),
- padding_value=self.pad_val['seg'],
- stack=True)
+ if "gt_masks" in results:
+ results["gt_masks"] = DC(
+ results["gt_masks"], padding_value=self.pad_val["masks"], cpu_only=True
+ )
+ if "gt_semantic_seg" in results:
+ results["gt_semantic_seg"] = DC(
+ to_tensor(results["gt_semantic_seg"][None, ...]),
+ padding_value=self.pad_val["seg"],
+ stack=True,
+ )
return results
def _add_default_meta_keys(self, results):
@@ -269,21 +265,120 @@ def _add_default_meta_keys(self, results):
Returns:
results (dict): Updated result dict contains the data to convert.
"""
- img = results['img']
- results.setdefault('pad_shape', img.shape)
- results.setdefault('scale_factor', 1.0)
+ img = results["img"]
+ results.setdefault("pad_shape", img.shape)
+ results.setdefault("scale_factor", 1.0)
num_channels = 1 if len(img.shape) < 3 else img.shape[2]
results.setdefault(
- 'img_norm_cfg',
+ "img_norm_cfg",
dict(
mean=np.zeros(num_channels, dtype=np.float32),
std=np.ones(num_channels, dtype=np.float32),
- to_rgb=False))
+ to_rgb=False,
+ ),
+ )
return results
def __repr__(self):
- return self.__class__.__name__ + \
- f'(img_to_float={self.img_to_float})'
+ return self.__class__.__name__ + f"(img_to_float={self.img_to_float})"
+
+
+@PIPELINES.register_module()
+class LOFTFormatBundle(DefaultFormatBundle):
+ """LOFT formatting bundle."""
+
+ def __call__(self, results):
+ """Call function to transform and format common fields in results.
+
+ Args:
+ results (dict): Result dict contains the data to convert.
+
+ Returns:
+ dict: The result dict contains the data that is formatted with \
+ default bundle.
+ """
+
+ if "img" in results:
+ img = results["img"]
+ if self.img_to_float is True and img.dtype == np.uint8:
+ # Normally, image is of uint8 type without normalization.
+ # At this time, it needs to be forced to be converted to
+ # float32, otherwise the model training and inference
+ # will be wrong. Only used for YOLOX currently.
+ img = img.astype(np.float32)
+ # add default meta keys
+ results = self._add_default_meta_keys(results)
+ if len(img.shape) < 3:
+ img = np.expand_dims(img, -1)
+ # To improve the computational speed by 3-5 times, apply:
+ # If image is not contiguous, use
+ # `numpy.transpose()` followed by `numpy.ascontiguousarray()`
+ # If image is already contiguous, use
+ # `torch.permute()` followed by `torch.contiguous()`
+ # Refer to https://github.com/open-mmlab/mmdetection/pull/9533
+ # for more details
+ if not img.flags.c_contiguous:
+ img = np.ascontiguousarray(img.transpose(2, 0, 1))
+ img = to_tensor(img)
+ else:
+ img = to_tensor(img).permute(2, 0, 1).contiguous()
+ results["img"] = DC(img, padding_value=self.pad_val["img"], stack=True)
+ for key in [
+ "proposals",
+ "gt_bboxes",
+ "gt_bboxes_ignore",
+ "gt_labels",
+ "gt_offsets",
+ "gt_heights",
+ "gt_nadir_angles",
+ "gt_offset_angles",
+ "gt_rbboxes",
+ "gt_roof_bboxes",
+ "gt_footprint_bboxes",
+ "gt_only_footprint_flag",
+ "gt_roof_masks",
+ "gt_is_semi_supervised_sample",
+ "gt_is_valid_height_sample",
+ "height_mask_shape",
+ "image_scale_footprint_mask_shape",
+ # "gt_footprint_masks",
+ ]:
+ if key not in results:
+ continue
+ results[key] = DC(to_tensor(results[key]))
+ if "gt_masks" in results:
+ results["gt_masks"] = DC(
+ results["gt_masks"], padding_value=self.pad_val["masks"], cpu_only=True
+ )
+ if "gt_roof_masks" in results:
+ results["gt_roof_masks"] = DC(results["gt_roof_masks"], cpu_only=True)
+ if "gt_height_masks" in results:
+ results["gt_height_masks"] = DC(results["gt_height_masks"], cpu_only=True)
+ if "gt_footprint_masks" in results:
+ results["gt_footprint_masks"] = DC(results["gt_footprint_masks"], cpu_only=True)
+ if "gt_image_scale_footprint_masks" in results:
+ results["gt_image_scale_footprint_masks"] = DC(
+ results["gt_image_scale_footprint_masks"], cpu_only=True
+ )
+ if "gt_footprint_polygons" in results:
+ results["gt_footprint_polygons"] = DC(results["gt_footprint_polygons"], cpu_only=True)
+ if "gt_semantic_seg" in results:
+ results["gt_semantic_seg"] = DC(
+ to_tensor(results["gt_semantic_seg"][None, ...]),
+ padding_value=self.pad_val["seg"],
+ stack=True,
+ )
+ if "gt_edge_maps" in results:
+ results["gt_edge_maps"] = DC(results["gt_edge_maps"], cpu_only=True)
+ if "gt_side_face_maps" in results:
+ results["gt_side_face_maps"] = DC(results["gt_side_face_maps"], cpu_only=True)
+ if "gt_offset_field" in results:
+ results["gt_offset_field"] = DC(
+ to_tensor(results["gt_offset_field"][None, ...]), stack=True
+ )
+
+ return results
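+
+
+# Hypothetical pipeline config using LOFTFormatBundle (key names assumed from
+# this patch, not upstream mmdet):
+#   train_pipeline = [
+#       ...,
+#       dict(type="LOFTFormatBundle"),
+#       dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels",
+#                                  "gt_masks", "gt_offsets"]),
+#   ]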
@PIPELINES.register_module()
@@ -326,11 +421,21 @@ class Collect:
'img_norm_cfg')``
"""
- def __init__(self,
- keys,
- meta_keys=('filename', 'ori_filename', 'ori_shape',
- 'img_shape', 'pad_shape', 'scale_factor', 'flip',
- 'flip_direction', 'img_norm_cfg')):
+ def __init__(
+ self,
+ keys,
+ meta_keys=(
+ "filename",
+ "ori_filename",
+ "ori_shape",
+ "img_shape",
+ "pad_shape",
+ "scale_factor",
+ "flip",
+ "flip_direction",
+ "img_norm_cfg",
+ ),
+ ):
self.keys = keys
self.meta_keys = meta_keys
@@ -352,14 +457,13 @@ def __call__(self, results):
img_meta = {}
for key in self.meta_keys:
img_meta[key] = results[key]
- data['img_metas'] = DC(img_meta, cpu_only=True)
+ data["img_metas"] = DC(img_meta, cpu_only=True)
for key in self.keys:
data[key] = results[key]
return data
def __repr__(self):
- return self.__class__.__name__ + \
- f'(keys={self.keys}, meta_keys={self.meta_keys})'
+ return self.__class__.__name__ + f"(keys={self.keys}, meta_keys={self.meta_keys})"
@PIPELINES.register_module()
@@ -400,4 +504,4 @@ def __call__(self, results):
return results
def __repr__(self):
- return f'{self.__class__.__name__}()'
+ return f"{self.__class__.__name__}()"
diff --git a/mmdet/datasets/pipelines/loading.py b/mmdet/datasets/pipelines/loading.py
index 8af8cf35..09ac17a2 100644
--- a/mmdet/datasets/pipelines/loading.py
+++ b/mmdet/datasets/pipelines/loading.py
@@ -1,5 +1,6 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
+from copy import deepcopy
import mmcv
import numpy as np
@@ -34,11 +35,13 @@ class LoadImageFromFile:
Defaults to ``dict(backend='disk')``.
"""
- def __init__(self,
- to_float32=False,
- color_type='color',
- channel_order='bgr',
- file_client_args=dict(backend='disk')):
+ def __init__(
+ self,
+ to_float32=False,
+ color_type="color",
+ channel_order="bgr",
+ file_client_args=dict(backend="disk"),
+ ):
self.to_float32 = to_float32
self.color_type = color_type
self.channel_order = channel_order
@@ -58,32 +61,32 @@ def __call__(self, results):
if self.file_client is None:
self.file_client = mmcv.FileClient(**self.file_client_args)
- if results['img_prefix'] is not None:
- filename = osp.join(results['img_prefix'],
- results['img_info']['filename'])
+ if results["img_prefix"] is not None:
+ filename = osp.join(results["img_prefix"], results["img_info"]["filename"])
else:
- filename = results['img_info']['filename']
+ filename = results["img_info"]["filename"]
img_bytes = self.file_client.get(filename)
- img = mmcv.imfrombytes(
- img_bytes, flag=self.color_type, channel_order=self.channel_order)
+ img = mmcv.imfrombytes(img_bytes, flag=self.color_type, channel_order=self.channel_order)
if self.to_float32:
img = img.astype(np.float32)
- results['filename'] = filename
- results['ori_filename'] = results['img_info']['filename']
- results['img'] = img
- results['img_shape'] = img.shape
- results['ori_shape'] = img.shape
- results['img_fields'] = ['img']
+ results["filename"] = filename
+ results["ori_filename"] = results["img_info"]["filename"]
+ results["img"] = img
+ results["img_shape"] = img.shape
+ results["ori_shape"] = img.shape
+ results["img_fields"] = ["img"]
return results
def __repr__(self):
- repr_str = (f'{self.__class__.__name__}('
- f'to_float32={self.to_float32}, '
- f"color_type='{self.color_type}', "
- f"channel_order='{self.channel_order}', "
- f'file_client_args={self.file_client_args})')
+ repr_str = (
+ f"{self.__class__.__name__}("
+ f"to_float32={self.to_float32}, "
+ f"color_type='{self.color_type}', "
+ f"channel_order='{self.channel_order}', "
+ f"file_client_args={self.file_client_args})"
+ )
return repr_str
@@ -106,16 +109,16 @@ def __call__(self, results):
dict: The dict contains loaded image and meta information.
"""
- img = results['img']
+ img = results["img"]
if self.to_float32:
img = img.astype(np.float32)
- results['filename'] = None
- results['ori_filename'] = None
- results['img'] = img
- results['img_shape'] = img.shape
- results['ori_shape'] = img.shape
- results['img_fields'] = ['img']
+ results["filename"] = None
+ results["ori_filename"] = None
+ results["img"] = img
+ results["img_shape"] = img.shape
+ results["ori_shape"] = img.shape
+ results["img_fields"] = ["img"]
return results
@@ -140,10 +143,9 @@ class LoadMultiChannelImageFromFiles:
Defaults to ``dict(backend='disk')``.
"""
- def __init__(self,
- to_float32=False,
- color_type='unchanged',
- file_client_args=dict(backend='disk')):
+ def __init__(
+ self, to_float32=False, color_type="unchanged", file_client_args=dict(backend="disk")
+ ):
self.to_float32 = to_float32
self.color_type = color_type
self.file_client_args = file_client_args.copy()
@@ -163,13 +165,12 @@ def __call__(self, results):
if self.file_client is None:
self.file_client = mmcv.FileClient(**self.file_client_args)
- if results['img_prefix'] is not None:
+ if results["img_prefix"] is not None:
filename = [
- osp.join(results['img_prefix'], fname)
- for fname in results['img_info']['filename']
+ osp.join(results["img_prefix"], fname) for fname in results["img_info"]["filename"]
]
else:
- filename = results['img_info']['filename']
+ filename = results["img_info"]["filename"]
img = []
for name in filename:
@@ -179,68 +180,106 @@ def __call__(self, results):
if self.to_float32:
img = img.astype(np.float32)
- results['filename'] = filename
- results['ori_filename'] = results['img_info']['filename']
- results['img'] = img
- results['img_shape'] = img.shape
- results['ori_shape'] = img.shape
+ results["filename"] = filename
+ results["ori_filename"] = results["img_info"]["filename"]
+ results["img"] = img
+ results["img_shape"] = img.shape
+ results["ori_shape"] = img.shape
# Set initial values for default meta_keys
- results['pad_shape'] = img.shape
- results['scale_factor'] = 1.0
+ results["pad_shape"] = img.shape
+ results["scale_factor"] = 1.0
num_channels = 1 if len(img.shape) < 3 else img.shape[2]
- results['img_norm_cfg'] = dict(
+ results["img_norm_cfg"] = dict(
mean=np.zeros(num_channels, dtype=np.float32),
std=np.ones(num_channels, dtype=np.float32),
- to_rgb=False)
+ to_rgb=False,
+ )
return results
def __repr__(self):
- repr_str = (f'{self.__class__.__name__}('
- f'to_float32={self.to_float32}, '
- f"color_type='{self.color_type}', "
- f'file_client_args={self.file_client_args})')
+ repr_str = (
+ f"{self.__class__.__name__}("
+ f"to_float32={self.to_float32}, "
+ f"color_type='{self.color_type}', "
+ f"file_client_args={self.file_client_args})"
+ )
return repr_str
@PIPELINES.register_module()
-class LoadAnnotations:
- """Load multiple types of annotations.
+class LoadAnnotations(object):
+ """Load mutiple types of annotations.
Args:
with_bbox (bool): Whether to parse and load the bbox annotation.
- Default: True.
+ Default: True.
with_label (bool): Whether to parse and load the label annotation.
Default: True.
with_mask (bool): Whether to parse and load the mask annotation.
- Default: False.
+ Default: False.
with_seg (bool): Whether to parse and load the semantic segmentation
annotation. Default: False.
poly2mask (bool): Whether to convert the instance masks from polygons
to bitmaps. Default: True.
- denorm_bbox (bool): Whether to convert bbox from relative value to
- absolute value. Only used in OpenImage Dataset.
- Default: False.
file_client_args (dict): Arguments to instantiate a FileClient.
See :class:`mmcv.fileio.FileClient` for details.
Defaults to ``dict(backend='disk')``.
"""
- def __init__(self,
- with_bbox=True,
- with_label=True,
- with_mask=False,
- with_seg=False,
- poly2mask=True,
- denorm_bbox=False,
- file_client_args=dict(backend='disk')):
+ def __init__(
+ self,
+ with_bbox=True,
+ with_label=True,
+ with_mask=False,
+ with_seg=False,
+ with_offset=False,
+ with_height=False,
+ with_height_mask=False,
+ with_nadir_angle=False,
+ with_offset_angle=False,
+ with_rbbox=False,
+ with_edge=False,
+ with_side_face=False,
+ with_offset_field=False,
+ with_roof_bbox=False,
+ with_footprint_bbox=False,
+ with_only_footprint_flag=False,
+ with_semi_supervised_learning=False,
+ with_valid_height_flag=False,
+ with_roof_mask=False,
+ with_footprint_mask=False,
+ with_image_scale_footprint_mask=False,
+ poly2mask=True,
+ file_client_args=dict(backend="disk"),
+ ):
self.with_bbox = with_bbox
self.with_label = with_label
self.with_mask = with_mask
self.with_seg = with_seg
+ self.with_offset = with_offset
+ self.with_height = with_height
+ self.with_height_mask = with_height_mask
+ self.with_nadir_angle = with_nadir_angle
+ self.with_offset_angle = with_offset_angle
+ self.with_rbbox = with_rbbox
+ self.with_edge = with_edge
+ self.with_side_face = with_side_face
+ self.with_offset_field = with_offset_field
+ self.with_roof_bbox = with_roof_bbox
+ self.with_footprint_bbox = with_footprint_bbox
self.poly2mask = poly2mask
- self.denorm_bbox = denorm_bbox
self.file_client_args = file_client_args.copy()
self.file_client = None
+ self.with_only_footprint_flag = with_only_footprint_flag
+ self.with_roof_mask = with_roof_mask
+ self.with_footprint_mask = with_footprint_mask
+ self.with_image_scale_footprint_mask = with_image_scale_footprint_mask
+ if self.with_image_scale_footprint_mask:
+ assert self.with_footprint_mask
+ if self.with_height_mask:
+ assert self.with_height and self.with_footprint_mask
+ self.with_semi_supervised_learning = with_semi_supervised_learning
+ self.with_valid_height_flag = with_valid_height_flag
def _load_bboxes(self, results):
"""Private function to load bounding box annotations.
@@ -252,26 +291,28 @@ def _load_bboxes(self, results):
dict: The dict contains loaded bounding box annotations.
"""
- ann_info = results['ann_info']
- results['gt_bboxes'] = ann_info['bboxes'].copy()
-
- if self.denorm_bbox:
- bbox_num = results['gt_bboxes'].shape[0]
- if bbox_num != 0:
- h, w = results['img_shape'][:2]
- results['gt_bboxes'][:, 0::2] *= w
- results['gt_bboxes'][:, 1::2] *= h
+ ann_info = results["ann_info"]
+ results["gt_bboxes"] = ann_info["bboxes"].copy()
- gt_bboxes_ignore = ann_info.get('bboxes_ignore', None)
+ gt_bboxes_ignore = ann_info.get("bboxes_ignore", None)
if gt_bboxes_ignore is not None:
- results['gt_bboxes_ignore'] = gt_bboxes_ignore.copy()
- results['bbox_fields'].append('gt_bboxes_ignore')
- results['bbox_fields'].append('gt_bboxes')
+ results["gt_bboxes_ignore"] = gt_bboxes_ignore.copy()
+ results["bbox_fields"].append("gt_bboxes_ignore")
+ results["bbox_fields"].append("gt_bboxes")
+ return results
- gt_is_group_ofs = ann_info.get('gt_is_group_ofs', None)
- if gt_is_group_ofs is not None:
- results['gt_is_group_ofs'] = gt_is_group_ofs.copy()
+ def _load_roof_bboxes(self, results):
+ ann_info = results["ann_info"]
+ results["gt_roof_bboxes"] = ann_info["roof_bboxes"].copy()
+ results["bbox_fields"].append("gt_roof_bboxes")
+ return results
+
+ def _load_footprint_bboxes(self, results):
+ ann_info = results["ann_info"]
+ results["gt_footprint_bboxes"] = ann_info["footprint_bboxes"].copy()
+
+ results["bbox_fields"].append("gt_footprint_bboxes")
return results
def _load_labels(self, results):
@@ -284,7 +325,7 @@ def _load_labels(self, results):
dict: The dict contains loaded label annotations.
"""
- results['gt_labels'] = results['ann_info']['labels'].copy()
+ results["gt_labels"] = results["ann_info"]["labels"].copy()
return results
def _poly2mask(self, mask_ann, img_h, img_w):
@@ -305,7 +346,7 @@ def _poly2mask(self, mask_ann, img_h, img_w):
# we merge all parts into one mask rle code
rles = maskUtils.frPyObjects(mask_ann, img_h, img_w)
rle = maskUtils.merge(rles)
- elif isinstance(mask_ann['counts'], list):
+ elif isinstance(mask_ann["counts"], list):
# uncompressed RLE
rle = maskUtils.frPyObjects(mask_ann, img_h, img_w)
else:
@@ -343,17 +384,68 @@ def _load_masks(self, results):
:obj:`PolygonMasks`. Otherwise, :obj:`BitmapMasks` is used.
"""
- h, w = results['img_info']['height'], results['img_info']['width']
- gt_masks = results['ann_info']['masks']
+ h, w = results["img_info"]["height"], results["img_info"]["width"]
+ gt_masks = results["ann_info"]["masks"]
if self.poly2mask:
- gt_masks = BitmapMasks(
- [self._poly2mask(mask, h, w) for mask in gt_masks], h, w)
+ gt_masks = BitmapMasks([self._poly2mask(mask, h, w) for mask in gt_masks], h, w)
else:
- gt_masks = PolygonMasks(
- [self.process_polygons(polygons) for polygons in gt_masks], h,
- w)
- results['gt_masks'] = gt_masks
- results['mask_fields'].append('gt_masks')
+ gt_masks = PolygonMasks( # pylint: disable=abstract-class-instantiated
+ [self.process_polygons(polygons) for polygons in gt_masks], h, w
+ )
+ results["gt_masks"] = gt_masks
+ results["mask_fields"].append("gt_masks")
+ return results
+
+ def _load_roof_masks(self, results):
+ """Private function to load mask annotations.
+
+ Args:
+ results (dict): Result dict from :obj:`mmdet.CustomDataset`.
+
+ Returns:
+ dict: The dict contains loaded mask annotations.
+ If ``self.poly2mask`` is set ``True``, `gt_mask` will contain
+ :obj:`PolygonMasks`. Otherwise, :obj:`BitmapMasks` is used.
+ """
+
+ h, w = results["img_info"]["height"], results["img_info"]["width"]
+ gt_roof_masks = results["ann_info"]["roof_masks"]
+ if self.poly2mask:
+ gt_roof_masks = BitmapMasks(
+ [self._poly2mask(mask, h, w) for mask in gt_roof_masks], h, w
+ )
+ else:
+ gt_roof_masks = PolygonMasks( # pylint: disable=abstract-class-instantiated
+ [self.process_polygons(polygons) for polygons in gt_roof_masks], h, w
+ )
+ results["gt_roof_masks"] = gt_roof_masks
+ results["mask_fields"].append("gt_roof_masks")
+ return results
+
+ def _load_footprint_masks(self, results):
+ """Private function to load mask annotations.
+
+ Args:
+ results (dict): Result dict from :obj:`mmdet.CustomDataset`.
+
+ Returns:
+ dict: The dict contains loaded mask annotations.
+ If ``self.poly2mask`` is set ``True``, `gt_mask` will contain
+ :obj:`PolygonMasks`. Otherwise, :obj:`BitmapMasks` is used.
+ """
+
+ h, w = results["img_info"]["height"], results["img_info"]["width"]
+ gt_footprint_polygons = results["ann_info"]["footprint_masks"]
+ if self.poly2mask:
+ gt_footprint_masks = BitmapMasks(
+ [self._poly2mask(mask, h, w) for mask in gt_footprint_polygons], h, w
+ )
+ else:
+ gt_footprint_masks = PolygonMasks( # pylint: disable=abstract-class-instantiated
+ [self.process_polygons(polygons) for polygons in gt_footprint_polygons], h, w
+ )
+ results["gt_footprint_masks"] = gt_footprint_masks
+ results["mask_fields"].append("gt_footprint_masks")
return results
def _load_semantic_seg(self, results):
@@ -369,12 +461,234 @@ def _load_semantic_seg(self, results):
if self.file_client is None:
self.file_client = mmcv.FileClient(**self.file_client_args)
- filename = osp.join(results['seg_prefix'],
- results['ann_info']['seg_map'])
+ filename = osp.join(results["seg_prefix"], results["ann_info"]["seg_map"])
img_bytes = self.file_client.get(filename)
- results['gt_semantic_seg'] = mmcv.imfrombytes(
- img_bytes, flag='unchanged').squeeze()
- results['seg_fields'].append('gt_semantic_seg')
+ results["gt_semantic_seg"] = mmcv.imfrombytes(img_bytes, flag="unchanged").squeeze()
+ results["seg_fields"].append("gt_semantic_seg")
+ return results
+
+ def _load_offsets(self, results):
+ """loading offset value
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded offset annotations.
+ """
+ ann_info = results["ann_info"]
+ results["gt_offsets"] = ann_info["offsets"]
+ results["offset_fields"].append("gt_offsets")
+ return results
+
+ def _load_heights(self, results):
+ """loading building height value
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded height annotations.
+ """
+ ann_info = results["ann_info"]
+ results["gt_heights"] = ann_info["building_heights"]
+ results["height_fields"].append("gt_heights")
+ return results
+
+ def _load_height_masks(self, results):
+ """loading building height mask.
+
+ The pixel values inside the footprint area are equal to the height value.
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded height mask annotation.
+ """
+ ann_info = results["ann_info"]
+ resolution = ann_info["resolution"]
+
+ gt_heights = results["gt_heights"].astype(np.float32).reshape((-1, 1, 1))
+ gt_height_masks = deepcopy(results["gt_footprint_masks"])
+ gt_height_masks.masks = gt_height_masks.masks.astype(np.float32)
+ gt_height_masks.masks *= gt_heights / resolution
+ gt_height_masks.masks = np.stack([np.sum(gt_height_masks.masks, axis=0)], axis=0)
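+ # Worked example: with resolution = 0.5 m/pixel, a 10 m building fills its
+ # footprint with 10 / 0.5 = 20; the per-instance masks are then summed into
+ # a single-channel height map.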
+ results["gt_height_masks"] = gt_height_masks
+ results["mask_fields"].append("gt_height_masks")
+ results["height_mask_shape"] = ann_info["height_mask_shape"]
+
+ return results
+
+ def _load_image_scale_footprint_masks(self, results):
+ """loading building footprint mask in image scale.
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded height mask annotation.
+ """
+ ann_info = results["ann_info"]
+
+ gt_footprint_masks = deepcopy(results["gt_footprint_masks"])
+ gt_footprint_masks.masks = gt_footprint_masks.masks.astype(np.float32)
+ gt_footprint_masks.masks = np.stack([np.sum(gt_footprint_masks.masks, axis=0)], axis=0)
+ results["gt_image_scale_footprint_masks"] = gt_footprint_masks
+ results["mask_fields"].append("gt_image_scale_footprint_masks")
+ results["image_scale_footprint_mask_shape"] = ann_info["image_scale_footprint_mask_shape"]
+
+ return results
+
+ def _load_nadir_angle(self, results):
+ """loading nadir angle value
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded angle annotations.
+ """
+ ann_info = results["ann_info"]
+ results["gt_nadir_angles"] = ann_info["nadir_angles"]
+ return results
+
+ def _load_offset_angle(self, results):
+ """loading offset angle value
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded offset angle annotations.
+ """
+ ann_info = results["ann_info"]
+ results["gt_offset_angles"] = ann_info["offset_angles"]
+ results["angle_fields"].append("gt_offset_angles")
+ return results
+
+ def _load_only_footprint_flag(self, results):
+ """loading footprint flag which used in semi-supervised learning framework
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded footprint flag annotations.
+ """
+ ann_info = results["ann_info"]
+ results["gt_only_footprint_flag"] = ann_info["only_footprint_flag"]
+ return results
+
+ def _load_semi_supervised_sample_flag(self, results):
+ """loading semi_supervised sample flag which used in semi-supervised learning framework
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded semi-supervised flag annotations.
+ """
+ ann_info = results["ann_info"]
+ results["gt_is_semi_supervised_sample"] = ann_info["is_semi_supervised_sample"]
+ return results
+
+ def _load_valid_height_flag(self, results):
+ """loading valid height sample flag which used in semi-supervised learning framework
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains valid height flag annotations.
+ """
+ ann_info = results["ann_info"]
+ results["gt_is_valid_height_sample"] = ann_info["is_valid_height_sample"]
+ return results
+
+ def _load_rbboxes(self, results):
+ ann_info = results["ann_info"]
+ results["gt_rbboxes"] = ann_info["rbboxes"]
+ results["rbbox_fields"].append("gt_rbboxes")
+ return results
+
+ def _load_edge_map(self, results):
+ """loading the edge map which generated by weijia
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded edge map annotations.
+ """
+ if self.file_client is None:
+ self.file_client = mmcv.FileClient(**self.file_client_args)
+
+ filename = osp.join(results["edge_prefix"], results["ann_info"]["edge_map"])
+ img_bytes = self.file_client.get(filename)
+ edge_maps = mmcv.imfrombytes(img_bytes, flag="unchanged").squeeze()
+
+ h, w = results["img_info"]["height"], results["img_info"]["width"]
+ mask_num = len(results["ann_info"]["masks"])
+ gt_edge_maps = BitmapMasks([edge_maps for _ in range(mask_num)], h, w)
+
+ results["gt_edge_maps"] = gt_edge_maps
+ results["edge_fields"].append("gt_edge_maps")
+ return results
+
+ def _load_side_face_map(self, results):
+ """loading side face map
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded side face map annotations.
+ """
+ if self.file_client is None:
+ self.file_client = mmcv.FileClient(**self.file_client_args)
+
+ filename = osp.join(results["side_face_prefix"], results["ann_info"]["side_face_map"])
+ img_bytes = self.file_client.get(filename)
+ side_face_maps = mmcv.imfrombytes(img_bytes, flag="unchanged").squeeze()
+
+ h, w = results["img_info"]["height"], results["img_info"]["width"]
+ mask_num = len(results["ann_info"]["masks"])
+ gt_side_face_maps = BitmapMasks([side_face_maps for _ in range(mask_num)], h, w)
+
+ results["gt_side_face_maps"] = gt_side_face_maps
+ results["side_face_fields"].append("gt_side_face_maps")
+ return results
+
+ def _load_offset_field(self, results):
+ """loading offset field map which generated by weijia and lingxuan
+
+ Args:
+ results (dict): Result dict from :obj:`dataset`.
+
+ Returns:
+ dict: The dict contains loaded offset field annotations.
+ """
+ if self.file_client is None:
+ self.file_client = mmcv.FileClient(**self.file_client_args)
+
+ filename = osp.join(results["offset_field_prefix"], results["ann_info"]["offset_field"])
+
+ gt_offset_field = np.load(filename).astype(np.float32)
+
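+ # Assumption: 400 and 500 are sentinel values marking ignore regions in the
+ # pre-computed offset field; they are zeroed below so those pixels do not
+ # contribute to the offset target.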
+ ignores_x, ignores_y = [], []
+ for subclass in [400, 500]:
+ ignores_x.append(gt_offset_field[..., 0] == subclass)
+ ignores_y.append(gt_offset_field[..., 1] == subclass)
+
+ ignore_x_bool = np.logical_or.reduce(tuple(ignores_x))
+ ignore_y_bool = np.logical_or.reduce(tuple(ignores_y))
+
+ gt_offset_field[..., 0][ignore_x_bool] = 0.0
+ gt_offset_field[..., 1][ignore_y_bool] = 0.0
+
+ results["gt_offset_field"] = gt_offset_field
+ results["offset_field_fields"].append("gt_offset_field")
return results
def __call__(self, results):
@@ -398,16 +712,50 @@ def __call__(self, results):
results = self._load_masks(results)
if self.with_seg:
results = self._load_semantic_seg(results)
+ if self.with_offset:
+ results = self._load_offsets(results)
+ if self.with_height:
+ results = self._load_heights(results)
+ if self.with_nadir_angle:
+ results = self._load_nadir_angle(results)
+ if self.with_offset_angle:
+ results = self._load_offset_angle(results)
+ if self.with_rbbox:
+ results = self._load_rbboxes(results)
+ if self.with_edge:
+ results = self._load_edge_map(results)
+ if self.with_side_face:
+ results = self._load_side_face_map(results)
+ if self.with_roof_bbox:
+ results = self._load_roof_bboxes(results)
+ if self.with_footprint_bbox:
+ results = self._load_footprint_bboxes(results)
+ if self.with_offset_field:
+ results = self._load_offset_field(results)
+ if self.with_only_footprint_flag:
+ results = self._load_only_footprint_flag(results)
+ if self.with_semi_supervised_learning:
+ results = self._load_semi_supervised_sample_flag(results)
+ if self.with_valid_height_flag:
+ results = self._load_valid_height_flag(results)
+ if self.with_roof_mask:
+ results = self._load_roof_masks(results)
+ if self.with_footprint_mask:
+ results = self._load_footprint_masks(results)
+ if self.with_image_scale_footprint_mask:
+ results = self._load_image_scale_footprint_masks(results)
+ if self.with_height_mask:
+ results = self._load_height_masks(results)
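+ # Note: ordering matters here; the image-scale footprint masks and height
+ # masks above are derived from the footprint masks loaded just before them
+ # (see the corresponding asserts in __init__).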
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(with_bbox={self.with_bbox}, '
- repr_str += f'with_label={self.with_label}, '
- repr_str += f'with_mask={self.with_mask}, '
- repr_str += f'with_seg={self.with_seg}, '
- repr_str += f'poly2mask={self.poly2mask}, '
- repr_str += f'file_client_args={self.file_client_args})'
+ repr_str += f"(with_bbox={self.with_bbox}, "
+ repr_str += f"with_label={self.with_label}, "
+ repr_str += f"with_mask={self.with_mask}, "
+ repr_str += f"with_seg={self.with_seg})"
+ repr_str += f"poly2mask={self.poly2mask})"
+ repr_str += f"poly2mask={self.file_client_args})"
return repr_str
@@ -429,17 +777,20 @@ class LoadPanopticAnnotations(LoadAnnotations):
Defaults to ``dict(backend='disk')``.
"""
- def __init__(self,
- with_bbox=True,
- with_label=True,
- with_mask=True,
- with_seg=True,
- file_client_args=dict(backend='disk')):
+ def __init__(
+ self,
+ with_bbox=True,
+ with_label=True,
+ with_mask=True,
+ with_seg=True,
+ file_client_args=dict(backend="disk"),
+ ):
if rgb2id is None:
raise RuntimeError(
- 'panopticapi is not installed, please install it by: '
- 'pip install git+https://github.com/cocodataset/'
- 'panopticapi.git.')
+ "panopticapi is not installed, please install it by: "
+ "pip install git+https://github.com/cocodataset/"
+ "panopticapi.git."
+ )
super(LoadPanopticAnnotations, self).__init__(
with_bbox=with_bbox,
@@ -448,7 +799,8 @@ def __init__(self,
with_seg=with_seg,
poly2mask=True,
- denorm_bbox=False,
- file_client_args=file_client_args)
+ file_client_args=file_client_args,
+ )
def _load_masks_and_semantic_segs(self, results):
"""Private function to load mask and semantic segmentation annotations.
@@ -468,33 +820,31 @@ def _load_masks_and_semantic_segs(self, results):
if self.file_client is None:
self.file_client = mmcv.FileClient(**self.file_client_args)
- filename = osp.join(results['seg_prefix'],
- results['ann_info']['seg_map'])
+ filename = osp.join(results["seg_prefix"], results["ann_info"]["seg_map"])
img_bytes = self.file_client.get(filename)
- pan_png = mmcv.imfrombytes(
- img_bytes, flag='color', channel_order='rgb').squeeze()
+ pan_png = mmcv.imfrombytes(img_bytes, flag="color", channel_order="rgb").squeeze()
pan_png = rgb2id(pan_png)
gt_masks = []
gt_seg = np.zeros_like(pan_png) + 255 # 255 as ignore
- for mask_info in results['ann_info']['masks']:
- mask = (pan_png == mask_info['id'])
- gt_seg = np.where(mask, mask_info['category'], gt_seg)
+ for mask_info in results["ann_info"]["masks"]:
+ mask = pan_png == mask_info["id"]
+ gt_seg = np.where(mask, mask_info["category"], gt_seg)
# The legal thing masks
- if mask_info.get('is_thing'):
+ if mask_info.get("is_thing"):
gt_masks.append(mask.astype(np.uint8))
if self.with_mask:
- h, w = results['img_info']['height'], results['img_info']['width']
+ h, w = results["img_info"]["height"], results["img_info"]["width"]
gt_masks = BitmapMasks(gt_masks, h, w)
- results['gt_masks'] = gt_masks
- results['mask_fields'].append('gt_masks')
+ results["gt_masks"] = gt_masks
+ results["mask_fields"].append("gt_masks")
if self.with_seg:
- results['gt_semantic_seg'] = gt_seg
- results['seg_fields'].append('gt_semantic_seg')
+ results["gt_semantic_seg"] = gt_seg
+ results["seg_fields"].append("gt_semantic_seg")
return results
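
# Illustrative note (not part of the diff): panopticapi's rgb2id collapses each
# pixel's RGB triple into a single segment id as id = R + 256 * G + 256**2 * B,
# e.g. the colour (4, 1, 0) maps to 4 + 256 * 1 = 260, which is then matched
# against mask_info["id"] above.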
def __call__(self, results):
@@ -546,25 +896,24 @@ def __call__(self, results):
dict: The dict contains loaded proposal annotations.
"""
- proposals = results['proposals']
+ proposals = results["proposals"]
if proposals.shape[1] not in (4, 5):
raise AssertionError(
- 'proposals should have shapes (n, 4) or (n, 5), '
- f'but found {proposals.shape}')
+ "proposals should have shapes (n, 4) or (n, 5), " f"but found {proposals.shape}"
+ )
proposals = proposals[:, :4]
if self.num_max_proposals is not None:
- proposals = proposals[:self.num_max_proposals]
+ proposals = proposals[: self.num_max_proposals]
if len(proposals) == 0:
proposals = np.array([[0, 0, 0, 0]], dtype=np.float32)
- results['proposals'] = proposals
- results['bbox_fields'].append('proposals')
+ results["proposals"] = proposals
+ results["bbox_fields"].append("proposals")
return results
def __repr__(self):
- return self.__class__.__name__ + \
- f'(num_max_proposals={self.num_max_proposals})'
+ return self.__class__.__name__ + f"(num_max_proposals={self.num_max_proposals})"
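
+ # Usage sketch (illustrative, not part of the diff): with num_max_proposals=1000,
+ # a results["proposals"] array of shape (2000, 5) is truncated to its first
+ # 1000 rows and the score column is dropped, leaving shape (1000, 4); an empty
+ # array is replaced by the single placeholder box [[0, 0, 0, 0]].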
@PIPELINES.register_module()
@@ -584,13 +933,15 @@ class FilterAnnotations:
becomes an empty bbox after filtering. Default: True
"""
- def __init__(self,
- min_gt_bbox_wh=(1., 1.),
- min_gt_mask_area=1,
- by_box=True,
- by_mask=False,
- keep_empty=True):
- # TODO: add more filter options
+ def __init__(
+ self,
+ min_gt_bbox_wh=(1.0, 1.0),
+ min_gt_mask_area=1,
+ by_box=True,
+ by_mask=False,
+ keep_empty=True,
+ ):
+ # TODO: add more filter options
assert by_box or by_mask
self.min_gt_bbox_wh = min_gt_bbox_wh
self.min_gt_mask_area = min_gt_mask_area
@@ -600,12 +951,12 @@ def __init__(self,
def __call__(self, results):
if self.by_box:
- assert 'gt_bboxes' in results
- gt_bboxes = results['gt_bboxes']
+ assert "gt_bboxes" in results
+ gt_bboxes = results["gt_bboxes"]
instance_num = gt_bboxes.shape[0]
if self.by_mask:
- assert 'gt_masks' in results
- gt_masks = results['gt_masks']
+ assert "gt_masks" in results
+ gt_masks = results["gt_masks"]
instance_num = len(gt_masks)
if instance_num == 0:
@@ -615,10 +966,9 @@ def __call__(self, results):
if self.by_box:
w = gt_bboxes[:, 2] - gt_bboxes[:, 0]
h = gt_bboxes[:, 3] - gt_bboxes[:, 1]
- tests.append((w > self.min_gt_bbox_wh[0])
- & (h > self.min_gt_bbox_wh[1]))
+ tests.append((w > self.min_gt_bbox_wh[0]) & (h > self.min_gt_bbox_wh[1]))
if self.by_mask:
- gt_masks = results['gt_masks']
+ gt_masks = results["gt_masks"]
tests.append(gt_masks.areas >= self.min_gt_mask_area)
keep = tests[0]
@@ -627,7 +977,7 @@ def __call__(self, results):
keep = keep.nonzero()[0]
- keys = ('gt_bboxes', 'gt_labels', 'gt_masks')
+ keys = ("gt_bboxes", "gt_labels", "gt_masks")
for key in keys:
if key in results:
results[key] = results[key][keep]
@@ -637,9 +987,10 @@ def __call__(self, results):
return results
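
# Worked example (illustrative): with min_gt_bbox_wh=(1.0, 1.0) and by_box=True,
# a gt bbox [10.0, 10.0, 10.5, 30.0] has w = 0.5, fails the w > 1.0 test, and is
# dropped; the matching rows of gt_labels/gt_masks are dropped with it via the
# shared `keep` index.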
def __repr__(self):
- return self.__class__.__name__ + \
- f'(min_gt_bbox_wh={self.min_gt_bbox_wh},' \
- f'min_gt_mask_area={self.min_gt_mask_area},' \
- f'by_box={self.by_box},' \
- f'by_mask={self.by_mask},' \
- f'always_keep={self.always_keep})'
+ return (
+ self.__class__.__name__ + f"(min_gt_bbox_wh={self.min_gt_bbox_wh}, "
+ f"min_gt_mask_area={self.min_gt_mask_area}, "
+ f"by_box={self.by_box}, "
+ f"by_mask={self.by_mask}, "
+ f"keep_empty={self.keep_empty})"
+ )
diff --git a/mmdet/datasets/pipelines/transforms.py b/mmdet/datasets/pipelines/transforms.py
index 4c9ef72c..74014e52 100644
--- a/mmdet/datasets/pipelines/transforms.py
+++ b/mmdet/datasets/pipelines/transforms.py
@@ -73,15 +73,17 @@ class Resize:
Defaults to False.
"""
- def __init__(self,
- img_scale=None,
- multiscale_mode='range',
- ratio_range=None,
- keep_ratio=True,
- bbox_clip_border=True,
- backend='cv2',
- interpolation='bilinear',
- override=False):
+ def __init__(
+ self,
+ img_scale=None,
+ multiscale_mode="range",
+ ratio_range=None,
+ keep_ratio=True,
+ bbox_clip_border=True,
+ backend="cv2",
+ interpolation="bilinear",
+ override=False,
+ ):
if img_scale is None:
self.img_scale = None
else:
@@ -96,7 +98,7 @@ def __init__(self,
assert len(self.img_scale) == 1
else:
# mode 2: given multiple scales or a range of scales
- assert multiscale_mode in ['value', 'range']
+ assert multiscale_mode in ["value", "range"]
self.backend = backend
self.multiscale_mode = multiscale_mode
@@ -143,12 +145,8 @@ def random_sample(img_scales):
assert mmcv.is_list_of(img_scales, tuple) and len(img_scales) == 2
img_scale_long = [max(s) for s in img_scales]
img_scale_short = [min(s) for s in img_scales]
- long_edge = np.random.randint(
- min(img_scale_long),
- max(img_scale_long) + 1)
- short_edge = np.random.randint(
- min(img_scale_short),
- max(img_scale_short) + 1)
+ long_edge = np.random.randint(min(img_scale_long), max(img_scale_long) + 1)
+ short_edge = np.random.randint(min(img_scale_short), max(img_scale_short) + 1)
img_scale = (long_edge, short_edge)
return img_scale, None
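
# Worked example (illustrative): for img_scales=[(1333, 640), (1333, 800)] the
# long edge is drawn from randint(1333, 1334) (always 1333) and the short edge
# from randint(640, 801), yielding a scale such as (1333, 736).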
@@ -198,30 +196,30 @@ def _random_scale(self, results):
"""
if self.ratio_range is not None:
- scale, scale_idx = self.random_sample_ratio(
- self.img_scale[0], self.ratio_range)
+ scale, scale_idx = self.random_sample_ratio(self.img_scale[0], self.ratio_range)
elif len(self.img_scale) == 1:
scale, scale_idx = self.img_scale[0], 0
- elif self.multiscale_mode == 'range':
+ elif self.multiscale_mode == "range":
scale, scale_idx = self.random_sample(self.img_scale)
- elif self.multiscale_mode == 'value':
+ elif self.multiscale_mode == "value":
scale, scale_idx = self.random_select(self.img_scale)
else:
raise NotImplementedError
- results['scale'] = scale
- results['scale_idx'] = scale_idx
+ results["scale"] = scale
+ results["scale_idx"] = scale_idx
def _resize_img(self, results):
"""Resize images with ``results['scale']``."""
- for key in results.get('img_fields', ['img']):
+ for key in results.get("img_fields", ["img"]):
if self.keep_ratio:
img, scale_factor = mmcv.imrescale(
results[key],
- results['scale'],
+ results["scale"],
return_scale=True,
interpolation=self.interpolation,
- backend=self.backend)
+ backend=self.backend,
+ )
# the w_scale and h_scale has minor difference
# a real fix should be done in the mmcv.imrescale in the future
new_h, new_w = img.shape[:2]
@@ -231,55 +229,98 @@ def _resize_img(self, results):
else:
img, w_scale, h_scale = mmcv.imresize(
results[key],
- results['scale'],
+ results["scale"],
return_scale=True,
interpolation=self.interpolation,
- backend=self.backend)
+ backend=self.backend,
+ )
results[key] = img
- scale_factor = np.array([w_scale, h_scale, w_scale, h_scale],
- dtype=np.float32)
- results['img_shape'] = img.shape
+ scale_factor = np.array([w_scale, h_scale, w_scale, h_scale], dtype=np.float32)
+ results["img_shape"] = img.shape
# in case that there is no padding
- results['pad_shape'] = img.shape
- results['scale_factor'] = scale_factor
- results['keep_ratio'] = self.keep_ratio
+ results["pad_shape"] = img.shape
+ results["scale_factor"] = scale_factor
+ results["keep_ratio"] = self.keep_ratio
def _resize_bboxes(self, results):
"""Resize bounding boxes with ``results['scale_factor']``."""
- for key in results.get('bbox_fields', []):
- bboxes = results[key] * results['scale_factor']
+ for key in results.get("bbox_fields", []):
+ bboxes = results[key] * results["scale_factor"]
if self.bbox_clip_border:
- img_shape = results['img_shape']
+ img_shape = results["img_shape"]
bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, img_shape[1])
bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, img_shape[0])
results[key] = bboxes
+ def _resize_rbboxes(self, results):
+ """Resize rbboxes (pointobbs) with ``results['scale_factor']``."""
+ img_shape = results["img_shape"]
+ for key in results.get("rbbox_fields", []):
+ results[key][:, 0::2] = results[key][:, 0::2] * results["scale_factor"][0]
+ results[key][:, 1::2] = results[key][:, 1::2] * results["scale_factor"][1]
+ rbboxes = results[key]
+
+ rbboxes[:, 0::2] = np.clip(rbboxes[:, 0::2], 0, img_shape[1] - 1)
+ rbboxes[:, 1::2] = np.clip(rbboxes[:, 1::2], 0, img_shape[0] - 1)
+ results[key] = rbboxes
+
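+ # Worked example (illustrative): with scale_factor = [2.0, 0.5, 2.0, 0.5] the
+ # pointobb [10, 20, 30, 20, 30, 40, 10, 40] becomes
+ # [20, 10, 60, 10, 60, 20, 20, 20] (x's doubled, y's halved) before each
+ # coordinate is clipped to the image bounds.
+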
def _resize_masks(self, results):
"""Resize masks with ``results['scale']``"""
- for key in results.get('mask_fields', []):
+ for key in results.get("mask_fields", []):
+ if results[key] is None:
+ continue
+ if self.keep_ratio:
+ results[key] = results[key].rescale(results["scale"])
+ else:
+ results[key] = results[key].resize(results["img_shape"][:2])
+
+ def _resize_edge_maps(self, results):
+ """Resize masks with ``results['scale']``"""
+ for key in results.get("edge_fields", []):
if results[key] is None:
continue
if self.keep_ratio:
- results[key] = results[key].rescale(results['scale'])
+ results[key] = results[key].rescale(results["scale"])
else:
- results[key] = results[key].resize(results['img_shape'][:2])
+ results[key] = results[key].resize(results["img_shape"][:2])
+
+ def _resize_side_face_maps(self, results):
+ """Resize masks with ``results['scale']``"""
+ for key in results.get("side_face_fields", []):
+ if results[key] is None:
+ continue
+ if self.keep_ratio:
+ results[key] = results[key].rescale(results["scale"])
+ else:
+ results[key] = results[key].resize(results["img_shape"][:2])
+
+ def _resize_offset_field(self, results):
+ """Resize offsets with ``results['scale']``"""
+ for key in results.get("offset_field_fields", []):
+ if results[key] is None:
+ continue
+ if self.keep_ratio:
+ gt_offset_field = mmcv.imrescale(
+ results[key], results["scale"], interpolation="nearest", backend=self.backend
+ )
+ else:
+ gt_offset_field = mmcv.imresize(
+ results[key], results["scale"], interpolation="nearest", backend=self.backend
+ )
+
+ results["gt_offset_field"] = gt_offset_field
def _resize_seg(self, results):
"""Resize semantic segmentation map with ``results['scale']``."""
- for key in results.get('seg_fields', []):
+ for key in results.get("seg_fields", []):
if self.keep_ratio:
gt_seg = mmcv.imrescale(
- results[key],
- results['scale'],
- interpolation='nearest',
- backend=self.backend)
+ results[key], results["scale"], interpolation="nearest", backend=self.backend
+ )
else:
gt_seg = mmcv.imresize(
- results[key],
- results['scale'],
- interpolation='nearest',
- backend=self.backend)
+ results[key], results["scale"], interpolation="nearest", backend=self.backend
+ )
results[key] = gt_seg
def __call__(self, results):
@@ -294,38 +335,40 @@ def __call__(self, results):
'keep_ratio' keys are added into result dict.
"""
- if 'scale' not in results:
- if 'scale_factor' in results:
- img_shape = results['img'].shape[:2]
- scale_factor = results['scale_factor']
+ if "scale" not in results:
+ if "scale_factor" in results:
+ img_shape = results["img"].shape[:2]
+ scale_factor = results["scale_factor"]
assert isinstance(scale_factor, float)
- results['scale'] = tuple(
- [int(x * scale_factor) for x in img_shape][::-1])
+ results["scale"] = tuple([int(x * scale_factor) for x in img_shape][::-1])
else:
self._random_scale(results)
else:
if not self.override:
- assert 'scale_factor' not in results, (
- 'scale and scale_factor cannot be both set.')
+ assert "scale_factor" not in results, "scale and scale_factor cannot be both set."
else:
- results.pop('scale')
- if 'scale_factor' in results:
- results.pop('scale_factor')
+ results.pop("scale")
+ if "scale_factor" in results:
+ results.pop("scale_factor")
self._random_scale(results)
self._resize_img(results)
self._resize_bboxes(results)
self._resize_masks(results)
self._resize_seg(results)
+ self._resize_rbboxes(results)
+ self._resize_edge_maps(results)
+ self._resize_side_face_maps(results)
+ self._resize_offset_field(results)
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(img_scale={self.img_scale}, '
- repr_str += f'multiscale_mode={self.multiscale_mode}, '
- repr_str += f'ratio_range={self.ratio_range}, '
- repr_str += f'keep_ratio={self.keep_ratio}, '
- repr_str += f'bbox_clip_border={self.bbox_clip_border})'
+ repr_str += f"(img_scale={self.img_scale}, "
+ repr_str += f"multiscale_mode={self.multiscale_mode}, "
+ repr_str += f"ratio_range={self.ratio_range}, "
+ repr_str += f"keep_ratio={self.keep_ratio}, "
+ repr_str += f"bbox_clip_border={self.bbox_clip_border})"
return repr_str
@@ -367,7 +410,7 @@ class RandomFlip:
corresponding direction.
"""
- def __init__(self, flip_ratio=None, direction='horizontal'):
+ def __init__(self, flip_ratio=None, direction="horizontal"):
if isinstance(flip_ratio, list):
assert mmcv.is_list_of(flip_ratio, float)
assert 0 <= sum(flip_ratio) <= 1
@@ -376,18 +419,17 @@ def __init__(self, flip_ratio=None, direction='horizontal'):
elif flip_ratio is None:
pass
else:
- raise ValueError('flip_ratios must be None, float, '
- 'or list of float')
+ raise ValueError("flip_ratios must be None, float, " "or list of float")
self.flip_ratio = flip_ratio
- valid_directions = ['horizontal', 'vertical', 'diagonal']
+ valid_directions = ["horizontal", "vertical", "diagonal"]
if isinstance(direction, str):
assert direction in valid_directions
elif isinstance(direction, list):
assert mmcv.is_list_of(direction, str)
assert set(direction).issubset(set(valid_directions))
else:
- raise ValueError('direction must be either str or list of str')
+ raise ValueError("direction must be either str or list of str")
self.direction = direction
if isinstance(flip_ratio, list):
@@ -408,15 +450,15 @@ def bbox_flip(self, bboxes, img_shape, direction):
assert bboxes.shape[-1] % 4 == 0
flipped = bboxes.copy()
- if direction == 'horizontal':
+ if direction == "horizontal":
w = img_shape[1]
flipped[..., 0::4] = w - bboxes[..., 2::4]
flipped[..., 2::4] = w - bboxes[..., 0::4]
- elif direction == 'vertical':
+ elif direction == "vertical":
h = img_shape[0]
flipped[..., 1::4] = h - bboxes[..., 3::4]
flipped[..., 3::4] = h - bboxes[..., 1::4]
- elif direction == 'diagonal':
+ elif direction == "diagonal":
w = img_shape[1]
h = img_shape[0]
flipped[..., 0::4] = w - bboxes[..., 2::4]
@@ -427,6 +469,89 @@ def bbox_flip(self, bboxes, img_shape, direction):
raise ValueError(f"Invalid flipping direction '{direction}'")
return flipped
+ def _pointobb2bbox(self, pointobb):
+ """
+ docstring here
+ :param self:
+ :param pointobb: list, [x1, y1, x2, y2, x3, y3, x4, y4]
+ return [xmin, ymin, xmax, ymax]
+ """
+ xmin = min(pointobb[0::2])
+ ymin = min(pointobb[1::2])
+ xmax = max(pointobb[0::2])
+ ymax = max(pointobb[1::2])
+ bbox = [xmin, ymin, xmax, ymax]
+
+ return bbox
+
+ def _pointobb_best_point_sort(self, pointobb):
+ """
+ Find the "best" point and sort all points as the order that best point is first point
+ :param self: self
+ :param pointobb (list): unsorted pointobb, (1*8)
+ """
+ xmin, ymin, xmax, ymax = self._pointobb2bbox(pointobb)
+ reference_bbox = np.array([xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax])
+ normalize = np.array([1.0, 1.0] * 4)
+ combinate = [
+ np.roll(pointobb, 0),
+ np.roll(pointobb, 2),
+ np.roll(pointobb, 4),
+ np.roll(pointobb, 6),
+ ]
+ distances = np.array(
+ [np.sum(((coord - reference_bbox) / normalize) ** 2) for coord in combinate]
+ )
+ order = distances.argsort()
+ return combinate[order[0]].tolist()
+
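+ # Worked example (illustrative): for pointobb [30, 40, 10, 40, 10, 20, 30, 20]
+ # the horizontal bbox is (10, 20, 30, 40), so the reference corner order is
+ # [10, 20, 30, 20, 30, 40, 10, 40]; np.roll(pointobb, 4) matches it exactly
+ # (squared distance 0), so that rotation is returned as the canonical order.
+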
+ def rbbox_flip(self, rbboxes, img_shape, direction):
+ """Flip rbboxes horizontally.
+
+ Args:
+ rbboxes(ndarray): shape (..., 8*k) (x1, y1, x2, y2, x3, y3, x4, y4)
+ img_shape(tuple): (height, width)
+ """
+ assert rbboxes.shape[-1] % 8 == 0
+ flipped = rbboxes.copy()
+ if direction == "horizontal":
+ w = img_shape[1]
+ flipped[..., 0::2] = w - flipped[..., 0::2] - 1
+
+ elif direction == "vertical":
+ h = img_shape[0]
+ flipped[..., 1::2] = h - flipped[..., 1::2] - 1
+ else:
+ raise ValueError(f"Invalid flipping direction '{direction}'")
+ flipped = np.array(
+ [self._pointobb_best_point_sort(pointobb) for pointobb in flipped.tolist()]
+ )
+
+ return flipped
+
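+ # Note (illustrative): each x (or y) coordinate is mirrored with the pixel
+ # convention x' = w - x - 1 (y' = h - y - 1), e.g. x = 10 in a 100-px-wide
+ # image maps to 89; the re-sort afterwards restores a canonical first corner,
+ # since mirroring reverses the winding order of the four points.
+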
+ def offset_flip(self, offsets, img_shape, direction):
+ """Flip per-instance offset vectors (dx, dy)."""
+ if offsets.shape == (0, 2):
+ return offsets
+
+ if direction == "horizontal":
+ flipped = [[-offset[0], offset[1]] for offset in offsets]
+ elif direction == "vertical":
+ flipped = [[offset[0], -offset[1]] for offset in offsets]
+ else:
+ raise ValueError(f"Invalid flipping direction '{direction}'")
+
+ return np.array(flipped, dtype=np.float32)
+
+ def offset_angle_flip(self, offset_angles, img_shape, direction):
+ """Flip per-instance offset-angle pairs according to the flip direction."""
+ if direction == "horizontal":
+ flipped = [[angle[0], -angle[1]] for angle in offset_angles]
+ elif direction == "vertical":
+ flipped = [[-angle[0], angle[1]] for angle in offset_angles]
+ else:
+ raise ValueError(f"Invalid flipping direction '{direction}'")
+
+ return np.array(flipped, dtype=np.float32)
+
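+ # Sign convention (illustrative): an offset (dx, dy) = (5, -3) becomes
+ # (-5, -3) under a horizontal flip and (5, 3) under a vertical flip. The
+ # angle pairs negate the opposite component; the exact angle encoding is not
+ # documented in this diff, so treat that mapping as an assumption.
+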
def __call__(self, results):
"""Call function to flip bounding boxes, masks, semantic segmentation
maps.
@@ -439,7 +564,7 @@ def __call__(self, results):
into result dict.
"""
- if 'flip' not in results:
+ if "flip" not in results:
if isinstance(self.direction, list):
# None means non-flip
direction_list = self.direction + [None]
@@ -454,36 +579,78 @@ def __call__(self, results):
non_flip_ratio = 1 - self.flip_ratio
# exclude non-flip
single_ratio = self.flip_ratio / (len(direction_list) - 1)
- flip_ratio_list = [single_ratio] * (len(direction_list) -
- 1) + [non_flip_ratio]
+ flip_ratio_list = [single_ratio] * (len(direction_list) - 1) + [non_flip_ratio]
cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
- results['flip'] = cur_dir is not None
- if 'flip_direction' not in results:
- results['flip_direction'] = cur_dir
- if results['flip']:
+ results["flip"] = cur_dir is not None
+ if "flip_direction" not in results:
+ results["flip_direction"] = cur_dir
+ if results["flip"]:
# flip image
- for key in results.get('img_fields', ['img']):
- results[key] = mmcv.imflip(
- results[key], direction=results['flip_direction'])
+ for key in results.get("img_fields", ["img"]):
+ results[key] = mmcv.imflip(results[key], direction=results["flip_direction"])
# flip bboxes
- for key in results.get('bbox_fields', []):
- results[key] = self.bbox_flip(results[key],
- results['img_shape'],
- results['flip_direction'])
+ for key in results.get("bbox_fields", []):
+ results[key] = self.bbox_flip(
+ results[key], results["img_shape"], results["flip_direction"]
+ )
# flip masks
- for key in results.get('mask_fields', []):
- results[key] = results[key].flip(results['flip_direction'])
+ for key in results.get("mask_fields", []):
+ results[key] = results[key].flip(results["flip_direction"])
+
+ # flip edge maps
+ for key in results.get("edge_fields", []):
+ results[key] = results[key].flip(results["flip_direction"])
+
+ # flip side face maps
+ for key in results.get("side_face_fields", []):
+ results[key] = results[key].flip(results["flip_direction"])
+
+ # flip offset_field
+ for key in results.get("offset_field_fields", []):
+ ignores_x, ignores_y = [], []
+ for subclass in [400, 500]:
+ ignores_x.append(results[key][..., 0] == subclass)
+ ignores_y.append(results[key][..., 1] == subclass)
+
+ ignore_x_bool = np.logical_or.reduce(tuple(ignores_x))
+ ignore_y_bool = np.logical_or.reduce(tuple(ignores_y))
+
+ if results["flip_direction"] == "horizontal":
+ results[key][..., 0] = -results[key][..., 0]
+ elif results["flip_direction"] == "vertical":
+ results[key][..., 1] = -results[key][..., 1]
+ else:
+ raise ValueError(f"Invalid flipping direction '{results['flip_direction']}'")
+
+ results[key][..., 0][ignore_x_bool] = 500
+ results[key][..., 1][ignore_y_bool] = 500
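+
+ # Note (illustrative): 400 and 500 act as sentinel "ignore" values in the
+ # offset field; their positions are recorded before the sign flip and
+ # re-written afterwards (both sentinels appear to be normalised to 500), so
+ # e.g. a pixel with dx = 7 becomes -7 while a dx = 500 pixel stays 500.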
# flip segs
- for key in results.get('seg_fields', []):
- results[key] = mmcv.imflip(
- results[key], direction=results['flip_direction'])
+ for key in results.get("seg_fields", []):
+ results[key] = mmcv.imflip(results[key], direction=results["flip_direction"])
+
+ # flip rbboxes (pointobb)
+ for key in results.get("rbbox_fields", []):
+ results[key] = self.rbbox_flip(
+ results[key], results["img_shape"], results["flip_direction"]
+ )
+ # flip offsets
+ for key in results.get("offset_fields", []):
+ results[key] = self.offset_flip(
+ results[key], results["img_shape"], results["flip_direction"]
+ )
+
+ # flip offset_angle
+ for key in results.get("angle_fields", []):
+ results[key] = self.offset_angle_flip(
+ results[key], results["img_shape"], results["flip_direction"]
+ )
return results
def __repr__(self):
- return self.__class__.__name__ + f'(flip_ratio={self.flip_ratio})'
+ return self.__class__.__name__ + f"(flip_ratio={self.flip_ratio})"
@PIPELINES.register_module()
@@ -505,10 +672,7 @@ def __init__(self, shift_ratio=0.5, max_shift_px=32, filter_thr_px=1):
self.max_shift_px = max_shift_px
self.filter_thr_px = int(filter_thr_px)
# The key correspondence from bboxes to labels.
- self.bbox2label = {
- 'gt_bboxes': 'gt_labels',
- 'gt_bboxes_ignore': 'gt_labels_ignore'
- }
+ self.bbox2label = {"gt_bboxes": "gt_labels", "gt_bboxes_ignore": "gt_labels_ignore"}
def __call__(self, results):
"""Call function to random shift images, bounding boxes.
@@ -520,19 +684,17 @@ def __call__(self, results):
dict: Shift results.
"""
if random.random() < self.shift_ratio:
- img_shape = results['img'].shape[:2]
+ img_shape = results["img"].shape[:2]
- random_shift_x = random.randint(-self.max_shift_px,
- self.max_shift_px)
- random_shift_y = random.randint(-self.max_shift_px,
- self.max_shift_px)
+ random_shift_x = random.randint(-self.max_shift_px, self.max_shift_px)
+ random_shift_y = random.randint(-self.max_shift_px, self.max_shift_px)
new_x = max(0, random_shift_x)
ori_x = max(0, -random_shift_x)
new_y = max(0, random_shift_y)
ori_y = max(0, -random_shift_y)
# TODO: support mask and semantic segmentation maps.
- for key in results.get('bbox_fields', []):
+ for key in results.get("bbox_fields", []):
bboxes = results[key].copy()
bboxes[..., 0::2] += random_shift_x
bboxes[..., 1::2] += random_shift_y
@@ -544,11 +706,10 @@ def __call__(self, results):
# remove invalid bboxes
bbox_w = bboxes[..., 2] - bboxes[..., 0]
bbox_h = bboxes[..., 3] - bboxes[..., 1]
- valid_inds = (bbox_w > self.filter_thr_px) & (
- bbox_h > self.filter_thr_px)
+ valid_inds = (bbox_w > self.filter_thr_px) & (bbox_h > self.filter_thr_px)
# If the shift does not contain any gt-bbox area, skip this
# image.
- if key == 'gt_bboxes' and not valid_inds.any():
+ if key == "gt_bboxes" and not valid_inds.any():
return results
bboxes = bboxes[valid_inds]
results[key] = bboxes
@@ -558,21 +719,22 @@ def __call__(self, results):
if label_key in results:
results[label_key] = results[label_key][valid_inds]
- for key in results.get('img_fields', ['img']):
+ for key in results.get("img_fields", ["img"]):
img = results[key]
new_img = np.zeros_like(img)
img_h, img_w = img.shape[:2]
new_h = img_h - np.abs(random_shift_y)
new_w = img_w - np.abs(random_shift_x)
- new_img[new_y:new_y + new_h, new_x:new_x + new_w] \
- = img[ori_y:ori_y + new_h, ori_x:ori_x + new_w]
+ new_img[new_y : new_y + new_h, new_x : new_x + new_w] = img[
+ ori_y : ori_y + new_h, ori_x : ori_x + new_w
+ ]
results[key] = new_img
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(max_shift_px={self.max_shift_px}, '
+ repr_str += f"(max_shift_px={self.max_shift_px}, "
return repr_str
@@ -593,64 +755,86 @@ class Pad:
value is `dict(img=0, masks=0, seg=255)`.
"""
- def __init__(self,
- size=None,
- size_divisor=None,
- pad_to_square=False,
- pad_val=dict(img=0, masks=0, seg=255)):
+ def __init__(
+ self,
+ size=None,
+ size_divisor=None,
+ pad_to_square=False,
+ pad_val=dict(img=0, masks=0, seg=255),
+ ):
self.size = size
self.size_divisor = size_divisor
if isinstance(pad_val, float) or isinstance(pad_val, int):
warnings.warn(
- 'pad_val of float type is deprecated now, '
- f'please use pad_val=dict(img={pad_val}, '
- f'masks={pad_val}, seg=255) instead.', DeprecationWarning)
+ "pad_val of float type is deprecated now, "
+ f"please use pad_val=dict(img={pad_val}, "
+ f"masks={pad_val}, seg=255) instead.",
+ DeprecationWarning,
+ )
pad_val = dict(img=pad_val, masks=pad_val, seg=255)
assert isinstance(pad_val, dict)
self.pad_val = pad_val
self.pad_to_square = pad_to_square
if pad_to_square:
- assert size is None and size_divisor is None, \
- 'The size and size_divisor must be None ' \
- 'when pad2square is True'
+ assert size is None and size_divisor is None, (
+ "The size and size_divisor must be None when pad_to_square is True"
+ )
else:
- assert size is not None or size_divisor is not None, \
- 'only one of size and size_divisor should be valid'
+ assert (
+ size is not None or size_divisor is not None
+ ), "only one of size and size_divisor should be valid"
assert size is None or size_divisor is None
def _pad_img(self, results):
"""Pad images according to ``self.size``."""
- pad_val = self.pad_val.get('img', 0)
- for key in results.get('img_fields', ['img']):
+ pad_val = self.pad_val.get("img", 0)
+ for key in results.get("img_fields", ["img"]):
if self.pad_to_square:
max_size = max(results[key].shape[:2])
self.size = (max_size, max_size)
if self.size is not None:
- padded_img = mmcv.impad(
- results[key], shape=self.size, pad_val=pad_val)
+ padded_img = mmcv.impad(results[key], shape=self.size, pad_val=pad_val)
elif self.size_divisor is not None:
padded_img = mmcv.impad_to_multiple(
- results[key], self.size_divisor, pad_val=pad_val)
+ results[key], self.size_divisor, pad_val=pad_val
+ )
results[key] = padded_img
- results['pad_shape'] = padded_img.shape
- results['pad_fixed_size'] = self.size
- results['pad_size_divisor'] = self.size_divisor
+ results["pad_shape"] = padded_img.shape
+ results["pad_fixed_size"] = self.size
+ results["pad_size_divisor"] = self.size_divisor
def _pad_masks(self, results):
"""Pad masks according to ``results['pad_shape']``."""
- pad_shape = results['pad_shape'][:2]
- pad_val = self.pad_val.get('masks', 0)
- for key in results.get('mask_fields', []):
+ pad_shape = results["pad_shape"][:2]
+ pad_val = self.pad_val.get("masks", 0)
+ for key in results.get("mask_fields", []):
results[key] = results[key].pad(pad_shape, pad_val=pad_val)
+ def _pad_edge_maps(self, results):
+ """Pad edge maps according to ``results['pad_shape']``."""
+ pad_shape = results["pad_shape"][:2]
+ pad_val = self.pad_val.get("masks", 0)
+ for key in results.get("edge_fields", []):
+ results[key] = results[key].pad(pad_shape, pad_val=pad_val)
+
+ def _pad_side_face_maps(self, results):
+ """Pad side face maps according to ``results['pad_shape']``."""
+ pad_shape = results["pad_shape"][:2]
+ pad_val = self.pad_val.get("masks", 0)
+ for key in results.get("side_face_fields", []):
+ results[key] = results[key].pad(pad_shape, pad_val=pad_val)
+
def _pad_seg(self, results):
"""Pad semantic segmentation map according to
``results['pad_shape']``."""
- pad_val = self.pad_val.get('seg', 255)
- for key in results.get('seg_fields', []):
- results[key] = mmcv.impad(
- results[key], shape=results['pad_shape'][:2], pad_val=pad_val)
+ pad_val = self.pad_val.get("seg", 255)
+ for key in results.get("seg_fields", []):
+ results[key] = mmcv.impad(results[key], shape=results["pad_shape"][:2], pad_val=pad_val)
+
+ def _pad_offset_field(self, results):
+ """Pad semantic segmentation map according to
+ ``results['pad_shape']``."""
+ for key in results.get("offset_field_fields", []):
+ results[key] = mmcv.impad(results[key], shape=results["pad_shape"][:2])
def __call__(self, results):
"""Call function to pad images, masks, semantic segmentation maps.
@@ -664,14 +848,17 @@ def __call__(self, results):
self._pad_img(results)
self._pad_masks(results)
self._pad_seg(results)
+ self._pad_edge_maps(results)
+ self._pad_side_face_maps(results)
+ self._pad_offset_field(results)
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(size={self.size}, '
- repr_str += f'size_divisor={self.size_divisor}, '
- repr_str += f'pad_to_square={self.pad_to_square}, '
- repr_str += f'pad_val={self.pad_val})'
+ repr_str += f"(size={self.size}, "
+ repr_str += f"size_divisor={self.size_divisor}, "
+ repr_str += f"pad_to_square={self.pad_to_square}, "
+ repr_str += f"pad_val={self.pad_val})"
return repr_str
@@ -703,16 +890,14 @@ def __call__(self, results):
dict: Normalized results, 'img_norm_cfg' key is added into
result dict.
"""
- for key in results.get('img_fields', ['img']):
- results[key] = mmcv.imnormalize(results[key], self.mean, self.std,
- self.to_rgb)
- results['img_norm_cfg'] = dict(
- mean=self.mean, std=self.std, to_rgb=self.to_rgb)
+ for key in results.get("img_fields", ["img"]):
+ results[key] = mmcv.imnormalize(results[key], self.mean, self.std, self.to_rgb)
+ results["img_norm_cfg"] = dict(mean=self.mean, std=self.std, to_rgb=self.to_rgb)
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(mean={self.mean}, std={self.std}, to_rgb={self.to_rgb})'
+ repr_str += f"(mean={self.mean}, std={self.std}, to_rgb={self.to_rgb})"
return repr_str
@@ -753,20 +938,19 @@ class RandomCrop:
`allow_negative_crop` is set to False, skip this image.
"""
- def __init__(self,
- crop_size,
- crop_type='absolute',
- allow_negative_crop=False,
- recompute_bbox=False,
- bbox_clip_border=True):
- if crop_type not in [
- 'relative_range', 'relative', 'absolute', 'absolute_range'
- ]:
- raise ValueError(f'Invalid crop_type {crop_type}.')
- if crop_type in ['absolute', 'absolute_range']:
+ def __init__(
+ self,
+ crop_size,
+ crop_type="absolute",
+ allow_negative_crop=False,
+ recompute_bbox=False,
+ bbox_clip_border=True,
+ ):
+ if crop_type not in ["relative_range", "relative", "absolute", "absolute_range"]:
+ raise ValueError(f"Invalid crop_type {crop_type}.")
+ if crop_type in ["absolute", "absolute_range"]:
assert crop_size[0] > 0 and crop_size[1] > 0
- assert isinstance(crop_size[0], int) and isinstance(
- crop_size[1], int)
+ assert isinstance(crop_size[0], int) and isinstance(crop_size[1], int)
else:
assert 0 < crop_size[0] <= 1 and 0 < crop_size[1] <= 1
self.crop_size = crop_size
@@ -775,14 +959,8 @@ def __init__(self,
self.bbox_clip_border = bbox_clip_border
self.recompute_bbox = recompute_bbox
# The key correspondence from bboxes to labels and masks.
- self.bbox2label = {
- 'gt_bboxes': 'gt_labels',
- 'gt_bboxes_ignore': 'gt_labels_ignore'
- }
- self.bbox2mask = {
- 'gt_bboxes': 'gt_masks',
- 'gt_bboxes_ignore': 'gt_masks_ignore'
- }
+ self.bbox2label = {"gt_bboxes": "gt_labels", "gt_bboxes_ignore": "gt_labels_ignore"}
+ self.bbox2mask = {"gt_bboxes": "gt_masks", "gt_bboxes_ignore": "gt_masks_ignore"}
def _crop_data(self, results, crop_size, allow_negative_crop):
"""Function to randomly crop images, bounding boxes, masks, semantic
@@ -799,7 +977,7 @@ def _crop_data(self, results, crop_size, allow_negative_crop):
updated according to crop size.
"""
assert crop_size[0] > 0 and crop_size[1] > 0
- for key in results.get('img_fields', ['img']):
+ for key in results.get("img_fields", ["img"]):
img = results[key]
margin_h = max(img.shape[0] - crop_size[0], 0)
margin_w = max(img.shape[1] - crop_size[1], 0)
@@ -812,23 +990,20 @@ def _crop_data(self, results, crop_size, allow_negative_crop):
img = img[crop_y1:crop_y2, crop_x1:crop_x2, ...]
img_shape = img.shape
results[key] = img
- results['img_shape'] = img_shape
+ results["img_shape"] = img_shape
# crop bboxes accordingly and clip to the image boundary
- for key in results.get('bbox_fields', []):
+ for key in results.get("bbox_fields", []):
# e.g. gt_bboxes and gt_bboxes_ignore
- bbox_offset = np.array([offset_w, offset_h, offset_w, offset_h],
- dtype=np.float32)
+ bbox_offset = np.array([offset_w, offset_h, offset_w, offset_h], dtype=np.float32)
bboxes = results[key] - bbox_offset
if self.bbox_clip_border:
bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, img_shape[1])
bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, img_shape[0])
- valid_inds = (bboxes[:, 2] > bboxes[:, 0]) & (
- bboxes[:, 3] > bboxes[:, 1])
+ valid_inds = (bboxes[:, 2] > bboxes[:, 0]) & (bboxes[:, 3] > bboxes[:, 1])
# If the crop does not contain any gt-bbox area and
# allow_negative_crop is False, skip this image.
- if (key == 'gt_bboxes' and not valid_inds.any()
- and not allow_negative_crop):
+ if key == "gt_bboxes" and not valid_inds.any() and not allow_negative_crop:
return None
results[key] = bboxes[valid_inds, :]
# label fields. e.g. gt_labels and gt_labels_ignore
@@ -839,14 +1014,18 @@ def _crop_data(self, results, crop_size, allow_negative_crop):
# mask fields, e.g. gt_masks and gt_masks_ignore
mask_key = self.bbox2mask.get(key)
if mask_key in results:
- results[mask_key] = results[mask_key][
- valid_inds.nonzero()[0]].crop(
- np.asarray([crop_x1, crop_y1, crop_x2, crop_y2]))
+ results[mask_key] = results[mask_key][valid_inds.nonzero()[0]].crop(
+ np.asarray([crop_x1, crop_y1, crop_x2, crop_y2])
+ )
if self.recompute_bbox:
results[key] = results[mask_key].get_bboxes()
# crop semantic seg
- for key in results.get('seg_fields', []):
+ for key in results.get("seg_fields", []):
+ results[key] = results[key][crop_y1:crop_y2, crop_x1:crop_x2]
+
+ # crop offset_fields
+ for key in results.get("offset_field_fields", []):
results[key] = results[key][crop_y1:crop_y2, crop_x1:crop_x2]
return results
@@ -862,21 +1041,17 @@ def _get_crop_size(self, image_size):
crop_size (tuple): (crop_h, crop_w) in absolute pixels.
"""
h, w = image_size
- if self.crop_type == 'absolute':
+ if self.crop_type == "absolute":
return (min(self.crop_size[0], h), min(self.crop_size[1], w))
- elif self.crop_type == 'absolute_range':
+ elif self.crop_type == "absolute_range":
assert self.crop_size[0] <= self.crop_size[1]
- crop_h = np.random.randint(
- min(h, self.crop_size[0]),
- min(h, self.crop_size[1]) + 1)
- crop_w = np.random.randint(
- min(w, self.crop_size[0]),
- min(w, self.crop_size[1]) + 1)
+ crop_h = np.random.randint(min(h, self.crop_size[0]), min(h, self.crop_size[1]) + 1)
+ crop_w = np.random.randint(min(w, self.crop_size[0]), min(w, self.crop_size[1]) + 1)
return crop_h, crop_w
- elif self.crop_type == 'relative':
+ elif self.crop_type == "relative":
crop_h, crop_w = self.crop_size
return int(h * crop_h + 0.5), int(w * crop_w + 0.5)
- elif self.crop_type == 'relative_range':
+ elif self.crop_type == "relative_range":
crop_size = np.asarray(self.crop_size, dtype=np.float32)
crop_h, crop_w = crop_size + np.random.rand(2) * (1 - crop_size)
return int(h * crop_h + 0.5), int(w * crop_w + 0.5)
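
# Worked example (illustrative): on a 480x640 image, crop_type="relative" with
# crop_size=(0.5, 0.5) returns (240, 320), while "relative_range" with
# crop_size=(0.6, 0.6) samples each relative edge factor uniformly from [0.6, 1.0].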
@@ -892,17 +1067,17 @@ def __call__(self, results):
dict: Randomly cropped results, 'img_shape' key in result dict is
updated according to crop size.
"""
- image_size = results['img'].shape[:2]
+ image_size = results["img"].shape[:2]
crop_size = self._get_crop_size(image_size)
results = self._crop_data(results, crop_size, self.allow_negative_crop)
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(crop_size={self.crop_size}, '
- repr_str += f'crop_type={self.crop_type}, '
- repr_str += f'allow_negative_crop={self.allow_negative_crop}, '
- repr_str += f'bbox_clip_border={self.bbox_clip_border})'
+ repr_str += f"(crop_size={self.crop_size}, "
+ repr_str += f"crop_type={self.crop_type}, "
+ repr_str += f"allow_negative_crop={self.allow_negative_crop}, "
+ repr_str += f"bbox_clip_border={self.bbox_clip_border})"
return repr_str
@@ -917,7 +1092,7 @@ class SegRescale:
to 'cv2'.
"""
- def __init__(self, scale_factor=1, backend='cv2'):
+ def __init__(self, scale_factor=1, backend="cv2"):
self.scale_factor = scale_factor
self.backend = backend
@@ -931,17 +1106,15 @@ def __call__(self, results):
dict: Result dict with semantic segmentation map scaled.
"""
- for key in results.get('seg_fields', []):
+ for key in results.get("seg_fields", []):
if self.scale_factor != 1:
results[key] = mmcv.imrescale(
- results[key],
- self.scale_factor,
- interpolation='nearest',
- backend=self.backend)
+ results[key], self.scale_factor, interpolation="nearest", backend=self.backend
+ )
return results
def __repr__(self):
- return self.__class__.__name__ + f'(scale_factor={self.scale_factor})'
+ return self.__class__.__name__ + f"(scale_factor={self.scale_factor})"
@PIPELINES.register_module()
@@ -966,11 +1139,13 @@ class PhotoMetricDistortion:
hue_delta (int): delta of hue.
"""
- def __init__(self,
- brightness_delta=32,
- contrast_range=(0.5, 1.5),
- saturation_range=(0.5, 1.5),
- hue_delta=18):
+ def __init__(
+ self,
+ brightness_delta=32,
+ contrast_range=(0.5, 1.5),
+ saturation_range=(0.5, 1.5),
+ hue_delta=18,
+ ):
self.brightness_delta = brightness_delta
self.contrast_lower, self.contrast_upper = contrast_range
self.saturation_lower, self.saturation_upper = saturation_range
@@ -986,15 +1161,13 @@ def __call__(self, results):
dict: Result dict with images distorted.
"""
- if 'img_fields' in results:
- assert results['img_fields'] == ['img'], \
- 'Only single img_fields is allowed'
- img = results['img']
+ if "img_fields" in results:
+ assert results["img_fields"] == ["img"], "Only single img_fields is allowed"
+ img = results["img"]
img = img.astype(np.float32)
# random brightness
if random.randint(2):
- delta = random.uniform(-self.brightness_delta,
- self.brightness_delta)
+ delta = random.uniform(-self.brightness_delta, self.brightness_delta)
img += delta
# mode == 0 --> do random contrast first
@@ -1002,8 +1175,7 @@ def __call__(self, results):
mode = random.randint(2)
if mode == 1:
if random.randint(2):
- alpha = random.uniform(self.contrast_lower,
- self.contrast_upper)
+ alpha = random.uniform(self.contrast_lower, self.contrast_upper)
img *= alpha
# convert color from BGR to HSV
@@ -1011,8 +1183,7 @@ def __call__(self, results):
# random saturation
if random.randint(2):
- img[..., 1] *= random.uniform(self.saturation_lower,
- self.saturation_upper)
+ img[..., 1] *= random.uniform(self.saturation_lower, self.saturation_upper)
# random hue
if random.randint(2):
@@ -1026,25 +1197,24 @@ def __call__(self, results):
# random contrast
if mode == 0:
if random.randint(2):
- alpha = random.uniform(self.contrast_lower,
- self.contrast_upper)
+ alpha = random.uniform(self.contrast_lower, self.contrast_upper)
img *= alpha
# randomly swap channels
if random.randint(2):
img = img[..., random.permutation(3)]
- results['img'] = img
+ results["img"] = img
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(\nbrightness_delta={self.brightness_delta},\n'
- repr_str += 'contrast_range='
- repr_str += f'{(self.contrast_lower, self.contrast_upper)},\n'
- repr_str += 'saturation_range='
- repr_str += f'{(self.saturation_lower, self.saturation_upper)},\n'
- repr_str += f'hue_delta={self.hue_delta})'
+ repr_str += f"(\nbrightness_delta={self.brightness_delta},\n"
+ repr_str += "contrast_range="
+ repr_str += f"{(self.contrast_lower, self.contrast_upper)},\n"
+ repr_str += "saturation_range="
+ repr_str += f"{(self.saturation_lower, self.saturation_upper)},\n"
+ repr_str += f"hue_delta={self.hue_delta})"
return repr_str
@@ -1062,12 +1232,9 @@ class Expand:
prob (float): probability of applying this transformation
"""
- def __init__(self,
- mean=(0, 0, 0),
- to_rgb=True,
- ratio_range=(1, 4),
- seg_ignore_label=None,
- prob=0.5):
+ def __init__(
+ self, mean=(0, 0, 0), to_rgb=True, ratio_range=(1, 4), seg_ignore_label=None, prob=0.5
+ ):
self.to_rgb = to_rgb
self.ratio_range = ratio_range
if to_rgb:
@@ -1091,52 +1258,63 @@ def __call__(self, results):
if random.uniform(0, 1) > self.prob:
return results
- if 'img_fields' in results:
- assert results['img_fields'] == ['img'], \
- 'Only single img_fields is allowed'
- img = results['img']
+ if "img_fields" in results:
+ assert results["img_fields"] == ["img"], "Only single img_fields is allowed"
+ img = results["img"]
h, w, c = img.shape
ratio = random.uniform(self.min_ratio, self.max_ratio)
# speedup expand when meets large image
if np.all(self.mean == self.mean[0]):
- expand_img = np.empty((int(h * ratio), int(w * ratio), c),
- img.dtype)
+ expand_img = np.empty((int(h * ratio), int(w * ratio), c), img.dtype)
expand_img.fill(self.mean[0])
else:
- expand_img = np.full((int(h * ratio), int(w * ratio), c),
- self.mean,
- dtype=img.dtype)
+ expand_img = np.full((int(h * ratio), int(w * ratio), c), self.mean, dtype=img.dtype)
left = int(random.uniform(0, w * ratio - w))
top = int(random.uniform(0, h * ratio - h))
- expand_img[top:top + h, left:left + w] = img
+ expand_img[top : top + h, left : left + w] = img
- results['img'] = expand_img
+ results["img"] = expand_img
# expand bboxes
- for key in results.get('bbox_fields', []):
- results[key] = results[key] + np.tile(
- (left, top), 2).astype(results[key].dtype)
+ for key in results.get("bbox_fields", []):
+ results[key] = results[key] + np.tile((left, top), 2).astype(results[key].dtype)
# expand masks
- for key in results.get('mask_fields', []):
- results[key] = results[key].expand(
- int(h * ratio), int(w * ratio), top, left)
+ for key in results.get("mask_fields", []):
+ results[key] = results[key].expand(int(h * ratio), int(w * ratio), top, left)
+
+ # expand edge maps
+ for key in results.get("edge_fields", []):
+ results[key] = results[key].expand(int(h * ratio), int(w * ratio), top, left)
+
+ for key in results.get("side_face_fields", []):
+ results[key] = results[key].expand(int(h * ratio), int(w * ratio), top, left)
# expand segs
- for key in results.get('seg_fields', []):
+ for key in results.get("seg_fields", []):
gt_seg = results[key]
- expand_gt_seg = np.full((int(h * ratio), int(w * ratio)),
- self.seg_ignore_label,
- dtype=gt_seg.dtype)
- expand_gt_seg[top:top + h, left:left + w] = gt_seg
+ expand_gt_seg = np.full(
+ (int(h * ratio), int(w * ratio)), self.seg_ignore_label, dtype=gt_seg.dtype
+ )
+ expand_gt_seg[top : top + h, left : left + w] = gt_seg
results[key] = expand_gt_seg
+
+ # expand offset field
+ for key in results.get("offset_field_fields", []):
+ gt_offset_field = results[key]
+ expand_gt_offset_field = np.full(
+ (int(h * ratio), int(w * ratio)), self.seg_ignore_label, dtype=gt_seg.dtype
+ )
+ expand_gt_offset_field[top : top + h, left : left + w] = gt_offset_field
+ results[key] = expand_gt_offset_field
+
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(mean={self.mean}, to_rgb={self.to_rgb}, '
- repr_str += f'ratio_range={self.ratio_range}, '
- repr_str += f'seg_ignore_label={self.seg_ignore_label})'
+ repr_str += f"(mean={self.mean}, to_rgb={self.to_rgb}, "
+ repr_str += f"ratio_range={self.ratio_range}, "
+ repr_str += f"seg_ignore_label={self.seg_ignore_label})"
return repr_str
@@ -1160,23 +1338,16 @@ class MinIoURandomCrop:
`gt_bboxes_ignore` to `gt_labels_ignore` and `gt_masks_ignore`.
"""
- def __init__(self,
- min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
- min_crop_size=0.3,
- bbox_clip_border=True):
+ def __init__(
+ self, min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3, bbox_clip_border=True
+ ):
# 1: return ori img
self.min_ious = min_ious
self.sample_mode = (1, *min_ious, 0)
self.min_crop_size = min_crop_size
self.bbox_clip_border = bbox_clip_border
- self.bbox2label = {
- 'gt_bboxes': 'gt_labels',
- 'gt_bboxes_ignore': 'gt_labels_ignore'
- }
- self.bbox2mask = {
- 'gt_bboxes': 'gt_masks',
- 'gt_bboxes_ignore': 'gt_masks_ignore'
- }
+ self.bbox2label = {"gt_bboxes": "gt_labels", "gt_bboxes_ignore": "gt_labels_ignore"}
+ self.bbox2mask = {"gt_bboxes": "gt_masks", "gt_bboxes_ignore": "gt_masks_ignore"}
def __call__(self, results):
"""Call function to crop images and bounding boxes with minimum IoU
@@ -1190,12 +1361,11 @@ def __call__(self, results):
'img_shape' key is updated.
"""
- if 'img_fields' in results:
- assert results['img_fields'] == ['img'], \
- 'Only single img_fields is allowed'
- img = results['img']
- assert 'bbox_fields' in results
- boxes = [results[key] for key in results['bbox_fields']]
+ if "img_fields" in results:
+ assert results["img_fields"] == ["img"], "Only single img_fields is allowed"
+ img = results["img"]
+ assert "bbox_fields" in results
+ boxes = [results[key] for key in results["bbox_fields"]]
boxes = np.concatenate(boxes, 0)
h, w, c = img.shape
while True:
@@ -1216,13 +1386,11 @@ def __call__(self, results):
left = random.uniform(w - new_w)
top = random.uniform(h - new_h)
- patch = np.array(
- (int(left), int(top), int(left + new_w), int(top + new_h)))
+ patch = np.array((int(left), int(top), int(left + new_w), int(top + new_h)))
# Line or point crop is not allowed
if patch[2] == patch[0] or patch[3] == patch[1]:
continue
- overlaps = bbox_overlaps(
- patch.reshape(-1, 4), boxes.reshape(-1, 4)).reshape(-1)
+ overlaps = bbox_overlaps(patch.reshape(-1, 4), boxes.reshape(-1, 4)).reshape(-1)
if len(overlaps) > 0 and overlaps.min() < min_iou:
continue
@@ -1232,16 +1400,18 @@ def __call__(self, results):
# adjust boxes
def is_center_of_bboxes_in_patch(boxes, patch):
center = (boxes[:, :2] + boxes[:, 2:]) / 2
- mask = ((center[:, 0] > patch[0]) *
- (center[:, 1] > patch[1]) *
- (center[:, 0] < patch[2]) *
- (center[:, 1] < patch[3]))
+ mask = (
+ (center[:, 0] > patch[0])
+ * (center[:, 1] > patch[1])
+ * (center[:, 0] < patch[2])
+ * (center[:, 1] < patch[3])
+ )
return mask
mask = is_center_of_bboxes_in_patch(boxes, patch)
if not mask.any():
continue
- for key in results.get('bbox_fields', []):
+ for key in results.get("bbox_fields", []):
boxes = results[key].copy()
mask = is_center_of_bboxes_in_patch(boxes, patch)
boxes = boxes[mask]
@@ -1259,24 +1429,22 @@ def is_center_of_bboxes_in_patch(boxes, patch):
# mask fields
mask_key = self.bbox2mask.get(key)
if mask_key in results:
- results[mask_key] = results[mask_key][
- mask.nonzero()[0]].crop(patch)
+ results[mask_key] = results[mask_key][mask.nonzero()[0]].crop(patch)
# adjust the img no matter whether the gt is empty before crop
- img = img[patch[1]:patch[3], patch[0]:patch[2]]
- results['img'] = img
- results['img_shape'] = img.shape
+ img = img[patch[1] : patch[3], patch[0] : patch[2]]
+ results["img"] = img
+ results["img_shape"] = img.shape
# seg fields
- for key in results.get('seg_fields', []):
- results[key] = results[key][patch[1]:patch[3],
- patch[0]:patch[2]]
+ for key in results.get("seg_fields", []):
+ results[key] = results[key][patch[1] : patch[3], patch[0] : patch[2]]
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(min_ious={self.min_ious}, '
- repr_str += f'min_crop_size={self.min_crop_size}, '
- repr_str += f'bbox_clip_border={self.bbox_clip_border})'
+ repr_str += f"(min_ious={self.min_ious}, "
+ repr_str += f"min_crop_size={self.min_crop_size}, "
+ repr_str += f"bbox_clip_border={self.bbox_clip_border})"
return repr_str
@@ -1307,20 +1475,18 @@ def __call__(self, results):
"""
if corrupt is None:
- raise RuntimeError('imagecorruptions is not installed')
- if 'img_fields' in results:
- assert results['img_fields'] == ['img'], \
- 'Only single img_fields is allowed'
- results['img'] = corrupt(
- results['img'].astype(np.uint8),
- corruption_name=self.corruption,
- severity=self.severity)
+ raise RuntimeError("imagecorruptions is not installed")
+ if "img_fields" in results:
+ assert results["img_fields"] == ["img"], "Only single img_fields is allowed"
+ results["img"] = corrupt(
+ results["img"].astype(np.uint8), corruption_name=self.corruption, severity=self.severity
+ )
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(corruption={self.corruption}, '
- repr_str += f'severity={self.severity})'
+ repr_str += f"(corruption={self.corruption}, "
+ repr_str += f"severity={self.severity})"
return repr_str
@@ -1367,14 +1533,16 @@ class Albu:
after aug
"""
- def __init__(self,
- transforms,
- bbox_params=None,
- keymap=None,
- update_pad_shape=False,
- skip_img_without_anno=False):
+ def __init__(
+ self,
+ transforms,
+ bbox_params=None,
+ keymap=None,
+ update_pad_shape=False,
+ skip_img_without_anno=False,
+ ):
if Compose is None:
- raise RuntimeError('albumentations is not installed')
+ raise RuntimeError("albumentations is not installed")
# Args will be modified later, copying it will be safer
transforms = copy.deepcopy(transforms)
@@ -1388,24 +1556,23 @@ def __init__(self,
self.skip_img_without_anno = skip_img_without_anno
# A simple workaround to remove masks without boxes
- if (isinstance(bbox_params, dict) and 'label_fields' in bbox_params
- and 'filter_lost_elements' in bbox_params):
+ if (
+ isinstance(bbox_params, dict)
+ and "label_fields" in bbox_params
+ and "filter_lost_elements" in bbox_params
+ ):
self.filter_lost_elements = True
- self.origin_label_fields = bbox_params['label_fields']
- bbox_params['label_fields'] = ['idx_mapper']
- del bbox_params['filter_lost_elements']
+ self.origin_label_fields = bbox_params["label_fields"]
+ bbox_params["label_fields"] = ["idx_mapper"]
+ del bbox_params["filter_lost_elements"]
- self.bbox_params = (
- self.albu_builder(bbox_params) if bbox_params else None)
- self.aug = Compose([self.albu_builder(t) for t in self.transforms],
- bbox_params=self.bbox_params)
+ self.bbox_params = self.albu_builder(bbox_params) if bbox_params else None
+ self.aug = Compose(
+ [self.albu_builder(t) for t in self.transforms], bbox_params=self.bbox_params
+ )
if not keymap:
- self.keymap_to_albu = {
- 'img': 'image',
- 'gt_masks': 'masks',
- 'gt_bboxes': 'bboxes'
- }
+ self.keymap_to_albu = {"img": "image", "gt_masks": "masks", "gt_bboxes": "bboxes"}
else:
self.keymap_to_albu = keymap
self.keymap_back = {v: k for k, v in self.keymap_to_albu.items()}
@@ -1422,25 +1589,21 @@ def albu_builder(self, cfg):
obj: The constructed object.
"""
- assert isinstance(cfg, dict) and 'type' in cfg
+ assert isinstance(cfg, dict) and "type" in cfg
args = cfg.copy()
- obj_type = args.pop('type')
+ obj_type = args.pop("type")
if mmcv.is_str(obj_type):
if albumentations is None:
- raise RuntimeError('albumentations is not installed')
+ raise RuntimeError("albumentations is not installed")
obj_cls = getattr(albumentations, obj_type)
elif inspect.isclass(obj_type):
obj_cls = obj_type
else:
- raise TypeError(
- f'type must be a str or valid type, but got {type(obj_type)}')
+ raise TypeError(f"type must be a str or valid type, but got {type(obj_type)}")
- if 'transforms' in args:
- args['transforms'] = [
- self.albu_builder(transform)
- for transform in args['transforms']
- ]
+ if "transforms" in args:
+ args["transforms"] = [self.albu_builder(transform) for transform in args["transforms"]]
return obj_cls(**args)
@@ -1465,66 +1628,62 @@ def __call__(self, results):
# dict to albumentations format
results = self.mapper(results, self.keymap_to_albu)
# TODO: add bbox_fields
- if 'bboxes' in results:
+ if "bboxes" in results:
# to list of boxes
- if isinstance(results['bboxes'], np.ndarray):
- results['bboxes'] = [x for x in results['bboxes']]
+ if isinstance(results["bboxes"], np.ndarray):
+ results["bboxes"] = [x for x in results["bboxes"]]
# add pseudo-field for filtration
if self.filter_lost_elements:
- results['idx_mapper'] = np.arange(len(results['bboxes']))
+ results["idx_mapper"] = np.arange(len(results["bboxes"]))
# TODO: Support mask structure in albu
- if 'masks' in results:
- if isinstance(results['masks'], PolygonMasks):
- raise NotImplementedError(
- 'Albu only supports BitMap masks now')
- ori_masks = results['masks']
- if albumentations.__version__ < '0.5':
- results['masks'] = results['masks'].masks
+ if "masks" in results:
+ if isinstance(results["masks"], PolygonMasks):
+ raise NotImplementedError("Albu only supports BitMap masks now")
+ ori_masks = results["masks"]
+ if albumentations.__version__ < "0.5":
+ results["masks"] = results["masks"].masks
else:
- results['masks'] = [mask for mask in results['masks'].masks]
+ results["masks"] = [mask for mask in results["masks"].masks]
results = self.aug(**results)
- if 'bboxes' in results:
- if isinstance(results['bboxes'], list):
- results['bboxes'] = np.array(
- results['bboxes'], dtype=np.float32)
- results['bboxes'] = results['bboxes'].reshape(-1, 4)
+ if "bboxes" in results:
+ if isinstance(results["bboxes"], list):
+ results["bboxes"] = np.array(results["bboxes"], dtype=np.float32)
+ results["bboxes"] = results["bboxes"].reshape(-1, 4)
# filter label_fields
if self.filter_lost_elements:
-
for label in self.origin_label_fields:
- results[label] = np.array(
- [results[label][i] for i in results['idx_mapper']])
- if 'masks' in results:
- results['masks'] = np.array(
- [results['masks'][i] for i in results['idx_mapper']])
- results['masks'] = ori_masks.__class__(
- results['masks'], results['image'].shape[0],
- results['image'].shape[1])
-
- if (not len(results['idx_mapper'])
- and self.skip_img_without_anno):
+ results[label] = np.array([results[label][i] for i in results["idx_mapper"]])
+ if "masks" in results:
+ results["masks"] = np.array(
+ [results["masks"][i] for i in results["idx_mapper"]]
+ )
+ results["masks"] = ori_masks.__class__(
+ results["masks"], results["image"].shape[0], results["image"].shape[1]
+ )
+
+ if not len(results["idx_mapper"]) and self.skip_img_without_anno:
return None
- if 'gt_labels' in results:
- if isinstance(results['gt_labels'], list):
- results['gt_labels'] = np.array(results['gt_labels'])
- results['gt_labels'] = results['gt_labels'].astype(np.int64)
+ if "gt_labels" in results:
+ if isinstance(results["gt_labels"], list):
+ results["gt_labels"] = np.array(results["gt_labels"])
+ results["gt_labels"] = results["gt_labels"].astype(np.int64)
# back to the original format
results = self.mapper(results, self.keymap_back)
# update final shape
if self.update_pad_shape:
- results['pad_shape'] = results['img'].shape
+ results["pad_shape"] = results["img"].shape
return results
def __repr__(self):
- repr_str = self.__class__.__name__ + f'(transforms={self.transforms})'
+ repr_str = self.__class__.__name__ + f"(transforms={self.transforms})"
return repr_str
@@ -1617,30 +1776,30 @@ class RandomCenterCropPad:
the border of the image. Defaults to True.
"""
- def __init__(self,
- crop_size=None,
- ratios=(0.9, 1.0, 1.1),
- border=128,
- mean=None,
- std=None,
- to_rgb=None,
- test_mode=False,
- test_pad_mode=('logical_or', 127),
- test_pad_add_pix=0,
- bbox_clip_border=True):
+ def __init__(
+ self,
+ crop_size=None,
+ ratios=(0.9, 1.0, 1.1),
+ border=128,
+ mean=None,
+ std=None,
+ to_rgb=None,
+ test_mode=False,
+ test_pad_mode=("logical_or", 127),
+ test_pad_add_pix=0,
+ bbox_clip_border=True,
+ ):
if test_mode:
- assert crop_size is None, 'crop_size must be None in test mode'
- assert ratios is None, 'ratios must be None in test mode'
- assert border is None, 'border must be None in test mode'
+ assert crop_size is None, "crop_size must be None in test mode"
+ assert ratios is None, "ratios must be None in test mode"
+ assert border is None, "border must be None in test mode"
assert isinstance(test_pad_mode, (list, tuple))
- assert test_pad_mode[0] in ['logical_or', 'size_divisor']
+ assert test_pad_mode[0] in ["logical_or", "size_divisor"]
else:
assert isinstance(crop_size, (list, tuple))
- assert crop_size[0] > 0 and crop_size[1] > 0, (
- 'crop_size must > 0 in train mode')
+ assert crop_size[0] > 0 and crop_size[1] > 0, "crop_size must be > 0 in train mode"
assert isinstance(ratios, (list, tuple))
- assert test_pad_mode is None, (
- 'test_pad_mode must be None in train mode')
+ assert test_pad_mode is None, "test_pad_mode must be None in train mode"
self.crop_size = crop_size
self.ratios = ratios
@@ -1693,9 +1852,12 @@ def _filter_boxes(self, patch, boxes):
mask (numpy array, (N,)): Each box is inside or outside the patch.
"""
center = (boxes[:, :2] + boxes[:, 2:]) / 2
- mask = (center[:, 0] > patch[0]) * (center[:, 1] > patch[1]) * (
- center[:, 0] < patch[2]) * (
- center[:, 1] < patch[3])
+ mask = (
+ (center[:, 0] > patch[0])
+ * (center[:, 1] > patch[1])
+ * (center[:, 0] < patch[2])
+ * (center[:, 1] < patch[3])
+ )
return mask
def _crop_image_and_paste(self, image, center, size):
@@ -1741,11 +1903,15 @@ def _crop_image_and_paste(self, image, center, size):
x_slice = slice(cropped_center_x - left, cropped_center_x + right)
cropped_img[y_slice, x_slice, :] = image[y0:y1, x0:x1, :]
- border = np.array([
- cropped_center_y - top, cropped_center_y + bottom,
- cropped_center_x - left, cropped_center_x + right
- ],
- dtype=np.float32)
+ border = np.array(
+ [
+ cropped_center_y - top,
+ cropped_center_y + bottom,
+ cropped_center_x - left,
+ cropped_center_x + right,
+ ],
+ dtype=np.float32,
+ )
return cropped_img, border, patch
@@ -1758,9 +1924,9 @@ def _train_aug(self, results):
Returns:
results (dict): The updated dict.
"""
- img = results['img']
+ img = results["img"]
h, w, c = img.shape
- boxes = results['gt_bboxes']
+ boxes = results["gt_bboxes"]
while True:
scale = random.choice(self.ratios)
new_h = int(self.crop_size[0] * scale)
@@ -1773,16 +1939,17 @@ def _train_aug(self, results):
center_y = random.randint(low=h_border, high=h - h_border)
cropped_img, border, patch = self._crop_image_and_paste(
- img, [center_y, center_x], [new_h, new_w])
+ img, [center_y, center_x], [new_h, new_w]
+ )
mask = self._filter_boxes(patch, boxes)
# if the image does not have a valid bbox, any crop patch is valid.
if not mask.any() and len(boxes) > 0:
continue
- results['img'] = cropped_img
- results['img_shape'] = cropped_img.shape
- results['pad_shape'] = cropped_img.shape
+ results["img"] = cropped_img
+ results["img_shape"] = cropped_img.shape
+ results["pad_shape"] = cropped_img.shape
x0, y0, x1, y1 = patch
@@ -1790,7 +1957,7 @@ def _train_aug(self, results):
cropped_center_x, cropped_center_y = new_w // 2, new_h // 2
# crop bboxes accordingly and clip to the image boundary
- for key in results.get('bbox_fields', []):
+ for key in results.get("bbox_fields", []):
mask = self._filter_boxes(patch, results[key])
bboxes = results[key][mask]
bboxes[:, 0:4:2] += cropped_center_x - left_w - x0
@@ -1798,23 +1965,20 @@ def _train_aug(self, results):
if self.bbox_clip_border:
bboxes[:, 0:4:2] = np.clip(bboxes[:, 0:4:2], 0, new_w)
bboxes[:, 1:4:2] = np.clip(bboxes[:, 1:4:2], 0, new_h)
- keep = (bboxes[:, 2] > bboxes[:, 0]) & (
- bboxes[:, 3] > bboxes[:, 1])
+ keep = (bboxes[:, 2] > bboxes[:, 0]) & (bboxes[:, 3] > bboxes[:, 1])
bboxes = bboxes[keep]
results[key] = bboxes
- if key in ['gt_bboxes']:
- if 'gt_labels' in results:
- labels = results['gt_labels'][mask]
+ if key in ["gt_bboxes"]:
+ if "gt_labels" in results:
+ labels = results["gt_labels"][mask]
labels = labels[keep]
- results['gt_labels'] = labels
- if 'gt_masks' in results:
- raise NotImplementedError(
- 'RandomCenterCropPad only supports bbox.')
+ results["gt_labels"] = labels
+ if "gt_masks" in results:
+ raise NotImplementedError("RandomCenterCropPad only supports bbox.")
# crop semantic seg
- for key in results.get('seg_fields', []):
- raise NotImplementedError(
- 'RandomCenterCropPad only supports bbox.')
+ for key in results.get("seg_fields", []):
+ raise NotImplementedError("RandomCenterCropPad only supports bbox.")
return results
def _test_aug(self, results):
@@ -1828,34 +1992,37 @@ def _test_aug(self, results):
Returns:
results (dict): The updated dict.
"""
- img = results['img']
+ img = results["img"]
h, w, c = img.shape
- results['img_shape'] = img.shape
- if self.test_pad_mode[0] in ['logical_or']:
+ results["img_shape"] = img.shape
+ if self.test_pad_mode[0] in ["logical_or"]:
# self.test_pad_add_pix is only used for centernet
target_h = (h | self.test_pad_mode[1]) + self.test_pad_add_pix
target_w = (w | self.test_pad_mode[1]) + self.test_pad_add_pix
- elif self.test_pad_mode[0] in ['size_divisor']:
+ elif self.test_pad_mode[0] in ["size_divisor"]:
divisor = self.test_pad_mode[1]
target_h = int(np.ceil(h / divisor)) * divisor
target_w = int(np.ceil(w / divisor)) * divisor
else:
raise NotImplementedError(
- 'RandomCenterCropPad only support two testing pad mode:'
- 'logical-or and size_divisor.')
+ "RandomCenterCropPad only support two testing pad mode:"
+ "logical-or and size_divisor."
+ )
cropped_img, border, _ = self._crop_image_and_paste(
- img, [h // 2, w // 2], [target_h, target_w])
- results['img'] = cropped_img
- results['pad_shape'] = cropped_img.shape
- results['border'] = border
+ img, [h // 2, w // 2], [target_h, target_w]
+ )
+ results["img"] = cropped_img
+ results["pad_shape"] = cropped_img.shape
+ results["border"] = border
return results
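+ # Illustrative sketch of the two padding rules above (hypothetical sizes):
+ # 'logical_or' with 127 ORs the low seven bits in, i.e. rounds up to the next
+ # value of the form k * 128 - 1, while 'size_divisor' rounds up to a multiple.
+ # >>> 480 | 127 # logical_or -> 511
+ # >>> int(np.ceil(480 / 32)) * 32 # size_divisor 32 -> 480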
def __call__(self, results):
- img = results['img']
+ img = results["img"]
assert img.dtype == np.float32, (
- 'RandomCenterCropPad needs the input image of dtype np.float32,'
- ' please set "to_float32=True" in "LoadImageFromFile" pipeline')
+ "RandomCenterCropPad needs the input image of dtype np.float32,"
+ ' please set "to_float32=True" in "LoadImageFromFile" pipeline'
+ )
h, w, c = img.shape
assert c == len(self.mean)
if self.test_mode:
@@ -1865,15 +2032,15 @@ def __call__(self, results):
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(crop_size={self.crop_size}, '
- repr_str += f'ratios={self.ratios}, '
- repr_str += f'border={self.border}, '
- repr_str += f'mean={self.input_mean}, '
- repr_str += f'std={self.input_std}, '
- repr_str += f'to_rgb={self.to_rgb}, '
- repr_str += f'test_mode={self.test_mode}, '
- repr_str += f'test_pad_mode={self.test_pad_mode}, '
- repr_str += f'bbox_clip_border={self.bbox_clip_border})'
+ repr_str += f"(crop_size={self.crop_size}, "
+ repr_str += f"ratios={self.ratios}, "
+ repr_str += f"border={self.border}, "
+ repr_str += f"mean={self.input_mean}, "
+ repr_str += f"std={self.input_std}, "
+ repr_str += f"to_rgb={self.to_rgb}, "
+ repr_str += f"test_mode={self.test_mode}, "
+ repr_str += f"test_pad_mode={self.test_pad_mode}, "
+ repr_str += f"bbox_clip_border={self.bbox_clip_border})"
return repr_str
@@ -1901,16 +2068,11 @@ class CutOut:
of pixel to fill in the dropped regions. Default: (0, 0, 0).
"""
- def __init__(self,
- n_holes,
- cutout_shape=None,
- cutout_ratio=None,
- fill_in=(0, 0, 0)):
-
- assert (cutout_shape is None) ^ (cutout_ratio is None), \
- 'Either cutout_shape or cutout_ratio should be specified.'
- assert (isinstance(cutout_shape, (list, tuple))
- or isinstance(cutout_ratio, (list, tuple)))
+ def __init__(self, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0)):
+ assert (cutout_shape is None) ^ (
+ cutout_ratio is None
+ ), "Either cutout_shape or cutout_ratio should be specified."
+ assert isinstance(cutout_shape, (list, tuple)) or isinstance(cutout_ratio, (list, tuple))
if isinstance(n_holes, tuple):
assert len(n_holes) == 2 and 0 <= n_holes[0] < n_holes[1]
else:
@@ -1924,7 +2086,7 @@ def __init__(self,
def __call__(self, results):
"""Call function to drop some regions of image."""
- h, w, c = results['img'].shape
+ h, w, c = results["img"].shape
n_holes = np.random.randint(self.n_holes[0], self.n_holes[1] + 1)
for _ in range(n_holes):
x1 = np.random.randint(0, w)
@@ -1938,16 +2100,19 @@ def __call__(self, results):
x2 = np.clip(x1 + cutout_w, 0, w)
y2 = np.clip(y1 + cutout_h, 0, h)
- results['img'][y1:y2, x1:x2, :] = self.fill_in
+ results["img"][y1:y2, x1:x2, :] = self.fill_in
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(n_holes={self.n_holes}, '
- repr_str += (f'cutout_ratio={self.candidates}, ' if self.with_ratio
- else f'cutout_shape={self.candidates}, ')
- repr_str += f'fill_in={self.fill_in})'
+ repr_str += f"(n_holes={self.n_holes}, "
+ repr_str += (
+ f"cutout_ratio={self.candidates}, "
+ if self.with_ratio
+ else f"cutout_shape={self.candidates}, "
+ )
+ repr_str += f"fill_in={self.fill_in})"
return repr_str
@@ -2005,17 +2170,18 @@ class Mosaic:
Default to 1.0.
"""
- def __init__(self,
- img_scale=(640, 640),
- center_ratio_range=(0.5, 1.5),
- min_bbox_size=0,
- bbox_clip_border=True,
- skip_filter=True,
- pad_val=114,
- prob=1.0):
+ def __init__(
+ self,
+ img_scale=(640, 640),
+ center_ratio_range=(0.5, 1.5),
+ min_bbox_size=0,
+ bbox_clip_border=True,
+ skip_filter=True,
+ pad_val=114,
+ prob=1.0,
+ ):
assert isinstance(img_scale, tuple)
- assert 0 <= prob <= 1.0, 'The probability should be in range [0,1]. '\
- f'got {prob}.'
+ assert 0 <= prob <= 1.0, "The probability should be in range [0,1]. " f"got {prob}."
log_img_scale(img_scale, skip_square=True)
self.img_scale = img_scale
@@ -2065,45 +2231,44 @@ def _mosaic_transform(self, results):
dict: Updated result dict.
"""
- assert 'mix_results' in results
+ assert "mix_results" in results
mosaic_labels = []
mosaic_bboxes = []
- if len(results['img'].shape) == 3:
+ if len(results["img"].shape) == 3:
mosaic_img = np.full(
(int(self.img_scale[0] * 2), int(self.img_scale[1] * 2), 3),
self.pad_val,
- dtype=results['img'].dtype)
+ dtype=results["img"].dtype,
+ )
else:
mosaic_img = np.full(
(int(self.img_scale[0] * 2), int(self.img_scale[1] * 2)),
self.pad_val,
- dtype=results['img'].dtype)
+ dtype=results["img"].dtype,
+ )
# mosaic center x, y
- center_x = int(
- random.uniform(*self.center_ratio_range) * self.img_scale[1])
- center_y = int(
- random.uniform(*self.center_ratio_range) * self.img_scale[0])
+ center_x = int(random.uniform(*self.center_ratio_range) * self.img_scale[1])
+ center_y = int(random.uniform(*self.center_ratio_range) * self.img_scale[0])
center_position = (center_x, center_y)
- loc_strs = ('top_left', 'top_right', 'bottom_left', 'bottom_right')
+ loc_strs = ("top_left", "top_right", "bottom_left", "bottom_right")
for i, loc in enumerate(loc_strs):
- if loc == 'top_left':
+ if loc == "top_left":
results_patch = copy.deepcopy(results)
else:
- results_patch = copy.deepcopy(results['mix_results'][i - 1])
+ results_patch = copy.deepcopy(results["mix_results"][i - 1])
- img_i = results_patch['img']
+ img_i = results_patch["img"]
h_i, w_i = img_i.shape[:2]
# keep_ratio resize
- scale_ratio_i = min(self.img_scale[0] / h_i,
- self.img_scale[1] / w_i)
- img_i = mmcv.imresize(
- img_i, (int(w_i * scale_ratio_i), int(h_i * scale_ratio_i)))
+ scale_ratio_i = min(self.img_scale[0] / h_i, self.img_scale[1] / w_i)
+ img_i = mmcv.imresize(img_i, (int(w_i * scale_ratio_i), int(h_i * scale_ratio_i)))
# compute the combine parameters
paste_coord, crop_coord = self._mosaic_combine(
- loc, center_position, img_i.shape[:2][::-1])
+ loc, center_position, img_i.shape[:2][::-1]
+ )
x1_p, y1_p, x2_p, y2_p = paste_coord
x1_c, y1_c, x2_c, y2_c = crop_coord
@@ -2111,16 +2276,14 @@ def _mosaic_transform(self, results):
mosaic_img[y1_p:y2_p, x1_p:x2_p] = img_i[y1_c:y2_c, x1_c:x2_c]
# adjust coordinate
- gt_bboxes_i = results_patch['gt_bboxes']
- gt_labels_i = results_patch['gt_labels']
+ gt_bboxes_i = results_patch["gt_bboxes"]
+ gt_labels_i = results_patch["gt_labels"]
if gt_bboxes_i.shape[0] > 0:
padw = x1_p - x1_c
padh = y1_p - y1_c
- gt_bboxes_i[:, 0::2] = \
- scale_ratio_i * gt_bboxes_i[:, 0::2] + padw
- gt_bboxes_i[:, 1::2] = \
- scale_ratio_i * gt_bboxes_i[:, 1::2] + padh
+ gt_bboxes_i[:, 0::2] = scale_ratio_i * gt_bboxes_i[:, 0::2] + padw
+ gt_bboxes_i[:, 1::2] = scale_ratio_i * gt_bboxes_i[:, 1::2] + padh
mosaic_bboxes.append(gt_bboxes_i)
mosaic_labels.append(gt_labels_i)
@@ -2130,25 +2293,25 @@ def _mosaic_transform(self, results):
mosaic_labels = np.concatenate(mosaic_labels, 0)
if self.bbox_clip_border:
- mosaic_bboxes[:, 0::2] = np.clip(mosaic_bboxes[:, 0::2], 0,
- 2 * self.img_scale[1])
- mosaic_bboxes[:, 1::2] = np.clip(mosaic_bboxes[:, 1::2], 0,
- 2 * self.img_scale[0])
+ mosaic_bboxes[:, 0::2] = np.clip(mosaic_bboxes[:, 0::2], 0, 2 * self.img_scale[1])
+ mosaic_bboxes[:, 1::2] = np.clip(mosaic_bboxes[:, 1::2], 0, 2 * self.img_scale[0])
if not self.skip_filter:
- mosaic_bboxes, mosaic_labels = \
- self._filter_box_candidates(mosaic_bboxes, mosaic_labels)
+ mosaic_bboxes, mosaic_labels = self._filter_box_candidates(
+ mosaic_bboxes, mosaic_labels
+ )
# remove outside bboxes
- inside_inds = find_inside_bboxes(mosaic_bboxes, 2 * self.img_scale[0],
- 2 * self.img_scale[1])
+ inside_inds = find_inside_bboxes(
+ mosaic_bboxes, 2 * self.img_scale[0], 2 * self.img_scale[1]
+ )
mosaic_bboxes = mosaic_bboxes[inside_inds]
mosaic_labels = mosaic_labels[inside_inds]
- results['img'] = mosaic_img
- results['img_shape'] = mosaic_img.shape
- results['gt_bboxes'] = mosaic_bboxes
- results['gt_labels'] = mosaic_labels
+ results["img"] = mosaic_img
+ results["img_shape"] = mosaic_img.shape
+ results["gt_bboxes"] = mosaic_bboxes
+ results["gt_labels"] = mosaic_labels
return results
@@ -2169,46 +2332,61 @@ def _mosaic_combine(self, loc, center_position_xy, img_shape_wh):
- paste_coord (tuple): paste corner coordinate in mosaic image.
- crop_coord (tuple): crop corner coordinate in mosaic image.
"""
- assert loc in ('top_left', 'top_right', 'bottom_left', 'bottom_right')
- if loc == 'top_left':
+ assert loc in ("top_left", "top_right", "bottom_left", "bottom_right")
+ if loc == "top_left":
# index0 to top left part of image
- x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
- max(center_position_xy[1] - img_shape_wh[1], 0), \
- center_position_xy[0], \
- center_position_xy[1]
- crop_coord = img_shape_wh[0] - (x2 - x1), img_shape_wh[1] - (
- y2 - y1), img_shape_wh[0], img_shape_wh[1]
-
- elif loc == 'top_right':
+ x1, y1, x2, y2 = (
+ max(center_position_xy[0] - img_shape_wh[0], 0),
+ max(center_position_xy[1] - img_shape_wh[1], 0),
+ center_position_xy[0],
+ center_position_xy[1],
+ )
+ crop_coord = (
+ img_shape_wh[0] - (x2 - x1),
+ img_shape_wh[1] - (y2 - y1),
+ img_shape_wh[0],
+ img_shape_wh[1],
+ )
+
+ elif loc == "top_right":
# index1 to top right part of image
- x1, y1, x2, y2 = center_position_xy[0], \
- max(center_position_xy[1] - img_shape_wh[1], 0), \
- min(center_position_xy[0] + img_shape_wh[0],
- self.img_scale[1] * 2), \
- center_position_xy[1]
- crop_coord = 0, img_shape_wh[1] - (y2 - y1), min(
- img_shape_wh[0], x2 - x1), img_shape_wh[1]
-
- elif loc == 'bottom_left':
+ x1, y1, x2, y2 = (
+ center_position_xy[0],
+ max(center_position_xy[1] - img_shape_wh[1], 0),
+ min(center_position_xy[0] + img_shape_wh[0], self.img_scale[1] * 2),
+ center_position_xy[1],
+ )
+ crop_coord = (
+ 0,
+ img_shape_wh[1] - (y2 - y1),
+ min(img_shape_wh[0], x2 - x1),
+ img_shape_wh[1],
+ )
+
+ elif loc == "bottom_left":
# index2 to bottom left part of image
- x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
- center_position_xy[1], \
- center_position_xy[0], \
- min(self.img_scale[0] * 2, center_position_xy[1] +
- img_shape_wh[1])
- crop_coord = img_shape_wh[0] - (x2 - x1), 0, img_shape_wh[0], min(
- y2 - y1, img_shape_wh[1])
+ x1, y1, x2, y2 = (
+ max(center_position_xy[0] - img_shape_wh[0], 0),
+ center_position_xy[1],
+ center_position_xy[0],
+ min(self.img_scale[0] * 2, center_position_xy[1] + img_shape_wh[1]),
+ )
+ crop_coord = (
+ img_shape_wh[0] - (x2 - x1),
+ 0,
+ img_shape_wh[0],
+ min(y2 - y1, img_shape_wh[1]),
+ )
else:
# index3 to bottom right part of image
- x1, y1, x2, y2 = center_position_xy[0], \
- center_position_xy[1], \
- min(center_position_xy[0] + img_shape_wh[0],
- self.img_scale[1] * 2), \
- min(self.img_scale[0] * 2, center_position_xy[1] +
- img_shape_wh[1])
- crop_coord = 0, 0, min(img_shape_wh[0],
- x2 - x1), min(y2 - y1, img_shape_wh[1])
+ x1, y1, x2, y2 = (
+ center_position_xy[0],
+ center_position_xy[1],
+ min(center_position_xy[0] + img_shape_wh[0], self.img_scale[1] * 2),
+ min(self.img_scale[0] * 2, center_position_xy[1] + img_shape_wh[1]),
+ )
+ crop_coord = 0, 0, min(img_shape_wh[0], x2 - x1), min(y2 - y1, img_shape_wh[1])
paste_coord = x1, y1, x2, y2
return paste_coord, crop_coord
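+ # Illustrative numbers for the 'top_left' branch above: a 640x480 (w x h)
+ # sub-image whose bottom-right corner is pinned to the mosaic center (600, 500).
+ # paste_coord = (max(600 - 640, 0), max(500 - 480, 0), 600, 500) = (0, 20, 600, 500)
+ # crop_coord = (640 - 600, 480 - 480, 640, 480) = (40, 0, 640, 480)
+ # Both windows are 600 x 480, so the pasted slice shapes match exactly.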
@@ -2217,18 +2395,17 @@ def _filter_box_candidates(self, bboxes, labels):
"""Filter out bboxes too small after Mosaic."""
bbox_w = bboxes[:, 2] - bboxes[:, 0]
bbox_h = bboxes[:, 3] - bboxes[:, 1]
- valid_inds = (bbox_w > self.min_bbox_size) & \
- (bbox_h > self.min_bbox_size)
+ valid_inds = (bbox_w > self.min_bbox_size) & (bbox_h > self.min_bbox_size)
valid_inds = np.nonzero(valid_inds)[0]
return bboxes[valid_inds], labels[valid_inds]
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'img_scale={self.img_scale}, '
- repr_str += f'center_ratio_range={self.center_ratio_range}, '
- repr_str += f'pad_val={self.pad_val}, '
- repr_str += f'min_bbox_size={self.min_bbox_size}, '
- repr_str += f'skip_filter={self.skip_filter})'
+ repr_str += f"img_scale={self.img_scale}, "
+ repr_str += f"center_ratio_range={self.center_ratio_range}, "
+ repr_str += f"pad_val={self.pad_val}, "
+ repr_str += f"min_bbox_size={self.min_bbox_size}, "
+ repr_str += f"skip_filter={self.skip_filter})"
return repr_str
@@ -2289,17 +2466,19 @@ class MixUp:
is invalid. Default to True.
"""
- def __init__(self,
- img_scale=(640, 640),
- ratio_range=(0.5, 1.5),
- flip_ratio=0.5,
- pad_val=114,
- max_iters=15,
- min_bbox_size=5,
- min_area_ratio=0.2,
- max_aspect_ratio=20,
- bbox_clip_border=True,
- skip_filter=True):
+ def __init__(
+ self,
+ img_scale=(640, 640),
+ ratio_range=(0.5, 1.5),
+ flip_ratio=0.5,
+ pad_val=114,
+ max_iters=15,
+ min_bbox_size=5,
+ min_area_ratio=0.2,
+ max_aspect_ratio=20,
+ bbox_clip_border=True,
+ skip_filter=True,
+ ):
assert isinstance(img_scale, tuple)
log_img_scale(img_scale, skip_square=True)
self.dynamic_scale = img_scale
@@ -2338,7 +2517,7 @@ def get_indexes(self, dataset):
for i in range(self.max_iters):
index = random.randint(0, len(dataset))
- gt_bboxes_i = dataset.get_ann_info(index)['bboxes']
+ gt_bboxes_i = dataset.get_ann_info(index)["bboxes"]
if len(gt_bboxes_i) != 0:
break
@@ -2354,54 +2533,57 @@ def _mixup_transform(self, results):
dict: Updated result dict.
"""
- assert 'mix_results' in results
- assert len(
- results['mix_results']) == 1, 'MixUp only support 2 images now !'
+ assert "mix_results" in results
+ assert len(results["mix_results"]) == 1, "MixUp only support 2 images now !"
- if results['mix_results'][0]['gt_bboxes'].shape[0] == 0:
+ if results["mix_results"][0]["gt_bboxes"].shape[0] == 0:
# empty bbox
return results
- retrieve_results = results['mix_results'][0]
- retrieve_img = retrieve_results['img']
+ retrieve_results = results["mix_results"][0]
+ retrieve_img = retrieve_results["img"]
jit_factor = random.uniform(*self.ratio_range)
is_filp = random.uniform(0, 1) < self.flip_ratio
if len(retrieve_img.shape) == 3:
- out_img = np.ones(
- (self.dynamic_scale[0], self.dynamic_scale[1], 3),
- dtype=retrieve_img.dtype) * self.pad_val
+ out_img = (
+ np.ones((self.dynamic_scale[0], self.dynamic_scale[1], 3), dtype=retrieve_img.dtype)
+ * self.pad_val
+ )
else:
- out_img = np.ones(
- self.dynamic_scale, dtype=retrieve_img.dtype) * self.pad_val
+ out_img = np.ones(self.dynamic_scale, dtype=retrieve_img.dtype) * self.pad_val
# 1. keep_ratio resize
- scale_ratio = min(self.dynamic_scale[0] / retrieve_img.shape[0],
- self.dynamic_scale[1] / retrieve_img.shape[1])
+ scale_ratio = min(
+ self.dynamic_scale[0] / retrieve_img.shape[0],
+ self.dynamic_scale[1] / retrieve_img.shape[1],
+ )
retrieve_img = mmcv.imresize(
- retrieve_img, (int(retrieve_img.shape[1] * scale_ratio),
- int(retrieve_img.shape[0] * scale_ratio)))
+ retrieve_img,
+ (int(retrieve_img.shape[1] * scale_ratio), int(retrieve_img.shape[0] * scale_ratio)),
+ )
# 2. paste
- out_img[:retrieve_img.shape[0], :retrieve_img.shape[1]] = retrieve_img
+ out_img[: retrieve_img.shape[0], : retrieve_img.shape[1]] = retrieve_img
# 3. scale jit
scale_ratio *= jit_factor
- out_img = mmcv.imresize(out_img, (int(out_img.shape[1] * jit_factor),
- int(out_img.shape[0] * jit_factor)))
+ out_img = mmcv.imresize(
+ out_img, (int(out_img.shape[1] * jit_factor), int(out_img.shape[0] * jit_factor))
+ )
# 4. flip
if is_filp:
out_img = out_img[:, ::-1, :]
# 5. random crop
- ori_img = results['img']
+ ori_img = results["img"]
origin_h, origin_w = out_img.shape[:2]
target_h, target_w = ori_img.shape[:2]
- padded_img = np.zeros(
- (max(origin_h, target_h), max(origin_w,
- target_w), 3)).astype(np.uint8)
+ padded_img = np.zeros((max(origin_h, target_h), max(origin_w, target_w), 3)).astype(
+ np.uint8
+ )
padded_img[:origin_h, :origin_w] = out_img
x_offset, y_offset = 0, 0
@@ -2409,61 +2591,52 @@ def _mixup_transform(self, results):
y_offset = random.randint(0, padded_img.shape[0] - target_h)
if padded_img.shape[1] > target_w:
x_offset = random.randint(0, padded_img.shape[1] - target_w)
- padded_cropped_img = padded_img[y_offset:y_offset + target_h,
- x_offset:x_offset + target_w]
+ padded_cropped_img = padded_img[
+ y_offset : y_offset + target_h, x_offset : x_offset + target_w
+ ]
# 6. adjust bbox
- retrieve_gt_bboxes = retrieve_results['gt_bboxes']
+ retrieve_gt_bboxes = retrieve_results["gt_bboxes"]
retrieve_gt_bboxes[:, 0::2] = retrieve_gt_bboxes[:, 0::2] * scale_ratio
retrieve_gt_bboxes[:, 1::2] = retrieve_gt_bboxes[:, 1::2] * scale_ratio
if self.bbox_clip_border:
- retrieve_gt_bboxes[:, 0::2] = np.clip(retrieve_gt_bboxes[:, 0::2],
- 0, origin_w)
- retrieve_gt_bboxes[:, 1::2] = np.clip(retrieve_gt_bboxes[:, 1::2],
- 0, origin_h)
+ retrieve_gt_bboxes[:, 0::2] = np.clip(retrieve_gt_bboxes[:, 0::2], 0, origin_w)
+ retrieve_gt_bboxes[:, 1::2] = np.clip(retrieve_gt_bboxes[:, 1::2], 0, origin_h)
if is_filp:
- retrieve_gt_bboxes[:, 0::2] = (
- origin_w - retrieve_gt_bboxes[:, 0::2][:, ::-1])
+ retrieve_gt_bboxes[:, 0::2] = origin_w - retrieve_gt_bboxes[:, 0::2][:, ::-1]
# 7. filter
cp_retrieve_gt_bboxes = retrieve_gt_bboxes.copy()
- cp_retrieve_gt_bboxes[:, 0::2] = \
- cp_retrieve_gt_bboxes[:, 0::2] - x_offset
- cp_retrieve_gt_bboxes[:, 1::2] = \
- cp_retrieve_gt_bboxes[:, 1::2] - y_offset
+ cp_retrieve_gt_bboxes[:, 0::2] = cp_retrieve_gt_bboxes[:, 0::2] - x_offset
+ cp_retrieve_gt_bboxes[:, 1::2] = cp_retrieve_gt_bboxes[:, 1::2] - y_offset
if self.bbox_clip_border:
- cp_retrieve_gt_bboxes[:, 0::2] = np.clip(
- cp_retrieve_gt_bboxes[:, 0::2], 0, target_w)
- cp_retrieve_gt_bboxes[:, 1::2] = np.clip(
- cp_retrieve_gt_bboxes[:, 1::2], 0, target_h)
+ cp_retrieve_gt_bboxes[:, 0::2] = np.clip(cp_retrieve_gt_bboxes[:, 0::2], 0, target_w)
+ cp_retrieve_gt_bboxes[:, 1::2] = np.clip(cp_retrieve_gt_bboxes[:, 1::2], 0, target_h)
# 8. mix up
ori_img = ori_img.astype(np.float32)
mixup_img = 0.5 * ori_img + 0.5 * padded_cropped_img.astype(np.float32)
- retrieve_gt_labels = retrieve_results['gt_labels']
+ retrieve_gt_labels = retrieve_results["gt_labels"]
if not self.skip_filter:
- keep_list = self._filter_box_candidates(retrieve_gt_bboxes.T,
- cp_retrieve_gt_bboxes.T)
+ keep_list = self._filter_box_candidates(retrieve_gt_bboxes.T, cp_retrieve_gt_bboxes.T)
retrieve_gt_labels = retrieve_gt_labels[keep_list]
cp_retrieve_gt_bboxes = cp_retrieve_gt_bboxes[keep_list]
- mixup_gt_bboxes = np.concatenate(
- (results['gt_bboxes'], cp_retrieve_gt_bboxes), axis=0)
- mixup_gt_labels = np.concatenate(
- (results['gt_labels'], retrieve_gt_labels), axis=0)
+ mixup_gt_bboxes = np.concatenate((results["gt_bboxes"], cp_retrieve_gt_bboxes), axis=0)
+ mixup_gt_labels = np.concatenate((results["gt_labels"], retrieve_gt_labels), axis=0)
# remove outside bbox
inside_inds = find_inside_bboxes(mixup_gt_bboxes, target_h, target_w)
mixup_gt_bboxes = mixup_gt_bboxes[inside_inds]
mixup_gt_labels = mixup_gt_labels[inside_inds]
- results['img'] = mixup_img.astype(np.uint8)
- results['img_shape'] = mixup_img.shape
- results['gt_bboxes'] = mixup_gt_bboxes
- results['gt_labels'] = mixup_gt_labels
+ results["img"] = mixup_img.astype(np.uint8)
+ results["img_shape"] = mixup_img.shape
+ results["gt_bboxes"] = mixup_gt_bboxes
+ results["gt_labels"] = mixup_gt_labels
return results
@@ -2477,22 +2650,24 @@ def _filter_box_candidates(self, bbox1, bbox2):
w1, h1 = bbox1[2] - bbox1[0], bbox1[3] - bbox1[1]
w2, h2 = bbox2[2] - bbox2[0], bbox2[3] - bbox2[1]
ar = np.maximum(w2 / (h2 + 1e-16), h2 / (w2 + 1e-16))
- return ((w2 > self.min_bbox_size)
- & (h2 > self.min_bbox_size)
- & (w2 * h2 / (w1 * h1 + 1e-16) > self.min_area_ratio)
- & (ar < self.max_aspect_ratio))
+ return (
+ (w2 > self.min_bbox_size)
+ & (h2 > self.min_bbox_size)
+ & (w2 * h2 / (w1 * h1 + 1e-16) > self.min_area_ratio)
+ & (ar < self.max_aspect_ratio)
+ )
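+ # Illustrative check of the filter above (toy boxes; the caller passes
+ # transposed arrays, so each column is one box):
+ # bbox1 column (0, 0, 40, 40) -> w1 = h1 = 40; bbox2 column (0, 0, 20, 2)
+ # -> w2 = 20 passes min_bbox_size = 5 but h2 = 2 fails it, so the box is
+ # dropped even though its aspect ratio (10) is below max_aspect_ratio = 20.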
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'dynamic_scale={self.dynamic_scale}, '
- repr_str += f'ratio_range={self.ratio_range}, '
- repr_str += f'flip_ratio={self.flip_ratio}, '
- repr_str += f'pad_val={self.pad_val}, '
- repr_str += f'max_iters={self.max_iters}, '
- repr_str += f'min_bbox_size={self.min_bbox_size}, '
- repr_str += f'min_area_ratio={self.min_area_ratio}, '
- repr_str += f'max_aspect_ratio={self.max_aspect_ratio}, '
- repr_str += f'skip_filter={self.skip_filter})'
+ repr_str += f"dynamic_scale={self.dynamic_scale}, "
+ repr_str += f"ratio_range={self.ratio_range}, "
+ repr_str += f"flip_ratio={self.flip_ratio}, "
+ repr_str += f"pad_val={self.pad_val}, "
+ repr_str += f"max_iters={self.max_iters}, "
+ repr_str += f"min_bbox_size={self.min_bbox_size}, "
+ repr_str += f"min_area_ratio={self.min_area_ratio}, "
+ repr_str += f"max_aspect_ratio={self.max_aspect_ratio}, "
+ repr_str += f"skip_filter={self.skip_filter})"
return repr_str
@@ -2536,18 +2711,20 @@ class RandomAffine:
is invalid. Default to True.
"""
- def __init__(self,
- max_rotate_degree=10.0,
- max_translate_ratio=0.1,
- scaling_ratio_range=(0.5, 1.5),
- max_shear_degree=2.0,
- border=(0, 0),
- border_val=(114, 114, 114),
- min_bbox_size=2,
- min_area_ratio=0.2,
- max_aspect_ratio=20,
- bbox_clip_border=True,
- skip_filter=True):
+ def __init__(
+ self,
+ max_rotate_degree=10.0,
+ max_translate_ratio=0.1,
+ scaling_ratio_range=(0.5, 1.5),
+ max_shear_degree=2.0,
+ border=(0, 0),
+ border_val=(114, 114, 114),
+ min_bbox_size=2,
+ min_area_ratio=0.2,
+ max_aspect_ratio=20,
+ bbox_clip_border=True,
+ skip_filter=True,
+ ):
assert 0 <= max_translate_ratio <= 1
assert scaling_ratio_range[0] <= scaling_ratio_range[1]
assert scaling_ratio_range[0] > 0
@@ -2564,46 +2741,37 @@ def __init__(self,
self.skip_filter = skip_filter
def __call__(self, results):
- img = results['img']
+ img = results["img"]
height = img.shape[0] + self.border[0] * 2
width = img.shape[1] + self.border[1] * 2
# Rotation
- rotation_degree = random.uniform(-self.max_rotate_degree,
- self.max_rotate_degree)
+ rotation_degree = random.uniform(-self.max_rotate_degree, self.max_rotate_degree)
rotation_matrix = self._get_rotation_matrix(rotation_degree)
# Scaling
- scaling_ratio = random.uniform(self.scaling_ratio_range[0],
- self.scaling_ratio_range[1])
+ scaling_ratio = random.uniform(self.scaling_ratio_range[0], self.scaling_ratio_range[1])
scaling_matrix = self._get_scaling_matrix(scaling_ratio)
# Shear
- x_degree = random.uniform(-self.max_shear_degree,
- self.max_shear_degree)
- y_degree = random.uniform(-self.max_shear_degree,
- self.max_shear_degree)
+ x_degree = random.uniform(-self.max_shear_degree, self.max_shear_degree)
+ y_degree = random.uniform(-self.max_shear_degree, self.max_shear_degree)
shear_matrix = self._get_shear_matrix(x_degree, y_degree)
# Translation
- trans_x = random.uniform(-self.max_translate_ratio,
- self.max_translate_ratio) * width
- trans_y = random.uniform(-self.max_translate_ratio,
- self.max_translate_ratio) * height
+ trans_x = random.uniform(-self.max_translate_ratio, self.max_translate_ratio) * width
+ trans_y = random.uniform(-self.max_translate_ratio, self.max_translate_ratio) * height
translate_matrix = self._get_translation_matrix(trans_x, trans_y)
- warp_matrix = (
- translate_matrix @ shear_matrix @ rotation_matrix @ scaling_matrix)
+ warp_matrix = translate_matrix @ shear_matrix @ rotation_matrix @ scaling_matrix
img = cv2.warpPerspective(
- img,
- warp_matrix,
- dsize=(width, height),
- borderValue=self.border_val)
- results['img'] = img
- results['img_shape'] = img.shape
-
- for key in results.get('bbox_fields', []):
+ img, warp_matrix, dsize=(width, height), borderValue=self.border_val
+ )
+ results["img"] = img
+ results["img_shape"] = img.shape
+
+ for key in results.get("bbox_fields", []):
bboxes = results[key]
num_bboxes = len(bboxes)
if num_bboxes:
@@ -2618,32 +2786,26 @@ def __call__(self, results):
xs = warp_points[0].reshape(num_bboxes, 4)
ys = warp_points[1].reshape(num_bboxes, 4)
- warp_bboxes = np.vstack(
- (xs.min(1), ys.min(1), xs.max(1), ys.max(1))).T
+ warp_bboxes = np.vstack((xs.min(1), ys.min(1), xs.max(1), ys.max(1))).T
if self.bbox_clip_border:
- warp_bboxes[:, [0, 2]] = \
- warp_bboxes[:, [0, 2]].clip(0, width)
- warp_bboxes[:, [1, 3]] = \
- warp_bboxes[:, [1, 3]].clip(0, height)
+ warp_bboxes[:, [0, 2]] = warp_bboxes[:, [0, 2]].clip(0, width)
+ warp_bboxes[:, [1, 3]] = warp_bboxes[:, [1, 3]].clip(0, height)
# remove outside bbox
valid_index = find_inside_bboxes(warp_bboxes, height, width)
if not self.skip_filter:
# filter bboxes
- filter_index = self.filter_gt_bboxes(
- bboxes * scaling_ratio, warp_bboxes)
+ filter_index = self.filter_gt_bboxes(bboxes * scaling_ratio, warp_bboxes)
valid_index = valid_index & filter_index
results[key] = warp_bboxes[valid_index]
- if key in ['gt_bboxes']:
- if 'gt_labels' in results:
- results['gt_labels'] = results['gt_labels'][
- valid_index]
-
- if 'gt_masks' in results:
- raise NotImplementedError(
- 'RandomAffine only supports bbox.')
+ if key in ["gt_bboxes"]:
+ if "gt_labels" in results:
+ results["gt_labels"] = results["gt_labels"][valid_index]
+
+ if "gt_masks" in results:
+ raise NotImplementedError("RandomAffine only supports bbox.")
return results
def filter_gt_bboxes(self, origin_bboxes, wrapped_bboxes):
@@ -2651,66 +2813,67 @@ def filter_gt_bboxes(self, origin_bboxes, wrapped_bboxes):
origin_h = origin_bboxes[:, 3] - origin_bboxes[:, 1]
wrapped_w = wrapped_bboxes[:, 2] - wrapped_bboxes[:, 0]
wrapped_h = wrapped_bboxes[:, 3] - wrapped_bboxes[:, 1]
- aspect_ratio = np.maximum(wrapped_w / (wrapped_h + 1e-16),
- wrapped_h / (wrapped_w + 1e-16))
+ aspect_ratio = np.maximum(wrapped_w / (wrapped_h + 1e-16), wrapped_h / (wrapped_w + 1e-16))
- wh_valid_idx = (wrapped_w > self.min_bbox_size) & \
- (wrapped_h > self.min_bbox_size)
- area_valid_idx = wrapped_w * wrapped_h / (origin_w * origin_h +
- 1e-16) > self.min_area_ratio
+ wh_valid_idx = (wrapped_w > self.min_bbox_size) & (wrapped_h > self.min_bbox_size)
+ area_valid_idx = wrapped_w * wrapped_h / (origin_w * origin_h + 1e-16) > self.min_area_ratio
aspect_ratio_valid_idx = aspect_ratio < self.max_aspect_ratio
return wh_valid_idx & area_valid_idx & aspect_ratio_valid_idx
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(max_rotate_degree={self.max_rotate_degree}, '
- repr_str += f'max_translate_ratio={self.max_translate_ratio}, '
- repr_str += f'scaling_ratio={self.scaling_ratio_range}, '
- repr_str += f'max_shear_degree={self.max_shear_degree}, '
- repr_str += f'border={self.border}, '
- repr_str += f'border_val={self.border_val}, '
- repr_str += f'min_bbox_size={self.min_bbox_size}, '
- repr_str += f'min_area_ratio={self.min_area_ratio}, '
- repr_str += f'max_aspect_ratio={self.max_aspect_ratio}, '
- repr_str += f'skip_filter={self.skip_filter})'
+ repr_str += f"(max_rotate_degree={self.max_rotate_degree}, "
+ repr_str += f"max_translate_ratio={self.max_translate_ratio}, "
+ repr_str += f"scaling_ratio={self.scaling_ratio_range}, "
+ repr_str += f"max_shear_degree={self.max_shear_degree}, "
+ repr_str += f"border={self.border}, "
+ repr_str += f"border_val={self.border_val}, "
+ repr_str += f"min_bbox_size={self.min_bbox_size}, "
+ repr_str += f"min_area_ratio={self.min_area_ratio}, "
+ repr_str += f"max_aspect_ratio={self.max_aspect_ratio}, "
+ repr_str += f"skip_filter={self.skip_filter})"
return repr_str
@staticmethod
def _get_rotation_matrix(rotate_degrees):
radian = math.radians(rotate_degrees)
rotation_matrix = np.array(
- [[np.cos(radian), -np.sin(radian), 0.],
- [np.sin(radian), np.cos(radian), 0.], [0., 0., 1.]],
- dtype=np.float32)
+ [
+ [np.cos(radian), -np.sin(radian), 0.0],
+ [np.sin(radian), np.cos(radian), 0.0],
+ [0.0, 0.0, 1.0],
+ ],
+ dtype=np.float32,
+ )
return rotation_matrix
@staticmethod
def _get_scaling_matrix(scale_ratio):
scaling_matrix = np.array(
- [[scale_ratio, 0., 0.], [0., scale_ratio, 0.], [0., 0., 1.]],
- dtype=np.float32)
+ [[scale_ratio, 0.0, 0.0], [0.0, scale_ratio, 0.0], [0.0, 0.0, 1.0]], dtype=np.float32
+ )
return scaling_matrix
@staticmethod
def _get_share_matrix(scale_ratio):
scaling_matrix = np.array(
- [[scale_ratio, 0., 0.], [0., scale_ratio, 0.], [0., 0., 1.]],
- dtype=np.float32)
+ [[scale_ratio, 0.0, 0.0], [0.0, scale_ratio, 0.0], [0.0, 0.0, 1.0]], dtype=np.float32
+ )
return scaling_matrix
@staticmethod
def _get_shear_matrix(x_shear_degrees, y_shear_degrees):
x_radian = math.radians(x_shear_degrees)
y_radian = math.radians(y_shear_degrees)
- shear_matrix = np.array([[1, np.tan(x_radian), 0.],
- [np.tan(y_radian), 1, 0.], [0., 0., 1.]],
- dtype=np.float32)
+ shear_matrix = np.array(
+ [[1, np.tan(x_radian), 0.0], [np.tan(y_radian), 1, 0.0], [0.0, 0.0, 1.0]],
+ dtype=np.float32,
+ )
return shear_matrix
@staticmethod
def _get_translation_matrix(x, y):
- translation_matrix = np.array([[1, 0., x], [0., 1, y], [0., 0., 1.]],
- dtype=np.float32)
+ translation_matrix = np.array([[1, 0.0, x], [0.0, 1, y], [0.0, 0.0, 1.0]], dtype=np.float32)
return translation_matrix
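+ # Minimal numeric sketch (assumed values) of how __call__ composes these
+ # matrices: translate @ shear @ rotate @ scale applies scaling first and
+ # translation last.
+ # >>> R = RandomAffine._get_rotation_matrix(90.0)
+ # >>> T = RandomAffine._get_translation_matrix(10.0, 0.0)
+ # >>> (T @ R @ np.array([1.0, 0.0, 1.0]))[:2] # (1, 0) -> (0, 1), then +10 in x
+ # array([10., 1.])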
@@ -2732,9 +2895,11 @@ def __init__(self, hue_delta=5, saturation_delta=30, value_delta=30):
self.value_delta = value_delta
def __call__(self, results):
- img = results['img']
+ img = results["img"]
hsv_gains = np.random.uniform(-1, 1, 3) * [
- self.hue_delta, self.saturation_delta, self.value_delta
+ self.hue_delta,
+ self.saturation_delta,
+ self.value_delta,
]
# random selection of h, s, v
hsv_gains *= np.random.randint(0, 2, 3)
@@ -2747,14 +2912,14 @@ def __call__(self, results):
img_hsv[..., 2] = np.clip(img_hsv[..., 2] + hsv_gains[2], 0, 255)
cv2.cvtColor(img_hsv.astype(img.dtype), cv2.COLOR_HSV2BGR, dst=img)
- results['img'] = img
+ results["img"] = img
return results
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'(hue_delta={self.hue_delta}, '
- repr_str += f'saturation_delta={self.saturation_delta}, '
- repr_str += f'value_delta={self.value_delta})'
+ repr_str += f"(hue_delta={self.hue_delta}, "
+ repr_str += f"saturation_delta={self.saturation_delta}, "
+ repr_str += f"value_delta={self.value_delta})"
return repr_str
@@ -2829,9 +2994,7 @@ def gen_masks_from_bboxes(self, bboxes, img_shape):
xmax, ymax = bboxes[:, 2:3], bboxes[:, 3:4]
gt_masks = np.zeros((len(bboxes), img_h, img_w), dtype=np.uint8)
for i in range(len(bboxes)):
- gt_masks[i,
- int(ymin[i]):int(ymax[i]),
- int(xmin[i]):int(xmax[i])] = 1
+ gt_masks[i, int(ymin[i]) : int(ymax[i]), int(xmin[i]) : int(xmax[i])] = 1
return BitmapMasks(gt_masks, img_h, img_w)
def get_gt_masks(self, results):
@@ -2844,11 +3007,10 @@ def get_gt_masks(self, results):
Returns:
BitmapMasks: gt_masks, originally or generated based on bboxes.
"""
- if results.get('gt_masks', None) is not None:
- return results['gt_masks']
+ if results.get("gt_masks", None) is not None:
+ return results["gt_masks"]
else:
- return self.gen_masks_from_bboxes(
- results.get('gt_bboxes', []), results['img'].shape)
+ return self.gen_masks_from_bboxes(results.get("gt_bboxes", []), results["img"].shape)
def __call__(self, results):
"""Call function to make a copy-paste of image.
@@ -2859,40 +3021,37 @@ def __call__(self, results):
dict: Result dict with copy-paste transformed.
"""
- assert 'mix_results' in results
- num_images = len(results['mix_results'])
- assert num_images == 1, \
- f'CopyPaste only supports processing 2 images, got {num_images}'
+ assert "mix_results" in results
+ num_images = len(results["mix_results"])
+ assert num_images == 1, f"CopyPaste only supports processing 2 images, got {num_images}"
# Get gt_masks originally or generated based on bboxes.
- results['gt_masks'] = self.get_gt_masks(results)
+ results["gt_masks"] = self.get_gt_masks(results)
# only one mix picture
- results['mix_results'][0]['gt_masks'] = self.get_gt_masks(
- results['mix_results'][0])
+ results["mix_results"][0]["gt_masks"] = self.get_gt_masks(results["mix_results"][0])
if self.selected:
- selected_results = self._select_object(results['mix_results'][0])
+ selected_results = self._select_object(results["mix_results"][0])
else:
- selected_results = results['mix_results'][0]
+ selected_results = results["mix_results"][0]
return self._copy_paste(results, selected_results)
def _select_object(self, results):
"""Select some objects from the source results."""
- bboxes = results['gt_bboxes']
- labels = results['gt_labels']
- masks = results['gt_masks']
+ bboxes = results["gt_bboxes"]
+ labels = results["gt_labels"]
+ masks = results["gt_masks"]
max_num_pasted = min(bboxes.shape[0] + 1, self.max_num_pasted)
num_pasted = np.random.randint(0, max_num_pasted)
- selected_inds = np.random.choice(
- bboxes.shape[0], size=num_pasted, replace=False)
+ selected_inds = np.random.choice(bboxes.shape[0], size=num_pasted, replace=False)
selected_bboxes = bboxes[selected_inds]
selected_labels = labels[selected_inds]
selected_masks = masks[selected_inds]
- results['gt_bboxes'] = selected_bboxes
- results['gt_labels'] = selected_labels
- results['gt_masks'] = selected_masks
+ results["gt_bboxes"] = selected_bboxes
+ results["gt_labels"] = selected_labels
+ results["gt_masks"] = selected_masks
return results
def _copy_paste(self, dst_results, src_results):
@@ -2904,19 +3063,19 @@ def _copy_paste(self, dst_results, src_results):
Returns:
dict: Updated result dict.
"""
- dst_img = dst_results['img']
- dst_bboxes = dst_results['gt_bboxes']
- dst_labels = dst_results['gt_labels']
- dst_masks = dst_results['gt_masks']
+ dst_img = dst_results["img"]
+ dst_bboxes = dst_results["gt_bboxes"]
+ dst_labels = dst_results["gt_labels"]
+ dst_masks = dst_results["gt_masks"]
- src_img = src_results['img']
- src_bboxes = src_results['gt_bboxes']
- src_labels = src_results['gt_labels']
- src_masks = src_results['gt_masks']
+ src_img = src_results["img"]
+ src_bboxes = src_results["gt_bboxes"]
+ src_labels = src_results["gt_labels"]
+ src_masks = src_results["gt_masks"]
if len(src_bboxes) == 0:
if self.paste_by_box:
- dst_results.pop('gt_masks')
+ dst_results.pop("gt_masks")
return dst_results
# update masks and generate bboxes from updated masks
@@ -2927,42 +3086,449 @@ def _copy_paste(self, dst_results, src_results):
# filter totally occluded objects
bboxes_inds = np.all(
- np.abs(
- (updated_dst_bboxes - dst_bboxes)) <= self.bbox_occluded_thr,
- axis=-1)
- masks_inds = updated_dst_masks.masks.sum(
- axis=(1, 2)) > self.mask_occluded_thr
+ np.abs((updated_dst_bboxes - dst_bboxes)) <= self.bbox_occluded_thr, axis=-1
+ )
+ masks_inds = updated_dst_masks.masks.sum(axis=(1, 2)) > self.mask_occluded_thr
valid_inds = bboxes_inds | masks_inds
# Paste source objects to destination image directly
- img = dst_img * (1 - composed_mask[..., np.newaxis]
- ) + src_img * composed_mask[..., np.newaxis]
+ img = (
+ dst_img * (1 - composed_mask[..., np.newaxis])
+ + src_img * composed_mask[..., np.newaxis]
+ )
bboxes = np.concatenate([updated_dst_bboxes[valid_inds], src_bboxes])
labels = np.concatenate([dst_labels[valid_inds], src_labels])
- masks = np.concatenate(
- [updated_dst_masks.masks[valid_inds], src_masks.masks])
+ masks = np.concatenate([updated_dst_masks.masks[valid_inds], src_masks.masks])
- dst_results['img'] = img
- dst_results['gt_bboxes'] = bboxes
- dst_results['gt_labels'] = labels
+ dst_results["img"] = img
+ dst_results["gt_bboxes"] = bboxes
+ dst_results["gt_labels"] = labels
if self.paste_by_box:
- dst_results.pop('gt_masks')
+ dst_results.pop("gt_masks")
else:
- dst_results['gt_masks'] = BitmapMasks(masks, masks.shape[1],
- masks.shape[2])
+ dst_results["gt_masks"] = BitmapMasks(masks, masks.shape[1], masks.shape[2])
return dst_results
def get_updated_masks(self, masks, composed_mask):
- assert masks.masks.shape[-2:] == composed_mask.shape[-2:], \
- 'Cannot compare two arrays of different size'
+ assert (
+ masks.masks.shape[-2:] == composed_mask.shape[-2:]
+ ), "Cannot compare two arrays of different size"
masks.masks = np.where(composed_mask, 0, masks.masks)
return masks
def __repr__(self):
repr_str = self.__class__.__name__
- repr_str += f'max_num_pasted={self.max_num_pasted}, '
- repr_str += f'bbox_occluded_thr={self.bbox_occluded_thr}, '
- repr_str += f'mask_occluded_thr={self.mask_occluded_thr}, '
- repr_str += f'selected={self.selected}, '
+ repr_str += f"max_num_pasted={self.max_num_pasted}, "
+ repr_str += f"bbox_occluded_thr={self.bbox_occluded_thr}, "
+ repr_str += f"mask_occluded_thr={self.mask_occluded_thr}, "
+ repr_str += f"selected={self.selected}, "
+ return repr_str
+
+
+@PIPELINES.register_module()
+class Pointobb2RBBox(object):
+ """Convert pointobbs to the corresponding regression-based obb encoding."""
+
+ def __init__(self, encoding_method="thetaobb"):
+ self.encoding_method = encoding_method
+
+ def _pointobb2bbox(self, pointobb):
+ """
+ docstring here
+ :param self:
+ :param pointobb: list, [x1, y1, x2, y2, x3, y3, x4, y4]
+ return [xmin, ymin, xmax, ymax]
+ """
+ xmin = min(pointobb[0::2])
+ ymin = min(pointobb[1::2])
+ xmax = max(pointobb[0::2])
+ ymax = max(pointobb[1::2])
+ bbox = [xmin, ymin, xmax, ymax]
+
+ return bbox
+
+ def _pointobb_best_point_sort(self, pointobb):
+ """
+ Find the "best" point and sort all points as the order that best point is first point
+ :param self: self
+ :param pointobb (list): unsorted pointobb, (1*8)
+ """
+ xmin, ymin, xmax, ymax = self._pointobb2bbox(pointobb)
+ reference_bbox = np.array([xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax])
+ normalize = np.array([1.0, 1.0] * 4)
+ combinate = [
+ np.roll(pointobb, 0),
+ np.roll(pointobb, 2),
+ np.roll(pointobb, 4),
+ np.roll(pointobb, 6),
+ ]
+ distances = np.array(
+ [np.sum(((coord - reference_bbox) / normalize) ** 2) for coord in combinate]
+ )
+ order = distances.argsort()
+ return combinate[order[0]].tolist()
+
+ def _pointobb2thetaobb(self, results):
+ """
+ convert pointobb to thetaobb
+ :param self:
+ :param pointobb: list, [x1, y1, x2, y2, x3, y3, x4, y4]
+ """
+ for key in results.get("rbbox_fields", []):
+ rbboxes = results[key]
+ thetaobbs = []
+ for pointobb in rbboxes.tolist():
+ pointobb = np.int0(np.array(pointobb))
+ pointobb.resize(4, 2)
+ rect = cv2.minAreaRect(pointobb)
+ x, y, w, h, theta = rect[0][0], rect[0][1], rect[1][0], rect[1][1], rect[2]
+ # theta = theta / 180.0 * np.pi
+ thetaobbs.append([x, y, w, h, theta])
+
+ results[key] = np.array(thetaobbs, dtype=np.float32)
+
+ def _pointobb2hobb(self, results):
+ """
+ convert pointobb to thetaobb
+ :param self:
+ :param pointobb: list, [x1, y1, x2, y2, x3, y3, x4, y4]
+ """
+ for key in results.get("rbbox_fields", []):
+ rbboxes = results[key]
+ hobbs = []
+ for pointobb in rbboxes.tolist():
+ sorted_pointobb = self._pointobb_best_point_sort(pointobb)
+ first_point = [sorted_pointobb[0], sorted_pointobb[1]]
+ second_point = [sorted_pointobb[2], sorted_pointobb[3]]
+
+ end_point = [sorted_pointobb[6], sorted_pointobb[7]]
+
+ h = np.sqrt(
+ (end_point[0] - first_point[0]) ** 2 + (end_point[1] - first_point[1]) ** 2
+ )
+
+ hobbs.append(first_point + second_point + [h])
+
+ results[key] = np.array(hobbs, dtype=np.float32)
+
+ def __call__(self, results):
+ if self.encoding_method == "thetaobb":
+ self._pointobb2thetaobb(results)
+ elif self.encoding_method == "hobb":
+ self._pointobb2hobb(results)
+ elif self.encoding_method == "pointobb":
+ pass
+ return results
+
+ def __repr__(self):
+ repr_str = self.__class__.__name__
+ repr_str += ("(encoding_method={})").format(self.encoding_method)
return repr_str
+
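+ # Hedged usage sketch (toy values; the 'gt_rbboxes' key is hypothetical but
+ # follows the rbbox_fields convention read by the methods above):
+ # >>> t = Pointobb2RBBox(encoding_method="thetaobb")
+ # >>> results = {"rbbox_fields": ["gt_rbboxes"],
+ # ... "gt_rbboxes": np.array([[0., 0., 10., 0., 10., 4., 0., 4.]], np.float32)}
+ # >>> t(results)["gt_rbboxes"] # one [cx, cy, w, h, theta] row, here centered
+ # # at (5, 2) with size 10 x 4; theta follows cv2.minAreaRect conventions.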
+
+@PIPELINES.register_module()
+class RandomRotate(object):
+ """Rotate the image & bbox & mask.
+
+ If the input dict contains the key "rotate", then the flag will be used,
+ otherwise it will be randomly decided by a ratio specified in the init
+ method.
+
+ Args:
+ rotate_ratio (float, optional): The rotation probability.
+ choice (tuple | list | str, optional): Candidate rotation angles in degrees; a string selects every integer angle in [0, 358]. Defaults to (0, 90, 180, 270).
+ """
+
+ def __init__(self, rotate_ratio=None, choice=(0, 90, 180, 270)):
+ self.rotate_ratio = rotate_ratio
+ if isinstance(choice, (list, tuple)):
+ self.choice = choice
+ elif isinstance(choice, str):
+ self.choice = list(range(0, 359, 1))
+ else:
+ raise NotImplementedError
+
+ if rotate_ratio is not None:
+ assert 0 <= rotate_ratio <= 1
+ assert isinstance(self.choice, (list, tuple))
+
+ def get_corners(self, bboxes):
+ """Get corners of bounding boxes
+
+ Parameters
+ ----------
+
+ bboxes: numpy.ndarray
+ Numpy array containing bounding boxes of shape `N X 4` where N is the
+ number of bounding boxes and the bounding boxes are represented in the
+ format `x1 y1 x2 y2`
+
+ Returns
+ -------
+
+ numpy.ndarray
+ Numpy array of shape `N x 8` containing N bounding boxes, each described by its
+ corner coordinates `x1 y1 x2 y2 x3 y3 x4 y4`
+
+ """
+ width = (bboxes[:, 2] - bboxes[:, 0]).reshape(-1, 1)
+ height = (bboxes[:, 3] - bboxes[:, 1]).reshape(-1, 1)
+
+ x1 = bboxes[:, 0].reshape(-1, 1)
+ y1 = bboxes[:, 1].reshape(-1, 1)
+
+ x2 = x1 + width
+ y2 = y1
+
+ x3 = x1
+ y3 = y1 + height
+
+ x4 = bboxes[:, 2].reshape(-1, 1)
+ y4 = bboxes[:, 3].reshape(-1, 1)
+
+ corners = np.hstack((x1, y1, x2, y2, x3, y3, x4, y4))
+
+ return corners
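+
+ # Illustrative doctest for the corner expansion above (hypothetical box):
+ # >>> RandomRotate(rotate_ratio=0.5).get_corners(np.array([[1., 2., 4., 6.]]))
+ # array([[1., 2., 4., 2., 1., 6., 4., 6.]])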
+
+ def offset_coordinate_transform(self, offset, transform_flag="xy2la"):
+ """transform the coordinate of offsets
+
+ Args:
+ offset (list): list of offset
+ transform_flag (str, optional): flag of transform. Defaults to 'xy2la'.
+
+ Raises:
+ NotImplementedError: [description]
+
+ Returns:
+ list: transformed offsets
+ """
+ if transform_flag == "xy2la":
+ offset_x, offset_y = offset
+ length = math.sqrt(offset_x**2 + offset_y**2)
+ angle = math.atan2(offset_y, offset_x)
+ offset = [length, angle]
+ elif transform_flag == "la2xy":
+ length, angle = offset
+ offset_x = length * np.cos(angle)
+ offset_y = length * np.sin(angle)
+ offset = [offset_x, offset_y]
+ else:
+ raise NotImplementedError
+
+ return offset
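+
+ # Minimal round-trip check (illustrative values) for the transform above:
+ # >>> rr = RandomRotate(rotate_ratio=1.0)
+ # >>> la = rr.offset_coordinate_transform([3.0, 4.0], "xy2la") # [5.0, atan2(4, 3)]
+ # >>> rr.offset_coordinate_transform(la, "la2xy") # recovers [3.0, 4.0] up to
+ # # floating-point rounding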
+
+ def offset_field_coordinate_transform(self, offset_field, transform_flag="xy2la"):
+ """transform the coordinate of offsets
+
+ Args:
+ offset (list): list of offset
+ transform_flag (str, optional): flag of transform. Defaults to 'xy2la'.
+
+ Raises:
+ NotImplementedError: [description]
+
+ Returns:
+ list: transformed offsets
+ """
+ offset_field_transformed = np.zeros((offset_field.shape[0], offset_field.shape[1], 2))
+ if transform_flag == "xy2la":
+ offset_x, offset_y = offset_field[..., 0], offset_field[..., 1]
+ length = np.sqrt(offset_x**2 + offset_y**2)
+ angle = np.arctan2(offset_y, offset_x)
+ offset_field_transformed[:, :, 0], offset_field_transformed[:, :, 1] = length, angle
+ elif transform_flag == "la2xy":
+ length, angle = offset_field[..., 0], offset_field[..., 1]
+ offset_x = length * np.cos(angle)
+ offset_y = length * np.sin(angle)
+ offset_field_transformed[:, :, 0], offset_field_transformed[:, :, 1] = (
+ offset_x,
+ offset_y,
+ )
+ else:
+ raise NotImplementedError
+
+ return offset_field_transformed
+
+ def offset_rotate(self, offsets, img_shape, rotate_angle):
+ offsets = [
+ self.offset_coordinate_transform(offset, transform_flag="xy2la") for offset in offsets
+ ]
+
+ offsets = [[offset[0], offset[1] + rotate_angle * math.pi / 180.0] for offset in offsets]
+
+ offsets = [
+ self.offset_coordinate_transform(offset, transform_flag="la2xy") for offset in offsets
+ ]
+
+ return np.array(offsets, dtype=np.float32)
+
+ def offset_field_rotate(self, offset_field, img_shape, rotate_angle):
+ offset_field = self.offset_field_coordinate_transform(offset_field, transform_flag="xy2la")
+
+ offset_field[..., 1] = offset_field[..., 1] + rotate_angle * math.pi / 180.0
+
+ offset_field = self.offset_field_coordinate_transform(offset_field, transform_flag="la2xy")
+
+ return offset_field.astype(np.float32)
+
+ def bbox_rotate(self, bboxes, img_shape, rotate_angle):
+ """rotate bboxes.
+
+ Args:
+ bboxes(ndarray): shape (..., 4*k)
+ img_shape(tuple): (height, width)
+ """
+ assert bboxes.shape[-1] % 4 == 0
+ if bboxes.shape[0] == 0:
+ return bboxes
+ corners = self.get_corners(bboxes)
+ corners = np.hstack((corners, bboxes[:, 4:]))
+
+ corners = corners.reshape(-1, 2)
+ corners = np.hstack((corners, np.ones((corners.shape[0], 1), dtype=type(corners[0][0]))))
+ angle = rotate_angle
+ h, w, _ = img_shape
+ cx, cy = w / 2, h / 2
+ M = cv2.getRotationMatrix2D((cx, cy), -angle, 1.0)
+ cos = np.abs(M[0, 0])
+ sin = np.abs(M[0, 1])
+ nW = int((h * sin) + (w * cos))
+ nH = int((h * cos) + (w * sin))
+ M[0, 2] += (nW / 2) - cx
+ M[1, 2] += (nH / 2) - cy
+
+ calculated = np.dot(M, corners.T).T
+ calculated = np.array(calculated, dtype=np.float32)
+ calculated = calculated.reshape(-1, 8)
+
+ x_ = calculated[:, [0, 2, 4, 6]]
+ y_ = calculated[:, [1, 3, 5, 7]]
+
+ xmin = np.min(x_, 1).reshape(-1, 1)
+ ymin = np.min(y_, 1).reshape(-1, 1)
+ xmax = np.max(x_, 1).reshape(-1, 1)
+ ymax = np.max(y_, 1).reshape(-1, 1)
+
+ rotated = np.hstack((xmin, ymin, xmax, ymax))
+ return rotated
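+
+ # Hypothetical sanity check for the auto-bound canvas above: at 90 degrees
+ # |cos| = 0 and |sin| = 1, so nW = h and nH = w and the canvas simply swaps
+ # its sides.
+ # >>> M = cv2.getRotationMatrix2D((100., 50.), -90, 1.0)
+ # >>> int(100 * abs(M[0, 1]) + 200 * abs(M[0, 0])) # nW for h=100, w=200 -> 100
+ # >>> int(100 * abs(M[0, 0]) + 200 * abs(M[0, 1])) # nH -> 200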
+
+ def __call__(self, results):
+ if "rotate" not in results:
+ rotate = bool(np.random.rand() < self.rotate_ratio)
+ results["rotate"] = rotate
+ if "rotate_angle" not in results:
+ if results["rotate"]:
+ results["rotate_angle"] = np.random.choice(self.choice)
+ else:
+ results["rotate_angle"] = 0
+ if results["rotate"]:
+ # rotate image
+ results["img"] = mmcv.imrotate(
+ results["img"], results["rotate_angle"], auto_bound=False
+ )
+ # rotate bboxes
+ for key in results.get("bbox_fields", []):
+ results[key] = self.bbox_rotate(
+ results[key], results["img_shape"], results["rotate_angle"]
+ )
+ # rotate masks
+ for key in results.get("mask_fields", []):
+ masks = [
+ mmcv.imrotate(mask, results["rotate_angle"], auto_bound=False)
+ for mask in results[key]
+ ]
+ if masks:
+ rotated_masks = np.stack(masks)
+ else:
+ rotated_masks = np.empty((0,) + results["img_shape"][:2], dtype=np.uint8)
+
+ results[key] = BitmapMasks(
+ rotated_masks, results["img_shape"][0], results["img_shape"][1]
+ )
+
+ # rotate edge maps
+ for key in results.get("edge_fields", []):
+ masks = [
+ mmcv.imrotate(mask, results["rotate_angle"], auto_bound=False)
+ for mask in results[key]
+ ]
+ if masks:
+ rotated_masks = np.stack(masks)
+ else:
+ rotated_masks = np.empty((0,) + results["img_shape"][:2], dtype=np.uint8)
+
+ results[key] = BitmapMasks(
+ rotated_masks, results["img_shape"][0], results["img_shape"][1]
+ )
+
+ # rotate side face maps
+ for key in results.get("side_face_fields", []):
+ masks = [
+ mmcv.imrotate(mask, results["rotate_angle"], auto_bound=False)
+ for mask in results[key]
+ ]
+ if masks:
+ rotated_masks = np.stack(masks)
+ else:
+ rotated_masks = np.empty((0,) + results["img_shape"][:2], dtype=np.uint8)
+
+ results[key] = BitmapMasks(
+ rotated_masks, results["img_shape"][0], results["img_shape"][1]
+ )
+
+ # rotate segs
+ for key in results.get("seg_fields", []):
+ results[key] = mmcv.imrotate(
+ results[key], results["rotate_angle"], auto_bound=False
+ )
+
+ # rotate offset fields
+ for key in results.get("offset_field_fields", []):
+ results[key] = self.offset_field_rotate(
+ results[key], results["img_shape"], results["rotate_angle"]
+ )
+
+ # rotate offsets
+ for key in results.get("offset_fields", []):
+ results[key] = self.offset_rotate(
+ results[key], results["img_shape"], results["rotate_angle"]
+ )
+ return results
+
+ def __repr__(self):
+ return self.__class__.__name__ + "(rotate_ratio={})".format(self.rotate_ratio)
+
+
+@PIPELINES.register_module()
+class OffsetTransform(object):
+ """offset transformer"""
+
+ def __init__(self, offset_coordinate="rectangle"):
+ self.offset_coordinate = offset_coordinate
+
+ def transform_offset(self, offsets, offset_coordinate="rectangle"):
+ if offset_coordinate == "rectangle":
+ transformed = offsets
+ elif offset_coordinate == "polar":
+ offset_x, offset_y = offsets[:, 0], offsets[:, 1]
+ length = np.sqrt(offset_x**2 + offset_y**2)
+ angle = np.arctan2(offset_y, offset_x)
+
+ transformed = np.stack((length, angle), axis=-1)
+ else:
+ raise NotImplementedError
+
+ return transformed
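+
+ # Illustrative doctest for the 'polar' branch above (toy offsets):
+ # >>> ot = OffsetTransform(offset_coordinate="polar")
+ # >>> ot.transform_offset(np.array([[3.0, 4.0]]), "polar")
+ # # -> [[5.0, 0.9273]] i.e. (length, angle-in-radians) per row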
+
+ def __call__(self, results):
+ if "offset_coordinate" not in results:
+ results["offset_coordinate"] = self.offset_coordinate
+ # transform offsets
+ for key in results.get("offset_fields", []):
+ results[key] = self.transform_offset(results[key], results["offset_coordinate"])
+ return results
+
+ def __repr__(self):
+ return self.__class__.__name__ + "(offset_coordinate={})".format(self.offset_coordinate)
diff --git a/mmdet/datasets/utils.py b/mmdet/datasets/utils.py
index 26e922d2..feabf651 100644
--- a/mmdet/datasets/utils.py
+++ b/mmdet/datasets/utils.py
@@ -4,10 +4,8 @@
from mmcv.cnn import VGG
from mmcv.runner.hooks import HOOKS, Hook
-
from mmdet.datasets.builder import PIPELINES
-from mmdet.datasets.pipelines import (LoadAnnotations, LoadImageFromFile,
- LoadPanopticAnnotations)
+from mmdet.datasets.pipelines import LoadAnnotations, LoadImageFromFile, LoadPanopticAnnotations
from mmdet.models.dense_heads import GARPNHead, RPNHead
from mmdet.models.roi_heads.mask_heads import FusedSemanticHead
diff --git a/mmdet/models/builder.py b/mmdet/models/builder.py
index ace6209f..b28a0758 100644
--- a/mmdet/models/builder.py
+++ b/mmdet/models/builder.py
@@ -4,7 +4,7 @@
from mmcv.cnn import MODELS as MMCV_MODELS
from mmcv.utils import Registry
-MODELS = Registry('models', parent=MMCV_MODELS)
+MODELS = Registry("models", parent=MMCV_MODELS)
BACKBONES = MODELS
NECKS = MODELS
@@ -47,13 +47,14 @@ def build_loss(cfg):
def build_detector(cfg, train_cfg=None, test_cfg=None):
"""Build detector."""
- if train_cfg is not None or test_cfg is not None:
- warnings.warn(
- 'train_cfg and test_cfg is deprecated, '
- 'please specify them in model', UserWarning)
- assert cfg.get('train_cfg') is None or train_cfg is None, \
- 'train_cfg specified in both outer field and model field '
- assert cfg.get('test_cfg') is None or test_cfg is None, \
- 'test_cfg specified in both outer field and model field '
- return DETECTORS.build(
- cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
+ # if train_cfg is not None or test_cfg is not None:
+ # warnings.warn(
+ # 'train_cfg and test_cfg is deprecated, '
+ # 'please specify them in model', UserWarning)
+ assert (
+ cfg.get("train_cfg") is None or train_cfg is None
+ ), "train_cfg specified in both outer field and model field "
+ assert (
+ cfg.get("test_cfg") is None or test_cfg is None
+ ), "test_cfg specified in both outer field and model field "
+ return DETECTORS.build(cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
diff --git a/mmdet/models/dense_heads/__init__.py b/mmdet/models/dense_heads/__init__.py
index 9c60ae14..9c5faf9c 100644
--- a/mmdet/models/dense_heads/__init__.py
+++ b/mmdet/models/dense_heads/__init__.py
@@ -35,6 +35,7 @@
from .retina_sepbn_head import RetinaSepBNHead
from .rpn_head import RPNHead
from .sabl_retina_head import SABLRetinaHead
+from .semi_rpn_head import SemiRPNHead
from .solo_head import DecoupledSOLOHead, DecoupledSOLOLightHead, SOLOHead
from .solov2_head import SOLOV2Head
from .ssd_head import SSDHead
@@ -46,17 +47,56 @@
from .yolox_head import YOLOXHead
__all__ = [
- 'AnchorFreeHead', 'AnchorHead', 'GuidedAnchorHead', 'FeatureAdaption',
- 'RPNHead', 'GARPNHead', 'RetinaHead', 'RetinaSepBNHead', 'GARetinaHead',
- 'SSDHead', 'FCOSHead', 'RepPointsHead', 'FoveaHead',
- 'FreeAnchorRetinaHead', 'ATSSHead', 'FSAFHead', 'NASFCOSHead',
- 'PISARetinaHead', 'PISASSDHead', 'GFLHead', 'CornerHead', 'YOLACTHead',
- 'YOLACTSegmHead', 'YOLACTProtonet', 'YOLOV3Head', 'PAAHead',
- 'SABLRetinaHead', 'CentripetalHead', 'VFNetHead', 'StageCascadeRPNHead',
- 'CascadeRPNHead', 'EmbeddingRPNHead', 'LDHead', 'AutoAssignHead',
- 'DETRHead', 'YOLOFHead', 'DeformableDETRHead', 'SOLOHead',
- 'DecoupledSOLOHead', 'CenterNetHead', 'YOLOXHead',
- 'DecoupledSOLOLightHead', 'LADHead', 'TOODHead', 'MaskFormerHead',
- 'Mask2FormerHead', 'SOLOV2Head', 'DDODHead', 'AscendAnchorHead',
- 'AscendRetinaHead', 'AscendSSDHead'
+ "AnchorFreeHead",
+ "AnchorHead",
+ "GuidedAnchorHead",
+ "FeatureAdaption",
+ "RPNHead",
+ "GARPNHead",
+ "RetinaHead",
+ "RetinaSepBNHead",
+ "GARetinaHead",
+ "SSDHead",
+ "FCOSHead",
+ "RepPointsHead",
+ "FoveaHead",
+ "FreeAnchorRetinaHead",
+ "ATSSHead",
+ "FSAFHead",
+ "NASFCOSHead",
+ "PISARetinaHead",
+ "PISASSDHead",
+ "GFLHead",
+ "CornerHead",
+ "YOLACTHead",
+ "YOLACTSegmHead",
+ "YOLACTProtonet",
+ "YOLOV3Head",
+ "PAAHead",
+ "SABLRetinaHead",
+ "CentripetalHead",
+ "VFNetHead",
+ "StageCascadeRPNHead",
+ "CascadeRPNHead",
+ "EmbeddingRPNHead",
+ "LDHead",
+ "AutoAssignHead",
+ "DETRHead",
+ "YOLOFHead",
+ "DeformableDETRHead",
+ "SOLOHead",
+ "DecoupledSOLOHead",
+ "CenterNetHead",
+ "YOLOXHead",
+ "DecoupledSOLOLightHead",
+ "LADHead",
+ "TOODHead",
+ "MaskFormerHead",
+ "Mask2FormerHead",
+ "SOLOV2Head",
+ "DDODHead",
+ "AscendAnchorHead",
+ "AscendRetinaHead",
+ "AscendSSDHead",
+ "SemiRPNHead",
]
diff --git a/mmdet/models/dense_heads/anchor_head.py b/mmdet/models/dense_heads/anchor_head.py
index d1bfab62..7a1caf31 100644
--- a/mmdet/models/dense_heads/anchor_head.py
+++ b/mmdet/models/dense_heads/anchor_head.py
@@ -1,13 +1,21 @@
# Copyright (c) OpenMMLab. All rights reserved.
import warnings
+import numpy as np
import torch
import torch.nn as nn
from mmcv.runner import force_fp32
-from mmdet.core import (anchor_inside_flags, build_assigner, build_bbox_coder,
- build_prior_generator, build_sampler, images_to_levels,
- multi_apply, unmap)
+from mmdet.core import (
+ anchor_inside_flags,
+ build_assigner,
+ build_bbox_coder,
+ build_prior_generator,
+ build_sampler,
+ images_to_levels,
+ multi_apply,
+ unmap,
+)
from ..builder import HEADS, build_loss
from .base_dense_head import BaseDenseHead
from .dense_test_mixins import BBoxTestMixin
@@ -36,42 +44,42 @@ class AnchorHead(BaseDenseHead, BBoxTestMixin):
init_cfg (dict or list[dict], optional): Initialization config dict.
""" # noqa: W605
- def __init__(self,
- num_classes,
- in_channels,
- feat_channels=256,
- anchor_generator=dict(
- type='AnchorGenerator',
- scales=[8, 16, 32],
- ratios=[0.5, 1.0, 2.0],
- strides=[4, 8, 16, 32, 64]),
- bbox_coder=dict(
- type='DeltaXYWHBBoxCoder',
- clip_border=True,
- target_means=(.0, .0, .0, .0),
- target_stds=(1.0, 1.0, 1.0, 1.0)),
- reg_decoded_bbox=False,
- loss_cls=dict(
- type='CrossEntropyLoss',
- use_sigmoid=True,
- loss_weight=1.0),
- loss_bbox=dict(
- type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
- train_cfg=None,
- test_cfg=None,
- init_cfg=dict(type='Normal', layer='Conv2d', std=0.01)):
+ def __init__(
+ self,
+ num_classes,
+ in_channels,
+ feat_channels=256,
+ anchor_generator=dict(
+ type="AnchorGenerator",
+ scales=[8, 16, 32],
+ ratios=[0.5, 1.0, 2.0],
+ strides=[4, 8, 16, 32, 64],
+ ),
+ bbox_coder=dict(
+ type="DeltaXYWHBBoxCoder",
+ clip_border=True,
+ target_means=(0.0, 0.0, 0.0, 0.0),
+ target_stds=(1.0, 1.0, 1.0, 1.0),
+ ),
+ reg_decoded_bbox=False,
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+ loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0),
+ train_cfg=None,
+ test_cfg=None,
+ init_cfg=dict(type="Normal", layer="Conv2d", std=0.01),
+ ):
super(AnchorHead, self).__init__(init_cfg)
self.in_channels = in_channels
self.num_classes = num_classes
self.feat_channels = feat_channels
- self.use_sigmoid_cls = loss_cls.get('use_sigmoid', False)
+ self.use_sigmoid_cls = loss_cls.get("use_sigmoid", False)
if self.use_sigmoid_cls:
self.cls_out_channels = num_classes
else:
self.cls_out_channels = num_classes + 1
if self.cls_out_channels <= 0:
- raise ValueError(f'num_classes={num_classes} is too small')
+ raise ValueError(f"num_classes={num_classes} is too small")
self.reg_decoded_bbox = reg_decoded_bbox
self.bbox_coder = build_bbox_coder(bbox_coder)
@@ -81,25 +89,25 @@ def __init__(self,
self.test_cfg = test_cfg
if self.train_cfg:
self.assigner = build_assigner(self.train_cfg.assigner)
- if hasattr(self.train_cfg,
- 'sampler') and self.train_cfg.sampler.type.split(
- '.')[-1] != 'PseudoSampler':
+ if (
+ hasattr(self.train_cfg, "sampler")
+ and self.train_cfg.sampler.type.split(".")[-1] != "PseudoSampler"
+ ):
self.sampling = True
sampler_cfg = self.train_cfg.sampler
# avoid BC-breaking
- if loss_cls['type'] in [
- 'FocalLoss', 'GHMC', 'QualityFocalLoss'
- ]:
+ if loss_cls["type"] in ["FocalLoss", "GHMC", "QualityFocalLoss"]:
warnings.warn(
- 'DeprecationWarning: Determining whether to sampling'
- 'by loss type is deprecated, please delete sampler in'
- 'your config when using `FocalLoss`, `GHMC`, '
- '`QualityFocalLoss` or other FocalLoss variant.')
+ "DeprecationWarning: Determining whether to sampling"
+ "by loss type is deprecated, please delete sampler in"
+ "your config when using `FocalLoss`, `GHMC`, "
+ "`QualityFocalLoss` or other FocalLoss variant."
+ )
self.sampling = False
- sampler_cfg = dict(type='PseudoSampler')
+ sampler_cfg = dict(type="PseudoSampler")
else:
self.sampling = False
- sampler_cfg = dict(type='PseudoSampler')
+ sampler_cfg = dict(type="PseudoSampler")
self.sampler = build_sampler(sampler_cfg, context=self)
self.fp16_enabled = False
@@ -113,24 +121,25 @@ def __init__(self,
@property
def num_anchors(self):
- warnings.warn('DeprecationWarning: `num_anchors` is deprecated, '
- 'for consistency or also use '
- '`num_base_priors` instead')
+ warnings.warn(
+ "DeprecationWarning: `num_anchors` is deprecated, "
+ "for consistency or also use "
+ "`num_base_priors` instead"
+ )
return self.prior_generator.num_base_priors[0]
@property
def anchor_generator(self):
- warnings.warn('DeprecationWarning: anchor_generator is deprecated, '
- 'please use "prior_generator" instead')
+ warnings.warn(
+ "DeprecationWarning: anchor_generator is deprecated, "
+ 'please use "prior_generator" instead'
+ )
return self.prior_generator
def _init_layers(self):
"""Initialize layers of the head."""
- self.conv_cls = nn.Conv2d(self.in_channels,
- self.num_base_priors * self.cls_out_channels,
- 1)
- self.conv_reg = nn.Conv2d(self.in_channels, self.num_base_priors * 4,
- 1)
+ self.conv_cls = nn.Conv2d(self.in_channels, self.num_base_priors * self.cls_out_channels, 1)
+ self.conv_reg = nn.Conv2d(self.in_channels, self.num_base_priors * 4, 1)
def forward_single(self, x):
"""Forward feature of a single scale level.
@@ -168,7 +177,7 @@ def forward(self, feats):
"""
return multi_apply(self.forward_single, feats)
- def get_anchors(self, featmap_sizes, img_metas, device='cuda'):
+ def get_anchors(self, featmap_sizes, img_metas, device="cuda"):
"""Get anchors according to feature map sizes.
Args:
@@ -185,28 +194,30 @@ def get_anchors(self, featmap_sizes, img_metas, device='cuda'):
# since feature map sizes of all images are the same, we only compute
# anchors for one time
- multi_level_anchors = self.prior_generator.grid_priors(
- featmap_sizes, device=device)
+ multi_level_anchors = self.prior_generator.grid_priors(featmap_sizes, device=device)
anchor_list = [multi_level_anchors for _ in range(num_imgs)]
# for each image, we compute valid flags of multi level anchors
valid_flag_list = []
for img_id, img_meta in enumerate(img_metas):
multi_level_flags = self.prior_generator.valid_flags(
- featmap_sizes, img_meta['pad_shape'], device)
+ featmap_sizes, img_meta["pad_shape"], device
+ )
valid_flag_list.append(multi_level_flags)
return anchor_list, valid_flag_list
- def _get_targets_single(self,
- flat_anchors,
- valid_flags,
- gt_bboxes,
- gt_bboxes_ignore,
- gt_labels,
- img_meta,
- label_channels=1,
- unmap_outputs=True):
+ def _get_targets_single(
+ self,
+ flat_anchors,
+ valid_flags,
+ gt_bboxes,
+ gt_bboxes_ignore,
+ gt_labels,
+ img_meta,
+ label_channels=1,
+ unmap_outputs=True,
+ ):
"""Compute regression and classification targets for anchors in a
single image.
@@ -236,26 +247,24 @@ def _get_targets_single(self,
num_total_pos (int): Number of positive samples in all images
num_total_neg (int): Number of negative samples in all images
"""
- inside_flags = anchor_inside_flags(flat_anchors, valid_flags,
- img_meta['img_shape'][:2],
- self.train_cfg.allowed_border)
+ inside_flags = anchor_inside_flags(
+ flat_anchors, valid_flags, img_meta["img_shape"][:2], self.train_cfg.allowed_border
+ )
if not inside_flags.any():
- return (None, ) * 7
+ return (None,) * 7
+
# assign gt and sample anchors
anchors = flat_anchors[inside_flags, :]
assign_result = self.assigner.assign(
- anchors, gt_bboxes, gt_bboxes_ignore,
- None if self.sampling else gt_labels)
- sampling_result = self.sampler.sample(assign_result, anchors,
- gt_bboxes)
+ anchors, gt_bboxes, gt_bboxes_ignore, None if self.sampling else gt_labels
+ )
+ sampling_result = self.sampler.sample(assign_result, anchors, gt_bboxes)
num_valid_anchors = anchors.shape[0]
bbox_targets = torch.zeros_like(anchors)
bbox_weights = torch.zeros_like(anchors)
- labels = anchors.new_full((num_valid_anchors, ),
- self.num_classes,
- dtype=torch.long)
+ labels = anchors.new_full((num_valid_anchors,), self.num_classes, dtype=torch.long)
label_weights = anchors.new_zeros(num_valid_anchors, dtype=torch.float)
pos_inds = sampling_result.pos_inds
@@ -263,7 +272,8 @@ def _get_targets_single(self,
if len(pos_inds) > 0:
if not self.reg_decoded_bbox:
pos_bbox_targets = self.bbox_coder.encode(
- sampling_result.pos_bboxes, sampling_result.pos_gt_bboxes)
+ sampling_result.pos_bboxes, sampling_result.pos_gt_bboxes
+ )
else:
pos_bbox_targets = sampling_result.pos_gt_bboxes
bbox_targets[pos_inds, :] = pos_bbox_targets
@@ -273,8 +283,7 @@ def _get_targets_single(self,
# Foreground is the first class since v2.5.0
labels[pos_inds] = 0
else:
- labels[pos_inds] = gt_labels[
- sampling_result.pos_assigned_gt_inds]
+ labels[pos_inds] = gt_labels[sampling_result.pos_assigned_gt_inds]
if self.train_cfg.pos_weight <= 0:
label_weights[pos_inds] = 1.0
else:
@@ -286,26 +295,34 @@ def _get_targets_single(self,
if unmap_outputs:
num_total_anchors = flat_anchors.size(0)
labels = unmap(
- labels, num_total_anchors, inside_flags,
- fill=self.num_classes) # fill bg label
- label_weights = unmap(label_weights, num_total_anchors,
- inside_flags)
+ labels, num_total_anchors, inside_flags, fill=self.num_classes
+ ) # fill bg label
+ label_weights = unmap(label_weights, num_total_anchors, inside_flags)
bbox_targets = unmap(bbox_targets, num_total_anchors, inside_flags)
bbox_weights = unmap(bbox_weights, num_total_anchors, inside_flags)
- return (labels, label_weights, bbox_targets, bbox_weights, pos_inds,
- neg_inds, sampling_result)
-
- def get_targets(self,
- anchor_list,
- valid_flag_list,
- gt_bboxes_list,
- img_metas,
- gt_bboxes_ignore_list=None,
- gt_labels_list=None,
- label_channels=1,
- unmap_outputs=True,
- return_sampling_results=False):
+ return (
+ labels,
+ label_weights,
+ bbox_targets,
+ bbox_weights,
+ pos_inds,
+ neg_inds,
+ sampling_result,
+ )
+
+ def get_targets(
+ self,
+ anchor_list,
+ valid_flag_list,
+ gt_bboxes_list,
+ img_metas,
+ gt_bboxes_ignore_list=None,
+ gt_labels_list=None,
+ label_channels=1,
+ unmap_outputs=True,
+ return_sampling_results=False,
+ ):
"""Compute regression and classification targets for anchors in
multiple images.
@@ -372,9 +389,17 @@ def get_targets(self,
gt_labels_list,
img_metas,
label_channels=label_channels,
- unmap_outputs=unmap_outputs)
- (all_labels, all_label_weights, all_bbox_targets, all_bbox_weights,
- pos_inds_list, neg_inds_list, sampling_results_list) = results[:7]
+ unmap_outputs=unmap_outputs,
+ )
+ (
+ all_labels,
+ all_label_weights,
+ all_bbox_targets,
+ all_bbox_weights,
+ pos_inds_list,
+ neg_inds_list,
+ sampling_results_list,
+ ) = results[:7]
rest_results = list(results[7:]) # user-added return values
# no valid anchors
if any([labels is None for labels in all_labels]):
@@ -384,23 +409,35 @@ def get_targets(self,
num_total_neg = sum([max(inds.numel(), 1) for inds in neg_inds_list])
# split targets to a list w.r.t. multiple levels
labels_list = images_to_levels(all_labels, num_level_anchors)
- label_weights_list = images_to_levels(all_label_weights,
- num_level_anchors)
- bbox_targets_list = images_to_levels(all_bbox_targets,
- num_level_anchors)
- bbox_weights_list = images_to_levels(all_bbox_weights,
- num_level_anchors)
- res = (labels_list, label_weights_list, bbox_targets_list,
- bbox_weights_list, num_total_pos, num_total_neg)
+ label_weights_list = images_to_levels(all_label_weights, num_level_anchors)
+ bbox_targets_list = images_to_levels(all_bbox_targets, num_level_anchors)
+ bbox_weights_list = images_to_levels(all_bbox_weights, num_level_anchors)
+ res = (
+ labels_list,
+ label_weights_list,
+ bbox_targets_list,
+ bbox_weights_list,
+ num_total_pos,
+ num_total_neg,
+ )
if return_sampling_results:
- res = res + (sampling_results_list, )
+ res = res + (sampling_results_list,)
for i, r in enumerate(rest_results): # user-added return values
rest_results[i] = images_to_levels(r, num_level_anchors)
return res + tuple(rest_results)
- def loss_single(self, cls_score, bbox_pred, anchors, labels, label_weights,
- bbox_targets, bbox_weights, num_total_samples):
+ def loss_single(
+ self,
+ cls_score,
+ bbox_pred,
+ anchors,
+ labels,
+ label_weights,
+ bbox_targets,
+ bbox_weights,
+ num_total_samples,
+ ):
"""Compute loss of a single scale level.
Args:
@@ -428,10 +465,8 @@ def loss_single(self, cls_score, bbox_pred, anchors, labels, label_weights,
# classification loss
labels = labels.reshape(-1)
label_weights = label_weights.reshape(-1)
- cls_score = cls_score.permute(0, 2, 3,
- 1).reshape(-1, self.cls_out_channels)
- loss_cls = self.loss_cls(
- cls_score, labels, label_weights, avg_factor=num_total_samples)
+ cls_score = cls_score.permute(0, 2, 3, 1).reshape(-1, self.cls_out_channels)
+ loss_cls = self.loss_cls(cls_score, labels, label_weights, avg_factor=num_total_samples)
# regression loss
bbox_targets = bbox_targets.reshape(-1, 4)
bbox_weights = bbox_weights.reshape(-1, 4)
@@ -443,20 +478,12 @@ def loss_single(self, cls_score, bbox_pred, anchors, labels, label_weights,
anchors = anchors.reshape(-1, 4)
bbox_pred = self.bbox_coder.decode(anchors, bbox_pred)
loss_bbox = self.loss_bbox(
- bbox_pred,
- bbox_targets,
- bbox_weights,
- avg_factor=num_total_samples)
+ bbox_pred, bbox_targets, bbox_weights, avg_factor=num_total_samples
+ )
return loss_cls, loss_bbox
- @force_fp32(apply_to=('cls_scores', 'bbox_preds'))
- def loss(self,
- cls_scores,
- bbox_preds,
- gt_bboxes,
- gt_labels,
- img_metas,
- gt_bboxes_ignore=None):
+ @force_fp32(apply_to=("cls_scores", "bbox_preds"))
+ def loss(self, cls_scores, bbox_preds, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None):
"""Compute losses of the head.
Args:
@@ -480,8 +507,7 @@ def loss(self,
device = cls_scores[0].device
- anchor_list, valid_flag_list = self.get_anchors(
- featmap_sizes, img_metas, device=device)
+ anchor_list, valid_flag_list = self.get_anchors(featmap_sizes, img_metas, device=device)
label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
cls_reg_targets = self.get_targets(
anchor_list,
@@ -490,13 +516,19 @@ def loss(self,
img_metas,
gt_bboxes_ignore_list=gt_bboxes_ignore,
gt_labels_list=gt_labels,
- label_channels=label_channels)
+ label_channels=label_channels,
+ )
if cls_reg_targets is None:
return None
- (labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,
- num_total_pos, num_total_neg) = cls_reg_targets
- num_total_samples = (
- num_total_pos + num_total_neg if self.sampling else num_total_pos)
+ (
+ labels_list,
+ label_weights_list,
+ bbox_targets_list,
+ bbox_weights_list,
+ num_total_pos,
+ num_total_neg,
+ ) = cls_reg_targets
+ num_total_samples = num_total_pos + num_total_neg if self.sampling else num_total_pos
# anchor number of multi levels
num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]]
@@ -504,8 +536,7 @@ def loss(self,
concat_anchor_list = []
for i in range(len(anchor_list)):
concat_anchor_list.append(torch.cat(anchor_list[i]))
- all_anchor_list = images_to_levels(concat_anchor_list,
- num_level_anchors)
+ all_anchor_list = images_to_levels(concat_anchor_list, num_level_anchors)
losses_cls, losses_bbox = multi_apply(
self.loss_single,
@@ -516,7 +547,8 @@ def loss(self,
label_weights_list,
bbox_targets_list,
bbox_weights_list,
- num_total_samples=num_total_samples)
+ num_total_samples=num_total_samples,
+ )
return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)
def aug_test(self, feats, img_metas, rescale=False):
diff --git a/mmdet/models/dense_heads/base_dense_head.py b/mmdet/models/dense_heads/base_dense_head.py
index 0c7abb7b..6503fe48 100644
--- a/mmdet/models/dense_heads/base_dense_head.py
+++ b/mmdet/models/dense_heads/base_dense_head.py
@@ -20,7 +20,7 @@ def init_weights(self):
# avoid init_cfg overwrite the initialization of `conv_offset`
for m in self.modules():
# DeformConv2dPack, ModulatedDeformConv2dPack
- if hasattr(m, 'conv_offset'):
+ if hasattr(m, "conv_offset"):
constant_init(m.conv_offset, 0)
@abstractmethod
@@ -28,16 +28,18 @@ def loss(self, **kwargs):
"""Compute losses of the head."""
pass
- @force_fp32(apply_to=('cls_scores', 'bbox_preds'))
- def get_bboxes(self,
- cls_scores,
- bbox_preds,
- score_factors=None,
- img_metas=None,
- cfg=None,
- rescale=False,
- with_nms=True,
- **kwargs):
+ @force_fp32(apply_to=("cls_scores", "bbox_preds"))
+ def get_bboxes(
+ self,
+ cls_scores,
+ bbox_preds,
+ score_factors=None,
+ img_metas=None,
+ cfg=None,
+ rescale=False,
+ with_nms=True,
+ **kwargs
+ ):
"""Transform network outputs of a batch into bbox results.
Note: When score_factors is not None, the cls_scores are
@@ -84,9 +86,8 @@ def get_bboxes(self,
featmap_sizes = [cls_scores[i].shape[-2:] for i in range(num_levels)]
mlvl_priors = self.prior_generator.grid_priors(
- featmap_sizes,
- dtype=cls_scores[0].dtype,
- device=cls_scores[0].device)
+ featmap_sizes, dtype=cls_scores[0].dtype, device=cls_scores[0].device
+ )
result_list = []
@@ -99,23 +100,32 @@ def get_bboxes(self,
else:
score_factor_list = [None for _ in range(num_levels)]
- results = self._get_bboxes_single(cls_score_list, bbox_pred_list,
- score_factor_list, mlvl_priors,
- img_meta, cfg, rescale, with_nms,
- **kwargs)
+ results = self._get_bboxes_single(
+ cls_score_list,
+ bbox_pred_list,
+ score_factor_list,
+ mlvl_priors,
+ img_meta,
+ cfg,
+ rescale,
+ with_nms,
+ **kwargs
+ )
result_list.append(results)
return result_list
- def _get_bboxes_single(self,
- cls_score_list,
- bbox_pred_list,
- score_factor_list,
- mlvl_priors,
- img_meta,
- cfg,
- rescale=False,
- with_nms=True,
- **kwargs):
+ def _get_bboxes_single(
+ self,
+ cls_score_list,
+ bbox_pred_list,
+ score_factor_list,
+ mlvl_priors,
+ img_meta,
+ cfg,
+ rescale=False,
+ with_nms=True,
+ **kwargs
+ ):
"""Transform outputs of a single image into bbox predictions.
Args:
@@ -164,8 +174,8 @@ def _get_bboxes_single(self,
with_score_factors = True
cfg = self.test_cfg if cfg is None else cfg
- img_shape = img_meta['img_shape']
- nms_pre = cfg.get('nms_pre', -1)
+ img_shape = img_meta["img_shape"]
+ nms_pre = cfg.get("nms_pre", -1)
mlvl_bboxes = []
mlvl_scores = []
@@ -174,18 +184,15 @@ def _get_bboxes_single(self,
mlvl_score_factors = []
else:
mlvl_score_factors = None
- for level_idx, (cls_score, bbox_pred, score_factor, priors) in \
- enumerate(zip(cls_score_list, bbox_pred_list,
- score_factor_list, mlvl_priors)):
-
+ for level_idx, (cls_score, bbox_pred, score_factor, priors) in enumerate(
+ zip(cls_score_list, bbox_pred_list, score_factor_list, mlvl_priors)
+ ):
assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
bbox_pred = bbox_pred.permute(1, 2, 0).reshape(-1, 4)
if with_score_factors:
- score_factor = score_factor.permute(1, 2,
- 0).reshape(-1).sigmoid()
- cls_score = cls_score.permute(1, 2,
- 0).reshape(-1, self.cls_out_channels)
+ score_factor = score_factor.permute(1, 2, 0).reshape(-1).sigmoid()
+ cls_score = cls_score.permute(1, 2, 0).reshape(-1, self.cls_out_channels)
if self.use_sigmoid_cls:
scores = cls_score.sigmoid()
else:
@@ -200,18 +207,17 @@ def _get_bboxes_single(self,
# find a slight drop in performance, you can set a larger
# `nms_pre` than before.
results = filter_scores_and_topk(
- scores, cfg.score_thr, nms_pre,
- dict(bbox_pred=bbox_pred, priors=priors))
+ scores, cfg.score_thr, nms_pre, dict(bbox_pred=bbox_pred, priors=priors)
+ )
scores, labels, keep_idxs, filtered_results = results
- bbox_pred = filtered_results['bbox_pred']
- priors = filtered_results['priors']
+ bbox_pred = filtered_results["bbox_pred"]
+ priors = filtered_results["priors"]
if with_score_factors:
score_factor = score_factor[keep_idxs]
- bboxes = self.bbox_coder.decode(
- priors, bbox_pred, max_shape=img_shape)
+ bboxes = self.bbox_coder.decode(priors, bbox_pred, max_shape=img_shape)
mlvl_bboxes.append(bboxes)
mlvl_scores.append(scores)
@@ -219,20 +225,30 @@ def _get_bboxes_single(self,
if with_score_factors:
mlvl_score_factors.append(score_factor)
- return self._bbox_post_process(mlvl_scores, mlvl_labels, mlvl_bboxes,
- img_meta['scale_factor'], cfg, rescale,
- with_nms, mlvl_score_factors, **kwargs)
-
- def _bbox_post_process(self,
- mlvl_scores,
- mlvl_labels,
- mlvl_bboxes,
- scale_factor,
- cfg,
- rescale=False,
- with_nms=True,
- mlvl_score_factors=None,
- **kwargs):
+ return self._bbox_post_process(
+ mlvl_scores,
+ mlvl_labels,
+ mlvl_bboxes,
+ img_meta["scale_factor"],
+ cfg,
+ rescale,
+ with_nms,
+ mlvl_score_factors,
+ **kwargs
+ )
+
+ def _bbox_post_process(
+ self,
+ mlvl_scores,
+ mlvl_labels,
+ mlvl_bboxes,
+ scale_factor,
+ cfg,
+ rescale=False,
+ with_nms=True,
+ mlvl_score_factors=None,
+ **kwargs
+ ):
"""bbox post-processing method.
The boxes would be rescaled to the original image scale and do
@@ -292,22 +308,23 @@ def _bbox_post_process(self,
det_bboxes = torch.cat([mlvl_bboxes, mlvl_scores[:, None]], -1)
return det_bboxes, mlvl_labels
- det_bboxes, keep_idxs = batched_nms(mlvl_bboxes, mlvl_scores,
- mlvl_labels, cfg.nms)
- det_bboxes = det_bboxes[:cfg.max_per_img]
- det_labels = mlvl_labels[keep_idxs][:cfg.max_per_img]
+ det_bboxes, keep_idxs = batched_nms(mlvl_bboxes, mlvl_scores, mlvl_labels, cfg.nms)
+ det_bboxes = det_bboxes[: cfg.max_per_img]
+ det_labels = mlvl_labels[keep_idxs][: cfg.max_per_img]
return det_bboxes, det_labels
else:
return mlvl_bboxes, mlvl_scores, mlvl_labels
- def forward_train(self,
- x,
- img_metas,
- gt_bboxes,
- gt_labels=None,
- gt_bboxes_ignore=None,
- proposal_cfg=None,
- **kwargs):
+ def forward_train(
+ self,
+ x,
+ img_metas,
+ gt_bboxes,
+ gt_labels=None,
+ gt_bboxes_ignore=None,
+ proposal_cfg=None,
+ **kwargs
+ ):
"""
Args:
x (list[Tensor]): Features from FPN.
@@ -336,8 +353,7 @@ def forward_train(self,
if proposal_cfg is None:
return losses
else:
- proposal_list = self.get_bboxes(
- *outs, img_metas=img_metas, cfg=proposal_cfg)
+ proposal_list = self.get_bboxes(*outs, img_metas=img_metas, cfg=proposal_cfg)
return losses, proposal_list
def simple_test(self, feats, img_metas, rescale=False):
@@ -359,13 +375,10 @@ def simple_test(self, feats, img_metas, rescale=False):
"""
return self.simple_test_bboxes(feats, img_metas, rescale=rescale)
- @force_fp32(apply_to=('cls_scores', 'bbox_preds'))
- def onnx_export(self,
- cls_scores,
- bbox_preds,
- score_factors=None,
- img_metas=None,
- with_nms=True):
+ @force_fp32(apply_to=("cls_scores", "bbox_preds"))
+ def onnx_export(
+ self, cls_scores, bbox_preds, score_factors=None, img_metas=None, with_nms=True
+ ):
"""Transform network output for a batch into bbox predictions.
Args:
@@ -395,25 +408,21 @@ def onnx_export(self,
featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
mlvl_priors = self.prior_generator.grid_priors(
- featmap_sizes,
- dtype=bbox_preds[0].dtype,
- device=bbox_preds[0].device)
+ featmap_sizes, dtype=bbox_preds[0].dtype, device=bbox_preds[0].device
+ )
mlvl_cls_scores = [cls_scores[i].detach() for i in range(num_levels)]
mlvl_bbox_preds = [bbox_preds[i].detach() for i in range(num_levels)]
- assert len(
- img_metas
- ) == 1, 'Only support one input image while in exporting to ONNX'
- img_shape = img_metas[0]['img_shape_for_onnx']
+ assert len(img_metas) == 1, "Only support one input image while in exporting to ONNX"
+ img_shape = img_metas[0]["img_shape_for_onnx"]
cfg = self.test_cfg
assert len(cls_scores) == len(bbox_preds) == len(mlvl_priors)
device = cls_scores[0].device
batch_size = cls_scores[0].shape[0]
# convert to tensor to keep tracing
- nms_pre_tensor = torch.tensor(
- cfg.get('nms_pre', -1), device=device, dtype=torch.long)
+ nms_pre_tensor = torch.tensor(cfg.get("nms_pre", -1), device=device, dtype=torch.long)
# e.g. Retina, FreeAnchor, etc.
if score_factors is None:
@@ -422,22 +431,18 @@ def onnx_export(self,
else:
# e.g. FCOS, PAA, ATSS, etc.
with_score_factors = True
- mlvl_score_factor = [
- score_factors[i].detach() for i in range(num_levels)
- ]
+ mlvl_score_factor = [score_factors[i].detach() for i in range(num_levels)]
mlvl_score_factors = []
mlvl_batch_bboxes = []
mlvl_scores = []
for cls_score, bbox_pred, score_factors, priors in zip(
- mlvl_cls_scores, mlvl_bbox_preds, mlvl_score_factor,
- mlvl_priors):
+ mlvl_cls_scores, mlvl_bbox_preds, mlvl_score_factor, mlvl_priors
+ ):
assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
- scores = cls_score.permute(0, 2, 3,
- 1).reshape(batch_size, -1,
- self.cls_out_channels)
+ scores = cls_score.permute(0, 2, 3, 1).reshape(batch_size, -1, self.cls_out_channels)
if self.use_sigmoid_cls:
scores = scores.sigmoid()
nms_pre_score = scores
@@ -446,18 +451,16 @@ def onnx_export(self,
nms_pre_score = scores
if with_score_factors:
- score_factors = score_factors.permute(0, 2, 3, 1).reshape(
- batch_size, -1).sigmoid()
- bbox_pred = bbox_pred.permute(0, 2, 3,
- 1).reshape(batch_size, -1, 4)
+ score_factors = score_factors.permute(0, 2, 3, 1).reshape(batch_size, -1).sigmoid()
+ bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(batch_size, -1, 4)
priors = priors.expand(batch_size, -1, priors.size(-1))
# Get top-k predictions
from mmdet.core.export import get_k_for_topk
+
nms_pre = get_k_for_topk(nms_pre_tensor, bbox_pred.shape[1])
if nms_pre > 0:
-
if with_score_factors:
- nms_pre_score = (nms_pre_score * score_factors[..., None])
+ nms_pre_score = nms_pre_score * score_factors[..., None]
else:
nms_pre_score = nms_pre_score
@@ -471,26 +474,27 @@ def onnx_export(self,
max_scores, _ = nms_pre_score[..., :-1].max(-1)
_, topk_inds = max_scores.topk(nms_pre)
- batch_inds = torch.arange(
- batch_size, device=bbox_pred.device).view(
- -1, 1).expand_as(topk_inds).long()
+ batch_inds = (
+ torch.arange(batch_size, device=bbox_pred.device)
+ .view(-1, 1)
+ .expand_as(topk_inds)
+ .long()
+ )
# Avoid onnx2tensorrt issue in https://github.com/NVIDIA/TensorRT/issues/1134 # noqa: E501
transformed_inds = bbox_pred.shape[1] * batch_inds + topk_inds
- priors = priors.reshape(
- -1, priors.size(-1))[transformed_inds, :].reshape(
- batch_size, -1, priors.size(-1))
- bbox_pred = bbox_pred.reshape(-1,
- 4)[transformed_inds, :].reshape(
- batch_size, -1, 4)
- scores = scores.reshape(
- -1, self.cls_out_channels)[transformed_inds, :].reshape(
- batch_size, -1, self.cls_out_channels)
+ priors = priors.reshape(-1, priors.size(-1))[transformed_inds, :].reshape(
+ batch_size, -1, priors.size(-1)
+ )
+ bbox_pred = bbox_pred.reshape(-1, 4)[transformed_inds, :].reshape(batch_size, -1, 4)
+ scores = scores.reshape(-1, self.cls_out_channels)[transformed_inds, :].reshape(
+ batch_size, -1, self.cls_out_channels
+ )
if with_score_factors:
- score_factors = score_factors.reshape(
- -1, 1)[transformed_inds].reshape(batch_size, -1)
+ score_factors = score_factors.reshape(-1, 1)[transformed_inds].reshape(
+ batch_size, -1
+ )
- bboxes = self.bbox_coder.decode(
- priors, bbox_pred, max_shape=img_shape)
+ bboxes = self.bbox_coder.decode(priors, bbox_pred, max_shape=img_shape)
mlvl_batch_bboxes.append(bboxes)
mlvl_scores.append(scores)
@@ -507,20 +511,24 @@ def onnx_export(self,
from mmdet.core.export import add_dummy_nms_for_onnx
if not self.use_sigmoid_cls:
- batch_scores = batch_scores[..., :self.num_classes]
+ batch_scores = batch_scores[..., : self.num_classes]
if with_score_factors:
batch_scores = batch_scores * (batch_score_factors.unsqueeze(2))
if with_nms:
- max_output_boxes_per_class = cfg.nms.get(
- 'max_output_boxes_per_class', 200)
- iou_threshold = cfg.nms.get('iou_threshold', 0.5)
+ max_output_boxes_per_class = cfg.nms.get("max_output_boxes_per_class", 200)
+ iou_threshold = cfg.nms.get("iou_threshold", 0.5)
score_threshold = cfg.score_thr
- nms_pre = cfg.get('deploy_nms_pre', -1)
- return add_dummy_nms_for_onnx(batch_bboxes, batch_scores,
- max_output_boxes_per_class,
- iou_threshold, score_threshold,
- nms_pre, cfg.max_per_img)
+ nms_pre = cfg.get("deploy_nms_pre", -1)
+ return add_dummy_nms_for_onnx(
+ batch_bboxes,
+ batch_scores,
+ max_output_boxes_per_class,
+ iou_threshold,
+ score_threshold,
+ nms_pre,
+ cfg.max_per_img,
+ )
else:
return batch_bboxes, batch_scores
diff --git a/mmdet/models/dense_heads/rpn_head.py b/mmdet/models/dense_heads/rpn_head.py
index 54cd39a2..e9c06541 100644
--- a/mmdet/models/dense_heads/rpn_head.py
+++ b/mmdet/models/dense_heads/rpn_head.py
@@ -21,14 +21,15 @@ class RPNHead(AnchorHead):
num_convs (int): Number of convolution layers in the head. Default 1.
""" # noqa: W605
- def __init__(self,
- in_channels,
- init_cfg=dict(type='Normal', layer='Conv2d', std=0.01),
- num_convs=1,
- **kwargs):
+ def __init__(
+ self,
+ in_channels,
+ init_cfg=dict(type="Normal", layer="Conv2d", std=0.01),
+ num_convs=1,
+ **kwargs
+ ):
self.num_convs = num_convs
- super(RPNHead, self).__init__(
- 1, in_channels, init_cfg=init_cfg, **kwargs)
+ super(RPNHead, self).__init__(1, in_channels, init_cfg=init_cfg, **kwargs)
def _init_layers(self):
"""Initialize layers of the head."""
@@ -43,21 +44,15 @@ def _init_layers(self):
# needed for gradient computation has been modified by an
# inplace operation.
rpn_convs.append(
- ConvModule(
- in_channels,
- self.feat_channels,
- 3,
- padding=1,
- inplace=False))
+ ConvModule(in_channels, self.feat_channels, 3, padding=1, inplace=False)
+ )
self.rpn_conv = nn.Sequential(*rpn_convs)
else:
- self.rpn_conv = nn.Conv2d(
- self.in_channels, self.feat_channels, 3, padding=1)
- self.rpn_cls = nn.Conv2d(self.feat_channels,
- self.num_base_priors * self.cls_out_channels,
- 1)
- self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_base_priors * 4,
- 1)
+ self.rpn_conv = nn.Conv2d(self.in_channels, self.feat_channels, 3, padding=1)
+ self.rpn_cls = nn.Conv2d(
+ self.feat_channels, self.num_base_priors * self.cls_out_channels, 1
+ )
+ self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_base_priors * 4, 1)
def forward_single(self, x):
"""Forward feature map of a single scale level."""
@@ -67,12 +62,7 @@ def forward_single(self, x):
rpn_bbox_pred = self.rpn_reg(x)
return rpn_cls_score, rpn_bbox_pred
- def loss(self,
- cls_scores,
- bbox_preds,
- gt_bboxes,
- img_metas,
- gt_bboxes_ignore=None):
+ def loss(self, cls_scores, bbox_preds, gt_bboxes, img_metas, gt_bboxes_ignore=None):
"""Compute losses of the head.
Args:
@@ -91,25 +81,25 @@ def loss(self,
dict[str, Tensor]: A dictionary of loss components.
"""
losses = super(RPNHead, self).loss(
- cls_scores,
- bbox_preds,
- gt_bboxes,
- None,
- img_metas,
- gt_bboxes_ignore=gt_bboxes_ignore)
- return dict(
- loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])
+ cls_scores, bbox_preds, gt_bboxes, None, img_metas, gt_bboxes_ignore=gt_bboxes_ignore
+ )
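+        # super().loss() returns None when no image yields valid anchors;
+        # keep the RPN loss keys in that degenerate case so callers can
+        # detect it instead of hitting a KeyError.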
+ if losses:
+ return dict(loss_rpn_cls=losses["loss_cls"], loss_rpn_bbox=losses["loss_bbox"])
+ else:
+ return dict(loss_rpn_cls=None, loss_rpn_bbox=None)
- def _get_bboxes_single(self,
- cls_score_list,
- bbox_pred_list,
- score_factor_list,
- mlvl_anchors,
- img_meta,
- cfg,
- rescale=False,
- with_nms=True,
- **kwargs):
+ def _get_bboxes_single(
+ self,
+ cls_score_list,
+ bbox_pred_list,
+ score_factor_list,
+ mlvl_anchors,
+ img_meta,
+ cfg,
+ rescale=False,
+ with_nms=True,
+ **kwargs
+ ):
"""Transform outputs of a single image into bbox predictions.
Args:
@@ -138,7 +128,7 @@ def _get_bboxes_single(self,
"""
cfg = self.test_cfg if cfg is None else cfg
cfg = copy.deepcopy(cfg)
- img_shape = img_meta['img_shape']
+ img_shape = img_meta["img_shape"]
# bboxes from different level should be independent during NMS,
# level_ids are used as labels for batched NMS to separate them
@@ -146,7 +136,7 @@ def _get_bboxes_single(self,
mlvl_scores = []
mlvl_bbox_preds = []
mlvl_valid_anchors = []
- nms_pre = cfg.get('nms_pre', -1)
+ nms_pre = cfg.get("nms_pre", -1)
for level_idx in range(len(cls_score_list)):
rpn_cls_score = cls_score_list[level_idx]
rpn_bbox_pred = bbox_pred_list[level_idx]
@@ -177,17 +167,15 @@ def _get_bboxes_single(self,
mlvl_scores.append(scores)
mlvl_bbox_preds.append(rpn_bbox_pred)
mlvl_valid_anchors.append(anchors)
- level_ids.append(
- scores.new_full((scores.size(0), ),
- level_idx,
- dtype=torch.long))
+ level_ids.append(scores.new_full((scores.size(0),), level_idx, dtype=torch.long))
- return self._bbox_post_process(mlvl_scores, mlvl_bbox_preds,
- mlvl_valid_anchors, level_ids, cfg,
- img_shape)
+ return self._bbox_post_process(
+ mlvl_scores, mlvl_bbox_preds, mlvl_valid_anchors, level_ids, cfg, img_shape
+ )
- def _bbox_post_process(self, mlvl_scores, mlvl_bboxes, mlvl_valid_anchors,
- level_ids, cfg, img_shape, **kwargs):
+ def _bbox_post_process(
+ self, mlvl_scores, mlvl_bboxes, mlvl_valid_anchors, level_ids, cfg, img_shape, **kwargs
+ ):
"""bbox post-processing method.
Do the nms operation for bboxes in same level.
@@ -214,8 +202,7 @@ def _bbox_post_process(self, mlvl_scores, mlvl_bboxes, mlvl_valid_anchors,
scores = torch.cat(mlvl_scores)
anchors = torch.cat(mlvl_valid_anchors)
rpn_bbox_pred = torch.cat(mlvl_bboxes)
- proposals = self.bbox_coder.decode(
- anchors, rpn_bbox_pred, max_shape=img_shape)
+ proposals = self.bbox_coder.decode(anchors, rpn_bbox_pred, max_shape=img_shape)
ids = torch.cat(level_ids)
if cfg.min_bbox_size >= 0:
@@ -232,7 +219,7 @@ def _bbox_post_process(self, mlvl_scores, mlvl_bboxes, mlvl_valid_anchors,
else:
return proposals.new_zeros(0, 5)
- return dets[:cfg.max_per_img]
+ return dets[: cfg.max_per_img]
def onnx_export(self, x, img_metas):
"""Test without augmentation.
@@ -249,17 +236,23 @@ def onnx_export(self, x, img_metas):
assert len(cls_scores) == len(bbox_preds)
batch_bboxes, batch_scores = super(RPNHead, self).onnx_export(
- cls_scores, bbox_preds, img_metas=img_metas, with_nms=False)
+ cls_scores, bbox_preds, img_metas=img_metas, with_nms=False
+ )
# Use ONNX::NonMaxSuppression in deployment
from mmdet.core.export import add_dummy_nms_for_onnx
+
cfg = copy.deepcopy(self.test_cfg)
- score_threshold = cfg.nms.get('score_thr', 0.0)
- nms_pre = cfg.get('deploy_nms_pre', -1)
+ score_threshold = cfg.nms.get("score_thr", 0.0)
+ nms_pre = cfg.get("deploy_nms_pre", -1)
# Different from the normal forward doing NMS level by level,
# we do NMS across all levels when exporting ONNX.
- dets, _ = add_dummy_nms_for_onnx(batch_bboxes, batch_scores,
- cfg.max_per_img,
- cfg.nms.iou_threshold,
- score_threshold, nms_pre,
- cfg.max_per_img)
+ dets, _ = add_dummy_nms_for_onnx(
+ batch_bboxes,
+ batch_scores,
+ cfg.max_per_img,
+ cfg.nms.iou_threshold,
+ score_threshold,
+ nms_pre,
+ cfg.max_per_img,
+ )
return dets
diff --git a/mmdet/models/dense_heads/semi_rpn_head.py b/mmdet/models/dense_heads/semi_rpn_head.py
new file mode 100644
index 00000000..494f1d8a
--- /dev/null
+++ b/mmdet/models/dense_heads/semi_rpn_head.py
@@ -0,0 +1,387 @@
+import numpy as np
+import torch
+from mmcv.runner import force_fp32
+
+from mmdet.core import (
+    anchor_inside_flags,
+    images_to_levels,
+    multi_apply,
+    unmap,
+)
+from ..builder import HEADS
+from .rpn_head import RPNHead
+
+
+@HEADS.register_module()
+class SemiRPNHead(RPNHead):
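+    """RPN head for semi-supervised roof/footprint training.
+
+    Extends :class:`RPNHead` with a per-image ``gt_only_footprint_flag``:
+    images that only carry footprint annotations keep their positives for
+    objectness classification but contribute no bbox-regression loss.
+    """
+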
+ def forward_train(
+ self,
+ x,
+ img_metas,
+ gt_bboxes,
+ gt_labels=None,
+ gt_bboxes_ignore=None,
+ proposal_cfg=None,
+ gt_only_footprint_flag=None,
+ gt_footprint_bboxes=None,
+ angle_pred=None,
+ **kwargs
+ ):
+ """
+ Args:
+ x (list[Tensor]): Features from FPN.
+ img_metas (list[dict]): Meta information of each image, e.g.,
+ image size, scaling factor, etc.
+ gt_bboxes (Tensor): Ground truth bboxes of the image,
+ shape (num_gts, 4).
+ gt_labels (Tensor): Ground truth labels of each box,
+ shape (num_gts,).
+ gt_bboxes_ignore (Tensor): Ground truth bboxes to be
+ ignored, shape (num_ignored_gts, 4).
+ proposal_cfg (mmcv.Config): Test / postprocessing configuration,
+ if None, test_cfg would be used
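+            gt_only_footprint_flag (list[Tensor]): Per-image flags of shape
+                (1,); 1 marks images annotated only with footprints.
+            gt_footprint_bboxes (list[Tensor]): Ground truth footprint bboxes
+                used in place of roof bboxes for near-nadir samples.
+            angle_pred (Tensor): Predicted offset angles in radians, one per
+                image, used to decide when footprints can supervise the RPN.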
+
+ Returns:
+ tuple:
+ losses: (dict[str, Tensor]): A dictionary of loss components.
+ proposal_list (list[Tensor]): Proposals of each image.
+ """
+ outs = self(x)
+
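+        # Heuristic used below: when the predicted offset angle is under
+        # 10 degrees, roof and footprint presumably almost coincide, so a
+        # footprint-only sample is promoted to full supervision by clearing
+        # its flag and substituting the footprint boxes as RPN targets.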
+ if angle_pred is None:
+ only_footprint_flag = gt_only_footprint_flag
+ else:
+ only_footprint_flag = gt_only_footprint_flag[:]
+ angle_pred_ = angle_pred.clone()
+ for idx in range(len(only_footprint_flag)):
+ if (
+ only_footprint_flag[idx].cpu().numpy()[0] == 1
+ and angle_pred_.cpu().detach().numpy()[idx][0] * 180 / np.pi < 10
+ ):
+ only_footprint_flag[idx] = only_footprint_flag[idx] * 0
+
+ gt_bboxes[idx] = gt_footprint_bboxes[idx]
+
+ if gt_labels is None:
+ loss_inputs = outs + (gt_bboxes, None, img_metas)
+ else:
+ loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
+ losses = self.loss(
+ *loss_inputs,
+ gt_bboxes_ignore=gt_bboxes_ignore,
+ gt_only_footprint_flag=only_footprint_flag
+ )
+ if proposal_cfg is None:
+ return losses
+ else:
+            proposal_list = self.get_bboxes(*outs, img_metas=img_metas, cfg=proposal_cfg)
+ return losses, proposal_list
+
+ def _get_targets_single(
+ self,
+ flat_anchors,
+ valid_flags,
+ gt_bboxes,
+ gt_bboxes_ignore,
+ gt_labels,
+ img_meta,
+ gt_only_footprint_flag,
+ label_channels=1,
+ unmap_outputs=True,
+ ):
+ """Compute regression and classification targets for anchors in a
+ single image.
+
+ Args:
+ flat_anchors (Tensor): Multi-level anchors of the image, which are
+ concatenated into a single tensor of shape (num_anchors ,4)
+ valid_flags (Tensor): Multi level valid flags of the image,
+ which are concatenated into a single tensor of
+ shape (num_anchors,).
+            gt_bboxes (Tensor): Ground truth bboxes of the image,
+                shape (num_gts, 4).
+            gt_bboxes_ignore (Tensor): Ground truth bboxes to be
+                ignored, shape (num_ignored_gts, 4).
+            gt_labels (Tensor): Ground truth labels of each box,
+                shape (num_gts,).
+            img_meta (dict): Meta info of the image.
+            gt_only_footprint_flag (Tensor): Flag of shape (1,) that is 1
+                when the image only has footprint annotations.
+            label_channels (int): Channel of label.
+            unmap_outputs (bool): Whether to map outputs back to the original
+                set of anchors.
+
+        Returns:
+            tuple:
+                labels (Tensor): Labels of all anchors in the image.
+                label_weights (Tensor): Label weights of all anchors.
+                bbox_targets (Tensor): BBox targets of all anchors.
+                bbox_weights (Tensor): BBox regression weights; zeroed for
+                    footprint-only samples.
+                pos_inds (Tensor): Indices of positive anchors.
+                neg_inds (Tensor): Indices of negative anchors.
+                sampling_result (:obj:`SamplingResult`): Sampling result.
+        """
+ gt_only_footprint_flag = gt_only_footprint_flag.cpu().numpy()[0]
+ inside_flags = anchor_inside_flags(
+ flat_anchors, valid_flags, img_meta["img_shape"][:2], self.train_cfg.allowed_border
+ )
+ if not inside_flags.any():
+ return (None,) * 7
+ # assign gt and sample anchors
+ anchors = flat_anchors[inside_flags, :]
+
+ assign_result = self.assigner.assign(
+ anchors, gt_bboxes, gt_bboxes_ignore, None if self.sampling else gt_labels
+ )
+ sampling_result = self.sampler.sample(assign_result, anchors, gt_bboxes)
+
+ num_valid_anchors = anchors.shape[0]
+ bbox_targets = torch.zeros_like(anchors)
+ bbox_weights = torch.zeros_like(anchors)
+        labels = anchors.new_full((num_valid_anchors,), self.num_classes, dtype=torch.long)
+ label_weights = anchors.new_zeros(num_valid_anchors, dtype=torch.float)
+
+ pos_inds = sampling_result.pos_inds
+ neg_inds = sampling_result.neg_inds
+ if len(pos_inds) > 0:
+ if not self.reg_decoded_bbox:
+ pos_bbox_targets = self.bbox_coder.encode(
+ sampling_result.pos_bboxes, sampling_result.pos_gt_bboxes
+ )
+ else:
+ pos_bbox_targets = sampling_result.pos_gt_bboxes
+ bbox_targets[pos_inds, :] = pos_bbox_targets
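+            # Footprint-only samples (flag == 1) get zero regression weight:
+            # their boxes still act as positives for objectness, but do not
+            # supervise the bbox deltas.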
+ bbox_weights[pos_inds, :] = 1.0 - gt_only_footprint_flag
+            if gt_labels is None:
+                # Only RPN gives gt_labels as None; foreground is class 0
+                # since mmdet v2.5.0.
+                labels[pos_inds] = 0
+ else:
+ labels[pos_inds] = gt_labels[sampling_result.pos_assigned_gt_inds]
+ if self.train_cfg.pos_weight <= 0:
+ label_weights[pos_inds] = 1.0
+ else:
+ label_weights[pos_inds] = self.train_cfg.pos_weight
+ if len(neg_inds) > 0:
+ label_weights[neg_inds] = 1.0
+
+ # map up to original set of anchors
+ if unmap_outputs:
+ num_total_anchors = flat_anchors.size(0)
+            labels = unmap(
+                labels, num_total_anchors, inside_flags, fill=self.num_classes
+            )  # fill bg label
+ label_weights = unmap(label_weights, num_total_anchors, inside_flags)
+ bbox_targets = unmap(bbox_targets, num_total_anchors, inside_flags)
+ bbox_weights = unmap(bbox_weights, num_total_anchors, inside_flags)
+
+ return (
+ labels,
+ label_weights,
+ bbox_targets,
+ bbox_weights,
+ pos_inds,
+ neg_inds,
+ sampling_result,
+ )
+
+ def get_targets(
+ self,
+ anchor_list,
+ valid_flag_list,
+ gt_bboxes_list,
+ img_metas,
+ gt_bboxes_ignore_list=None,
+ gt_labels_list=None,
+ label_channels=1,
+ unmap_outputs=True,
+ return_sampling_results=False,
+ gt_only_footprint_flag=None,
+ ):
+ """Compute regression and classification targets for anchors in
+ multiple images.
+
+ Args:
+ anchor_list (list[list[Tensor]]): Multi level anchors of each
+ image. The outer list indicates images, and the inner list
+ corresponds to feature levels of the image. Each element of
+ the inner list is a tensor of shape (num_anchors, 4).
+ valid_flag_list (list[list[Tensor]]): Multi level valid flags of
+ each image. The outer list indicates images, and the inner list
+ corresponds to feature levels of the image. Each element of
+ the inner list is a tensor of shape (num_anchors, )
+ gt_bboxes_list (list[Tensor]): Ground truth bboxes of each image.
+ img_metas (list[dict]): Meta info of each image.
+ gt_bboxes_ignore_list (list[Tensor]): Ground truth bboxes to be
+ ignored.
+ gt_labels_list (list[Tensor]): Ground truth labels of each box.
+ label_channels (int): Channel of label.
+ unmap_outputs (bool): Whether to map outputs back to the original
+ set of anchors.
+
+ Returns:
+ tuple: Usually returns a tuple containing learning targets.
+
+ - labels_list (list[Tensor]): Labels of each level.
+ - label_weights_list (list[Tensor]): Label weights of each \
+ level.
+ - bbox_targets_list (list[Tensor]): BBox targets of each level.
+ - bbox_weights_list (list[Tensor]): BBox weights of each level.
+ - num_total_pos (int): Number of positive samples in all \
+ images.
+ - num_total_neg (int): Number of negative samples in all \
+ images.
+ additional_returns: This function enables user-defined returns from
+ `self._get_targets_single`. These returns are currently refined
+ to properties at each feature map (i.e. having HxW dimension).
+ The results will be concatenated after the end
+ """
+ num_imgs = len(img_metas)
+ assert len(anchor_list) == len(valid_flag_list) == num_imgs
+
+ # anchor number of multi levels
+ num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]]
+ # concat all level anchors to a single tensor
+ concat_anchor_list = []
+ concat_valid_flag_list = []
+ for i in range(num_imgs):
+ assert len(anchor_list[i]) == len(valid_flag_list[i])
+ concat_anchor_list.append(torch.cat(anchor_list[i]))
+ concat_valid_flag_list.append(torch.cat(valid_flag_list[i]))
+
+ # compute targets for each image
+ if gt_bboxes_ignore_list is None:
+ gt_bboxes_ignore_list = [None for _ in range(num_imgs)]
+ if gt_labels_list is None:
+ gt_labels_list = [None for _ in range(num_imgs)]
+ results = multi_apply(
+ self._get_targets_single,
+ concat_anchor_list,
+ concat_valid_flag_list,
+ gt_bboxes_list,
+ gt_bboxes_ignore_list,
+ gt_labels_list,
+ img_metas,
+ gt_only_footprint_flag,
+ label_channels=label_channels,
+ unmap_outputs=unmap_outputs,
+ )
+ (
+ all_labels,
+ all_label_weights,
+ all_bbox_targets,
+ all_bbox_weights,
+ pos_inds_list,
+ neg_inds_list,
+ sampling_results_list,
+ ) = results[:7]
+ rest_results = list(results[7:]) # user-added return values
+ # no valid anchors
+ if any([labels is None for labels in all_labels]):
+ return None
+ # sampled anchors of all images
+ num_total_pos = sum([max(inds.numel(), 1) for inds in pos_inds_list])
+ num_total_neg = sum([max(inds.numel(), 1) for inds in neg_inds_list])
+ # split targets to a list w.r.t. multiple levels
+ labels_list = images_to_levels(all_labels, num_level_anchors)
+ label_weights_list = images_to_levels(all_label_weights, num_level_anchors)
+ bbox_targets_list = images_to_levels(all_bbox_targets, num_level_anchors)
+ bbox_weights_list = images_to_levels(all_bbox_weights, num_level_anchors)
+ res = (
+ labels_list,
+ label_weights_list,
+ bbox_targets_list,
+ bbox_weights_list,
+ num_total_pos,
+ num_total_neg,
+ )
+ if return_sampling_results:
+ res = res + (sampling_results_list,)
+ for i, r in enumerate(rest_results): # user-added return values
+ rest_results[i] = images_to_levels(r, num_level_anchors)
+
+ return res + tuple(rest_results)
+
+    @force_fp32(apply_to=("cls_scores", "bbox_preds"))
+    def loss(
+ self,
+ cls_scores,
+ bbox_preds,
+ gt_bboxes,
+ gt_labels,
+ img_metas,
+ gt_bboxes_ignore=None,
+ gt_only_footprint_flag=None,
+ ):
+ """Compute losses of the head.
+
+ Args:
+ cls_scores (list[Tensor]): Box scores for each scale level
+ Has shape (N, num_anchors * num_classes, H, W)
+ bbox_preds (list[Tensor]): Box energies / deltas for each scale
+ level with shape (N, num_anchors * 4, H, W)
+ gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
+ shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
+ gt_labels (list[Tensor]): class indices corresponding to each box
+ img_metas (list[dict]): Meta information of each image, e.g.,
+ image size, scaling factor, etc.
+ gt_bboxes_ignore (None | list[Tensor]): specify which bounding
+ boxes can be ignored when computing the loss. Default: None
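+            gt_only_footprint_flag (list[Tensor]): Per-image flag tensors of
+                shape (1,) where 1 marks footprint-only supervision.
+                Default: None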
+
+ Returns:
+ dict[str, Tensor]: A dictionary of loss components.
+ """
+ featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
+        assert len(featmap_sizes) == self.prior_generator.num_levels
+
+ device = cls_scores[0].device
+
+ anchor_list, valid_flag_list = self.get_anchors(featmap_sizes, img_metas, device=device)
+ label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
+ cls_reg_targets = self.get_targets(
+ anchor_list,
+ valid_flag_list,
+ gt_bboxes,
+ img_metas,
+ gt_bboxes_ignore_list=gt_bboxes_ignore,
+ gt_labels_list=gt_labels,
+ label_channels=label_channels,
+ gt_only_footprint_flag=gt_only_footprint_flag,
+ )
+ if cls_reg_targets is None:
+ return None
+ (
+ labels_list,
+ label_weights_list,
+ bbox_targets_list,
+ bbox_weights_list,
+ num_total_pos,
+ num_total_neg,
+ ) = cls_reg_targets
+ num_total_samples = num_total_pos + num_total_neg if self.sampling else num_total_pos
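+        # A real sampler averages losses over pos+neg samples; PseudoSampler
+        # (self.sampling == False) normalizes by positives only.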
+
+ # anchor number of multi levels
+ num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]]
+ # concat all level anchors and flags to a single tensor
+ concat_anchor_list = []
+ for i in range(len(anchor_list)):
+ concat_anchor_list.append(torch.cat(anchor_list[i]))
+ all_anchor_list = images_to_levels(concat_anchor_list, num_level_anchors)
+
+ losses_cls, losses_bbox = multi_apply(
+ self.loss_single,
+ cls_scores,
+ bbox_preds,
+ all_anchor_list,
+ labels_list,
+ label_weights_list,
+ bbox_targets_list,
+ bbox_weights_list,
+ num_total_samples=num_total_samples,
+ )
+ return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)
diff --git a/mmdet/models/detectors/__init__.py b/mmdet/models/detectors/__init__.py
index a0a89b87..d9483523 100644
--- a/mmdet/models/detectors/__init__.py
+++ b/mmdet/models/detectors/__init__.py
@@ -18,6 +18,7 @@
from .htc import HybridTaskCascade
from .kd_one_stage import KnowledgeDistillationSingleStageDetector
from .lad import LAD
+from .loft import LOFT
from .mask2former import Mask2Former
from .mask_rcnn import MaskRCNN
from .mask_scoring_rcnn import MaskScoringRCNN
@@ -46,13 +47,50 @@
from .yolox import YOLOX
__all__ = [
- 'ATSS', 'BaseDetector', 'SingleStageDetector', 'TwoStageDetector', 'RPN',
- 'KnowledgeDistillationSingleStageDetector', 'FastRCNN', 'FasterRCNN',
- 'MaskRCNN', 'CascadeRCNN', 'HybridTaskCascade', 'RetinaNet', 'FCOS',
- 'GridRCNN', 'MaskScoringRCNN', 'RepPointsDetector', 'FOVEA', 'FSAF',
- 'NASFCOS', 'PointRend', 'GFL', 'CornerNet', 'PAA', 'YOLOV3', 'YOLACT',
- 'VFNet', 'DETR', 'TridentFasterRCNN', 'SparseRCNN', 'SCNet', 'SOLO',
- 'SOLOv2', 'DeformableDETR', 'AutoAssign', 'YOLOF', 'CenterNet', 'YOLOX',
- 'TwoStagePanopticSegmentor', 'PanopticFPN', 'QueryInst', 'LAD', 'TOOD',
- 'MaskFormer', 'DDOD', 'Mask2Former'
+ "ATSS",
+ "BaseDetector",
+ "SingleStageDetector",
+ "TwoStageDetector",
+ "RPN",
+ "KnowledgeDistillationSingleStageDetector",
+ "FastRCNN",
+ "FasterRCNN",
+ "MaskRCNN",
+ "CascadeRCNN",
+ "HybridTaskCascade",
+ "RetinaNet",
+ "FCOS",
+ "GridRCNN",
+ "MaskScoringRCNN",
+ "RepPointsDetector",
+ "FOVEA",
+ "FSAF",
+ "NASFCOS",
+ "PointRend",
+ "GFL",
+ "CornerNet",
+ "PAA",
+ "YOLOV3",
+ "YOLACT",
+ "VFNet",
+ "DETR",
+ "TridentFasterRCNN",
+ "SparseRCNN",
+ "SCNet",
+ "SOLO",
+ "SOLOv2",
+ "DeformableDETR",
+ "AutoAssign",
+ "YOLOF",
+ "CenterNet",
+ "YOLOX",
+ "TwoStagePanopticSegmentor",
+ "PanopticFPN",
+ "QueryInst",
+ "LAD",
+ "TOOD",
+ "MaskFormer",
+ "DDOD",
+ "Mask2Former",
+ "LOFT",
]
diff --git a/mmdet/models/detectors/base.py b/mmdet/models/detectors/base.py
index f87097b1..6a122d49 100644
--- a/mmdet/models/detectors/base.py
+++ b/mmdet/models/detectors/base.py
@@ -21,26 +21,28 @@ def __init__(self, init_cfg=None):
@property
def with_neck(self):
"""bool: whether the detector has a neck"""
- return hasattr(self, 'neck') and self.neck is not None
+ return hasattr(self, "neck") and self.neck is not None
# TODO: these properties need to be carefully handled
# for both single stage & two stage detectors
@property
def with_shared_head(self):
"""bool: whether the detector has a shared head in the RoI Head"""
- return hasattr(self, 'roi_head') and self.roi_head.with_shared_head
+ return hasattr(self, "roi_head") and self.roi_head.with_shared_head
@property
def with_bbox(self):
"""bool: whether the detector has a bbox head"""
- return ((hasattr(self, 'roi_head') and self.roi_head.with_bbox)
- or (hasattr(self, 'bbox_head') and self.bbox_head is not None))
+ return (hasattr(self, "roi_head") and self.roi_head.with_bbox) or (
+ hasattr(self, "bbox_head") and self.bbox_head is not None
+ )
@property
def with_mask(self):
"""bool: whether the detector has a mask head"""
- return ((hasattr(self, 'roi_head') and self.roi_head.with_mask)
- or (hasattr(self, 'mask_head') and self.mask_head is not None))
+ return (hasattr(self, "roi_head") and self.roi_head.with_mask) or (
+ hasattr(self, "mask_head") and self.mask_head is not None
+ )
@abstractmethod
def extract_feat(self, imgs):
@@ -77,7 +79,7 @@ def forward_train(self, imgs, img_metas, **kwargs):
# then used for the transformer_head.
batch_input_shape = tuple(imgs[0].size()[-2:])
for img_meta in img_metas:
- img_meta['batch_input_shape'] = batch_input_shape
+ img_meta["batch_input_shape"] = batch_input_shape
async def async_simple_test(self, img, img_metas, **kwargs):
raise NotImplementedError
@@ -92,14 +94,15 @@ def aug_test(self, imgs, img_metas, **kwargs):
pass
async def aforward_test(self, *, img, img_metas, **kwargs):
- for var, name in [(img, 'img'), (img_metas, 'img_metas')]:
+ for var, name in [(img, "img"), (img_metas, "img_metas")]:
if not isinstance(var, list):
- raise TypeError(f'{name} must be a list, but got {type(var)}')
+ raise TypeError(f"{name} must be a list, but got {type(var)}")
num_augs = len(img)
if num_augs != len(img_metas):
- raise ValueError(f'num of augmentations ({len(img)}) '
- f'!= num of image metas ({len(img_metas)})')
+ raise ValueError(
+ f"num of augmentations ({len(img)}) " f"!= num of image metas ({len(img_metas)})"
+ )
# TODO: remove the restriction of samples_per_gpu == 1 when prepared
samples_per_gpu = img[0].size(0)
assert samples_per_gpu == 1
@@ -119,14 +122,15 @@ def forward_test(self, imgs, img_metas, **kwargs):
augs (multiscale, flip, etc.) and the inner list indicates
images in a batch.
"""
- for var, name in [(imgs, 'imgs'), (img_metas, 'img_metas')]:
+ for var, name in [(imgs, "imgs"), (img_metas, "img_metas")]:
if not isinstance(var, list):
- raise TypeError(f'{name} must be a list, but got {type(var)}')
+ raise TypeError(f"{name} must be a list, but got {type(var)}")
num_augs = len(imgs)
if num_augs != len(img_metas):
- raise ValueError(f'num of augmentations ({len(imgs)}) '
- f'!= num of image meta ({len(img_metas)})')
+ raise ValueError(
+ f"num of augmentations ({len(imgs)}) " f"!= num of image meta ({len(img_metas)})"
+ )
# NOTE the batched image size information may be useful, e.g.
# in DETR, this is needed for the construction of masks, which is
@@ -134,7 +138,7 @@ def forward_test(self, imgs, img_metas, **kwargs):
for img, img_meta in zip(imgs, img_metas):
batch_size = len(img_meta)
for img_id in range(batch_size):
- img_meta[img_id]['batch_input_shape'] = tuple(img.size()[-2:])
+ img_meta[img_id]["batch_input_shape"] = tuple(img.size()[-2:])
if num_augs == 1:
# proposals (List[List[Tensor]]): the outer list indicates
@@ -142,18 +146,18 @@ def forward_test(self, imgs, img_metas, **kwargs):
# indicates images in a batch.
# The Tensor should have a shape Px4, where P is the number of
# proposals.
- if 'proposals' in kwargs:
- kwargs['proposals'] = kwargs['proposals'][0]
+ if "proposals" in kwargs:
+ kwargs["proposals"] = kwargs["proposals"][0]
return self.simple_test(imgs[0], img_metas[0], **kwargs)
else:
- assert imgs[0].size(0) == 1, 'aug test does not support ' \
- 'inference with batch size ' \
- f'{imgs[0].size(0)}'
+ assert imgs[0].size(0) == 1, (
+ "aug test does not support " "inference with batch size " f"{imgs[0].size(0)}"
+ )
# TODO: support test augmentation for predefined proposals
- assert 'proposals' not in kwargs
+ assert "proposals" not in kwargs
return self.aug_test(imgs, img_metas, **kwargs)
- @auto_fp16(apply_to=('img', ))
+ @auto_fp16(apply_to=("img",))
def forward(self, img, img_metas, return_loss=True, **kwargs):
"""Calls either :func:`forward_train` or :func:`forward_test` depending
on whether ``return_loss`` is ``True``.
@@ -192,23 +196,25 @@ def _parse_losses(self, losses):
elif isinstance(loss_value, list):
log_vars[loss_name] = sum(_loss.mean() for _loss in loss_value)
else:
- raise TypeError(
- f'{loss_name} is not a tensor or list of tensors')
+ raise TypeError(f"{loss_name} is not a tensor or list of tensors")
- loss = sum(_value for _key, _value in log_vars.items()
- if 'loss' in _key)
+ loss = sum(_value for _key, _value in log_vars.items() if "loss" in _key)
# If the loss_vars has different length, GPUs will wait infinitely
if dist.is_available() and dist.is_initialized():
log_var_length = torch.tensor(len(log_vars), device=loss.device)
dist.all_reduce(log_var_length)
- message = (f'rank {dist.get_rank()}' +
- f' len(log_vars): {len(log_vars)}' + ' keys: ' +
- ','.join(log_vars.keys()))
- assert log_var_length == len(log_vars) * dist.get_world_size(), \
- 'loss log variables are different across GPUs!\n' + message
-
- log_vars['loss'] = loss
+ message = (
+ f"rank {dist.get_rank()}"
+ + f" len(log_vars): {len(log_vars)}"
+ + " keys: "
+ + ",".join(log_vars.keys())
+ )
+ assert log_var_length == len(log_vars) * dist.get_world_size(), (
+ "loss log variables are different across GPUs!\n" + message
+ )
+
+ log_vars["loss"] = loss
for loss_name, loss_value in log_vars.items():
# reduce loss when distributed training
if dist.is_available() and dist.is_initialized():
@@ -248,8 +254,7 @@ def train_step(self, data, optimizer):
losses = self(**data)
loss, log_vars = self._parse_losses(losses)
- outputs = dict(
- loss=loss, log_vars=log_vars, num_samples=len(data['img_metas']))
+ outputs = dict(loss=loss, log_vars=log_vars, num_samples=len(data["img_metas"]))
return outputs
@@ -265,27 +270,28 @@ def val_step(self, data, optimizer=None):
log_vars_ = dict()
for loss_name, loss_value in log_vars.items():
- k = loss_name + '_val'
+ k = loss_name + "_val"
log_vars_[k] = loss_value
- outputs = dict(
- loss=loss, log_vars=log_vars_, num_samples=len(data['img_metas']))
+ outputs = dict(loss=loss, log_vars=log_vars_, num_samples=len(data["img_metas"]))
return outputs
- def show_result(self,
- img,
- result,
- score_thr=0.3,
- bbox_color=(72, 101, 241),
- text_color=(72, 101, 241),
- mask_color=None,
- thickness=2,
- font_size=13,
- win_name='',
- show=False,
- wait_time=0,
- out_file=None):
+ def show_result(
+ self,
+ img,
+ result,
+ score_thr=0.3,
+ bbox_color=(72, 101, 241),
+ text_color=(72, 101, 241),
+ mask_color=None,
+ thickness=2,
+ font_size=13,
+ win_name="",
+ show=False,
+ wait_time=0,
+ out_file=None,
+ ):
"""Draw `result` over `img`.
Args:
@@ -323,10 +329,7 @@ def show_result(self,
else:
bbox_result, segm_result = result, None
bboxes = np.vstack(bbox_result)
- labels = [
- np.full(bbox.shape[0], i, dtype=np.int32)
- for i, bbox in enumerate(bbox_result)
- ]
+ labels = [np.full(bbox.shape[0], i, dtype=np.int32) for i, bbox in enumerate(bbox_result)]
labels = np.concatenate(labels)
# draw segmentation masks
segms = None
@@ -355,11 +358,11 @@ def show_result(self,
win_name=win_name,
show=show,
wait_time=wait_time,
- out_file=out_file)
+ out_file=out_file,
+ )
if not (show or out_file):
return img
def onnx_export(self, img, img_metas):
- raise NotImplementedError(f'{self.__class__.__name__} does '
- f'not support ONNX EXPORT')
+ raise NotImplementedError(f"{self.__class__.__name__} does " f"not support ONNX EXPORT")
diff --git a/mmdet/models/detectors/loft.py b/mmdet/models/detectors/loft.py
new file mode 100644
index 00000000..c2e4014a
--- /dev/null
+++ b/mmdet/models/detectors/loft.py
@@ -0,0 +1,760 @@
+import math
+import os
+
+import cv2
+import mmcv
+import numpy as np
+import torch
+from mmcv.runner import get_dist_info
+
+from ...core.mask.structures import BitmapMasks
+from ..builder import DETECTORS, build_head, build_loss
+from ..utils import offset_roof_to_footprint
+from .two_stage import TwoStageDetector
+
+
+@DETECTORS.register_module()
+class LOFT(TwoStageDetector):
+ def __init__(
+ self,
+ backbone,
+ rpn_head=None,
+ roi_head=None,
+ train_cfg=None,
+ test_cfg=None,
+ neck=None,
+ offset_angle_head=None,
+ nadir_angle_head=None,
+ loss_offset_angle_consistency=None,
+ pretrained=None,
+ init_cfg=None,
+ ):
+ super(LOFT, self).__init__(
+ backbone=backbone,
+ neck=neck,
+ rpn_head=rpn_head,
+ roi_head=roi_head,
+ train_cfg=train_cfg,
+ test_cfg=test_cfg,
+ pretrained=pretrained,
+ init_cfg=init_cfg,
+ )
+
+ if offset_angle_head is not None:
+ self.with_offset_angle_head = True
+ self.gt_footprint_mask_as_condition = offset_angle_head.pop(
+ "gt_footprint_mask_as_condition", False
+ )
+ self.gt_footprint_mask_repeat_num = offset_angle_head.pop(
+ "gt_footprint_mask_repeat_num", 32
+ )
+ if self.gt_footprint_mask_as_condition:
+ offset_angle_head["in_channels"] += self.gt_footprint_mask_repeat_num
+ self.offset_angle_head = build_head(offset_angle_head)
+ self.offset_angle_head.init_weights()
+ else:
+ self.with_offset_angle_head = False
+
+ if loss_offset_angle_consistency is not None:
+ self.loss_offset_angle_consistency_regular_lambda = loss_offset_angle_consistency.pop(
+ "regular_lambda"
+ )
+ self.loss_offset_angle_consistency = build_loss(loss_offset_angle_consistency)
+ else:
+ self.loss_offset_angle_consistency_regular_lambda = None
+ self.loss_offset_angle_consistency = None
+
+ if nadir_angle_head is not None:
+ self.with_nadir_angle_head = True
+ self.gt_height_mask_as_condition = nadir_angle_head.pop(
+ "gt_height_mask_as_condition", False
+ )
+ self.gt_height_mask_repeat_num = nadir_angle_head.pop("gt_height_mask_repeat_num", 32)
+ if self.gt_height_mask_as_condition:
+ nadir_angle_head["in_channels"] += self.gt_height_mask_repeat_num
+ self.nadir_angle_head = build_head(nadir_angle_head)
+ self.nadir_angle_head.init_weights()
+ else:
+ self.with_nadir_angle_head = False
+
+ self.anchor_bbox_vis = [[287, 433, 441, 541]]
+ self.with_vis_feat = True
+
+ if train_cfg:
+ self.pseudo_rpn_bboxes_wh_ratio = train_cfg.pseudo_rpn_bboxes_wh_ratio
+ self.offset_scale = train_cfg.offset_scale
+ self.resolution = train_cfg.resolution
+ self.shrunk_losses = train_cfg.shrunk_losses
+ self.shrunk_factor = train_cfg.shrunk_factor
+ self.pseudo_rpn_bbox_scale = train_cfg.pseudo_rpn_bbox_scale
+ self.use_pred_for_offset_angle_consistency = (
+ train_cfg.use_pred_for_offset_angle_consistency
+ )
+ self.footprint_mask_fro_loss_lambda = train_cfg.footprint_mask_fro_loss_lambda
+
+ def forward_train(
+ self,
+ img,
+ img_metas,
+ gt_bboxes,
+ gt_labels,
+ gt_bboxes_ignore=None,
+ gt_masks=None,
+ proposals=None,
+ gt_offsets=None,
+ gt_heights=None,
+ gt_height_masks=None,
+ height_mask_shape=None,
+ gt_footprint_masks=None,
+ gt_image_scale_footprint_masks=None,
+ image_scale_footprint_mask_shape=None,
+ gt_footprint_bboxes=None,
+ gt_offset_angles=None,
+ gt_nadir_angles=None,
+ gt_is_semi_supervised_sample=None,
+ gt_is_valid_height_sample=None,
+ **kwargs,
+ ):
+ """
+ Args:
+ img (Tensor): of shape (N, C, H, W) encoding input images.
+ Typically these should be mean centered and std scaled.
+
+ img_metas (list[dict]): list of image info dict where each dict
+ has: 'img_shape', 'scale_factor', 'flip', and may also contain
+ 'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
+ For details on the values of these keys see
+ `mmdet/datasets/pipelines/formatting.py:Collect`.
+
+ gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
+ shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
+
+ gt_labels (list[Tensor]): class indices corresponding to each box
+
+ gt_bboxes_ignore (None | list[Tensor]): specify which bounding
+ boxes can be ignored when computing the loss.
+
+ gt_masks (None | Tensor) : true segmentation masks for each box
+ used if the architecture supports a segmentation task.
+
+ proposals : override rpn proposals with custom proposals. Use when
+ `with_rpn` is False.
+
+ gt_offsets (None, list[Tensor]): offsets corresponding to each box
+
+ gt_heights (None, list[Tensor]): heights corresponding to each box
+
+ gt_footprint_masks (None, list[Tensor]): footprint mask corresponding to each box
+
+ gt_footprint_bboxes (None, list[Tensor]): footprint bboxes corresponding to each box
+
+            gt_is_semi_supervised_sample (None, list[Tensor]): whether this is a
+                semi-supervised batch
+
+            gt_is_valid_height_sample (None, list[Tensor]): whether this batch has
+                valid height annotations
+
+ Returns:
+ dict[str, Tensor]: a dictionary of loss components
+ """
+ x = self.extract_feat(img)
+
+ losses = dict()
+
+        # Whether training is currently in the semi-supervised stage.
+ is_semi_supervised_stage = bool(kwargs.pop("is_semi_supervised_stage")[0][0])
+
+        # Whether this batch is a semi-supervised one.
+ is_semi_supervised_batch = bool(gt_is_semi_supervised_sample[0][0])
+
+        # Whether this batch has valid height annotations.
+ is_valid_height_batch = bool(gt_is_valid_height_sample[0][0])
+
+        # Whether the pseudo ground truth can be used for backpropagation.
+ is_valid_pseudo_gt_for_bp = (
+ self.with_offset_angle_head and self.with_nadir_angle_head and is_valid_height_batch
+ )
+
+ if self.with_offset_angle_head:
+ if self.gt_footprint_mask_as_condition:
+ image_scale_footprint_mask_shape = (
+ image_scale_footprint_mask_shape[0]
+ .detach()
+ .cpu()
+ .numpy()
+ .copy()
+ .astype(np.int32)
+ )
+ gt_image_scale_footprint_masks_np = np.stack(
+ [
+                    m.resize(image_scale_footprint_mask_shape.tolist()).masks.repeat(
+                        self.gt_footprint_mask_repeat_num, axis=0
+                    )
+ for m in gt_image_scale_footprint_masks
+ ],
+ axis=0,
+ )
+ gt_image_scale_footprint_masks_t = torch.from_numpy(
+ gt_image_scale_footprint_masks_np
+ ).to(x[0].device)
+ offset_angle_head_input = torch.cat([x[0], gt_image_scale_footprint_masks_t], dim=1)
+ else:
+ offset_angle_head_input = x[0]
+ offset_angles_pred, offset_angle_loss = self.offset_angle_head.forward_train(
+ offset_angle_head_input, img_metas, gt_offset_angles
+ )
+ if is_semi_supervised_batch:
+ offset_angle_loss["loss_offset_angle"] *= 0.0
+
+ losses.update(offset_angle_loss)
+
+ if self.with_nadir_angle_head:
+ if self.gt_height_mask_as_condition:
+ height_mask_shape = (
+ height_mask_shape[0].detach().cpu().numpy().copy().astype(np.int32)
+ )
+ gt_height_masks_np = np.stack(
+ [
+                    m.resize(height_mask_shape.tolist()).masks.repeat(
+                        self.gt_height_mask_repeat_num, axis=0
+                    )
+ for m in gt_height_masks
+ ],
+ axis=0,
+ )
+ gt_height_masks_t = torch.from_numpy(gt_height_masks_np).to(x[0].device)
+ nadir_angle_head_input = torch.cat([x[0], gt_height_masks_t], dim=1)
+ else:
+ nadir_angle_head_input = x[0]
+ nadir_angles_pred, nadir_angle_loss = self.nadir_angle_head.forward_train(
+ nadir_angle_head_input, img_metas, gt_nadir_angles
+ )
+
+ if is_semi_supervised_batch:
+ nadir_angle_loss["loss_nadir_angle"] *= 0.0
+
+ losses.update(nadir_angle_loss)
+
+ # Calculate pseudo_gt_bboxes when training a semi-supervised batch.
+ if is_semi_supervised_batch:
+ if is_valid_pseudo_gt_for_bp:
+ (
+ pseudo_gt_bboxes,
+ pseudo_gt_offsets,
+ ) = self._calculate_rpn_pseudo_bboxes_from_angle_and_height(
+ gt_footprint_bboxes,
+ gt_heights,
+ offset_angles_pred,
+ nadir_angles_pred,
+ img_metas,
+ )
+ else:
+ pseudo_gt_bboxes, _ = self._calculate_rpn_pseudo_bboxes_from_scale_up_footprint(
+ gt_footprint_bboxes, img_metas
+ )
+
+ gt_bboxes_for_vis = None
+ else:
+ pseudo_gt_bboxes = None
+ gt_bboxes_for_vis = gt_bboxes
+
+ # Visualization.
+ if self.with_offset_angle_head:
+ offset_angles_for_vis = offset_angles_pred
+ else:
+ offset_angles_for_vis = None
+
+ if self.with_nadir_angle_head:
+ nadir_angles_for_vis = nadir_angles_pred
+ else:
+ nadir_angles_for_vis = None
+
+ # visualize_bboxes_offset_angles(
+ # img,
+ # img_metas,
+ # gt_bboxes_for_vis,
+ # pseudo_gt_bboxes,
+ # gt_footprint_bboxes,
+ # gt_offset_angles,
+ # offset_angles_for_vis,
+ # gt_nadir_angles,
+ # nadir_angles_for_vis,
+ # is_semi_supervised_batch,
+ # is_valid_height_batch,
+ # )
+
+ # Choose the gts for RPN and RoI.
+ if is_semi_supervised_batch:
+ gt_bboxes_ = pseudo_gt_bboxes
+ if is_valid_pseudo_gt_for_bp:
+ gt_offsets_ = pseudo_gt_offsets
+ gt_masks_ = _offset_footprint_to_roof(pseudo_gt_offsets, gt_footprint_masks)
+ else:
+ gt_offsets_ = [torch.zeros_like(gt_offset) for gt_offset in gt_offsets]
+ gt_masks_ = [gt_mask for gt_mask in gt_masks]
+ else:
+ gt_bboxes_ = gt_bboxes
+ gt_offsets_ = gt_offsets
+ gt_masks_ = gt_masks
+
+ # RPN forward and loss.
+ if self.with_rpn:
+ proposal_cfg = self.train_cfg.get("rpn_proposal", self.test_cfg.rpn)
+ rpn_losses, proposal_list = self.rpn_head.forward_train(
+ x,
+ img_metas,
+ gt_bboxes_,
+ gt_labels=None,
+ gt_bboxes_ignore=gt_bboxes_ignore,
+ proposal_cfg=proposal_cfg,
+ is_semi_supervised_batch=is_semi_supervised_batch,
+ **kwargs,
+ )
+
+ # Deal with rpn losses according to config and training samples.
+ if is_semi_supervised_batch:
+ if is_semi_supervised_stage:
+ # RPN loss should be ignored when training height-invalid batch.
+ if not is_valid_height_batch:
+ rpn_losses["loss_rpn_cls"] = [e * 0.0 for e in rpn_losses["loss_rpn_cls"]]
+ rpn_losses["loss_rpn_bbox"] = [e * 0.0 for e in rpn_losses["loss_rpn_bbox"]]
+ # RPN loss shrinks or not.
+ if "rpn_cls" in self.shrunk_losses:
+ rpn_losses["loss_rpn_cls"] = [
+ e * self.shrunk_factor for e in rpn_losses["loss_rpn_cls"]
+ ]
+ if "rpn_bbox" in self.shrunk_losses:
+ rpn_losses["loss_rpn_bbox"] = [
+ e * self.shrunk_factor for e in rpn_losses["loss_rpn_bbox"]
+ ]
+ else:
+ rpn_losses["loss_rpn_cls"] = [e * 0.0 for e in rpn_losses["loss_rpn_cls"]]
+ rpn_losses["loss_rpn_bbox"] = [e * 0.0 for e in rpn_losses["loss_rpn_bbox"]]
+
+ losses.update(rpn_losses)
+ else:
+ proposal_list = proposals
+
+ if self.with_offset_angle_head:
+ if is_semi_supervised_batch:
+ offset_angles_for_roi = offset_angles_pred
+ else:
+ offset_angles_for_roi = gt_offset_angles
+ else:
+ offset_angles_for_roi = gt_offset_angles
+
+ roi_losses = self.roi_head.forward_train(
+ x,
+ img_metas,
+ proposal_list,
+ gt_bboxes_,
+ gt_labels,
+ gt_bboxes_ignore,
+ gt_masks_,
+ gt_offsets_,
+ gt_heights,
+ gt_footprint_masks,
+ gt_footprint_bboxes,
+ offset_angles_for_roi,
+ self.loss_offset_angle_consistency,
+ self.loss_offset_angle_consistency_regular_lambda,
+ is_semi_supervised_batch,
+ is_semi_supervised_stage,
+ is_valid_height_batch,
+ self.use_pred_for_offset_angle_consistency,
+ self.with_offset_angle_head,
+ self.shrunk_losses,
+ self.shrunk_factor,
+ self.footprint_mask_fro_loss_lambda,
+ img,
+ **kwargs,
+ )
+
+ losses.update(roi_losses)
+
+ return losses
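# Illustrative sketch (not part of the patch): a hypothetical train_cfg
# fragment listing the keys LOFT.__init__ reads above. Every value is an
# illustrative placeholder, not the authors' setting; `resolution` is assumed
# to be the ground sampling distance used to turn height into a pixel offset.
train_cfg = dict(
    pseudo_rpn_bboxes_wh_ratio=(1.5, 1.5),  # footprint-box enlargement (w, h)
    offset_scale=1.0,                       # global scale on derived offsets
    resolution=0.6,                         # meters per pixel (assumed)
    shrunk_losses=["rpn_cls", "rpn_bbox"],  # losses down-weighted on pseudo batches
    shrunk_factor=0.5,
    pseudo_rpn_bbox_scale=1.2,              # extra scale on the pseudo roof box
    use_pred_for_offset_angle_consistency=True,
    footprint_mask_fro_loss_lambda=1.0,
)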
+
+ def simple_test(self, img, img_metas, proposals=None, rescale=False):
+ """Test without augmentation."""
+
+ assert self.with_bbox, "Bbox head must be implemented."
+ x = self.extract_feat(img)
+ if proposals is None:
+ proposal_list = self.rpn_head.simple_test_rpn(x, img_metas)
+ else:
+ proposal_list = proposals
+ offset_angle = (
+ self.offset_angle_head.simple_test(x[0]).cpu().numpy()
+ if self.with_offset_angle_head
+ else [None for _ in img_metas]
+ )
+ nadir_angle = (
+ self.nadir_angle_head.simple_test(x[0]).cpu().numpy()
+ if self.with_nadir_angle_head
+ else [None for _ in img_metas]
+ )
+ original_results = self.roi_head.simple_test(x, proposal_list, img_metas, rescale=rescale)
+ original_results = list(original_results)
+ original_results.append(offset_angle)
+ original_results.append(nadir_angle)
+ return tuple(original_results)
+
+ def _calculate_rpn_pseudo_bboxes_from_scale_up_footprint(self, gt_footprint_bboxes, img_metas):
+ device = gt_footprint_bboxes[0].device
+ pseudo_gt_bboxes = []
+ for bboxes, img_meta in zip(gt_footprint_bboxes, img_metas):
+ # Get the box boundary of each footprint bbox.
+ footprint_x_tl = torch.unsqueeze(bboxes[:, 0], 1)
+ footprint_y_tl = torch.unsqueeze(bboxes[:, 1], 1)
+ footprint_x_br = torch.unsqueeze(bboxes[:, 2], 1)
+ footprint_y_br = torch.unsqueeze(bboxes[:, 3], 1)
+
+ # Get the wh of each footprint bbox.
+ footprint_w = footprint_x_br - footprint_x_tl
+ footprint_h = footprint_y_br - footprint_y_tl
+
+ # Calculate the center of each footprint bbox.
+ footprint_x_c = (footprint_x_br + footprint_x_tl) / 2
+ footprint_y_c = (footprint_y_br + footprint_y_tl) / 2
+
+            # Enlarge the footprint bbox around its center by the configured wh ratio.
+ bbox_x_tl = footprint_x_c - footprint_w * self.pseudo_rpn_bboxes_wh_ratio[0] / 2
+ bbox_y_tl = footprint_y_c - footprint_h * self.pseudo_rpn_bboxes_wh_ratio[1] / 2
+ bbox_x_br = footprint_x_c + footprint_w * self.pseudo_rpn_bboxes_wh_ratio[0] / 2
+ bbox_y_br = footprint_y_c + footprint_h * self.pseudo_rpn_bboxes_wh_ratio[1] / 2
+
+ # Calculate the image boundary of each image.
+ img_x_min = torch.zeros(footprint_x_tl.shape, device=device)
+ img_y_min = torch.zeros(footprint_y_tl.shape, device=device)
+ img_x_max = torch.ones(footprint_x_tl.shape, device=device)
+ img_y_max = torch.ones(footprint_y_tl.shape, device=device)
+
+ img_x_max *= img_meta["img_shape"][1]
+ img_y_max *= img_meta["img_shape"][0]
+
+ # Clip the building bbox with image.
+ bbox_x_tl = torch.max(torch.cat((bbox_x_tl, img_x_min), 1), 1, keepdim=True)[0]
+ bbox_y_tl = torch.max(torch.cat((bbox_y_tl, img_y_min), 1), 1, keepdim=True)[0]
+ bbox_x_br = torch.min(torch.cat((bbox_x_br, img_x_max), 1), 1, keepdim=True)[0]
+ bbox_y_br = torch.min(torch.cat((bbox_y_br, img_y_max), 1), 1, keepdim=True)[0]
+
+ pseudo_gt_bboxes.append(torch.concat((bbox_x_tl, bbox_y_tl, bbox_x_br, bbox_y_br), 1))
+
+        # No offsets can be derived without the angle heads; return the bboxes
+        # twice to keep the same (bboxes, offsets) return signature as the
+        # angle-and-height variant (the caller discards the second value).
+        return pseudo_gt_bboxes, pseudo_gt_bboxes
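# Illustrative sketch (not part of the patch): the scale-up rule above with
# made-up numbers. A footprint box grows around its own center by
# pseudo_rpn_bboxes_wh_ratio and is then clipped to the image.
import torch

footprint = torch.tensor([[100.0, 100.0, 200.0, 300.0]])  # (x_tl, y_tl, x_br, y_br)
ratio_w, ratio_h = 2.0, 2.0
cx, cy = (footprint[:, 0] + footprint[:, 2]) / 2, (footprint[:, 1] + footprint[:, 3]) / 2
w, h = footprint[:, 2] - footprint[:, 0], footprint[:, 3] - footprint[:, 1]
pseudo = torch.stack(
    [cx - w * ratio_w / 2, cy - h * ratio_h / 2, cx + w * ratio_w / 2, cy + h * ratio_h / 2],
    dim=1,
).clamp(min=0.0)  # clipping to the image's max side is elided here
print(pseudo)  # tensor([[ 50.,   0., 250., 400.]])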
+
+ def _calculate_rpn_pseudo_bboxes_from_angle_and_height(
+ self, gt_footprint_bboxes, gt_heights, offset_angles, nadir_angles, img_metas
+ ):
+ device = gt_footprint_bboxes[0].device
+ pseudo_gt_bboxes = []
+ pseudo_gt_roof_bboxes = []
+ pseudo_gt_offsets = []
+ for bboxes, heights, o_angle, n_angle, img_meta in zip(
+ gt_footprint_bboxes, gt_heights, offset_angles, nadir_angles, img_metas
+ ):
+ o_angle = o_angle.detach()
+ n_angle = n_angle.detach()
+
+ # Get the box boundary of each footprint bbox.
+ footprint_x_tl = torch.unsqueeze(bboxes[:, 0], 1)
+ footprint_y_tl = torch.unsqueeze(bboxes[:, 1], 1)
+ footprint_x_br = torch.unsqueeze(bboxes[:, 2], 1)
+ footprint_y_br = torch.unsqueeze(bboxes[:, 3], 1)
+
+ # Calculate the offset.
+ offset_norm = heights * n_angle / self.resolution
+ offset_norm /= math.sqrt(o_angle[0] ** 2 + o_angle[1] ** 2)
+ offset_norm *= self.offset_scale
+ offset_x = offset_norm * o_angle[1]
+ offset_y = offset_norm * o_angle[0]
+ offset = torch.cat((offset_x, offset_y), dim=1)
+
+ # Get the box boundary of each roof bbox.
+ roof_x_tl = footprint_x_tl + offset_x
+ roof_y_tl = footprint_y_tl + offset_y
+ roof_x_br = footprint_x_br + offset_x
+ roof_y_br = footprint_y_br + offset_y
+
+ # Get the box center of each roof bbox.
+ roof_x_c = (roof_x_tl + roof_x_br) / 2
+ roof_y_c = (roof_y_tl + roof_y_br) / 2
+
+ # Get the wh of each roof bbox.
+ roof_w = roof_x_br - roof_x_tl
+ roof_h = roof_y_br - roof_y_tl
+
+            # Scale the box boundary of each roof bbox.
+ roof_x_tl = roof_x_c - roof_w / 2 * self.pseudo_rpn_bbox_scale
+ roof_x_br = roof_x_c + roof_w / 2 * self.pseudo_rpn_bbox_scale
+ roof_y_tl = roof_y_c - roof_h / 2 * self.pseudo_rpn_bbox_scale
+ roof_y_br = roof_y_c + roof_h / 2 * self.pseudo_rpn_bbox_scale
+
+ # Get the box boundary of each building bbox.
+ bbox_x_tl = torch.min(torch.cat((footprint_x_tl, roof_x_tl), 1), 1, keepdim=True)[0]
+ bbox_y_tl = torch.min(torch.cat((footprint_y_tl, roof_y_tl), 1), 1, keepdim=True)[0]
+ bbox_x_br = torch.max(torch.cat((footprint_x_br, roof_x_br), 1), 1, keepdim=True)[0]
+ bbox_y_br = torch.max(torch.cat((footprint_y_br, roof_y_br), 1), 1, keepdim=True)[0]
+
+ # Calculate the image boundary of each image.
+ img_x_min = torch.zeros(footprint_x_tl.shape, device=device)
+ img_y_min = torch.zeros(footprint_y_tl.shape, device=device)
+ img_x_max = torch.ones(footprint_x_tl.shape, device=device)
+ img_y_max = torch.ones(footprint_y_tl.shape, device=device)
+
+ img_x_max *= img_meta["img_shape"][1]
+ img_y_max *= img_meta["img_shape"][0]
+
+ # Clip the building bbox with image.
+ bbox_x_tl = torch.max(torch.cat((bbox_x_tl, img_x_min), 1), 1, keepdim=True)[0]
+ bbox_y_tl = torch.max(torch.cat((bbox_y_tl, img_y_min), 1), 1, keepdim=True)[0]
+ bbox_x_br = torch.min(torch.cat((bbox_x_br, img_x_max), 1), 1, keepdim=True)[0]
+ bbox_y_br = torch.min(torch.cat((bbox_y_br, img_y_max), 1), 1, keepdim=True)[0]
+
+ pseudo_gt_bboxes.append(torch.concat((bbox_x_tl, bbox_y_tl, bbox_x_br, bbox_y_br), 1))
+ pseudo_gt_roof_bboxes.append(
+ torch.concat((roof_x_tl, roof_y_tl, roof_x_br, roof_y_br), 1)
+ )
+ pseudo_gt_offsets.append(offset)
+
+ return pseudo_gt_bboxes, pseudo_gt_offsets
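# Illustrative sketch (not part of the patch): the offset recovered above from
# height and the two angle heads, with assumed values. The magnitude is
# height * nadir_angle / resolution, and (o_angle[1], o_angle[0]) gives the
# (x, y) direction after normalization to unit length.
import math

height, nadir, resolution, offset_scale = 20.0, 0.5, 0.6, 1.0  # assumed values
o_angle = (0.6, 0.8)  # (y, x) components from the offset angle head
norm = height * nadir / resolution / math.sqrt(o_angle[0] ** 2 + o_angle[1] ** 2)
norm *= offset_scale
offset_x, offset_y = norm * o_angle[1], norm * o_angle[0]
print(offset_x, offset_y)  # ~13.33 and ~10.0 pixels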
+
+ def show_result(
+ self,
+ img,
+ result,
+ score_thr=0.8,
+ bbox_color="green",
+ text_color="green",
+ thickness=1,
+ font_scale=0.5,
+ win_name="",
+ show=False,
+ wait_time=0,
+ out_file=None,
+ ):
+ img = mmcv.imread(img)
+ img = img.copy()
+ if isinstance(result, tuple):
+ if self.with_vis_feat:
+ bbox_result, segm_result, offset, offset_features = result
+ else:
+ bbox_result, segm_result, offset = result
+ if isinstance(segm_result, tuple):
+ segm_result = segm_result[0] # ms rcnn
+ else:
+            bbox_result, segm_result, offset = result, None, None  # avoid NameError below
+ if isinstance(offset, tuple):
+ offsets = offset[0]
+ else:
+ offsets = offset
+
+ # rotate offset
+ # offsets = self.offset_rotate(offsets, 0)
+
+ bboxes = np.vstack(bbox_result)
+ scores = bboxes[:, -1]
+ bboxes = bboxes[:, 0:-1]
+
+ w, h = bboxes[:, 2] - bboxes[:, 0], bboxes[:, 3] - bboxes[:, 1]
+ area = w * h
+ # valid_inds = np.argsort(area, axis=0)[::-1].squeeze()
+ valid_inds = np.where(np.sqrt(area) > 50)[0]
+
+ if segm_result is not None: # non empty
+ segms = mmcv.concat_list(segm_result)
+            inds = np.where(scores > 0.4)[0]
+
+ masks = []
+ offset_results = []
+ bbox_results = []
+ offset_feats = []
+ for i in inds:
+ if i not in valid_inds:
+ continue
+ mask = segms[i]
+ offset = offsets[i]
+ if self.with_vis_feat:
+ offset_feat = offset_features[i]
+ else:
+ offset_feat = []
+ bbox = bboxes[i]
+
+ gray = np.array(mask * 255, dtype=np.uint8)
+
+ contours = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
+ contours = contours[0] if len(contours) == 2 else contours[1]
+
+                if len(contours) > 0:
+ cnt = max(contours, key=cv2.contourArea)
+ mask = np.array(cnt).reshape(1, -1).tolist()[0]
+ else:
+ continue
+
+ masks.append(mask)
+ offset_results.append(offset)
+ bbox_results.append(bbox)
+ offset_feats.append(offset_feat)
+
+ def offset_coordinate_transform(self, offset, transform_flag="xy2la"):
+        """Transform offsets between rectangular (xy) and polar (la) coordinates.
+
+ Args:
+ offset (list): list of offset
+ transform_flag (str, optional): flag of transform. Defaults to 'xy2la'.
+
+        Raises:
+            NotImplementedError: if transform_flag is neither 'xy2la' nor 'la2xy'
+
+ Returns:
+ list: transformed offsets
+ """
+ if transform_flag == "xy2la":
+ offset_x, offset_y = offset
+ length = math.sqrt(offset_x**2 + offset_y**2)
+ angle = math.atan2(offset_y, offset_x)
+ offset = [length, angle]
+ elif transform_flag == "la2xy":
+ length, angle = offset
+ offset_x = length * np.cos(angle)
+ offset_y = length * np.sin(angle)
+ offset = [offset_x, offset_y]
+ else:
+ raise NotImplementedError
+
+ return offset
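# Illustrative sketch (not part of the patch): a round-trip check of the same
# math as offset_coordinate_transform, rewritten standalone.
import math

def xy2la(offset):
    x, y = offset
    return [math.sqrt(x ** 2 + y ** 2), math.atan2(y, x)]

def la2xy(offset):
    length, angle = offset
    return [length * math.cos(angle), length * math.sin(angle)]

assert [round(v, 6) for v in la2xy(xy2la([3.0, 4.0]))] == [3.0, 4.0]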
+
+ def offset_rotate(self, offsets, rotate_angle):
+ offsets = [
+ self.offset_coordinate_transform(offset, transform_flag="xy2la") for offset in offsets
+ ]
+
+ offsets = [[offset[0], offset[1] + rotate_angle * np.pi / 180.0] for offset in offsets]
+
+ offsets = [
+ self.offset_coordinate_transform(offset, transform_flag="la2xy") for offset in offsets
+ ]
+
+ return np.array(offsets, dtype=np.float32)
+
+
+def visualize_bboxes_offset_angles(
+ img,
+ img_metas,
+ gt_bboxes,
+ pseudo_gt_bboxes,
+ footprint_bboxes,
+ gt_offset_angles,
+ pred_offset_angles,
+ gt_nadir_angles,
+ pred_nadir_angles,
+ is_semi_supervised_batch,
+ is_valid_height_batch,
+):
+    """Visualize ground-truth or pseudo bboxes and predicted angles for debugging."""
+    save_path = "tmp/RPN_bboxes_visualization_4/"
+ rank, _ = get_dist_info()
+ if rank == 0:
+        os.makedirs(save_path, exist_ok=True)
+
+ mean = np.array([123.675, 116.28, 103.53], dtype=np.float64).reshape(1, -1)
+ std = np.array([58.395, 57.12, 57.375], dtype=np.float64).reshape(1, -1)
+
+ img_batch = img.detach().cpu().numpy().copy()
+
+ footprint_bboxes = [sb.detach().cpu().numpy().astype(np.int32) for sb in footprint_bboxes]
+
+ if gt_bboxes:
+ gt_bboxes = [sb.detach().cpu().numpy().astype(np.int32) for sb in gt_bboxes]
+ else:
+ gt_bboxes = [None for _ in footprint_bboxes]
+
+ if pseudo_gt_bboxes:
+ pseudo_gt_bboxes = [sb.detach().cpu().numpy().astype(np.int32) for sb in pseudo_gt_bboxes]
+ else:
+ pseudo_gt_bboxes = [None for _ in footprint_bboxes]
+
+ if pred_offset_angles is None:
+ pred_offset_angles = [None for _ in footprint_bboxes]
+
+ if pred_nadir_angles is None:
+ pred_nadir_angles = [None for _ in footprint_bboxes]
+
+ for img, sbs, spbs, sfbs, goa, poa, gna, pna, img_meta in zip(
+ img_batch,
+ gt_bboxes,
+ pseudo_gt_bboxes,
+ footprint_bboxes,
+ gt_offset_angles,
+ pred_offset_angles,
+ gt_nadir_angles,
+ pred_nadir_angles,
+ img_metas,
+ ):
+ img = np.ascontiguousarray(img.transpose(1, 2, 0))
+ cv2.multiply(img, std, img)
+ cv2.add(img, mean, img)
+ cv2.cvtColor(img, cv2.COLOR_RGB2BGR, img)
+ img = img.astype(np.uint8).copy()
+
+ if sbs is not None:
+ pts = [
+ np.array([[sb[0], sb[1]], [sb[2], sb[1]], [sb[2], sb[3]], [sb[0], sb[3]]])
+ for sb in sbs
+ ]
+ cv2.polylines(img, pts, True, (144, 238, 144), 2)
+
+ if spbs is not None:
+ pts = [
+ np.array([[sb[0], sb[1]], [sb[2], sb[1]], [sb[2], sb[3]], [sb[0], sb[3]]])
+ for sb in spbs
+ ]
+ cv2.polylines(img, pts, True, (0, 0, 200), 2)
+
+ pts = [
+ np.array([[sb[0], sb[1]], [sb[2], sb[1]], [sb[2], sb[3]], [sb[0], sb[3]]])
+ for sb in sfbs
+ ]
+ cv2.polylines(img, pts, True, (200, 0, 0), 2)
+
+ canvas = np.zeros_like(img, dtype=np.uint8)
+ go = _get_offset_from_offset_angle(goa[0])
+ cv2.polylines(canvas, go, True, (0, 0, 255), 4)
+ if poa is not None:
+ po = _get_offset_from_offset_angle(poa)
+ cv2.polylines(canvas, po, True, (0, 255, 0), 4)
+ gn = _get_offset_from_nadir_angle(gna[0])
+ cv2.polylines(canvas, gn, True, (0, 255, 255), 4)
+ if pna is not None:
+ pn = _get_offset_from_nadir_angle(pna)
+ cv2.polylines(canvas, pn, True, (128, 0, 0), 4)
+ cv2.circle(canvas, (512, 512), 500, (255, 255, 255))
+
+ img = np.concatenate((img, canvas), axis=1)
+
+ file_name = img_meta["filename"].rsplit("/", 1)[1]
+ file_name_split = file_name.rsplit(".", 1)
+ file_name_split[0] += "_pseudo" if is_semi_supervised_batch else "_gt"
+ file_name_split[0] += "_vh" if is_valid_height_batch else "_ih"
+ file_name = save_path + ".".join((file_name_split[0], file_name_split[1]))
+ cv2.imwrite(file_name, img)
+
+
+def _get_offset_from_offset_angle(angle):
+ angle = angle.detach().cpu().numpy().astype(np.float32)
+ start_x, start_y = [512, 512]
+ norm = 500
+ offset_y, offset_x = angle[0] * norm, angle[1] * norm
+ stop_point = [start_x + offset_x, start_y + offset_y]
+ return [np.array([[start_x, start_y], stop_point], dtype=np.int32)]
+
+
+def _get_offset_from_nadir_angle(angle):
+ angle = angle.detach().cpu().numpy().astype(np.float32)
+ start_x, start_y = [512, 512]
+ norm = 500
+    angle = float(angle)  # the nadir head predicts a single scalar per image
+    angle_norm = math.sqrt(angle**2 + 1.0)
+    angle = [angle / angle_norm, 1.0 / angle_norm]
+ offset_y, offset_x = angle[0] * norm, angle[1] * norm
+ stop_point = [start_x + offset_x, start_y + offset_y]
+ return [np.array([[start_x, start_y], stop_point], dtype=np.int32)]
+
+
+def _offset_footprint_to_roof(offsets, footprints):
+    # Translate footprint masks by the pseudo offsets to obtain roof masks,
+    # reusing the nested-list interface expected by offset_roof_to_footprint.
+    footprints_ = [[[e for e in m.masks]] for m in footprints]
+ h, w = footprints_[0][0][0].shape
+ roofs = offset_roof_to_footprint(offsets, footprints_)
+ return [BitmapMasks(np.array(r)[0], h, w) for r in roofs]
diff --git a/mmdet/models/detectors/two_stage.py b/mmdet/models/detectors/two_stage.py
index 870e2b84..76fd8dfc 100644
--- a/mmdet/models/detectors/two_stage.py
+++ b/mmdet/models/detectors/two_stage.py
@@ -15,19 +15,22 @@ class TwoStageDetector(BaseDetector):
task-specific regression head.
"""
- def __init__(self,
- backbone,
- neck=None,
- rpn_head=None,
- roi_head=None,
- train_cfg=None,
- test_cfg=None,
- pretrained=None,
- init_cfg=None):
+ def __init__(
+ self,
+ backbone,
+ neck=None,
+ rpn_head=None,
+ roi_head=None,
+ train_cfg=None,
+ test_cfg=None,
+ pretrained=None,
+ init_cfg=None,
+ ):
super(TwoStageDetector, self).__init__(init_cfg)
if pretrained:
- warnings.warn('DeprecationWarning: pretrained is deprecated, '
- 'please use "init_cfg" instead')
+ warnings.warn(
+ "DeprecationWarning: pretrained is deprecated, " 'please use "init_cfg" instead'
+ )
backbone.pretrained = pretrained
self.backbone = build_backbone(backbone)
@@ -55,12 +58,12 @@ def __init__(self,
@property
def with_rpn(self):
"""bool: whether the detector has RPN"""
- return hasattr(self, 'rpn_head') and self.rpn_head is not None
+ return hasattr(self, "rpn_head") and self.rpn_head is not None
@property
def with_roi_head(self):
"""bool: whether the detector has a RoI head"""
- return hasattr(self, 'roi_head') and self.roi_head is not None
+ return hasattr(self, "roi_head") and self.roi_head is not None
def extract_feat(self, img):
"""Directly extract features from the backbone+neck."""
@@ -80,22 +83,24 @@ def forward_dummy(self, img):
# rpn
if self.with_rpn:
rpn_outs = self.rpn_head(x)
- outs = outs + (rpn_outs, )
+ outs = outs + (rpn_outs,)
proposals = torch.randn(1000, 4).to(img.device)
# roi_head
roi_outs = self.roi_head.forward_dummy(x, proposals)
- outs = outs + (roi_outs, )
+ outs = outs + (roi_outs,)
return outs
- def forward_train(self,
- img,
- img_metas,
- gt_bboxes,
- gt_labels,
- gt_bboxes_ignore=None,
- gt_masks=None,
- proposals=None,
- **kwargs):
+ def forward_train(
+ self,
+ img,
+ img_metas,
+ gt_bboxes,
+ gt_labels,
+ gt_bboxes_ignore=None,
+ gt_masks=None,
+ proposals=None,
+ **kwargs,
+ ):
"""
Args:
img (Tensor): of shape (N, C, H, W) encoding input images.
@@ -130,8 +135,7 @@ def forward_train(self,
# RPN forward and loss
if self.with_rpn:
- proposal_cfg = self.train_cfg.get('rpn_proposal',
- self.test_cfg.rpn)
+ proposal_cfg = self.train_cfg.get("rpn_proposal", self.test_cfg.rpn)
rpn_losses, proposal_list = self.rpn_head.forward_train(
x,
img_metas,
@@ -139,49 +143,42 @@ def forward_train(self,
gt_labels=None,
gt_bboxes_ignore=gt_bboxes_ignore,
proposal_cfg=proposal_cfg,
- **kwargs)
+ **kwargs,
+ )
losses.update(rpn_losses)
else:
proposal_list = proposals
- roi_losses = self.roi_head.forward_train(x, img_metas, proposal_list,
- gt_bboxes, gt_labels,
- gt_bboxes_ignore, gt_masks,
- **kwargs)
+ roi_losses = self.roi_head.forward_train(
+ x, img_metas, proposal_list, gt_bboxes, gt_labels, gt_bboxes_ignore, gt_masks, **kwargs
+ )
losses.update(roi_losses)
return losses
- async def async_simple_test(self,
- img,
- img_meta,
- proposals=None,
- rescale=False):
+ async def async_simple_test(self, img, img_meta, proposals=None, rescale=False):
"""Async test without augmentation."""
- assert self.with_bbox, 'Bbox head must be implemented.'
+ assert self.with_bbox, "Bbox head must be implemented."
x = self.extract_feat(img)
if proposals is None:
- proposal_list = await self.rpn_head.async_simple_test_rpn(
- x, img_meta)
+ proposal_list = await self.rpn_head.async_simple_test_rpn(x, img_meta)
else:
proposal_list = proposals
- return await self.roi_head.async_simple_test(
- x, proposal_list, img_meta, rescale=rescale)
+ return await self.roi_head.async_simple_test(x, proposal_list, img_meta, rescale=rescale)
def simple_test(self, img, img_metas, proposals=None, rescale=False):
"""Test without augmentation."""
- assert self.with_bbox, 'Bbox head must be implemented.'
+ assert self.with_bbox, "Bbox head must be implemented."
x = self.extract_feat(img)
if proposals is None:
proposal_list = self.rpn_head.simple_test_rpn(x, img_metas)
else:
proposal_list = proposals
- return self.roi_head.simple_test(
- x, proposal_list, img_metas, rescale=rescale)
+ return self.roi_head.simple_test(x, proposal_list, img_metas, rescale=rescale)
def aug_test(self, imgs, img_metas, rescale=False):
"""Test with augmentations.
@@ -191,21 +188,19 @@ def aug_test(self, imgs, img_metas, rescale=False):
"""
x = self.extract_feats(imgs)
proposal_list = self.rpn_head.aug_test_rpn(x, img_metas)
- return self.roi_head.aug_test(
- x, proposal_list, img_metas, rescale=rescale)
+ return self.roi_head.aug_test(x, proposal_list, img_metas, rescale=rescale)
def onnx_export(self, img, img_metas):
-
img_shape = torch._shape_as_tensor(img)[2:]
- img_metas[0]['img_shape_for_onnx'] = img_shape
+ img_metas[0]["img_shape_for_onnx"] = img_shape
x = self.extract_feat(img)
proposals = self.rpn_head.onnx_export(x, img_metas)
- if hasattr(self.roi_head, 'onnx_export'):
+ if hasattr(self.roi_head, "onnx_export"):
return self.roi_head.onnx_export(x, proposals, img_metas)
else:
raise NotImplementedError(
- f'{self.__class__.__name__} can not '
- f'be exported to ONNX. Please refer to the '
- f'list of supported models,'
- f'https://mmdetection.readthedocs.io/en/latest/tutorials/pytorch2onnx.html#list-of-supported-models-exportable-to-onnx' # noqa E501
+            f"{self.__class__.__name__} cannot "
+            "be exported to ONNX. Please refer to the "
+            "list of supported models, "
+            "https://mmdetection.readthedocs.io/en/latest/tutorials/pytorch2onnx.html#list-of-supported-models-exportable-to-onnx"  # noqa E501
)
diff --git a/mmdet/models/losses/smooth_l1_loss.py b/mmdet/models/losses/smooth_l1_loss.py
index 55117467..dcf185d6 100644
--- a/mmdet/models/losses/smooth_l1_loss.py
+++ b/mmdet/models/losses/smooth_l1_loss.py
@@ -27,8 +27,7 @@ def smooth_l1_loss(pred, target, beta=1.0):
assert pred.size() == target.size()
diff = torch.abs(pred - target)
- loss = torch.where(diff < beta, 0.5 * diff * diff / beta,
- diff - 0.5 * beta)
+ loss = torch.where(diff < beta, 0.5 * diff * diff / beta, diff - 0.5 * beta)
return loss
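# Illustrative sketch (not part of the patch): a quick numeric check of the
# piecewise definition above -- quadratic for |diff| < beta, linear outside,
# and continuous at the joint.
import torch

pred = torch.tensor([0.0, 0.0, 0.0])
target = torch.tensor([0.5, 1.0, 3.0])
beta = 1.0
diff = torch.abs(pred - target)
loss = torch.where(diff < beta, 0.5 * diff * diff / beta, diff - 0.5 * beta)
print(loss)  # tensor([0.1250, 0.5000, 2.5000])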
@@ -46,7 +45,6 @@ def l1_loss(pred, target):
"""
if target.numel() == 0:
return pred.sum() * 0
-
assert pred.size() == target.size()
loss = torch.abs(pred - target)
return loss
@@ -64,19 +62,15 @@ class SmoothL1Loss(nn.Module):
loss_weight (float, optional): The weight of loss.
"""
- def __init__(self, beta=1.0, reduction='mean', loss_weight=1.0):
+ def __init__(self, beta=1.0, reduction="mean", loss_weight=1.0):
super(SmoothL1Loss, self).__init__()
self.beta = beta
self.reduction = reduction
self.loss_weight = loss_weight
- def forward(self,
- pred,
- target,
- weight=None,
- avg_factor=None,
- reduction_override=None,
- **kwargs):
+ def forward(
+ self, pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs
+ ):
"""Forward function.
Args:
@@ -90,9 +84,8 @@ def forward(self,
override the original reduction method of the loss.
Defaults to None.
"""
- assert reduction_override in (None, 'none', 'mean', 'sum')
- reduction = (
- reduction_override if reduction_override else self.reduction)
+ assert reduction_override in (None, "none", "mean", "sum")
+ reduction = reduction_override if reduction_override else self.reduction
loss_bbox = self.loss_weight * smooth_l1_loss(
pred,
target,
@@ -100,7 +93,8 @@ def forward(self,
beta=self.beta,
reduction=reduction,
avg_factor=avg_factor,
- **kwargs)
+ **kwargs,
+ )
return loss_bbox
@@ -114,17 +108,12 @@ class L1Loss(nn.Module):
loss_weight (float, optional): The weight of loss.
"""
- def __init__(self, reduction='mean', loss_weight=1.0):
+ def __init__(self, reduction="mean", loss_weight=1.0):
super(L1Loss, self).__init__()
self.reduction = reduction
self.loss_weight = loss_weight
- def forward(self,
- pred,
- target,
- weight=None,
- avg_factor=None,
- reduction_override=None):
+ def forward(self, pred, target, weight=None, avg_factor=None, reduction_override=None):
"""Forward function.
Args:
@@ -138,9 +127,9 @@ def forward(self,
override the original reduction method of the loss.
Defaults to None.
"""
- assert reduction_override in (None, 'none', 'mean', 'sum')
- reduction = (
- reduction_override if reduction_override else self.reduction)
+ assert reduction_override in (None, "none", "mean", "sum")
+ reduction = reduction_override if reduction_override else self.reduction
loss_bbox = self.loss_weight * l1_loss(
- pred, target, weight, reduction=reduction, avg_factor=avg_factor)
+ pred, target, weight, reduction=reduction, avg_factor=avg_factor
+ )
return loss_bbox
diff --git a/mmdet/models/roi_heads/__init__.py b/mmdet/models/roi_heads/__init__.py
index baae2a05..f2e47eeb 100644
--- a/mmdet/models/roi_heads/__init__.py
+++ b/mmdet/models/roi_heads/__init__.py
@@ -1,22 +1,41 @@
# Copyright (c) OpenMMLab. All rights reserved.
+from .attribute_heads import HeightHead, OffsetHead
from .base_roi_head import BaseRoIHead
-from .bbox_heads import (BBoxHead, ConvFCBBoxHead, DIIHead,
- DoubleConvFCBBoxHead, SABLHead, SCNetBBoxHead,
- Shared2FCBBoxHead, Shared4Conv1FCBBoxHead)
+from .bbox_heads import (
+ BBoxHead,
+ ConvFCBBoxHead,
+ DIIHead,
+ DoubleConvFCBBoxHead,
+ SABLHead,
+ SCNetBBoxHead,
+ Shared2FCBBoxHead,
+ Shared4Conv1FCBBoxHead,
+)
from .cascade_roi_head import CascadeRoIHead
from .double_roi_head import DoubleHeadRoIHead
from .dynamic_roi_head import DynamicRoIHead
from .grid_roi_head import GridRoIHead
from .htc_roi_head import HybridTaskCascadeRoIHead
-from .mask_heads import (CoarseMaskHead, FCNMaskHead, FeatureRelayHead,
- FusedSemanticHead, GlobalContextHead, GridHead,
- HTCMaskHead, MaskIoUHead, MaskPointHead,
- SCNetMaskHead, SCNetSemanticHead)
+from .loft_h_roi_head import LoftHRoIHead
+from .loft_hfm_roi_head import LoftHFMRoIHead
+from .loft_roi_head import LoftRoIHead
+from .mask_heads import (
+ CoarseMaskHead,
+ FCNMaskHead,
+ FeatureRelayHead,
+ FusedSemanticHead,
+ GlobalContextHead,
+ GridHead,
+ HTCMaskHead,
+ MaskIoUHead,
+ MaskPointHead,
+ SCNetMaskHead,
+ SCNetSemanticHead,
+)
from .mask_scoring_roi_head import MaskScoringRoIHead
from .pisa_roi_head import PISARoIHead
from .point_rend_roi_head import PointRendRoIHead
-from .roi_extractors import (BaseRoIExtractor, GenericRoIExtractor,
- SingleRoIExtractor)
+from .roi_extractors import BaseRoIExtractor, GenericRoIExtractor, SingleRoIExtractor
from .scnet_roi_head import SCNetRoIHead
from .shared_heads import ResLayer
from .sparse_roi_head import SparseRoIHead
@@ -24,14 +43,45 @@
from .trident_roi_head import TridentRoIHead
__all__ = [
- 'BaseRoIHead', 'CascadeRoIHead', 'DoubleHeadRoIHead', 'MaskScoringRoIHead',
- 'HybridTaskCascadeRoIHead', 'GridRoIHead', 'ResLayer', 'BBoxHead',
- 'ConvFCBBoxHead', 'DIIHead', 'SABLHead', 'Shared2FCBBoxHead',
- 'StandardRoIHead', 'Shared4Conv1FCBBoxHead', 'DoubleConvFCBBoxHead',
- 'FCNMaskHead', 'HTCMaskHead', 'FusedSemanticHead', 'GridHead',
- 'MaskIoUHead', 'BaseRoIExtractor', 'GenericRoIExtractor',
- 'SingleRoIExtractor', 'PISARoIHead', 'PointRendRoIHead', 'MaskPointHead',
- 'CoarseMaskHead', 'DynamicRoIHead', 'SparseRoIHead', 'TridentRoIHead',
- 'SCNetRoIHead', 'SCNetMaskHead', 'SCNetSemanticHead', 'SCNetBBoxHead',
- 'FeatureRelayHead', 'GlobalContextHead'
+ "BaseRoIHead",
+ "CascadeRoIHead",
+ "DoubleHeadRoIHead",
+ "MaskScoringRoIHead",
+ "HybridTaskCascadeRoIHead",
+ "GridRoIHead",
+ "ResLayer",
+ "BBoxHead",
+ "ConvFCBBoxHead",
+ "DIIHead",
+ "SABLHead",
+ "Shared2FCBBoxHead",
+ "StandardRoIHead",
+ "Shared4Conv1FCBBoxHead",
+ "DoubleConvFCBBoxHead",
+ "FCNMaskHead",
+ "HTCMaskHead",
+ "HeightHead",
+ "FusedSemanticHead",
+ "GridHead",
+ "MaskIoUHead",
+ "BaseRoIExtractor",
+ "GenericRoIExtractor",
+ "SingleRoIExtractor",
+ "PISARoIHead",
+ "PointRendRoIHead",
+ "MaskPointHead",
+ "CoarseMaskHead",
+ "DynamicRoIHead",
+ "SparseRoIHead",
+ "TridentRoIHead",
+ "SCNetRoIHead",
+ "SCNetMaskHead",
+ "SCNetSemanticHead",
+ "SCNetBBoxHead",
+ "FeatureRelayHead",
+ "GlobalContextHead",
+ "LoftRoIHead",
+ "LoftHRoIHead",
+ "OffsetHead",
+ "LoftHFMRoIHead",
]
diff --git a/mmdet/models/roi_heads/attribute_heads/__init__.py b/mmdet/models/roi_heads/attribute_heads/__init__.py
new file mode 100644
index 00000000..8e6804c7
--- /dev/null
+++ b/mmdet/models/roi_heads/attribute_heads/__init__.py
@@ -0,0 +1,16 @@
+from .angle_head import NadirAngleHead, OffsetAngleHead
+from .footprint_mask_from_roof_offset_head import FootprintMaskFromRoofOffsetHead
+from .footprint_mask_head import FootprintMaskHead
+from .height_head import HeightHead
+from .offset_head import OffsetHead
+from .offset_head_expand_feature import OffsetHeadExpandFeature
+
+__all__ = [
+ "OffsetHead",
+ "HeightHead",
+ "OffsetHeadExpandFeature",
+ "OffsetAngleHead",
+ "NadirAngleHead",
+ "FootprintMaskHead",
+ "FootprintMaskFromRoofOffsetHead",
+]
diff --git a/mmdet/models/roi_heads/attribute_heads/angle_head.py b/mmdet/models/roi_heads/attribute_heads/angle_head.py
new file mode 100644
index 00000000..e86f558c
--- /dev/null
+++ b/mmdet/models/roi_heads/attribute_heads/angle_head.py
@@ -0,0 +1,185 @@
+# -*- encoding: utf-8 -*-
+
+from functools import reduce
+
+import torch
+from mmcv.cnn import Conv2d, kaiming_init, normal_init
+from torch import nn
+
+from mmdet.models.builder import HEADS, build_loss
+
+
+@HEADS.register_module
+class OffsetAngleHead(nn.Module):
+ """This class defines the offset angle head,
+ which is used for constraining the offset and calculating pseudo gt_bboxes for RPN."""
+
+ def __init__(
+ self,
+ in_size=1024,
+ in_channels=3,
+ num_convs=6,
+ conv_out_channels=[16, 16, 32, 32, 64, 64],
+ kernel_size=3,
+ strides=[2, 2, 2, 2, 2, 2],
+ num_fcs=2,
+ fc_out_channels=[128, 32],
+ reg_num=2,
+ conv_cfg=None,
+ norm_cfg=None,
+ loss_angle=dict(type="MSELoss", loss_weight=1.0),
+ regular_lambda=0.1,
+ loss_method="loss",
+ with_tanh=True,
+ ):
+ super().__init__()
+
+ assert num_convs == len(conv_out_channels)
+ assert num_convs > 0
+ assert num_fcs == len(fc_out_channels)
+ assert num_fcs > 0
+
+ self.num_convs = num_convs
+ self.num_fcs = num_fcs
+ self.in_channels = in_channels
+ self.conv_out_channels = conv_out_channels
+ self.fc_out_channels = fc_out_channels
+ self.reg_num = reg_num
+ self.with_tanh = with_tanh
+
+ self.loss_angle = build_loss(loss_angle)
+ self.regular_lambda = regular_lambda
+ self.loss_method = getattr(self, loss_method)
+
+ self.conv_cfg = conv_cfg
+ self.norm_cfg = norm_cfg
+
+ # define the conv and fc operations
+ self.convs = nn.ModuleList()
+ conv_in_channels = [self.in_channels] + self.conv_out_channels[:-1]
+ for i in range(self.num_convs):
+ self.convs.append(
+ Conv2d(
+ in_channels=conv_in_channels[i],
+ out_channels=self.conv_out_channels[i],
+ kernel_size=kernel_size,
+ stride=strides[i],
+ padding=1,
+ )
+ )
+ fc_in_size = in_size / reduce(lambda x, y: x * y, strides)
+ fc_first_in_channel = int(fc_in_size * fc_in_size * conv_out_channels[-1])
+ fc_in_channels = [fc_first_in_channel] + self.fc_out_channels[:-1]
+
+ self.fcs = nn.ModuleList()
+ for i in range(num_fcs):
+ self.fcs.append(
+ nn.Linear(
+ in_features=fc_in_channels[i],
+ out_features=self.fc_out_channels[i],
+ )
+ )
+
+ self.fc_angle = nn.Linear(self.fc_out_channels[-1], self.reg_num)
+
+ self.relu = nn.ReLU()
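# Illustrative sketch (not part of the patch): how fc_first_in_channel falls
# out of the defaults above. Six stride-2 convs shrink a 1024x1024 input by
# 2**6 = 64, leaving a 16x16 map with 64 channels.
from functools import reduce

in_size, strides, last_channels = 1024, [2, 2, 2, 2, 2, 2], 64
fc_in_size = in_size / reduce(lambda x, y: x * y, strides)  # 16.0
fc_first_in_channel = int(fc_in_size * fc_in_size * last_channels)
print(fc_first_in_channel)  # 16384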
+
+ def init_weights(self):
+ """This method initializes the head's weights."""
+ for conv in self.convs:
+ kaiming_init(conv)
+
+ for fc in self.fcs:
+ kaiming_init(
+ module=fc,
+ a=1,
+ mode="fan_in",
+ nonlinearity="leaky_relu",
+ distribution="uniform",
+ )
+ normal_init(self.fc_angle, std=0.01)
+
+ def forward(self, x):
+ """This method defines the forward process."""
+ if x.size(0) == 0:
+ return x.new_empty(x.size(0), self.reg_num)
+
+ for conv in self.convs:
+ x = self.relu(conv(x))
+
+ x = x.view(x.size(0), -1)
+
+ for fc in self.fcs:
+ x = self.relu(fc(x))
+
+ angle_pred = self.fc_angle(x)
+
+ if self.with_tanh:
+ angle_pred = torch.tanh(angle_pred)
+
+ return angle_pred
+
+ def loss(self, offset_angle_pred, offset_angle_targets):
+ """This method defines the loss function."""
+ if offset_angle_pred.size(0) == 0:
+ loss_offset_angle = offset_angle_pred.sum() * 0
+ loss_regular = offset_angle_pred.sum() * 0
+ else:
+ loss_offset_angle = self.loss_angle(offset_angle_pred, offset_angle_targets)
+ ones = torch.ones_like(
+ offset_angle_pred[:, 0],
+ dtype=offset_angle_pred.dtype,
+ device=offset_angle_pred.device,
+ )
+ loss_regular = self.loss_angle(
+ offset_angle_pred[:, 0] ** 2 + offset_angle_pred[:, 1] ** 2, ones
+ )
+ return loss_offset_angle + self.regular_lambda * loss_regular
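# Illustrative sketch (not part of the patch): the regularizer above pushes
# the predicted direction pair toward the unit circle. With the default
# MSELoss, an off-circle prediction like (0.3, 0.3) is penalized as follows.
import torch

pred = torch.tensor([[0.3, 0.3]])  # squared norm 0.18, far from 1
ones = torch.ones_like(pred[:, 0])
loss_regular = torch.mean((pred[:, 0] ** 2 + pred[:, 1] ** 2 - ones) ** 2)
print(loss_regular)  # tensor(0.6724)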
+
+ def forward_train(self, x, img_metas, gt_offset_angles):
+ """Forward train process."""
+ offset_angle_pred = self.forward(x)
+ offset_angle_targets = torch.cat(gt_offset_angles, 0)
+ loss_offset_angle = self.loss_method(offset_angle_pred, offset_angle_targets)
+ return offset_angle_pred, dict(loss_offset_angle=loss_offset_angle)
+
+ def simple_test(self, x):
+ return self.forward(x)
+
+
+@HEADS.register_module
+class NadirAngleHead(OffsetAngleHead):
+ """This class defines the nadir angle head,
+ which is used for constraining the offset and calculating pseudo gt_bboxes for RPN."""
+
+ def forward(self, x):
+ """This method defines the forward process."""
+ if x.size(0) == 0:
+ return x.new_empty(x.size(0), self.reg_num)
+
+ for conv in self.convs:
+ x = self.relu(conv(x))
+
+ x = x.view(x.size(0), -1)
+
+ for fc in self.fcs:
+ x = self.relu(fc(x))
+
+ angle_pred = self.fc_angle(x)
+
+ return angle_pred
+
+ def loss(self, nadir_angle_pred, nadir_angle_targets):
+ """This method defines the loss function."""
+ if nadir_angle_pred.size(0) == 0:
+ loss_angle = nadir_angle_pred.sum() * 0
+ else:
+ loss_angle = self.loss_angle(nadir_angle_pred, nadir_angle_targets)
+ return loss_angle
+
+ def forward_train(self, x, img_metas, gt_nadir_angles):
+ """Forward train process."""
+ nadir_angle_pred = self.forward(x)
+ nadir_angle_targets = torch.unsqueeze(torch.cat(gt_nadir_angles, 0), dim=-1)
+ loss_nadir_angle = self.loss(nadir_angle_pred, nadir_angle_targets)
+ return nadir_angle_pred, dict(loss_nadir_angle=loss_nadir_angle)
diff --git a/mmdet/models/roi_heads/attribute_heads/footprint_mask_from_roof_offset_head.py b/mmdet/models/roi_heads/attribute_heads/footprint_mask_from_roof_offset_head.py
new file mode 100644
index 00000000..8deccd5b
--- /dev/null
+++ b/mmdet/models/roi_heads/attribute_heads/footprint_mask_from_roof_offset_head.py
@@ -0,0 +1,62 @@
+"""This file defines the head that predicts footprint masks from roof masks and offsets."""
+import torch
+from mmcv.runner import auto_fp16, force_fp32
+
+from mmdet.models.builder import HEADS
+from ..mask_heads import FCNMaskHead
+
+
+@HEADS.register_module()
+class FootprintMaskFromRoofOffsetHead(FCNMaskHead):
+ """The FootprintMaskFromRoofOffsetHead."""
+
+ @auto_fp16()
+ def forward(self, offsets, roofs):
+ x = self.concat_offsets_and_roofs(offsets, roofs)
+ for conv in self.convs:
+ x = conv(x)
+ if self.upsample is not None:
+ x = self.upsample(x)
+ if self.upsample_method == "deconv":
+ x = self.relu(x)
+ mask_pred = self.conv_logits(x)
+ return mask_pred
+
+ def concat_offsets_and_roofs(self, offsets, roofs):
+        ox = offsets[:, 0].view(-1, 1, 1, 1)
+        oy = offsets[:, 1].view(-1, 1, 1, 1)
+ offset_x_mask = torch.ones_like(roofs) * ox
+ offset_y_mask = torch.ones_like(roofs) * oy
+ x = torch.cat([offset_x_mask, offset_y_mask, roofs], dim=1)
+ return x
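# Illustrative sketch (not part of the patch): the shape bookkeeping done by
# concat_offsets_and_roofs. Each scalar offset component is broadcast into a
# constant map the size of the roof mask, so the conv stack sees a 3-channel
# (offset_x, offset_y, roof) input per RoI; the 28x28 mask size is assumed.
import torch

offsets = torch.tensor([[5.0, -2.0]])  # (N, 2)
roofs = torch.zeros(1, 1, 28, 28)      # (N, 1, H, W)
ox = offsets[:, 0].view(-1, 1, 1, 1)
oy = offsets[:, 1].view(-1, 1, 1, 1)
x = torch.cat([torch.ones_like(roofs) * ox, torch.ones_like(roofs) * oy, roofs], dim=1)
print(x.shape)  # torch.Size([1, 3, 28, 28])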
+
+ @force_fp32(apply_to=("mask_pred",))
+ def loss(self, mask_pred, mask_targets, labels):
+ """
+ Example:
+ >>> from mmdet.models.roi_heads.mask_heads.fcn_mask_head import * # NOQA
+ >>> N = 7 # N = number of extracted ROIs
+ >>> C, H, W = 11, 32, 32
+ >>> # Create example instance of FCN Mask Head.
+ >>> # There are lots of variations depending on the configuration
+ >>> self = FCNMaskHead(num_classes=C, num_convs=1)
+ >>> inputs = torch.rand(N, self.in_channels, H, W)
+ >>> mask_pred = self.forward(inputs)
+ >>> sf = self.scale_factor
+ >>> labels = torch.randint(0, C, size=(N,))
+ >>> # With the default properties the mask targets should indicate
+ >>> # a (potentially soft) single-class label
+ >>> mask_targets = torch.rand(N, H * sf, W * sf)
+ >>> loss = self.loss(mask_pred, mask_targets, labels)
+ >>> print('loss = {!r}'.format(loss))
+ """
+ loss = dict()
+ if mask_pred.size(0) == 0:
+ loss_mask = mask_pred.sum()
+ else:
+ if self.class_agnostic:
+ loss_mask = self.loss_mask(mask_pred, mask_targets, torch.zeros_like(labels))
+ else:
+ loss_mask = self.loss_mask(mask_pred, mask_targets, labels)
+ loss["loss_footprint_mask_from_roof_offset"] = loss_mask
+ return loss
diff --git a/mmdet/models/roi_heads/attribute_heads/footprint_mask_head.py b/mmdet/models/roi_heads/attribute_heads/footprint_mask_head.py
new file mode 100644
index 00000000..71921685
--- /dev/null
+++ b/mmdet/models/roi_heads/attribute_heads/footprint_mask_head.py
@@ -0,0 +1,42 @@
+"""This file defines the Footprint mask head."""
+import torch
+from mmcv.runner import force_fp32
+
+from mmdet.models.builder import HEADS
+from ..mask_heads import FCNMaskHead
+
+
+@HEADS.register_module()
+class FootprintMaskHead(FCNMaskHead):
+    """The FootprintMaskHead. The only difference from FCNMaskHead is the key in the loss dict."""
+
+ @force_fp32(apply_to=("mask_pred",))
+ def loss(self, mask_pred, mask_targets, labels):
+ """
+ Example:
+ >>> from mmdet.models.roi_heads.mask_heads.fcn_mask_head import * # NOQA
+ >>> N = 7 # N = number of extracted ROIs
+ >>> C, H, W = 11, 32, 32
+ >>> # Create example instance of FCN Mask Head.
+ >>> # There are lots of variations depending on the configuration
+ >>> self = FCNMaskHead(num_classes=C, num_convs=1)
+ >>> inputs = torch.rand(N, self.in_channels, H, W)
+ >>> mask_pred = self.forward(inputs)
+ >>> sf = self.scale_factor
+ >>> labels = torch.randint(0, C, size=(N,))
+ >>> # With the default properties the mask targets should indicate
+ >>> # a (potentially soft) single-class label
+ >>> mask_targets = torch.rand(N, H * sf, W * sf)
+ >>> loss = self.loss(mask_pred, mask_targets, labels)
+ >>> print('loss = {!r}'.format(loss))
+ """
+ loss = dict()
+ if mask_pred.size(0) == 0:
+ loss_mask = mask_pred.sum()
+ else:
+ if self.class_agnostic:
+ loss_mask = self.loss_mask(mask_pred, mask_targets, torch.zeros_like(labels))
+ else:
+ loss_mask = self.loss_mask(mask_pred, mask_targets, labels)
+ loss["loss_footprint_mask"] = loss_mask
+ return loss
diff --git a/mmdet/models/roi_heads/attribute_heads/height_head.py b/mmdet/models/roi_heads/attribute_heads/height_head.py
new file mode 100644
index 00000000..145c8ccb
--- /dev/null
+++ b/mmdet/models/roi_heads/attribute_heads/height_head.py
@@ -0,0 +1,172 @@
+# -*- encoding: utf-8 -*-
+
+import numpy as np
+import torch
+from mmcv.cnn import Conv2d, kaiming_init, normal_init
+from torch import nn
+
+from mmdet.core import build_bbox_coder, multi_apply
+from mmdet.models.builder import HEADS, build_loss
+
+
+@HEADS.register_module
+class HeightHead(nn.Module):
+ """This class defines the height head."""
+
+ def __init__(
+ self,
+ roi_feat_size=7,
+ in_channels=256,
+ num_convs=4,
+ num_fcs=2,
+ reg_num=1,
+ conv_out_channels=256,
+ fc_out_channels=1024,
+ height_coder=dict(type="DeltaHeightCoder", target_means=[0.0], target_stds=[0.5]),
+ conv_cfg=None,
+ norm_cfg=None,
+ loss_height=dict(type="MSELoss", loss_weight=1.0),
+ ):
+ super().__init__()
+
+ self.in_channels = in_channels
+ self.conv_out_channels = conv_out_channels
+ self.fc_out_channels = fc_out_channels
+ self.reg_num = reg_num
+
+ self.height_coder = build_bbox_coder(height_coder)
+ self.loss_height = build_loss(loss_height)
+
+        # TODO: Confirm whether conv_cfg and norm_cfg are used.
+ self.conv_cfg = conv_cfg
+ self.norm_cfg = norm_cfg
+
+ # define the conv and fc operations
+ self.convs = nn.ModuleList()
+ for i in range(num_convs):
+ self.convs.append(
+ Conv2d(
+ in_channels=self.in_channels if i == 0 else self.conv_out_channels,
+ out_channels=self.conv_out_channels,
+ kernel_size=3,
+ padding=1,
+ )
+ )
+
+ roi_feat_area = roi_feat_size * roi_feat_size
+
+ self.fcs = nn.ModuleList()
+ for i in range(num_fcs):
+ in_channels = self.conv_out_channels * roi_feat_area if i == 0 else self.fc_out_channels
+ self.fcs.append(
+ nn.Linear(
+ in_features=in_channels,
+ out_features=self.fc_out_channels,
+ )
+ )
+
+ self.fc_height = nn.Linear(self.fc_out_channels, self.reg_num)
+
+ self.relu = nn.ReLU()
+
+ def init_weights(self):
+ """This method initializes the head's weights."""
+ for conv in self.convs:
+ kaiming_init(conv)
+
+ for fc in self.fcs:
+ kaiming_init(
+ module=fc,
+ a=1,
+ mode="fan_in",
+ nonlinearity="leaky_relu",
+ distribution="uniform",
+ )
+ normal_init(self.fc_height, std=0.01)
+
+ def forward(self, x):
+ """This method defines the forward process."""
+ if x.size(0) == 0:
+ return x.new_empty(x.size(0), self.reg_num)
+
+ for conv in self.convs:
+ x = self.relu(conv(x))
+
+ x = x.view(x.size(0), -1)
+
+ for fc in self.fcs:
+ x = self.relu(fc(x))
+
+ height = self.fc_height(x)
+
+ return height
+
+ def loss(self, height_pred, height_targets):
+ """This method defines the loss function."""
+ if height_pred.size(0) == 0:
+ loss_height = height_pred.sum() * 0
+ else:
+ loss_height = self.loss_height(height_pred, height_targets)
+ return dict(loss_height=loss_height)
+
+ def _height_target_single(self, pos_proposals, pos_assigned_gt_inds, gt_heights, cfg):
+ device = pos_proposals.device
+ num_pos = pos_proposals.size(0)
+ height_targets = pos_proposals.new_zeros(pos_proposals.size(0), self.reg_num)
+
+ pos_gt_heights = []
+
+ if num_pos > 0:
+ pos_assigned_gt_inds = pos_assigned_gt_inds.cpu().numpy()
+ for i in range(num_pos):
+ gt_height = gt_heights[pos_assigned_gt_inds[i]]
+ pos_gt_heights.append(gt_height.tolist())
+
+            pos_gt_heights = torch.from_numpy(np.array(pos_gt_heights)).float().to(device)
+ height_targets = self.height_coder.encode(pos_proposals, pos_gt_heights)
+ else:
+ height_targets = pos_proposals.new_zeros((0, self.reg_num))
+
+ return height_targets, height_targets
+
+ def get_targets(self, sampling_results, gt_heights, rcnn_train_cfg, concat=True):
+        """Get the height regression targets in the training stage.
+
+        Args:
+            sampling_results (list[SamplingResult]): sampling results
+            gt_heights (list[Tensor]): height ground truth for each image
+            rcnn_train_cfg (dict): rcnn training config
+            concat (bool, optional): whether to concatenate targets across images.
+                Defaults to True.
+
+ Returns:
+ torch.Tensor: height targets
+ """
+ pos_proposals = [res.pos_bboxes for res in sampling_results]
+ pos_assigned_gt_inds = [res.pos_assigned_gt_inds for res in sampling_results]
+ height_targets, _ = multi_apply(
+ self._height_target_single,
+ pos_proposals,
+ pos_assigned_gt_inds,
+ gt_heights,
+ cfg=rcnn_train_cfg,
+ )
+
+ if concat:
+ height_targets = torch.cat(height_targets, 0)
+
+ return height_targets
+
+ def get_heights(self, height_pred, det_bboxes, scale_factor, rescale, img_shape=[1024, 1024]):
+        # Decode predicted heights in the inference stage.
+ if height_pred is not None:
+ heights = self.height_coder.decode(det_bboxes, height_pred)
+ else:
+ heights = torch.zeros((det_bboxes.size()[0], self.reg_num))
+
+ if isinstance(heights, torch.Tensor):
+ heights = heights.cpu().numpy()
+ assert isinstance(heights, np.ndarray)
+
+ heights = heights.astype(np.float32)
+
+ return heights
diff --git a/mmdet/models/roi_heads/attribute_heads/offset_head.py b/mmdet/models/roi_heads/attribute_heads/offset_head.py
new file mode 100644
index 00000000..174d2515
--- /dev/null
+++ b/mmdet/models/roi_heads/attribute_heads/offset_head.py
@@ -0,0 +1,235 @@
+# -*- encoding: utf-8 -*-
+"""
+@File : offset_head.py
+@Time : 2021/01/17 20:42:55
+@Author : Jinwang Wang
+@Version : 1.0
+@Contact : jwwangchn@163.com
+@License : (C)Copyright 2017-2021
+@Desc : Core codes of offset head
+"""
+
+import numpy as np
+import torch
+import torch.nn as nn
+from mmcv.cnn import kaiming_init, normal_init
+from mmcv.ops import Conv2d
+from mmcv.runner import force_fp32
+from torch.nn.modules.utils import _pair
+
+from mmdet.core import build_bbox_coder, multi_apply
+from mmdet.models.builder import HEADS, build_loss
+
+
+@HEADS.register_module
+class OffsetHead(nn.Module):
+ def __init__(
+ self,
+ roi_feat_size=7,
+ in_channels=256,
+ num_convs=4,
+ num_fcs=2,
+ reg_num=2,
+ conv_out_channels=256,
+ fc_out_channels=1024,
+ offset_coordinate="rectangle",
+ offset_coder=dict(
+ type="DeltaXYOffsetCoder", target_means=[0.0, 0.0], target_stds=[0.5, 0.5]
+ ),
+ reg_decoded_offset=False,
+ conv_cfg=None,
+ norm_cfg=None,
+ loss_offset=dict(type="MSELoss", loss_weight=1.0),
+ ):
+ super(OffsetHead, self).__init__()
+ self.in_channels = in_channels
+ self.conv_out_channels = conv_out_channels
+ self.fc_out_channels = fc_out_channels
+ self.offset_coordinate = offset_coordinate
+ self.reg_decoded_offset = reg_decoded_offset
+ self.reg_num = reg_num
+ self.conv_cfg = conv_cfg
+ self.norm_cfg = norm_cfg
+
+ self.offset_coder = build_bbox_coder(offset_coder)
+ self.loss_offset = build_loss(loss_offset)
+
+ self.convs = nn.ModuleList()
+ for i in range(num_convs):
+ in_channels = self.in_channels if i == 0 else self.conv_out_channels
+ self.convs.append(Conv2d(in_channels, self.conv_out_channels, 3, padding=1))
+
+ roi_feat_size = _pair(roi_feat_size)
+ roi_feat_area = roi_feat_size[0] * roi_feat_size[1]
+ self.fcs = nn.ModuleList()
+ for i in range(num_fcs):
+ in_channels = self.conv_out_channels * roi_feat_area if i == 0 else self.fc_out_channels
+ self.fcs.append(nn.Linear(in_channels, self.fc_out_channels))
+
+ self.fc_offset = nn.Linear(self.fc_out_channels, self.reg_num)
+ self.relu = nn.ReLU()
+ self.loss_offset = build_loss(loss_offset)
+
+ def init_weights(self):
+ for conv in self.convs:
+ kaiming_init(conv)
+ for fc in self.fcs:
+ kaiming_init(fc, a=1, mode="fan_in", nonlinearity="leaky_relu", distribution="uniform")
+ normal_init(self.fc_offset, std=0.01)
+
+ def forward(self, x):
+ # self.vis_featuremap = x.clone()
+ if x.size(0) == 0:
+ return x.new_empty(x.size(0), 2)
+ for conv in self.convs:
+ x = self.relu(conv(x))
+
+ self.vis_featuremap = x.clone()
+
+ x = x.view(x.size(0), -1)
+ # self.vis_featuremap = x.clone()
+ for fc in self.fcs:
+ x = self.relu(fc(x))
+ offset = self.fc_offset(x)
+
+ return offset
+
+ @force_fp32(apply_to=("offset_pred",))
+ def loss(self, offset_pred, offset_targets):
+ if offset_pred.size(0) == 0:
+ loss_offset = offset_pred.sum() * 0
+ else:
+ loss_offset = self.loss_offset(offset_pred, offset_targets)
+ return dict(loss_offset=loss_offset)
+
+ def _offset_target_single(self, pos_proposals, pos_assigned_gt_inds, gt_offsets, cfg):
+ device = pos_proposals.device
+ num_pos = pos_proposals.size(0)
+ offset_targets = pos_proposals.new_zeros(pos_proposals.size(0), 2)
+
+ pos_gt_offsets = []
+
+ if num_pos > 0:
+ pos_assigned_gt_inds = pos_assigned_gt_inds.cpu().numpy()
+ for i in range(num_pos):
+ gt_offset = gt_offsets[pos_assigned_gt_inds[i]]
+ pos_gt_offsets.append(gt_offset.tolist())
+
+            pos_gt_offsets = torch.from_numpy(np.array(pos_gt_offsets)).float().to(device)
+
+ if not self.reg_decoded_offset:
+ offset_targets = self.offset_coder.encode(pos_proposals, pos_gt_offsets)
+ else:
+ offset_targets = pos_gt_offsets
+ else:
+ offset_targets = pos_proposals.new_zeros((0, 2))
+
+ return offset_targets, offset_targets
+
+ def get_targets(self, sampling_results, gt_offsets, rcnn_train_cfg, concat=True):
+        """Generate offset regression targets.
+
+        Args:
+            sampling_results (list[SamplingResult]): sampling results
+            gt_offsets (list[Tensor]): offset ground truth for each image
+            rcnn_train_cfg (dict): rcnn training config
+            concat (bool, optional): whether to concatenate targets across images.
+                Defaults to True.
+
+ Returns:
+ torch.Tensor: offset targets
+ """
+ pos_proposals = [res.pos_bboxes for res in sampling_results]
+ pos_assigned_gt_inds = [res.pos_assigned_gt_inds for res in sampling_results]
+ offset_targets, _ = multi_apply(
+ self._offset_target_single,
+ pos_proposals,
+ pos_assigned_gt_inds,
+ gt_offsets,
+ cfg=rcnn_train_cfg,
+ )
+
+ if concat:
+ offset_targets = torch.cat(offset_targets, 0)
+
+ if self.reg_num == 2:
+ return offset_targets
+ elif self.reg_num == 3:
+ length = offset_targets[:, 0]
+ angle = offset_targets[:, 1]
+ angle_cos = torch.cos(angle)
+ angle_sin = torch.sin(angle)
+ offset_targets = torch.stack([length, angle_cos, angle_sin], dim=-1)
+
+ return offset_targets
+ else:
+            raise RuntimeError(f"unsupported reg_num value: {self.reg_num}")
+
+ def get_offsets(self, offset_pred, det_bboxes, scale_factor, rescale, img_shape=[1024, 1024]):
+        """Get offsets in the inference stage.
+
+        Args:
+            offset_pred (torch.Tensor): predicted offsets
+            det_bboxes (torch.Tensor): detected bboxes
+            scale_factor (float): scale factor
+            rescale (bool): rescale flag
+            img_shape (list, optional): shape of the image. Defaults to [1024, 1024].
+
+        Returns:
+            np.ndarray: decoded offsets
+ """
+ if offset_pred is not None:
+ if self.reg_num == 2:
+ offsets = self.offset_coder.decode(det_bboxes, offset_pred, max_shape=img_shape)
+ elif self.reg_num == 3:
+ length, angle_cos, angle_sin = (
+ offset_pred[:, 0],
+ offset_pred[:, 1],
+ offset_pred[:, 2],
+ )
+ angle = torch.atan2(angle_sin, angle_cos)
+
+ offset_pred = torch.stack([length, angle], dim=-1)
+
+ offsets = self.offset_coder.decode(det_bboxes, offset_pred, max_shape=img_shape)
+
+ else:
+                raise ValueError(f"unsupported reg_num value: {self.reg_num}")
+ else:
+ offsets = torch.zeros((det_bboxes.size()[0], self.reg_num))
+
+ if isinstance(offsets, torch.Tensor):
+ offsets = offsets.cpu().numpy()
+ assert isinstance(offsets, np.ndarray)
+
+ offsets = offsets.astype(np.float32)
+
+ if self.offset_coordinate == "rectangle":
+ return offsets
+ elif self.offset_coordinate == "polar":
+ length, angle = offsets[:, 0], offsets[:, 1]
+ offset_x = length * np.cos(angle)
+ offset_y = length * np.sin(angle)
+ offsets = np.stack([offset_x, offset_y], axis=-1)
+ else:
+ raise (RuntimeError(f"do not support this coordinate: {self.offset_coordinate}"))
+
+ return offsets
+
+ def get_roof_footprint_bbox_offsets(self, offset_pred, det_bboxes, img_shape=[1024, 1024]):
+ """decode the predicted offset
+
+ Args:
+ offset_pred (torch.Tensor): predicted offsets
+ det_bboxes (torch.Tensor): predicted bboxes
+ img_shape (list, optional): image shape. Defaults to [1024, 1024].
+
+ Returns:
+ np.array: decoded offsets
+ """
+ if offset_pred is not None:
+ offsets = self.offset_coder.decode(det_bboxes, offset_pred, max_shape=img_shape)
+ else:
+ offsets = torch.zeros((det_bboxes.size()[0], self.reg_num))
+
+ return offsets
diff --git a/mmdet/models/roi_heads/attribute_heads/offset_head_expand_feature.py b/mmdet/models/roi_heads/attribute_heads/offset_head_expand_feature.py
new file mode 100644
index 00000000..f98c97b2
--- /dev/null
+++ b/mmdet/models/roi_heads/attribute_heads/offset_head_expand_feature.py
@@ -0,0 +1,494 @@
+# -*- encoding: utf-8 -*-
+"""
+@File : offset_head_expand_feature.py
+@Time : 2021/01/17 20:18:09
+@Author : Jinwang Wang
+@Version : 1.0
+@Contact : jwwangchn@163.com
+@License : (C)Copyright 2017-2021
+@Desc : Main code for FOA module.
+"""
+
+import math
+
+import numpy as np
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from mmcv.cnn import kaiming_init, normal_init
+from mmcv.ops import Conv2d
+from mmcv.runner import force_fp32
+from torch.nn.modules.utils import _pair
+
+from mmdet.core import build_bbox_coder, multi_apply
+from mmdet.models.builder import HEADS, build_loss
+
+
+@HEADS.register_module
+class OffsetHeadExpandFeature(nn.Module):
+ def __init__(
+ self,
+ roi_feat_size=7,
+ in_channels=256,
+ num_convs=4,
+ num_fcs=2,
+ reg_num=2,
+ conv_out_channels=256,
+ fc_out_channels=1024,
+ expand_feature_num=4,
+ share_expand_fc=False,
+ rotations=[0, 90, 180, 270],
+ offset_coordinate="rectangle",
+ offset_coder=dict(
+ type="DeltaXYOffsetCoder", target_means=[0.0, 0.0], target_stds=[0.5, 0.5]
+ ),
+ reg_decoded_offset=False,
+ conv_cfg=None,
+ norm_cfg=None,
+ loss_offset=dict(type="MSELoss", loss_weight=1.0),
+ ):
+ super(OffsetHeadExpandFeature, self).__init__()
+ self.in_channels = in_channels
+ self.conv_out_channels = conv_out_channels
+ self.fc_out_channels = fc_out_channels
+ self.offset_coordinate = offset_coordinate
+ self.reg_decoded_offset = reg_decoded_offset
+ self.reg_num = reg_num
+ self.conv_cfg = conv_cfg
+ self.norm_cfg = norm_cfg
+        # expand_feature_num is the number of feature-transform branches
+ self.expand_feature_num = expand_feature_num
+ self.share_expand_fc = share_expand_fc
+
+ self.offset_coder = build_bbox_coder(offset_coder)
+ self.loss_offset = build_loss(loss_offset)
+        # rotation angles used by the feature transformations
+ self.rotations = rotations
+ self.flips = ["h", "v"]
+
+ # define the conv and fc operations
+ self.expand_convs = nn.ModuleList()
+ for _ in range(self.expand_feature_num):
+ convs = nn.ModuleList()
+ for i in range(num_convs):
+ in_channels = self.in_channels if i == 0 else self.conv_out_channels
+ convs.append(Conv2d(in_channels, self.conv_out_channels, 3, padding=1))
+ self.expand_convs.append(convs)
+
+ roi_feat_size = _pair(roi_feat_size)
+ roi_feat_area = roi_feat_size[0] * roi_feat_size[1]
+        if not self.share_expand_fc:
+ self.expand_fcs = nn.ModuleList()
+ for _ in range(self.expand_feature_num):
+ fcs = nn.ModuleList()
+ for i in range(num_fcs):
+ in_channels = (
+ self.conv_out_channels * roi_feat_area if i == 0 else self.fc_out_channels
+ )
+ fcs.append(nn.Linear(in_channels, self.fc_out_channels))
+ self.expand_fcs.append(fcs)
+ self.expand_fc_offsets = nn.ModuleList()
+ for _ in range(self.expand_feature_num):
+ fc_offset = nn.Linear(self.fc_out_channels, self.reg_num)
+ self.expand_fc_offsets.append(fc_offset)
+ else:
+ self.fcs = nn.ModuleList()
+ for i in range(num_fcs):
+ in_channels = (
+ self.conv_out_channels * roi_feat_area if i == 0 else self.fc_out_channels
+ )
+ self.fcs.append(nn.Linear(in_channels, self.fc_out_channels))
+
+ self.fc_offset = nn.Linear(self.fc_out_channels, self.reg_num)
+
+ self.relu = nn.ReLU()
+ self.loss_offset = build_loss(loss_offset)
+
+ def init_weights(self):
+ for convs in self.expand_convs:
+ for conv in convs:
+ kaiming_init(conv)
+        if not self.share_expand_fc:
+ for fcs in self.expand_fcs:
+ for fc in fcs:
+ kaiming_init(
+ fc, a=1, mode="fan_in", nonlinearity="leaky_relu", distribution="uniform"
+ )
+ for fc_offset in self.expand_fc_offsets:
+ normal_init(fc_offset, std=0.01)
+ else:
+ for fc in self.fcs:
+ kaiming_init(
+ fc, a=1, mode="fan_in", nonlinearity="leaky_relu", distribution="uniform"
+ )
+ normal_init(self.fc_offset, std=0.01)
+
+ def forward(self, x):
+ if x.size(0) == 0:
+ return x.new_empty(x.size(0), 2 * self.expand_feature_num)
+ input_feature = x.clone()
+ offsets = []
+ for idx in range(self.expand_feature_num):
+ x = self.expand_feature(input_feature, idx)
+ convs = self.expand_convs[idx]
+ for conv in convs:
+ x = self.relu(conv(x))
+
+ x = x.view(x.size(0), -1)
+            # use per-branch fc layers unless they are shared across branches
+            if not self.share_expand_fc:
+ fcs = self.expand_fcs[idx]
+ for fc in fcs:
+ x = self.relu(fc(x))
+ fc_offset = self.expand_fc_offsets[idx]
+ offset = fc_offset(x)
+ else:
+ for fc in self.fcs:
+ x = self.relu(fc(x))
+ offset = self.fc_offset(x)
+
+ offsets.append(offset)
+
+ offsets = torch.cat(offsets, 0)
+ return offsets
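+
+        # NOTE: the concatenation above is branch-major: the output has shape
+        # (expand_feature_num * N, reg_num), with all N predictions of branch 0
+        # first, then branch 1, and so on. get_targets builds its targets in
+        # the same order, and offset_fusion relies on this layout when
+        # splitting the predictions back into per-branch chunks.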
+
+ def expand_feature(self, feature, operation_idx):
+ """rotate the feature by operation index
+
+ Args:
+ feature (torch.Tensor): input feature map
+ operation_idx (int): operation index -> rotation angle
+
+ Returns:
+ torch.Tensor: rotated feature
+ """
+ if operation_idx < 4:
+ # rotate feature map
+ rotate_angle = self.rotations[operation_idx]
+ theta = torch.zeros(
+ (feature.size()[0], 2, 3), requires_grad=False, device=feature.device
+ )
+
+ with torch.no_grad():
+ # counterclockwise
+ angle = rotate_angle * math.pi / 180.0
+
+ theta[:, 0, 0] = torch.tensor(
+ math.cos(angle), requires_grad=False, device=feature.device
+ )
+ theta[:, 0, 1] = torch.tensor(
+ math.sin(-angle), requires_grad=False, device=feature.device
+ )
+ theta[:, 1, 0] = torch.tensor(
+ math.sin(angle), requires_grad=False, device=feature.device
+ )
+ theta[:, 1, 1] = torch.tensor(
+ math.cos(angle), requires_grad=False, device=feature.device
+ )
+
+ grid = F.affine_grid(theta, feature.size())
+ transformed_feature = F.grid_sample(feature, grid).to(feature.device)
+
+ elif operation_idx >= 4 and operation_idx < 8:
+ # rotate and flip feature map
+ raise NotImplementedError
+ else:
+ raise NotImplementedError
+
+ return transformed_feature
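+
+    # NOTE (illustrative sketch, toy values): `theta` above is the 2x3 affine
+    # matrix [[cos a, -sin a, 0], [sin a, cos a, 0]] consumed by F.affine_grid,
+    # which rotates the sampling grid (and hence the feature map) about its
+    # centre in normalized coordinates:
+    #
+    # >>> import math, torch
+    # >>> import torch.nn.functional as F
+    # >>> feat = torch.zeros(1, 1, 3, 3)
+    # >>> feat[0, 0, 0, 1] = 1.0  # hot pixel on the top edge
+    # >>> a = math.pi / 2
+    # >>> theta = torch.tensor([[[math.cos(a), -math.sin(a), 0.0],
+    # ...                        [math.sin(a), math.cos(a), 0.0]]])
+    # >>> grid = F.affine_grid(theta, feat.size(), align_corners=False)
+    # >>> out = F.grid_sample(feat, grid, align_corners=False)
+    # >>> int(out[0, 0].argmax())  # flat index 3 = row 1, col 0: left edge
+    # 3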
+
+ @force_fp32(apply_to=("offset_pred",))
+ def loss(self, offset_pred, offset_targets):
+ if offset_pred.size(0) == 0:
+ loss_offset = offset_pred.sum() * 0
+ else:
+ loss_offset = self.loss_offset(offset_pred, offset_targets)
+ return dict(loss_offset=loss_offset)
+
+ def offset_coordinate_transform(self, offset, transform_flag="xy2la"):
+ """transform the coordinate of offsets
+
+ Args:
+ offset (list): list of offset
+ transform_flag (str, optional): flag of transform. Defaults to 'xy2la'.
+
+ Returns:
+ list: transformed offsets
+ """
+ if transform_flag == "xy2la":
+ offset_x, offset_y = offset
+ length = math.sqrt(offset_x**2 + offset_y**2)
+ angle = math.atan2(offset_y, offset_x)
+ offset = [length, angle]
+ elif transform_flag == "la2xy":
+ length, angle = offset
+ offset_x = length * math.cos(angle)
+ offset_y = length * math.sin(angle)
+ offset = [offset_x, offset_y]
+ else:
+ raise NotImplementedError
+
+ return offset
+
+ def offset_rotate(self, offset, rotate_angle):
+ """rotate the offset
+
+ Args:
+ offset (np.array): input offset
+ rotate_angle (int): rotation angle
+
+ Returns:
+ np.array: rotated offset
+ """
+ offset = self.offset_coordinate_transform(offset, transform_flag="xy2la")
+ # counterclockwise
+ offset = [offset[0], offset[1] - rotate_angle * math.pi / 180.0]
+ offset = self.offset_coordinate_transform(offset, transform_flag="la2xy")
+
+ return offset
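+
+    # NOTE (worked example, values made up): rotating the offset (1, 0) by
+    # 90 degrees subtracts pi/2 from its angle:
+    #     xy2la:  (1, 0) -> length 1, angle atan2(0, 1) = 0
+    #     rotate: 0 - pi/2 = -pi/2
+    #     la2xy:  (cos(-pi/2), sin(-pi/2)) -> (0, -1)
+    # i.e. the ground-truth offset expressed in the rotated branch's frame,
+    # intended to match the counterclockwise feature rotation in expand_feature.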
+
+ def expand_gt_offset(self, gt_offset, operation_idx):
+ """rotate the ground truth of offset
+
+ Args:
+ gt_offset (np.array): offset ground truth
+ operation_idx (int): operation index
+
+ Returns:
+ np.array: rotated offset
+ """
+ if operation_idx < 4:
+ # rotate feature map
+ rotate_angle = self.rotations[operation_idx]
+ transformed_offset = self.offset_rotate(gt_offset, rotate_angle)
+ elif operation_idx >= 4 and operation_idx < 8:
+ # rotate and flip feature map
+ raise NotImplementedError
+ else:
+ raise NotImplementedError
+
+ return transformed_offset
+
+ def _offset_target_single(
+ self, pos_proposals, pos_assigned_gt_inds, gt_offsets, cfg, operation_idx
+ ):
+ # generate target of single item
+ device = pos_proposals.device
+ num_pos = pos_proposals.size(0)
+ offset_targets = pos_proposals.new_zeros(pos_proposals.size(0), 2)
+
+ pos_gt_offsets = []
+
+ if num_pos > 0:
+ pos_assigned_gt_inds = pos_assigned_gt_inds.cpu().numpy()
+ for i in range(num_pos):
+ gt_offset = gt_offsets[pos_assigned_gt_inds[i]].tolist()
+ gt_offset = self.expand_gt_offset(gt_offset, operation_idx=operation_idx)
+ pos_gt_offsets.append(gt_offset)
+
+            pos_gt_offsets = torch.from_numpy(np.array(pos_gt_offsets)).float().to(device)
+
+ if not self.reg_decoded_offset:
+                if self.rotations[operation_idx] in (90, 270):
+                    # a 90/270-degree rotation swaps the x and y axes, so
+                    # exchange them before encoding and swap back afterwards
+ offset_targets = self.offset_coder.encode(
+ pos_proposals, pos_gt_offsets[:, [1, 0]]
+ )
+ offset_targets = offset_targets[:, [1, 0]]
+ else:
+ offset_targets = self.offset_coder.encode(pos_proposals, pos_gt_offsets)
+ else:
+ offset_targets = pos_gt_offsets
+ else:
+ offset_targets = pos_proposals.new_zeros((0, 2))
+
+        # multi_apply expects each call to return a tuple, so return the
+        # targets twice; the caller discards the second copy
+        return offset_targets, offset_targets
+
+ def get_targets(self, sampling_results, gt_offsets, rcnn_train_cfg, concat=True):
+ """get the targets of offset in training stage
+
+ Args:
+ sampling_results (torch.Tensor): sampling results
+ gt_offsets (torch.Tensor): offset ground truth
+ rcnn_train_cfg (dict): rcnn training config
+ concat (bool, optional): concat flag. Defaults to True.
+
+ Returns:
+ torch.Tensor: offset targets
+ """
+ pos_proposals = [res.pos_bboxes for res in sampling_results]
+ pos_assigned_gt_inds = [res.pos_assigned_gt_inds for res in sampling_results]
+ expand_offset_targets = []
+ for idx in range(self.expand_feature_num):
+ offset_targets, _ = multi_apply(
+ self._offset_target_single,
+ pos_proposals,
+ pos_assigned_gt_inds,
+ gt_offsets,
+ cfg=rcnn_train_cfg,
+ operation_idx=idx,
+ )
+
+ if concat:
+ offset_targets = torch.cat(offset_targets, 0)
+
+ expand_offset_targets.append(offset_targets)
+
+ expand_offset_targets = torch.cat(expand_offset_targets, 0)
+ return expand_offset_targets
+
+ def offset_fusion(self, offset_pred, model="max"):
+ """Fuse the predicted offsets in inference stage
+
+ Args:
+ offset_pred (torch.Tensor): predicted offsets
+ model (str, optional): fusion model. Defaults to 'max'. Max -> keep the max of offsets, Mean -> keep the mean value of offsets.
+
+ Returns:
+ np.array: fused offsets
+ """
+ split_offsets = offset_pred.split(
+ int(offset_pred.shape[0] / self.expand_feature_num), dim=0
+ )
+ main_offsets = split_offsets[0]
+ if model == "mean":
+ # Mean model for offset fusion
+ offset_values = 0
+ for idx in range(self.expand_feature_num):
+                # 1. handle the rotation: for angles of 90 or 270 degrees
+                #    the x and y axes swap
+ if self.rotations[idx] == 90 or self.rotations[idx] == 270:
+ current_offsets = split_offsets[idx][:, [1, 0]]
+ elif self.rotations[idx] == 0 or self.rotations[idx] == 180:
+ current_offsets = split_offsets[idx]
+ else:
+ raise NotImplementedError(
+ f"rotation angle: {self.rotations[idx]} (self.rotations = {self.rotations})"
+ )
+
+ offset_values += torch.abs(current_offsets)
+            offset_values /= self.expand_feature_num  # average over the branches
+ elif model == "max":
+ # Max model for offset fusion
+ if self.expand_feature_num == 2 and self.rotations == [0, 180]:
+ offset_value_x = torch.cat(
+ [
+ split_offsets[0][:, 0].unsqueeze(dim=1),
+ split_offsets[1][:, 0].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ offset_value_y = torch.cat(
+ [
+ split_offsets[0][:, 1].unsqueeze(dim=1),
+ split_offsets[1][:, 1].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ elif self.expand_feature_num == 2 and self.rotations == [0, 90]:
+ offset_value_x = torch.cat(
+ [
+ split_offsets[0][:, 0].unsqueeze(dim=1),
+ split_offsets[1][:, 1].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ offset_value_y = torch.cat(
+ [
+ split_offsets[0][:, 1].unsqueeze(dim=1),
+ split_offsets[1][:, 0].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ elif self.expand_feature_num == 3 and self.rotations == [0, 90, 180]:
+ offset_value_x = torch.cat(
+ [
+ split_offsets[0][:, 0].unsqueeze(dim=1),
+ split_offsets[1][:, 1].unsqueeze(dim=1),
+ split_offsets[2][:, 0].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ offset_value_y = torch.cat(
+ [
+ split_offsets[0][:, 1].unsqueeze(dim=1),
+ split_offsets[1][:, 0].unsqueeze(dim=1),
+ split_offsets[2][:, 1].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ elif self.expand_feature_num == 4:
+ offset_value_x = torch.cat(
+ [
+ split_offsets[0][:, 0].unsqueeze(dim=1),
+ split_offsets[1][:, 1].unsqueeze(dim=1),
+ split_offsets[2][:, 0].unsqueeze(dim=1),
+ split_offsets[3][:, 1].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ offset_value_y = torch.cat(
+ [
+ split_offsets[0][:, 1].unsqueeze(dim=1),
+ split_offsets[1][:, 0].unsqueeze(dim=1),
+ split_offsets[2][:, 1].unsqueeze(dim=1),
+ split_offsets[3][:, 0].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ else:
+ raise NotImplementedError
+
+ offset_values = torch.cat(
+ [
+ torch.max(torch.abs(offset_value_x), dim=1)[0].unsqueeze(dim=1),
+ torch.max(torch.abs(offset_value_y), dim=1)[0].unsqueeze(dim=1),
+ ],
+ dim=1,
+ )
+ else:
+ raise NotImplementedError
+
+ offset_polarity = torch.zeros(main_offsets.size(), device=offset_pred.device)
+ offset_polarity[main_offsets > 0] = 1
+ offset_polarity[main_offsets <= 0] = -1
+
+ fused_offsets = offset_values * offset_polarity
+
+ return fused_offsets
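+
+    # NOTE (numeric sketch, values made up): with rotations [0, 90, 180, 270],
+    # a 90/270-degree feature rotation swaps the x and y axes, so the 'max'
+    # fusion compares branch 0's x with branch 1's y, branch 2's x and
+    # branch 3's y (and symmetrically for y). For per-branch predictions
+    # (2.0, -1.0), (0.5, 3.0), (-2.5, 0.8), (1.2, -0.3):
+    #     x magnitude = max(|2.0|, |3.0|, |-2.5|, |-0.3|) = 3.0
+    #     y magnitude = max(|-1.0|, |0.5|, |0.8|, |1.2|)  = 1.2
+    # and the signs come from branch 0, giving a fused offset of (3.0, -1.2).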
+
+ def get_offsets(self, offset_pred, det_bboxes, scale_factor, rescale, img_shape=[1024, 1024]):
+ # generate offsets in inference stage
+ if offset_pred is not None:
+ # fuse the predicted offsets
+ offset_pred = self.offset_fusion(offset_pred)
+            # after fusion, the offsets are ordered as (x, y) again
+ offsets = self.offset_coder.decode(det_bboxes, offset_pred, max_shape=img_shape)
+ else:
+ offsets = torch.zeros((det_bboxes.size()[0], self.reg_num))
+
+ if isinstance(offsets, torch.Tensor):
+ offsets = offsets.cpu().numpy()
+ assert isinstance(offsets, np.ndarray)
+
+ offsets = offsets.astype(np.float32)
+
+ if self.offset_coordinate == "rectangle":
+ return offsets
+ elif self.offset_coordinate == "polar":
+ length, angle = offsets[:, 0], offsets[:, 1]
+ offset_x = length * np.cos(angle)
+ offset_y = length * np.sin(angle)
+ offsets = np.stack([offset_x, offset_y], axis=-1)
+ else:
+ raise (RuntimeError(f"do not support this coordinate: {self.offset_coordinate}"))
+
+ return offsets
+
+ def get_roof_footprint_bbox_offsets(self, offset_pred, det_bboxes, img_shape=[1024, 1024]):
+ if offset_pred is not None:
+ offsets = self.offset_coder.decode(det_bboxes, offset_pred, max_shape=img_shape)
+ else:
+ offsets = torch.zeros((det_bboxes.size()[0], self.reg_num))
+
+ return offsets
diff --git a/mmdet/models/roi_heads/base_roi_head.py b/mmdet/models/roi_heads/base_roi_head.py
index 4adbdef8..ea7e59d6 100644
--- a/mmdet/models/roi_heads/base_roi_head.py
+++ b/mmdet/models/roi_heads/base_roi_head.py
@@ -9,16 +9,18 @@
class BaseRoIHead(BaseModule, metaclass=ABCMeta):
"""Base class for RoIHeads."""
- def __init__(self,
- bbox_roi_extractor=None,
- bbox_head=None,
- mask_roi_extractor=None,
- mask_head=None,
- shared_head=None,
- train_cfg=None,
- test_cfg=None,
- pretrained=None,
- init_cfg=None):
+ def __init__(
+ self,
+ bbox_roi_extractor=None,
+ bbox_head=None,
+ mask_roi_extractor=None,
+ mask_head=None,
+ shared_head=None,
+ train_cfg=None,
+ test_cfg=None,
+ pretrained=None,
+ init_cfg=None,
+ ):
super(BaseRoIHead, self).__init__(init_cfg)
self.train_cfg = train_cfg
self.test_cfg = test_cfg
@@ -37,17 +39,49 @@ def __init__(self,
@property
def with_bbox(self):
"""bool: whether the RoI head contains a `bbox_head`"""
- return hasattr(self, 'bbox_head') and self.bbox_head is not None
+ return hasattr(self, "bbox_head") and self.bbox_head is not None
@property
def with_mask(self):
"""bool: whether the RoI head contains a `mask_head`"""
- return hasattr(self, 'mask_head') and self.mask_head is not None
+ return hasattr(self, "mask_head") and self.mask_head is not None
+
+ @property
+ def with_offset(self):
+ return hasattr(self, "offset_head") and self.offset_head is not None
+
+ @property
+ def with_footprint(self):
+ return hasattr(self, "footprint_head") and self.footprint_head is not None
+
+ @property
+ def with_angle(self):
+ return hasattr(self, "angle_head") and self.angle_head is not None
+
+ @property
+ def with_height(self):
+ return hasattr(self, "height_head") and self.height_head is not None
+
+ @property
+ def with_offset_height(self):
+ return hasattr(self, "offset_height_head") and self.offset_height_head is not None
+
+ @property
+ def with_edge(self):
+ return hasattr(self, "edge_head") and self.edge_head is not None
+
+ @property
+ def with_side_face(self):
+ return hasattr(self, "side_face_head") and self.side_face_head is not None
+
+ @property
+ def with_offset_field(self):
+ return hasattr(self, "offset_field_head") and self.offset_field_head is not None
@property
def with_shared_head(self):
"""bool: whether the RoI head contains a `shared_head`"""
- return hasattr(self, 'shared_head') and self.shared_head is not None
+ return hasattr(self, "shared_head") and self.shared_head is not None
@abstractmethod
def init_bbox_head(self):
@@ -65,34 +99,35 @@ def init_assigner_sampler(self):
pass
@abstractmethod
- def forward_train(self,
- x,
- img_meta,
- proposal_list,
- gt_bboxes,
- gt_labels,
- gt_bboxes_ignore=None,
- gt_masks=None,
- **kwargs):
+ def forward_train(
+ self,
+ x,
+ img_meta,
+ proposal_list,
+ gt_bboxes,
+ gt_labels,
+ gt_bboxes_ignore=None,
+ gt_masks=None,
+ **kwargs
+ ):
"""Forward function during training."""
- async def async_simple_test(self,
- x,
- proposal_list,
- img_metas,
- proposals=None,
- rescale=False,
- **kwargs):
+ async def async_simple_test(
+ self, x, proposal_list, img_metas, proposals=None, rescale=False, **kwargs
+ ):
"""Asynchronized test function."""
raise NotImplementedError
- def simple_test(self,
- x,
- proposal_list,
- img_meta,
- proposals=None,
- rescale=False,
- **kwargs):
+ def simple_test(
+ self,
+ x,
+        proposal_list,
+ img_meta,
+ proposals=None,
+ rescale=False,
+ **kwargs
+ ):
"""Test without augmentation."""
def aug_test(self, x, proposal_list, img_metas, rescale=False, **kwargs):
diff --git a/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py b/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py
index 21124b9c..fab436f9 100644
--- a/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py
+++ b/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py
@@ -1,7 +1,9 @@
# Copyright (c) OpenMMLab. All rights reserved.
+import torch
import torch.nn as nn
from mmcv.cnn import ConvModule
+from mmdet.core import multi_apply
from mmdet.models.builder import HEADS
from mmdet.models.utils import build_linear_layer
from .bbox_head import BBoxHead
@@ -19,24 +21,32 @@ class ConvFCBBoxHead(BBoxHead):
\-> reg convs -> reg fcs -> reg
""" # noqa: W605
- def __init__(self,
- num_shared_convs=0,
- num_shared_fcs=0,
- num_cls_convs=0,
- num_cls_fcs=0,
- num_reg_convs=0,
- num_reg_fcs=0,
- conv_out_channels=256,
- fc_out_channels=1024,
- conv_cfg=None,
- norm_cfg=None,
- init_cfg=None,
- *args,
- **kwargs):
- super(ConvFCBBoxHead, self).__init__(
- *args, init_cfg=init_cfg, **kwargs)
- assert (num_shared_convs + num_shared_fcs + num_cls_convs +
- num_cls_fcs + num_reg_convs + num_reg_fcs > 0)
+ def __init__(
+ self,
+ num_shared_convs=0,
+ num_shared_fcs=0,
+ num_cls_convs=0,
+ num_cls_fcs=0,
+ num_reg_convs=0,
+ num_reg_fcs=0,
+ conv_out_channels=256,
+ fc_out_channels=1024,
+ conv_cfg=None,
+ norm_cfg=None,
+ init_cfg=None,
+ *args,
+ **kwargs
+ ):
+ super(ConvFCBBoxHead, self).__init__(*args, init_cfg=init_cfg, **kwargs)
+ assert (
+ num_shared_convs
+ + num_shared_fcs
+ + num_cls_convs
+ + num_cls_fcs
+ + num_reg_convs
+ + num_reg_fcs
+ > 0
+ )
if num_cls_convs > 0 or num_reg_convs > 0:
assert num_shared_fcs == 0
if not self.with_cls:
@@ -55,21 +65,20 @@ def __init__(self,
self.norm_cfg = norm_cfg
# add shared convs and fcs
- self.shared_convs, self.shared_fcs, last_layer_dim = \
- self._add_conv_fc_branch(
- self.num_shared_convs, self.num_shared_fcs, self.in_channels,
- True)
+ self.shared_convs, self.shared_fcs, last_layer_dim = self._add_conv_fc_branch(
+ self.num_shared_convs, self.num_shared_fcs, self.in_channels, True
+ )
self.shared_out_channels = last_layer_dim
# add cls specific branch
- self.cls_convs, self.cls_fcs, self.cls_last_dim = \
- self._add_conv_fc_branch(
- self.num_cls_convs, self.num_cls_fcs, self.shared_out_channels)
+ self.cls_convs, self.cls_fcs, self.cls_last_dim = self._add_conv_fc_branch(
+ self.num_cls_convs, self.num_cls_fcs, self.shared_out_channels
+ )
# add reg specific branch
- self.reg_convs, self.reg_fcs, self.reg_last_dim = \
- self._add_conv_fc_branch(
- self.num_reg_convs, self.num_reg_fcs, self.shared_out_channels)
+ self.reg_convs, self.reg_fcs, self.reg_last_dim = self._add_conv_fc_branch(
+ self.num_reg_convs, self.num_reg_fcs, self.shared_out_channels
+ )
if self.num_shared_fcs == 0 and not self.with_avg_pool:
if self.num_cls_fcs == 0:
@@ -85,16 +94,13 @@ def __init__(self,
else:
cls_channels = self.num_classes + 1
self.fc_cls = build_linear_layer(
- self.cls_predictor_cfg,
- in_features=self.cls_last_dim,
- out_features=cls_channels)
+ self.cls_predictor_cfg, in_features=self.cls_last_dim, out_features=cls_channels
+ )
if self.with_reg:
- out_dim_reg = (4 if self.reg_class_agnostic else 4 *
- self.num_classes)
+ out_dim_reg = 4 if self.reg_class_agnostic else 4 * self.num_classes
self.fc_reg = build_linear_layer(
- self.reg_predictor_cfg,
- in_features=self.reg_last_dim,
- out_features=out_dim_reg)
+ self.reg_predictor_cfg, in_features=self.reg_last_dim, out_features=out_dim_reg
+ )
if init_cfg is None:
# when init_cfg is None,
@@ -106,20 +112,13 @@ def __init__(self,
# for `shared_fcs`, `cls_fcs` and `reg_fcs`
self.init_cfg += [
dict(
- type='Xavier',
- distribution='uniform',
- override=[
- dict(name='shared_fcs'),
- dict(name='cls_fcs'),
- dict(name='reg_fcs')
- ])
+ type="Xavier",
+ distribution="uniform",
+ override=[dict(name="shared_fcs"), dict(name="cls_fcs"), dict(name="reg_fcs")],
+ )
]
- def _add_conv_fc_branch(self,
- num_branch_convs,
- num_branch_fcs,
- in_channels,
- is_shared=False):
+ def _add_conv_fc_branch(self, num_branch_convs, num_branch_fcs, in_channels, is_shared=False):
"""Add shared or separable branch.
convs -> avg pool (optional) -> fcs
@@ -129,8 +128,7 @@ def _add_conv_fc_branch(self,
branch_convs = nn.ModuleList()
if num_branch_convs > 0:
for i in range(num_branch_convs):
- conv_in_channels = (
- last_layer_dim if i == 0 else self.conv_out_channels)
+ conv_in_channels = last_layer_dim if i == 0 else self.conv_out_channels
branch_convs.append(
ConvModule(
conv_in_channels,
@@ -138,21 +136,20 @@ def _add_conv_fc_branch(self,
3,
padding=1,
conv_cfg=self.conv_cfg,
- norm_cfg=self.norm_cfg))
+ norm_cfg=self.norm_cfg,
+ )
+ )
last_layer_dim = self.conv_out_channels
# add branch specific fc layers
branch_fcs = nn.ModuleList()
if num_branch_fcs > 0:
# for shared branch, only consider self.with_avg_pool
# for separated branches, also consider self.num_shared_fcs
- if (is_shared
- or self.num_shared_fcs == 0) and not self.with_avg_pool:
+ if (is_shared or self.num_shared_fcs == 0) and not self.with_avg_pool:
last_layer_dim *= self.roi_feat_area
for i in range(num_branch_fcs):
- fc_in_channels = (
- last_layer_dim if i == 0 else self.fc_out_channels)
- branch_fcs.append(
- nn.Linear(fc_in_channels, self.fc_out_channels))
+ fc_in_channels = last_layer_dim if i == 0 else self.fc_out_channels
+ branch_fcs.append(nn.Linear(fc_in_channels, self.fc_out_channels))
last_layer_dim = self.fc_out_channels
return branch_convs, branch_fcs, last_layer_dim
@@ -199,7 +196,6 @@ def forward(self, x):
@HEADS.register_module()
class Shared2FCBBoxHead(ConvFCBBoxHead):
-
def __init__(self, fc_out_channels=1024, *args, **kwargs):
super(Shared2FCBBoxHead, self).__init__(
num_shared_convs=0,
@@ -210,12 +206,12 @@ def __init__(self, fc_out_channels=1024, *args, **kwargs):
num_reg_fcs=0,
fc_out_channels=fc_out_channels,
*args,
- **kwargs)
+ **kwargs
+ )
@HEADS.register_module()
class Shared4Conv1FCBBoxHead(ConvFCBBoxHead):
-
def __init__(self, fc_out_channels=1024, *args, **kwargs):
super(Shared4Conv1FCBBoxHead, self).__init__(
num_shared_convs=4,
@@ -226,4 +222,5 @@ def __init__(self, fc_out_channels=1024, *args, **kwargs):
num_reg_fcs=0,
fc_out_channels=fc_out_channels,
*args,
- **kwargs)
+ **kwargs
+ )
diff --git a/mmdet/models/roi_heads/loft_h_roi_head.py b/mmdet/models/roi_heads/loft_h_roi_head.py
new file mode 100644
index 00000000..aff9005f
--- /dev/null
+++ b/mmdet/models/roi_heads/loft_h_roi_head.py
@@ -0,0 +1,269 @@
+# -*- encoding: utf-8 -*-
+
+import torch
+
+from mmdet.core import bbox2result, bbox2roi
+from ..builder import HEADS, build_head, build_roi_extractor
+from .standard_roi_head import StandardRoIHead
+from .test_mixins import HeightTestMixin, OffsetTestMixin
+
+
+@HEADS.register_module()
+class LoftHRoIHead(StandardRoIHead, OffsetTestMixin, HeightTestMixin):
+ def __init__(
+ self,
+ offset_roi_extractor=None,
+ offset_head=None,
+ height_roi_extractor=None,
+ height_head=None,
+ **kwargs,
+ ):
+ assert offset_head is not None
+ assert height_head is not None
+ super().__init__(**kwargs)
+
+ self.init_offset_head(offset_roi_extractor, offset_head)
+ self.init_height_head(height_roi_extractor, height_head)
+
+ self.with_vis_feat = False
+
+ def init_offset_head(self, offset_roi_extractor, offset_head):
+ """Build offset roi extractor and offset head."""
+ self.offset_roi_extractor = build_roi_extractor(offset_roi_extractor)
+ self.offset_head = build_head(offset_head)
+
+ def init_height_head(self, height_roi_extractor, height_head):
+ """Build height roi extractor and height head."""
+ self.height_roi_extractor = build_roi_extractor(height_roi_extractor)
+ self.height_head = build_head(height_head)
+
+ # def init_weights(self, pretrained):
+ # super(LoftRoIHead, self).init_weights(pretrained)
+ # self.offset_head.init_weights()
+ def init_weights(self):
+ super().init_weights()
+ self.offset_head.init_weights()
+ self.height_head.init_weights()
+
+ def forward_train(
+ self,
+ x,
+ img_metas,
+ proposal_list,
+ gt_bboxes,
+ gt_labels,
+ gt_bboxes_ignore=None,
+ gt_masks=None,
+ gt_offsets=None,
+ gt_heights=None,
+ ):
+ """
+ Args:
+ x (list[Tensor]): list of multi-level img features.
+
+ img_metas (list[dict]): list of image info dict where each dict
+ has: 'img_shape', 'scale_factor', 'flip', and may also contain
+ 'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
+ For details on the values of these keys see
+ `mmdet/datasets/pipelines/formatting.py:Collect`.
+
+ proposals (list[Tensors]): list of region proposals.
+
+ gt_bboxes (list[Tensor]): each item are the truth boxes for each
+ image in [tl_x, tl_y, br_x, br_y] format.
+
+ gt_labels (list[Tensor]): class indices corresponding to each box
+
+ gt_bboxes_ignore (None | list[Tensor]): specify which bounding
+ boxes can be ignored when computing the loss.
+
+ gt_masks (None | Tensor) : true segmentation masks for each box
+ used if the architecture supports a segmentation task.
+            gt_heights (None | list[Tensor]): each item are the truth heights
+                for each image.
+
+ Returns:
+ dict[str, Tensor]: a dictionary of loss components
+ """
+ # assign gts and sample proposals
+ if self.with_bbox or self.with_mask:
+ num_imgs = len(img_metas)
+ if gt_bboxes_ignore is None:
+ gt_bboxes_ignore = [None for _ in range(num_imgs)]
+ sampling_results = []
+ for i in range(num_imgs):
+ assign_result = self.bbox_assigner.assign(
+ proposal_list[i], gt_bboxes[i], gt_bboxes_ignore[i], gt_labels[i]
+ )
+ sampling_result = self.bbox_sampler.sample(
+ assign_result,
+ proposal_list[i],
+ gt_bboxes[i],
+ gt_labels[i],
+ feats=[lvl_feat[i][None] for lvl_feat in x],
+ )
+ sampling_results.append(sampling_result)
+
+ losses = dict()
+ # bbox head forward and loss
+ if self.with_bbox:
+ bbox_results = self._bbox_forward_train(
+ x, sampling_results, gt_bboxes, gt_labels, img_metas
+ )
+ losses.update(bbox_results["loss_bbox"])
+
+ # mask head forward and loss
+ if self.with_mask:
+ mask_results = self._mask_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_masks, img_metas
+ )
+            # TODO: Support empty tensor input. #2280
+ if mask_results["loss_mask"] is not None:
+ losses.update(mask_results["loss_mask"])
+
+ # offset head forward and loss
+ if self.with_offset:
+ # print("mask_results['mask_pred']: ", mask_results['mask_pred'].shape)
+ # print("mask_results['mask_targets']: ", mask_results['mask_targets'].shape)
+ # print("bbox_results['bbox_feats']: ", bbox_results['bbox_feats'].shape)
+ offset_results = self._offset_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_offsets, img_metas
+ )
+            # TODO: Support empty tensor input. #2280
+ if offset_results["loss_offset"] is not None:
+ losses.update(offset_results["loss_offset"])
+
+        # height head forward and loss
+ if self.with_height:
+ height_results = self._height_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_heights, img_metas
+ )
+ if height_results["loss_height"] is not None:
+ losses.update(height_results["loss_height"])
+
+ return losses
+
+ def _offset_forward_train(self, x, sampling_results, bbox_feats, gt_offsets, img_metas):
+ pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
+ # if pos_rois.shape[0] == 0:
+ # return dict(loss_offset=None)
+ offset_results = self._offset_forward(x, pos_rois)
+
+ offset_targets = self.offset_head.get_targets(sampling_results, gt_offsets, self.train_cfg)
+
+ loss_offset = self.offset_head.loss(offset_results["offset_pred"], offset_targets)
+
+ offset_results.update(loss_offset=loss_offset, offset_targets=offset_targets)
+ return offset_results
+
+ def _offset_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):
+ assert (rois is not None) ^ (pos_inds is not None and bbox_feats is not None)
+ if rois is not None:
+ offset_feats = self.offset_roi_extractor(
+ x[: self.offset_roi_extractor.num_inputs], rois
+ )
+ else:
+ assert bbox_feats is not None
+ offset_feats = bbox_feats[pos_inds]
+
+ # self._show_offset_feat(rois, offset_feats)
+
+ offset_pred = self.offset_head(offset_feats)
+ offset_results = dict(offset_pred=offset_pred, offset_feats=offset_feats)
+ return offset_results
+
+ def _height_forward_train(self, x, sampling_results, bbox_feats, gt_heights, img_metas):
+ pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
+ height_results = self._height_forward(x, pos_rois)
+
+ height_targets = self.height_head.get_targets(sampling_results, gt_heights, self.train_cfg)
+
+ loss_height = self.height_head.loss(height_results["height_pred"], height_targets)
+
+ height_results.update(loss_height=loss_height, height_targets=height_targets)
+ return height_results
+
+ def _height_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):
+ assert (rois is not None) ^ (pos_inds is not None and bbox_feats is not None)
+ if rois is not None:
+ height_feats = self.height_roi_extractor(
+ x[: self.height_roi_extractor.num_inputs], rois
+ )
+ else:
+ assert bbox_feats is not None
+ height_feats = bbox_feats[pos_inds]
+
+ height_pred = self.height_head(height_feats)
+ height_results = dict(height_pred=height_pred, height_feats=height_feats)
+ return height_results
+
+ def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks, img_metas):
+ """Run forward function and calculate loss for mask head in training."""
+ if not self.share_roi_extractor:
+ pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
+ mask_results = self._mask_forward(x, pos_rois)
+ else:
+ pos_inds = []
+ device = bbox_feats.device
+ for res in sampling_results:
+ pos_inds.append(
+ torch.ones(res.pos_bboxes.shape[0], device=device, dtype=torch.uint8)
+ )
+ pos_inds.append(
+ torch.zeros(res.neg_bboxes.shape[0], device=device, dtype=torch.uint8)
+ )
+ pos_inds = torch.cat(pos_inds)
+ mask_results = self._mask_forward(x, pos_inds=pos_inds, bbox_feats=bbox_feats)
+
+ mask_targets = self.mask_head.get_targets(sampling_results, gt_masks, self.train_cfg)
+ pos_labels = torch.cat([res.pos_gt_labels for res in sampling_results])
+ loss_mask = self.mask_head.loss(mask_results["mask_pred"], mask_targets, pos_labels)
+
+ mask_results.update(loss_mask=loss_mask, mask_targets=mask_targets)
+ return mask_results
+
+ def simple_test(self, x, proposal_list, img_metas, proposals=None, rescale=False):
+ """Test without augmentation."""
+
+ assert self.with_bbox, "Bbox head must be implemented."
+
+ det_bboxes, det_labels = self.simple_test_bboxes(
+ x, img_metas, proposal_list, self.test_cfg, rescale=rescale
+ )
+
+ bbox_results = bbox2result(det_bboxes[0], det_labels[0], self.bbox_head.num_classes)
+ # bbox_results = bbox2result(det_bboxes, det_labels,
+ # self.bbox_head.num_classes)
+
+ height_results = self.simple_test_height(
+ x, img_metas, det_bboxes[0], det_labels[0], rescale=rescale
+ )
+
+ if self.with_mask:
+ seg_results = self.simple_test_mask(
+ x, img_metas, det_bboxes, det_labels, rescale=rescale
+ )
+ if self.with_vis_feat:
+ offset_results = self.simple_test_offset_rotate_feature(
+ x, img_metas, det_bboxes, det_labels, rescale=rescale
+ )
+ return (
+ bbox_results,
+ seg_results[0],
+ offset_results,
+ height_results,
+ self.vis_featuremap,
+ )
+ else:
+ offset_results = self.simple_test_offset(
+ x, img_metas, det_bboxes[0], det_labels[0], rescale=rescale
+ )
+ return bbox_results, seg_results[0], offset_results, height_results
+ else:
+ offset_results = self.simple_test_offset(
+ x, img_metas, det_bboxes[0], det_labels[0], rescale=rescale
+ )
+ # offset_results = self.simple_test_offset(
+ # x, img_metas, det_bboxes, det_labels, rescale=rescale)
+
+ return bbox_results, None, offset_results, height_results
diff --git a/mmdet/models/roi_heads/loft_hfm_roi_head.py b/mmdet/models/roi_heads/loft_hfm_roi_head.py
new file mode 100644
index 00000000..0c7e1795
--- /dev/null
+++ b/mmdet/models/roi_heads/loft_hfm_roi_head.py
@@ -0,0 +1,584 @@
+# -*- encoding: utf-8 -*-
+
+import copy
+import math
+
+import cv2
+import numpy as np
+import torch
+
+from mmdet.core import bbox2result, bbox2roi
+from ..builder import HEADS, build_head, build_roi_extractor
+from ..utils import offset_roof_to_footprint
+from .standard_roi_head import StandardRoIHead
+from .test_mixins import FootprintMaskFromRoofOffsetTestMixin, HeightTestMixin, OffsetTestMixin
+
+
+@HEADS.register_module()
+class LoftHFMRoIHead( # pylint: disable=abstract-method, too-many-ancestors
+ StandardRoIHead,
+ OffsetTestMixin,
+ HeightTestMixin,
+ FootprintMaskFromRoofOffsetTestMixin,
+):
+ """The base head of all the task-specific head, e.g. offset head, mask head."""
+
+ def __init__(
+ self,
+ offset_roi_extractor=None,
+ offset_head=None,
+ height_roi_extractor=None,
+ height_head=None,
+ footprint_mask_from_roof_offset_head=None,
+ **kwargs,
+ ):
+ super().__init__(**kwargs)
+
+ if offset_head:
+ self.init_offset_head(offset_roi_extractor, offset_head)
+ self.offset_expand_feature_num = offset_head.expand_feature_num
+ if height_head:
+ self.init_height_head(height_roi_extractor, height_head)
+ if footprint_mask_from_roof_offset_head:
+ self.init_footprint_mask_from_roof_offset_head(footprint_mask_from_roof_offset_head)
+
+ # self.with_vis_feat = False
+
+ def init_offset_head(self, offset_roi_extractor, offset_head):
+ """Build offset roi extractor and offset head."""
+ self.offset_roi_extractor = build_roi_extractor(offset_roi_extractor)
+ self.offset_head = build_head(offset_head)
+
+ def init_height_head(self, height_roi_extractor, height_head):
+ """Build height roi extractor and height head."""
+ self.height_roi_extractor = build_roi_extractor(height_roi_extractor)
+ self.height_head = build_head(height_head)
+
+ def init_footprint_mask_from_roof_offset_head(self, footprint_mask_from_roof_offset_head):
+ """Build head that predicts footprint mask from offset and roof mask heads' output."""
+ self.footprint_mask_from_roof_offset_head = build_head(footprint_mask_from_roof_offset_head)
+
+ def init_weights(self):
+ super().init_weights()
+ if self.with_offset:
+ self.offset_head.init_weights()
+ if self.with_height:
+ self.height_head.init_weights()
+ if self.with_footprint_mask_from_roof_offset:
+ self.footprint_mask_from_roof_offset_head.init_weights()
+
+ @property
+ def with_offset(self):
+ """bool: whether the RoI head contains a `offset head`"""
+ return hasattr(self, "offset_head") and self.offset_head is not None
+
+ @property
+ def with_height(self):
+ """bool: whether the RoI head contains a `height_head`"""
+ return hasattr(self, "height_head") and self.height_head is not None
+
+ @property
+ def with_footprint_mask_from_roof_offset(self):
+ """bool: whether the RoI head contains a `footprint_mask_from_roof_offset_head`"""
+ return (
+ hasattr(self, "footprint_mask_from_roof_offset_head")
+ and self.footprint_mask_from_roof_offset_head is not None
+ )
+
+ def forward_train( # pylint: disable=arguments-differ
+ self,
+ x,
+ img_metas,
+ proposal_list,
+ gt_bboxes,
+ gt_labels,
+ gt_bboxes_ignore=None,
+ gt_masks=None,
+ gt_offsets=None,
+ gt_heights=None,
+ gt_footprint_masks=None,
+ gt_footprint_bboxes=None,
+ gt_offset_angles=None,
+ loss_offset_angle_consistency=None,
+ regular_lambda=None,
+ is_semi_supervised_batch=False,
+ is_semi_supervised_stage=False,
+ is_valid_height_batch=True,
+ use_pred_for_offset_angle_consistency=False,
+ with_offset_angle_head=False,
+ shrunk_losses=None,
+ shrunk_factor=1.0,
+ footprint_mask_fro_loss_lambda=1.0,
+ img=None,
+ ):
+ """
+ Args:
+ x (list[Tensor]): list of multi-level img features.
+
+ img_metas (list[dict]): list of image info dict where each dict
+ has: 'img_shape', 'scale_factor', 'flip', and may also contain
+ 'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
+ For details on the values of these keys see
+ `mmdet/datasets/pipelines/formatting.py:Collect`.
+
+ proposals (list[Tensors]): list of region proposals.
+
+ gt_bboxes (list[Tensor]): each item are the truth boxes for each
+ image in [tl_x, tl_y, br_x, br_y] format.
+
+ gt_labels (list[Tensor]): class indices corresponding to each box
+
+ gt_bboxes_ignore (None | list[Tensor]): specify which bounding
+ boxes can be ignored when computing the loss.
+
+ gt_masks (None | Tensor) : true segmentation masks for each box
+ used if the architecture supports a segmentation task.
+
+            gt_heights (None | list[Tensor]): each item are the truth heights for each box.
+
+            gt_footprint_masks (None | Tensor): each item are the truth footprint masks for each box.
+
+            is_semi_supervised_batch (bool): whether this batch is a semi-supervised batch.
+
+            is_semi_supervised_stage (bool): whether training is currently in the
+                semi-supervised stage.
+
+            shrunk_losses (None | set[str]): names of losses to shrink by
+                `shrunk_factor` on a semi-supervised batch.
+
+ Returns:
+ dict[str, Tensor]: a dictionary of loss components
+ """
+ if shrunk_losses is None:
+ shrunk_losses = set()
+
+ # Assign gts and sample proposals.
+ if self.with_bbox or self.with_mask:
+ num_imgs = len(img_metas)
+ if gt_bboxes_ignore is None:
+ gt_bboxes_ignore = [None for _ in range(num_imgs)]
+ sampling_results = []
+ assign_results = []
+ for i in range(num_imgs):
+ assign_result = self.bbox_assigner.assign(
+ proposal_list[i], gt_bboxes[i], gt_bboxes_ignore[i], gt_labels[i]
+ )
+ sampling_result = self.bbox_sampler.sample(
+ assign_result,
+ proposal_list[i],
+ gt_bboxes[i],
+ gt_labels[i],
+ feats=[lvl_feat[i][None] for lvl_feat in x],
+ )
+ assign_results.append(assign_result)
+ sampling_results.append(sampling_result)
+
+ losses = dict()
+
+ # Bbox head forward without loss.
+ if self.with_bbox:
+ bbox_results = self._bbox_forward_train(
+ x,
+ sampling_results,
+ gt_bboxes,
+ gt_labels,
+ img_metas,
+ )
+ bbox_loss = bbox_results["loss_bbox"]
+ if is_semi_supervised_batch:
+ if is_semi_supervised_stage:
+ if "bbox" in shrunk_losses:
+ bbox_loss["loss_bbox"] *= shrunk_factor
+ if "cls" in shrunk_losses:
+ bbox_loss["loss_cls"] *= shrunk_factor
+ if not is_valid_height_batch:
+ bbox_loss["loss_bbox"] *= 0.0
+ bbox_loss["loss_cls"] *= 0.0
+ else:
+ bbox_loss["loss_bbox"] *= 0.0
+ bbox_loss["loss_cls"] *= 0.0
+
+ losses.update(bbox_loss)
+
+ # Mask head forward and loss.
+ if self.with_mask:
+ mask_results = self._mask_forward_train(
+ x,
+ sampling_results,
+ bbox_results["bbox_feats"],
+ gt_masks,
+ img_metas,
+ )
+            # TODO: Support empty tensor input. #2280
+ if mask_results["loss_mask"] is not None:
+ mask_loss = mask_results["loss_mask"]
+ if is_semi_supervised_batch:
+ if is_semi_supervised_stage:
+ if "mask" in shrunk_losses:
+ mask_loss["loss_mask"] *= shrunk_factor
+ if not is_valid_height_batch:
+ mask_loss["loss_mask"] *= 0.0
+ else:
+ mask_loss["loss_mask"] *= 0.0
+ losses.update(mask_loss)
+
+ # Offset head forward and loss.
+ if self.with_offset:
+ offset_results = self._offset_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_offsets, img_metas
+ )
+            # TODO: Support empty tensor input. #2280
+ if offset_results["loss_offset"] is not None:
+                loss_offset = offset_results["loss_offset"]["loss_offset"]
+                # zero out any infinite loss values so a degenerate batch
+                # cannot destabilise training
+                offset_results["loss_offset"]["loss_offset"] = torch.where(
+                    torch.isinf(loss_offset), torch.zeros_like(loss_offset), loss_offset
+                )
+ offset_loss = offset_results["loss_offset"]
+ if is_semi_supervised_batch:
+ if is_semi_supervised_stage:
+ if "offset" in shrunk_losses:
+ offset_loss["loss_offset"] *= shrunk_factor
+ if not is_valid_height_batch:
+ offset_loss["loss_offset"] *= 0.0
+ else:
+ offset_loss["loss_offset"] *= 0.0
+
+ losses.update(offset_loss)
+
+ instances_cnt = [len(sr.pos_gt_bboxes) for sr in sampling_results]
+
+ if loss_offset_angle_consistency is not None:
+ assert gt_offset_angles is not None
+
+ gt_offset_angles_repeated = []
+ offsets_pred = []
+ start_idx = 0
+ for i, cnt in enumerate(instances_cnt):
+ gt_offset_angles_repeated.append(gt_offset_angles[i].repeat((cnt, 1)))
+ offsets_pred.append(
+ offset_results["offset_pred"][start_idx : start_idx + cnt, :]
+ )
+ start_idx += cnt * self.offset_expand_feature_num
+
+ gt_offset_angles_t = torch.cat(gt_offset_angles_repeated, 0)
+ offsets_pred_t = torch.cat(offsets_pred, 0)
+
+ gt_offsets_cat = torch.cat(
+ [
+ go[sr.pos_assigned_gt_inds.long(), :]
+ for sr, go in zip(sampling_results, gt_offsets)
+ ],
+ 0,
+ )
+                # squared L2 norms of the predicted and ground-truth offsets
+                offsets_pred_norm = offsets_pred_t[:, 0] ** 2 + offsets_pred_t[:, 1] ** 2
+                gt_offsets_norm = gt_offsets_cat[:, 0] ** 2 + gt_offsets_cat[:, 1] ** 2
+                loss_offsets_norm = loss_offset_angle_consistency(
+                    offsets_pred_norm, gt_offsets_norm
+                )
+
+                # cross term: pred_x * sin(t) equals pred_y * cos(t) exactly
+                # when the prediction is aligned with the ground-truth angle t
+                # (assuming the angle is stored as (sin, cos) components)
+                loss_tan_cot = loss_offset_angle_consistency(
+                    offsets_pred_t[:, 0] * gt_offset_angles_t[:, 0],
+                    offsets_pred_t[:, 1] * gt_offset_angles_t[:, 1],
+                )
+
+ loss_ = regular_lambda[0] * loss_offsets_norm + regular_lambda[1] * loss_tan_cot
+
+ if is_semi_supervised_batch:
+ if is_semi_supervised_stage:
+ if with_offset_angle_head:
+ if use_pred_for_offset_angle_consistency:
+ if "offset_angle" in shrunk_losses:
+ loss_ *= shrunk_factor
+ else:
+ loss_ *= 0.0
+ else:
+ loss_ *= 0.0
+ else:
+ loss_ *= 0.0
+
+ losses.update(loss_offset_angle_consistency=loss_)
+
+ # Footprint mask from roof offset head forward and loss.
+ if self.with_offset and self.with_mask and self.with_footprint_mask_from_roof_offset:
+ start_idx = 0
+ offsets_pred = []
+ for i, cnt in enumerate(instances_cnt):
+ offsets_pred.append(offset_results["offset_pred"][start_idx : start_idx + cnt, :])
+ start_idx += cnt * self.offset_expand_feature_num
+
+ offsets_pred_t = torch.cat(offsets_pred, 0)
+ footprint_mask_fro_results = self._footprint_mask_from_roof_offset_forward_train(
+ offsets_pred_t,
+ mask_results["mask_pred"],
+ sampling_results,
+ gt_footprint_masks,
+ img_metas,
+ )
+ if footprint_mask_fro_results["loss_footprint_mask_from_roof_offset"] is not None:
+ footprint_mask_fro_loss = footprint_mask_fro_results[
+ "loss_footprint_mask_from_roof_offset"
+ ]
+ if is_semi_supervised_batch:
+ if is_semi_supervised_stage:
+ footprint_mask_fro_loss[
+ "loss_footprint_mask_from_roof_offset"
+ ] *= footprint_mask_fro_loss_lambda
+ else:
+ footprint_mask_fro_loss["loss_footprint_mask_from_roof_offset"] *= 0.0
+ losses.update(footprint_mask_fro_loss)
+
+ # Height head forward and loss.
+ if self.with_height:
+ height_results = self._height_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_heights, img_metas
+ )
+ if height_results["loss_height"] is not None:
+ height_loss = height_results["loss_height"]
+ if not is_valid_height_batch:
+ height_loss["loss_height"] *= 0.0
+ losses.update(height_loss)
+
+ return losses
+
+ def _calculate_offset_angle_from_offset(self, offsets):
+ norm = torch.sqrt(torch.pow(offsets[:, 0], 2) + torch.pow(offsets[:, 1], 2))
+ angle = torch.cat(
+ (torch.unsqueeze(offsets[:, 1] / norm, 1), torch.unsqueeze(offsets[:, 0] / norm, 1)), 1
+ )
+ return angle
+
+ def _calculate_offset_from_angle(self, offsets, angles):
+ offset_x_from_angle = torch.unsqueeze(
+ offsets[:, 1] / (angles[:, 0] + 1e-2) * angles[:, 1], 1
+ )
+ offset_y_from_angle = torch.unsqueeze(
+ offsets[:, 0] / (angles[:, 1] + 1e-2) * angles[:, 0], 1
+ )
+ return torch.cat((offset_x_from_angle, offset_y_from_angle), 1)
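+
+    # NOTE: in the two helpers above, the "angle" of an offset (dx, dy) is
+    # represented as the unit vector (dy / r, dx / r) with r = |(dx, dy)|,
+    # i.e. (sin t, cos t), so no explicit atan2 is needed.
+    # _calculate_offset_from_angle inverts this: dx = dy * cos t / sin t
+    # (and symmetrically for dy), with a 1e-2 term in the denominator to
+    # stabilise near-zero components.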
+
+ def _offset_forward_train(self, x, sampling_results, bbox_feats, gt_offsets, img_metas):
+ pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
+ offset_results = self._offset_forward(x, pos_rois)
+ offset_targets = self.offset_head.get_targets(sampling_results, gt_offsets, self.train_cfg)
+ loss_offset = self.offset_head.loss(offset_results["offset_pred"], offset_targets)
+ offset_results.update(loss_offset=loss_offset, offset_targets=offset_targets)
+ return offset_results
+
+ def _offset_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):
+ assert (rois is not None) ^ (pos_inds is not None and bbox_feats is not None)
+ if rois is not None:
+ offset_feats = self.offset_roi_extractor(
+ x[: self.offset_roi_extractor.num_inputs], rois
+ )
+ else:
+ assert bbox_feats is not None
+ offset_feats = bbox_feats[pos_inds]
+
+ # self._show_offset_feat(rois, offset_feats)
+
+ offset_pred = self.offset_head(offset_feats)
+ offset_results = dict(offset_pred=offset_pred, offset_feats=offset_feats)
+ return offset_results
+
+ def _height_forward_train(self, x, sampling_results, bbox_feats, gt_heights, img_metas):
+ pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
+ height_results = self._height_forward(x, pos_rois)
+ height_targets = self.height_head.get_targets(sampling_results, gt_heights, self.train_cfg)
+ loss_height = self.height_head.loss(height_results["height_pred"], height_targets)
+ height_results.update(loss_height=loss_height, height_targets=height_targets)
+ return height_results
+
+ def _height_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):
+ assert (rois is not None) ^ (pos_inds is not None and bbox_feats is not None)
+ if rois is not None:
+ height_feats = self.height_roi_extractor(
+ x[: self.height_roi_extractor.num_inputs], rois
+ )
+ else:
+ assert bbox_feats is not None
+ height_feats = bbox_feats[pos_inds]
+
+ height_pred = self.height_head(height_feats)
+ height_results = dict(height_pred=height_pred, height_feats=height_feats)
+ return height_results
+
+ def _footprint_mask_from_roof_offset_forward_train(
+ self, offset_pred, roof_pred, sampling_results, gt_footprint_masks, img_metas
+ ):
+ """Run forward function and calculate loss for footprint mask from roof offset head."""
+ footprint_mask_fro_results = self._footprint_mask_from_roof_offset_forward(
+ offset_pred, roof_pred
+ )
+
+ footprint_mask_fro_targets = self.footprint_mask_from_roof_offset_head.get_targets(
+ sampling_results, gt_footprint_masks, self.train_cfg
+ )
+ pos_labels = torch.cat([res.pos_gt_labels for res in sampling_results])
+ loss_footprint_mask_ro = self.footprint_mask_from_roof_offset_head.loss(
+ footprint_mask_fro_results["footprint_mask_from_roof_offset_pred"],
+ footprint_mask_fro_targets,
+ pos_labels,
+ )
+
+ footprint_mask_fro_results.update(
+ loss_footprint_mask_from_roof_offset=loss_footprint_mask_ro,
+ footprint_mask_from_roof_offset_targets=footprint_mask_fro_targets,
+ )
+ return footprint_mask_fro_results
+
+ def _footprint_mask_from_roof_offset_forward(self, offset_pred, roof_pred):
+ """Footprint mask from roof offset head forward function used in training and testing."""
+ footprint_mask_fro_pred = self.footprint_mask_from_roof_offset_head(offset_pred, roof_pred)
+ footprint_mask_fro_results = dict(
+ footprint_mask_from_roof_offset_pred=footprint_mask_fro_pred,
+ )
+ return footprint_mask_fro_results
+
+ def simple_test(
+ self,
+ x,
+ proposal_list,
+ img_metas,
+ proposals=None,
+ rescale=False,
+ ):
+ """Test without augmentation."""
+ det_bboxes, det_labels = self.simple_test_bboxes(
+ x, img_metas, proposal_list, self.test_cfg, rescale=rescale
+ )
+ bbox_results = [
+ bbox2result(bboxes, labels, self.bbox_head.num_classes)
+ for bboxes, labels in zip(det_bboxes, det_labels)
+ ]
+
+ if self.with_offset:
+ offset_results_ = [
+ self.simple_test_offset(x, img_metas, bboxes, labels, rescale=rescale)
+ for bboxes, labels in zip(det_bboxes, det_labels)
+ ]
+ offset_results = [ele[0] for ele in offset_results_]
+ instances_cnt = [len(ele[1]) for ele in offset_results_]
+            # keep only the un-rotated branch's predictions from each image
+            offset_preds = torch.cat(
+                [
+                    ele[1][: cnt // self.offset_expand_feature_num, :]
+                    for ele, cnt in zip(offset_results_, instances_cnt)
+                ]
+            )
+ else:
+ offset_results, offset_preds = [None for _ in det_bboxes], None
+
+ if self.with_mask:
+ roof_mask_results, roof_mask_preds = self.simple_test_mask(
+ x, img_metas, det_bboxes, det_labels, rescale=rescale
+ )
+ else:
+ roof_mask_results, roof_mask_preds = [None for _ in det_bboxes], None
+
+ if self.with_height:
+ height_results = [
+ self.simple_test_height(x, img_metas, bboxes, labels, rescale=rescale)
+ for bboxes, labels in zip(det_bboxes, det_labels)
+ ]
+ else:
+ height_results = [None for _ in det_bboxes]
+
+ if self.with_offset and self.with_mask and self.with_footprint_mask_from_roof_offset:
+ footprint_mask_fro_results = self.simple_test_footprint_mask_fro(
+ img_metas, offset_preds, roof_mask_preds, det_bboxes, det_labels
+ )
+ else:
+ footprint_mask_fro_results = [None for _ in det_bboxes]
+
+ footprint_from_roof = (
+ offset_roof_to_footprint(offset_results, roof_mask_results, True)
+ if self.with_offset and self.with_mask
+ else [None for _ in det_bboxes]
+ )
+
+ return (
+ bbox_results,
+ offset_results,
+ roof_mask_results,
+ height_results,
+ footprint_from_roof,
+ footprint_mask_fro_results,
+ )
+
+
+def visualize_masks_bboxes(
+ img,
+ img_metas,
+ building_bboxes,
+ roof_masks,
+ is_semi_supervised_batch=False,
+ footprint_bboxes=None,
+ footprint_masks=None,
+):
+ save_path = "tmp/ROI_bboxes_masks_visualization_angle_bbox/"
+ mean = np.array([123.675, 116.28, 103.53], dtype=np.float64).reshape(1, -1)
+ std = np.array([58.395, 57.12, 57.375], dtype=np.float64).reshape(1, -1)
+
+ img_batch = img.detach().cpu().numpy().copy()
+ building_bboxes = [sb.detach().cpu().numpy().astype(np.int32) for sb in building_bboxes]
+ roof_masks = [sm.masks.astype(np.uint8) for sm in roof_masks]
+
+ if footprint_bboxes:
+ footprint_bboxes = [sb.detach().cpu().numpy().astype(np.int32) for sb in footprint_bboxes]
+ else:
+ footprint_bboxes = [None for _ in building_bboxes]
+
+ if footprint_masks:
+ footprint_masks = [sm.masks.astype(np.uint8) for sm in footprint_masks]
+ else:
+ footprint_masks = [None for _ in roof_masks]
+
+ for img, sbs, sms, sfbs, sfms, img_meta in zip(
+ img_batch, building_bboxes, roof_masks, footprint_bboxes, footprint_masks, img_metas
+ ):
+ file_name = img_meta["filename"].rsplit("/", 1)[1]
+ file_name_split = file_name.rsplit(".", 1)
+ file_name_split[0] += "_pseudo_gt" if is_semi_supervised_batch else "_gt"
+ file_name = save_path + ".".join((file_name_split[0], file_name_split[1]))
+
+ img_o = np.ascontiguousarray(img.transpose(1, 2, 0))
+ cv2.multiply(img_o, std, img_o)
+ cv2.add(img_o, mean, img_o)
+ cv2.cvtColor(img_o, cv2.COLOR_RGB2BGR, img_o)
+ img = img_o.astype(np.uint8).copy()
+ img_o = img_o.astype(np.uint8).copy()
+
+ alpha = 0.6
+ beta = 1 - alpha
+
+ sms = np.sum(sms, axis=0).astype(np.uint8)
+ sms[np.where(sms != 0)] = 1
+ sms = np.expand_dims(sms, axis=2).repeat(3, axis=2)
+ sms *= np.array([136, 14, 79], dtype=np.uint8)
+
+ cv2.addWeighted(img, alpha, sms, beta, 0, img)
+ img = img.astype(np.float64)
+ img[np.where(sms != 1)] *= float(4.99999 / 3)
+ img = img.astype(np.uint8)
+
+ if sfms is not None:
+ sfms = np.sum(sfms, axis=0).astype(np.uint8)
+ sfms[np.where(sfms != 0)] = 1
+ sfms = np.expand_dims(sfms, axis=2).repeat(3, axis=2)
+ sfms *= np.array([0, 180, 0], dtype=np.uint8)
+
+ cv2.addWeighted(img, alpha, sfms, beta, 0, img)
+ img = img.astype(np.float64)
+ img[np.where(sms != 1)] *= float(4.99999 / 3)
+ img = img.astype(np.uint8)
+
+ bbox_pts = [
+ np.array([[sb[0], sb[1]], [sb[2], sb[1]], [sb[2], sb[3]], [sb[0], sb[3]]]) for sb in sbs
+ ]
+ cv2.polylines(img, bbox_pts, True, (180, 0, 0), 2)
+
+ if sfbs is not None:
+ f_bbox_pts = [
+ np.array([[sb[0], sb[1]], [sb[2], sb[1]], [sb[2], sb[3]], [sb[0], sb[3]]])
+ for sb in sfbs
+ ]
+ cv2.polylines(img, f_bbox_pts, True, (0, 0, 180), 2)
+
+ img = np.concatenate((img, img_o), axis=1)
+
+ cv2.imwrite(file_name, img)
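+
+# NOTE (hypothetical usage, not called anywhere in this file): the helper
+# above assumes the standard ImageNet mean/std from img_norm_cfg and that the
+# save_path directory already exists. A debugging call site inside
+# LoftHFMRoIHead.forward_train might look like:
+#
+#     visualize_masks_bboxes(img, img_metas, gt_bboxes, gt_masks,
+#                            is_semi_supervised_batch=is_semi_supervised_batch,
+#                            footprint_bboxes=gt_footprint_bboxes,
+#                            footprint_masks=gt_footprint_masks)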
diff --git a/mmdet/models/roi_heads/loft_roi_head.py b/mmdet/models/roi_heads/loft_roi_head.py
new file mode 100644
index 00000000..f8e3fe1d
--- /dev/null
+++ b/mmdet/models/roi_heads/loft_roi_head.py
@@ -0,0 +1,210 @@
+# -*- encoding: utf-8 -*-
+
+from abc import abstractmethod
+
+import numpy as np
+import torch
+
+from mmdet.core import bbox2result, bbox2roi, roi2bbox
+from ..builder import HEADS, build_head, build_roi_extractor
+from .standard_roi_head import StandardRoIHead
+from .test_mixins import OffsetTestMixin
+
+
+@HEADS.register_module()
+class LoftRoIHead(StandardRoIHead, OffsetTestMixin):
+ def __init__(self, offset_roi_extractor=None, offset_head=None, **kwargs):
+ assert offset_head is not None
+ super(LoftRoIHead, self).__init__(**kwargs)
+
+ if offset_head is not None:
+ self.init_offset_head(offset_roi_extractor, offset_head)
+
+ self.with_vis_feat = False
+
+ def init_offset_head(self, offset_roi_extractor, offset_head):
+ self.offset_roi_extractor = build_roi_extractor(offset_roi_extractor)
+ self.offset_head = build_head(offset_head)
+
+ # def init_weights(self, pretrained):
+ # super(LoftRoIHead, self).init_weights(pretrained)
+ # self.offset_head.init_weights()
+ def init_weights(self):
+ super(LoftRoIHead, self).init_weights()
+ self.offset_head.init_weights()
+
+ def forward_train(
+ self,
+ x,
+ img_metas,
+ proposal_list,
+ gt_bboxes,
+ gt_labels,
+ gt_bboxes_ignore=None,
+ gt_masks=None,
+ gt_offsets=None,
+ ):
+ """
+ Args:
+ x (list[Tensor]): list of multi-level img features.
+
+ img_metas (list[dict]): list of image info dict where each dict
+ has: 'img_shape', 'scale_factor', 'flip', and may also contain
+ 'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
+ For details on the values of these keys see
+ `mmdet/datasets/pipelines/formatting.py:Collect`.
+
+            proposal_list (list[Tensor]): list of region proposals.
+
+            gt_bboxes (list[Tensor]): each item is the ground-truth boxes for
+                each image in [tl_x, tl_y, br_x, br_y] format.
+
+            gt_labels (list[Tensor]): class indices corresponding to each box.
+
+            gt_bboxes_ignore (None | list[Tensor]): specify which bounding
+                boxes can be ignored when computing the loss.
+
+            gt_masks (None | Tensor): true segmentation masks for each box,
+                used if the architecture supports a segmentation task.
+
+            gt_offsets (None | list[Tensor]): ground-truth offsets for each
+                box, used to supervise the offset head.
+
+ Returns:
+ dict[str, Tensor]: a dictionary of loss components
+ """
+ # assign gts and sample proposals
+ if self.with_bbox or self.with_mask:
+ num_imgs = len(img_metas)
+ if gt_bboxes_ignore is None:
+ gt_bboxes_ignore = [None for _ in range(num_imgs)]
+ sampling_results = []
+ for i in range(num_imgs):
+ assign_result = self.bbox_assigner.assign(
+ proposal_list[i], gt_bboxes[i], gt_bboxes_ignore[i], gt_labels[i]
+ )
+ sampling_result = self.bbox_sampler.sample(
+ assign_result,
+ proposal_list[i],
+ gt_bboxes[i],
+ gt_labels[i],
+ feats=[lvl_feat[i][None] for lvl_feat in x],
+ )
+ sampling_results.append(sampling_result)
+
+ losses = dict()
+ # bbox head forward and loss
+ if self.with_bbox:
+ bbox_results = self._bbox_forward_train(
+ x, sampling_results, gt_bboxes, gt_labels, img_metas
+ )
+ losses.update(bbox_results["loss_bbox"])
+
+ # mask head forward and loss
+ if self.with_mask:
+ mask_results = self._mask_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_masks, img_metas
+ )
+ # TODO: Support empty tensor input. #2280
+ if mask_results["loss_mask"] is not None:
+ losses.update(mask_results["loss_mask"])
+
+ if self.with_offset:
+ # print("mask_results['mask_pred']: ", mask_results['mask_pred'].shape)
+ # print("mask_results['mask_targets']: ", mask_results['mask_targets'].shape)
+ # print("bbox_results['bbox_feats']: ", bbox_results['bbox_feats'].shape)
+ offset_results = self._offset_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_offsets, img_metas
+ )
+ # TODO: Support empty tensor input. #2280
+ if offset_results["loss_offset"] is not None:
+ losses.update(offset_results["loss_offset"])
+
+ return losses
+
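+    # The offset branch mirrors the mask branch: RoI features are pooled for
+    # positive proposals only, the offset head regresses one offset per RoI,
+    # and the loss is computed against targets built from ``gt_offsets``.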
+ def _offset_forward_train(self, x, sampling_results, bbox_feats, gt_offsets, img_metas):
+ pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
+ # if pos_rois.shape[0] == 0:
+ # return dict(loss_offset=None)
+ offset_results = self._offset_forward(x, pos_rois)
+
+ offset_targets = self.offset_head.get_targets(sampling_results, gt_offsets, self.train_cfg)
+
+ loss_offset = self.offset_head.loss(offset_results["offset_pred"], offset_targets)
+
+ offset_results.update(loss_offset=loss_offset, offset_targets=offset_targets)
+ return offset_results
+
+ def _offset_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):
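+        # exactly one input mode is allowed: either full RoIs, or precomputed
+        # bbox_feats indexed by pos_inds (mirroring ``_mask_forward``)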
+ assert (rois is not None) ^ (pos_inds is not None and bbox_feats is not None)
+ if rois is not None:
+ offset_feats = self.offset_roi_extractor(
+ x[: self.offset_roi_extractor.num_inputs], rois
+ )
+ else:
+ assert bbox_feats is not None
+ offset_feats = bbox_feats[pos_inds]
+
+ # self._show_offset_feat(rois, offset_feats)
+
+ offset_pred = self.offset_head(offset_feats)
+ offset_results = dict(offset_pred=offset_pred, offset_feats=offset_feats)
+ return offset_results
+
+ def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks, img_metas):
+ """Run forward function and calculate loss for mask head in
+ training."""
+ if not self.share_roi_extractor:
+ pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
+ mask_results = self._mask_forward(x, pos_rois)
+ else:
+ pos_inds = []
+ device = bbox_feats.device
+ for res in sampling_results:
+ pos_inds.append(
+ torch.ones(res.pos_bboxes.shape[0], device=device, dtype=torch.uint8)
+ )
+ pos_inds.append(
+ torch.zeros(res.neg_bboxes.shape[0], device=device, dtype=torch.uint8)
+ )
+ pos_inds = torch.cat(pos_inds)
+ mask_results = self._mask_forward(x, pos_inds=pos_inds, bbox_feats=bbox_feats)
+
+ mask_targets = self.mask_head.get_targets(sampling_results, gt_masks, self.train_cfg)
+ pos_labels = torch.cat([res.pos_gt_labels for res in sampling_results])
+ loss_mask = self.mask_head.loss(mask_results["mask_pred"], mask_targets, pos_labels)
+
+ mask_results.update(loss_mask=loss_mask, mask_targets=mask_targets)
+ return mask_results
+
+ def simple_test(self, x, proposal_list, img_metas, proposals=None, rescale=False):
+ """Test without augmentation."""
+ assert self.with_bbox, "Bbox head must be implemented."
+
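+        # only the first image in the batch is decoded below, i.e. this test
+        # path assumes a single image per batch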
+ det_bboxes, det_labels = self.simple_test_bboxes(
+ x, img_metas, proposal_list, self.test_cfg, rescale=rescale
+ )
+
+ bbox_results = bbox2result(det_bboxes[0], det_labels[0], self.bbox_head.num_classes)
+
+ if self.with_mask:
+ segm_results = self.simple_test_mask(
+ x, img_metas, det_bboxes, det_labels, rescale=rescale
+ )
+ if self.with_vis_feat:
+ offset_results = self.simple_test_offset_rotate_feature(
+ x, img_metas, det_bboxes, det_labels, rescale=rescale
+ )
+ return bbox_results, segm_results[0], offset_results, self.vis_featuremap
+ else:
+ offset_results = self.simple_test_offset(
+ x, img_metas, det_bboxes[0], det_labels[0], rescale=rescale
+ )
+ return bbox_results, segm_results[0], offset_results
+ else:
+ offset_results = self.simple_test_offset(
+ x, img_metas, det_bboxes[0], det_labels[0], rescale=rescale
+ )
+
+ return bbox_results, None, offset_results
diff --git a/mmdet/models/roi_heads/standard_roi_head.py b/mmdet/models/roi_heads/standard_roi_head.py
index 3fdd82ad..49d7d32e 100644
--- a/mmdet/models/roi_heads/standard_roi_head.py
+++ b/mmdet/models/roi_heads/standard_roi_head.py
@@ -17,8 +17,7 @@ def init_assigner_sampler(self):
self.bbox_sampler = None
if self.train_cfg:
self.bbox_assigner = build_assigner(self.train_cfg.assigner)
- self.bbox_sampler = build_sampler(
- self.train_cfg.sampler, context=self)
+ self.bbox_sampler = build_sampler(self.train_cfg.sampler, context=self)
def init_bbox_head(self, bbox_roi_extractor, bbox_head):
"""Initialize ``bbox_head``"""
@@ -42,24 +41,25 @@ def forward_dummy(self, x, proposals):
rois = bbox2roi([proposals])
if self.with_bbox:
bbox_results = self._bbox_forward(x, rois)
- outs = outs + (bbox_results['cls_score'],
- bbox_results['bbox_pred'])
+ outs = outs + (bbox_results["cls_score"], bbox_results["bbox_pred"])
# mask head
if self.with_mask:
mask_rois = rois[:100]
mask_results = self._mask_forward(x, mask_rois)
- outs = outs + (mask_results['mask_pred'], )
+ outs = outs + (mask_results["mask_pred"],)
return outs
- def forward_train(self,
- x,
- img_metas,
- proposal_list,
- gt_bboxes,
- gt_labels,
- gt_bboxes_ignore=None,
- gt_masks=None,
- **kwargs):
+ def forward_train(
+ self,
+ x,
+ img_metas,
+ proposal_list,
+ gt_bboxes,
+ gt_labels,
+ gt_bboxes_ignore=None,
+ gt_masks=None,
+ **kwargs
+ ):
"""
Args:
x (list[Tensor]): list of multi-level img features.
@@ -88,65 +88,62 @@ def forward_train(self,
sampling_results = []
for i in range(num_imgs):
assign_result = self.bbox_assigner.assign(
- proposal_list[i], gt_bboxes[i], gt_bboxes_ignore[i],
- gt_labels[i])
+ proposal_list[i], gt_bboxes[i], gt_bboxes_ignore[i], gt_labels[i]
+ )
sampling_result = self.bbox_sampler.sample(
assign_result,
proposal_list[i],
gt_bboxes[i],
gt_labels[i],
- feats=[lvl_feat[i][None] for lvl_feat in x])
+ feats=[lvl_feat[i][None] for lvl_feat in x],
+ )
sampling_results.append(sampling_result)
losses = dict()
# bbox head forward and loss
if self.with_bbox:
- bbox_results = self._bbox_forward_train(x, sampling_results,
- gt_bboxes, gt_labels,
- img_metas)
- losses.update(bbox_results['loss_bbox'])
+ bbox_results = self._bbox_forward_train(
+ x, sampling_results, gt_bboxes, gt_labels, img_metas
+ )
+ losses.update(bbox_results["loss_bbox"])
# mask head forward and loss
if self.with_mask:
- mask_results = self._mask_forward_train(x, sampling_results,
- bbox_results['bbox_feats'],
- gt_masks, img_metas)
- losses.update(mask_results['loss_mask'])
+ mask_results = self._mask_forward_train(
+ x, sampling_results, bbox_results["bbox_feats"], gt_masks, img_metas
+ )
+ losses.update(mask_results["loss_mask"])
return losses
def _bbox_forward(self, x, rois):
"""Box head forward function used in both training and testing."""
# TODO: a more flexible way to decide which feature maps to use
- bbox_feats = self.bbox_roi_extractor(
- x[:self.bbox_roi_extractor.num_inputs], rois)
+ bbox_feats = self.bbox_roi_extractor(x[: self.bbox_roi_extractor.num_inputs], rois)
if self.with_shared_head:
bbox_feats = self.shared_head(bbox_feats)
cls_score, bbox_pred = self.bbox_head(bbox_feats)
- bbox_results = dict(
- cls_score=cls_score, bbox_pred=bbox_pred, bbox_feats=bbox_feats)
+ bbox_results = dict(cls_score=cls_score, bbox_pred=bbox_pred, bbox_feats=bbox_feats)
return bbox_results
- def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels,
- img_metas):
+ def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels, img_metas):
"""Run forward function and calculate loss for box head in training."""
rois = bbox2roi([res.bboxes for res in sampling_results])
bbox_results = self._bbox_forward(x, rois)
- bbox_targets = self.bbox_head.get_targets(sampling_results, gt_bboxes,
- gt_labels, self.train_cfg)
- loss_bbox = self.bbox_head.loss(bbox_results['cls_score'],
- bbox_results['bbox_pred'], rois,
- *bbox_targets)
+ bbox_targets = self.bbox_head.get_targets(
+ sampling_results, gt_bboxes, gt_labels, self.train_cfg
+ )
+ loss_bbox = self.bbox_head.loss(
+ bbox_results["cls_score"], bbox_results["bbox_pred"], rois, *bbox_targets
+ )
bbox_results.update(loss_bbox=loss_bbox)
return bbox_results
- def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks,
- img_metas):
- """Run forward function and calculate loss for mask head in
- training."""
+ def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks, img_metas):
+ """Run forward function and calculate loss for mask head in training."""
if not self.share_roi_extractor:
pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
mask_results = self._mask_forward(x, pos_rois)
@@ -155,36 +152,27 @@ def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks,
device = bbox_feats.device
for res in sampling_results:
pos_inds.append(
- torch.ones(
- res.pos_bboxes.shape[0],
- device=device,
- dtype=torch.uint8))
+ torch.ones(res.pos_bboxes.shape[0], device=device, dtype=torch.uint8)
+ )
pos_inds.append(
- torch.zeros(
- res.neg_bboxes.shape[0],
- device=device,
- dtype=torch.uint8))
+ torch.zeros(res.neg_bboxes.shape[0], device=device, dtype=torch.uint8)
+ )
pos_inds = torch.cat(pos_inds)
- mask_results = self._mask_forward(
- x, pos_inds=pos_inds, bbox_feats=bbox_feats)
+ mask_results = self._mask_forward(x, pos_inds=pos_inds, bbox_feats=bbox_feats)
- mask_targets = self.mask_head.get_targets(sampling_results, gt_masks,
- self.train_cfg)
+ mask_targets = self.mask_head.get_targets(sampling_results, gt_masks, self.train_cfg)
pos_labels = torch.cat([res.pos_gt_labels for res in sampling_results])
- loss_mask = self.mask_head.loss(mask_results['mask_pred'],
- mask_targets, pos_labels)
+ loss_mask = self.mask_head.loss(mask_results["mask_pred"], mask_targets, pos_labels)
mask_results.update(loss_mask=loss_mask, mask_targets=mask_targets)
return mask_results
def _mask_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):
"""Mask head forward function used in both training and testing."""
- assert ((rois is not None) ^
- (pos_inds is not None and bbox_feats is not None))
+ assert (rois is not None) ^ (pos_inds is not None and bbox_feats is not None)
if rois is not None:
- mask_feats = self.mask_roi_extractor(
- x[:self.mask_roi_extractor.num_inputs], rois)
+ mask_feats = self.mask_roi_extractor(x[: self.mask_roi_extractor.num_inputs], rois)
if self.with_shared_head:
mask_feats = self.shared_head(mask_feats)
else:
@@ -195,19 +183,14 @@ def _mask_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):
mask_results = dict(mask_pred=mask_pred, mask_feats=mask_feats)
return mask_results
- async def async_simple_test(self,
- x,
- proposal_list,
- img_metas,
- proposals=None,
- rescale=False):
+ async def async_simple_test(self, x, proposal_list, img_metas, proposals=None, rescale=False):
"""Async test without augmentation."""
- assert self.with_bbox, 'Bbox head must be implemented.'
+ assert self.with_bbox, "Bbox head must be implemented."
det_bboxes, det_labels = await self.async_test_bboxes(
- x, img_metas, proposal_list, self.test_cfg, rescale=rescale)
- bbox_results = bbox2result(det_bboxes, det_labels,
- self.bbox_head.num_classes)
+ x, img_metas, proposal_list, self.test_cfg, rescale=rescale
+ )
+ bbox_results = bbox2result(det_bboxes, det_labels, self.bbox_head.num_classes)
if not self.with_mask:
return bbox_results
else:
@@ -217,15 +200,11 @@ async def async_simple_test(self,
det_bboxes,
det_labels,
rescale=rescale,
- mask_test_cfg=self.test_cfg.get('mask'))
+ mask_test_cfg=self.test_cfg.get("mask"),
+ )
return bbox_results, segm_results
- def simple_test(self,
- x,
- proposal_list,
- img_metas,
- proposals=None,
- rescale=False):
+ def simple_test(self, x, proposal_list, img_metas, proposals=None, rescale=False):
"""Test without augmentation.
Args:
@@ -248,14 +227,14 @@ def simple_test(self,
The outer list corresponds to each image, and first element
of tuple is bbox results, second element is mask results.
"""
- assert self.with_bbox, 'Bbox head must be implemented.'
+ assert self.with_bbox, "Bbox head must be implemented."
det_bboxes, det_labels = self.simple_test_bboxes(
- x, img_metas, proposal_list, self.test_cfg, rescale=rescale)
+ x, img_metas, proposal_list, self.test_cfg, rescale=rescale
+ )
bbox_results = [
- bbox2result(det_bboxes[i], det_labels[i],
- self.bbox_head.num_classes)
+ bbox2result(det_bboxes[i], det_labels[i], self.bbox_head.num_classes)
for i in range(len(det_bboxes))
]
@@ -263,7 +242,8 @@ def simple_test(self,
return bbox_results
else:
segm_results = self.simple_test_mask(
- x, img_metas, det_bboxes, det_labels, rescale=rescale)
+ x, img_metas, det_bboxes, det_labels, rescale=rescale
+ )
return list(zip(bbox_results, segm_results))
def aug_test(self, x, proposal_list, img_metas, rescale=False):
@@ -272,37 +252,34 @@ def aug_test(self, x, proposal_list, img_metas, rescale=False):
If rescale is False, then returned bboxes and masks will fit the scale
of imgs[0].
"""
- det_bboxes, det_labels = self.aug_test_bboxes(x, img_metas,
- proposal_list,
- self.test_cfg)
+ det_bboxes, det_labels = self.aug_test_bboxes(x, img_metas, proposal_list, self.test_cfg)
if rescale:
_det_bboxes = det_bboxes
else:
_det_bboxes = det_bboxes.clone()
- _det_bboxes[:, :4] *= det_bboxes.new_tensor(
- img_metas[0][0]['scale_factor'])
- bbox_results = bbox2result(_det_bboxes, det_labels,
- self.bbox_head.num_classes)
+ _det_bboxes[:, :4] *= det_bboxes.new_tensor(img_metas[0][0]["scale_factor"])
+ bbox_results = bbox2result(_det_bboxes, det_labels, self.bbox_head.num_classes)
# det_bboxes always keep the original scale
if self.with_mask:
- segm_results = self.aug_test_mask(x, img_metas, det_bboxes,
- det_labels)
+ segm_results = self.aug_test_mask(x, img_metas, det_bboxes, det_labels)
return [(bbox_results, segm_results)]
else:
return [bbox_results]
def onnx_export(self, x, proposals, img_metas, rescale=False):
"""Test without augmentation."""
- assert self.with_bbox, 'Bbox head must be implemented.'
+ assert self.with_bbox, "Bbox head must be implemented."
det_bboxes, det_labels = self.bbox_onnx_export(
- x, img_metas, proposals, self.test_cfg, rescale=rescale)
+ x, img_metas, proposals, self.test_cfg, rescale=rescale
+ )
if not self.with_mask:
return det_bboxes, det_labels
else:
segm_results = self.mask_onnx_export(
- x, img_metas, det_bboxes, det_labels, rescale=rescale)
+ x, img_metas, det_bboxes, det_labels, rescale=rescale
+ )
return det_bboxes, det_labels, segm_results
def mask_onnx_export(self, x, img_metas, det_bboxes, det_labels, **kwargs):
@@ -323,32 +300,34 @@ def mask_onnx_export(self, x, img_metas, det_bboxes, det_labels, **kwargs):
# image shapes of images in the batch
if all(det_bbox.shape[0] == 0 for det_bbox in det_bboxes):
- raise RuntimeError('[ONNX Error] Can not record MaskHead '
- 'as it has not been executed this time')
+ raise RuntimeError(
+ "[ONNX Error] Can not record MaskHead " "as it has not been executed this time"
+ )
batch_size = det_bboxes.size(0)
# if det_bboxes is rescaled to the original image size, we need to
# rescale it back to the testing scale to obtain RoIs.
det_bboxes = det_bboxes[..., :4]
- batch_index = torch.arange(
- det_bboxes.size(0), device=det_bboxes.device).float().view(
- -1, 1, 1).expand(det_bboxes.size(0), det_bboxes.size(1), 1)
+ batch_index = (
+ torch.arange(det_bboxes.size(0), device=det_bboxes.device)
+ .float()
+ .view(-1, 1, 1)
+ .expand(det_bboxes.size(0), det_bboxes.size(1), 1)
+ )
mask_rois = torch.cat([batch_index, det_bboxes], dim=-1)
mask_rois = mask_rois.view(-1, 5)
mask_results = self._mask_forward(x, mask_rois)
- mask_pred = mask_results['mask_pred']
- max_shape = img_metas[0]['img_shape_for_onnx']
+ mask_pred = mask_results["mask_pred"]
+ max_shape = img_metas[0]["img_shape_for_onnx"]
num_det = det_bboxes.shape[1]
det_bboxes = det_bboxes.reshape(-1, 4)
det_labels = det_labels.reshape(-1)
- segm_results = self.mask_head.onnx_export(mask_pred, det_bboxes,
- det_labels, self.test_cfg,
- max_shape)
- segm_results = segm_results.reshape(batch_size, num_det, max_shape[0],
- max_shape[1])
+ segm_results = self.mask_head.onnx_export(
+ mask_pred, det_bboxes, det_labels, self.test_cfg, max_shape
+ )
+ segm_results = segm_results.reshape(batch_size, num_det, max_shape[0], max_shape[1])
return segm_results
- def bbox_onnx_export(self, x, img_metas, proposals, rcnn_test_cfg,
- **kwargs):
+ def bbox_onnx_export(self, x, img_metas, proposals, rcnn_test_cfg, **kwargs):
"""Export bbox branch to onnx which supports batch inference.
Args:
@@ -363,16 +342,17 @@ def bbox_onnx_export(self, x, img_metas, proposals, rcnn_test_cfg,
and class labels of shape [N, num_bboxes].
"""
# get origin input shape to support onnx dynamic input shape
- assert len(
- img_metas
- ) == 1, 'Only support one input image while in exporting to ONNX'
- img_shapes = img_metas[0]['img_shape_for_onnx']
+ assert len(img_metas) == 1, "Only support one input image while in exporting to ONNX"
+ img_shapes = img_metas[0]["img_shape_for_onnx"]
rois = proposals
- batch_index = torch.arange(
- rois.size(0), device=rois.device).float().view(-1, 1, 1).expand(
- rois.size(0), rois.size(1), 1)
+ batch_index = (
+ torch.arange(rois.size(0), device=rois.device)
+ .float()
+ .view(-1, 1, 1)
+ .expand(rois.size(0), rois.size(1), 1)
+ )
rois = torch.cat([batch_index, rois[..., :4]], dim=-1)
batch_size = rois.shape[0]
@@ -381,17 +361,16 @@ def bbox_onnx_export(self, x, img_metas, proposals, rcnn_test_cfg,
# Eliminate the batch dimension
rois = rois.view(-1, 5)
bbox_results = self._bbox_forward(x, rois)
- cls_score = bbox_results['cls_score']
- bbox_pred = bbox_results['bbox_pred']
+ cls_score = bbox_results["cls_score"]
+ bbox_pred = bbox_results["bbox_pred"]
# Recover the batch dimension
rois = rois.reshape(batch_size, num_proposals_per_img, rois.size(-1))
- cls_score = cls_score.reshape(batch_size, num_proposals_per_img,
- cls_score.size(-1))
+ cls_score = cls_score.reshape(batch_size, num_proposals_per_img, cls_score.size(-1))
- bbox_pred = bbox_pred.reshape(batch_size, num_proposals_per_img,
- bbox_pred.size(-1))
+ bbox_pred = bbox_pred.reshape(batch_size, num_proposals_per_img, bbox_pred.size(-1))
det_bboxes, det_labels = self.bbox_head.onnx_export(
- rois, cls_score, bbox_pred, img_shapes, cfg=rcnn_test_cfg)
+ rois, cls_score, bbox_pred, img_shapes, cfg=rcnn_test_cfg
+ )
return det_bboxes, det_labels
diff --git a/mmdet/models/roi_heads/test_mixins.py b/mmdet/models/roi_heads/test_mixins.py
index ae6e79ae..f21fefaa 100644
--- a/mmdet/models/roi_heads/test_mixins.py
+++ b/mmdet/models/roi_heads/test_mixins.py
@@ -5,39 +5,32 @@
import numpy as np
import torch
-from mmdet.core import (bbox2roi, bbox_mapping, merge_aug_bboxes,
- merge_aug_masks, multiclass_nms)
+import torch.nn.functional as F
+from mmdet.core import bbox2roi, bbox_mapping, merge_aug_bboxes, merge_aug_masks, multiclass_nms
if sys.version_info >= (3, 7):
from mmdet.utils.contextmanagers import completed
class BBoxTestMixin:
-
if sys.version_info >= (3, 7):
- async def async_test_bboxes(self,
- x,
- img_metas,
- proposals,
- rcnn_test_cfg,
- rescale=False,
- **kwargs):
+ async def async_test_bboxes(
+ self, x, img_metas, proposals, rcnn_test_cfg, rescale=False, **kwargs
+ ):
"""Asynchronized test for box head without augmentation."""
rois = bbox2roi(proposals)
roi_feats = self.bbox_roi_extractor(
- x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
+ x[: len(self.bbox_roi_extractor.featmap_strides)], rois
+ )
if self.with_shared_head:
roi_feats = self.shared_head(roi_feats)
- sleep_interval = rcnn_test_cfg.get('async_sleep_interval', 0.017)
+ sleep_interval = rcnn_test_cfg.get("async_sleep_interval", 0.017)
- async with completed(
- __name__, 'bbox_head_forward',
- sleep_interval=sleep_interval):
+ async with completed(__name__, "bbox_head_forward", sleep_interval=sleep_interval):
cls_score, bbox_pred = self.bbox_head(roi_feats)
- img_shape = img_metas[0]['img_shape']
- scale_factor = img_metas[0]['scale_factor']
+ img_shape = img_metas[0]["img_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
det_bboxes, det_labels = self.bbox_head.get_bboxes(
rois,
cls_score,
@@ -45,15 +38,11 @@ async def async_test_bboxes(self,
img_shape,
scale_factor,
rescale=rescale,
- cfg=rcnn_test_cfg)
+ cfg=rcnn_test_cfg,
+ )
return det_bboxes, det_labels
- def simple_test_bboxes(self,
- x,
- img_metas,
- proposals,
- rcnn_test_cfg,
- rescale=False):
+ def simple_test_bboxes(self, x, img_metas, proposals, rcnn_test_cfg, rescale=False):
"""Test only det bboxes without augmentation.
Args:
@@ -78,21 +67,20 @@ def simple_test_bboxes(self,
if rois.shape[0] == 0:
batch_size = len(proposals)
det_bbox = rois.new_zeros(0, 5)
- det_label = rois.new_zeros((0, ), dtype=torch.long)
+ det_label = rois.new_zeros((0,), dtype=torch.long)
if rcnn_test_cfg is None:
det_bbox = det_bbox[:, :4]
- det_label = rois.new_zeros(
- (0, self.bbox_head.fc_cls.out_features))
+ det_label = rois.new_zeros((0, self.bbox_head.fc_cls.out_features))
# There is no proposal in the whole batch
return [det_bbox] * batch_size, [det_label] * batch_size
bbox_results = self._bbox_forward(x, rois)
- img_shapes = tuple(meta['img_shape'] for meta in img_metas)
- scale_factors = tuple(meta['scale_factor'] for meta in img_metas)
+ img_shapes = tuple(meta["img_shape"] for meta in img_metas)
+ scale_factors = tuple(meta["scale_factor"] for meta in img_metas)
# split batch bbox prediction back to each image
- cls_score = bbox_results['cls_score']
- bbox_pred = bbox_results['bbox_pred']
+ cls_score = bbox_results["cls_score"]
+ bbox_pred = bbox_results["bbox_pred"]
num_proposals_per_img = tuple(len(p) for p in proposals)
rois = rois.split(num_proposals_per_img, 0)
cls_score = cls_score.split(num_proposals_per_img, 0)
@@ -104,10 +92,9 @@ def simple_test_bboxes(self,
if isinstance(bbox_pred, torch.Tensor):
bbox_pred = bbox_pred.split(num_proposals_per_img, 0)
else:
- bbox_pred = self.bbox_head.bbox_pred_split(
- bbox_pred, num_proposals_per_img)
+ bbox_pred = self.bbox_head.bbox_pred_split(bbox_pred, num_proposals_per_img)
else:
- bbox_pred = (None, ) * len(proposals)
+ bbox_pred = (None,) * len(proposals)
# apply bbox post-processing to each image individually
det_bboxes = []
@@ -116,11 +103,10 @@ def simple_test_bboxes(self,
if rois[i].shape[0] == 0:
# There is no proposal in the single image
det_bbox = rois[i].new_zeros(0, 5)
- det_label = rois[i].new_zeros((0, ), dtype=torch.long)
+ det_label = rois[i].new_zeros((0,), dtype=torch.long)
if rcnn_test_cfg is None:
det_bbox = det_bbox[:, :4]
- det_label = rois[i].new_zeros(
- (0, self.bbox_head.fc_cls.out_features))
+ det_label = rois[i].new_zeros((0, self.bbox_head.fc_cls.out_features))
else:
det_bbox, det_label = self.bbox_head.get_bboxes(
@@ -130,7 +116,8 @@ def simple_test_bboxes(self,
img_shapes[i],
scale_factors[i],
rescale=rescale,
- cfg=rcnn_test_cfg)
+ cfg=rcnn_test_cfg,
+ )
det_bboxes.append(det_bbox)
det_labels.append(det_label)
return det_bboxes, det_labels
@@ -141,109 +128,100 @@ def aug_test_bboxes(self, feats, img_metas, proposal_list, rcnn_test_cfg):
aug_scores = []
for x, img_meta in zip(feats, img_metas):
# only one image in the batch
- img_shape = img_meta[0]['img_shape']
- scale_factor = img_meta[0]['scale_factor']
- flip = img_meta[0]['flip']
- flip_direction = img_meta[0]['flip_direction']
+ img_shape = img_meta[0]["img_shape"]
+ scale_factor = img_meta[0]["scale_factor"]
+ flip = img_meta[0]["flip"]
+ flip_direction = img_meta[0]["flip_direction"]
# TODO more flexible
- proposals = bbox_mapping(proposal_list[0][:, :4], img_shape,
- scale_factor, flip, flip_direction)
+ proposals = bbox_mapping(
+ proposal_list[0][:, :4], img_shape, scale_factor, flip, flip_direction
+ )
rois = bbox2roi([proposals])
bbox_results = self._bbox_forward(x, rois)
bboxes, scores = self.bbox_head.get_bboxes(
rois,
- bbox_results['cls_score'],
- bbox_results['bbox_pred'],
+ bbox_results["cls_score"],
+ bbox_results["bbox_pred"],
img_shape,
scale_factor,
rescale=False,
- cfg=None)
+ cfg=None,
+ )
aug_bboxes.append(bboxes)
aug_scores.append(scores)
# after merging, bboxes will be rescaled to the original image size
merged_bboxes, merged_scores = merge_aug_bboxes(
- aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)
+ aug_bboxes, aug_scores, img_metas, rcnn_test_cfg
+ )
if merged_bboxes.shape[0] == 0:
# There is no proposal in the single image
det_bboxes = merged_bboxes.new_zeros(0, 5)
- det_labels = merged_bboxes.new_zeros((0, ), dtype=torch.long)
+ det_labels = merged_bboxes.new_zeros((0,), dtype=torch.long)
else:
- det_bboxes, det_labels = multiclass_nms(merged_bboxes,
- merged_scores,
- rcnn_test_cfg.score_thr,
- rcnn_test_cfg.nms,
- rcnn_test_cfg.max_per_img)
+ det_bboxes, det_labels = multiclass_nms(
+ merged_bboxes,
+ merged_scores,
+ rcnn_test_cfg.score_thr,
+ rcnn_test_cfg.nms,
+ rcnn_test_cfg.max_per_img,
+ )
return det_bboxes, det_labels
class MaskTestMixin:
-
if sys.version_info >= (3, 7):
- async def async_test_mask(self,
- x,
- img_metas,
- det_bboxes,
- det_labels,
- rescale=False,
- mask_test_cfg=None):
+ async def async_test_mask(
+ self, x, img_metas, det_bboxes, det_labels, rescale=False, mask_test_cfg=None
+ ):
"""Asynchronized test for mask head without augmentation."""
# image shape of the first image in the batch (only one)
- ori_shape = img_metas[0]['ori_shape']
- scale_factor = img_metas[0]['scale_factor']
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
if det_bboxes.shape[0] == 0:
segm_result = [[] for _ in range(self.mask_head.num_classes)]
else:
- if rescale and not isinstance(scale_factor,
- (float, torch.Tensor)):
+ if rescale and not isinstance(scale_factor, (float, torch.Tensor)):
scale_factor = det_bboxes.new_tensor(scale_factor)
- _bboxes = (
- det_bboxes[:, :4] *
- scale_factor if rescale else det_bboxes)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
mask_rois = bbox2roi([_bboxes])
mask_feats = self.mask_roi_extractor(
- x[:len(self.mask_roi_extractor.featmap_strides)],
- mask_rois)
+ x[: len(self.mask_roi_extractor.featmap_strides)], mask_rois
+ )
if self.with_shared_head:
mask_feats = self.shared_head(mask_feats)
- if mask_test_cfg and mask_test_cfg.get('async_sleep_interval'):
- sleep_interval = mask_test_cfg['async_sleep_interval']
+ if mask_test_cfg and mask_test_cfg.get("async_sleep_interval"):
+ sleep_interval = mask_test_cfg["async_sleep_interval"]
else:
sleep_interval = 0.035
- async with completed(
- __name__,
- 'mask_head_forward',
- sleep_interval=sleep_interval):
+ async with completed(__name__, "mask_head_forward", sleep_interval=sleep_interval):
mask_pred = self.mask_head(mask_feats)
segm_result = self.mask_head.get_seg_masks(
- mask_pred, _bboxes, det_labels, self.test_cfg, ori_shape,
- scale_factor, rescale)
+ mask_pred, _bboxes, det_labels, self.test_cfg, ori_shape, scale_factor, rescale
+ )
return segm_result
- def simple_test_mask(self,
- x,
- img_metas,
- det_bboxes,
- det_labels,
- rescale=False):
+ def simple_test_mask(self, x, img_metas, det_bboxes, det_labels, rescale=False):
"""Simple test for mask head without augmentation."""
# image shapes of images in the batch
- ori_shapes = tuple(meta['ori_shape'] for meta in img_metas)
- scale_factors = tuple(meta['scale_factor'] for meta in img_metas)
+ ori_shapes = tuple(meta["ori_shape"] for meta in img_metas)
+ scale_factors = tuple(meta["scale_factor"] for meta in img_metas)
if isinstance(scale_factors[0], float):
warnings.warn(
- 'Scale factor in img_metas should be a '
- 'ndarray with shape (4,) '
- 'arrange as (factor_w, factor_h, factor_w, factor_h), '
- 'The scale_factor with float type has been deprecated. ')
+ "Scale factor in img_metas should be a "
+ "ndarray with shape (4,) "
+ "arrange as (factor_w, factor_h, factor_w, factor_h), "
+ "The scale_factor with float type has been deprecated. "
+ )
scale_factors = np.array([scale_factors] * 4, dtype=np.float32)
num_imgs = len(det_bboxes)
if all(det_bbox.shape[0] == 0 for det_bbox in det_bboxes):
- segm_results = [[[] for _ in range(self.mask_head.num_classes)]
- for _ in range(num_imgs)]
+            segm_results = [
+                [[] for _ in range(self.mask_head.num_classes)] for _ in range(num_imgs)
+            ]
+            # keep the (segm_results, mask_pred) return shape when there are
+            # no detections
+            mask_pred = None
else:
# if det_bboxes is rescaled to the original image size, we need to
# rescale it back to the testing scale to obtain RoIs.
@@ -253,13 +231,13 @@ def simple_test_mask(self,
for scale_factor in scale_factors
]
_bboxes = [
- det_bboxes[i][:, :4] *
- scale_factors[i] if rescale else det_bboxes[i][:, :4]
+ det_bboxes[i][:, :4] * scale_factors[i] if rescale else det_bboxes[i][:, :4]
for i in range(len(det_bboxes))
]
+ ori_shapes = tuple(meta["ori_shape"] for meta in img_metas)
mask_rois = bbox2roi(_bboxes)
mask_results = self._mask_forward(x, mask_rois)
- mask_pred = mask_results['mask_pred']
+ mask_pred = mask_results["mask_pred"]
# split batch mask prediction back to each image
num_mask_roi_per_img = [len(det_bbox) for det_bbox in det_bboxes]
mask_preds = mask_pred.split(num_mask_roi_per_img, 0)
@@ -268,15 +246,19 @@ def simple_test_mask(self,
segm_results = []
for i in range(num_imgs):
if det_bboxes[i].shape[0] == 0:
- segm_results.append(
- [[] for _ in range(self.mask_head.num_classes)])
+ segm_results.append([[] for _ in range(self.mask_head.num_classes)])
else:
segm_result = self.mask_head.get_seg_masks(
- mask_preds[i], _bboxes[i], det_labels[i],
- self.test_cfg, ori_shapes[i], scale_factors[i],
- rescale)
+ mask_preds[i],
+ _bboxes[i],
+ det_labels[i],
+ self.test_cfg,
+ ori_shapes[i],
+ scale_factors[i],
+ rescale,
+ )
segm_results.append(segm_result)
- return segm_results
+ return segm_results, mask_pred
def aug_test_mask(self, feats, img_metas, det_bboxes, det_labels):
"""Test for mask head with test time augmentation."""
@@ -285,20 +267,20 @@ def aug_test_mask(self, feats, img_metas, det_bboxes, det_labels):
else:
aug_masks = []
for x, img_meta in zip(feats, img_metas):
- img_shape = img_meta[0]['img_shape']
- scale_factor = img_meta[0]['scale_factor']
- flip = img_meta[0]['flip']
- flip_direction = img_meta[0]['flip_direction']
- _bboxes = bbox_mapping(det_bboxes[:, :4], img_shape,
- scale_factor, flip, flip_direction)
+ img_shape = img_meta[0]["img_shape"]
+ scale_factor = img_meta[0]["scale_factor"]
+ flip = img_meta[0]["flip"]
+ flip_direction = img_meta[0]["flip_direction"]
+ _bboxes = bbox_mapping(
+ det_bboxes[:, :4], img_shape, scale_factor, flip, flip_direction
+ )
mask_rois = bbox2roi([_bboxes])
mask_results = self._mask_forward(x, mask_rois)
# convert to numpy array to save memory
- aug_masks.append(
- mask_results['mask_pred'].sigmoid().cpu().numpy())
+ aug_masks.append(mask_results["mask_pred"].sigmoid().cpu().numpy())
merged_masks = merge_aug_masks(aug_masks, img_metas, self.test_cfg)
- ori_shape = img_metas[0][0]['ori_shape']
+ ori_shape = img_metas[0][0]["ori_shape"]
scale_factor = det_bboxes.new_ones(4)
segm_result = self.mask_head.get_seg_masks(
merged_masks,
@@ -307,5 +289,370 @@ def aug_test_mask(self, feats, img_metas, det_bboxes, det_labels):
self.test_cfg,
ori_shape,
scale_factor=scale_factor,
- rescale=False)
+ rescale=False,
+ )
return segm_result
+
+
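+# "fro" is short for "footprint from roof and offset": the footprint mask is
+# obtained by shifting the predicted roof mask with the regressed offset
+# instead of a dedicated footprint branch.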
+class FootprintMaskFromRoofOffsetTestMixin:
+ def simple_test_footprint_mask_fro(
+ self, img_metas, offsets, roof_masks, det_bboxes, det_labels, rescale=False
+ ):
+ footprint_mask_fro_results = self._footprint_mask_from_roof_offset_forward(
+ offsets, roof_masks
+ )
+ footprint_mask_fro_pred = footprint_mask_fro_results["footprint_mask_from_roof_offset_pred"]
+ # split batch mask prediction back to each image
+ num_mask_roi_per_img = [len(det_bbox) for det_bbox in det_bboxes]
+ footprint_mask_fro_preds = footprint_mask_fro_pred.split(num_mask_roi_per_img, 0)
+
+ # apply mask post-processing to each image individually
+ segm_results = []
+ _bboxes = [det_bboxes[i][:, :4] for i in range(len(det_bboxes))]
+ ori_shapes = tuple(meta["ori_shape"] for meta in img_metas)
+ scale_factors = tuple(meta["scale_factor"] for meta in img_metas)
+ for i in range(len(det_bboxes)):
+ if det_bboxes[i].shape[0] == 0:
+ segm_results.append([[] for _ in range(self.mask_head.num_classes)])
+ else:
+ segm_result = self.footprint_mask_from_roof_offset_head.get_seg_masks(
+ footprint_mask_fro_preds[i],
+ _bboxes[i],
+ det_labels[i],
+ self.test_cfg,
+ ori_shapes[i],
+ scale_factors[i],
+ rescale,
+ )
+ segm_results.append(segm_result)
+ return segm_results
+
+
+class FootprintMaskTestMixin:
+ if sys.version_info >= (3, 7):
+
+ async def async_test_footprint_mask(
+ self, x, img_metas, det_bboxes, det_labels, rescale=False, mask_test_cfg=None
+ ):
+ """Asynchronized test for footprint mask head without augmentation."""
+ # image shape of the first image in the batch (only one)
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
+ if det_bboxes.shape[0] == 0:
+ footprint_segm_result = [[] for _ in range(self.footprint_mask_head.num_classes)]
+ else:
+ if rescale and not isinstance(scale_factor, (float, torch.Tensor)):
+ scale_factor = det_bboxes.new_tensor(scale_factor)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
+ footprint_mask_rois = bbox2roi([_bboxes])
+                footprint_mask_feats = self.footprint_mask_roi_extractor(
+ x[: len(self.footprint_mask_roi_extractor.featmap_strides)], footprint_mask_rois
+ )
+
+ if self.with_shared_head:
+ footprint_mask_feats = self.shared_head(footprint_mask_feats)
+ if mask_test_cfg and mask_test_cfg.get("async_sleep_interval"):
+ sleep_interval = mask_test_cfg["async_sleep_interval"]
+ else:
+ sleep_interval = 0.035
+ async with completed(
+ __name__, "footprint_mask_head_forward", sleep_interval=sleep_interval
+ ):
+ footprint_mask_pred = self.footprint_mask_head(footprint_mask_feats)
+ footprint_segm_result = self.footprint_mask_head.get_seg_masks(
+ footprint_mask_pred,
+ _bboxes,
+ det_labels,
+ self.test_cfg,
+ ori_shape,
+ scale_factor,
+ rescale,
+ )
+ return footprint_segm_result
+
+ def simple_test_footprint_mask(self, x, img_metas, det_bboxes, det_labels, rescale=False):
+ """Simple test for footprint mask head without augmentation."""
+ # image shapes of images in the batch
+ ori_shapes = tuple(meta["ori_shape"] for meta in img_metas)
+ scale_factors = tuple(meta["scale_factor"] for meta in img_metas)
+
+ if isinstance(scale_factors[0], float):
+ warnings.warn(
+ "Scale factor in img_metas should be a "
+ "ndarray with shape (4,) "
+ "arrange as (factor_w, factor_h, factor_w, factor_h), "
+ "The scale_factor with float type has been deprecated. "
+ )
+ scale_factors = np.array([scale_factors] * 4, dtype=np.float32)
+
+ num_imgs = len(det_bboxes)
+ if all(det_bbox.shape[0] == 0 for det_bbox in det_bboxes):
+ footprint_segm_results = [
+ [[] for _ in range(self.footprint_mask_head.num_classes)] for _ in range(num_imgs)
+ ]
+ else:
+ # if det_bboxes is rescaled to the original image size, we need to
+ # rescale it back to the testing scale to obtain RoIs.
+ if rescale:
+ scale_factors = [
+ torch.from_numpy(scale_factor).to(det_bboxes[0].device)
+ for scale_factor in scale_factors
+ ]
+ _bboxes = [
+ det_bboxes[i][:, :4] * scale_factors[i] if rescale else det_bboxes[i][:, :4]
+ for i in range(len(det_bboxes))
+ ]
+ footprint_mask_rois = bbox2roi(_bboxes)
+ footprint_mask_results = self._footprint_mask_forward(x, footprint_mask_rois)
+ footprint_mask_pred = footprint_mask_results["footprint_mask_pred"]
+ # split batch mask prediction back to each image
+ num_mask_roi_per_img = [len(det_bbox) for det_bbox in det_bboxes]
+ footprint_mask_preds = footprint_mask_pred.split(num_mask_roi_per_img, 0)
+
+ # apply mask post-processing to each image individually
+ footprint_segm_results = []
+ for i in range(num_imgs):
+ if det_bboxes[i].shape[0] == 0:
+ footprint_segm_results.append(
+ [[] for _ in range(self.footprint_mask_head.num_classes)]
+ )
+ else:
+ footprint_segm_result = self.footprint_mask_head.get_seg_masks(
+ footprint_mask_preds[i],
+ _bboxes[i],
+ det_labels[i],
+ self.test_cfg,
+ ori_shapes[i],
+ scale_factors[i],
+ rescale,
+ )
+ footprint_segm_results.append(footprint_segm_result)
+ return footprint_segm_results
+
+ def aug_test_footprint_mask(self, feats, img_metas, det_bboxes, det_labels):
+ """Test for footprint mask head with test time augmentation."""
+ if det_bboxes.shape[0] == 0:
+ footprint_segm_result = [[] for _ in range(self.footprint_mask_head.num_classes)]
+ else:
+ aug_footprint_masks = []
+ for x, img_meta in zip(feats, img_metas):
+ img_shape = img_meta[0]["img_shape"]
+ scale_factor = img_meta[0]["scale_factor"]
+ flip = img_meta[0]["flip"]
+ flip_direction = img_meta[0]["flip_direction"]
+ _bboxes = bbox_mapping(
+ det_bboxes[:, :4], img_shape, scale_factor, flip, flip_direction
+ )
+ footprint_mask_rois = bbox2roi([_bboxes])
+ footprint_mask_results = self._footprint_mask_forward(x, footprint_mask_rois)
+ # convert to numpy array to save memory
+ aug_footprint_masks.append(
+ footprint_mask_results["mask_pred"].sigmoid().cpu().numpy()
+ )
+ merged_footprint_masks = merge_aug_masks(aug_footprint_masks, img_metas, self.test_cfg)
+
+ ori_shape = img_metas[0][0]["ori_shape"]
+ scale_factor = det_bboxes.new_ones(4)
+ footprint_segm_result = self.footprint_mask_head.get_seg_masks(
+ merged_footprint_masks,
+ det_bboxes,
+ det_labels,
+ self.test_cfg,
+ ori_shape,
+ scale_factor=scale_factor,
+ rescale=False,
+ )
+ return footprint_segm_result
+
+
+class OffsetTestMixin(object):
+ def simple_test_offset(self, x, img_metas, det_bboxes, det_labels, rescale=False):
+ # image shape of the first image in the batch (only one)
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
+        if det_bboxes.shape[0] == 0:
+            offset_result = [[] for _ in range(2)]
+            # keep the (offset_result, offset_pred) return shape when there
+            # are no detections
+            offset_pred = None
+ else:
+ # if det_bboxes is rescaled to the original image size, we need to
+ # rescale it back to the testing scale to obtain RoIs.
+ if rescale and not isinstance(scale_factor, float):
+ scale_factor = torch.from_numpy(scale_factor).to(det_bboxes.device)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
+ offset_rois = bbox2roi([_bboxes])
+ offset_feats = self.offset_roi_extractor(
+ x[: len(self.offset_roi_extractor.featmap_strides)], offset_rois
+ )
+
+ offset_pred = self.offset_head(offset_feats)
+ offset_result = self.offset_head.get_offsets(
+ offset_pred, _bboxes, scale_factor, rescale
+ )
+ return offset_result, offset_pred
+
+ def simple_test_offset_rotate_feature(
+ self, x, img_metas, det_bboxes, det_labels, rescale=False, with_rotate=False, rotate_angle=0
+ ):
+ # image shape of the first image in the batch (only one)
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
+ if det_bboxes.shape[0] == 0:
+ offset_result = [[] for _ in range(2)]
+ else:
+ # if det_bboxes is rescaled to the original image size, we need to
+ # rescale it back to the testing scale to obtain RoIs.
+ if rescale and not isinstance(scale_factor, float):
+ scale_factor = torch.from_numpy(scale_factor).to(det_bboxes.device)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
+ offset_rois = bbox2roi([_bboxes])
+ offset_feats = self.offset_roi_extractor(
+ x[: len(self.offset_roi_extractor.featmap_strides)], offset_rois
+ )
+
+ # self._show_offset_feat(offset_rois, offset_feats)
+ if with_rotate:
+                # zeros (not empty) so the translation column of the affine
+                # matrix is initialized rather than left as garbage memory
+                theta = torch.zeros((offset_feats.size()[0], 2, 3), device=offset_feats.device)
+
+ angle = rotate_angle * np.pi / 180.0
+
+ theta[:, 0, 0] = torch.tensor(np.cos(angle), device=offset_feats.device)
+ theta[:, 0, 1] = torch.tensor(np.sin(-angle), device=offset_feats.device)
+ theta[:, 1, 0] = torch.tensor(np.sin(angle), device=offset_feats.device)
+ theta[:, 1, 1] = torch.tensor(np.cos(angle), device=offset_feats.device)
+
+ grid = F.affine_grid(theta, offset_feats.size())
+ offset_feats = F.grid_sample(offset_feats, grid)
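+                    # the affine grid rotates each RoI feature map by
+                    # ``rotate_angle`` degrees before offset prediction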
+
+ offset_pred = self.offset_head(offset_feats)
+
+ offset_result = self.offset_head.get_offsets(
+ offset_pred, _bboxes, scale_factor, rescale
+ )
+
+ self.vis_featuremap = self.offset_head.vis_featuremap
+
+ return offset_result
+
+
+class AngleTestMixin(object):
+ def simple_test_angle(self, x):
+ angle_pred = self.angle_head(x)
+
+ return angle_pred
+
+
+class OffsetHeightTestMixin(object):
+ def simple_test_offset_height(self, x, img_metas, det_bboxes, det_labels, rescale=False):
+ # image shape of the first image in the batch (only one)
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
+ if det_bboxes.shape[0] == 0:
+ offset_result = [[] for _ in range(2)]
+ height_result = [[] for _ in range(1)]
+ else:
+ # if det_bboxes is rescaled to the original image size, we need to
+ # rescale it back to the testing scale to obtain RoIs.
+ if rescale and not isinstance(scale_factor, float):
+ scale_factor = torch.from_numpy(scale_factor).to(det_bboxes.device)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
+ offset_height_rois = bbox2roi([_bboxes])
+ offset_height_feats = self.offset_height_roi_extractor(
+ x[: len(self.offset_height_roi_extractor.featmap_strides)], offset_height_rois
+ )
+
+ offset_pred, height_pred = self.offset_height_head(offset_height_feats)
+ offset_result = self.offset_height_head.get_offsets(
+ offset_pred, _bboxes, scale_factor, rescale
+ )
+ height_result = self.offset_height_head.get_heights(
+ height_pred, _bboxes, scale_factor, rescale
+ )
+ return offset_result, height_result
+
+
+class HeightTestMixin(object):
+ def simple_test_height(self, x, img_metas, det_bboxes, det_labels, rescale=False):
+ # image shape of the first image in the batch (only one)
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
+ if det_bboxes.shape[0] == 0:
+ height_result = [[] for _ in range(1)]
+ else:
+ # if det_bboxes is rescaled to the original image size, we need to
+ # rescale it back to the testing scale to obtain RoIs.
+ if rescale and not isinstance(scale_factor, float):
+ scale_factor = torch.from_numpy(scale_factor).to(det_bboxes.device)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
+ height_rois = bbox2roi([_bboxes])
+ height_result = self._height_forward(x, height_rois)
+
+ height_result = self.height_head.get_heights(
+ height_result["height_pred"], _bboxes, scale_factor, rescale
+ )
+
+ return height_result
+
+
+class OffsetFieldTestMixin(object):
+ def simple_test_offset_field(self, x, img_metas, det_bboxes, det_labels, rescale=False):
+ """Simple test for mask head without augmentation."""
+ # image shape of the first image in the batch (only one)
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
+ if det_bboxes.shape[0] == 0:
+ offset_result = np.zeros((0, 2))
+ else:
+ # if det_bboxes is rescaled to the original image size, we need to
+ # rescale it back to the testing scale to obtain RoIs.
+ if rescale and not isinstance(scale_factor, float):
+ scale_factor = torch.from_numpy(scale_factor).to(det_bboxes.device)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
+ offset_field_rois = bbox2roi([_bboxes])
+ offset_field_results = self._offset_field_forward(x, offset_field_rois)
+ mask_results = self._mask_forward(x, offset_field_rois)
+
+ offset_result = self.offset_field_head.get_offset(
+ mask_results["mask_pred"],
+ offset_field_results["offset_field_pred"],
+ _bboxes,
+ det_labels,
+ self.test_cfg,
+ ori_shape,
+ scale_factor,
+ rescale,
+ )
+
+ return offset_result
+
+
+class OffsetReweightTestMixin(object):
+ def simple_test_offset_reweight(self, x, img_metas, det_bboxes, det_labels, rescale=False):
+ # image shape of the first image in the batch (only one)
+ ori_shape = img_metas[0]["ori_shape"]
+ scale_factor = img_metas[0]["scale_factor"]
+ if det_bboxes.shape[0] == 0:
+ offset_result = [[] for _ in range(2)]
+ else:
+ # if det_bboxes is rescaled to the original image size, we need to
+ # rescale it back to the testing scale to obtain RoIs.
+ if rescale and not isinstance(scale_factor, float):
+ scale_factor = torch.from_numpy(scale_factor).to(det_bboxes.device)
+ _bboxes = det_bboxes[:, :4] * scale_factor if rescale else det_bboxes
+ offset_rois = bbox2roi([_bboxes])
+
+ mask_results = self._mask_forward(x, offset_rois)
+ side_face_results = self._side_face_forward(x, offset_rois)
+
+ feature_weight = side_face_results["side_face_pred"] + mask_results["mask_pred"]
+ feature_weight = (torch.sigmoid(F.interpolate(feature_weight, size=[7, 7])) + 1) / 2.0
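+            # the fused roof-mask and side-face logits are resized to the 7x7
+            # RoI resolution and squashed into (0.5, 1.0), so the reweighting
+            # attenuates but never zeroes out the offset features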
+
+ offset_feats = self.offset_roi_extractor(
+ x[: len(self.offset_roi_extractor.featmap_strides)], offset_rois
+ )
+
+ offset_feats = offset_feats * feature_weight
+
+ offset_pred = self.offset_head(offset_feats)
+ offset_result = self.offset_head.get_offsets(
+ offset_pred, _bboxes, scale_factor, rescale
+ )
+ return offset_result
diff --git a/mmdet/models/utils/__init__.py b/mmdet/models/utils/__init__.py
index e74ba89e..b350b5cc 100644
--- a/mmdet/models/utils/__init__.py
+++ b/mmdet/models/utils/__init__.py
@@ -6,29 +6,56 @@
from .csp_layer import CSPLayer
from .gaussian_target import gaussian_radius, gen_gaussian_target
from .inverted_residual import InvertedResidual
+from .loft_utils import offset_roof_to_footprint
from .make_divisible import make_divisible
from .misc import interpolate_as, sigmoid_geometric_mean
from .normed_predictor import NormedConv2d, NormedLinear
from .panoptic_gt_processing import preprocess_panoptic_gt
-from .point_sample import (get_uncertain_point_coords_with_randomness,
- get_uncertainty)
-from .positional_encoding import (LearnedPositionalEncoding,
- SinePositionalEncoding)
+from .point_sample import get_uncertain_point_coords_with_randomness, get_uncertainty
+from .positional_encoding import LearnedPositionalEncoding, SinePositionalEncoding
from .res_layer import ResLayer, SimplifiedBasicBlock
from .se_layer import DyReLU, SELayer
-from .transformer import (DetrTransformerDecoder, DetrTransformerDecoderLayer,
- DynamicConv, PatchEmbed, Transformer, nchw_to_nlc,
- nlc_to_nchw)
+from .transformer import (
+ DetrTransformerDecoder,
+ DetrTransformerDecoderLayer,
+ DynamicConv,
+ PatchEmbed,
+ Transformer,
+ nchw_to_nlc,
+ nlc_to_nchw,
+)
__all__ = [
- 'ResLayer', 'gaussian_radius', 'gen_gaussian_target',
- 'DetrTransformerDecoderLayer', 'DetrTransformerDecoder', 'Transformer',
- 'build_transformer', 'build_linear_layer', 'SinePositionalEncoding',
- 'LearnedPositionalEncoding', 'DynamicConv', 'SimplifiedBasicBlock',
- 'NormedLinear', 'NormedConv2d', 'make_divisible', 'InvertedResidual',
- 'SELayer', 'interpolate_as', 'ConvUpsample', 'CSPLayer',
- 'adaptive_avg_pool2d', 'AdaptiveAvgPool2d', 'PatchEmbed', 'nchw_to_nlc',
- 'nlc_to_nchw', 'pvt_convert', 'sigmoid_geometric_mean',
- 'preprocess_panoptic_gt', 'DyReLU',
- 'get_uncertain_point_coords_with_randomness', 'get_uncertainty'
+ "ResLayer",
+ "gaussian_radius",
+ "gen_gaussian_target",
+ "DetrTransformerDecoderLayer",
+ "DetrTransformerDecoder",
+ "Transformer",
+ "build_transformer",
+ "build_linear_layer",
+ "SinePositionalEncoding",
+ "LearnedPositionalEncoding",
+ "DynamicConv",
+ "SimplifiedBasicBlock",
+ "NormedLinear",
+ "NormedConv2d",
+ "make_divisible",
+ "InvertedResidual",
+ "SELayer",
+ "interpolate_as",
+ "ConvUpsample",
+ "CSPLayer",
+ "adaptive_avg_pool2d",
+ "AdaptiveAvgPool2d",
+ "PatchEmbed",
+ "nchw_to_nlc",
+ "nlc_to_nchw",
+ "pvt_convert",
+ "sigmoid_geometric_mean",
+ "preprocess_panoptic_gt",
+ "DyReLU",
+ "get_uncertain_point_coords_with_randomness",
+ "get_uncertainty",
+ "offset_roof_to_footprint",
]
diff --git a/mmdet/models/utils/loft_utils.py b/mmdet/models/utils/loft_utils.py
new file mode 100644
index 00000000..2fe4bf7a
--- /dev/null
+++ b/mmdet/models/utils/loft_utils.py
@@ -0,0 +1,48 @@
+import copy
+import math
+
+import numpy as np
+
+
+def offset_roof_to_footprint(offsets, roof_masks, reverse=False):
+ """Used for offsetting the roof mask to footprint mask."""
+ direction = -1 if reverse else 1
+ h, w = roof_masks[0][0][0].shape
+ footprint_masks = []
+ for offsets_per_img, roof_masks_per_img in zip(offsets, roof_masks):
+ footprint_masks_per_img = []
+ for offset, roof_mask in zip(offsets_per_img, roof_masks_per_img[0]):
+ border = max(math.ceil(abs(offset[0])), math.ceil(abs(offset[1])))
+ canvas = np.full((h + 2 * border, w + 2 * border), False)
+ canvas[border : h + border, border : w + border] = roof_mask
+ canvas = np.roll(
+ canvas, shift=(direction * int(offset[1]), direction * int(offset[0])), axis=(0, 1)
+ )
+ footprint_mask = canvas[border : h + border, border : w + border]
+ footprint_masks_per_img.append(footprint_mask)
+ footprint_masks.append([footprint_masks_per_img])
+
+ return footprint_masks
+
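+# A minimal usage sketch (hypothetical values): for one image with a single
+# instance whose predicted offset is (10, 5),
+#     footprints = offset_roof_to_footprint([[(10, 5)]], [[[roof_mask]]])
+# shifts ``roof_mask`` by +10 px along x and +5 px along y; ``reverse=True``
+# shifts it back.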
+
+def test_offset_roof_to_footprint(offsets, roof_masks, footprint_masks):
+    import os
+
+    import cv2
+
+    save_path = "tmp/test_offset_roof_to_footprint/"
+    os.makedirs(save_path, exist_ok=True)
+    canvas_origin = np.zeros((1024, 1024))
+    idx = 0
+    for offset, roof_mask, footprint_mask in zip(
+        offsets[0], roof_masks[0][0], footprint_masks[0][0]
+    ):
+        if np.count_nonzero(roof_mask) == 0:
+            continue
+
+        path = save_path + f"({offset[0]}_{offset[1]})_{str(idx).zfill(12)}.jpg"
+        canvas = copy.deepcopy(canvas_origin)
+        canvas[roof_mask == 0] = 0
+        canvas[roof_mask != 0] = 255
+        canvas[footprint_mask != 0] = 128
+ cv2.imwrite(path, canvas)
+ idx += 1
+ # if idx > 4:
+ # break
diff --git a/mmdet/utils/__init__.py b/mmdet/utils/__init__.py
index b5a2b6b3..90a60026 100644
--- a/mmdet/utils/__init__.py
+++ b/mmdet/utils/__init__.py
@@ -1,6 +1,5 @@
# Copyright (c) OpenMMLab. All rights reserved.
-from .ascend_util import (batch_images_to_levels,
- get_max_num_gt_division_factor, masked_fill)
+from .ascend_util import batch_images_to_levels, get_max_num_gt_division_factor, masked_fill
from .collect_env import collect_env
from .compat_config import compat_cfg
from .logger import get_caller_name, get_root_logger, log_img_scale
@@ -13,10 +12,23 @@
from .util_distribution import build_ddp, build_dp, get_device
__all__ = [
- 'get_root_logger', 'collect_env', 'find_latest_checkpoint',
- 'update_data_root', 'setup_multi_processes', 'get_caller_name',
- 'log_img_scale', 'compat_cfg', 'split_batch', 'build_ddp', 'build_dp',
- 'get_device', 'replace_cfg_vals', 'AvoidOOM', 'AvoidCUDAOOM',
- 'get_max_num_gt_division_factor', 'masked_fill', 'batch_images_to_levels',
- 'rfnext_init_model'
+ "get_root_logger",
+ "collect_env",
+ "find_latest_checkpoint",
+ "update_data_root",
+ "setup_multi_processes",
+ "get_caller_name",
+ "log_img_scale",
+ "compat_cfg",
+ "split_batch",
+ "build_ddp",
+ "build_dp",
+ "get_device",
+ "replace_cfg_vals",
+ "AvoidOOM",
+ "AvoidCUDAOOM",
+ "get_max_num_gt_division_factor",
+ "masked_fill",
+ "batch_images_to_levels",
+ "rfnext_init_model",
]
diff --git a/resources/dataset-details.png b/resources/dataset-details.png
new file mode 100644
index 00000000..94262bfc
Binary files /dev/null and b/resources/dataset-details.png differ
diff --git a/resources/samples-jpg.jpg b/resources/samples-jpg.jpg
new file mode 100644
index 00000000..bb4414aa
Binary files /dev/null and b/resources/samples-jpg.jpg differ
diff --git a/resources/samples-png.png b/resources/samples-png.png
new file mode 100644
index 00000000..0a261686
Binary files /dev/null and b/resources/samples-png.png differ
diff --git a/tools/bonai/bonai_evaluation.py b/tools/bonai/bonai_evaluation.py
new file mode 100644
index 00000000..a65fe5ff
--- /dev/null
+++ b/tools/bonai/bonai_evaluation.py
@@ -0,0 +1,1433 @@
+# -*- encoding: utf-8 -*-
+import argparse
+import csv
+import itertools
+import math
+import os
+import warnings
+
+import bstool
+import cv2
+import geopandas
+import mmcv
+import numpy as np
+import pycocotools.mask as maskUtils
+import six
+import tqdm
+from matplotlib import pyplot as plt
+from pycocotools.coco import COCO
+from pycocotools.cocoeval import COCOeval
+from shapely import affinity
+from terminaltables import AsciiTable
+
+
+class Evaluation:
+ def __init__(
+ self,
+ model=None,
+ anno_file=None,
+ pkl_file=None,
+ resolution=0.6,
+ gt_roof_csv_file=None,
+ gt_footprint_csv_file=None,
+ roof_csv_file=None,
+ footprint_csv_file=None,
+ footprint_direct_csv_file=None,
+ json_prefix=None,
+ iou_threshold=0.1,
+ score_threshold=0.4,
+ min_area=500,
+ with_offset=False,
+ with_height=False,
+ output_dir=None,
+ out_file_format="png",
+ show=False,
+ replace_pred_roof=False,
+ replace_pred_offset=False,
+ with_only_offset=False,
+ offset_model="footprint2roof",
+ save_merged_csv=True,
+ ):
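+        # Overall flow: parse the detection pkl, optionally merge sub-image
+        # results back to the original images, dump roof / footprint CSV
+        # files, and evaluate them against the ground-truth CSV or COCO json.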
+ self.anno_file = anno_file
+ self.resolution = resolution
+ self.gt_roof_csv_file = gt_roof_csv_file
+ self.gt_footprint_csv_file = gt_footprint_csv_file
+ self.roof_csv_file = roof_csv_file
+ self.footprint_csv_file = footprint_csv_file
+ self.footprint_direct_csv_file = footprint_direct_csv_file
+ self.pkl_file = pkl_file
+ self.json_prefix = json_prefix
+ self.show = show
+        self.classify_interval = [
+            0, 2, 4, 6, 8, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65,
+            70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170,
+            180, 190, 200, 220, 240, 260, 280, 300, 340, 380,
+        ]
+ self.offset_class_num = len(self.classify_interval)
+ self.with_only_offset = with_only_offset
+ self.save_merged_csv = save_merged_csv
+
+ self.out_file_format = out_file_format
+
+ self.output_dir = output_dir
+ if output_dir:
+            mmcv.mkdir_or_exist(self.output_dir)
+
+        # 1. create the pkl parser, which is used to parse the pkl file (detection results)
+ if self.with_only_offset:
+ # BSPklParser_Only_Offset is designed to evaluate the experimental model which only predicts the offsets
+ pkl_parser = bstool.BSPklParser_Only_Offset(
+ anno_file,
+ pkl_file,
+ iou_threshold=iou_threshold,
+ score_threshold=score_threshold,
+ min_area=min_area,
+ with_offset=with_offset,
+ with_height=with_height,
+ gt_roof_csv_file=gt_roof_csv_file,
+ replace_pred_roof=replace_pred_roof,
+ offset_model=offset_model,
+ )
+ else:
+ if with_offset:
+ # important
+ # BSPklParser is the general class for evaluating the LOVE and S2LOVE models
+ pkl_parser = bstool.BSPklParser(
+ anno_file,
+ pkl_file,
+ iou_threshold=iou_threshold,
+ score_threshold=score_threshold,
+ min_area=min_area,
+ with_offset=with_offset,
+ with_height=with_height,
+ gt_roof_csv_file=gt_roof_csv_file,
+ replace_pred_roof=replace_pred_roof,
+ replace_pred_offset=replace_pred_offset,
+ offset_model=offset_model,
+ merge_splitted=save_merged_csv,
+ )
+ else:
+ # BSPklParser_Without_Offset is designed to evaluate the baseline models (Mask R-CNN, etc.)
+ pkl_parser = bstool.BSPklParser_Without_Offset(
+ anno_file,
+ pkl_file,
+ iou_threshold=iou_threshold,
+ score_threshold=score_threshold,
+ min_area=min_area,
+ with_offset=with_offset,
+ with_height=with_height,
+ gt_roof_csv_file=gt_roof_csv_file,
+ replace_pred_roof=replace_pred_roof,
+ offset_model=offset_model,
+ )
+
+        # 2. merge the detection results and generate the csv files (convert pkl
+        # to csv; F1 is evaluated on the CSV files, pkl is only the intermediate
+        # format). ``save_merged_csv`` controls whether results on the sub-images
+        # (1024 x 1024) are merged back to the original images (2048 x 2048).
+ if save_merged_csv:
+ merged_objects = pkl_parser.merged_objects
+ bstool.bs_csv_dump(merged_objects, roof_csv_file, footprint_csv_file)
+ self.dump_result = True
+ else:
+ objects = pkl_parser.objects
+ self.dump_result = bstool.bs_csv_dump(
+ objects, roof_csv_file, footprint_csv_file, footprint_direct_csv_file
+ )
+
+ def _csv2json(self, csv_file, ann_file):
+ """convert csv file to json which will be used to evaluate the results by COCO API
+
+ Args:
+ csv_file (str): csv file
+ ann_file (str): annotation file of COCO format (.json)
+
+ Returns:
+            tuple: (bbox_json_results, segm_json_results), lists ready for saving to json
+ """
+ self.coco = COCO(ann_file)
+ self.cat_ids = self.coco.get_cat_ids()
+ self.img_ids = self.coco.get_img_ids()
+
+ csv_parser = bstool.CSVParse(csv_file)
+
+ bbox_json_results = []
+ segm_json_results = []
+ for idx in tqdm.tqdm(range(len(self.img_ids))):
+ img_id = self.img_ids[idx]
+ info = self.coco.load_imgs([img_id])[0]
+ image_name = bstool.get_basename(info["file_name"])
+
+ objects = csv_parser(image_name)
+
+ masks = [obj["mask"] for obj in objects]
+ bboxes = [bstool.mask2bbox(mask) for mask in masks]
+
+ for bbox, mask in zip(bboxes, masks):
+ data = dict()
+ data["image_id"] = img_id
+ data["bbox"] = bstool.xyxy2xywh(bbox)
+ data["score"] = 1.0
+ data["category_id"] = self.category_id
+
+ rles = maskUtils.frPyObjects([mask], self.image_size[0], self.image_size[1])
+ rle = maskUtils.merge(rles)
+ if isinstance(rle["counts"], bytes):
+ rle["counts"] = rle["counts"].decode()
+ data["segmentation"] = rle
+
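+                # the same dict (holding both bbox and segmentation) is appended to
+                # both result lists, so the two lists share entries by reference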
+ bbox_json_results.append(data)
+ segm_json_results.append(data)
+
+ return bbox_json_results, segm_json_results
+
+ def _coco_eval(
+ self,
+ metric=["bbox", "segm"],
+ classwise=False,
+ proposal_nums=(100, 300, 1000),
+ iou_thrs=np.arange(0.5, 0.96, 0.05),
+ ):
+ """Please reference to original code in mmdet"""
+ metrics = metric if isinstance(metric, list) else [metric]
+ allowed_metrics = ["bbox", "segm"]
+ for metric in metrics:
+ if metric not in allowed_metrics:
+ raise KeyError(f"metric {metric} is not supported")
+
+ result_files = self.dump_json_results()
+
+ eval_results = {}
+ cocoGt = self.coco
+ for metric in metrics:
+ msg = f"Evaluating {metric}..."
+ print(msg)
+ if metric not in result_files:
+ raise KeyError(f"{metric} is not in results")
+ try:
+ cocoDt = cocoGt.loadRes(result_files[metric])
+ except IndexError:
+ print("The testing results of the whole dataset is empty.")
+ break
+
+ iou_type = "bbox" if metric == "proposal" else metric
+ cocoEval = COCOeval(cocoGt, cocoDt, iou_type)
+ cocoEval.params.catIds = self.cat_ids
+ cocoEval.params.imgIds = self.img_ids
+
+ cocoEval.evaluate()
+ cocoEval.accumulate()
+ cocoEval.summarize()
+ if classwise: # Compute per-category AP
+ # Compute per-category AP
+ # from https://github.com/facebookresearch/detectron2/
+ precisions = cocoEval.eval["precision"]
+ # precision: (iou, recall, cls, area range, max dets)
+ assert len(self.cat_ids) == precisions.shape[2]
+
+ results_per_category = []
+ for idx, catId in enumerate(self.cat_ids):
+ # area range index 0: all area ranges
+ # max dets index -1: typically 100 per image
+ nm = self.coco.loadCats(catId)[0]
+ precision = precisions[:, :, idx, 0, -1]
+ precision = precision[precision > -1]
+ if precision.size:
+ ap = np.mean(precision)
+ else:
+ ap = float("nan")
+ results_per_category.append((f'{nm["name"]}', f"{float(ap):0.3f}"))
+
+ num_columns = min(6, len(results_per_category) * 2)
+ results_flatten = list(itertools.chain(*results_per_category))
+ headers = ["category", "AP"] * (num_columns // 2)
+ results_2d = itertools.zip_longest(
+ *[results_flatten[i::num_columns] for i in range(num_columns)]
+ )
+                table_data = [headers]
+                table_data += [result for result in results_2d]
+                table = AsciiTable(table_data)
+                # print the per-category table; it was previously built but never shown
+                print("\n" + table.table)
+
+ metric_items = ["mAP", "mAP_50", "mAP_75", "mAP_s", "mAP_m", "mAP_l"]
+ for i in range(len(metric_items)):
+ key = f"{metric}_{metric_items[i]}"
+ val = float(f"{cocoEval.stats[i]:.3f}")
+ eval_results[key] = val
+ ap = cocoEval.stats[:6]
+ eval_results[f"{metric}_mAP_copypaste"] = (
+ f"{ap[0]:.3f} {ap[1]:.3f} {ap[2]:.3f} {ap[3]:.3f} " f"{ap[4]:.3f} {ap[5]:.3f}"
+ )
+
+ return eval_results
+
+    def cosine_distance(self, a, b):
+        """calculate the cosine distance between two arrays of 2D vectors
+
+        Args:
+            a (np.ndarray): first vector array, shape (N, 2)
+            b (np.ndarray): second vector array, shape (N, 2)
+
+        Returns:
+            np.ndarray: element-wise cosine distances
+        """
+        a_norm = np.linalg.norm(a, axis=1)
+        b_norm = np.linalg.norm(b, axis=1)
+
+        # keepdims is avoided so the division below stays element-wise; with
+        # keepdims=True the (N,) / (N, 1) broadcast silently produced an (N, N) matrix
+        similarity = (a[:, 0] * b[:, 0] + a[:, 1] * b[:, 1]) / (a_norm * b_norm)
+        dist = 1.0 - similarity
+ return dist
+
+ def height_calculate(self):
+ objects = self.get_confusion_matrix_indexes_json_gt(mask_type="footprint")
+
+ dataset_gt_heights, dataset_pred_heights = [], []
+ for ori_image_name in self.ori_image_name_list:
+ if ori_image_name not in objects.keys():
+ continue
+
+ dataset_gt_heights += objects[ori_image_name]["gt_heights"]
+ dataset_pred_heights += objects[ori_image_name]["pred_heights"]
+
+ dataset_gt_heights = np.array(dataset_gt_heights)
+ dataset_pred_heights = np.array(dataset_pred_heights)
+
+ rmse = np.sqrt(np.sum((dataset_gt_heights - dataset_pred_heights) ** 2) / len(dataset_gt_heights))
+ mae = np.sum(np.absolute(dataset_gt_heights - dataset_pred_heights)) / len(dataset_gt_heights)
+
+ return mae, rmse
+
+ def parse_ann_offset(self, gt_data_path):
+ import json
+
+ gt = []
+        with open(gt_data_path, "r") as json_file:
+            json_content = json.load(json_file)
+ annotations = json_content["annotations"]
+ images = json_content["images"]
+ images_ids = []
+ images_filenames = []
+ for item in images:
+ images_ids.append(item["id"])
+ images_filenames.append(item["file_name"])
+ id_2_filename = dict(zip(images_ids, images_filenames))
+ for image_id in images_ids:
+ gt_offset_angles = []
+ for ann in annotations:
+ if (
+ "offset" in ann.keys()
+ and ann["offset"] != [0, 0]
+ and ann["image_id"] == image_id
+ ):
+ offset = ann["offset"]
+ z = math.sqrt(offset[0] ** 2 + offset[1] ** 2)
+ gt_offset_angles.append([float(offset[1]) / z, float(offset[0]) / z])
+
+ if len(gt_offset_angles) > 0:
+ gt.append(
+ dict(
+ filename=id_2_filename[image_id],
+ offset_angle=np.array(gt_offset_angles, dtype=np.float32).mean(axis=0),
+ )
+ )
+ else:
+ gt.append(
+ dict(
+ filename=id_2_filename[image_id],
+ offset_angle=[1.0, 0.0],
+ )
+ )
+
+ return gt
+
+ def parse_ann_nadir(self, gt_data_path):
+ import json
+
+ gt = []
+        with open(gt_data_path, "r") as json_file:
+            json_content = json.load(json_file)
+ annotations = json_content["annotations"]
+ images = json_content["images"]
+ images_ids = []
+ images_filenames = []
+ for item in images:
+ images_ids.append(item["id"])
+ images_filenames.append(item["file_name"])
+ id_2_filename = dict(zip(images_ids, images_filenames))
+ for image_id in images_ids:
+ gt_nadir_angles = []
+ for ann in annotations:
+ if (
+ "offset" in ann.keys()
+ and "building_height" in ann.keys()
+ and ann["image_id"] == image_id
+ ):
+ offset_x, offset_y = ann["offset"]
+ norm = offset_x**2 + offset_y**2
+ height = ann["building_height"]
+ if height != 0 and norm != 0:
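+                        # sqrt(norm) is the offset length in pixels; * resolution (m/px)
+                        # / height (m) gives the tangent of the off-nadir viewing angle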
+ angle = math.sqrt(norm) * self.resolution / float(height)
+ gt_nadir_angles.append(angle)
+
+ if len(gt_nadir_angles) > 0:
+ gt.append(
+ dict(
+ filename=id_2_filename[image_id],
+ nadir_angle=np.array(gt_nadir_angles, dtype=np.float32).mean(axis=0),
+ )
+ )
+ else:
+ gt.append(
+ dict(
+ filename=id_2_filename[image_id],
+ nadir_angle=1,
+ )
+ )
+
+ return gt
+
+ def vector2angle(self, vector):
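+        # e.g. vector = (0, 1) -> atan2(1, 0) = pi / 2 -> 90 degrees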
+ length = np.sqrt(vector[0] ** 2 + vector[1] ** 2)
+ sin = vector[1] / length
+ cos = vector[0] / length
+        angle = math.atan2(sin, cos)  # in radians
+        angle = math.degrees(angle)
+        # convert to the 0-360 degree range
+        if angle < 0:
+            angle += 360
+ return angle
+
+ def offset_angle_evaluate(self):
+ ann = self.anno_file
+ gt = self.parse_ann_offset(ann)
+ pkl = mmcv.load(self.pkl_file)
+ gts = []
+ preds = []
+ for i in range(len(pkl)):
+ vector_pred = pkl[i][6]
+ vector_gt = gt[i]["offset_angle"]
+ angle_pred = self.vector2angle(vector_pred)
+ angle_gt = self.vector2angle(vector_gt)
+ gts.append(angle_gt)
+ preds.append(angle_pred)
+ gts = np.array(gts)
+ preds = np.array(preds)
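+        # NOTE: the difference is not wrapped around 360 degrees, so a gt of 359
+        # and a prediction of 1 is scored as a 358-degree error rather than 2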
+ error = np.abs(np.subtract(gts, preds))
+ return np.mean(error)
+
+ def nadir_angle_evaluate(self):
+ ann = self.anno_file
+ gt = self.parse_ann_nadir(ann)
+ pkl = mmcv.load(self.pkl_file)
+ gts = []
+ preds = []
+ for i in range(len(pkl)):
+ angle_pred = pkl[i][7]
+ angle_gt = gt[i]["nadir_angle"]
+ gts.append(angle_gt)
+ preds.append(angle_pred)
+ gts = np.array(gts)
+ preds = np.array(preds)
+ error = np.abs(np.subtract(gts, preds))
+ return np.mean(error)
+
+ def offset_error_vector(self, title="demo", show_polar=False):
+ objects = self.get_confusion_matrix_indexes_json_gt(mask_type="footprint")
+
+ dataset_gt_offsets, dataset_pred_offsets = [], []
+ for ori_image_name in self.ori_image_name_list:
+ if ori_image_name not in objects.keys():
+ continue
+
+ dataset_gt_offsets += objects[ori_image_name]["gt_offsets"]
+ dataset_pred_offsets += objects[ori_image_name]["pred_offsets"]
+
+ dataset_gt_offsets = np.array(dataset_gt_offsets)
+ dataset_pred_offsets = np.array(dataset_pred_offsets)
+
+ error_vectors = dataset_gt_offsets - dataset_pred_offsets
+
+ EPE = np.sqrt(error_vectors[..., 0] ** 2 + error_vectors[..., 1] ** 2)
+ gt_angle = np.arctan2(dataset_gt_offsets[..., 1], dataset_gt_offsets[..., 0])
+ gt_length = np.sqrt(dataset_gt_offsets[..., 1] ** 2 + dataset_gt_offsets[..., 0] ** 2)
+
+ pred_angle = np.arctan2(dataset_pred_offsets[..., 1], dataset_pred_offsets[..., 0])
+ pred_length = np.sqrt(dataset_pred_offsets[..., 1] ** 2 + dataset_pred_offsets[..., 0] ** 2)
+
+ AE = np.abs(gt_angle - pred_angle)
+
+ aEPE = EPE.mean()
+ aAE = AE.mean()
+
+ cos_distance = self.cosine_distance(dataset_gt_offsets, dataset_pred_offsets)
+ average_cos_distance = cos_distance.mean()
+
+ eval_results = {"aEPE": aEPE, "aAE": aAE}
+
+ if self.show:
+ r = gt_length - pred_length
+ angle = np.abs((gt_angle - pred_angle))
+ max_r = np.percentile(r, 95)
+ min_r = np.percentile(r, 0.01)
+
+            fig = plt.figure(figsize=(7, 7))
+            ax = fig.add_subplot(projection="polar")
+ ax.set_thetagrids(np.arange(0.0, 360.0, 15.0))
+ ax.set_thetamin(0.0)
+ ax.set_thetamax(360.0)
+ ax.set_rgrids(np.arange(min_r, max_r, max_r / 10))
+ ax.set_rlabel_position(0.0)
+ ax.set_rlim(0, max_r)
+ plt.setp(ax.get_yticklabels(), fontsize=6)
+ ax.grid(True, linestyle="-", color="k", linewidth=0.5, alpha=0.5)
+            ax.set_axisbelow(True)
+
+ plt.scatter(angle, r, s=2.0)
+ plt.title(title + " offset error distribution", fontsize=10)
+
+ plt.savefig(
+ os.path.join(
+ self.output_dir,
+ "{}_offset_error_polar_evaluation.{}".format(title, self.out_file_format),
+ ),
+ bbox_inches="tight",
+ dpi=600,
+ pad_inches=0.1,
+ )
+
+ plt.clf()
+
+ max_r = np.percentile(r, 99.99)
+ min_r = np.percentile(r, 0.01)
+ plt.hist(
+ r,
+ bins=np.arange(min_r, max_r, (int(max_r) - int(min_r)) // 40),
+ histtype="bar",
+ facecolor="dodgerblue",
+ alpha=0.75,
+ rwidth=0.9,
+ )
+ plt.title(title + " Length Error Distribution", fontsize=10)
+ plt.xlim([min_r - 5, max_r + 5])
+ plt.xlabel("Error")
+ plt.ylabel("Num")
+ plt.yscale("log")
+ plt.savefig(
+ os.path.join(
+ self.output_dir,
+ "{}_offset_error_length_hist_evaluation.{}".format(title, self.out_file_format),
+ ),
+ bbox_inches="tight",
+ dpi=600,
+ pad_inches=0.1,
+ )
+
+ plt.clf()
+
+ max_angle = angle.max() * 180.0 / np.pi
+ min_angle = angle.min() * 180.0 / np.pi
+            # histogram the angle error in degrees, to match the degree-valued bins
+            plt.hist(
+                angle * 180.0 / np.pi,
+ bins=np.arange(min_angle, max_angle, (max_angle - min_angle) // 80),
+ histtype="bar",
+ facecolor="dodgerblue",
+ alpha=0.75,
+ rwidth=0.9,
+ )
+ plt.title(title + " Angle Error Distribution", fontsize=10)
+ plt.xlim([min_angle - 20, max_angle])
+ plt.xlabel("Error")
+ plt.ylabel("Num")
+ plt.yscale("log")
+ plt.savefig(
+ os.path.join(
+ self.output_dir,
+ "{}_offset_error_angle_hist_evaluation.{}".format(title, self.out_file_format),
+ ),
+ bbox_inches="tight",
+ dpi=600,
+ pad_inches=0.1,
+ )
+
+ plt.clf()
+
+ return eval_results
+
+ def direct_footprint_evaluate(self):
+ objects = self.get_confusion_matrix_indexes_direct_footprint()
+ (
+ dataset_gt_TP_indexes,
+ dataset_pred_TP_indexes,
+ dataset_gt_FN_indexes,
+ dataset_pred_FP_indexes,
+ ) = ([], [], [], [])
+ for ori_image_name in self.ori_image_name_list:
+ if ori_image_name not in objects.keys():
+ continue
+
+ gt_TP_indexes = objects[ori_image_name]["gt_TP_indexes"]
+ pred_TP_indexes = objects[ori_image_name]["pred_TP_indexes"]
+ gt_FN_indexes = objects[ori_image_name]["gt_FN_indexes"]
+ pred_FP_indexes = objects[ori_image_name]["pred_FP_indexes"]
+
+ dataset_gt_TP_indexes += gt_TP_indexes
+ dataset_pred_TP_indexes += pred_TP_indexes
+ dataset_gt_FN_indexes += gt_FN_indexes
+ dataset_pred_FP_indexes += pred_FP_indexes
+
+ TP = len(dataset_gt_TP_indexes)
+ FN = len(dataset_gt_FN_indexes)
+ FP = len(dataset_pred_FP_indexes)
+        Precision = float(TP) / (float(TP) + float(FP)) if 0 != (float(TP) + float(FP)) else 0.0
+        Recall = float(TP) / (float(TP) + float(FN)) if 0 != (float(TP) + float(FN)) else 0.0
+
+        F1_score = (2 * Precision * Recall) / (Precision + Recall) if 0 != (Precision + Recall) else 0.0
+ eval_results = {
+ "F1_score": F1_score,
+ "Precision": Precision,
+ "Recall": Recall,
+ "TP": TP,
+ "FN": FN,
+ "FP": FP,
+ }
+ return eval_results
+
+ def segmentation_evaluate(self, mask_types=["roof", "footprint"]):
+ """evaluation for segmentation (F1 Score, Precision, Recall)
+
+ Args:
+ mask_types (list, optional): evaluate which object (roof or footprint). Defaults to ['roof', 'footprint'].
+
+ Returns:
+ dict: evaluation results
+ """
+ eval_results = dict()
+ for mask_type in mask_types:
+ print(f"========== Processing {mask_type} segmentation ==========")
+ objects = self.get_confusion_matrix_indexes(mask_type=mask_type)
+
+ (
+ dataset_gt_TP_indexes,
+ dataset_pred_TP_indexes,
+ dataset_gt_FN_indexes,
+ dataset_pred_FP_indexes,
+ ) = ([], [], [], [])
+ for ori_image_name in self.ori_image_name_list:
+ if ori_image_name not in objects.keys():
+ continue
+
+ gt_TP_indexes = objects[ori_image_name]["gt_TP_indexes"]
+ pred_TP_indexes = objects[ori_image_name]["pred_TP_indexes"]
+ gt_FN_indexes = objects[ori_image_name]["gt_FN_indexes"]
+ pred_FP_indexes = objects[ori_image_name]["pred_FP_indexes"]
+
+ dataset_gt_TP_indexes += gt_TP_indexes
+ dataset_pred_TP_indexes += pred_TP_indexes
+ dataset_gt_FN_indexes += gt_FN_indexes
+ dataset_pred_FP_indexes += pred_FP_indexes
+
+ TP = len(dataset_gt_TP_indexes)
+ FN = len(dataset_gt_FN_indexes)
+ FP = len(dataset_pred_FP_indexes)
+
+ Precision = float(TP) / (float(TP) + float(FP)) if 0 != (float(TP) + float(FP)) else 0.0
+ Recall = float(TP) / (float(TP) + float(FN)) if 0 != (float(TP) + float(FN)) else 0.0
+
+ F1_score = (2 * Precision * Recall) / (Precision + Recall) if 0 != (Precision + Recall) else 0.0
+
+ eval_results[mask_type] = {
+ "F1_score": F1_score,
+ "Precision": Precision,
+ "Recall": Recall,
+ "TP": TP,
+ "FN": FN,
+ "FP": FP,
+ }
+
+ return eval_results
+
+ def get_confusion_matrix_indexes(self, mask_type="footprint"):
+ if mask_type == "footprint":
+ gt_csv_parser = bstool.CSVParse(self.gt_footprint_csv_file)
+ pred_csv_parser = bstool.CSVParse(self.footprint_csv_file)
+ else:
+ gt_csv_parser = bstool.CSVParse(self.gt_roof_csv_file)
+ pred_csv_parser = bstool.CSVParse(self.roof_csv_file)
+
+ self.ori_image_name_list = gt_csv_parser.image_name_list
+
+ gt_objects = gt_csv_parser.objects
+ pred_objects = pred_csv_parser.objects
+
+ objects = dict()
+
+ for ori_image_name in self.ori_image_name_list:
+ buildings = dict()
+
+ gt_buildings = gt_objects[ori_image_name]
+ pred_buildings = pred_objects[ori_image_name]
+
+ gt_polygons = [gt_building["polygon"] for gt_building in gt_buildings]
+ pred_polygons = [pred_building["polygon"] for pred_building in pred_buildings]
+
+ gt_polygons_origin = gt_polygons[:]
+ pred_polygons_origin = pred_polygons[:]
+
+ if len(gt_polygons) == 0 or len(pred_polygons) == 0:
+ print(
+ f"Skip this image: {ori_image_name}, because length gt_polygons or length pred_polygons is zero"
+ )
+ continue
+
+ gt_offsets = [gt_building["offset"] for gt_building in gt_buildings]
+ pred_offsets = [pred_building["offset"] for pred_building in pred_buildings]
+
+ gt_heights = [gt_building["height"] for gt_building in gt_buildings]
+ pred_heights = [pred_building["height"] for pred_building in pred_buildings]
+
+ angles = []
+ for gt_offset, gt_height in zip(gt_offsets, gt_heights):
+ offset_x, offset_y = gt_offset
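+                # 0.6 is a hard-coded ground resolution in m/pixel here;
+                # self.resolution would be the configurable equivalent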
+ angle = math.atan2(math.sqrt(offset_x**2 + offset_y**2) * 0.6, gt_height)
+ angles.append(angle)
+
+ height_angle = np.array(angles).mean()
+
+ gt_polygons = geopandas.GeoSeries(gt_polygons)
+ pred_polygons = geopandas.GeoSeries(pred_polygons)
+
+ gt_df = geopandas.GeoDataFrame(
+ {"geometry": gt_polygons, "gt_df": range(len(gt_polygons))}
+ )
+ pred_df = geopandas.GeoDataFrame(
+ {"geometry": pred_polygons, "pred_df": range(len(pred_polygons))}
+ )
+
+ gt_df = gt_df.loc[~gt_df.geometry.is_empty]
+ pred_df = pred_df.loc[~pred_df.geometry.is_empty]
+
+ res_intersection = geopandas.overlay(gt_df, pred_df, how="intersection")
+
+ iou = np.zeros((len(pred_polygons), len(gt_polygons)))
+ for idx, row in res_intersection.iterrows():
+ gt_idx = row.gt_df
+ pred_idx = row.pred_df
+
+ inter = row.geometry.area
+ union = pred_polygons[pred_idx].area + gt_polygons[gt_idx].area
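+                # `union` holds the *sum* of the two areas; the true union is
+                # `union - inter`, and +1.0 guards against division by zero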
+
+ iou[pred_idx, gt_idx] = inter / (union - inter + 1.0)
+
+ iou_indexes = np.argwhere(iou >= 0.5)
+
+ gt_TP_indexes = list(iou_indexes[:, 1])
+ pred_TP_indexes = list(iou_indexes[:, 0])
+
+ gt_FN_indexes = list(set(range(len(gt_polygons))) - set(gt_TP_indexes))
+ pred_FP_indexes = list(set(range(len(pred_polygons))) - set(pred_TP_indexes))
+
+ buildings["gt_iou"] = np.max(iou, axis=0)
+
+ buildings["gt_TP_indexes"] = gt_TP_indexes
+ buildings["pred_TP_indexes"] = pred_TP_indexes
+ buildings["gt_FN_indexes"] = gt_FN_indexes
+ buildings["pred_FP_indexes"] = pred_FP_indexes
+
+ buildings["gt_offsets"] = np.array(gt_offsets)[gt_TP_indexes].tolist()
+ buildings["pred_offsets"] = np.array(pred_offsets)[pred_TP_indexes].tolist()
+
+ buildings["gt_heights"] = np.array(gt_heights)[gt_TP_indexes].tolist()
+ buildings["pred_heights"] = np.array(pred_heights)[pred_TP_indexes].tolist()
+
+ buildings["gt_polygons"] = gt_polygons
+ buildings["pred_polygons"] = pred_polygons
+
+ buildings["gt_polygons_matched"] = np.array(gt_polygons_origin)[gt_TP_indexes].tolist()
+ buildings["pred_polygons_matched"] = np.array(pred_polygons_origin)[
+ pred_TP_indexes
+ ].tolist()
+
+ buildings["height_angle"] = height_angle
+
+ objects[ori_image_name] = buildings
+
+ return objects
+
+ def get_confusion_matrix_indexes_direct_footprint(self):
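+        # mirrors get_confusion_matrix_indexes(), but reads the directly predicted
+        # footprint csv (footprint_direct_csv_file) instead of the offset-derived one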
+ gt_csv_parser = bstool.CSVParse(self.gt_footprint_csv_file)
+ pred_csv_parser = bstool.CSVParse(self.footprint_direct_csv_file)
+
+ self.ori_image_name_list = gt_csv_parser.image_name_list
+
+ gt_objects = gt_csv_parser.objects
+ pred_objects = pred_csv_parser.objects
+
+ objects = dict()
+
+ for ori_image_name in self.ori_image_name_list:
+ buildings = dict()
+
+ gt_buildings = gt_objects[ori_image_name]
+ pred_buildings = pred_objects[ori_image_name]
+
+ gt_polygons = [gt_building["polygon"] for gt_building in gt_buildings]
+ pred_polygons = [pred_building["polygon"] for pred_building in pred_buildings]
+
+ gt_polygons_origin = gt_polygons[:]
+ pred_polygons_origin = pred_polygons[:]
+
+ if len(gt_polygons) == 0 or len(pred_polygons) == 0:
+ print(
+ f"Skip this image: {ori_image_name}, because length gt_polygons or length pred_polygons is zero"
+ )
+ continue
+
+ gt_offsets = [gt_building["offset"] for gt_building in gt_buildings]
+ pred_offsets = [pred_building["offset"] for pred_building in pred_buildings]
+
+ gt_heights = [gt_building["height"] for gt_building in gt_buildings]
+ pred_heights = [pred_building["height"] for pred_building in pred_buildings]
+
+ angles = []
+ for gt_offset, gt_height in zip(gt_offsets, gt_heights):
+ offset_x, offset_y = gt_offset
+ angle = math.atan2(math.sqrt(offset_x**2 + offset_y**2) * 0.6, gt_height)
+ angles.append(angle)
+
+ height_angle = np.array(angles).mean()
+
+ gt_polygons = geopandas.GeoSeries(gt_polygons)
+ pred_polygons = geopandas.GeoSeries(pred_polygons)
+
+ gt_df = geopandas.GeoDataFrame(
+ {"geometry": gt_polygons, "gt_df": range(len(gt_polygons))}
+ )
+ pred_df = geopandas.GeoDataFrame(
+ {"geometry": pred_polygons, "pred_df": range(len(pred_polygons))}
+ )
+
+ gt_df = gt_df.loc[~gt_df.geometry.is_empty]
+ pred_df = pred_df.loc[~pred_df.geometry.is_empty]
+
+ res_intersection = geopandas.overlay(gt_df, pred_df, how="intersection")
+
+ iou = np.zeros((len(pred_polygons), len(gt_polygons)))
+ for idx, row in res_intersection.iterrows():
+ gt_idx = row.gt_df
+ pred_idx = row.pred_df
+
+ inter = row.geometry.area
+ union = pred_polygons[pred_idx].area + gt_polygons[gt_idx].area
+
+ iou[pred_idx, gt_idx] = inter / (union - inter + 1.0)
+
+ iou_indexes = np.argwhere(iou >= 0.5)
+
+ gt_TP_indexes = list(iou_indexes[:, 1])
+ pred_TP_indexes = list(iou_indexes[:, 0])
+
+ gt_FN_indexes = list(set(range(len(gt_polygons))) - set(gt_TP_indexes))
+ pred_FP_indexes = list(set(range(len(pred_polygons))) - set(pred_TP_indexes))
+
+ buildings["gt_iou"] = np.max(iou, axis=0)
+
+ buildings["gt_TP_indexes"] = gt_TP_indexes
+ buildings["pred_TP_indexes"] = pred_TP_indexes
+ buildings["gt_FN_indexes"] = gt_FN_indexes
+ buildings["pred_FP_indexes"] = pred_FP_indexes
+
+ buildings["gt_offsets"] = np.array(gt_offsets)[gt_TP_indexes].tolist()
+ buildings["pred_offsets"] = np.array(pred_offsets)[pred_TP_indexes].tolist()
+
+ buildings["gt_heights"] = np.array(gt_heights)[gt_TP_indexes].tolist()
+ buildings["pred_heights"] = np.array(pred_heights)[pred_TP_indexes].tolist()
+
+ buildings["gt_polygons"] = gt_polygons
+ buildings["pred_polygons"] = pred_polygons
+
+ buildings["gt_polygons_matched"] = np.array(gt_polygons_origin)[gt_TP_indexes].tolist()
+ buildings["pred_polygons_matched"] = np.array(pred_polygons_origin)[
+ pred_TP_indexes
+ ].tolist()
+
+ buildings["height_angle"] = height_angle
+
+ objects[ori_image_name] = buildings
+
+ return objects
+
+    # used by height_calculate() and offset_error_vector()
+ def get_confusion_matrix_indexes_json_gt(self, mask_type="footprint"):
+ if mask_type == "footprint":
+ gt_coco_parser = bstool.COCOParse(self.anno_file)
+ pred_csv_parser = bstool.CSVParse(self.footprint_csv_file)
+ else:
+            raise NotImplementedError
+
+ self.ori_image_name_list = pred_csv_parser.image_name_list
+
+ # gt_objects = gt_csv_parser.objects
+ pred_objects = pred_csv_parser.objects
+
+ objects = dict()
+
+ for ori_image_name in self.ori_image_name_list:
+ buildings = dict()
+ try:
+ gt_buildings = gt_coco_parser(ori_image_name + ".png")
+            except Exception:  # fall back to .jpg when the .png entry is missing
+ gt_buildings = gt_coco_parser(ori_image_name + ".jpg")
+ pred_buildings = pred_objects[ori_image_name]
+
+ gt_polygons = [
+ bstool.mask2polygon(gt_building["footprint_mask"]).buffer(0)
+ for gt_building in gt_buildings
+ ]
+ pred_polygons = [pred_building["polygon"] for pred_building in pred_buildings]
+
+ gt_polygons_origin = gt_polygons[:]
+ pred_polygons_origin = pred_polygons[:]
+
+ if len(gt_polygons) == 0 or len(pred_polygons) == 0:
+ print(
+ f"Skip this image: {ori_image_name}, because length gt_polygons or length pred_polygons is zero"
+ )
+ continue
+
+ gt_offsets = [gt_building["offset"] for gt_building in gt_buildings]
+ pred_offsets = [pred_building["offset"] for pred_building in pred_buildings]
+
+ gt_heights = [gt_building["building_height"] for gt_building in gt_buildings]
+ pred_heights = [pred_building["height"] for pred_building in pred_buildings]
+
+ gt_polygons = geopandas.GeoSeries(gt_polygons)
+ pred_polygons = geopandas.GeoSeries(pred_polygons)
+
+ gt_df = geopandas.GeoDataFrame(
+ {"geometry": gt_polygons, "gt_df": range(len(gt_polygons))}
+ )
+ pred_df = geopandas.GeoDataFrame(
+ {"geometry": pred_polygons, "pred_df": range(len(pred_polygons))}
+ )
+
+ gt_df = gt_df.loc[~gt_df.geometry.is_empty]
+ pred_df = pred_df.loc[~pred_df.geometry.is_empty]
+
+ res_intersection = geopandas.overlay(gt_df, pred_df, how="intersection")
+
+ iou = np.zeros((len(pred_polygons), len(gt_polygons)))
+ for idx, row in res_intersection.iterrows():
+ gt_idx = row.gt_df
+ pred_idx = row.pred_df
+
+ inter = row.geometry.area
+ union = pred_polygons[pred_idx].area + gt_polygons[gt_idx].area
+
+ iou[pred_idx, gt_idx] = inter / (union - inter + 1.0)
+
+ iou_indexes = np.argwhere(iou >= 0.5)
+
+ gt_TP_indexes = list(iou_indexes[:, 1])
+ pred_TP_indexes = list(iou_indexes[:, 0])
+
+ gt_FN_indexes = list(set(range(len(gt_polygons))) - set(gt_TP_indexes))
+ pred_FP_indexes = list(set(range(len(pred_polygons))) - set(pred_TP_indexes))
+
+ buildings["gt_iou"] = np.max(iou, axis=0)
+
+ buildings["gt_TP_indexes"] = gt_TP_indexes
+ buildings["pred_TP_indexes"] = pred_TP_indexes
+ buildings["gt_FN_indexes"] = gt_FN_indexes
+ buildings["pred_FP_indexes"] = pred_FP_indexes
+
+ buildings["gt_offsets"] = np.array(gt_offsets)[gt_TP_indexes].tolist()
+ buildings["pred_offsets"] = np.array(pred_offsets)[pred_TP_indexes].tolist()
+
+ buildings["gt_heights"] = np.array(gt_heights)[gt_TP_indexes].tolist()
+ buildings["pred_heights"] = np.array(pred_heights)[pred_TP_indexes].tolist()
+
+ buildings["gt_polygons"] = gt_polygons
+ buildings["pred_polygons"] = pred_polygons
+
+ buildings["gt_polygons_matched"] = np.array(gt_polygons_origin)[gt_TP_indexes].tolist()
+ buildings["pred_polygons_matched"] = np.array(pred_polygons_origin)[
+ pred_TP_indexes
+ ].tolist()
+
+ objects[ori_image_name] = buildings
+
+ return objects
+
+ def visualization_boundary(
+ self,
+ image_dir,
+ vis_dir,
+ mask_types=["roof", "footprint","direct_footprint"],
+ with_iou=False,
+ with_gt=True,
+ with_only_pred=False,
+ with_image=True,
+ ):
+ colors = {
+ # "gt_TP": (0, 255, 0),
+ "pred_TP": (255, 255, 0),
+ "FP": (0, 255, 255),
+ "FN": (255, 0, 0),
+ }
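+        # colors are given as RGB and reversed with [::-1] below, since cv2 draws in BGR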
+ for mask_type in mask_types:
+ if mask_type == 'direct_footprint':
+ objects = self.get_confusion_matrix_indexes_direct_footprint()
+ else:
+ objects = self.get_confusion_matrix_indexes(mask_type=mask_type)
+ for image_name in os.listdir(image_dir):
+ image_basename = bstool.get_basename(image_name)
+ image_file = os.path.join(image_dir, image_name)
+
+ output_file = os.path.join(vis_dir, mask_type, image_name)
+ bstool.mkdir_or_exist(os.path.join(vis_dir, mask_type))
+
+ if with_image:
+ img = cv2.imread(image_file)
+ else:
+ img = bstool.generate_image(1024, 1024, color=(255, 255, 255))
+
+ if image_basename not in objects:
+ continue
+
+ building = objects[image_basename]
+
+                if not with_only_pred:
+                    for idx, gt_polygon in enumerate(building["gt_polygons"]):
+                        iou = building["gt_iou"][idx]
+                        if idx in building["gt_TP_indexes"]:
+                            # matched gt boundaries are skipped; only FN gt are drawn
+                            continue
+                        if not with_gt:
+                            continue
+                        color = colors["FN"][::-1]
+
+ if gt_polygon.geom_type != "Polygon":
+ continue
+
+ img = bstool.draw_mask_boundary(
+ img, bstool.polygon2mask(gt_polygon), color=color
+ )
+ if with_iou:
+ img = bstool.draw_iou(img, gt_polygon, iou, color=color)
+
+ for idx, pred_polygon in enumerate(building["pred_polygons"]):
+                    if not with_only_pred:
+ if idx in building["pred_TP_indexes"]:
+ color = colors["pred_TP"][::-1]
+ else:
+ color = colors["FP"][::-1]
+ else:
+ if with_image:
+ color = colors["pred_TP"][::-1]
+ else:
+ color = (0, 0, 255)
+
+ if pred_polygon.geom_type != "Polygon":
+ continue
+
+ img = bstool.draw_mask_boundary(
+ img, bstool.polygon2mask(pred_polygon), color=color
+ )
+ cv2.imwrite(output_file, img)
+
+ def visualization_offset(self, image_dir, vis_dir, with_footprint=True):
+ print("========== generation vis images with offset ==========")
+ if with_footprint:
+ image_dir = os.path.join(vis_dir, "..", "boundary", "footprint")
+ vis_dir = vis_dir + "_with_footprint"
+
+ bstool.mkdir_or_exist(vis_dir)
+
+ colors = {
+ "gt_matched": (0, 255, 0),
+ "pred_matched": (255, 255, 0),
+ "pred_un_matched": (0, 255, 255),
+ "gt_un_matched": (255, 0, 0),
+ }
+ objects = self.get_confusion_matrix_indexes(mask_type="roof")
+
+ for image_name in os.listdir(image_dir):
+ image_basename = bstool.get_basename(image_name)
+ image_file = os.path.join(image_dir, image_name)
+
+ output_file = os.path.join(vis_dir, image_name)
+
+ img = cv2.imread(image_file)
+
+ if image_basename not in objects:
+ continue
+
+ building = objects[image_basename]
+
+ height_angle = building["height_angle"]
+
+ img = bstool.draw_height_angle(img, height_angle)
+
+ for gt_polygon, gt_offset, pred_polygon, pred_offset, gt_height in zip(
+ building["gt_polygons_matched"],
+ building["gt_offsets"],
+ building["pred_polygons_matched"],
+ building["pred_offsets"],
+ building["gt_heights"],
+ ):
+ gt_roof_centroid = list(gt_polygon.centroid.coords)[0]
+ pred_roof_centroid = list(pred_polygon.centroid.coords)[0]
+
+ gt_footprint_centroid = [
+ coordinate - offset for coordinate, offset in zip(gt_roof_centroid, gt_offset)
+ ]
+ pred_footprint_centroid = [
+ coordinate - offset
+ for coordinate, offset in zip(pred_roof_centroid, pred_offset)
+ ]
+
+ xoffset, yoffset = gt_offset
+ transform_matrix = [1, 0, 0, 1, -xoffset, -yoffset]
+ gt_footprint_polygon = affinity.affine_transform(gt_polygon, transform_matrix)
+
+ xoffset, yoffset = pred_offset
+ transform_matrix = [1, 0, 0, 1, -xoffset, -yoffset]
+ pred_footprint_polygon = affinity.affine_transform(pred_polygon, transform_matrix)
+
+ intersection = gt_footprint_polygon.intersection(pred_footprint_polygon).area
+ union = gt_footprint_polygon.union(pred_footprint_polygon).area
+
+ iou = intersection / (union - intersection + 1.0)
+
+ if iou >= 0.5:
+ gt_color = colors["gt_matched"][::-1]
+ pred_color = colors["pred_matched"][::-1]
+ else:
+ gt_color = colors["gt_un_matched"][::-1]
+ pred_color = colors["pred_un_matched"][::-1]
+
+ img = bstool.draw_offset_arrow(
+ img, gt_roof_centroid, gt_footprint_centroid, color=gt_color
+ )
+ img = bstool.draw_offset_arrow(
+ img, pred_roof_centroid, pred_footprint_centroid, color=pred_color
+ )
+
+ cv2.imwrite(output_file, img)
+
+
+def mkdir_or_exist(dir_name, mode=0o777):
+ """make of check the dir
+
+ Args:
+ dir_name (str): directory name
+ mode (str, optional): authority of mkdir. Defaults to 0o777.
+ """
+ if dir_name == "":
+ return
+ dir_name = os.path.expanduser(dir_name)
+ if six.PY3:
+ os.makedirs(dir_name, mode=mode, exist_ok=True)
+ else:
+ if not os.path.isdir(dir_name):
+ os.makedirs(dir_name, mode=mode)
+
+
+def write_results2csv(results, meta_info=None):
+ """Write the evaluation results to csv file
+
+ Args:
+ results (list): list of result
+ meta_info (dict, optional): The meta info about the evaluation (file path of ground truth etc.). Defaults to None.
+ """
+ # print("meta_info: ", meta_info)
+ segmentation_eval_results = results[0]
+ with open(meta_info["summary_file"], "w") as summary:
+ csv_writer = csv.writer(summary, delimiter=",")
+ csv_writer.writerow(["Meta Info"])
+ csv_writer.writerow(["model", meta_info["model"]])
+ csv_writer.writerow(["anno_file", meta_info["anno_file"]])
+ csv_writer.writerow(["gt_roof_csv_file", meta_info["gt_roof_csv_file"]])
+ csv_writer.writerow(["gt_footprint_csv_file", meta_info["gt_footprint_csv_file"]])
+ # csv_writer.writerow(['vis_dir', meta_info['vis_dir']])
+ csv_writer.writerow([""])
+ for mask_type in ["roof", "footprint"]:
+ csv_writer.writerow([mask_type])
+ csv_writer.writerow([segmentation_eval_results[mask_type]])
+ csv_writer.writerow(["F1 Score", segmentation_eval_results[mask_type]["F1_score"]])
+ csv_writer.writerow(["Precision", segmentation_eval_results[mask_type]["Precision"]])
+ csv_writer.writerow(["Recall", segmentation_eval_results[mask_type]["Recall"]])
+ csv_writer.writerow(["True Positive", segmentation_eval_results[mask_type]["TP"]])
+ csv_writer.writerow(["False Positive", segmentation_eval_results[mask_type]["FP"]])
+ csv_writer.writerow(["False Negative", segmentation_eval_results[mask_type]["FN"]])
+ csv_writer.writerow([""])
+
+ csv_writer.writerow([""])
+
+
+def parse_args():
+ parser = argparse.ArgumentParser(description="MMDet eval on semantic segmentation")
+ parser.add_argument("pkl_file_path", help="pkl file for eval")
+ parser.add_argument(
+ "csv_save_path", help="root to save csv file, if path not exists, will create"
+ )
+ parser.add_argument(
+ "--version", type=str, default="bc_v100.01.09", help="model name (version) for evaluation"
+ )
+ parser.add_argument(
+ "--model",
+ type=str,
+ default="loft-foa-oa-na-fro-h",
+ help="full model name for evaluation",
+ )
+ parser.add_argument(
+ "--city", type=str, default="BONAI", help="dataset city for evaluation"
+ )
+
+ args = parser.parse_args()
+
+ return args
+
+
+def get_model_shortname(model_name):
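+    # e.g. the default version string "bc_v100.01.09" maps to "bonai_v100.01.09"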
+ return "bonai" + "_" + model_name.split("_")[1]
+
+
+class EvaluationParameters:
+ def __init__(self, city, model, pkl, csv_root):
+ # flags
+ self.with_vis = False
+ self.with_only_vis = False
+ self.with_only_pred = False
+ self.with_image = True
+ self.with_offset = True
+ self.save_merged_csv = False
+
+ city_types_to_full = {
+ "omnicity": "OmniCityView3WithOffset",
+ "hk": "hongkong",
+ "bonai_hk": "bonai_hongkong",
+ }
+
+ # basic info
+ self.city = city
+ self.model = model
+ self.score_threshold = 0.4
+
+ self.dataset_root = "./data"
+
+ # self.with_vis = True # whether draw when eval
+ # self.with_only_vis = True # only draw
+
+ # Default dataset
+ # dataset file
+ self.anno_file = f"{self.dataset_root}/BONAI/coco/bonai_shanghai_xian_test_roof.json"
+ self.test_image_dir = f"{self.dataset_root}/BONAI/test/images"
+ # csv ground truth files
+ self.gt_roof_csv_file = f"{self.dataset_root}/BONAI/csv/shanghai_xian_v3_merge_val_roof_crop1024_gt_minarea500.csv"
+ self.gt_footprint_csv_file = f"{self.dataset_root}/BONAI/csv/shanghai_xian_v3_merge_val_footprint_crop1024_gt_minarea500.csv"
+
+ if city == 'bonai':
+ print('################ Use Default City BONAI shanghai_xian for Eval ################')
+ elif city in ['bonai_hk']: # For City Group
+ city_full_name = city_types_to_full[city]
+ print('################ Use City Group {} for Eval ################'.format(city_full_name))
+ self.anno_file = f"{self.dataset_root}/combined_test/coco/{city_full_name}_test_roof.json"
+ self.test_image_dir = f"{self.dataset_root}/combined_test/images/{city_full_name}/"
+ self.gt_roof_csv_file = f"{self.dataset_root}/combined_test/csv/{city_full_name}_roof_gt_minarea500.csv"
+ self.gt_footprint_csv_file = f"{self.dataset_root}/combined_test/csv/{city_full_name}_footprint_gt_minarea500.csv"
+ elif city in ['omnicity','hk']:
+ city_full_name = city_types_to_full[city]
+ print('################ Use City {} for Eval ################'.format(city_full_name))
+ self.test_image_dir = f"{self.dataset_root}/{city_full_name}/test/images/"
+ self.anno_file = f"{self.dataset_root}/{city_full_name}/coco/{city_full_name}_test_roof.json"
+ self.gt_roof_csv_file = f"{self.dataset_root}/{city_full_name}/csv/{city_full_name}_roof_gt_minarea500.csv"
+ self.gt_footprint_csv_file = f"{self.dataset_root}/{city_full_name}/csv/{city_full_name}_footprint_gt_minarea500.csv"
+ else:
+ print('################ No Such TEST City Type: {}! ################'.format(city))
+ print('################ Use Default City BONAI shanghai_xian for Eval! ################')
+
+ # detection result files
+ self.mmdetection_pkl_file = pkl
+ self.save_root = csv_root
+
+ self.csv_info = "merged" if self.save_merged_csv else "splitted"
+
+ if not os.path.exists(self.save_root):
+ os.makedirs(self.save_root)
+ self.pred_roof_csv_file = os.path.join(self.save_root, "roof_pred.csv")
+ self.pred_footprint_csv_file = os.path.join(self.save_root, "footprint_offset.csv")
+ self.direct_footprint_csv_file = os.path.join(self.save_root, "footprint_direct.csv")
+
+ # vis
+ if self.with_vis or self.with_only_vis:
+ self.vis_boundary_dir = f'{self.save_root}/vis/boundary' + ("_pred" if self.with_only_pred else "")
+ self.vis_offset_dir = f'{self.save_root}/vis/offset'
+
+        self.summary_file = os.path.join(self.save_root, f"eval_summary_{self.csv_info}.csv")
+
+ def post_processing(self):
+ mkdir_or_exist(self.vis_boundary_dir)
+ mkdir_or_exist(self.vis_offset_dir)
+
+
+if __name__ == "__main__":
+ args = parse_args()
+ warnings.filterwarnings("ignore")
+ eval_parameters = EvaluationParameters(
+ city=args.city, model=args.model, pkl=args.pkl_file_path, csv_root=args.csv_save_path
+ )
+ # eval_parameters.post_processing()
+ print(f"========== {args.model} ========== {args.city} ==========")
+
+ pkl_file = eval_parameters.mmdetection_pkl_file
+ pkl_test = mmcv.load(pkl_file)
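+    # each per-image entry of the pkl is assumed to be a tuple whose slots
+    # 2/3/5/6/7 hold the offset, height, direct-footprint, offset-angle and
+    # nadir-angle predictions (None when the corresponding head is disabled)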
+    eval_offset = pkl_test[0][2] is not None
+    eval_height = pkl_test[0][3] is not None
+    eval_direct_footprint = pkl_test[0][5] is not None
+    eval_offset_angle = pkl_test[0][6] is not None
+    eval_nadir_angle = pkl_test[0][7] is not None
+
+ evaluation = Evaluation(
+ model=eval_parameters.model,
+ anno_file=eval_parameters.anno_file,
+ pkl_file=pkl_file,
+ gt_roof_csv_file=eval_parameters.gt_roof_csv_file,
+ gt_footprint_csv_file=eval_parameters.gt_footprint_csv_file,
+ roof_csv_file=eval_parameters.pred_roof_csv_file,
+ footprint_csv_file=eval_parameters.pred_footprint_csv_file,
+ footprint_direct_csv_file=eval_parameters.direct_footprint_csv_file,
+ iou_threshold=0.1,
+ score_threshold=eval_parameters.score_threshold,
+ with_offset=eval_offset,
+ show=False,
+ save_merged_csv=eval_parameters.save_merged_csv,
+ )
+
+ if eval_parameters.with_only_vis is False:
+ # evaluation
+ if evaluation.dump_result:
+ # calculate the F1 score
+ offset = {"aEPE": []}
+ offset_angle = []
+ nadir_angle = []
+ mae = []
+ rmse = []
+
+ if eval_offset:
+ offset = evaluation.offset_error_vector()
+ roof_and_cal_footprint = evaluation.segmentation_evaluate()
+ else:
+ roof_and_cal_footprint = evaluation.segmentation_evaluate(mask_types=["roof"])
+ roof_and_cal_footprint["footprint"] = {}
+ roof_and_cal_footprint["footprint"]["F1_score"] = []
+ roof_and_cal_footprint["footprint"]["Precision"] = []
+ roof_and_cal_footprint["footprint"]["Recall"] = []
+            if eval_nadir_angle:
+ nadir_angle = evaluation.nadir_angle_evaluate()
+ if eval_offset_angle:
+ offset_angle = evaluation.offset_angle_evaluate()
+ if eval_height:
+ mae, rmse = evaluation.height_calculate()
+ if eval_direct_footprint:
+ direct_footprint = evaluation.direct_footprint_evaluate()
+ else:
+ direct_footprint = {"F1_score":[],"Precision":[],"Recall":[]}
+
+ print("roof_F1: ", roof_and_cal_footprint["roof"]["F1_score"])
+ print("calculated_footprint_F1: ", roof_and_cal_footprint["footprint"]["F1_score"])
+
+ print("inferenced_footprint_F1: ", direct_footprint['F1_score'])
+ print("inferenced_footprint_Precision: ", direct_footprint['Precision'])
+ print("inferenced_footprint_Recall: ", direct_footprint['Recall'])
+
+ print("offset_EPE: ", offset["aEPE"])
+ print("height_MAE: ", mae)
+ print("height_RMSE: ", rmse)
+ print("offset_angle_mae: ", offset_angle)
+ print("nadir_angle_mae: ", nadir_angle)
+
+ with open(os.path.join(eval_parameters.save_root, "results.txt"), "w") as w:
+ w.write("roof_F1: {}\n".format(roof_and_cal_footprint["roof"]["F1_score"]))
+ w.write("roof_Precision: {}\n".format(roof_and_cal_footprint["roof"]["Precision"]))
+ w.write("roof_Recall: {}\n".format(roof_and_cal_footprint["roof"]["Recall"]))
+
+ w.write("calculated_footprint_F1: {}\n".format(roof_and_cal_footprint["footprint"]["F1_score"]))
+ w.write("calculated_footprint_Precision: {}\n".format(roof_and_cal_footprint["footprint"]["Precision"]))
+ w.write("calculated_footprint_Recall: {}\n".format(roof_and_cal_footprint["footprint"]["Recall"]))
+
+ w.write("inferenced_footprint_F1: {}\n".format(direct_footprint['F1_score']))
+ w.write("inferenced_footprint_Precision: {}\n".format(direct_footprint['Precision']))
+ w.write("inferenced_footprint_Recall: {}\n".format(direct_footprint['Recall']))
+
+ w.write("offset_EPE: {}\n".format(offset["aEPE"]))
+ w.write("height_MAE: {}\n".format(mae))
+ w.write("height_RMSE: {}\n".format(rmse))
+ w.write("offset_angle_mae: {}\n".format(offset_angle))
+ w.write("nadir_angle_mae: {}\n".format(nadir_angle))
+
+ else:
+ print(
+ "!!!!!!!!!!!!!!!!!!!!!! ALl the results of images are empty !!!!!!!!!!!!!!!!!!!!!!!!!!!"
+ )
+
+ # vis
+ if eval_parameters.with_vis:
+ # generate the vis results
+ evaluation.visualization_boundary(
+ image_dir=eval_parameters.test_image_dir,
+ vis_dir=eval_parameters.vis_boundary_dir,
+ with_gt=True,
+ )
+ # draw offset in the image (not used in this file)
+ # for with_footprint in [True, False]:
+ # evaluation.visualization_offset(image_dir=image_dir, vis_dir=vis_offset_dir, with_footprint=with_footprint)
+ else:
+ # generate the vis results
+ evaluation.visualization_boundary(
+ image_dir=eval_parameters.test_image_dir,
+ vis_dir=eval_parameters.vis_boundary_dir,
+ with_gt=True,
+ with_only_pred=eval_parameters.with_only_pred,
+ with_image=eval_parameters.with_image,
+ )
+ # draw offset in the image (not used in this file)
+ # for with_footprint in [False, True]:
+ # evaluation.visualization_offset(image_dir=eval_parameters.test_image_dir, vis_dir=eval_parameters.vis_offset_dir, with_footprint=with_footprint)
\ No newline at end of file
diff --git a/tools/bonai/bonai_test.py b/tools/bonai/bonai_test.py
new file mode 100644
index 00000000..72e06abc
--- /dev/null
+++ b/tools/bonai/bonai_test.py
@@ -0,0 +1,218 @@
+import argparse
+import os
+import warnings
+
+import mmcv
+import torch
+from mmcv import Config, DictAction
+from mmcv.cnn import fuse_conv_bn
+from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
+from mmcv.runner import get_dist_info, init_dist, load_checkpoint, wrap_fp16_model
+
+from mmdet.apis import multi_gpu_test, single_gpu_test
+from mmdet.datasets import build_dataloader, build_dataset
+from mmdet.models import build_detector
+
+
+def parse_args():
+ parser = argparse.ArgumentParser(description="MMDet test (and eval) a model")
+ parser.add_argument("config", help="test config file path")
+ parser.add_argument("checkpoint", help="checkpoint file")
+ parser.add_argument("--out", help="output result file in pickle format")
+ parser.add_argument("--merged-out", help="output merged result file in pickle format")
+ parser.add_argument("--merge-iou-threshold", type=float, default=0.1, help="threshold of iou")
+ parser.add_argument(
+ "--fuse-conv-bn",
+ action="store_true",
+ help="Whether to fuse conv and bn, this will slightly increase" "the inference speed",
+ )
+ parser.add_argument(
+ "--format-only",
+ action="store_true",
+ help="Format the output results without perform evaluation. It is"
+ "useful when you want to format the result to a specific format and "
+ "submit it to the test server",
+ )
+ parser.add_argument(
+ "--eval",
+ type=str,
+ nargs="+",
+ default="segm",
+        help='evaluation metrics, which depend on the dataset, e.g., "bbox",'
+ ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC',
+ )
+ parser.add_argument(
+ "--city", type=str, default="bonai_shanghai_xian", help="dataset for evaluation"
+ )
+ parser.add_argument("--show", action="store_true", help="show results")
+ parser.add_argument("--show-dir", help="directory where painted images will be saved")
+ parser.add_argument(
+ "--show-score-thr", type=float, default=0.3, help="score threshold (default: 0.3)"
+ )
+ parser.add_argument(
+ "--gpu-collect", action="store_true", help="whether to use gpu to collect results."
+ )
+ parser.add_argument(
+ "--tmpdir",
+ help="tmp directory used for collecting results from multiple "
+ "workers, available when gpu-collect is not specified",
+ )
+ parser.add_argument("--options", nargs="+", action=DictAction, help="arguments in dict")
+ parser.add_argument(
+ "--launcher",
+ choices=["none", "pytorch", "slurm", "mpi"],
+ default="none",
+ help="job launcher",
+ )
+ parser.add_argument("--local_rank", type=int, default=0)
+ parser.add_argument("--nms-score", type=float, default=0.5, help="nms threshold (default: 0.5)")
+ args = parser.parse_args()
+ if "LOCAL_RANK" not in os.environ:
+ os.environ["LOCAL_RANK"] = str(args.local_rank)
+ return args
+
+
+def choose_test_dataset(cfg, args, mask_type, data_root):
+ city_types_to_full = {
+ "omnicity": "OmniCityView3WithOffset",
+ "hk": "hongkong",
+ "bonai_hk": "bonai_hongkong",
+ }
+ if "roof" in mask_type or "footprint" in mask_type:
+ mask_short = "footprint" if "footprint" in mask_type else "roof"
+
+ cfg.data.test.ann_file = f"{data_root}BONAI/coco/bonai_shanghai_xian_test_{mask_short}.json"
+ cfg.data.test.img_prefix = data_root + "BONAI/test/images/"
+ if args.city == "bonai":
+ print("################ Use Default City BONAI shanghai_xian for TEST ################")
+ elif args.city in ["bonai_hk"]: # For City Group
+ city_full_name = city_types_to_full[args.city]
+ print(
+ "################ Use City Group {} for {} TEST ################".format(
+ city_full_name, mask_short
+ )
+ )
+ cfg.data.test.ann_file = (
+ f"{data_root}combined_test/coco/{city_full_name}_test_{mask_short}.json"
+ )
+ cfg.data.test.img_prefix = f"{data_root}combined_test/images/{city_full_name}/"
+ elif args.city in ["omnicity", "hk"]:
+ city_full_name = city_types_to_full[args.city]
+ print(
+ "################ Use Single City {} for {} TEST ################".format(
+ city_full_name, mask_short
+ )
+ )
+ cfg.data.test.ann_file = (
+ f"{data_root}{city_full_name}/coco/{city_full_name}_test_{mask_short}.json"
+ )
+ cfg.data.test.img_prefix = f"{data_root}{city_full_name}/test/images/"
+ else:
+ print("################ No Such TEST City Type: {}! ################".format(args.city))
+ print(
+ "################ Use Default City BONAI shanghai_xian for {} TEST! ################".format(
+ mask_short
+ )
+ )
+ else:
+ raise ValueError(f"Wrong mask type for test: {mask_type}")
+
+
+def eval_different_mask(cfg, args, kwargs, mask_type, outputs, data_root):
+ choose_test_dataset(cfg, args, mask_type, data_root)
+
+ print("Dataset for evaluation: ", cfg.data.test.ann_file)
+ print("################", mask_type, " evaluate start ################")
+ dataset = build_dataset(cfg.data.test)
+ dataset.evaluate(outputs, args.eval, mask_type, **kwargs)
+ print("################", mask_type, " evaluate end ################")
+
+
+def main():
+ args = parse_args()
+
+ assert args.out or args.eval or args.format_only or args.show or args.show_dir, (
+ "Please specify at least one operation (save/eval/format/show the "
+        'results) with the argument "--out", "--eval"'
+ ', "--format-only", "--show" or "--show-dir"'
+ )
+
+ if args.eval and args.format_only:
+ raise ValueError("--eval and --format_only cannot be both specified")
+
+ if args.out is not None and not args.out.endswith((".pkl", ".pickle")):
+ raise ValueError("The output file must be a pkl file.")
+
+ cfg = Config.fromfile(args.config)
+ # set cudnn_benchmark
+ if cfg.get("cudnn_benchmark", False):
+ torch.backends.cudnn.benchmark = True
+ cfg.model.pretrained = None
+ cfg.data.test.test_mode = True
+
+ data_root = "./data/"
+
+ choose_test_dataset(cfg, args, "roof", data_root)
+
+ if cfg.test_cfg.get("rcnn", False):
+ cfg.test_cfg.rcnn.nms.iou_threshold = args.nms_score
+ print("NMS config for testing: {}".format(cfg.test_cfg.rcnn.nms))
+ # init distributed env first, since logger depends on the dist info.
+ if args.launcher == "none":
+ distributed = False
+ else:
+ distributed = True
+ init_dist(args.launcher, **cfg.dist_params)
+
+ # build the dataloader
+ # TODO: support multiple images per gpu (only minor changes are needed)
+ dataset = build_dataset(cfg.data.test)
+ data_loader = build_dataloader(
+ dataset,
+ samples_per_gpu=cfg.data.test_dataloader.samples_per_gpu,
+ workers_per_gpu=cfg.data.test_dataloader.workers_per_gpu,
+ dist=distributed,
+ shuffle=False,
+ )
+ # build the model and load checkpoint
+ model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
+ fp16_cfg = cfg.get("fp16", None)
+ if fp16_cfg is not None:
+ wrap_fp16_model(model)
+ checkpoint = load_checkpoint(model, args.checkpoint, map_location="cpu")
+ if args.fuse_conv_bn:
+ model = fuse_conv_bn(model)
+    # old versions did not save class info in checkpoints; this workaround is
+    # for backward compatibility
+ if "CLASSES" in checkpoint["meta"]:
+ model.CLASSES = checkpoint["meta"]["CLASSES"]
+ else:
+ model.CLASSES = dataset.CLASSES
+
+ if not distributed:
+ model = MMDataParallel(model, device_ids=[0])
+ outputs = single_gpu_test(model, data_loader, args.show, args.show_dir, args.show_score_thr)
+ else:
+ model = MMDistributedDataParallel(
+ model.cuda(), device_ids=[torch.cuda.current_device()], broadcast_buffers=False
+ )
+ outputs = multi_gpu_test(model, data_loader, args.tmpdir, args.gpu_collect)
+
+ rank, _ = get_dist_info()
+ if rank == 0:
+ if args.out:
+ print(f"\nwriting results to {args.out}")
+ mmcv.dump(outputs, args.out)
+ kwargs = {} if args.options is None else args.options
+ if args.format_only:
+ dataset.format_results(outputs, **kwargs)
+ if args.eval:
+ mask_types = ["roof", "offset_footprint", "direct_footprint"]
+ eval_different_mask(cfg, args, kwargs, mask_types[0], outputs, data_root)
+ eval_different_mask(cfg, args, kwargs, mask_types[1], outputs, data_root)
+ eval_different_mask(cfg, args, kwargs, mask_types[2], outputs, data_root)
+
+
+if __name__ == "__main__":
+ warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.functional")
+ main()
diff --git a/tools/bonai/dataset_process.py b/tools/bonai/dataset_process.py
new file mode 100644
index 00000000..1002d23f
--- /dev/null
+++ b/tools/bonai/dataset_process.py
@@ -0,0 +1,102 @@
+import json
+import random
+from collections import defaultdict
+from pathlib import Path
+
+DATASETS_DIR = Path(__file__).parent.parent.parent / "data"
+# DATASET = "hongkong"
+# DATASET = "OmniCityView3WithOffset"
+DATASET = "BONAI"
+DATASET_DIR = DATASETS_DIR / DATASET
+
+
+def check_ratios(ann_dir):
+ total_cnt = 0
+ oh_cnt = 0
+ h_cnt = 0
+ for ann_path in ann_dir.iterdir():
+        str_path = ann_path.name  # basename, so the hidden-file check below works
+ print(str_path)
+ if str_path.startswith(".") or "test" in str_path:
+ continue
+
+ with open(ann_path, "r", encoding="UTF-8") as fp:
+ content = json.load(fp)
+
+ anns_per_image = defaultdict(list)
+ for ann in content["annotations"]:
+ anns_per_image[ann["image_id"]].append(ann)
+ for anns in anns_per_image.values():
+ total_cnt += 1
+ if "offset" in anns[0] and "building_height" in anns[0]:
+ oh_cnt += 1
+ elif "building_height" in anns[0]:
+ h_cnt += 1
+ print("with offset&height anns:", oh_cnt / total_cnt)
+ print("with height anns:", h_cnt / total_cnt)
+ print("with footprint anns:", (total_cnt - oh_cnt - h_cnt) / total_cnt)
+
+
+def let_segmentation_equal_to_roof():
+ ann_path = DATASET_DIR / "coco" / "bonai_shanghai_xian_test_roof.json"
+ with open(ann_path, "r", encoding="UTF-8") as fp:
+ content = json.load(fp)
+ for ann in content["annotations"]:
+ ann["segmentation"] = [ann["roof_mask"]]
+ with open(ann_path, "w", encoding="UTF-8") as fp:
+ json.dump(content, fp, indent=4, separators=(",", ":"))
+
+
+def create_wsl_dataset(oh_ratio, h_ratio, n_ratio):
+ assert oh_ratio >= 0.0 and h_ratio >= 0.0 and n_ratio >= 0.0
+ assert oh_ratio + h_ratio + n_ratio <= 1.0
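+    # e.g. create_wsl_dataset(0.3, 0.3, 0.4) keeps offset+height labels for ~30% of
+    # the images, height-only labels for ~30%, and footprint-only labels for ~40%;
+    # when the ratios sum to less than 1, the leftover images are dropped entirely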
+
+ ann_dir = DATASET_DIR / "coco"
+ oh_suffix = f"_{str(int(oh_ratio*100))}oh" if oh_ratio > 0.0 else ""
+ h_suffix = f"_{str(int(h_ratio*100))}h" if h_ratio > 0.0 else ""
+ n_suffix = f"_{str(int(n_ratio*100))}n" if n_ratio > 0.0 else ""
+ new_ann_dir = DATASET_DIR / f"coco{oh_suffix}{h_suffix}{n_suffix}"
+
+ if not new_ann_dir.exists():
+ new_ann_dir.mkdir()
+
+ for ann_path in ann_dir.iterdir():
+        str_path = ann_path.name
+ print(str_path)
+ if str_path.startswith(".") or "test" in str_path:
+ continue
+
+ with open(ann_path, "r", encoding="UTF-8") as fp:
+ content = json.load(fp)
+
+ anns_per_image = defaultdict(list)
+ new_anns = []
+ for ann in content["annotations"]:
+ anns_per_image[ann["image_id"]].append(ann)
+
+ for anns in anns_per_image.values():
+ x = random.random()
+ if x < oh_ratio:
+ new_anns += anns
+ elif x < oh_ratio + h_ratio:
+ for ann in anns:
+ ann.pop("offset")
+ new_anns += anns
+ elif x < oh_ratio + h_ratio + n_ratio:
+ for ann in anns:
+ ann.pop("offset")
+ ann.pop("building_height")
+ new_anns += anns
+
+ content["annotations"] = new_anns
+
+ new_ann_path = new_ann_dir / str_path
+ with open(new_ann_path, "w", encoding="UTF-8") as fp:
+ json.dump(content, fp, indent=4, separators=(",", ":"))
+ print(new_ann_path)
+
+
+if __name__ == "__main__":
+ # let_segmentation_equal_to_roof()
+ # create_wsl_dataset(0.65, 0.15, 0.2)
+ check_ratios(DATASET_DIR / "coco_30oh_30h_40n")
diff --git a/tools/bonai/statistical_visualization.ipynb b/tools/bonai/statistical_visualization.ipynb
new file mode 100644
index 00000000..d97d7564
--- /dev/null
+++ b/tools/bonai/statistical_visualization.ipynb
@@ -0,0 +1,116 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "from pathlib import Path\n",
+ "from collections import defaultdict\n",
+ "import json"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "DATASETS_DIR = Path(\"data/\")\n",
+ "DATASET = \"hongkong\"\n",
+ "# dataset = \"OmniCityView3WithOffset\"\n",
+ "# dataset = \"BONAI\"\n",
+ "DATASET_DIR = DATASETS_DIR / DATASET"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def count_height():\n",
+ " test_ann_path = DATASET_DIR / \"coco\" / \"hongkong_test_roof.json\"\n",
+ " train_ann_path = DATASET_DIR / \"coco\" / \"hongkong_trainval.json\"\n",
+ " height_count = defaultdict(int)\n",
+ " for ann_path in (test_ann_path, train_ann_path):\n",
+ " with open(ann_path, \"r\") as fp:\n",
+ " content = json.load(fp)\n",
+ " for ann in content[\"annotations\"]:\n",
+ " height = ann[\"building_height\"]\n",
+ " if height >= 200:\n",
+ " height_count[200] += 1\n",
+ " else:\n",
+ " height_count[int(float(height) / 20) * 20] += 1\n",
+ " # print(height_count)\n",
+ " keys = list(sorted(height_count.keys()))\n",
+ " values = []\n",
+ " for key in keys:\n",
+ " value = height_count[key]\n",
+ " values.append(value)\n",
+ " # print(f\"{key}: {value}\")\n",
+ " return keys, values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def draw_hist(data, labels):\n",
+ " # data = [5, 20, 15, 25, 10]\n",
+ " # labels = [\"Tom\", \"Dick\", \"Harry\", \"Slim\", \"Jim\"]\n",
+ " plt.xlabel('entry a')\n",
+ " plt.ylabel('entry b')\n",
+ " plt.bar(range(len(data)), data, tick_label=labels, width=0.8, align=\"edge\")\n",
+ " plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAGwCAYAAABIC3rIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAwV0lEQVR4nO3df1RVdb7/8dcJ5IeE54oGRxKVbmQaag52Ea83bPydRo2t0bLIykrH1CjNdGxuTFNirkmd9GbmdbT8Ed17027lDImTUWb+CKPUvGZFhSZiDR60CEg+3z9mtb8d8RcG7IOf52OtvVZn7/c557VPAa/22fscjzHGCAAAwGIXuB0AAADAbRQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrhbodoLmora3VV199pejoaHk8HrfjAACAs2CM0dGjRxUfH68LLjj1cSAK0Vn66quvlJCQ4HYMAABwDkpKStS+fftTbqcQnaXo6GhJ/3hBW7Vq5XIaAABwNioqKpSQkOD8HT8VCtFZ+vFtslatWlGIAABoZs50ugsnVQMAAOtRiAAAgPVcLUTZ2dnyeDwBi8/nc7YbY5Sdna34+HhFRkaqX79+2r17d8BjVFVVadKkSWrbtq2ioqKUkZGh/fv3B8yUl5crMzNTXq9XXq9XmZmZOnLkSFPsIgAAaAZcP0J0xRVX6ODBg86yc+dOZ9ucOXM0d+5cLVy4UNu3b5fP59PAgQN19OhRZyYrK0tr165Vbm6uNm3apGPHjmn48OE6fvy4MzN69GgVFRUpLy9PeXl5KioqUmZmZpPuJwAACF6un1QdGhoacFToR8YYzZ8/XzNnztSIESMkSc8995zi4uK0evVqjRs3Tn6/X0uXLtWKFSs0YMAASdLKlSuVkJCgDRs2aPDgwdqzZ4/y8vK0ZcsWpaamSpKWLFmitLQ07d27V507d266nQUAAEHJ9SNE+/btU3x8vBITE3XTTTfps88+kyQVFxertLRUgwYNcmbDw8OVnp6uzZs3S5IKCwtVU1MTMBMfH6/k5GRn5t1335XX63XKkCT17t1bXq/XmTmZqqoqVVRUBCwAAOD85GohSk1N1fPPP6/XX39dS5YsUWlpqfr06aNvvvlGpaWlkqS4uLiA+8TFxTnbSktLFRYWptatW592JjY2ts5zx8bGOjMnk5OT45xz5PV6+VBGAADOY64WoqFDh+rGG29Ut27dNGDAAK1bt07SP94a+9GJnxtgjDnjZwmcOHOy+TM9zowZM+T3+52lpKTkrPYJAAA0P66/ZfZTUVFR6tatm/bt2+ecV3TiUZyysjLnqJHP51N1dbXKy8tPO3Po0KE6z3X48OE6R59+Kjw83PkQRj6MEQCA81tQFaKqqirt2bNH7dq1U2Jionw+n/Lz853t1dXVKigoUJ8+fSRJKSkpatGiRcDMwYMHtWvXLmcmLS1Nfr9f27Ztc2a2bt0qv9/vzAAAALu5epXZ1KlTdd1116lDhw4qKyvTY489poqKCo0ZM0Yej0dZWVmaNWuWkpKSlJSUpFmzZqlly5YaPXq0JMnr9Wrs2LGaMmWK2rRpo5iYGE2dOtV5C06SunTpoiFDhujuu+/W4sWLJUn33HOPhg8fzhVmAABAksuFaP/+/br55pv19ddf66KLLlLv3r21ZcsWdezYUZI0bdo0VVZWasKECSovL1dqaqrWr18f8AVt8+bNU2hoqEaOHKnKykr1799fy5cvV0hIiDOzatUqTZ482bkaLSMjQwsXLmzanQUAAEHLY4wxbodoDioqKuT1euX3+zmfCACAZuJs/34H1TlEAAAAbqAQAQAA67n+1R1oPjpNX+d2BH0+e5jbEQAA5yGOEAEAAOtRiAAAgPV4yywIBMNbURJvRwEA7MURIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9YKmEOXk5Mjj8SgrK8tZZ4xRdna24uPjFRkZqX79+mn37t0B96uqqtKkSZPUtm1bRUVFKSMjQ/v37w+YKS8vV2Zmprxer7xerzIzM3XkyJEm2CsAANAcBEUh2r59u5599ll17949YP2cOXM0d+5cLVy4UNu3b5fP59PAgQN19OhRZyYrK0tr165Vbm6uNm3apGPHjmn48OE6fvy4MzN69GgVFRUpLy9PeXl5KioqUmZmZpPtHwAACG6uF6Jjx47plltu0ZIlS9S6dWtnvTFG8+fP18yZMzVixAglJyfrueee03fffafVq1dLkvx+v5YuXaonn3xSAwYMUM+ePbVy5Urt3LlTGzZskCTt2bNHeXl5+s///E+lpaUpLS1NS5Ys0Wuvvaa9e/e6ss8AACC4uF6I7r33Xg0bNkwDBgwIWF9cXKzS0lINGjTIWRceHq709HRt3rxZklRYWKiampqAmfj4eCUnJzsz7777rrxer1JTU52Z3r17y+v1OjMnU1VVpYqKioAFAACcn0LdfPLc3Fzt2LFD27dvr7OttLRUkhQXFxewPi4uTl988YUzExYWFnBk6ceZH+9fWlqq2NjYOo8fGxvrzJxMTk6Ofv/739dvhwAAQLPk2hGikpIS3XfffVq5cqUiIiJOOefxeAJuG2PqrDvRiTMnmz/T48yYMUN+v99ZSkpKTvucAACg+XKtEBUWFqqsrEwpKSkKDQ1VaGioCgoK9NRTTyk0NNQ5MnTiUZyysjJnm8/nU3V1tcrLy087c+jQoTrPf/jw4TpHn34qPDxcrVq1ClgAAMD5ybVC1L9/f+3cuVNFRUXO0qtXL91yyy0qKirSJZdcIp/Pp/z8fOc+1dXVKigoUJ8+fSRJKSkpatGiRcDMwYMHtWvXLmcmLS1Nfr9f27Ztc2a2bt0qv9/vzAAAALu5dg5RdHS0kpOTA9ZFRUWpTZs2zvqsrCzNmjVLSUlJSkpK0qxZs9SyZUuNHj1akuT1ejV27FhNmTJFbdq0UUxMjKZOnapu3bo5J2l36dJFQ4YM0d13363FixdLku655x4NHz5cnTt3bsI9BgAAwcrVk6rPZNq0aaqsrNSECRNUXl6u1NRUrV+/XtHR0c7MvHnzFBoaqpEjR6qyslL9+/fX8uXLFRIS4sysWrVKkydPdq5Gy8jI0MKFC5t8fwAAQHDyGGOM2yGag4qKCnm9Xvn
9/gY/n6jT9HUN+njn6vPZw067PRhynikjAAA/dbZ/v13/HCIAAAC3UYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOuFuh0AQPDqNH2d2xEkSZ/PHuZ2BADnOQoRzjvB8EecP+AA0LzwlhkAALAehQgAAFiPQgQAAKznaiFatGiRunfvrlatWqlVq1ZKS0vTX//6V2e7MUbZ2dmKj49XZGSk+vXrp927dwc8RlVVlSZNmqS2bdsqKipKGRkZ2r9/f8BMeXm5MjMz5fV65fV6lZmZqSNHjjTFLgIAgGbA1ULUvn17zZ49W++9957ee+89/fKXv9T111/vlJ45c+Zo7ty5WrhwobZv3y6fz6eBAwfq6NGjzmNkZWVp7dq1ys3N1aZNm3Ts2DENHz5cx48fd2ZGjx6toqIi5eXlKS8vT0VFRcrMzGzy/QUAAMHJ1avMrrvuuoDbjz/+uBYtWqQtW7aoa9eumj9/vmbOnKkRI0ZIkp577jnFxcVp9erVGjdunPx+v5YuXaoVK1ZowIABkqSVK1cqISFBGzZs0ODBg7Vnzx7l5eVpy5YtSk1NlSQtWbJEaWlp2rt3rzp37nzSbFVVVaqqqnJuV1RUNMZLAAAAgkDQnEN0/Phx5ebm6ttvv1VaWpqKi4tVWlqqQYMGOTPh4eFKT0/X5s2bJUmFhYWqqakJmImPj1dycrIz8+6778rr9TplSJJ69+4tr9frzJxMTk6O8xab1+tVQkJCQ+8yAAAIEq4Xop07d+rCCy9UeHi4xo8fr7Vr16pr164qLS2VJMXFxQXMx8XFOdtKS0sVFham1q1bn3YmNja2zvPGxsY6MyczY8YM+f1+ZykpKflZ+wkAAIKX6x/M2LlzZxUVFenIkSN66aWXNGbMGBUUFDjbPR5PwLwxps66E504c7L5Mz1OeHi4wsPDz3Y3AABAM+b6EaKwsDBdeuml6tWrl3JyctSjRw/96U9/ks/nk6Q6R3HKysqco0Y+n0/V1dUqLy8/7cyhQ4fqPO/hw4frHH0CAAB2cr0QncgYo6qqKiUmJsrn8yk/P9/ZVl1drYKCAvXp00eSlJKSohYtWgTMHDx4ULt27XJm0tLS5Pf7tW3bNmdm69at8vv9zgwAALCbq2+Z/fa3v9XQoUOVkJCgo0ePKjc3V2+++aby8vLk8XiUlZWlWbNmKSkpSUlJSZo1a5Zatmyp0aNHS5K8Xq/Gjh2rKVOmqE2bNoqJidHUqVPVrVs356qzLl26aMiQIbr77ru1ePFiSdI999yj4cOHn/IKMwAAYBdXC9GhQ4eUmZmpgwcPyuv1qnv37srLy9PAgQMlSdOmTVNlZaUmTJig8vJypaamav369YqOjnYeY968eQoNDdXIkSNVWVmp/v37a/ny5QoJCXFmVq1apcmTJztXo2VkZGjhwoVNu7MAACBouVqIli5detrtHo9H2dnZys7OPuVMRESEFixYoAULFpxyJiYmRitXrjzXmAAA4DwXdOcQAQAANLWfVYhKSkrqfG8YAABAc1PvQvTDDz/od7/7nbxerzp16qSOHTvK6/Xq4YcfVk1NTWNkBAAAaFT1Podo4sSJWrt2rebMmaO0tDRJ//h6jOzsbH399dd65plnGjwkAABAY6p3IXrhhReUm5uroUOHOuu6d++uDh066KabbqIQAQCAZqfeb5lFRESoU6dOddZ36tRJYWFhDZEJAACgSdW7EN177736wx/+oKqqKmddVVWVHn/8cU2cOLFBwwEAADSFs3rLbMSIEQG3N2zYoPbt26tHjx6SpA8++EDV1dXq379/wycEAABoZGdViLxeb8DtG2+8MeB2QkJCwyUCAABoYmdViJYtW9bYOQAAAFzDJ1UDAADrUYgAAID1KEQAAMB6FCIAAGC9ehei4uLixsgBAADgmnoXoksvvVTXXHONVq5cqe+//74xMgEAADSpeheiDz74QD179tSUKVPk8/k0btw4bdu2rTGyAQAANIl6F6Lk5GTNnTtXBw4c0LJly1RaWqq+ffvqiiuu0Ny5c3X48OHGyAkAANBozvmk6tDQUP3qV7/Sf/3Xf+mJJ57Qp59+qqlTp6p9+/a67bbbdPDgwYbMCQAA0GjOuRC99957mjBhgtq1a6e5c+dq6tSp+vTTT/XGG2/owIEDuv766xsyJwAAQKM5q6/u+Km5c+dq2bJl2rt3r6699lo9//zzuvbaa3XBBf/oVomJiVq8eLEuv/zyBg8LAADQGOpdiBYtWqQ777xTd9xxh3w+30lnOnTooKVLl/7scAAAAE2hXm+Z/fDDD7rlllt06623nrIMSVJYWJjGjBnzs8MBAAA0hXoVotDQUD355JM6fvx4Y+UBAABocvU+qbp///568803GyEKAACAO+p9DtHQoUM1Y8YM7dq1SykpKYqKigrYnpGR0WDhAAAAmkK9C9FvfvMbSf+42uxEHo+Ht9MAAECzU+9CVFtb2xg5AAAAXFPvQvT8889r1KhRCg8PD1hfXV2t3Nxc3XbbbQ0WDjifdZq+zu0I+nz2MLcjAEBQqPdJ1XfccYf8fn+d9UePHtUdd9zRIKEAAACaUr0LkTFGHo+nzvr9+/fL6/U2SCgAAICmdNZvmfXs2VMej0cej0f9+/dXaOj/v+vx48dVXFysIUOGNEpIAACAxnTWheiGG26QJBUVFWnw4MG68MILnW1hYWHq1KmTbrzxxgYPCAAA0NjOuhA98sgjkqROnTpp1KhRioiIaLRQAAAATaneV5n9+B1l1dXVKisrq3MZfocOHRomGQAAQBOpdyHat2+f7rzzTm3evDlg/Y8nW/PBjAAAoLmpdyG6/fbbFRoaqtdee03t2rU76RVnAAAAzUm9C1FRUZEKCwt1+eWXN0YeAACAJlfvQtS1a1d9/fXXjZEFAM4Jn/oN4Oeq9wczPvHEE5o2bZrefPNNffPNN6qoqAhYAAAAmpt6HyEaMGCAJKl///4B6zmpGgAANFf1LkQbN25sjBwAAACuqXchSk9Pb4wcAAAArqn3OUSS9Pbbb+vWW29Vnz59dODAAUnSihUrtGnTpgYNBwAA0BTqXYheeuklDR48WJGRkdqxY4eqqqokSUePHtWsWbMaPCAAAEBjq3cheuyxx/TMM89oyZIlatGihbO+T58+2rFjR4
OGAwAAaAr1LkR79+7V1VdfXWd9q1atdOTIkYbIBAAA0KTqXYjatWunTz75pM76TZs26ZJLLmmQUAAAAE2p3oVo3Lhxuu+++7R161Z5PB599dVXWrVqlaZOnaoJEyY0RkYAAIBGVe/L7qdNmya/369rrrlG33//va6++mqFh4dr6tSpmjhxYmNkBAAAaFT1LkSS9Pjjj2vmzJn66KOPVFtbq65du+rCCy9s6GwAAABN4pwKkSS1bNlSvXr1asgsAAAArjinD2YEAAA4n1CIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADruVqIcnJydNVVVyk6OlqxsbG64YYbtHfv3oAZY4yys7MVHx+vyMhI9evXT7t37w6Yqaqq0qRJk9S2bVtFRUUpIyND+/fvD5gpLy9XZmamvF6vvF6vMjMzdeTIkcbeRQAA0Ay4WogKCgp07733asuWLcrPz9cPP/ygQYMG6dtvv3Vm5syZo7lz52rhwoXavn27fD6fBg4cqKNHjzozWVlZWrt2rXJzc7Vp0yYdO3ZMw4cP1/Hjx52Z0aNHq6ioSHl5ecrLy1NRUZEyMzObdH8BAEBwOufvMmsIeXl5AbeXLVum2NhYFRYW6uqrr5YxRvPnz9fMmTM1YsQISdJzzz2nuLg4rV69WuPGjZPf79fSpUu1YsUKDRgwQJK0cuVKJSQkaMOGDRo8eLD27NmjvLw8bdmyRampqZKkJUuWKC0tTXv37lXnzp3rZKuqqlJVVZVzu6KiorFeBgAA4LKgOofI7/dLkmJiYiRJxcXFKi0t1aBBg5yZ8PBwpaena/PmzZKkwsJC1dTUBMzEx8crOTnZmXn33Xfl9XqdMiRJvXv3ltfrdWZOlJOT47y95vV6lZCQ0LA7CwAAgkbQFCJjjB544AH17dtXycnJkqTS0lJJUlxcXMBsXFycs620tFRhYWFq3br1aWdiY2PrPGdsbKwzc6IZM2bI7/c7S0lJyc/bQQAAELRcfcvspyZOnKgPP/xQmzZtqrPN4/EE3DbG1Fl3ohNnTjZ/uscJDw9XeHj42UQHAADNXFAcIZo0aZJeeeUVbdy4Ue3bt3fW+3w+SapzFKesrMw5auTz+VRdXa3y8vLTzhw6dKjO8x4+fLjO0ScAAGAfVwuRMUYTJ07UmjVr9MYbbygxMTFge2Jionw+n/Lz85111dXVKigoUJ8+fSRJKSkpatGiRcDMwYMHtWvXLmcmLS1Nfr9f27Ztc2a2bt0qv9/vzAAAAHu5+pbZvffeq9WrV+t///d/FR0d7RwJ8nq9ioyMlMfjUVZWlmbNmqWkpCQlJSVp1qxZatmypUaPHu3Mjh07VlOmTFGbNm0UExOjqVOnqlu3bs5VZ126dNGQIUN09913a/HixZKke+65R8OHDz/pFWYAAMAurhaiRYsWSZL69esXsH7ZsmW6/fbbJUnTpk1TZWWlJkyYoPLycqWmpmr9+vWKjo525ufNm6fQ0FCNHDlSlZWV6t+/v5YvX66QkBBnZtWqVZo8ebJzNVpGRoYWLlzYuDsIAACaBVcLkTHmjDMej0fZ2dnKzs4+5UxERIQWLFigBQsWnHImJiZGK1euPJeYAADgPBcUJ1UDAAC4iUIEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANYLdTsAANii0/R1bkfQ57OHuR0BCEocIQIAANajEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANZztRC99dZbuu666xQfHy+Px6OXX345YLsxRtnZ2YqPj1dkZKT69eun3bt3B8xUVVVp0qRJatu2raKiopSRkaH9+/cHzJSXlyszM1Ner1der1eZmZk6cuRII+8dAABoLlwtRN9++6169OihhQsXnnT7nDlzNHfuXC1cuFDbt2+Xz+fTwIEDdfToUWcmKytLa9euVW5urjZt2qRjx45p+PDhOn78uDMzevRoFRUVKS8vT3l5eSoqKlJmZmaj7x8AAGgeQt188qFDh2ro0KEn3WaM0fz58zVz5kyNGDFCkvTcc88pLi5Oq1ev1rhx4+T3+7V06VKtWLFCAwYMkCStXLlSCQkJ2rBhgwYPHqw9e/YoLy9PW7ZsUWpqqiRpyZIlSktL0969e9W5c+em2VkAABC0gvYcouLiYpWWlmrQoEHOuvDwcKWnp2vz5s2SpMLCQtXU1ATMxMfHKzk52Zl599135fV6nTIkSb1795bX63VmTqaqqkoVFRUBCwAAOD8FbSEqLS2VJMXFxQWsj4uLc7aVlpYqLCxMrVu3Pu1MbGxsncePjY11Zk4mJyfHOefI6/UqISHhZ+0PAAAIXkFbiH7k8XgCbhtj6qw70YkzJ5s/0+PMmDFDfr/fWUpKSuqZHAAANBdBW4h8Pp8k1TmKU1ZW5hw18vl8qq6uVnl5+WlnDh06VOfxDx8+XOfo00+Fh4erVatWAQsAADg/BW0hSkxMlM/nU35+vrOuurpaBQUF6tOnjyQpJSVFLVq0CJg5ePCgdu3a5cykpaXJ7/dr27ZtzszWrVvl9/udGQAAYDdXrzI7duyYPvnkE+d2cXGxioqKFBMTow4dOigrK0uzZs1SUlKSkpKSNGvWLLVs2VKjR4+WJHm9Xo0dO1ZTpkxRmzZtFBMTo6lTp6pbt27OVWddunTRkCFDdPfdd2vx4sWSpHvuuUfDhw/nCjMAOIlO09e5HUGfzx7mdgRYxtVC9N577+maa65xbj/wwAOSpDFjxmj58uWaNm2aKisrNWHCBJWXlys1NVXr169XdHS0c5958+YpNDRUI0eOVGVlpfr376/ly5crJCTEmVm1apUmT57sXI2WkZFxys8+AgAA9nG1EPXr10/GmFNu93g8ys7OVnZ29ilnIiIitGDBAi1YsOCUMzExMVq5cuXPiQoAAM5jQXsOEQAAQFOhEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD1XrzIDAADu47OnOEIEAABAIQIAAKAQAQAA61GIAACA9ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGC9ULcDAABQX52mr3M7giTp89nD3I6ABsIRIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9
ShEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPUoRAAAwHqhbgcAAOB81Wn6Orcj6PPZw9yO0CxwhAgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPUoRAAAwHoUIgAAYD0KEQAAsB6FCAAAWI9CBAAArEchAgAA1qMQAQAA61GIAACA9ShEAADAehQiAABgPasK0dNPP63ExERFREQoJSVFb7/9ttuRAABAELCmEL344ovKysrSzJkz9f777+vf/u3fNHToUH355ZduRwMAAC6zphDNnTtXY8eO1V133aUuXbpo/vz5SkhI0KJFi9yOBgAAXBbqdoCmUF1drcLCQk2fPj1g/aBBg7R58+aT3qeqqkpVVVXObb/fL0mqqKho8Hy1Vd81+GOeizPtWzDkPJvXn5xnrzn8O5eaR87z5d+51DxyBkNGqXnkPF/+nf/cxzXGnH7QWODAgQNGknnnnXcC1j/++OPmsssuO+l9HnnkESOJhYWFhYWF5TxYSkpKTtsVrDhC9COPxxNw2xhTZ92PZsyYoQceeMC5XVtbq7///e9q06bNKe9zLioqKpSQkKCSkhK1atWqwR63oZGzYTWHnM0ho0TOhtYccjaHjBI5g4UxRkePHlV8fPxp56woRG3btlVISIhKS0sD1peVlSkuLu6k9wkPD1d4eHjAun/6p39qrIhq1apVs/gPkZwNqznkbA4ZJXI2tOaQszlklMgZDLxe7xlnrDipOiwsTCkpKcrPzw9Yn5+frz59+riUCgAABAsrjhBJ0gMPPKDMzEz16tVLaWlpevbZZ/Xll19q/PjxbkcDAAAus6YQjRo1St98840effRRHTx4UMnJyfrLX/6ijh07uporPDxcjzzySJ2354INORtWc8jZHDJK5GxozSFnc8gokbO58RhzpuvQAAAAzm9WnEMEAABwOhQiAABgPQoRAACwHoUIAABYj0LksqefflqJiYmKiIhQSkqK3n77bdey5OTk6KqrrlJ0dLRiY2N1ww03aO/evQEzxhhlZ2crPj5ekZGR6tevn3bv3u1S4n/IycmRx+NRVlaWsy5Ych44cEC33nqr2rRpo5YtW+rKK69UYWFh0OT84Ycf9PDDDysxMVGRkZG65JJL9Oijj6q2ttbVjG+99Zauu+46xcfHy+Px6OWXXw7YfjaZqqqqNGnSJLVt21ZRUVHKyMjQ/v37myxnTU2NHnroIXXr1k1RUVGKj4/Xbbfdpq+++iqocp5o3Lhx8ng8mj9/flDm3LNnjzIyMuT1ehUdHa3evXvryy+/bLKcZ8p47NgxTZw4Ue3bt1dkZKS6dOlS50vEm+K1bKjf502RNWj8zK8Jw8+Qm5trWrRoYZYsWWI++ugjc99995moqCjzxRdfuJJn8ODBZtmyZWbXrl2mqKjIDBs2zHTo0MEcO3bMmZk9e7aJjo42L730ktm5c6cZNWqUadeunamoqHAl87Zt20ynTp1M9+7dzX333RdUOf/+97+bjh07mttvv91s3brVFBcXmw0bNphPPvkkaHI+9thjpk2bNua1114zxcXF5r//+7/NhRdeaObPn+9qxr/85S9m5syZ5qWXXjKSzNq1awO2n02m8ePHm4svvtjk5+ebHTt2mGuuucb06NHD/PDDD02S88iRI2bAgAHmxRdfNP/3f/9n3n33XZOammpSUlICHsPtnD+1du1a06NHDxMfH2/mzZsXdDk/+eQTExMTYx588EGzY8cO8+mnn5rXXnvNHDp0qMlyninjXXfdZf75n//ZbNy40RQXF5vFixebkJAQ8/LLLzdZRmMa7vd5U2QNFhQiF/3Lv/yLGT9+fMC6yy+/3EyfPt2lRIHKysqMJFNQUGCMMaa2ttb4fD4ze/ZsZ+b77783Xq/XPPPMM02e7+jRoyYpKcnk5+eb9PR0pxAFS86HHnrI9O3b95TbgyHnsGHDzJ133hmwbsSIEebWW28Nmown/tE5m0xHjhwxLVq0MLm5uc7MgQMHzAUXXGDy8vKaJOfJbNu2zUhy/qcnmHLu37/fXHzxxWbXrl2mY8eOAYUoWHKOGjXK+W/zZJo658kyXnHFFebRRx8NWPeLX/zCPPzww65k/NG5/D53K6tbeMvMJdXV1SosLNSgQYMC1g8aNEibN292KVUgv98vSYqJiZEkFRcXq7S0NCBzeHi40tPTXcl87733atiwYRowYEDA+mDJ+corr6hXr1769a9/rdjYWPXs2VNLliwJqpx9+/bV3/72N3388ceSpA8++ECbNm3StddeGzQZT3Q2mQoLC1VTUxMwEx8fr+TkZFd/vvx+vzwej/O9iMGSs7a2VpmZmXrwwQd1xRVX1NkeDDlra2u1bt06XXbZZRo8eLBiY2OVmpoa8JZVMOTs27evXnnlFR04cEDGGG3cuFEff/yxBg8e7GrGc/l9HgyvZ1OiELnk66+/1vHjx+t8uWxcXFydL6F1gzFGDzzwgPr27avk5GRJcnIFQ+bc3Fzt2LFDOTk5dbYFS87PPvtMixYtUlJSkl5//XWNHz9ekydP1vPPPx80OR966CHdfPPNuvzyy9WiRQv17NlTWVlZuvnmm4Mm44nOJlNpaanCwsLUunXrU840te+//17Tp0/X6NGjnS/QDJacTzzxhEJDQzV58uSTbg+GnGVlZTp27Jhmz56tIUOGaP369frVr36lESNGqKCgIGhyPvXUU+ratavat2+vsLAwDRkyRE8//bT69u3rWsZz/X0eDK9nU7LmqzuClcfjCbhtjKmzzg0TJ07Uhx9+qE2bNtXZ5nbmkpIS3XfffVq/fr0iIiJOOed2ztraWvXq1UuzZs2SJPXs2VO7d+/WokWLdNtttwVFzhdffFErV67U6tWrdcUVV6ioqEhZWVmKj4/XmDFjgiLjqZxLJrdy19TU6KabblJtba2efvrpM843Zc7CwkL96U9/0o4dO+r9nE2Z88cT/a+//nrdf//9kqQrr7xSmzdv1jPPPKP09PSgyPnUU09py5YteuWVV9SxY0e99dZbmjBhgtq1a1fnaHZTZWzo3+fB8PPfGDhC5JK2bdsqJCSkTssuKyur09ib2qRJk/TKK69o48aNat++vbPe5/NJkuuZCwsLVVZWppSUFIWGhio0NFQFBQV66qmnFBoa6mRxO2e7du3UtWvXgHVdunRxrogJhtfzwQcf1PTp03XTTTepW7duyszM1P333+8ceQuGjCc6m0w+n0/V1dUqLy8/5UxTqamp0ciRI1VcXKz8/Hzn6FCw5Hz77bdVVlamDh06OD9PX3zxhaZMmaJOnToFTc62bdsqNDT0jD9TbuasrKzUb3/7W82dO1fXXXedunfvrokTJ2rUqFH64x//6ErGn/P73O3Xs6lRiFwSFhamlJQU5efnB6zPz89Xnz59XMlkjNHEiRO1Zs0avfHGG0pMTAzYnpiYKJ/PF5C5urpaBQUFTZq5f//+2rlzp4qK
ipylV69euuWWW1RUVKRLLrkkKHL+67/+a53LXD/++GPnC4WD4fX87rvvdMEFgb8GQkJCnP8bD4aMJzqbTCkpKWrRokXAzMGDB7Vr164mzf1jGdq3b582bNigNm3aBGwPhpyZmZn68MMPA36e4uPj9eCDD+r1118PmpxhYWG66qqrTvsz5XbOmpoa1dTUnPZnqqkyNsTvc7dfzybX1Gdx4//78bL7pUuXmo8++shkZWWZqKgo8/nnn7uS5ze/+Y3xer3mzTffNAcPHnSW7777zpmZPXu28Xq9Zs2aNWbnzp3m5ptvdvWy+x/99CozY4Ij57Zt20xoaKh5/PHHzb59+8yqVatMy5YtzcqVK4Mm55gxY8zFF1/sXHa/Zs0a07ZtWzNt2jRXMx49etS8//775v333zeSzNy5c83777/vXJ11NpnGjx9v2rdvbzZs2GB27NhhfvnLXzb45cKny1lTU2MyMjJM+/btTVFRUcDPVFVVVdDkPJkTrzILlpxr1qwxLVq0MM8++6zZt2+fWbBggQkJCTFvv/12k+U8U8b09HRzxRVXmI0bN5rPPvvMLFu2zERERJinn366yTIa03C/z5sia7CgELnsP/7jP0zHjh1NWFiY+cUvfuFcEukGSSddli1b5szU1taaRx55xPh8PhMeHm6uvvpqs3PnTtcy/+jEQhQsOV999VWTnJxswsPDzeWXX26effbZgO1u56yoqDD33Xef6dChg4mIiDCXXHKJmTlzZsAfbDcybty48aT/LY4ZM+asM1VWVpqJEyeamJgYExkZaYYPH26+/PLLJstZXFx8yp+pjRs3Bk3OkzlZIQqWnEuXLjWXXnqpiYiIMD169Aj4fJ+myHmmjAcPHjS33367iY+PNxEREaZz587mySefNLW1tU2W0ZiG+33eFFmDhccYYxr+uBMAAEDzwTlEAADAehQiAABgPQoRAACwHoUIAABYj0IEAACsRyECAADWoxABAADrUYgAAID1KEQAAMB6FCIA1sjOztaVV17pdgwAQYhCBAAnqKmpcTsCgCZGIQLQLBhjNGfOHF1yySWKjIxUjx499D//8z/O9jfffFMej0d/+9vf1KtXL7Vs2VJ9+vTR3r17JUnLly/X73//e33wwQfyeDzyeDxavny5JMnj8eiZZ57R9ddfr6ioKD322GO69NJL9cc//jEgw65du3TBBRfo008/PWnG7du3a+DAgWrbtq28Xq/S09O1Y8eOxnlBADQovtwVQLMwc+ZMrVmzRvPnz1dSUpLeeustjR8/Xq+//rrS09P15ptv6pprrlFqaqqeeOIJXXTRRRo/fryOHz+ud955R5WVlfrd736nvLw8bdiwQZLk9XoVGRkpj8ej2NhY5eTkqF+/fgoJCdGqVau0atUq7d6928nwwAMPqLCwUAUFBSfN+MYbb+irr75SSkqKJOnJJ5/Ua6+9pn379ik6OrrxXyQA584AQJA7duyYiYiIMJs3bw5YP3bsWHPzzTcbY4zZuHGjkWQ2bNjgbF+3bp2RZCorK40xxjzyyCOmR48edR5fksnKygpY99VXX5mQkBCzdetWY4wx1dXV5qKLLjLLly8/69w//PCDiY6ONq+++upZ3weAO3jLDEDQ++ijj/T9999r4MCBuvDCC53l+eefr/P2Vffu3Z1/bteunSSprKzsjM/Rq1evgNvt2rXTsGHD9Oc//1mS9Nprr+n777/Xr3/961M+RllZmcaPH6/LLrtMXq9XXq9Xx44d05dffnnW+wrAHaFuBwCAM6mtrZUkrVu3ThdffHHAtvDw8IDbLVq0cP7Z4/EE3P90oqKi6qy76667lJmZqXnz5mnZsmUaNWqUWrZsecrHuP3223X48GHNnz9fHTt2VHh4uNLS0lRdXX3G5wfgLgoRgKDXtWtXhYeH68svv1R6evo5P05YWJiOHz9+1vPXXnutoqKitGjRIv31r3/VW2+9ddr5t99+W08//bSuvfZaSVJJSYm+/vrrc84LoOlQiAAEvejoaE2dOlX333+/amtr1bdvX1VUVGjz5s268MILNWbMmLN6nE6dOqm4uFhFRUVq3769oqOj6xxh+qmQkBDdfvvtmjFjhi699FKlpaWd9vEvvfRSrVixQr169VJFRYUefPBBRUZG1mtfAbiDc4gANAt/+MMf9O///u/KyclRly5dNHjwYL366qtKTEw868e48cYbNWTIEF1zzTW66KKL9MILL5zxPmPHjlV1dbXuvPPOM87++c9/Vnl5uXr27KnMzExNnjxZsbGxZ50PgHu47B4ATuOdd95Rv379tH//fsXFxbkdB0AjoRABwElUVVWppKRE99xzj9q1a6dVq1a5HQlAI+ItMwA4iRdeeEGdO3eW3+/XnDlz3I4DoJFxhAgAAFiPI0QAAMB6FCIAAGA9ChEAALAehQgAAFiPQgQAAKxHIQIAANajEAEAAOtRiAAAgPX+HydTxgFmlEpCAAAAAElFTkSuQmCC",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "labels, data = count_height()\n",
+ "draw_hist(data, labels)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "bonai_env",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.9"
+ },
+ "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/tools/bonai/transform_gt_to_csv.py b/tools/bonai/transform_gt_to_csv.py
new file mode 100644
index 00000000..ff79903c
--- /dev/null
+++ b/tools/bonai/transform_gt_to_csv.py
@@ -0,0 +1,53 @@
+import json
+import pandas as pd
+
+
+def save_csv(fileName, saveDict):
+ df = pd.DataFrame(saveDict)
+ df.to_csv(fileName, index=False, header=True)
+
+
+def process_points(points):
+    # Convert a flat coordinate list [x0, y0, x1, y1, ...] into a WKT polygon
+    # ring string "((x0 y0,x1 y1,...,x0 y0))". The ring is closed by appending
+    # the first vertex once at the end.
+    first_vertex = "{} {}".format(points[0], points[1])
+    result_str = ""
+    for i in range(len(points)):
+        if i % 2 == 0:  # x coordinate
+            result_str += str(points[i]) + " "
+        else:  # y coordinate
+            result_str += str(points[i]) + ","
+    result_str = result_str + first_vertex
+    result_str = "(({}))".format(result_str)
+    return result_str
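+
+
+# Example of the ring format produced above (hypothetical coordinates):
+#   process_points([0, 0, 10, 0, 10, 10]) -> "((0 0,10 0,10 10,0 0))"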
+
+
+gt_data_path = "./bonai_shanghai_xian_test_roof.json"
+json_file = open(gt_data_path, "r")
+content = json_file.read()
+json_content = json.loads(content)
+annotations = json_content["annotations"]
+images = json_content["images"]
+images_ids = []
+images_filenames = []
+results = []
+for item in images:
+ images_ids.append(item["id"])
+ images_filenames.append(item["file_name"])
+id_2_filename = dict(zip(images_ids, images_filenames))
+for image_id in images_ids:
+ building_index = 0
+ for ann in annotations:
+ if ann["area"] > 500 and ann["image_id"] == image_id:
+ points = ann["roof_mask"]
+ points = process_points(points)
+ result = dict(
+ ImageId=id_2_filename[image_id],
+ BuildingId=building_index,
+ PolygonWKT_Pix=f"POLYGON {points}",
+ Confidence=1,
+ )
+ results.append(result)
+ building_index += 1
+
+save_csv("./test.csv", results)
\ No newline at end of file
diff --git a/tools/coco_error_analysis_f1.py b/tools/coco_error_analysis_f1.py
new file mode 100644
index 00000000..9203e291
--- /dev/null
+++ b/tools/coco_error_analysis_f1.py
@@ -0,0 +1,233 @@
+import copy
+import os
+from argparse import ArgumentParser
+from multiprocessing import Pool
+
+import matplotlib.pyplot as plt
+import numpy as np
+from pycocotools.coco import COCO
+from pycocotools.cocoeval import COCOeval
+
+
+def makeplot(rs, ps, outDir, class_name, iou_type):
+ cs = np.vstack(
+ [
+ np.ones((2, 3)),
+ np.array([0.31, 0.51, 0.74]),
+ np.array([0.75, 0.31, 0.30]),
+ np.array([0.36, 0.90, 0.38]),
+ np.array([0.50, 0.39, 0.64]),
+ np.array([1, 0.6, 0]),
+ ]
+ )
+ areaNames = ["allarea", "small", "medium", "large"]
+ types = ["C75", "C50", "Loc", "Sim", "Oth", "BG", "FN"]
+ for i in range(len(areaNames)):
+ area_ps = ps[..., i, 0]
+        figure_title = iou_type + "-" + class_name + "-" + areaNames[i]
+ aps = [ps_.mean() for ps_ in area_ps]
+ ps_curve = [ps_.mean(axis=1) if ps_.ndim > 1 else ps_ for ps_ in area_ps]
+ ps_curve.insert(0, np.zeros(ps_curve[0].shape))
+ fig = plt.figure()
+ ax = plt.subplot(111)
+ # f = open("precision.txt", 'a')
+ # sps = str(ps_curve)
+ # f.write(sps)
+ for k in range(len(types)):
+ ax.plot(rs, ps_curve[k + 1], color=[0, 0, 0], linewidth=0.5)
+ ax.fill_between(
+ rs,
+ ps_curve[k],
+ ps_curve[k + 1],
+ color=cs[k],
+                label=f"[{aps[k]:.3f}]{types[k]}",
+ )
+ plt.xlabel("recall")
+ plt.ylabel("precision")
+ plt.xlim(0, 1.0)
+ plt.ylim(0, 1.0)
+        plt.title(figure_title)
+ plt.legend()
+ # plt.show()
+        fig.savefig(outDir + f"/{figure_title}.png")
+ plt.close(fig)
+
+
+# Calculate and plot F1-score curves (F1 = 2PR / (P + R)).
+def makef1plot(rs, ps, outDir, class_name, iou_type):
+ cs = np.vstack(
+ [
+ np.ones((2, 3)),
+ np.array([0.31, 0.51, 0.74]),
+ np.array([0.75, 0.31, 0.30]),
+ np.array([0.36, 0.90, 0.38]),
+ np.array([0.50, 0.39, 0.64]),
+ np.array([1, 0.6, 0]),
+ ]
+ )
+ areaNames = ["allarea", "small", "medium", "large"]
+ types = ["C75", "C50", "Loc", "Sim", "Oth", "BG", "FN"]
+ for i in range(len(areaNames)):
+ area_ps = ps[..., i, 0]
+        figure_title = iou_type + "-" + class_name + "-" + areaNames[i] + "-F1"
+ aps = [ps_.mean() for ps_ in area_ps]
+ ps_curve = [ps_.mean(axis=1) if ps_.ndim > 1 else ps_ for ps_ in area_ps]
+ ps_curve.insert(0, np.zeros(ps_curve[0].shape))
+ fig = plt.figure()
+ ax = plt.subplot(111)
+ for k in range(len(types)):
+            # Convert the precision curve to an F1 curve in place: F1 = 2PR / (P + R).
+            psarray = ps_curve[k + 1]
+            psarray[:] = 2 * rs * psarray / (rs + psarray + 1e-6)
+ ax.plot(rs, psarray, color=[0, 0, 0], linewidth=0.5)
+            # Record the best F1 on this curve and the recall where it occurs.
+            max_f1 = max(psarray)
+            max_f1_recall = rs[psarray.argmax()]
+            log_line = (
+                f"{areaNames[i]} {types[k]}: "
+                f"max F1: {max_f1}, recall: {max_f1_recall}\n"
+            )
+            with open(outDir + "/maxF1score.txt", "a") as f:
+                f.write(log_line)
+ ps_curve[k + 1] = psarray
+ ax.fill_between(
+ rs,
+ ps_curve[k],
+ ps_curve[k + 1],
+ color=cs[k],
+                label=f"[{aps[k]:.3f}]{types[k]}",
+ )
+ plt.xlabel("recall")
+ plt.ylabel("F1")
+ plt.xlim(0, 1.0)
+ plt.ylim(0, 1.0)
+        plt.title(figure_title)
+ plt.legend()
+ # plt.show()
+        fig.savefig(outDir + f"/{figure_title}.png")
+ plt.close(fig)
+
+
+def analyze_individual_category(k, cocoDt, cocoGt, catId, iou_type):
+ nm = cocoGt.loadCats(catId)[0]
+ print(f'--------------analyzing {k + 1}-{nm["name"]}---------------')
+ ps_ = {}
+ dt = copy.deepcopy(cocoDt)
+ imgIds = cocoGt.getImgIds()
+ dt_anns = dt.dataset["annotations"]
+ select_dt_anns = []
+ for ann in dt_anns:
+ if ann["category_id"] == catId:
+ select_dt_anns.append(ann)
+ dt.dataset["annotations"] = select_dt_anns
+ dt.createIndex()
+ # compute precision but ignore superclass confusion
+ gt = copy.deepcopy(cocoGt)
+ child_catIds = gt.getCatIds(supNms=[nm["supercategory"]])
+ for idx, ann in enumerate(gt.dataset["annotations"]):
+ if ann["category_id"] in child_catIds and ann["category_id"] != catId:
+ gt.dataset["annotations"][idx]["ignore"] = 1
+ gt.dataset["annotations"][idx]["iscrowd"] = 1
+ gt.dataset["annotations"][idx]["category_id"] = catId
+ cocoEval = COCOeval(gt, copy.deepcopy(dt), iou_type)
+ cocoEval.params.imgIds = imgIds
+ cocoEval.params.maxDets = [1500]
+ cocoEval.params.iouThrs = [0.3]
+ cocoEval.params.useCats = 1
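+    # A single loose IoU threshold (0.3) and a large per-image detection cap,
+    # presumably chosen for dense building instances in aerial tiles.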
+ cocoEval.evaluate()
+ cocoEval.accumulate()
+ ps_supercategory = cocoEval.eval["precision"][0, :, k, :, :]
+ ps_["ps_supercategory"] = ps_supercategory
+ # compute precision but ignore any class confusion
+ gt = copy.deepcopy(cocoGt)
+ for idx, ann in enumerate(gt.dataset["annotations"]):
+ if ann["category_id"] != catId:
+ gt.dataset["annotations"][idx]["ignore"] = 1
+ gt.dataset["annotations"][idx]["iscrowd"] = 1
+ gt.dataset["annotations"][idx]["category_id"] = catId
+ cocoEval = COCOeval(gt, copy.deepcopy(dt), iou_type)
+ cocoEval.params.imgIds = imgIds
+ cocoEval.params.maxDets = [1500]
+ cocoEval.params.iouThrs = [0.3]
+ cocoEval.params.useCats = 1
+ cocoEval.evaluate()
+ cocoEval.accumulate()
+ ps_allcategory = cocoEval.eval["precision"][0, :, k, :, :]
+ ps_["ps_allcategory"] = ps_allcategory
+ return k, ps_
+
+
+def analyze_results(res_file, ann_file, res_types, out_dir):
+ for res_type in res_types:
+ assert res_type in ["bbox", "segm"]
+
+ directory = os.path.dirname(out_dir + "/")
+ if not os.path.exists(directory):
+ print(f"-------------create {out_dir}-----------------")
+ os.makedirs(directory)
+
+ cocoGt = COCO(ann_file)
+ cocoDt = cocoGt.loadRes(res_file)
+ imgIds = cocoGt.getImgIds()
+ for res_type in res_types:
+ res_out_dir = out_dir + "/" + res_type + "/"
+ res_directory = os.path.dirname(res_out_dir)
+ if not os.path.exists(res_directory):
+ print(f"-------------create {res_out_dir}-----------------")
+ os.makedirs(res_directory)
+ iou_type = res_type
+ cocoEval = COCOeval(copy.deepcopy(cocoGt), copy.deepcopy(cocoDt), iou_type)
+ cocoEval.params.imgIds = imgIds
+ cocoEval.params.iouThrs = [0.75, 0.5, 0.3]
+ cocoEval.params.maxDets = [1500]
+ cocoEval.evaluate()
+ cocoEval.accumulate()
+ ps = cocoEval.eval["precision"]
+ ps = np.vstack([ps, np.zeros((4, *ps.shape[1:]))])
+ catIds = cocoGt.getCatIds()
+ recThrs = cocoEval.params.recThrs
+        # Analyze each category in parallel; each worker deep-copies the GT and
+        # detection sets, so the pool size is a speed/memory tradeoff.
+        with Pool(processes=48) as pool:
+            args = [(k, cocoDt, cocoGt, catId, iou_type) for k, catId in enumerate(catIds)]
+            per_category_results = pool.starmap(analyze_individual_category, args)
+ for k, catId in enumerate(catIds):
+ nm = cocoGt.loadCats(catId)[0]
+ print(f'--------------saving {k + 1}-{nm["name"]}---------------')
+            analyze_result = per_category_results[k]
+ assert k == analyze_result[0]
+ ps_supercategory = analyze_result[1]["ps_supercategory"]
+ ps_allcategory = analyze_result[1]["ps_allcategory"]
+ # compute precision but ignore superclass confusion
+ ps[3, :, k, :, :] = ps_supercategory
+ # compute precision but ignore any class confusion
+ ps[4, :, k, :, :] = ps_allcategory
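+            # Below, ps[5] treats any matched detection as correct once
+            # background false positives are removed, and ps[6] forces perfect
+            # precision, so the residual gap to 1.0 is false negatives.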
+            # fill in background and false negative errors and plot
+            ps[ps == -1] = 0
+            ps[5, :, k, :, :] = ps[4, :, k, :, :] > 0
+            ps[6, :, k, :, :] = 1.0
+            makeplot(recThrs, ps[:, :, k], res_out_dir, nm["name"], iou_type)
+ makeplot(recThrs, ps, res_out_dir, "allclass", iou_type)
+ makef1plot(recThrs, ps, res_out_dir, "allclass", iou_type)
+ """f = open("precision.txt", 'a')
+ sps=str(ps)
+ f.write(sps)"""
+
+
+def main():
+ parser = ArgumentParser(description="COCO Error Analysis Tool")
+ parser.add_argument("result", help="result file (json format) path")
+ parser.add_argument("out_dir", help="dir to save analyze result images")
+ parser.add_argument(
+ "--ann", default="data/coco/annotations/instances_val2017.json", help="annotation file path"
+ )
+ parser.add_argument("--types", type=str, nargs="+", default=["bbox"], help="result types")
+ args = parser.parse_args()
+ analyze_results(args.result, args.ann, args.types, out_dir=args.out_dir)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/tools/dist_test.sh b/tools/dist_test.sh
index dea131b4..63faaa99 100755
--- a/tools/dist_test.sh
+++ b/tools/dist_test.sh
@@ -1,13 +1,23 @@
#!/usr/bin/env bash
CONFIG=$1
-CHECKPOINT=$2
-GPUS=$3
+TIME=$2
+
+GPUS=1
+
+JOB_DIR="work_dirs/${CONFIG}[${TIME}]"
+CONFIG_FILE="${JOB_DIR}/${CONFIG}.py"
+CHECKPOINT="${JOB_DIR}/latest.pth"
+PKL_FILE="${JOB_DIR}/result.pkl"
+CITY="bonai"
+
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
-PORT=${PORT:-29500}
+PORT=${PORT:-29705}
+
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
+
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m torch.distributed.launch \
--nnodes=$NNODES \
@@ -15,8 +25,17 @@ python -m torch.distributed.launch \
--master_addr=$MASTER_ADDR \
--nproc_per_node=$GPUS \
--master_port=$PORT \
- $(dirname "$0")/test.py \
- $CONFIG \
- $CHECKPOINT \
+ $(dirname "$0")/bonai/bonai_test.py \
+ ${CONFIG_FILE} \
+ ${CHECKPOINT} \
+ --out ${PKL_FILE} \
+ --city ${CITY} \
--launcher pytorch \
- ${@:4}
+ ${@:3}
+
+python $(dirname "$0")/bonai/bonai_evaluation.py \
+ ${PKL_FILE} \
+ ${JOB_DIR} \
+ --city ${CITY}
+
+# ./tools/dist_test.sh loft_foahfm_r50_fpn_2x_bonai_ssl <TIME>
\ No newline at end of file
diff --git a/tools/dist_train.sh b/tools/dist_train.sh
index aa71bf4a..6a99169d 100755
--- a/tools/dist_train.sh
+++ b/tools/dist_train.sh
@@ -1,12 +1,21 @@
#!/usr/bin/env bash
-CONFIG=$1
-GPUS=$2
+MODEL=$1
+CONFIG=$2
+
+GPUS=4
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
-PORT=${PORT:-29500}
+PORT=${PORT:-29502}
+
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
+TIME=$(date "+%Y%m%d-%H%M%S")
+
+JOB_NAME=${CONFIG}[${TIME}]
+CONFIG_FILE="configs/${MODEL}/${CONFIG}.py"
+WORK_DIR="work_dirs/${JOB_NAME}"
+
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m torch.distributed.launch \
--nnodes=$NNODES \
@@ -15,6 +24,10 @@ python -m torch.distributed.launch \
--nproc_per_node=$GPUS \
--master_port=$PORT \
$(dirname "$0")/train.py \
- $CONFIG \
+ --config ${CONFIG_FILE} \
+ --work-dir=${WORK_DIR} \
--seed 0 \
- --launcher pytorch ${@:3}
+    --launcher pytorch ${@:3} \
+ --no-validate
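+# Validation is skipped during training; evaluate afterwards with
+# ./tools/dist_test.sh on the resulting work dir.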
+
+# bash ./tools/dist_train.sh loft_foahfm_ssl loft_foahfm_r50_fpn_2x_bonai_ssl
\ No newline at end of file
diff --git a/tools/slurm_test.sh b/tools/slurm_test.sh
index 6dd67e57..9c6b9274 100755
--- a/tools/slurm_test.sh
+++ b/tools/slurm_test.sh
@@ -2,23 +2,49 @@
set -x
-PARTITION=$1
-JOB_NAME=$2
-CONFIG=$3
-CHECKPOINT=$4
-GPUS=${GPUS:-8}
-GPUS_PER_NODE=${GPUS_PER_NODE:-8}
-CPUS_PER_TASK=${CPUS_PER_TASK:-5}
-PY_ARGS=${@:5}
-SRUN_ARGS=${SRUN_ARGS:-""}
+GPUS=1
+CPUS_PER_TASK=5
+GPUS_PER_NODE=$GPUS
+SRUN_ARGS=""
+# SRUN_ARGS="--debug"
+PARTITION="PARTITION"
+
+
+CONFIG=$1
+TIME=$2
+PY_ARGS=${@:3:$#-3}
+
+EVAL_TYPE="segm"
+EPOCH_NAME="latest"
+
+JOB_NAME="${CONFIG}[${TIME}]"
+TEST_JOB_NAME="${JOB_NAME}_test"
+EVALUATE_JOB_NAME="${JOB_NAME}_evaluate"
+JOB_DIR="work_dirs/${JOB_NAME}"
+CONFIG_FILE="${JOB_DIR}/${CONFIG}.py"
+CHECKPOINT="${JOB_DIR}/${EPOCH_NAME}.pth"
+PKL_FILE="${JOB_DIR}/result.pkl"
+CITY="bonai"
+TEST_PY_ARGS="$PY_ARGS --eval $EVAL_TYPE --city $CITY --out $PKL_FILE"
+EVAL_PY_ARGS="${PKL_FILE} ${JOB_DIR} --city $CITY"
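+# Stage 1 (below) runs GPU inference and dumps results to PKL_FILE; stage 2
+# runs the CPU-only BONAI evaluation on those pickled results.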
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
srun -p ${PARTITION} \
- --job-name=${JOB_NAME} \
+ --job-name=${TEST_JOB_NAME} \
--gres=gpu:${GPUS_PER_NODE} \
--ntasks=${GPUS} \
--ntasks-per-node=${GPUS_PER_NODE} \
--cpus-per-task=${CPUS_PER_TASK} \
--kill-on-bad-exit=1 \
${SRUN_ARGS} \
- python -u tools/test.py ${CONFIG} ${CHECKPOINT} --launcher="slurm" ${PY_ARGS}
+ python -u tools/bonai/bonai_test.py ${CONFIG_FILE} ${CHECKPOINT} --launcher="slurm" ${TEST_PY_ARGS}
+
+srun -p ${PARTITION} \
+ --job-name=${EVALUATE_JOB_NAME} \
+ --ntasks=1 \
+ --cpus-per-task=${CPUS_PER_TASK} \
+ --kill-on-bad-exit=1 \
+ ${SRUN_ARGS} \
+ python -u tools/bonai/bonai_evaluation.py ${EVAL_PY_ARGS}
+# ==================== The command to call this shell script ====================
+# ./tools/slurm_test.sh loft_foahfm_r50_fpn_2x_bonai_ssl <TIME>
diff --git a/tools/slurm_train.sh b/tools/slurm_train.sh
index b3feb3d9..d04a97c9 100755
--- a/tools/slurm_train.sh
+++ b/tools/slurm_train.sh
@@ -2,15 +2,25 @@
set -x
-PARTITION=$1
-JOB_NAME=$2
-CONFIG=$3
-WORK_DIR=$4
-GPUS=${GPUS:-8}
-GPUS_PER_NODE=${GPUS_PER_NODE:-8}
-CPUS_PER_TASK=${CPUS_PER_TASK:-5}
-SRUN_ARGS=${SRUN_ARGS:-""}
-PY_ARGS=${@:5}
+# GPUS=1
+GPUS=2
+# GPUS=4
+CPUS_PER_TASK=5
+GPUS_PER_NODE=$GPUS
+SRUN_ARGS=""
+# SRUN_ARGS="--debug"
+PARTITION="PARTITION"
+
+
+MODEL=$1
+CONFIG=$2
+PY_ARGS=${@:3:$#-3}
+
+TIME=$(date "+%Y%m%d-%H%M%S")
+
+JOB_NAME=${CONFIG}[${TIME}]
+JOB_DIR="work_dirs/${JOB_NAME}"
+CONFIG_FILE="configs/${MODEL}/${CONFIG}.py"
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
srun -p ${PARTITION} \
@@ -21,4 +31,7 @@ srun -p ${PARTITION} \
--cpus-per-task=${CPUS_PER_TASK} \
--kill-on-bad-exit=1 \
${SRUN_ARGS} \
- python -u tools/train.py ${CONFIG} --work-dir=${WORK_DIR} --launcher="slurm" ${PY_ARGS}
+ python -u tools/train.py --config=${CONFIG_FILE} --work-dir=${JOB_DIR} --launcher="slurm" ${PY_ARGS} --no-validate
+
+# ==================== The command to call this shell script ====================
+# ./tools/slurm_train.sh loft_foahfm loft_foahfm_r50_fpn_2x_bonai
\ No newline at end of file
diff --git a/tools/test.py b/tools/test.py
index 5051c2f1..6a6810c1 100644
--- a/tools/test.py
+++ b/tools/test.py
@@ -9,111 +9,120 @@
import torch
from mmcv import Config, DictAction
from mmcv.cnn import fuse_conv_bn
-from mmcv.runner import (get_dist_info, init_dist, load_checkpoint,
- wrap_fp16_model)
+from mmcv.runner import get_dist_info, init_dist, load_checkpoint, wrap_fp16_model
from mmdet.apis import multi_gpu_test, single_gpu_test
-from mmdet.datasets import (build_dataloader, build_dataset,
- replace_ImageToTensor)
+from mmdet.datasets import build_dataloader, build_dataset, replace_ImageToTensor
from mmdet.models import build_detector
-from mmdet.utils import (build_ddp, build_dp, compat_cfg, get_device,
- replace_cfg_vals, rfnext_init_model,
- setup_multi_processes, update_data_root)
+from mmdet.utils import (
+ build_ddp,
+ build_dp,
+ compat_cfg,
+ get_device,
+ replace_cfg_vals,
+ rfnext_init_model,
+ setup_multi_processes,
+ update_data_root,
+)
def parse_args():
- parser = argparse.ArgumentParser(
- description='MMDet test (and eval) a model')
- parser.add_argument('config', help='test config file path')
- parser.add_argument('checkpoint', help='checkpoint file')
+ parser = argparse.ArgumentParser(description="MMDet test (and eval) a model")
+ parser.add_argument("config", help="test config file path")
+ parser.add_argument("checkpoint", help="checkpoint file")
parser.add_argument(
- '--work-dir',
- help='the directory to save the file containing evaluation metrics')
- parser.add_argument('--out', help='output result file in pickle format')
+ "--work-dir", help="the directory to save the file containing evaluation metrics"
+ )
+ parser.add_argument("--out", help="output result file in pickle format")
parser.add_argument(
- '--fuse-conv-bn',
- action='store_true',
- help='Whether to fuse conv and bn, this will slightly increase'
- 'the inference speed')
+ "--fuse-conv-bn",
+ action="store_true",
+ help="Whether to fuse conv and bn, this will slightly increase" "the inference speed",
+ )
parser.add_argument(
- '--gpu-ids',
+ "--gpu-ids",
type=int,
- nargs='+',
- help='(Deprecated, please use --gpu-id) ids of gpus to use '
- '(only applicable to non-distributed training)')
+ nargs="+",
+ help="(Deprecated, please use --gpu-id) ids of gpus to use "
+ "(only applicable to non-distributed training)",
+ )
parser.add_argument(
- '--gpu-id',
+ "--gpu-id",
type=int,
default=0,
- help='id of gpu to use '
- '(only applicable to non-distributed testing)')
+ help="id of gpu to use " "(only applicable to non-distributed testing)",
+ )
parser.add_argument(
- '--format-only',
- action='store_true',
- help='Format the output results without perform evaluation. It is'
- 'useful when you want to format the result to a specific format and '
- 'submit it to the test server')
+ "--format-only",
+ action="store_true",
+ help="Format the output results without perform evaluation. It is"
+ "useful when you want to format the result to a specific format and "
+ "submit it to the test server",
+ )
parser.add_argument(
- '--eval',
+ "--eval",
type=str,
- nargs='+',
+ nargs="+",
help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
- ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
- parser.add_argument('--show', action='store_true', help='show results')
+ ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC',
+ )
+ parser.add_argument("--show", action="store_true", help="show results")
+ parser.add_argument("--show-dir", help="directory where painted images will be saved")
parser.add_argument(
- '--show-dir', help='directory where painted images will be saved')
+ "--show-score-thr", type=float, default=0.3, help="score threshold (default: 0.3)"
+ )
parser.add_argument(
- '--show-score-thr',
- type=float,
- default=0.3,
- help='score threshold (default: 0.3)')
+ "--gpu-collect", action="store_true", help="whether to use gpu to collect results."
+ )
parser.add_argument(
- '--gpu-collect',
- action='store_true',
- help='whether to use gpu to collect results.')
+ "--tmpdir",
+ help="tmp directory used for collecting results from multiple "
+ "workers, available when gpu-collect is not specified",
+ )
parser.add_argument(
- '--tmpdir',
- help='tmp directory used for collecting results from multiple '
- 'workers, available when gpu-collect is not specified')
- parser.add_argument(
- '--cfg-options',
- nargs='+',
+ "--cfg-options",
+ nargs="+",
action=DictAction,
- help='override some settings in the used config, the key-value pair '
- 'in xxx=yyy format will be merged into config file. If the value to '
+ help="override some settings in the used config, the key-value pair "
+ "in xxx=yyy format will be merged into config file. If the value to "
'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
- 'Note that the quotation marks are necessary and that no white space '
- 'is allowed.')
+ "Note that the quotation marks are necessary and that no white space "
+ "is allowed.",
+ )
parser.add_argument(
- '--options',
- nargs='+',
+ "--options",
+ nargs="+",
action=DictAction,
- help='custom options for evaluation, the key-value pair in xxx=yyy '
- 'format will be kwargs for dataset.evaluate() function (deprecate), '
- 'change to --eval-options instead.')
+ help="custom options for evaluation, the key-value pair in xxx=yyy "
+ "format will be kwargs for dataset.evaluate() function (deprecate), "
+ "change to --eval-options instead.",
+ )
parser.add_argument(
- '--eval-options',
- nargs='+',
+ "--eval-options",
+ nargs="+",
action=DictAction,
- help='custom options for evaluation, the key-value pair in xxx=yyy '
- 'format will be kwargs for dataset.evaluate() function')
+ help="custom options for evaluation, the key-value pair in xxx=yyy "
+ "format will be kwargs for dataset.evaluate() function",
+ )
parser.add_argument(
- '--launcher',
- choices=['none', 'pytorch', 'slurm', 'mpi'],
- default='none',
- help='job launcher')
- parser.add_argument('--local_rank', type=int, default=0)
+ "--launcher",
+ choices=["none", "pytorch", "slurm", "mpi"],
+ default="none",
+ help="job launcher",
+ )
+ parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()
- if 'LOCAL_RANK' not in os.environ:
- os.environ['LOCAL_RANK'] = str(args.local_rank)
+ if "LOCAL_RANK" not in os.environ:
+ os.environ["LOCAL_RANK"] = str(args.local_rank)
if args.options and args.eval_options:
raise ValueError(
- '--options and --eval-options cannot be both '
- 'specified, --options is deprecated in favor of --eval-options')
+ "--options and --eval-options cannot be both "
+ "specified, --options is deprecated in favor of --eval-options"
+ )
if args.options:
- warnings.warn('--options is deprecated in favor of --eval-options')
+ warnings.warn("--options is deprecated in favor of --eval-options")
args.eval_options = args.options
return args
@@ -121,17 +130,17 @@ def parse_args():
def main():
args = parse_args()
- assert args.out or args.eval or args.format_only or args.show \
- or args.show_dir, \
- ('Please specify at least one operation (save/eval/format/show the '
- 'results / save the results) with the argument "--out", "--eval"'
- ', "--format-only", "--show" or "--show-dir"')
+ assert args.out or args.eval or args.format_only or args.show or args.show_dir, (
+ "Please specify at least one operation (save/eval/format/show the "
+ 'results / save the results) with the argument "--out", "--eval"'
+ ', "--format-only", "--show" or "--show-dir"'
+ )
if args.eval and args.format_only:
- raise ValueError('--eval and --format_only cannot be both specified')
+ raise ValueError("--eval and --format_only cannot be both specified")
- if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
- raise ValueError('The output file must be a pkl file.')
+ if args.out is not None and not args.out.endswith((".pkl", ".pickle")):
+ raise ValueError("The output file must be a pkl file.")
cfg = Config.fromfile(args.config)
@@ -150,68 +159,67 @@ def main():
setup_multi_processes(cfg)
# set cudnn_benchmark
- if cfg.get('cudnn_benchmark', False):
+ if cfg.get("cudnn_benchmark", False):
torch.backends.cudnn.benchmark = True
- if 'pretrained' in cfg.model:
+ if "pretrained" in cfg.model:
cfg.model.pretrained = None
- elif 'init_cfg' in cfg.model.backbone:
+ elif "init_cfg" in cfg.model.backbone:
cfg.model.backbone.init_cfg = None
- if cfg.model.get('neck'):
+ if cfg.model.get("neck"):
if isinstance(cfg.model.neck, list):
for neck_cfg in cfg.model.neck:
- if neck_cfg.get('rfp_backbone'):
- if neck_cfg.rfp_backbone.get('pretrained'):
+ if neck_cfg.get("rfp_backbone"):
+ if neck_cfg.rfp_backbone.get("pretrained"):
neck_cfg.rfp_backbone.pretrained = None
- elif cfg.model.neck.get('rfp_backbone'):
- if cfg.model.neck.rfp_backbone.get('pretrained'):
+ elif cfg.model.neck.get("rfp_backbone"):
+ if cfg.model.neck.rfp_backbone.get("pretrained"):
cfg.model.neck.rfp_backbone.pretrained = None
if args.gpu_ids is not None:
cfg.gpu_ids = args.gpu_ids[0:1]
- warnings.warn('`--gpu-ids` is deprecated, please use `--gpu-id`. '
- 'Because we only support single GPU mode in '
- 'non-distributed testing. Use the first GPU '
- 'in `gpu_ids` now.')
+ warnings.warn(
+ "`--gpu-ids` is deprecated, please use `--gpu-id`. "
+ "Because we only support single GPU mode in "
+ "non-distributed testing. Use the first GPU "
+ "in `gpu_ids` now."
+ )
else:
cfg.gpu_ids = [args.gpu_id]
cfg.device = get_device()
# init distributed env first, since logger depends on the dist info.
- if args.launcher == 'none':
+ if args.launcher == "none":
distributed = False
else:
distributed = True
init_dist(args.launcher, **cfg.dist_params)
test_dataloader_default_args = dict(
- samples_per_gpu=1, workers_per_gpu=2, dist=distributed, shuffle=False)
+ samples_per_gpu=1, workers_per_gpu=2, dist=distributed, shuffle=False
+ )
# in case the test dataset is concatenated
if isinstance(cfg.data.test, dict):
cfg.data.test.test_mode = True
- if cfg.data.test_dataloader.get('samples_per_gpu', 1) > 1:
+ if cfg.data.test_dataloader.get("samples_per_gpu", 1) > 1:
# Replace 'ImageToTensor' to 'DefaultFormatBundle'
- cfg.data.test.pipeline = replace_ImageToTensor(
- cfg.data.test.pipeline)
+ cfg.data.test.pipeline = replace_ImageToTensor(cfg.data.test.pipeline)
elif isinstance(cfg.data.test, list):
for ds_cfg in cfg.data.test:
ds_cfg.test_mode = True
- if cfg.data.test_dataloader.get('samples_per_gpu', 1) > 1:
+ if cfg.data.test_dataloader.get("samples_per_gpu", 1) > 1:
for ds_cfg in cfg.data.test:
ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline)
- test_loader_cfg = {
- **test_dataloader_default_args,
- **cfg.data.get('test_dataloader', {})
- }
+ test_loader_cfg = {**test_dataloader_default_args, **cfg.data.get("test_dataloader", {})}
rank, _ = get_dist_info()
# allows not to create
if args.work_dir is not None and rank == 0:
mmcv.mkdir_or_exist(osp.abspath(args.work_dir))
- timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
- json_file = osp.join(args.work_dir, f'eval_{timestamp}.json')
+ timestamp = time.strftime("%Y%m%d_%H%M%S", time.localtime())
+ json_file = osp.join(args.work_dir, f"eval_{timestamp}.json")
# build the dataloader
dataset = build_dataset(cfg.data.test)
@@ -219,59 +227,64 @@ def main():
# build the model and load checkpoint
cfg.model.train_cfg = None
- model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
+ model = build_detector(cfg.model, test_cfg=cfg.get("test_cfg"))
# init rfnext if 'RFSearchHook' is defined in cfg
rfnext_init_model(model, cfg=cfg)
- fp16_cfg = cfg.get('fp16', None)
- if fp16_cfg is None and cfg.get('device', None) == 'npu':
- fp16_cfg = dict(loss_scale='dynamic')
+ fp16_cfg = cfg.get("fp16", None)
+ if fp16_cfg is None and cfg.get("device", None) == "npu":
+ fp16_cfg = dict(loss_scale="dynamic")
if fp16_cfg is not None:
wrap_fp16_model(model)
- checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
+ checkpoint = load_checkpoint(model, args.checkpoint, map_location="cpu")
if args.fuse_conv_bn:
model = fuse_conv_bn(model)
# old versions did not save class info in checkpoints, this walkaround is
# for backward compatibility
- if 'CLASSES' in checkpoint.get('meta', {}):
- model.CLASSES = checkpoint['meta']['CLASSES']
+ if "CLASSES" in checkpoint.get("meta", {}):
+ model.CLASSES = checkpoint["meta"]["CLASSES"]
else:
model.CLASSES = dataset.CLASSES
if not distributed:
model = build_dp(model, cfg.device, device_ids=cfg.gpu_ids)
- outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
- args.show_score_thr)
+ outputs = single_gpu_test(model, data_loader, args.show, args.show_dir, args.show_score_thr)
else:
model = build_ddp(
- model,
- cfg.device,
- device_ids=[int(os.environ['LOCAL_RANK'])],
- broadcast_buffers=False)
+ model, cfg.device, device_ids=[int(os.environ["LOCAL_RANK"])], broadcast_buffers=False
+ )
# In multi_gpu_test, if tmpdir is None, some tesnors
# will init on cuda by default, and no device choice supported.
# Init a tmpdir to avoid error on npu here.
- if cfg.device == 'npu' and args.tmpdir is None:
- args.tmpdir = './npu_tmpdir'
+ if cfg.device == "npu" and args.tmpdir is None:
+ args.tmpdir = "./npu_tmpdir"
outputs = multi_gpu_test(
- model, data_loader, args.tmpdir, args.gpu_collect
- or cfg.evaluation.get('gpu_collect', False))
+ model,
+ data_loader,
+ args.tmpdir,
+ args.gpu_collect or cfg.evaluation.get("gpu_collect", False),
+ )
rank, _ = get_dist_info()
if rank == 0:
if args.out:
- print(f'\nwriting results to {args.out}')
+ print(f"\nwriting results to {args.out}")
mmcv.dump(outputs, args.out)
kwargs = {} if args.eval_options is None else args.eval_options
if args.format_only:
dataset.format_results(outputs, **kwargs)
if args.eval:
- eval_kwargs = cfg.get('evaluation', {}).copy()
+ eval_kwargs = cfg.get("evaluation", {}).copy()
# hard-code way to remove EvalHook args
for key in [
- 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
- 'rule', 'dynamic_intervals'
+ "interval",
+ "tmpdir",
+ "start",
+ "gpu_collect",
+ "save_best",
+ "rule",
+ "dynamic_intervals",
]:
eval_kwargs.pop(key, None)
eval_kwargs.update(dict(metric=args.eval, **kwargs))
@@ -282,5 +295,5 @@ def main():
mmcv.dump(metric_dict, json_file)
-if __name__ == '__main__':
+if __name__ == "__main__":
main()
diff --git a/tools/train.py b/tools/train.py
index 27aa818e..3b202e6f 100644
--- a/tools/train.py
+++ b/tools/train.py
@@ -1,4 +1,5 @@
# Copyright (c) OpenMMLab. All rights reserved.
+
import argparse
import copy
import os
@@ -17,95 +18,121 @@
from mmdet.apis import init_random_seed, set_random_seed, train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
-from mmdet.utils import (collect_env, get_device, get_root_logger,
- replace_cfg_vals, rfnext_init_model,
- setup_multi_processes, update_data_root)
+from mmdet.utils import (
+ collect_env,
+ get_device,
+ get_root_logger,
+ replace_cfg_vals,
+ rfnext_init_model,
+ setup_multi_processes,
+ update_data_root,
+)
+
+
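+# Interpret the literal string "None" from the command line as Python None, so
+# wrapper scripts can always pass a --resume-from argument.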
+def none_or_str(value):
+ if value == "None":
+ return None
+ return value
def parse_args():
- parser = argparse.ArgumentParser(description='Train a detector')
- parser.add_argument('config', help='train config file path')
- parser.add_argument('--work-dir', help='the dir to save logs and models')
+ parser = argparse.ArgumentParser(description="Train a detector")
+    parser.add_argument(
+        "--config",
+        required=True,
+        help="train config file path",
+    )
+ parser.add_argument("--work-dir", help="the dir to save logs and models")
parser.add_argument(
- '--resume-from', help='the checkpoint file to resume from')
+ "--resume-from", type=none_or_str, default=None, help="the checkpoint file to resume from"
+ )
parser.add_argument(
- '--auto-resume',
- action='store_true',
- help='resume from the latest checkpoint automatically')
+ "--auto-resume",
+ action="store_true",
+ help="resume from the latest checkpoint automatically",
+ )
parser.add_argument(
- '--no-validate',
- action='store_true',
- help='whether not to evaluate the checkpoint during training')
+ "--no-validate",
+ action="store_true",
+ help="whether not to evaluate the checkpoint during training",
+ )
group_gpus = parser.add_mutually_exclusive_group()
group_gpus.add_argument(
- '--gpus',
+ "--gpus",
type=int,
- help='(Deprecated, please use --gpu-id) number of gpus to use '
- '(only applicable to non-distributed training)')
+ help="(Deprecated, please use --gpu-id) number of gpus to use "
+ "(only applicable to non-distributed training)",
+ )
group_gpus.add_argument(
- '--gpu-ids',
+ "--gpu-ids",
type=int,
- nargs='+',
- help='(Deprecated, please use --gpu-id) ids of gpus to use '
- '(only applicable to non-distributed training)')
+ nargs="+",
+ help="(Deprecated, please use --gpu-id) ids of gpus to use "
+ "(only applicable to non-distributed training)",
+ )
group_gpus.add_argument(
- '--gpu-id',
+ "--gpu-id",
type=int,
default=0,
- help='id of gpu to use '
- '(only applicable to non-distributed training)')
- parser.add_argument('--seed', type=int, default=None, help='random seed')
+ help="id of gpu to use " "(only applicable to non-distributed training)",
+ )
+ parser.add_argument("--seed", type=int, default=None, help="random seed")
parser.add_argument(
- '--diff-seed',
- action='store_true',
- help='Whether or not set different seeds for different ranks')
+ "--diff-seed",
+ action="store_true",
+ help="Whether or not set different seeds for different ranks",
+ )
parser.add_argument(
- '--deterministic',
- action='store_true',
- help='whether to set deterministic options for CUDNN backend.')
+ "--deterministic",
+ action="store_true",
+ help="whether to set deterministic options for CUDNN backend.",
+ )
parser.add_argument(
- '--options',
- nargs='+',
+ "--options",
+ nargs="+",
action=DictAction,
- help='override some settings in the used config, the key-value pair '
- 'in xxx=yyy format will be merged into config file (deprecate), '
- 'change to --cfg-options instead.')
+ help="override some settings in the used config, the key-value pair "
+ "in xxx=yyy format will be merged into config file (deprecate), "
+ "change to --cfg-options instead.",
+ )
parser.add_argument(
- '--cfg-options',
- nargs='+',
+ "--cfg-options",
+ nargs="+",
action=DictAction,
- help='override some settings in the used config, the key-value pair '
- 'in xxx=yyy format will be merged into config file. If the value to '
+ help="override some settings in the used config, the key-value pair "
+ "in xxx=yyy format will be merged into config file. If the value to "
'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
- 'Note that the quotation marks are necessary and that no white space '
- 'is allowed.')
+ "Note that the quotation marks are necessary and that no white space "
+ "is allowed.",
+ )
parser.add_argument(
- '--launcher',
- choices=['none', 'pytorch', 'slurm', 'mpi'],
- default='none',
- help='job launcher')
- parser.add_argument('--local_rank', type=int, default=0)
+ "--launcher",
+ choices=["none", "pytorch", "slurm", "mpi"],
+ default="none",
+ help="job launcher",
+ )
+ parser.add_argument("--local_rank", type=int, default=0)
parser.add_argument(
- '--auto-scale-lr',
- action='store_true',
- help='enable automatically scaling LR.')
+ "--auto-scale-lr", action="store_true", help="enable automatically scaling LR."
+ )
args = parser.parse_args()
- if 'LOCAL_RANK' not in os.environ:
- os.environ['LOCAL_RANK'] = str(args.local_rank)
+ if "LOCAL_RANK" not in os.environ:
+ os.environ["LOCAL_RANK"] = str(args.local_rank)
if args.options and args.cfg_options:
raise ValueError(
- '--options and --cfg-options cannot be both '
- 'specified, --options is deprecated in favor of --cfg-options')
+ "--options and --cfg-options cannot be both "
+ "specified, --options is deprecated in favor of --cfg-options"
+ )
if args.options:
- warnings.warn('--options is deprecated in favor of --cfg-options')
+ warnings.warn("--options is deprecated in favor of --cfg-options")
args.cfg_options = args.options
return args
def main():
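+    # Suppress UserWarnings raised inside torch.nn.functional so the training
+    # log stays readable.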
+ warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.functional")
args = parse_args()
cfg = Config.fromfile(args.config)
@@ -120,52 +147,59 @@ def main():
cfg.merge_from_dict(args.cfg_options)
if args.auto_scale_lr:
- if 'auto_scale_lr' in cfg and \
- 'enable' in cfg.auto_scale_lr and \
- 'base_batch_size' in cfg.auto_scale_lr:
+ if (
+ "auto_scale_lr" in cfg
+ and "enable" in cfg.auto_scale_lr
+ and "base_batch_size" in cfg.auto_scale_lr
+ ):
cfg.auto_scale_lr.enable = True
else:
- warnings.warn('Can not find "auto_scale_lr" or '
- '"auto_scale_lr.enable" or '
- '"auto_scale_lr.base_batch_size" in your'
- ' configuration file. Please update all the '
- 'configuration files to mmdet >= 2.24.1.')
+ warnings.warn(
+ 'Can not find "auto_scale_lr" or '
+ '"auto_scale_lr.enable" or '
+ '"auto_scale_lr.base_batch_size" in your'
+ " configuration file. Please update all the "
+ "configuration files to mmdet >= 2.24.1."
+ )
# set multi-process settings
setup_multi_processes(cfg)
# set cudnn_benchmark
- if cfg.get('cudnn_benchmark', False):
+ if cfg.get("cudnn_benchmark", False):
torch.backends.cudnn.benchmark = True
# work_dir is determined in this priority: CLI > segment in file > filename
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
cfg.work_dir = args.work_dir
- elif cfg.get('work_dir', None) is None:
+ elif cfg.get("work_dir", None) is None:
# use config filename as default work_dir if cfg.work_dir is None
- cfg.work_dir = osp.join('./work_dirs',
- osp.splitext(osp.basename(args.config))[0])
+ cfg.work_dir = osp.join("./work_dirs", osp.splitext(osp.basename(args.config))[0])
if args.resume_from is not None:
cfg.resume_from = args.resume_from
cfg.auto_resume = args.auto_resume
if args.gpus is not None:
cfg.gpu_ids = range(1)
- warnings.warn('`--gpus` is deprecated because we only support '
- 'single GPU mode in non-distributed training. '
- 'Use `gpus=1` now.')
+ warnings.warn(
+ "`--gpus` is deprecated because we only support "
+ "single GPU mode in non-distributed training. "
+ "Use `gpus=1` now."
+ )
if args.gpu_ids is not None:
cfg.gpu_ids = args.gpu_ids[0:1]
- warnings.warn('`--gpu-ids` is deprecated, please use `--gpu-id`. '
- 'Because we only support single GPU mode in '
- 'non-distributed training. Use the first GPU '
- 'in `gpu_ids` now.')
+ warnings.warn(
+ "`--gpu-ids` is deprecated, please use `--gpu-id`. "
+ "Because we only support single GPU mode in "
+ "non-distributed training. Use the first GPU "
+ "in `gpu_ids` now."
+ )
if args.gpus is None and args.gpu_ids is None:
cfg.gpu_ids = [args.gpu_id]
# init distributed env first, since logger depends on the dist info.
- if args.launcher == 'none':
+ if args.launcher == "none":
distributed = False
else:
distributed = True
@@ -179,8 +213,8 @@ def main():
# dump config
cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
# init the logger before other steps
- timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
- log_file = osp.join(cfg.work_dir, f'{timestamp}.log')
+ timestamp = time.strftime("%Y%m%d_%H%M%S", time.localtime())
+ log_file = osp.join(cfg.work_dir, f"{timestamp}.log")
logger = get_root_logger(log_file=log_file, log_level=cfg.log_level)
# init the meta dict to record some important information such as
@@ -188,31 +222,26 @@ def main():
meta = dict()
# log env info
env_info_dict = collect_env()
- env_info = '\n'.join([(f'{k}: {v}') for k, v in env_info_dict.items()])
- dash_line = '-' * 60 + '\n'
- logger.info('Environment info:\n' + dash_line + env_info + '\n' +
- dash_line)
- meta['env_info'] = env_info
- meta['config'] = cfg.pretty_text
+ env_info = "\n".join([(f"{k}: {v}") for k, v in env_info_dict.items()])
+ dash_line = "-" * 60 + "\n"
+ logger.info("Environment info:\n" + dash_line + env_info + "\n" + dash_line)
+ meta["env_info"] = env_info
+ meta["config"] = cfg.pretty_text
# log some basic info
- logger.info(f'Distributed training: {distributed}')
- logger.info(f'Config:\n{cfg.pretty_text}')
+ logger.info(f"Distributed training: {distributed}")
+ logger.info(f"Config:\n{cfg.pretty_text}")
cfg.device = get_device()
# set random seeds
seed = init_random_seed(args.seed, device=cfg.device)
seed = seed + dist.get_rank() if args.diff_seed else seed
- logger.info(f'Set random seed to {seed}, '
- f'deterministic: {args.deterministic}')
+ logger.info(f"Set random seed to {seed}, " f"deterministic: {args.deterministic}")
set_random_seed(seed, deterministic=args.deterministic)
cfg.seed = seed
- meta['seed'] = seed
- meta['exp_name'] = osp.basename(args.config)
+ meta["seed"] = seed
+ meta["exp_name"] = osp.basename(args.config)
- model = build_detector(
- cfg.model,
- train_cfg=cfg.get('train_cfg'),
- test_cfg=cfg.get('test_cfg'))
+ model = build_detector(cfg.model, train_cfg=cfg.get("train_cfg"), test_cfg=cfg.get("test_cfg"))
model.init_weights()
# init rfnext if 'RFSearchHook' is defined in cfg
@@ -220,17 +249,18 @@ def main():
datasets = [build_dataset(cfg.data.train)]
if len(cfg.workflow) == 2:
- assert 'val' in [mode for (mode, _) in cfg.workflow]
+ assert "val" in [mode for (mode, _) in cfg.workflow]
val_dataset = copy.deepcopy(cfg.data.val)
val_dataset.pipeline = cfg.data.train.get(
- 'pipeline', cfg.data.train.dataset.get('pipeline'))
+ "pipeline", cfg.data.train.dataset.get("pipeline")
+ )
datasets.append(build_dataset(val_dataset))
if cfg.checkpoint_config is not None:
# save mmdet version, config file content and class names in
# checkpoints as meta data
cfg.checkpoint_config.meta = dict(
- mmdet_version=__version__ + get_git_hash()[:7],
- CLASSES=datasets[0].CLASSES)
+ mmdet_version=__version__ + get_git_hash()[:7], CLASSES=datasets[0].CLASSES
+ )
# add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES
train_detector(
@@ -240,8 +270,9 @@ def main():
distributed=distributed,
validate=(not args.no_validate),
timestamp=timestamp,
- meta=meta)
+ meta=meta,
+ )
-if __name__ == '__main__':
+if __name__ == "__main__":
main()