This is the PyTorch implementation of the paper "MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient", ECAI 2023, by Sen Wang and Jin Zheng.

[📕Paper](https://arxiv.org/abs/2310.11316)
Monocular 3D object detection is an inherently ill-posed problem, as it is challenging to predict accurate 3D localization from a single image. Existing monocular 3D detection knowledge distillation methods usually project the LiDAR data onto the image plane and train the teacher network accordingly. Transferring LiDAR-based model knowledge to RGB-based models is more complex, so a general distillation strategy is needed. To alleviate the cross-modal problem, we propose MonoSKD, a novel knowledge distillation framework for monocular 3D detection based on the Spearman correlation coefficient, which learns the relative correlation between cross-modal features. Considering the large gap between these features, strict alignment of features may mislead the training, so we propose a looser Spearman loss. Furthermore, by selecting appropriate distillation locations and removing redundant modules, our scheme saves GPU resources and trains faster than existing methods. Extensive experiments verify the effectiveness of our framework on the challenging KITTI 3D object detection benchmark. Our method achieves state-of-the-art performance as of submission, with no additional inference computational cost.
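As a rough illustration of the core idea, here is a minimal sketch of a differentiable Spearman correlation loss built on `torchsort` (listed in the dependencies below). The tensor shapes and the exact normalization are assumptions for the example, not the repo's implementation:

```python
# A minimal sketch (not the repo's exact loss): differentiable Spearman
# correlation between flattened student/teacher feature maps of shape (B, N).
import torch
import torchsort  # provides differentiable soft ranking


def spearman_loss(student_feat, teacher_feat, regularization_strength=1.0):
    # Soft (differentiable) ranks of each feature vector.
    s = torchsort.soft_rank(student_feat, regularization_strength=regularization_strength)
    t = torchsort.soft_rank(teacher_feat, regularization_strength=regularization_strength)
    # Pearson correlation of the ranks equals the Spearman correlation.
    s = s - s.mean(dim=1, keepdim=True)
    t = t - t.mean(dim=1, keepdim=True)
    s = s / (s.norm(dim=1, keepdim=True) + 1e-8)
    t = t / (t.norm(dim=1, keepdim=True) + 1e-8)
    corr = (s * t).sum(dim=1)      # in [-1, 1]; 1 means identical rankings
    return (1.0 - corr).mean()     # loss is 0 when the rankings fully agree
```

Because only the rank ordering of features is matched, this constraint is looser than an element-wise alignment such as L2, which is the motivation stated above.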
- 2023/12/08: Release the checkpoints of the teacher and student.
- 2023/10/21: Release the checkpoint of our distilled DID-M3D.
- 2023/07/22: Release the code of our MonoSKD framework.
a. Clone this repository.

b. Install the dependent libraries as follows:

- Install the dependent Python libraries:

  ```bash
  pip install torch==1.12.0 torchvision==0.13.0 pyyaml scikit-image opencv-python numba tqdm torchsort
  ```

- We tested this repository on NVIDIA 3090 GPUs and Ubuntu 18.04. You can also follow the installation instructions in GUPNet (this repository is based on it) to run experiments with lower PyTorch/GPU versions.
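An optional sanity check that the pinned packages installed correctly (the expected versions are just the pins from the command above):

```python
# Optional post-install sanity check.
import torch
import torchvision
import torchsort  # torchsort backs the soft-rank Spearman loss

print(torch.__version__, torchvision.__version__)  # expect 1.12.0 / 0.13.0
print("CUDA available:", torch.cuda.is_available())
```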
- Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows:

  ```
  this repo
  ├── data
  │   ├── KITTI3D
  │   │   ├── training
  │   │   │   ├── calib & label_2 & image_2 & depth_dense
  │   │   ├── testing
  │   │   │   ├── calib & image_2
  ├── config
  ├── ...
  ```
- You can also choose to link your KITTI dataset path (set the variable first, then create the symlink, so that `$KITTI_DATA_PATH` expands correctly):

  ```bash
  KITTI_DATA_PATH=~/data/kitti_object
  ln -s $KITTI_DATA_PATH ./data/KITTI3D
  ```
- For convenience, we provide the pre-generated dense depth files at: Google Drive
Evaluate with a single GPU:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train_val.py --config config/monoskd.yaml -e
```

Train with multiple GPUs:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/train_val.py --config config/monoskd.yaml
```
For convenience, we provide the pre-trained model at Google Drive.
We also provide the pre-trained teacher model at Google Drive and the pre-trained student model at Google Drive.
Since a trained checkpoint usually contains the weights of the teacher network as well, we provide the script `tools/pth_transfer.py` to delete the teacher network weights.
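A minimal sketch of what such a script might do; the `teacher.` key prefix and the checkpoint layout are assumptions for illustration, so refer to `tools/pth_transfer.py` for the actual interface:

```python
# Hypothetical sketch of stripping teacher weights from a distilled checkpoint.
# The "teacher." prefix and the "model_state" wrapper key are assumptions.
import torch


def strip_teacher_weights(src_path, dst_path, prefix="teacher."):
    ckpt = torch.load(src_path, map_location="cpu")
    state_dict = ckpt["model_state"] if "model_state" in ckpt else ckpt
    # Keep only the student parameters.
    student_only = {k: v for k, v in state_dict.items() if not k.startswith(prefix)}
    if "model_state" in ckpt:
        ckpt["model_state"] = student_only
        torch.save(ckpt, dst_path)
    else:
        torch.save(student_only, dst_path)
```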
We provide the model reported in the paper along with the training logs for verification (mAP = 20.21). Note that `drop_last = True` during training, so the final inference results may differ by a negligible margin, which is expected (see the short illustration after the table). Here we give the comparison:
| Models | Car@BEV IoU=0.7 Easy | Mod | Hard | Car@3D IoU=0.7 Easy | Mod | Hard |
|---|---|---|---|---|---|---|
| original paper | 37.66 | 26.41 | 23.39 | 28.91 | 20.21 | 16.99 |
| this repo | 37.66 | 26.41 | 23.39 | 28.89 | 20.19 | 16.98 |
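A minimal illustration of the `drop_last = True` effect (the dataset size and batch size here are made up for the example, not the repo's settings):

```python
# With drop_last=True the final incomplete batch is discarded each epoch,
# so a few samples are skipped, which slightly perturbs the trained model.
import torch
from torch.utils.data import DataLoader, TensorDataset

train_set = TensorDataset(torch.randn(100, 3))
loader = DataLoader(train_set, batch_size=16, shuffle=True, drop_last=True)
print(len(loader))                                                # 6 full batches
print(len(loader) * 16, "of", len(train_set), "samples per epoch")  # 96 of 100
```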
If you have any questions, please contact me: [email protected]
If you find our work helpful, please consider citing our paper:
```bibtex
@misc{wang2023monoskd,
      title={MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient},
      author={Sen Wang and Jin Zheng},
      year={2023},
      eprint={2310.11316},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
This repository is mainly based on DID-M3D, and it also benefits from mmRazor. Thanks for their great work!