Chengtao Lv*, Hong Chen*, Jinyang Guo📧, Yifu Ding, Xianglong Liu
(* denotes equal contribution, 📧 denotes corresponding author.)
Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, its immense memory and computation costs hinder practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization, which we attribute to the bimodal distribution of post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which uses a mathematically equivalent sign operation to transform the bimodal distribution offline into a normal distribution that is considerably easier to quantize. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax that searches for the optimal power-of-two base, which is hardware-friendly.
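The Adaptive Granularity Quantization idea can be illustrated with a small log-domain quantizer. The following is a minimal sketch and not the code shipped in this repo: it assumes a base of the form 2^(1/tau), a hypothetical candidate set for tau, and a mean-squared-error search criterion. The function names `log_quantize` and `search_tau` are invented for illustration.

```python
import math
from typing import List, Sequence


def log_quantize(probs: Sequence[float], tau: int, n_bits: int) -> List[float]:
    """Fake-quantize softmax outputs in (0, 1] on a log grid with base 2**(1/tau).

    Assumption: q = clamp(round(-tau * log2(p)), 0, 2**n_bits - 1),
    and the dequantized value is 2**(-q / tau).
    """
    q_max = 2 ** n_bits - 1
    out = []
    for p in probs:
        if p <= 0.0:
            q = q_max  # log2 is undefined here, so use the smallest level
        else:
            q = min(max(round(-tau * math.log2(p)), 0), q_max)
        out.append(2.0 ** (-q / tau))
    return out


def search_tau(probs: Sequence[float], n_bits: int,
               candidates: Sequence[int] = (1, 2, 4)) -> int:
    """Pick the tau (hypothetical candidate set) with the lowest MSE."""
    def mse(tau: int) -> float:
        deq = log_quantize(probs, tau, n_bits)
        return sum((a - b) ** 2 for a, b in zip(probs, deq)) / len(probs)

    return min(candidates, key=mse)
```

A larger tau gives a finer grid near 1 at the cost of a smaller dynamic range, so peaked and flat attention maps may prefer different values, e.g. `search_tau([0.9, 0.8, 0.7], n_bits=4)`.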
🍺🍺🍺 You can refer to environment.sh in the root directory or install step by step.
- Install PyTorch
conda create -n ptq4sam python=3.7 -y
conda activate ptq4sam
pip install torch torchvision
- Install MMCV
pip install -U openmim
mim install "mmcv-full<2.0.0"
- Install other requirements
pip install -r requirements.txt
- Compile CUDA operators
cd projects/instance_segment_anything/ops
python setup.py build install
cd ../../..
- Install mmdet
cd mmdetection/
python3 setup.py build develop
cd ..
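After the steps above, a quick sanity check can confirm that the main packages are importable from the active environment. This is a minimal sketch; the helper name `check_packages` is hypothetical and not part of this repo, and it only tests whether each package can be located, not whether its version matches.

```python
import importlib.util
from typing import Dict, Sequence


def check_packages(
    names: Sequence[str] = ("torch", "torchvision", "mmcv", "mmdet"),
) -> Dict[str, bool]:
    """Map each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```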
Download the official COCO dataset, put the files into the corresponding folders of datasets/, and arrange them in the following layout:
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
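The layout above can be verified before launching an experiment. The sketch below is an illustration only: `missing_coco_dirs` is a hypothetical helper, and the root folder is passed in explicitly because its name depends on where you placed the dataset.

```python
from pathlib import Path
from typing import List

# Sub-folders listed in the layout above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017", "test2017")


def missing_coco_dirs(root: str) -> List[str]:
    """Return the expected sub-folders of <root>/coco that do not exist."""
    coco = Path(root) / "coco"
    return [name for name in EXPECTED_SUBDIRS if not (coco / name).is_dir()]


if __name__ == "__main__":
    # "data" follows the tree shown above; adjust it to your own setup.
    print(missing_coco_dirs("data") or "COCO layout looks complete")
```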
Download the pretrained weights (SAM and detectors) and put them into the corresponding folders of ckpt/:
- sam_b: ViT-B SAM
- sam_l: ViT-L SAM
- sam_h: ViT-H SAM
- faster rcnn: R-50-FPN Faster R-CNN
- yolox: YOLOX-l
- detr: H-Deformable-DETR
- dino: DINO
To quantize a model, specify the model configuration and the quantization configuration. For example, to perform W6A6 quantization for SAM-L with a YOLOX detector, use the following command:
python ptq4sam/solver/test_quant.py \
--config ./projects/configs/yolox/yolo_l-sam-vit-l.py \
--q_config exp/config66.yaml --quant-encoder
- yolo_l-sam-vit-l.py: configuration file for the SAM-L model with the YOLOX detector.
- config66.yaml: configuration file for W6A6 quantization.
- --quant-encoder: quantize the encoder of SAM.
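W6A6 denotes 6-bit weights and 6-bit activations. To give a feel for what a 6-bit grid means, here is a minimal sketch of generic min-max uniform fake quantization. It is a simplified illustration, not the quantizer used by this repo, and the name `fake_quantize` is hypothetical.

```python
from typing import List, Sequence


def fake_quantize(values: Sequence[float], n_bits: int) -> List[float]:
    """Round values onto a uniform grid of 2**n_bits levels spanning [min, max].

    Simplification: the grid offset is the float minimum rather than an
    integer zero point, and the range comes from the values themselves.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    if hi == lo:
        return list(values)  # a constant tensor needs no rounding
    scale = (hi - lo) / levels
    out = []
    for v in values:
        q = min(max(round((v - lo) / scale), 0), levels)  # integer code
        out.append(lo + q * scale)  # map the code back to a float
    return out
```

With 6 bits there are 64 levels, so the rounding error of each value is at most half a step, i.e. `(max - min) / 63 / 2`.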
We recommend using a GPU with more than 40 GB of memory for experiments.
To visualize the prediction results, specify --show-dir.
Bimodal distributions mainly occur in the mask decoder of SAM-B and SAM-L.
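The reason a sign operation can remove the bimodality without changing the model is that the query-key dot product is unchanged when the same channel is negated in both the query and the key. The sketch below illustrates this equivalence in plain Python. It is not the repo's implementation: the helper names are hypothetical, and choosing each sign from the channel's mean over calibration keys is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b, with one row of `weight` per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def channel_signs(keys: Sequence[Vector]) -> List[int]:
    """Assumed rule: -1 for channels whose mean key activation is negative."""
    n_tokens = len(keys)
    means = [sum(k[c] for k in keys) / n_tokens for c in range(len(keys[0]))]
    return [-1 if m < 0 else 1 for m in means]


def fold_signs(weight: Matrix, bias: Vector,
               signs: Sequence[int]) -> Tuple[Matrix, Vector]:
    """Absorb per-channel signs into a linear layer offline."""
    new_weight = [[s * w for w in row] for row, s in zip(weight, signs)]
    new_bias = [s * b for b, s in zip(bias, signs)]
    return new_weight, new_bias


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))
```

Applying `fold_signs` with the same signs to both the Query and Key linear layers moves the negative mode of each flipped key channel onto the positive side, while every attention logit `dot(q, k)` stays the same because each sign is squared in the product.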
If you find this repo useful for your research, please consider citing the paper.
@inproceedings{lv2024ptq4sam,
title={PTQ4SAM: Post-Training Quantization for Segment Anything},
author={Lv, Chengtao and Chen, Hong and Guo, Jinyang and Ding, Yifu and Liu, Xianglong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={15941--15951},
year={2024}
}
The code of PTQ4SAM is based on Prompt-Segment-Anything and QDrop. We thank the authors for open-sourcing their code.