[WACV 2024] TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding

TSP-Transformer

Abstract

Holistic scene understanding includes semantic segmentation, surface normal estimation, object boundary detection, depth estimation, etc. The key to this problem is learning effective representations, since each subtask builds on attributes that are correlated with, yet distinct from, those of the other subtasks. Inspired by visual-prompt tuning, we propose a Task-Specific Prompts Transformer, dubbed TSP-Transformer, for holistic scene understanding. It features a vanilla transformer in the early stage and a task-specific prompts transformer encoder in the later stage, where task-specific prompts are added to the input tokens. As a result, the transformer layers learn generic information in the shared parts and gain task-specific capacity in the prompted parts. First, the task-specific prompts serve as effective induced priors for each task. Second, they act as switches that steer representation learning toward the task at hand. Extensive experiments on NYUD-v2 and PASCAL-Context show that our method achieves state-of-the-art performance, validating its effectiveness for holistic scene understanding.
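The data flow described in the abstract can be illustrated with a toy, dependency-free sketch. This is not the authors' implementation: `mix_layer` is a hypothetical stand-in for a transformer layer (it only averages each token with the sequence mean), and re-attaching the prompts before every prompted layer is an assumption made for illustration.

```python
from typing import Dict, List

Token = List[float]


def mix_layer(tokens: List[Token]) -> List[Token]:
    """Toy stand-in for a transformer layer: blend each token with the sequence mean."""
    dim = len(tokens[0])
    mean = [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
    return [[0.5 * tok[d] + 0.5 * mean[d] for d in range(dim)] for tok in tokens]


def forward(patches: List[Token],
            task_prompts: Dict[str, List[Token]],
            n_shared: int = 2,
            n_prompted: int = 2) -> Dict[str, List[Token]]:
    # Early stage: shared layers see only the patch tokens.
    shared = patches
    for _ in range(n_shared):
        shared = mix_layer(shared)

    # Later stage: each task prepends its own prompts to the shared features.
    outputs = {}
    for task, prompts in task_prompts.items():
        feats = shared
        for _ in range(n_prompted):
            mixed = mix_layer(prompts + feats)
            feats = mixed[len(prompts):]  # keep patch tokens, drop prompt outputs
        outputs[task] = feats
    return outputs


# Hypothetical inputs: three 2-d patch tokens and one prompt token per task.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompts = {"semseg": [[2.0, 0.0]], "depth": [[0.0, 2.0]]}
out = forward(patches, prompts)
```

Both tasks start from the same shared features, yet `out["semseg"]` and `out["depth"]` differ because each task's prompts shift the mixing, which is the "switch" behaviour the abstract refers to.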

Setup

Tested with PyTorch 1.11 and CUDA 11.3:

git clone https://github.com/tb2-sy/TSP-Transformer.git
conda create -n tsp python=3.7
conda activate tsp

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

pip install tqdm Pillow easydict pyyaml imageio scikit-image tensorboard
pip install opencv-python==4.5.4.60 setuptools==59.5.0

pip install timm==0.5.4 einops==0.4.1
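After installation, a short check like the following may help confirm that the environment matches the pinned versions. It is a sketch only: the helper name `check_versions` is hypothetical, the distribution names (for example `torch` for the conda `pytorch` package) are assumptions, and on Python 3.7 it relies on the `importlib_metadata` backport being available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport for Python 3.7 (assumed installed)

# Versions copied from the install commands above; distribution names are assumptions.
PINNED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "timm": "0.5.4",
    "einops": "0.4.1",
    "opencv-python": "4.5.4.60",
    "setuptools": "59.5.0",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, found)} for mismatches; found is None if missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu113" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for pkg, (expected, found) in check_versions().items():
        print(f"{pkg}: expected {expected}, found {found}")
```

An empty result means every pinned package was found at the expected version.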

Dataset

We use the same datasets (PASCAL-Context and NYUD-v2) as InvPT. You can download the data from here, then extract the archives with:

tar xfvz NYUDv2.tar.gz
tar xfvz PASCALContext.tar.gz

Set the dataset directory as the db_root variable in configs/mypath.py.
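As a rough illustration, the setting and a quick sanity check might look like the sketch below. Only the `db_root` name comes from this README. The path value is a placeholder, the folder names `NYUDv2` and `PASCALContext` are assumed from the archive names, and `missing_datasets` is a hypothetical helper that is not part of the repository.

```python
import os

# Placeholder: replace with the directory that holds the extracted archives.
db_root = "/path/to/datasets"

# Assumed folder names, inferred from the archive names (not confirmed by the repo).
EXPECTED_DIRS = ("NYUDv2", "PASCALContext")


def missing_datasets(root, expected=EXPECTED_DIRS):
    """Return the expected dataset folders that do not exist under root."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    absent = missing_datasets(db_root)
    if absent:
        print("Missing under db_root:", ", ".join(absent))
    else:
        print("All expected dataset folders found.")
```

Running the check before training can catch a wrong `db_root` early, instead of failing inside the data loader.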

Training

Set up the config files in ./configs for the PASCAL-Context and NYUD-v2 datasets.

# Train on the NYUD-v2 dataset
./run_nyud.sh

# Train on the PASCAL-Context dataset
./run_pascal.sh

Evaluation

# Evaluate on the NYUD-v2 dataset
./infer_nyud.sh

# Evaluate on the PASCAL-Context dataset
./infer_pascal.sh

Pre-trained models

The weights behind our reported results can be downloaded from the Download column below.

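| Version | Dataset | Download | Segmentation (mIoU ↑) | Human parsing (mIoU ↑) | Saliency (maxF ↑) | Normals (mErr ↓) | Boundary (odsF ↑) |
|---|---|---|---|---|---|---|---|
| TSP-Transformer (our paper) | PASCAL-Context | google drive | 81.48 | 70.64 | 84.86 | 13.69 | 74.80 |
| TaskPrompter (ICLR 2023) | PASCAL-Context | - | 80.89 | 68.89 | 84.83 | 13.72 | 73.50 |
| InvPT (ECCV 2022) | PASCAL-Context | - | 79.03 | 67.61 | 84.81 | 14.15 | 73.00 |

| Version | Dataset | Download | Segmentation (mIoU ↑) | Depth (RMSE ↓) | Normals (mErr ↓) | Boundary (odsF ↑) |
|---|---|---|---|---|---|---|
| TSP-Transformer (our paper) | NYUD-v2 | google drive | 55.39 | 0.4961 | 18.44 | 77.50 |
| TaskPrompter (ICLR 2023) | NYUD-v2 | - | 55.30 | 0.5152 | 18.47 | 78.20 |
| InvPT (ECCV 2022) | NYUD-v2 | - | 53.56 | 0.5183 | 19.04 | 78.10 |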
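To compare rows at a glance, the per-task numbers can be folded into one mean relative gain. The sketch below does this for TSP-Transformer against InvPT, using only values from the tables above. It assumes Depth and Normals are error metrics (lower is better) and the others are scores (higher is better). Using InvPT as the baseline is a choice made here for illustration, so the resulting figure is not one reported in the paper.

```python
# Each entry: (TSP-Transformer, InvPT, higher_is_better), copied from the tables above.
NYUD = {
    "Segmentation": (55.39, 53.56, True),
    "Depth": (0.4961, 0.5183, False),    # assumed error metric: lower is better
    "Normals": (18.44, 19.04, False),    # assumed error metric: lower is better
    "Boundary": (77.50, 78.10, True),
}

PASCAL = {
    "Segmentation": (81.48, 79.03, True),
    "Human parsing": (70.64, 67.61, True),
    "Saliency": (84.86, 84.81, True),
    "Normals": (13.69, 14.15, False),    # assumed error metric: lower is better
    "Boundary": (74.80, 73.00, True),
}


def mean_relative_gain(results):
    """Average per-task relative improvement over the baseline, in percent."""
    gains = []
    for ours, baseline, higher_is_better in results.values():
        rel = (ours - baseline) / baseline
        # Flip the sign for error metrics so that positive always means better.
        gains.append(rel if higher_is_better else -rel)
    return 100.0 * sum(gains) / len(gains)


nyud_gain = mean_relative_gain(NYUD)
pascal_gain = mean_relative_gain(PASCAL)

if __name__ == "__main__":
    print(f"NYUD-v2: {nyud_gain:+.2f}%  PASCAL-Context: {pascal_gain:+.2f}%")
```

Averaging relative changes keeps tasks with very different scales, such as depth error near 0.5 and segmentation scores near 55, on an equal footing.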

TODO/Future work

  • Upload paper and init project
  • Training and Inference code
  • Reproducible checkpoints
  • Speed up training and inference
  • Reduce CUDA memory usage

Contact

For any questions related to our paper and implementation, please email [email protected].

Citation

If you find our code or paper helpful, please consider citing:

@article{wang2023tsp,
  title={TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding},
  author={Wang, Shuo and Li, Jing and Zhao, Zibo and Lian, Dongze and Huang, Binbin and Wang, Xiaomei and Li, Zhengxin and Gao, Shenghua},
  journal={arXiv preprint arXiv:2311.03427},
  year={2023}
}

Acknowledgements

The code is available under the MIT license and draws from InvPT, ATRC, and MTI-Net, which are also licensed under the MIT license.
