Official implementation of the paper Text-Image Alignment for Diffusion-based Perception (CVPR 2024).
Neehar Kondapaneni*, Markus Marks*, Manuel Knott*, Rogerio Guimaraes, Pietro Perona
We provide two separate shell scripts for setting up the environment:
- `setup.sh`: sets up the environment for Pascal VOC semantic segmentation and for Watercolor2k and Comic2k object detection.
- `setup_mm.sh`: sets up the environment for ADE20k semantic segmentation, NYUv2 depth estimation, and Nighttime Driving and Dark Zurich semantic segmentation (using the MM libraries).
```bash
bash setup.sh
```
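If you plan to run the MM-based tasks instead, run the second script (assuming the same invocation):

```bash
bash setup_mm.sh
```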
If you want to use our models for inference, there are two options available:
The first option is a simple interface to load our model checkpoints and run inference with custom image and text inputs. Please refer to the `demo/` directory for examples.
```bash
export PYTHONPATH=$PYTHONPATH:$(pwd)
python demo/depth_inference.py
python demo/seg_inference.py
python demo/detection_inference.py
python demo/seg_inference_driving.py
```
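For orientation, here is a minimal sketch of what such a custom inference call can look like. `run_inference`, the model's call signature, and the `captions` keyword are hypothetical placeholders, not the repo's actual API; see the `demo/` scripts for the real entry points.

```python
import torch
from PIL import Image
import torchvision.transforms.functional as TF

def run_inference(model: torch.nn.Module, image_path: str, caption: str):
    """Run a text-conditioned perception model on a single image."""
    image = Image.open(image_path).convert("RGB")
    x = TF.to_tensor(image).unsqueeze(0)  # (1, 3, H, W), float in [0, 1]
    model.eval()
    with torch.no_grad():
        pred = model(x, captions=[caption])  # hypothetical signature
    return pred
```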
The second option is to generate results for a whole dataset used in our study (e.g., ADE20k, NYUv2) using pre-generated captions; please refer to the `test_tadp_mm.py` and `test_tadp_depth.py` scripts.
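Conceptually, evaluating with pre-generated captions amounts to pairing each dataset image with a stored caption before it is fed to the model. A minimal sketch of that pattern, with a hypothetical JSON layout and file paths (the actual formats are defined in the scripts above):

```python
import json
from pathlib import Path

# Hypothetical layout: a JSON file mapping image stems to caption strings,
# e.g. {"ADE_val_00000001": "a kitchen with wooden cabinets", ...}.
captions = json.loads(Path("captions/ade20k_captions.json").read_text())

for image_path in sorted(Path("data/ade20k/images").glob("*.jpg")):
    caption = captions.get(image_path.stem, "")  # fall back to an empty prompt
    # ...feed (image_path, caption) to the model and accumulate metrics...
```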
TODO
All results reported in our paper can be reproduced using the scripts in the `cvpr_experiments/` directory.
This code is based on VPD, diffusers, stable-diffusion, mmsegmentation, LAVT, and MIM-Depth-Estimation.
```bibtex
@article{kondapaneni2023tadp,
  title={Text-image Alignment for Diffusion-based Perception},
  author={Kondapaneni, Neehar and Marks, Markus and Knott, Manuel and Guimaraes, Rogerio and Perona, Pietro},
  journal={arXiv preprint arXiv:2310.00031},
  year={2023}
}
```