Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Jing He¹✱, Haodong Li¹✱, Wei Yin², Yixun Liang¹, Leheng Li¹, Kaiqiang Zhou³, Hongbo Zhang³, Bingbing Liu³,
Ying-Cong Chen¹,⁴✉

¹HKUST(GZ) ²University of Adelaide ³Noah's Ark Lab ⁴HKUST
✱Both authors contributed equally. ✉Corresponding author.

We present Lotus, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. "Avg. Rank" indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used.
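
As an illustration of how an "Avg. Rank" of this kind can be computed, the sketch below ranks each method per metric (1 is best) and averages the ranks. The method names, metric names, and numbers are made-up placeholders, not results from the paper, and the tie handling is an assumption of this sketch rather than the authors' procedure.

```python
# Illustrative only: method names, metric names, and scores are placeholders.
from statistics import mean

def average_rank(scores, lower_is_better):
    """Return {method: mean rank across metrics}; rank 1 is best.

    scores: {metric: {method: value}}
    lower_is_better: {metric: bool}
    Ties are broken by order of appearance (an assumption of this sketch).
    """
    ranks = {}
    for metric, per_method in scores.items():
        # Sort best-first: ascending for error metrics, descending otherwise.
        ordered = sorted(per_method, key=per_method.get,
                         reverse=not lower_is_better[metric])
        for position, method in enumerate(ordered, start=1):
            ranks.setdefault(method, []).append(position)
    return {method: mean(r) for method, r in ranks.items()}

scores = {
    "error_metric": {"method_a": 0.10, "method_b": 0.08, "method_c": 0.12},
    "accuracy_metric": {"method_a": 0.90, "method_b": 0.93, "method_c": 0.91},
}
lower_is_better = {"error_metric": True, "accuracy_metric": False}
avg = average_rank(scores, lower_is_better)
```

A method that ranks first on every metric gets an average rank of 1, which is the best possible value.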

📢 News

2025-01-17: Please check out our latest models (lotus-normal-g-v1-1, lotus-normal-d-v1-1), which were trained with aligned surface normals, leading to improved performance!
2024-11-13: The demo now supports video depth estimation!
2024-11-13: The Lotus disparity models (Generative & Discriminative) are now available, which achieve better performance!
2024-10-06: The demos are now available (Depth & Normal). Give them a try!
2024-10-05: The inference code is now available!
2024-09-26: Paper released. Click here if you are curious about the 3D point clouds of the teaser's depth maps!

🛠️ Setup

This installation was tested on: Ubuntu 20.04 LTS, Python 3.10, CUDA 12.3, NVIDIA A800-SXM4-80GB.

Clone the repository (requires git):

git clone https://github.com/EnVision-Research/Lotus.git
cd Lotus

Install dependencies (requires conda):

conda create -n lotus python=3.10 -y
conda activate lotus
pip install -r requirements.txt

🤗 Gradio Demo

Online demo: Depth & Normal
Local demo

For depth estimation, run:
```
python app.py depth
```
For normal estimation, run:
```
python app.py normal
```

🕹️ Usage

Testing on your images

Place your images in a directory, for example, under assets/in-the-wild_example (where we have prepared several examples).
Run the inference command: bash infer.sh.
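
Before running `bash infer.sh`, it can help to confirm which files in the input directory look like images. The following is a minimal sketch: the helper name `list_images` and the extension set are assumptions, not part of the Lotus code, and the formats the script actually accepts may differ.

```python
from pathlib import Path

# Assumed set of image extensions; the formats infer.sh accepts may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def list_images(directory):
    """Return sorted paths of files in `directory` with an image extension."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Input directory not found: {root}")
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)

# Example call (run from the repository root):
#   list_images("assets/in-the-wild_example")
```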

Evaluation on benchmark datasets

Prepare benchmark datasets:

For depth estimation, download the evaluation datasets (depth) with the commands below (following Marigold):

cd datasets/eval/depth/

wget -r -np -nH --cut-dirs=4 -R "index.html*" -P . https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/

For normal estimation, download the evaluation datasets (normal) (dsine_eval.zip) into datasets/eval/normal/ and unzip the archive there (following DSINE).
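
To make the `wget` flags in the depth download command concrete: `-r` recurses, `-np` never ascends to the parent directory, `-nH` omits the host-name directory, `--cut-dirs=4` drops the first four remote directory components, `-R "index.html*"` skips the directory listing pages, and `-P .` saves under the current directory. The sketch below mimics how `-nH --cut-dirs=4` maps a remote URL to a local path. `local_path` is a hypothetical helper, and the file name in the example is a placeholder rather than a real dataset file.

```python
from urllib.parse import urlparse

def local_path(url, cut_dirs):
    """Approximate where `wget -nH --cut-dirs=N -P .` saves a file URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    *dirs, filename = parts
    # -nH already omits the host; --cut-dirs drops leading directory components.
    kept = dirs[cut_dirs:]
    return "/".join(kept + [filename])

base = "https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/"
# "some_dataset/archive.tar" is a placeholder, not a real file name.
example = local_path(base + "some_dataset/archive.tar", cut_dirs=4)
```

Since the four components `~pf/bingkedata/marigold/evaluation_dataset` are cut, the downloaded files should land directly under `datasets/eval/depth/` after the `cd` step.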

Run the evaluation command: bash eval_scripts/eval-[task]-[mode].sh, where [task] represents the task name (depth or normal) and [mode] refers to the mode name (d or g).
(Optional) To reproduce the results presented in our paper, you can set the --rng_state_path option in the evaluation command. The RNG state files are available at ./rng_states/.
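
The script naming pattern `eval_scripts/eval-[task]-[mode].sh` can be sketched as a small helper that validates its inputs before building the command. The function name `eval_script_path` is hypothetical and not part of the repository. Judging by the checkpoint names and the News entries, `d` and `g` appear to stand for the discriminative and generative variants.

```python
VALID_TASKS = ("depth", "normal")
VALID_MODES = ("d", "g")

def eval_script_path(task, mode):
    """Build eval_scripts/eval-[task]-[mode].sh after validating the inputs."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"eval_scripts/eval-{task}-{mode}.sh"

# Argument list suitable for subprocess.run(command, check=True),
# executed from the repository root.
command = ["bash", eval_script_path("depth", "g")]
```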

Choose your model

Below are the released models and their corresponding configurations:

| CHECKPOINT_DIR | TASK_NAME | MODE |
|---|---|---|
| `jingheya/lotus-depth-g-v1-0` | depth | `generation` |
| `jingheya/lotus-depth-d-v1-0` | depth | `regression` |
| `jingheya/lotus-depth-g-v2-1-disparity` | depth (disparity) | `generation` |
| `jingheya/lotus-depth-d-v2-0-disparity` | depth (disparity) | `regression` |
| `jingheya/lotus-normal-g-v1-1` | normal | `generation` |
| `jingheya/lotus-normal-d-v1-1` | normal | `regression` |
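
For scripting, the table can be mirrored as a lookup from task and mode to checkpoint. This is only a convenience sketch: the names `CHECKPOINTS` and `select_checkpoint` are hypothetical, and how the selected values are passed to the inference or evaluation scripts is not shown here.

```python
# Rows copied from the table above: (TASK_NAME, MODE) -> CHECKPOINT_DIR.
CHECKPOINTS = {
    ("depth", "generation"): "jingheya/lotus-depth-g-v1-0",
    ("depth", "regression"): "jingheya/lotus-depth-d-v1-0",
    ("depth (disparity)", "generation"): "jingheya/lotus-depth-g-v2-1-disparity",
    ("depth (disparity)", "regression"): "jingheya/lotus-depth-d-v2-0-disparity",
    ("normal", "generation"): "jingheya/lotus-normal-g-v1-1",
    ("normal", "regression"): "jingheya/lotus-normal-d-v1-1",
}

def select_checkpoint(task, mode):
    """Return the checkpoint listed for a (task, mode) pair."""
    try:
        return CHECKPOINTS[(task, mode)]
    except KeyError:
        raise ValueError(f"No released model for task={task!r}, mode={mode!r}") from None

checkpoint = select_checkpoint("normal", "regression")
```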

🎓 Citation

If you find our work useful in your research, please consider citing our paper:

@article{he2024lotus,
    title={Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction},
    author={He, Jing and Li, Haodong and Yin, Wei and Liang, Yixun and Li, Leheng and Zhou, Kaiqiang and Zhang, Hongbo and Liu, Bingbing and Chen, Ying-Cong},
    journal={arXiv preprint arXiv:2409.18124},
    year={2024}
}
