[TMLR - Nov'24] λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
- 🔥🔥🔥 Concept-specific finetuning: DreamBooth-style concept-based fine-tuning is now available (without catastrophic forgetting)!!
- 🔥🔥🔥 Multi-concept interpolation: Quick and easy script to perform multi-concept interpolations!!
- 🔥🔥 Benchmark Release: Multibench (Dropbox), a complex multi-subject personalization benchmark that includes images with and without backgrounds.
News: Check out our previous work, ECLIPSE, on resource-efficient T2I, accepted @ CVPR 2024.
This repository contains the inference code for our paper, λ-ECLIPSE.
λ-ECLIPSE is a lightweight solution for multi-concept personalization: a tiny text-to-image (T2I) prior model designed for the Kandinsky v2.2 diffusion image generator.
The λ-ECLIPSE model extends the ECLIPSE prior by incorporating image-text interleaved data.
λ-ECLIPSE shows that Personalized T2I (P-T2I) models do not need to be trained with massive resources. For instance, λ-ECLIPSE is trained in a mere 74 A100 GPU hours, compared to its counterparts BLIP-Diffusion (2,304 GPU hours) and Kosmos-G (12,300 GPU hours).
Please follow the steps below to run inference locally.
git clone git@github.com:eclipse-t2i/lambda-eclipse-inference.git
cd lambda-eclipse-inference
conda create -p ./venv python=3.9
conda activate ./venv
pip install -r requirements.txt
Note: the λ-ECLIPSE prior is not a diffusion model, while the image decoders are.
We recommend referring to either the Colab notebook or the test.py script to understand the inner workings of λ-ECLIPSE.
- Additionally, for results that follow the Canny edge map more strongly, we refer users to ControlNet models, as λ-ECLIPSE's goal is not to strictly follow the Canny edge map but to balance the target concepts against the edge map, producing the best overall results with some trade-off.
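Conceptually, inference is a two-stage process: the (non-diffusion) λ-ECLIPSE prior maps the prompt and subject images to a CLIP image embedding, and the Kandinsky v2.2 diffusion decoder renders that embedding into an image. The sketch below illustrates only the decoder half with standard diffusers classes; the placeholder embeddings stand in for real λ-ECLIPSE prior outputs, for which test.py is the authoritative reference.

```python
import torch
from diffusers import KandinskyV22Pipeline

# Placeholder embeddings with the Kandinsky v2.2 embedding width (1280).
# In practice these come from the λ-ECLIPSE prior (see test.py), which maps the
# prompt plus subject image(s) to a CLIP image embedding without any diffusion.
image_embeds = torch.randn(1, 1280, dtype=torch.float16, device="cuda")
negative_image_embeds = torch.zeros_like(image_embeds)

# Stage 2: the standard Kandinsky v2.2 diffusion decoder turns the embedding
# into pixels; note that it consumes image embeddings, not a text prompt.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    num_inference_steps=50,
    height=768,
    width=768,
).images[0]
image.save("./assets/prior_plus_decoder_demo.png")
```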
# run the inference:
conda activate ./venv
# single-subject example
python test_quick.py --prompt="a cat on top of the snow mountain" --subject1_path="./assets/cat.png" --subject1_name="cat"
# single-subject canny example
python ./test_quick.py --prompt="a dog is surfing" --subject1_path="./assets/dog2.png" --subject1_name="dog" --canny_image="./assets/dog_surf_ref.jpg"
# multi-subject example
python test_quick.py --prompt="a cat wearing glasses at a park" --subject1_path="./assets/cat.png" --subject1_name="cat" --subject2_path="./assets/blue_sunglasses.png" --subject2_name="glasses"
## results will be stored in ./assets/
To launch the Gradio demo:
conda activate ./venv
gradio main.py
🔥🔥🔥 All-concepts combined training:
export DATASET_PATH="<path-to-parent-folder-containing-concept-specific-folders>"
export OUTPUT_DIR="<output-dir>"
export TRAINING_STEPS=8000 # for 30 concepts --> ~250 iterations per concept
python train_text_to_image_decoder_whole_db.py \
--instance_data_dir=$DATASET_PATH \
--subject_data_dir=$DATASET_PATH \
--output_dir=$OUTPUT_DIR \
--validation_prompts='A dog' \
--resolution=768 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=$TRAINING_STEPS \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--checkpoints_total_limit=3 \
--lr_scheduler=constant \
--lr_warmup_steps=0 \
--report_to=wandb \
--validation_epochs=1000 \
--checkpointing_steps=1000 \
--push_to_hub
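Here, $DATASET_PATH is assumed to be a parent folder containing one sub-folder of images per concept, as the placeholder above suggests. A quick sanity check such as the following (a hypothetical helper, not part of the repository) can confirm the layout before launching training:

```python
import os
from pathlib import Path

# Assumed layout (inferred from the
# "<path-to-parent-folder-containing-concept-specific-folders>" placeholder):
#   $DATASET_PATH/
#     cat/   -> images of the "cat" concept
#     dog/   -> images of the "dog" concept
#     ...
dataset_path = Path(os.environ["DATASET_PATH"])
for concept_dir in sorted(p for p in dataset_path.iterdir() if p.is_dir()):
    images = [f for f in concept_dir.iterdir() if f.suffix.lower() in {".png", ".jpg", ".jpeg"}]
    print(f"{concept_dir.name}: {len(images)} images")
```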
Individual concept training:
export DATASET_PATH="<path-to-folder-containing-images>"
export OUTPUT_DIR="<output-dir>"
export CONCEPT="<high-level-concept-name-like-dog>" # !!! Note: this is only used to check for concept overfitting; it is never supposed to generate your specific concept images.
export TRAINING_STEPS=400
python train_text_to_image_decoder.py \
--instance_data_dir=$DATASET_PATH \
--subject_data_dir=$DATASET_PATH \
--output_dir=$OUTPUT_DIR \
--validation_prompts="A $CONCEPT" \
--resolution=768 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=$TRAINING_STEPS \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--checkpoints_total_limit=4 \
--lr_scheduler=constant \
--lr_warmup_steps=0 \
--report_to=wandb \
--validation_epochs=100 \
--checkpointing_steps=100 \
--push_to_hub
To perform combined inference with λ-ECLIPSE and the fine-tuned UNet (from the previous step):
# run the inference:
conda activate ./venv
# single/multi subject example
python test_quick.py --unet_checkpoint="mpatel57/backpack_dog" --prompt="a backpack at the beach" --subject1_path="./assets/backpack_dog.png" --subject1_name="backpack"
## results will be stored in ./assets/
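For context, --unet_checkpoint points test_quick.py at the decoder UNet fine-tuned in the previous step. Below is a rough diffusers-based sketch of what such a swap looks like; the exact loading logic lives in test_quick.py, and the subfolder="unet" assumption only holds if the checkpoint was pushed as a full pipeline.

```python
import torch
from diffusers import KandinskyV22Pipeline, UNet2DConditionModel

# Load the fine-tuned Kandinsky v2.2 decoder UNet (the example checkpoint from
# the command above). subfolder="unet" assumes a full-pipeline checkpoint;
# drop it if only the UNet weights were pushed.
unet = UNet2DConditionModel.from_pretrained(
    "mpatel57/backpack_dog", subfolder="unet", torch_dtype=torch.float16
)

# Build the standard decoder pipeline with the fine-tuned UNet swapped in;
# the λ-ECLIPSE prior still supplies the image embeddings as usual.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", unet=unet, torch_dtype=torch.float16
).to("cuda")
```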
Please refer to the following script to perform interpolations on your own concepts:
python ./interpolation.py
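The script above is the reference implementation; for intuition, interpolation happens in the prior's CLIP image-embedding space, and each intermediate embedding can be decoded like any other prior output. The minimal sketch below uses plain linear interpolation between two embeddings, which may not match exactly what interpolation.py does:

```python
import torch

def interpolate_embeddings(emb_a: torch.Tensor, emb_b: torch.Tensor, num_steps: int = 5):
    """Linearly blend two prior image embeddings; each blend can be decoded
    with the Kandinsky v2.2 decoder exactly like a normal prior output."""
    weights = torch.linspace(0.0, 1.0, num_steps)
    return [(1.0 - w) * emb_a + w * emb_b for w in weights]

# emb_cat and emb_dog would be λ-ECLIPSE prior outputs for two different
# concepts; random tensors are used here purely as stand-ins.
emb_cat = torch.randn(1, 1280)
emb_dog = torch.randn(1, 1280)
blends = interpolate_embeddings(emb_cat, emb_dog, num_steps=5)
```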
We would like to acknowledge the excellent open-source text-to-image models (Karlo and Kandinsky), without which this work would not have been possible. We also thank Hugging Face for streamlining the T2I models.