[TMLR - Nov'24] λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
- 🔥🔥🔥 Concept-specific finetuning: DreamBooth-style concept-based fine-tuning is now available (without catastrophic forgetting)!!
- 🔥🔥🔥 Multi-concept interpolation: Quick and easy script to perform multi-concept interpolations!!
- 🔥🔥 Benchmark Release: Multibench (Dropbox), a complex multi-subject personalization benchmark that includes images with and without backgrounds.
News: Check out our previous work, ECLIPSE, on resource-efficient T2I, accepted @ CVPR 2024.
This repository contains the inference code for our paper, λ-ECLIPSE.
λ-ECLIPSE is a lightweight solution for multi-concept personalization: a tiny text-to-image (T2I) prior model designed for the Kandinsky v2.2 diffusion image generator.
The λ-ECLIPSE model extends the ECLIPSE prior by incorporating image-text interleaved data.
λ-ECLIPSE shows that Personalized T2I (P-T2I) models do not need to be trained with massive resources. For instance, λ-ECLIPSE is trained in a mere 74 A100 GPU hours, compared to its counterparts BLIP-Diffusion (2,304 GPU hours) and Kosmos-G (12,300 GPU hours).
Please follow the steps below to run inference locally.
git clone git@github.com:eclipse-t2i/lambda-eclipse-inference.git
cd lambda-eclipse-inference
conda create -p ./venv python=3.9
conda activate ./venv
pip install -r requirements.txt
Note: the λ-ECLIPSE prior is not a diffusion model, while the image decoders are.
We recommend referring to either the Colab notebook or the test.py script to understand the inner workings of λ-ECLIPSE.
- Additionally, for results that follow the Canny edge map more strongly, we refer users to ControlNet models, as λ-ECLIPSE's goal is not to strictly follow the Canny edge map but to balance the target concepts against the edge map, producing the best overall results with some trade-off.
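Conceptually, inference is a two-stage process: the (non-diffusion) λ-ECLIPSE prior maps the prompt and subject images to a CLIP image embedding, and the Kandinsky v2.2 diffusion decoder renders that embedding into an image. The sketch below illustrates only the decoder half with standard diffusers classes; the placeholder embeddings stand in for real λ-ECLIPSE prior outputs, for which test.py is the authoritative reference.

```python
import torch
from diffusers import KandinskyV22Pipeline

# Placeholder embeddings with the Kandinsky v2.2 embedding width (1280).
# In practice these come from the λ-ECLIPSE prior (see test.py), which maps the
# prompt plus subject image(s) to a CLIP image embedding without any diffusion.
image_embeds = torch.randn(1, 1280, dtype=torch.float16, device="cuda")
negative_image_embeds = torch.zeros_like(image_embeds)

# Stage 2: the standard Kandinsky v2.2 diffusion decoder turns the embedding
# into pixels; note that it consumes image embeddings, not a text prompt.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    num_inference_steps=50,
    height=768,
    width=768,
).images[0]
image.save("./assets/prior_plus_decoder_demo.png")
```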
# run the inference:
conda activate ./venv
# single-subject example
python test_quick.py --prompt="a cat on top of the snow mountain" --subject1_path="./assets/cat.png" --subject1_name="cat"
# single-subject canny example
python ./test_quick.py --prompt="a dog is surfing" --subject1_path="./assets/dog2.png" --subject1_name="dog" --canny_image="./assets/dog_surf_ref.jpg"
# multi-subject example
python test_quick.py --prompt="a cat wearing glasses at a park" --subject1_path="./assets/cat.png" --subject1_name="cat" --subject2_path="./assets/blue_sunglasses.png" --subject2_name="glasses"
## results will be stored in ./assets/
To launch the Gradio demo:
conda activate ./venv
gradio main.py
🔥🔥🔥 All-concepts combined training:
export DATASET_PATH="<path-to-parent-folder-containing-concept-specific-folders>"
export OUTPUT_DIR="<output-dir>"
export TRAINING_STEPS=8000 # for 30 concepts --> ~250 iterations per concept
python train_text_to_image_decoder_whole_db.py \
--instance_data_dir=$DATASET_PATH \
--subject_data_dir=$DATASET_PATH \
--output_dir=$OUTPUT_DIR \
--validation_prompts='A dog' \
--resolution=768 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=$TRAINING_STEPS \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--checkpoints_total_limit=3 \
--lr_scheduler=constant \
--lr_warmup_steps=0 \
--report_to=wandb \
--validation_epochs=1000 \
--checkpointing_steps=1000 \
--push_to_hub
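Here, $DATASET_PATH is assumed to be a parent folder containing one sub-folder of images per concept, as the placeholder above suggests. A quick sanity check such as the following (a hypothetical helper, not part of the repository) can confirm the layout before launching training:

```python
import os
from pathlib import Path

# Assumed layout (inferred from the
# "<path-to-parent-folder-containing-concept-specific-folders>" placeholder):
#   $DATASET_PATH/
#     cat/   -> images of the "cat" concept
#     dog/   -> images of the "dog" concept
#     ...
dataset_path = Path(os.environ["DATASET_PATH"])
for concept_dir in sorted(p for p in dataset_path.iterdir() if p.is_dir()):
    images = [f for f in concept_dir.iterdir() if f.suffix.lower() in {".png", ".jpg", ".jpeg"}]
    print(f"{concept_dir.name}: {len(images)} images")
```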
Individual concept training:
export DATASET_PATH="<path-to-folder-containing-images>"
export OUTPUT_DIR="<output-dir>"
export CONCEPT="<high-level-concept-name-like-dog>" # !!! Note: this is only used to check for concept overfitting; it is never supposed to generate your specific concept images.
export TRAINING_STEPS=400
python train_text_to_image_decoder.py \
--instance_data_dir=$DATASET_PATH \
--subject_data_dir=$DATASET_PATH \
--output_dir=$OUTPUT_DIR \
--validation_prompts="A $CONCEPT" \
--resolution=768 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=$TRAINING_STEPS \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--checkpoints_total_limit=4 \
--lr_scheduler=constant \
--lr_warmup_steps=0 \
--report_to=wandb \
--validation_epochs=100 \
--checkpointing_steps=100 \
--push_to_hub
To perform combined inference with λ-ECLIPSE and the fine-tuned UNet (from the previous step):
# run the inference:
conda activate ./venv
# single/multi subject example
python test_quick.py --unet_checkpoint="mpatel57/backpack_dog" --prompt="a backpack at the beach" --subject1_path="./assets/backpack_dog.png" --subject1_name="backpack"
## results will be stored in ./assets/
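For context, --unet_checkpoint points test_quick.py at the decoder UNet fine-tuned in the previous step. Below is a rough diffusers-based sketch of what such a swap looks like; the exact loading logic lives in test_quick.py, and the subfolder="unet" assumption only holds if the checkpoint was pushed as a full pipeline.

```python
import torch
from diffusers import KandinskyV22Pipeline, UNet2DConditionModel

# Load the fine-tuned Kandinsky v2.2 decoder UNet (the example checkpoint from
# the command above). subfolder="unet" assumes a full-pipeline checkpoint;
# drop it if only the UNet weights were pushed.
unet = UNet2DConditionModel.from_pretrained(
    "mpatel57/backpack_dog", subfolder="unet", torch_dtype=torch.float16
)

# Build the standard decoder pipeline with the fine-tuned UNet swapped in;
# the λ-ECLIPSE prior still supplies the image embeddings as usual.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", unet=unet, torch_dtype=torch.float16
).to("cuda")
```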
Please refer to the following script to perform interpolations on your own concepts:
python ./interpolation.py
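The script above is the reference implementation; for intuition, interpolation happens in the prior's CLIP image-embedding space, and each intermediate embedding can be decoded like any other prior output. The minimal sketch below uses plain linear interpolation between two embeddings, which may not match exactly what interpolation.py does:

```python
import torch

def interpolate_embeddings(emb_a: torch.Tensor, emb_b: torch.Tensor, num_steps: int = 5):
    """Linearly blend two prior image embeddings; each blend can be decoded
    with the Kandinsky v2.2 decoder exactly like a normal prior output."""
    weights = torch.linspace(0.0, 1.0, num_steps)
    return [(1.0 - w) * emb_a + w * emb_b for w in weights]

# emb_cat and emb_dog would be λ-ECLIPSE prior outputs for two different
# concepts; random tensors are used here purely as stand-ins.
emb_cat = torch.randn(1, 1280)
emb_dog = torch.randn(1, 1280)
blends = interpolate_embeddings(emb_cat, emb_dog, num_steps=5)
```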
We would like to acknowledge the excellent open-source text-to-image models (Karlo and Kandinsky), without which this work would not have been possible. We also thank Hugging Face for streamlining the T2I models.