Zeren Xiong1 · Zedong Zhang1 · Zikun Chen1 · Shuo Chen2 · Xiang Li3 · Gan Sun4 · Jian Yang1 · Jun Li1
1Nanjing University of Science and Technology · 2RIKEN · 3Nankai University · 4South China University of Technology
- Introducing a scale factor and an injection step to balance text and image features in cross-attention, while preserving image information in self-attention during the text-image inversion diffusion process.
- Designing a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image.
- Presenting a novel similarity score function that maximizes the similarities between the generated object image and the input text/image while balancing these similarities to harmonize text and image integration.
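The balancing idea in the last bullet can be sketched as follows. This is a hypothetical illustration, not the paper's exact score: it assumes two cosine similarities, `s_text` (generated image vs. target text) and `s_img` (generated image vs. input image), both in [0, 1], and combines them with a harmonic mean, which is high only when both terms are high and roughly balanced.

```python
# Hypothetical sketch of a balanced similarity score; the paper's exact
# formulation may differ. s_text and s_img are assumed cosine similarities
# in [0, 1] (e.g., CLIP text-image and image-image similarities).
def balanced_score(s_text: float, s_img: float, eps: float = 1e-8) -> float:
    # Harmonic mean: rewards high values on BOTH similarities and
    # penalizes imbalance between them.
    return 2.0 * s_text * s_img / (s_text + s_img + eps)
```

For instance, a generation that matches the text well but drifts far from the input image (e.g., 0.9 vs. 0.5) scores lower than one that matches both moderately (0.7 vs. 0.7).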
- 2024.12.20: 🎉 Our code is released! Explore the possibilities of novel object synthesis with our framework.
To set up the environment for running the code, follow these steps:

1. Clone the repository:

   ```shell
   git clone https://github.com/xzr52/ATIH-code
   cd ATIH-code
   ```

2. Create a conda environment and install dependencies:

   ```shell
   conda create -n ATIH python=3.10
   conda activate ATIH
   pip install -r requirements.txt
   ```

3. Set CUDA paths:

   ```shell
   export CUDA_HOME=/usr/local/cuda
   ```

4. Install the required submodule:

   ```shell
   cd GroundingDino
   pip install -e .
   ```

5. Download the segmentation model weights seg_ckpts, unzip them, and place them in the `ckpts/` folder.
We provide a Gradio-based application with an intuitive user interface for interacting with the framework. By default, our code runs on two GPUs, each with 24GB of memory. If your single GPU has more than 30GB of memory, you can modify the code to set device2 to the same value as device1, allowing the program to run on a single GPU.
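The single-GPU change described above might look like the following. The variable names `device1`/`device2` follow the README's description; the actual assignments in `app.py` may differ.

```python
# Sketch of the device assignment described above (names hypothetical;
# see app.py for the actual code). Default: models split across two GPUs.
device1 = "cuda:0"
device2 = "cuda:1"

# Single-GPU variant: requires one GPU with more than 30GB of memory.
single_gpu = True  # set according to your hardware
if single_gpu:
    device2 = device1
```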
To launch the app locally:

```shell
export no_proxy=127.0.0.1,localhost
python app.py
```
To perform inference on a single image, use the following command:

```shell
python inference_one_image.py --image_path examples/rabbit.png --target_prompt 'cock'
```

- `--image_path`: Path to the input image.
- `--target_prompt`: Text description of the object.
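The two documented flags can be wired up with `argparse`; the snippet below is a minimal sketch of the interface described above, not the real script (which likely exposes additional options).

```python
# Minimal sketch of the documented CLI flags (the actual
# inference_one_image.py likely accepts additional options).
import argparse

parser = argparse.ArgumentParser(description="ATIH single-image inference")
parser.add_argument("--image_path", required=True, help="Path to the input image")
parser.add_argument("--target_prompt", required=True, help="Text description of the object")

# Example invocation mirroring the command above:
args = parser.parse_args(["--image_path", "examples/rabbit.png", "--target_prompt", "cock"])
```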
Our framework also supports complex prompts. Here's an example:

```shell
python inference_one_image.py --image_path examples/lion.png --target_prompt 'Green triceratops with rough, scaly skin and massive frilled head'
```
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62072242 and 62361166670. We also thank the developers of the projects our implementation builds upon; their contributions have been instrumental to our work.
@inproceedings{
xiong2024novel,
title={Novel Object Synthesis via Adaptive Text-Image Harmony},
author={Zeren Xiong and Ze-dong Zhang and Zikun Chen and Shuo Chen and Xiang Li and Gan Sun and Jian Yang and Jun Li},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=ENLsNDfys0}
}