- 2024 May-18 : Our paper is available here.
- 2024 Feb-20 : This work has been accepted by COLING 2024.
- 2023 Oct-13 : Updated the demo interface.
- 2023 Aug-02 : Released the demo video. [YouTube]
Figure 1: The overview of TIGER.
- We propose mulTImodal GEnerator for dialogue Response (TIGER), a unified generative model framework designed for multimodal dialogue response generation. Notably, this framework is capable of handling conversations involving any combination of modalities.
- We implement a system for multimodal dialogue response generation, incorporating both text and images, based on TIGER.
- Extensive experiments show that TIGER achieves new state-of-the-art results on both automatic and human evaluations, validating the effectiveness of our system in providing a superior multimodal conversational experience.
☝ We implemented a multimodal dialogue system based on TIGER, as depicted in the figure above.
Our system offers various modifiable components:
- For the textual dialogue response generator, users can choose decoding strategies and adjust related parameters.
- For the Text-to-Image Translator, users can freely modify prompt templates and negative prompts to suit different requirements. Default prompt templates and negative prompts are provided, enhancing the realism of generated images.
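To illustrate how a prompt template and negative prompt might fit together, here is a minimal sketch. The template string, placeholder name, and negative-prompt text below are illustrative examples, not the defaults shipped with TIGER:

```python
# Illustrative prompt assembly for the Text-to-Image Translator.
# The template, placeholder, and negative prompt are hypothetical examples.
PROMPT_TEMPLATE = "{caption}, photorealistic, high detail"
NEGATIVE_PROMPT = "blurry, low quality, distorted"

def build_prompt(caption: str, template: str = PROMPT_TEMPLATE) -> str:
    """Fill the generated image caption into the prompt template."""
    return template.format(caption=caption)

prompt = build_prompt("a golden retriever playing in the snow")
# The assembled prompt and NEGATIVE_PROMPT would then be passed to the
# diffusion pipeline, e.g. pipe(prompt, negative_prompt=NEGATIVE_PROMPT).
```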
❗Note: Our research focuses on open-domain multimodal dialogue response generation, so the system may not have strong instruction-following capabilities. Users can treat it as a companion or listener, but using it as a QA system or an AI painting generator is not recommended.
Due to page limits, the paper gives only a concise, easy-to-understand introduction of our method. More implementation details, experimental results, and discussions can be found in the supplementary material.
⭐ A GPU with 24GB memory (18GB at runtime) is enough for the demo.
cd TIGER/
conda env create -f environment.yml
conda activate tiger
✨ Please download our model weights from here (Google Drive). The Text-to-Image Translator's weights have already been uploaded to Hugging Face, so you don't need to download them locally. More details can be found at friedrichor/stable-diffusion-2-1-realistic.
The final weights should be organized in a single folder with a structure similar to the following:
TIGER
├── demo
│ └── ...
├── model_weights
│ ├── tiger_response_modal_predictor.pth
│ ├── tiger_textual_dialogue_response_generator.pth
│ └── tiger_text2image_translator
│ ├── feature_extractor
│ │ └── preprocessor_config.json
│ ├── scheduler
│ │ └── scheduler_config.json
│ ├── text_encoder
│ │ ├── config.json
│ │ └── pytorch_model.bin
│ ├── tokenizer
│ │ ├── merges.txt
│ │ ├── special_tokens_map.json
│ │ ├── tokenizer_config.json
│ │ └── vocab.json
│ ├── unet
│ │ ├── config.json
│ │ └── diffusion_pytorch_model.bin
│ ├── vae
│ │ ├── config.json
│ │ └── diffusion_pytorch_model.bin
│ └── model_index.json
├── tiger
│ └── ...
├── utils
│ └── ...
├── demo.py
...
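After downloading, a quick sanity check can confirm the weights match the layout above. The helper below is a hypothetical convenience, not part of the repository:

```python
from pathlib import Path

# Files expected under model_weights/, following the directory tree above.
EXPECTED = [
    "tiger_response_modal_predictor.pth",
    "tiger_textual_dialogue_response_generator.pth",
    "tiger_text2image_translator/model_index.json",
    "tiger_text2image_translator/unet/config.json",
    "tiger_text2image_translator/vae/config.json",
]

def missing_weights(root: str = "model_weights") -> list[str]:
    """Return the expected weight files that are not present under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]
```

If `missing_weights()` returns an empty list, the demo should be able to locate everything it needs.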
python demo.py --config demo/demo_config.yaml
If you find our work useful in your research, please consider citing us:
@inproceedings{kong-etal-2024-tiger-unified,
title = "{TIGER}: A Unified Generative Model Framework for Multimodal Dialogue Response Generation",
author = "Kong, Fanheng and
Wang, Peidong and
Feng, Shi and
Wang, Daling and
Zhang, Yifei",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italy",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.1403",
pages = "16135--16141",
}