VIGC: Visual Instruction Generation and Correction

We propose Visual Instruction Generation and Correction (VIGC), a framework capable of autonomously generating high-quality image-text instruction fine-tuning datasets.

Getting Started

Installation

(Optional) Create a conda environment:

```
conda create -n vigc python=3.8
conda activate vigc
```

Install mmpretrain

You can follow the tutorial.

You can also build from source:

```
git clone https://gitlab.pjlab.org.cn/fdc/mllm/vigc.git
cd vigc
pip install -e .
```

Prepare Models

Obtain the Vicuna model

Vicuna is an open-source, LLaMA-based LLM whose performance is reported to be close to ChatGPT. We currently use the v1.1 versions of Vicuna-13B and Vicuna-7B. If you already have Vicuna weights of the correct version, set llm_model in the Model Config to the folder that contains them. Otherwise, follow this instruction to obtain them, and remember to update the config file as well.
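
Before you point llm_model at a local folder, a quick sanity check can catch a wrong path. The sketch below assumes the Vicuna weights are stored in Hugging Face Transformers format: a config.json, a SentencePiece tokenizer.model, and .bin or .safetensors weight files. That layout and the helper name missing_vicuna_files are assumptions and are not part of VIGC.

```python
from pathlib import Path
from typing import List

# Files a Hugging Face-format LLaMA/Vicuna folder is *assumed* to contain.
# VIGC itself does not run this check; adjust the list to your checkpoint layout.
EXPECTED_FILES = ("config.json", "tokenizer.model")


def missing_vicuna_files(folder: str) -> List[str]:
    """Return what looks absent from `folder`; an empty list means it looks plausible."""
    root = Path(folder)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # Weights may be a single file or sharded, saved as .bin or .safetensors.
    has_weights = any(root.glob("*.bin")) or any(root.glob("*.safetensors"))
    if not has_weights:
        missing.append("*.bin or *.safetensors weights")
    return missing
```

For example, if missing_vicuna_files("/path/to/vicuna-7b") returns an empty list, the folder at least has the expected shape. The typing.List annotation keeps the sketch compatible with the Python 3.8 environment created above.
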
Download pretrained models

We support two kinds of pretrained checkpoints: MiniGPT-4 and InstructBLIP. You can download them from the table below. For more details, please visit their original repositories: MiniGPT-4 and InstructBLIP.

| Model type | Checkpoint pretrained with Vicuna 7B | Checkpoint pretrained with Vicuna 13B |
| --- | --- | --- |
| MiniGPT-4 | Download | Download |
| InstructBLIP | Download | Download |

After downloading the pretrained checkpoints, set pretrained in the Model Config to the folder that contains the pretrained weights.
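
If you prefer to edit the config from a script, one option is a line-based rewrite of a single key. This minimal sketch assumes the field (for example pretrained or llm_model) appears as key: value on its own line. The helper set_config_value is hypothetical. It changes only the first match and drops any trailing comment on the edited line. A YAML library such as PyYAML is more robust, but it loses comments when it dumps the file.

```python
import re


def set_config_value(yaml_text: str, key: str, value: str) -> str:
    """Replace the value of the first `key: ...` line, keeping its indentation."""
    # [ \t]* (not \s*) so the match never spans line breaks.
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backslash escaping issues in Windows paths.
    new_text, count = pattern.subn(lambda m: m.group(1) + f'"{value}"', yaml_text, count=1)
    if count == 0:
        raise KeyError(f"key {key!r} not found in config")
    return new_text
```

Usage: read the Model Config file, pass its text through set_config_value(text, "pretrained", "/your/checkpoint/folder"), and write the result back.
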
Download finetuned VIGC models

Download the VIGC checkpoints that match the finetuning dataset and the Vicuna model you prepared.

| Finetuned dataset | Checkpoint finetuned with Vicuna 7B | Checkpoint finetuned with Vicuna 13B |
| --- | --- | --- |
| LLaVA | Download | Download |
| OKVQA | Download | / |
| A-OKVQA | Download | / |
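
The table works as a small lookup: only LLaVA has a 13B finetuned checkpoint. The sketch below encodes exactly the rows above so that a script can fail early on an unsupported combination. The dictionary and function names are hypothetical.

```python
from typing import Dict

# Mirrors the table above; "/" (no released checkpoint) is encoded as False.
FINETUNED_CHECKPOINTS: Dict[str, Dict[str, bool]] = {
    "LLaVA": {"7B": True, "13B": True},
    "OKVQA": {"7B": True, "13B": False},
    "A-OKVQA": {"7B": True, "13B": False},
}


def has_finetuned_checkpoint(dataset: str, vicuna_size: str) -> bool:
    """True if a finetuned VIGC checkpoint is listed for this dataset and Vicuna size."""
    try:
        return FINETUNED_CHECKPOINTS[dataset][vicuna_size]
    except KeyError:
        raise ValueError(f"unknown dataset/size: {dataset!r}, {vicuna_size!r}") from None
```
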

Launching Demo

To launch the demo locally:

1. Download the pretrained and finetuned weights of MiniGPT-4 and InstructBLIP to your local machine.
2. Update MODEL_CKPT on line 9 of vigc_demo.py.
3. Run python vigc_demo.py, then follow the printed instructions to open the demo in your browser. The arguments are:
   - device0: the GPU id of the first model
   - device1: the GPU id of the second model
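
The two arguments let each model run on its own GPU. This README does not show how vigc_demo.py declares them. The sketch below assumes argparse-style integer flags named --device0 and --device1, with defaults 0 and 1. The flag spelling, the defaults, and the helper names are all illustrative.

```python
import argparse
from typing import List, Optional


def parse_demo_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Flag names and defaults are assumptions; check vigc_demo.py for the real ones.
    parser = argparse.ArgumentParser(description="VIGC demo launcher (illustrative)")
    parser.add_argument("--device0", type=int, default=0, help="GPU id of the first model")
    parser.add_argument("--device1", type=int, default=1, help="GPU id of the second model")
    return parser.parse_args(argv)


def torch_device_names(args: argparse.Namespace) -> List[str]:
    """Map GPU ids to the 'cuda:<id>' strings that torch.device accepts."""
    return [f"cuda:{args.device0}", f"cuda:{args.device1}"]
```

Under that assumption, an invocation would look like python vigc_demo.py --device0 0 --device1 1.
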

You can also visit the VIGC online demo to try it out.

Tutorials

Generate QA

Generate QA based on COCO2017 for LLaVA

First download the finetuned VIGC model, then set finetuned in the corresponding Inference Config to the path of the checkpoint file.

```
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_conv.yaml   # generate conversation data for LLaVA using MiniGPT4-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_detail.yaml   # generate detailed description data for LLaVA using MiniGPT4-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_complex.yaml   # generate complex reasoning data for LLaVA using MiniGPT4-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_conv.yaml   # generate conversation data for LLaVA using MiniGPT4-vicuna13b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_detail.yaml   # generate detailed description data for LLaVA using MiniGPT4-vicuna13b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_complex.yaml   # generate complex reasoning data for LLaVA using MiniGPT4-vicuna13b
```
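
The six commands differ only in the Vicuna size and the data type, so one template can generate all of them. The sketch below rebuilds them as argument lists, which can be passed directly to subprocess.run. The template is copied from the paths above; the function name is hypothetical.

```python
from typing import List

# Path pattern copied from the commands above; only size and task vary.
CFG_TEMPLATE = ("vigc/projects/mini_gpt4_vicuna{size}/generate_qa/llava-150k/"
                "generate_llava_qa_{task}.yaml")


def llava_generation_commands(nproc: int = 8) -> List[List[str]]:
    """Return the conv/detail/complex commands for vicuna7b, then vicuna13b."""
    commands = []
    for size in ("7b", "13b"):
        for task in ("conv", "detail", "complex"):
            cfg = CFG_TEMPLATE.format(size=size, task=task)
            commands.append(["torchrun", f"--nproc_per_node={nproc}", "evaluate.py",
                             "--cfg-path", cfg])
    return commands
```
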

Generate QA based on Object365 for LLaVA

First download the finetuned VIGC model, then set finetuned in the corresponding Inference Config to the path of the checkpoint file.

```
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_conv.yaml   # generate conversation data for LLaVA using MiniGPT4-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_detail.yaml   # generate detailed description data for LLaVA using MiniGPT4-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_complex.yaml   # generate complex reasoning data for LLaVA using MiniGPT4-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_conv.yaml   # generate conversation data for LLaVA using MiniGPT4-vicuna13b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_detail.yaml   # generate detailed description data for LLaVA using MiniGPT4-vicuna13b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_complex.yaml   # generate complex reasoning data for LLaVA using MiniGPT4-vicuna13b
```
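
Each command asks torchrun for 8 processes per node. On a single 8-GPU machine, running the commands one after another rather than in parallel is therefore presumably the safer default. Below is a minimal sequential runner with a dry-run mode that only prints the commands. run_all and its dry-run behaviour are illustrative and are not part of VIGC. shlex.split(..., comments=True) drops the trailing "# ..." notes, so lines can be pasted from this README as they are.

```python
import shlex
import subprocess
from typing import List


def run_all(commands: List[str], dry_run: bool = True) -> List[List[str]]:
    """Run shell-style command lines in order; stop at the first non-zero exit."""
    planned = [shlex.split(cmd, comments=True) for cmd in commands]
    for argv in planned:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return planned


OBJECT365_CONFIGS = [
    f"vigc/projects/mini_gpt4_vicuna{size}/generate_qa/llava-150k/"
    f"generate_llava_qa_object365_{task}.yaml"
    for size in ("7b", "13b")
    for task in ("conv", "detail", "complex")
]
COMMANDS = [f"torchrun --nproc_per_node=8 evaluate.py --cfg-path {cfg}" for cfg in OBJECT365_CONFIGS]
```

Call run_all(COMMANDS) first to review the plan, then run_all(COMMANDS, dry_run=False) to execute it.
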

Generate QA based on COCO2017 for A-OKVQA or OKVQA

First download the finetuned VIGC model, then set pretrained in the corresponding Inference Config to the path of the checkpoint file.

Generate the questions first:

```
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/a-okvqa/generate_question.yaml   # generate questions for A-OKVQA using instruct-blip-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/okvqa/generate_question.yaml   # generate questions for OKVQA using instruct-blip-vicuna7b
```

Set annotation in generate_answer.yaml to the path of the questions generated in the previous step, then generate the answers:

```
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/a-okvqa/generate_answer.yaml   # generate answers for A-OKVQA using instruct-blip-vicuna7b

torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/okvqa/generate_answer.yaml   # generate answers for OKVQA using instruct-blip-vicuna7b
```
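
The answer stage reads the question stage's output through its annotation field, so the order matters. The sketch below lists the two configs for each dataset in run order. It also checks that the question file exists before the second stage starts. This README does not say where generate_question.yaml writes its output, so question_file is a placeholder you fill in. Both helper names are hypothetical.

```python
from pathlib import Path
from typing import List

STAGES = ("generate_question", "generate_answer")  # must run in this order


def stage_configs(dataset: str) -> List[str]:
    """Config paths from the commands above, in run order."""
    if dataset not in ("a-okvqa", "okvqa"):
        raise ValueError(f"unsupported dataset: {dataset}")
    base = f"vigc/projects/instruct_blip_vicuna7b/generate_qa/{dataset}"
    return [f"{base}/{stage}.yaml" for stage in STAGES]


def check_answer_stage_ready(question_file: str) -> None:
    """Fail early if the stage 1 output (used as annotation in stage 2) is missing."""
    if not Path(question_file).is_file():
        raise FileNotFoundError(f"run generate_question first; {question_file} not found")
```
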

Train VIGC Model

Finetune VIGC Model on A-OKVQA Dataset
1. Download our formatted A-OKVQA JSON files.
2. Download the images following the original repository; skip this step if you already have them.
3. Set images and annotation in these configs (train config, val config) to their actual paths.
4. Run the finetuning script:
```
torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/instruct_blip_vicuna7b/vigc/a-okvqa/normal_vigc.yaml
```
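
A wrong path in step 3 is cheaper to catch before torchrun starts 8 processes. The pre-flight check below is a sketch. It assumes you have collected the images and annotation values of each config into a dict; the dict layout and the helper name are hypothetical. It reports any value that is unset or does not exist on disk.

```python
from pathlib import Path
from typing import Dict, List


def missing_data_paths(configs: Dict[str, Dict[str, str]]) -> List[str]:
    """Return 'config.field -> path' entries that are unset or do not exist."""
    problems = []
    for cfg_name, fields in configs.items():
        for field in ("images", "annotation"):
            path = fields.get(field)
            if path is None:
                problems.append(f"{cfg_name}.{field} -> <unset>")
            elif not Path(path).exists():
                problems.append(f"{cfg_name}.{field} -> {path}")
    return problems
```

An empty result means every configured path exists. It does not check that the JSON files have the expected format.
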
Finetune VIGC Model on OKVQA Dataset
1. Download our formatted OKVQA JSON files.
2. Download the images following the original repository; skip this step if you already have them.
3. Set images and annotation in these configs (train config, val config) to their actual paths.
4. Run the finetuning script:
```
torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/instruct_blip_vicuna7b/vigc/okvqa/normal_vigc.yaml
```
Finetune VIGC Model on LLaVA-150k Dataset
1. Download our formatted LLaVA JSON files.
2. Download the images following the original repository; skip this step if you already have them.
3. Set images and annotation in these configs (conversation config, detail config, complex config, val config) to their actual paths.
4. Run the finetuning script:
```
torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/vigc/llava-150k/normal_vigc.yaml  # using Mini-GPT4 Vicuna7b

torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/vigc/llava-150k/normal_vigc.yaml  # using Mini-GPT4 Vicuna13b
```
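
The two LLaVA commands differ only in the Vicuna size, which should match the llm_model you configured. The sketch below builds the command for either size; the function name is hypothetical. --nproc_per_node is the torchrun option that sets the number of worker processes per node, usually one per GPU. Lowering it is therefore the usual first change on a machine with fewer than 8 GPUs. This README does not cover whether batch size or learning rate also need adjusting.

```python
from typing import List


def llava_train_command(vicuna_size: str, nproc: int = 8) -> List[str]:
    """Build the LLaVA-150k VIGC finetuning command for 'vicuna7b' or 'vicuna13b'."""
    if vicuna_size not in ("7b", "13b"):
        raise ValueError("LLaVA-150k configs are listed for vicuna7b and vicuna13b only")
    cfg = f"vigc/projects/mini_gpt4_vicuna{vicuna_size}/vigc/llava-150k/normal_vigc.yaml"
    return ["torchrun", f"--nproc_per_node={nproc}", "train.py", "--cfg-path", cfg]
```
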

Acknowledgement

BLIP-2. The model architecture of VIGC follows BLIP-2. Don't forget to check out this great open-source work if you haven't already!
InstructBLIP and MiniGPT-4. The pretrained models of VIGC come from InstructBLIP and MiniGPT-4.
LAVIS. This repository is built upon LAVIS!
Vicuna. The language ability of Vicuna, with only 13B parameters, is remarkable. And it is open-source!
LLaVA, A-OKVQA, OKVQA. The VIGC models are finetuned on these datasets.

Paper and Citing VIGC

You can find more details in our paper.

If you're using VIGC in your research or applications, please cite using this BibTeX:

```
@article{wang2023vigc,
      title={VIGC: Visual Instruction Generation and Correction},
      author={Wang, Bin and Wu, Fan and Han, Xiao and Peng, Jiahui and Zhong, Huaping and Zhang, Pan and Dong, Xiaoyi and Li, Weijia and Li, Wei and Wang, Jiaqi and He, Conghui},
      journal={arXiv preprint arXiv:2308.12714},
      year={2023}
}
```

Contact us

If you have any questions, comments or suggestions, please do not hesitate to contact us at wangbin@pjlab.org.cn or wufan@pjlab.org.cn.

License

Apache License 2.0
