$ cd src/scripts
$ yes | source setup.sh
We provide only some sample images in /dataset/img/. Please refer to our dataset hub for the complete data.
# This .yml file defines the key parameters controlling an experiment setup: dataset details,
# model configuration, and inference behavior. The results of each experiment are saved under /result/{test_name}.
test_name: # string: The name of the dataset being used. Results are stored in the /result/{test_name} directory.
seed: # int: The random seed for the experiment, used for reproducibility.
dataset:
  image_count: # int, must be > 0: The number of images to generate during the experiment.
  use_scene_graph: # bool (0/1): Whether to incorporate scene graph annotations.
model:
  name: # string: The name of the Large Vision-Language Model (LVLM), following the Hugging Face tag format.
  path: # string: The path to the LVLM being used.
  family: # string, choices: ['llava', 'vip_llava']: The LVLM family. Default is 'llava'.
          # Use 'vip_llava' if the ViP-LLaVA series is required.
  params:
    use_8_bit: # bool (0/1): Enables or disables 8-bit quantization to reduce memory usage.
    device: # string, default: 'cuda': The computation device, such as 'cuda' or 'cpu'.
    low_cpu: # bool (0/1): Enables the low CPU usage mode.
prompt: # string: The instruction prompt to use, formatted as '<dirname>-<filename>'.
        # Example: for '/prompt/naive/optim.txt', the value should be 'naive-optim'.
run_params:
  num_per_inference: # int: The number of data points generated per image.
  use_img_ext: # bool (0/1): Whether to include the image file extension in img_id during data processing.
  q_prefix: # list of strings: Question prefixes used for question generation.
  q_prefix_prop: # list of ints: The sampling proportion for each prefix in q_prefix (one entry per prefix).
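For reference, a minimal filled-in example is shown below. Every value is purely illustrative (the model tag, seed, counts, and question prefixes are placeholders we chose, not shipped defaults; the exact nesting may differ from the sample configs):

test_name: llava_demo
seed: 42
dataset:
  image_count: 100
  use_scene_graph: 1
model:
  name: llava-hf/llava-1.5-7b-hf
  path: llava-hf/llava-1.5-7b-hf
  family: llava
  params:
    use_8_bit: 1
    device: cuda
    low_cpu: 1
prompt: naive-optim  # resolves to /prompt/naive/optim.txt
run_params:
  num_per_inference: 3
  use_img_ext: 0
  q_prefix: ['What', 'Where', 'Why']
  q_prefix_prop: [2, 1, 1]  # 'What' is drawn twice as often as 'Where' or 'Why'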
The configs used to generate the Hugging Face datasets can be found in /src/config/sample.
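As a quick sanity check of how q_prefix and q_prefix_prop pair up, the following minimal Python sketch loads a config and draws a weighted prefix. It assumes PyYAML is available; the file name and both helper functions are hypothetical illustrations, not functions shipped with this repo.

import random

import yaml  # PyYAML; assumed to be available in the environment

def load_config(path):
    # Hypothetical helper: parse an experiment .yml into a plain dict.
    with open(path) as f:
        return yaml.safe_load(f)

def sample_prefix(cfg):
    # Hypothetical helper: draw one question prefix, weighted by q_prefix_prop.
    run = cfg["run_params"]
    prefixes, weights = run["q_prefix"], run["q_prefix_prop"]
    assert len(prefixes) == len(weights), "q_prefix and q_prefix_prop must be the same length"
    return random.choices(prefixes, weights=weights, k=1)[0]

if __name__ == "__main__":
    cfg = load_config("src/config/sample/example.yml")  # hypothetical sample path
    random.seed(cfg["seed"])  # reproducibility, per the 'seed' field
    print(sample_prefix(cfg))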
@article{irawan2024towards,
title={Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models},
author={Irawan, Patrick Amadeus and Winata, Genta Indra and Cahyawijaya, Samuel and Purwarianti, Ayu},
journal={arXiv preprint arXiv:2409.14785},
year={2024}
}