Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

@inproceedings{chung2024selective,
  title={Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding},
  author={Chung, Jiwan and Lee, Sungjae and Kim, Minseo and Han, Seungju and Yousefpour, Ashkan and Hessel, Jack and Yu, Youngjae},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  year={2024}
}

Please cite our work if you find our data helpful.

Data

We recommend accessing the corpus through Hugging Face:

from datasets import load_dataset

# load main data
dset = load_dataset("jiwan-chung/visargs", "annotations")

# Load the predefined negative sets used for retrieval in the "Identification of Premises" task.
# A separate variable name keeps the main annotations in `dset` from being overwritten.
negatives = load_dataset("jiwan-chung/visargs", "negatives")

Here's an example instance:

{
    'url': 'https://i.pinimg.com/originals/5e/7f/10/5e7f108728fb848eb8e3cccfdd62ef8f.jpg',
    'visual_premises': [
        'A small plant is growing inside a plastic bag.',
        'The bag contains a bit of soil.',
        'The bag is tied at the top, enclosing the plant.'
    ],
    'conclusion': 'The image represents the struggle of nature to survive in a human-made, constraining environment, highlighting the need for environmental awareness and protection.',
    'b_box': [
        {'h': 41, 'startX': 302, 'startY': 554, 'w': 72},
        {'h': 51, 'startX': 223, 'startY': 589, 'w': 229},
        {'h': 421, 'startX': 46, 'startY': 219, 'w': 407}
    ],
    'commonsense_premises': [
        'Plants require soil, water, light, and air to grow.',
        'Plastic bags are not a natural environment for plant growth and can restrict access to necessary resources.',
        'The act of enclosing the plant in a bag could symbolize suffocation or limitation of growth.'
    ],
    'reasoning_steps': [
        '(VP1, VP2, CP1 -> IC1): The small plant is growing, showing its resilience and need for natural resources.',
        "(VP3, CP2, CP3 -> IC2): The plastic bag enclosing the plant symbolizes human-imposed constraints on nature's growth and survival.",
        "(IC1, IC2 -> C): The image represents nature's struggle to survive in a constrained environment, emphasizing the importance of environmental protection."
    ]
}
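The fields of an instance can be unpacked with a few lines of Python. The sketch below is illustrative and not part of the repository: the helper names `to_corners` and `parse_step` are hypothetical, and it assumes that `startX`/`startY` mark the top-left corner of a box in pixels, that `b_box[i]` corresponds to `visual_premises[i]`, and that every reasoning step follows the `(inputs -> output): text` pattern seen in the example above.

```python
import re

# Copy of the relevant fields from the example instance above.
instance = {
    "visual_premises": [
        "A small plant is growing inside a plastic bag.",
        "The bag contains a bit of soil.",
        "The bag is tied at the top, enclosing the plant.",
    ],
    "b_box": [
        {"h": 41, "startX": 302, "startY": 554, "w": 72},
        {"h": 51, "startX": 223, "startY": 589, "w": 229},
        {"h": 421, "startX": 46, "startY": 219, "w": 407},
    ],
    "reasoning_steps": [
        "(VP1, VP2, CP1 -> IC1): The small plant is growing, showing its resilience and need for natural resources.",
        "(VP3, CP2, CP3 -> IC2): The plastic bag enclosing the plant symbolizes human-imposed constraints on nature's growth and survival.",
        "(IC1, IC2 -> C): The image represents nature's struggle to survive in a constrained environment, emphasizing the importance of environmental protection.",
    ],
}


def to_corners(box):
    """Convert a box dict to (x1, y1, x2, y2).

    Assumption: startX/startY is the top-left corner, w/h are in pixels.
    """
    x1, y1 = box["startX"], box["startY"]
    return (x1, y1, x1 + box["w"], y1 + box["h"])


# Matches "(A, B -> OUT): free text", the pattern used in the example instance.
STEP_PATTERN = re.compile(
    r"^\((?P<inputs>[^)]*?)\s*->\s*(?P<output>\w+)\):\s*(?P<text>.*)$"
)


def parse_step(step):
    """Split one reasoning step into its input labels, output label, and text."""
    match = STEP_PATTERN.match(step)
    if match is None:
        raise ValueError(f"Unrecognized reasoning step: {step!r}")
    inputs = [name.strip() for name in match.group("inputs").split(",")]
    return {
        "inputs": inputs,
        "output": match.group("output"),
        "text": match.group("text"),
    }


# Assumption: boxes and visual premises are aligned by index.
for premise, box in zip(instance["visual_premises"], instance["b_box"]):
    print(to_corners(box), premise)

for step in instance["reasoning_steps"]:
    parsed = parse_step(step)
    print(parsed["inputs"], "->", parsed["output"])
```

Here the labels `VP`, `CP`, `IC`, and `C` appear to refer to visual premises, commonsense premises, intermediate conclusions, and the final conclusion, so the parsed steps can be read as edges of a small reasoning tree.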

Usage

Installation

    pip install "torch>=2.1"
    pip install -r requirements.txt
    pip install -e .

Data preprocessing

    python src/visarg/others/preprocess.py

Evaluation

We provide three complementary tasks for assessing a model's capacity for visual argument understanding.

Task 1 (Localization of Premises)

    python src/visarg/main.py --task 1 --model_name $MODELNAME --grounding_type "openset"
    python src/visarg/main.py --task 1 --model_name $MODELNAME --grounding_type "closedset"
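This README does not state how localization is scored. As a rough sanity check on predicted boxes, the sketch below computes intersection-over-union (IoU), a common measure for grounding tasks. It is an assumption, not the repository's evaluation code; the function names are hypothetical, and boxes are assumed to use the dataset's `startX`/`startY`/`w`/`h` layout with the origin at the top-left.

```python
def to_corners(box):
    """Convert {'startX', 'startY', 'w', 'h'} to (x1, y1, x2, y2)."""
    x1, y1 = box["startX"], box["startY"]
    return (x1, y1, x1 + box["w"], y1 + box["h"])


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given in the dataset's dict layout."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)

    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Reference box taken from the example instance; the prediction is made up.
reference = {"h": 41, "startX": 302, "startY": 554, "w": 72}
hypothetical_prediction = {"h": 50, "startX": 300, "startY": 550, "w": 80}
print(f"IoU: {iou(reference, hypothetical_prediction):.3f}")
```

A threshold on this value (0.5 is a frequent convention) would turn it into a hit-or-miss decision, but the threshold actually used by the benchmark should be taken from the paper or the evaluation script.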

Task 2 (Identification of Premises)

    python src/visarg/main.py --task 2 --model_name $MODELNAME

Task 3 (Deduction of Conclusion)

    python src/visarg/main.py --task 3 --model_name $MODELNAME --condition 0 --prompt_style 0
