
Read Before Grounding: Scene Knowledge Visual Grounding via Multi-step parsing

This is the official code implementation of the COLING 2025 paper Read Before Grounding: Scene Knowledge Visual Grounding via Multi-step parsing.

Data Preparation

Download the data by following https://github.com/zhjohnchan/SK-VG

After the download is complete, unzip the file into the current folder.

Main Experiment

First, generate the visual descriptors:

python qwen_api.py # you may need to adjust the data path

Then you can use these visual descriptors to evaluate multimodal models.

  • We have also prepared visual descriptors for each experiment in reading_results/:

    ours_{}.json is the generated result of the main experiment;

    ours_{}_baseline.json is the generated result of the ablation study;

    ours_{}_glm.json is the generated result of the analysis.
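As a minimal sketch of how these precomputed files might be consumed, the snippet below loads every ours_*.json file from reading_results/. The schema of the descriptor files is an assumption here (a JSON mapping from sample keys to generated scene descriptions); adjust to the actual format produced by qwen_api.py.

```python
import json
from pathlib import Path

def load_descriptors(path):
    """Load one visual-descriptor JSON file produced by the reading step.

    Hypothetical schema: a mapping from sample keys to generated
    scene descriptions; adapt this to the actual file contents.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def iter_descriptor_files(results_dir="reading_results"):
    """Yield every ours_*.json file under the results directory."""
    yield from sorted(Path(results_dir).glob("ours_*.json"))

if __name__ == "__main__":
    # Print a quick summary of each prepared descriptor file.
    for fp in iter_descriptor_files():
        descriptors = load_descriptors(fp)
        print(fp.name, len(descriptors))
```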

Ablation Study

python qwen_api_baseline.py # you may need to adjust the data path

Analysis

python glm4_flash.py # you may need to adjust the data path

Acknowledgement
