# Med-VQA

In this repository we test three VQA models on the ImageCLEF 2019 dataset. Two of them are built on top of Facebook AI Research's Multi-Modal Framework (MMF).

| Model Name                               | Accuracy | Number of Epochs |
| ---------------------------------------- | -------- | ---------------- |
| Hierarchical Question-Image Co-attention | 48.32%   | 42               |
| MMF Transformer                          | 51.76%   | 30               |
| MMBT                                     | 86.78%   | 30               |

## Test them for yourself!

Download the dataset from here and place it in a directory named `/dataset/med-vqa-data/` inside the directory where this repository is cloned.
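
For reference, the expected layout (assuming the path is relative to the repository root) would look roughly like this; the contents of `med-vqa-data/` are simply whatever the downloaded archive extracts to:

```
.
└── dataset/
    └── med-vqa-data/
        └── ...   # extracted VQA-Med 2019 files
```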

MMF Transformer:

```bash
mmf_run config=projects/hateful_memes/configs/mmf_transformer/defaults.yaml \
    model=mmf_transformer \
    dataset=hateful_memes \
    training.checkpoint_interval=100 \
    training.max_updates=3000
```

MMBT:

```bash
mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml \
    model=mmbt \
    dataset=hateful_memes \
    training.checkpoint_interval=100 \
    training.max_updates=3000
```

Hierarchical Question-Image Co-attention:

```bash
cd hierarchical
python main.py
```

## Dataset details

The dataset used for training the models was the VQA-Med dataset, taken from the "ImageCLEF 2019: Visual Question Answering in Medical Domain" competition. Below are a few plots of some statistics of the dataset.

- Distribution of the types of questions in the dataset.
- Frequency of words in the answers.
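
A minimal sketch of how such statistics could be computed, assuming the question-answer annotations are available as a pipe-separated text file under `dataset/med-vqa-data/` (the file name below is hypothetical; adjust the path and the parsing to match your copy of the dataset):

```python
from collections import Counter

# Hypothetical path; point this at the extracted QA text file of the dataset.
QA_FILE = "dataset/med-vqa-data/train_qa_pairs.txt"

question_types = Counter()
answer_words = Counter()

with open(QA_FILE, encoding="utf-8") as f:
    for line in f:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        question, answer = parts[-2], parts[-1]
        if question:
            # Crude question-type label: the first word of the question.
            question_types[question.split()[0].lower()] += 1
        answer_words.update(answer.lower().split())

print("Question-type distribution:", question_types.most_common())
print("Most frequent answer words:", answer_words.most_common(20))
```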