EDIS: Entity-Driven Image Search over Multimodal Web Content

Introduction

We introduce Entity-Driven Image Search (EDIS), a challenging dataset for cross-modal image search in the news domain. EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description.
Our experiments show that EDIS, with its dense entities and large-scale candidate set, remains challenging for state-of-the-art methods.

Environment

git clone https://github.com/emerisly/EDIS.git
cd EDIS/
conda create -n edis python
conda activate edis
pip install -r requirements.txt

Datasets

Download the EDIS images (split into three parts) and extract them:

curl -L 'https://cornell.box.com/shared/static/w6rnuk14plns7xs0po6ksxwwvxz6s76y.part00' --output edis_image.tar.gz.part00
curl -L 'https://cornell.box.com/shared/static/vi3hzcb340efh4fko8xtycjh1cn6r79g.part01' --output edis_image.tar.gz.part01
curl -L 'https://cornell.box.com/shared/static/92t2nl89q8wxf5kk0ds6reba2wp9jeqi.part02' --output edis_image.tar.gz.part02
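The commands above only download the archive parts. A minimal sketch for reassembling and extracting them follows. It assumes the three files are byte-level splits of a single gzip-compressed tarball, as the .tar.gz.partNN names suggest; the function name extract_split_tarball is hypothetical.

```shell
# Sketch: reassemble split .tar.gz parts and extract them.
# Assumption: the parts are consecutive byte chunks of ONE gzip tarball,
# so concatenating them in order restores the original archive.
extract_split_tarball() {
  # $1: common prefix of the parts, e.g. edis_image.tar.gz.part
  local prefix="$1" part
  for part in "${prefix}00" "${prefix}01" "${prefix}02"; do
    # Stop early if a download failed or produced an empty file.
    if [ ! -s "$part" ]; then
      echo "missing or empty: $part" >&2
      return 1
    fi
  done
  # Concatenate in order and stream straight into tar (no intermediate file).
  cat "${prefix}00" "${prefix}01" "${prefix}02" | tar -xzf -
}
```

From the directory holding the downloads, run extract_split_tarball edis_image.tar.gz.part. Streaming through tar avoids writing a second full-size copy of the archive to disk.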

Download the EDIS JSON files and unzip them:

curl -L 'https://cornell.box.com/shared/static/0aln48iy3wkvzg2iklczazmqdpdf83lc' --output edis_json.zip
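A minimal sketch for extracting the JSON archive, checking its integrity first so a truncated download fails loudly instead of leaving partial files. The helper name unzip_checked and the destination directory are hypothetical choices, not something the repository requires.

```shell
# Sketch: verify a zip archive, then extract it into a chosen directory.
unzip_checked() {
  local archive="$1" dest="$2"
  # -t: test archive integrity without extracting; -q: quiet.
  unzip -tq "$archive" >/dev/null || return 1
  # -o: overwrite existing files without prompting; -d: extraction directory.
  unzip -oq "$archive" -d "$dest"
}
```

For example, unzip_checked edis_json.zip edis_json extracts into edis_json/ (a placeholder name); point your configs at wherever you place the files.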

Training

Fine-tune
Set image_root in configs/retrieval_edis.yaml to the directory containing the extracted EDIS images, then launch fine-tuning (the example below uses 4 GPUs):
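The image_root edit can also be scripted. The sketch below covers both the fine-tuning and the evaluation configs. It assumes image_root is a top-level key written on its own line and that GNU sed is available. EDIS_IMAGE_DIR and set_image_root are hypothetical names.

```shell
# Sketch: set image_root in both configs to the extracted image directory.
# Assumptions: image_root is a top-level key on its own line; GNU sed
# (BSD/macOS sed needs `sed -i ''`); the path contains no '|', '&' or "'".
set_image_root() {
  local config="$1" image_dir="$2"
  # Rewrite the whole image_root line; '|' as delimiter keeps '/' in paths literal.
  sed -i "s|^image_root:.*|image_root: '${image_dir}'|" "$config"
}

EDIS_IMAGE_DIR="${EDIS_IMAGE_DIR:-/path/to/edis_images}"  # hypothetical placeholder
for cfg in configs/retrieval_edis.yaml configs/retrieval_evaluate.yaml; do
  # Skip configs that are not present (e.g. when run outside the repo root).
  if [ -f "$cfg" ]; then
    set_image_root "$cfg" "$EDIS_IMAGE_DIR"
  fi
done
```

If the key is nested or commented differently in the actual configs, editing the YAML by hand is safer.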

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --master_port=1234 train_edis.py \
--config ./configs/retrieval_edis.yaml \
--output_dir output/retrieval_edis_mblip_4gpus_5e-5
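With a different number of GPUs, --nproc_per_node should match the devices listed in CUDA_VISIBLE_DEVICES (one process per GPU). The sketch below derives one from the other. GPUS and count_gpus are hypothetical names, and the rest mirrors the command above. The output directory name suggests the released setting used 4 GPUs at a learning rate of 5e-5. With fewer GPUs the effective batch size shrinks, so results may differ.

```shell
# Sketch: derive --nproc_per_node from the GPU list so the two never disagree.
count_gpus() {
  # Count comma-separated device ids, e.g. "0,1,2,3" -> 4.
  echo "$1" | awk -F',' '{print NF}'
}

GPUS="${GPUS:-0,1,2,3}"
NPROC="$(count_gpus "$GPUS")"

# Launch only from the repository root, where train_edis.py lives.
if [ -f train_edis.py ]; then
  CUDA_VISIBLE_DEVICES="$GPUS" python -m torch.distributed.run \
    --nproc_per_node="$NPROC" --master_port=1234 train_edis.py \
    --config ./configs/retrieval_edis.yaml \
    --output_dir "output/retrieval_edis_mblip_${NPROC}gpus_5e-5"
fi
```

For example, GPUS=0,1 bash train.sh (where train.sh holds the snippet) launches two processes on GPUs 0 and 1.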

Evaluate
Set image_root in configs/retrieval_evaluate.yaml to the directory containing the extracted EDIS images, then run retrieval and compute the metrics:

python evaluate_retrieval.py --config configs/retrieval_evaluate.yaml --image_bank restricted --cuda 0
python compute_metrics.py -d output/evaluate_results

You can download the pre-trained and fine-tuned checkpoints below ("-" means no checkpoint is provided):

Checkpoints     mBLIP w/ ViT-B    mBLIP w/ ViT-L
Pre-trained     Download          Download
Fine-tuned      -                 Download

Citation

If you find this code useful for your research, please cite our paper:

@article{liu2023edis,
  title={EDIS: Entity-Driven Image Search over Multimodal Web Content},
  author={Liu, Siqi and Feng, Weixi and Chen, Wenhu and Wang, William Yang},
  journal={arXiv preprint arXiv:2305.13631},
  year={2023}
}

Acknowledgement

We thank the authors of TARA, VisualNews, BLIP, CLIP, and Pyserini for open-sourcing their work.
