🌐 Homepage | 🖼️ Dataset | 🤗 HuggingFace | 📖 Paper
INQUIRE is an expert-level text-to-image retrieval benchmark designed to challenge multi-modal models.
Please note that this repository is preliminary. Both the code and dataset will be updated soon.
- 🚀 [2024-11-06] The paper for INQUIRE is up on arXiv! Check it out here.
- 🚀 [2024-10-08] INQUIRE was accepted to NeurIPS 2024 (Datasets and Benchmarks Track)!
- 🚀 [2024-06-07] INQUIRE is up!
- Large (5 million images) and exhaustively annotated (1-1.5k relevant images per query)
- Queries come from experts (e.g., ecologists, biologists, ornithologists, entomologists, oceanographers)
- Supports two-stage retrieval with CLIP models and reranking with large multi-modal models.
- Includes pre-computed embeddings and model outputs for faster evaluation.
The INQUIRE benchmark and the iNaturalist 2024 dataset (iNat24) are available for public download. Please find information and download links here.
Clone the repository and navigate into it:
```bash
git clone https://github.com/inquire-benchmark/INQUIRE.git
cd INQUIRE
```
If you'd like, you can create a new environment in which to set up the repo:
```bash
conda create -n inquire python=3.10
conda activate inquire
```
Then, install the dependencies:
```bash
pip install -r requirements.txt
```
Our evaluations use pre-computed CLIP embeddings over iNat24. If you'd like to replicate our evaluations or just work with these embeddings, please download them here.
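Below is a minimal sketch of how you might load the pre-computed embeddings once downloaded. The file names and array layout here are assumptions for illustration only; check the download page for the actual format shipped with each model.

```python
import numpy as np

# Hypothetical files: an (N, D) array of image embeddings and a parallel
# array of iNat24 image ids. Adjust paths/format to match the download.
image_embeds = np.load("embeddings/inat24_image_embeddings.npy")  # assumed path
image_ids = np.load("embeddings/inat24_image_ids.npy")            # assumed path

# CLIP-style retrieval uses cosine similarity, so L2-normalize once up front.
image_embeds = image_embeds / np.linalg.norm(image_embeds, axis=1, keepdims=True)
print(image_embeds.shape, len(image_ids))
```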
INQUIRE-Fullrank is the full-dataset retrieval task, starting from all 5 million images of iNat24. We evaluate one-stage retrieval, using similarity search with CLIP-style models, and two-stage retrieval, where after the initial retrieval, a large multi-modal model is used to rerank the images.
To evaluate full-dataset retrieval with different CLIP-style models, you don't need all 5 million images, only their embeddings. You can download our pre-computed embeddings for a variety of models here. Then, use the following command to evaluate CLIP retrieval:
```bash
python src/eval_fullrank.py --split test --k 50
```
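Under the hood, first-stage retrieval is plain similarity search: embed the query text with the same CLIP model that produced the image embeddings, then take the top-k images by cosine similarity. A minimal sketch is below, reusing `image_embeds` from the loading sketch above; the model name is illustrative, and this is not the exact code path of `eval_fullrank.py`.

```python
import numpy as np
import open_clip
import torch

# Illustrative model choice; use whichever model the embeddings were computed with.
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def retrieve_top_k(query: str, image_embeds: np.ndarray, k: int = 50) -> np.ndarray:
    """Return indices of the top-k images by cosine similarity to the query."""
    with torch.no_grad():
        text_feat = model.encode_text(tokenizer([query]))
        text_feat = torch.nn.functional.normalize(text_feat, dim=-1).numpy()[0]
    scores = image_embeds @ text_feat  # image_embeds assumed L2-normalized
    return np.argsort(-scores)[:k]

top_idx = retrieve_top_k("A mongoose standing upright alert", image_embeds, k=50)
```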
After the first stage, we can use large multi-modal models to rerank the top-k retrievals and improve results. This stage requires access to the iNat24 images, which you can download here. To run second-stage retrieval, use the following command:
```bash
python src/eval_fullrank_two_stage.py --split test --k 50 --from_k 50
```
The `from_k` parameter sets the number of top CLIP retrievals to rerank with the large multi-modal model; after reranking, only the top 50 are kept for the final evaluation. In our paper, we use `from_k` values of 50 and 100.
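Structurally, the second stage just rescores the top `from_k` first-stage candidates with a stronger (but slower) model and re-sorts them. The sketch below shows that shape, building on `retrieve_top_k` and `image_ids` from the sketches above; `score_with_lmm` is a hypothetical callable standing in for whatever multi-modal scorer you use, and the actual prompting logic lives in `eval_fullrank_two_stage.py`.

```python
def two_stage_retrieve(query, image_embeds, image_ids, score_with_lmm,
                       from_k=100, k=50):
    """Rerank the top `from_k` CLIP retrievals with a multi-modal scorer, keep top `k`.

    `score_with_lmm(query, image_id) -> float` is a hypothetical function that
    loads the iNat24 image and asks a large multi-modal model how relevant it is.
    """
    candidate_idx = retrieve_top_k(query, image_embeds, k=from_k)    # stage 1: CLIP
    rescored = [(score_with_lmm(query, image_ids[i]), i) for i in candidate_idx]
    rescored.sort(reverse=True)                                       # stage 2: LMM
    return [i for _, i in rescored[:k]]
```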
We recommend starting here, as INQUIRE-Rerank is much smaller and easier to work with. INQUIRE-Rerank is available on 🤗 HuggingFace!
INQUIRE-Rerank evaluates reranking performance from a fixed initial retrieval of 100 images per query (retrieved with OpenCLIP's CLIP ViT-H-14-378). For each query (e.g., "A mongoose standing upright alert"), your task is to reorder the 100 images so that as many relevant images as possible appear at the top of the reranked order.
There are no extra requirements for evaluating INQUIRE-Rerank! The data will automatically download from HuggingFace if you don't already have it.
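If you'd like to explore the data directly rather than going through the evaluation scripts, it can be loaded with the `datasets` library. The dataset id and split name below are placeholders; use the identifier and splits shown on the HuggingFace page.

```python
from datasets import load_dataset

# Placeholder dataset id; substitute the identifier listed on the
# INQUIRE-Rerank HuggingFace page. Split name is assumed; check the dataset card.
ds = load_dataset("<inquire-rerank-dataset-id>", split="test")

# Inspect the schema before building a reranker; field names are not assumed here.
print(ds.features)
print(ds[0])
```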
Evaluate reranking performance with CLIP models:
```bash
python src/eval_rerank_with_clip.py --split test
```
Evaluate reranking performance with large multi-modal models such as LLaVA-34B:
```bash
python src/eval_rerank_with_llm.py --split test
```
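Either way, reranking quality comes down to how early the relevant images appear in the reordered list. A ranking metric such as binary average precision captures this; the sketch below is only meant to show what "more relevant images at the top" means quantitatively, and is not necessarily the exact metric computed by the evaluation scripts.

```python
def average_precision(ranked_relevance: list[int]) -> float:
    """AP for one query, given binary relevance labels in ranked order.

    ranked_relevance[i] is 1 if the image at rank i+1 is relevant, else 0.
    """
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Example: relevant images at ranks 1, 3, and 4 out of 6 candidates.
print(average_precision([1, 0, 1, 1, 0, 0]))  # ~0.806
```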
Since inference can take a long time, we've pre-computed the outputs for all large multi-modal models we work with! You can download these here.
If you use INQUIRE or find our work helpful, please consider starring our repo and citing our paper. Thanks!
```bibtex
@article{vendrow2024inquire,
  title={INQUIRE: A Natural World Text-to-Image Retrieval Benchmark},
  author={Vendrow, Edward and Pantazis, Omiros and Shepard, Alexander and Brostow, Gabriel and Jones, Kate E and Mac Aodha, Oisin and Beery, Sara and Van Horn, Grant},
  journal={NeurIPS},
  year={2024},
}
```