
Multimodal Search With TIRG & Fashion200k

In this example we build a multimodal search engine for image retrieval using TIRG (Composing Text and Image for Image Retrieval). We use the Fashion200k dataset, where the input query is in the form of a clothing image plus some text that describes the desired modifications to the image.

At index time we encode the images with TIRG's image encoder. At query time we use the feature embeddings constructed by TIRG's multimodal encoder based on both the input image and text to search over the indexed images. The query text is the modification we want to apply over the query image.
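The index/query split described above can be sketched in plain Python. This is only an illustration under stated assumptions: the vectors are made up, `compose` is a hypothetical stand-in (a simple element-wise sum) and not TIRG's actual composition function, and ranking by cosine similarity is an assumption about the similarity measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compose(image_vec, text_vec):
    """Hypothetical stand-in for the multimodal encoder: element-wise sum."""
    return [i + t for i, t in zip(image_vec, text_vec)]


def search(index, query_vec, top_k=2):
    """Return the ids of the top_k indexed images most similar to the query."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine(index[doc_id], query_vec),
        reverse=True,
    )
    return ranked[:top_k]


# Index time: image-only embeddings (toy values, not real TIRG output).
index = {
    "blue_dress": [1.0, 0.0, 0.0],
    "red_dress": [1.0, 1.0, 0.0],
    "red_shirt": [0.0, 1.0, 1.0],
}

# Query time: combine the query image embedding with the text embedding.
# Here: "blue dress" image + "change color to red" text (toy vectors).
query = compose([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
results = search(index, query)
```

The point of the sketch is that the index only ever holds image embeddings, while the query embedding is built from both modalities and must live in the same vector space for the comparison to be meaningful.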

TIRG's multimodal encoder requires both image and text to create the final encoding. This is made possible by leveraging the capabilities of Jina's MultiModalEncoder to handle any type of modality.

The Fashion200k model was only trained for certain types of image modifications, such as types of dresses, colors or lengths. Hence, it is limited in the types of modifications it can do, e.g. replace with 3/4 length, replace with beige.

Note: The TIRG paper reports a Recall@1 of 14.1 for the Fashion200k dataset and some queries might not have good results.
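To put that number in context: assuming the 14.1 is a percentage, roughly one query in seven returns a correct image as its top result. Below is a minimal sketch of how Recall@k is commonly computed. The query ids, rankings, and relevance sets are made up, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(ranked_results, relevant, k=1):
    """Fraction of queries with at least one relevant item in the top k results."""
    if not ranked_results:
        return 0.0
    hits = 0
    for query_id, ranking in ranked_results.items():
        if any(doc_id in relevant[query_id] for doc_id in ranking[:k]):
            hits += 1
    return hits / len(ranked_results)


# Toy data: rankings returned for three queries and their correct targets.
ranked_results = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
}
relevant = {"q1": {"a"}, "q2": {"e"}, "q3": {"z"}}

r1 = recall_at_k(ranked_results, relevant, k=1)  # only q1 hits at rank 1
r2 = recall_at_k(ranked_results, relevant, k=2)  # q1 and q2 hit within rank 2
```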

Table of Contents

Download and Extract Data
Build Encoder Images
Index Image Data
Query
Troubleshooting
Documentation
Community
License

Download and Extract Data

Run the following script to download the data from Kaggle.

Note: the size of the dataset is 6GB.

bash ./get_data.sh data/

Alternatively, you can download and extract the data from Google Drive.

Index Image Data

Index 1000 images. This can take some time, so you may want to try a smaller number first. We use a custom TirgImageEncoder to encode the images. Jina normalizes the images before sending them to the encoder. If you index a large dataset, we recommend increasing the number of shards and the parallelization.

python app.py --task index -n 1000 -overwrite True

If it is running successfully, you should be able to see and scroll through the logs in the console and in the dashboard.
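As an illustration of the normalization step mentioned in this section, here is a minimal sketch of per-channel image normalization. It is not Jina's implementation: `normalize_pixel` and `normalize_image` are hypothetical helpers, and the default mean and standard deviation are the widely used ImageNet statistics, assumed here rather than read from this example's configuration.

```python
def normalize_pixel(rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    The default mean/std are an assumption (ImageNet statistics), not values
    taken from this example's configuration.
    """
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


def normalize_image(pixels):
    """Normalize an image given as rows of (R, G, B) tuples."""
    return [[normalize_pixel(p) for p in row] for row in pixels]


# A hypothetical 1x2 image: one black pixel and one white pixel.
image = [[(0, 0, 0), (255, 255, 255)]]
normalized = normalize_image(image)
```

Whatever statistics are used, the same normalization has to be applied to indexed images and to query images so their embeddings remain comparable.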

Query

The command below starts the server, where you can run your query and see the results in a pop-up. TIRG's multimodal encoder requires both the input image and the text to create the final encoding, which Jina's MultiModalEncoder makes possible. We use the QueryLanguageDriver to route text and image documents based on their modality.

python app.py --task query --image_path path_to_image --text_query 'change color to red'
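The modality-based routing described above can be sketched as follows. This is a conceptual illustration only, not Jina's API: `Chunk`, `route_by_modality`, and `ready_for_multimodal_encoding` are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    modality: str  # "image" or "text"
    content: str   # file path or raw text; kept as a string for this sketch


def route_by_modality(chunks):
    """Group chunks so each modality can be sent to its own processing path."""
    routes = {}
    for chunk in chunks:
        routes.setdefault(chunk.modality, []).append(chunk)
    return routes


def ready_for_multimodal_encoding(routes, required=("image", "text")):
    """Check that every modality the multimodal encoder needs is present."""
    return all(routes.get(m) for m in required)


# The two parts of the query from the command above.
query_chunks = [
    Chunk("image", "path_to_image"),
    Chunk("text", "change color to red"),
]
routes = route_by_modality(query_chunks)
complete = ready_for_multimodal_encoding(routes)
```

A query that supplies only the image or only the text would fail the completeness check, which mirrors the requirement that both inputs are needed for the final encoding.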

Troubleshooting

Memory Issues

If you are using Docker Desktop, make sure to assign enough memory to your Docker containers, especially when you run multiple replicas. The settings shown in the original screenshot were for macOS with two replicas.

Documentation

The best way to learn Jina in depth is to read our documentation. Documentation is built on every push, merge, and release event of the master branch. You can find more details about the following topics in our documentation.

Community

Slack channel - a communication platform for developers to discuss Jina
Community newsletter - subscribe to the latest updates, releases, and event news of Jina
LinkedIn - get to know Jina AI as a company and find job opportunities
Twitter - follow us and interact with us using the hashtag #JinaSearch
Company - learn more about our company; we are fully committed to open source!

License

Jina is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
