pokedex-with-bit

Search Pokemon Images with Jina

In this example, we use BiT (Big Transfer) to build an end-to-end neural image search system. You can use this demo to index an image dataset and query it for the most similar images.

Features that come out of the box:

  • Interactive query
  • Index with shards
  • REST and gRPC gateway
  • Dashboard monitor

To save you from dependency hell, we'll use the containerized version in these instructions. That means you only need to have Docker installed. No Python virtualenv, no Python package (un)install.

NOTE: Use Python 3.7 for this example.


Query from Docker

I want Pokémon! I don't care about Jina cloud-native neural search or whatever big names you throw around, just show me the Pokémon!

We have a pre-built Docker image ready to use. Just run this in your console:

docker run -p 45678:45678 jinahub/app.example.pokedexwithbit:0.0.1-0.9.20

Now you're ready to query! You have two options:

  • You can use Jinabox.js to find the Pokémon that matches most closely. Just set the endpoint to http://127.0.0.1:45678/api/search and drag an image from the thumbnails on the left or from your file manager.
  • Or you can query it with curl, JavaScript, or any other HTTP client by sending a POST request. Details are in the Diving Deeper section below.

Run without Docker

Download and Extract Data

For this example we're using Pokémon sprites from veekun.com. To download them, run:

sh ./get_data.sh

Download and Extract Pretrained Model

In this example we use the BiT (Big Transfer) model. To download it, run:

sh ./download.sh

Index Data

python app.py -t index

After this you should see a new workspace folder, which contains all the encoded data generated during indexing.

Query Data

python app.py -t query_restful

Then follow the Jinabox instructions in the Query from Docker section above.

Diving Deeper

Jina's REST API uses the data URI scheme to represent multimedia data. To query your indexed data, encode your picture(s) as data URIs and send them in a POST request to http://0.0.0.0:45678/api/search, e.g.:

curl --verbose --request POST -d '{"top_k": 10, "mode": "search",  "data": ["data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAICAIAAABLbSncAAAA2ElEQVR4nADIADf/AxWcWRUeCEeBO68T3u1qLWarHqMaxDnxhAEaLh0Ssu6ZGfnKcjP4CeDLoJok3o4aOPYAJocsjktZfo4Z7Q/WR1UTgppAAdguAhR+AUm9AnqRH2jgdBZ0R+kKxAFoAME32BL7fwQbcLzhw+dXMmY9BS9K8EarXyWLH8VYK1MACkxlLTY4Eh69XfjpROqjE7P0AeBx6DGmA8/lRRlTCmPkL196pC0aWBkVs2wyjqb/LABVYL8Xgeomjl3VtEMxAeaUrGvnIawVh/oBAAD///GwU6v3yCoVAAAAAElFTkSuQmCC", "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAICAIAAABLbSncAAAA2ElEQVR4nADIADf/AvdGjTZeOlQq07xSYPgJjlWRwfWEBx2+CgAVrPrP+O5ghhOa+a0cocoWnaMJFAsBuCQCgiJOKDBcIQTiLieOrPD/cp/6iZ/Iu4HqAh5dGzggIQVJI3WqTxwVTDjs5XJOy38AlgHoaKgY+xJEXeFTyR7FOfF7JNWjs3b8evQE6B2dTDvQZx3n3Rz6rgOtVlaZRLvR9geCAxuY3G+0mepEAhrTISES3bwPWYYi48OUrQOc//IaJeij9xZGGmDIG9kc73fNI7eA8VMBAAD//0SxXMMT90UdAAAAAElFTkSuQmCC"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/api/search'

The JSON payload syntax and specification can be found in the Jina docs.
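For scripted queries, the same request can be sent from Python. The following is a minimal sketch using only the standard library (`base64`, `json`, `urllib.request`). It assumes the server from this README is listening on port 45678 and that your query images are PNG files (the MIME type is hardcoded, as in the curl example). The helper names `to_data_uri`, `build_search_payload`, and `search` are hypothetical, not part of Jina, and the reply is returned as parsed JSON without assuming anything about its structure.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Endpoint taken from the curl example above.
SEARCH_URL = "http://0.0.0.0:45678/api/search"


def to_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URI (assumes the image is a PNG)."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def build_search_payload(data_uris, top_k=10):
    """Build the same JSON body as the curl example: top_k, mode and data."""
    return {"top_k": top_k, "mode": "search", "data": list(data_uris)}


def search(image_paths, top_k=10, url=SEARCH_URL):
    """POST the given image files to the search endpoint and return the parsed JSON reply."""
    uris = [to_data_uri(Path(p).read_bytes()) for p in image_paths]
    body = json.dumps(build_search_payload(uris, top_k)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `search(["my_pokemon.png"], top_k=5)` (the file name is a placeholder) sends one image and returns the server's JSON reply. If `0.0.0.0` does not work as a destination address on your system, try `http://127.0.0.1:45678/api/search`, the endpoint used for Jinabox above.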

The above explains how to use the REST gateway. By default, however, Jina uses a gRPC gateway, which offers higher performance and richer features. Read our documentation on Jina IO for more information.

Build a Docker Image

After playing with it for a while, you may want to change the code and rebuild the image. Simply run:

docker build -t jinaai/app.examples.pokedexwithbit .

Monitor Progress

If it's running successfully, you should be able to see logs scrolling in the console and in the dashboard:

(Screenshots: console logs and the dashboard.)

After indexing, you'll see a set of chunk_compound_indexer-* directories under $(pwd)/workspace. This is because we set shards to 8.
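If you want to check this from a script, the sketch below counts the `chunk_compound_indexer-*` directories under the workspace. It assumes the default `workspace` folder name and the directory prefix mentioned above; `shard_dirs` and `check_shards` are hypothetical helpers, and `expected=8` simply mirrors the shard count stated here, so change it if you configure a different number of shards.

```python
from pathlib import Path

# Directory prefix named in this README; adjust it if your shards are named differently.
SHARD_PREFIX = "chunk_compound_indexer-"


def shard_dirs(workspace="workspace", prefix=SHARD_PREFIX):
    """Return the sorted shard directories under the workspace (empty if it is missing)."""
    root = Path(workspace)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob(prefix + "*") if p.is_dir())


def check_shards(workspace="workspace", expected=8):
    """Return True if the number of shard directories matches the expected shard count."""
    found = shard_dirs(workspace)
    if len(found) != expected:
        print(f"expected {expected} shard directories, found {len(found)} in {workspace}")
    return len(found) == expected
```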

Troubleshooting

Memory Issues

The BiT model is fairly resource-hungry. If you are using Docker Desktop, make sure to assign enough memory to your Docker container, especially when you run multiple shards. Below are my macOS settings with two shards:

(Screenshot: Docker Desktop memory settings on macOS with two shards.)

Incremental Indexing

Incremental indexing and entry-level deletion are not yet supported in this demo. Indexing the same data twice may not raise an exception, but it can produce unexpected results. So make sure to clean $(pwd)/workspace before each run.
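Since the workspace has to be cleaned before every run, you may prefer to script that step. This is a minimal sketch, assuming the default `workspace` folder in the current directory; `clean_workspace` is a hypothetical helper (the shell equivalent is `rm -rf "$(pwd)/workspace"`). It deletes the whole folder, so only point it at the index workspace.

```python
import shutil
from pathlib import Path


def clean_workspace(workspace="workspace"):
    """Delete the index workspace so the next indexing run starts from scratch.

    Returns True if a folder was removed, False if there was nothing to delete.
    """
    root = Path(workspace)
    if not root.exists():
        return False
    shutil.rmtree(root)
    return True
```

Call `clean_workspace()` right before each `python app.py -t index` run.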

Running into other problems? Check our troubleshooting guide or submit a GitHub issue.

Documentation

The best way to learn Jina in depth is to read our documentation, which is built on every push, merge, and release event of the master branch.

Community

  • Slack channel - a communication platform for developers to discuss Jina
  • Community newsletter - subscribe to the latest updates, releases, and event news from Jina
  • LinkedIn - get to know Jina AI as a company and find job opportunities
  • Twitter - follow us and interact with us using the hashtag #JinaSearch
  • Company - learn more about our company; we are fully committed to open source!

License

Copyright (c) 2021 Jina AI Limited. All rights reserved.

Jina is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.