This is a search engine built on CLIP image and text embeddings. The basic mechanism for image search is simple: compute the cosine similarity between the CLIP embedding of the text query and the CLIP embeddings of the images in the database.
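As a rough illustration, here is a minimal sketch of that mechanism. It assumes the HuggingFace `transformers` CLIP checkpoint `openai/clip-vit-base-patch32`; the model and loading code actually used in this repo may differ.

```python
# Minimal sketch: embed a text query and a batch of images with CLIP,
# then score each image by cosine similarity to the query.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(query: str) -> torch.Tensor:
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize

def embed_images(paths: list[str]) -> torch.Tensor:
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# For unit vectors, cosine similarity reduces to a dot product:
# scores = (embed_images(paths) @ embed_text("Faces with eyeglasses").T).squeeze()
```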
There are two modes of image search:
- Base Search: In this mode, we simply compute the cosine similarities and sort them from highest to lowest, so the most relevant results appear at the top (see the first sketch after this list).
- Fair Search: In the base mode, the top results may lack diversity. To rectify that, Algorithm 2 from the paper "Using Image Fairness Representations in Diversity-Based Re-ranking for Recommendations" is implemented. The main idea of the algorithm is to penalize a candidate's relevance score when the retrieved results so far do not maximize diversity across the chosen demographic attributes (see the second sketch after this list).
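For reference, base search is just a sort over the similarity scores. A minimal sketch, assuming the unit-normalized embeddings from above stacked into NumPy arrays:

```python
import numpy as np

def base_search(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 20):
    """Return the indices and scores of the k most relevant images."""
    scores = image_embs @ query_emb   # cosine similarity of unit vectors
    order = np.argsort(-scores)       # sort from highest to lowest
    return order[:k], scores[order[:k]]
```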
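And a hedged sketch of the diversity-penalized re-ranking: this is a generic greedy (MMR-style) formulation of the idea described above, not a line-by-line transcription of the paper's Algorithm 2. Here `demographics` is a hypothetical per-image attribute representation (e.g. built from CelebA-HQ annotations, unit-normalized per row), and `lam` is an assumed relevance/diversity trade-off parameter.

```python
import numpy as np

def fair_rerank(scores: np.ndarray, demographics: np.ndarray,
                k: int = 20, lam: float = 0.7) -> list[int]:
    """Greedily pick k images, penalizing a candidate whose demographic
    representation is too similar to the results selected so far."""
    candidates = list(range(len(scores)))
    selected: list[int] = []
    while len(selected) < k and candidates:
        def objective(i: int) -> float:
            if not selected:
                return float(scores[i])
            # Penalty: highest similarity to any already-selected image's
            # demographic representation (rows assumed unit-normalized).
            penalty = (demographics[selected] @ demographics[i]).max()
            return lam * float(scores[i]) - (1 - lam) * penalty
        best = max(candidates, key=objective)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam = 1.0` this reduces to base search; lowering `lam` trades raw relevance for demographic diversity in the top results.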
These are the search results for the query "Faces with eyeglasses". For the image repository, I used CelebA-HQ.