
Semantic similarity search for text using approximate nearest neighbour search, cosine similarity and sentence embeddings.

abhinavthomas/semantic-similarity

Semantic Similarity

semsim is a Python 3 web application built with FastAPI that identifies the semantic similarity between a given input text and a set of candidate texts. Semantic similarity measures the degree to which two pieces of text carry the same meaning. This makes it useful for covering the many ways a thought can be expressed in language without having to enumerate them manually.
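Cosine similarity compares two embedding vectors by the cosine of the angle between them: 1.0 means they point in the same direction and 0.0 means they are orthogonal. The sketch below is a minimal pure-Python illustration; cosine_similarity is a hypothetical helper name, not taken from semsim's source.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle (and therefore the cosine) is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0 (orthogonal; length does not matter)
print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))  # about 0.7071 (45 degrees apart)
```

Because only the angle matters, two texts whose embeddings differ in magnitude but not in direction still score 1.0.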

We use the Universal Sentence Encoder Large model (version 5). The module is about 800 MB, so depending on your network speed it might take a while to load the first time you instantiate it. After that, loading should be faster because TensorFlow Hub caches modules by default (see the TF Hub documentation on caching). Once the module is loaded into memory, inference is relatively fast.
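A minimal sketch of loading the encoder once per process, assuming the tensorflow_hub package and the public TF Hub handle for Universal Sentence Encoder Large v5 (both are assumptions; semsim's own loading code may differ). get_encoder and embed are hypothetical names. Caching the loaded object with functools.lru_cache means only the first call pays the download and load cost; later calls reuse the in-memory model.

```python
from functools import lru_cache

# Assumption: public TF Hub handle for Universal Sentence Encoder Large, version 5.
USE_LARGE_V5 = "https://tfhub.dev/google/universal-sentence-encoder-large/5"


@lru_cache(maxsize=1)
def get_encoder():
    """Load the encoder on first use and return the same object afterwards.

    The first call downloads the ~800 MB module unless it is already in the
    TF Hub cache on disk; subsequent calls are served from lru_cache.
    """
    # Imported lazily so that importing this module does not pull in TensorFlow.
    import tensorflow_hub as hub

    return hub.load(USE_LARGE_V5)


def embed(texts: list[str]) -> list[list[float]]:
    """Return one 512-dimensional embedding per input text."""
    encoder = get_encoder()
    # The loaded model is callable on a batch of strings and returns a tensor.
    return encoder(texts).numpy().tolist()
```

TF Hub stores downloaded modules in a cache directory on disk; setting the TFHUB_CACHE_DIR environment variable before the first load changes where they are kept.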

See the Universal Sentence Encoder 4 documentation for details about the embeddings.

The API currently exposes the following endpoint:

  • /similarity: finds the similarity between the input text and a list of texts, and returns a JSON object with the texts as keys and their similarity scores as values.
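The /similarity behaviour described above (candidate texts as keys, similarity scores as values) can be sketched as follows. The embedding function is passed in as a parameter so the example runs without the 800 MB model; toy_embed is a hypothetical stand-in that counts vowels and carries no semantic meaning. The function and parameter names are illustrative, not semsim's actual API.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Embedder = Callable[[list[str]], list[Vector]]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norms


def similarity(text: str, texts: list[str], embed: Embedder) -> dict[str, float]:
    """Map each candidate in texts to its cosine similarity with text.

    embed is any function that turns a list of strings into a list of vectors;
    in semsim that role is played by the Universal Sentence Encoder.
    """
    vectors = embed([text, *texts])  # one batched call: query first, then candidates
    query, candidates = vectors[0], vectors[1:]
    return {t: _cosine(query, v) for t, v in zip(texts, candidates)}


def toy_embed(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in encoder: vowel counts, with no semantic meaning."""
    return [[float(s.count(v)) for v in "aeiou"] for s in batch]


print(similarity("restart system", ["reboot the machine", "bake a cake"], toy_embed))
```

Embedding the query and all candidates in one batched call is usually cheaper than one call per text. Because the result is a dictionary keyed by text, duplicate candidate texts collapse into a single entry.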

How to Contribute to this Project

Follow the steps described in CONTRIBUTE.md

How to use the latest version

  • Run the application locally:
    • Install the required libraries:
      pip install -r requirements.txt
    • Run the development server (uvicorn serves on http://127.0.0.1:8000 by default):
      uvicorn app.main:APP

Request parameters and body for the endpoints

  1. /similarity/
     • Query params: text, show
  restart system

Response (in order of similarity):

  [{"id":0,"text":""}]
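A sketch of producing the ordered response shown above from a list of candidate texts and their similarity scores. The README does not say what id refers to; this sketch assumes it is the candidate's position in the input list. rank and Match are hypothetical names.

```python
from typing import TypedDict


class Match(TypedDict):
    id: int
    text: str


def rank(texts: list[str], scores: list[float]) -> list[Match]:
    """Return the candidates ordered from most to least similar.

    Assumption: id is the candidate's position in the request's input list.
    Ties keep their input order because Python's sort is stable.
    """
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    order = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [{"id": i, "text": texts[i]} for i in order]


# Illustrative scores, not real model output.
print(rank(["bake a cake", "reboot the machine"], [0.12, 0.81]))
# [{'id': 1, 'text': 'reboot the machine'}, {'id': 0, 'text': 'bake a cake'}]
```

Sorting indices rather than texts keeps each candidate's original position available for the id field.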

Tests

No part of the code is currently covered by tests.
