PaperMatch: arXiv Search with Embeddings and Milvus

Backend at embed_arxiv_simpler

This project allows users to search for arXiv papers either by ID or abstract. The search functionality is powered by a machine learning embedding model and Milvus, a vector database. Gradio is used to create a user-friendly web interface for interaction.

See implemented demo at papermatch.mitanshu.tech

See full explanation at the corresponding blog post: mitanshu.tech/posts/papermatch

Features

- Search by Abstract: Converts the abstract into a vector and finds similar papers by cosine similarity.
- Search by ID: Retrieves paper information directly by arXiv ID.
- Top K Results: Displays the top K most similar results returned by Milvus.
- Embedding Model: mixedbread-ai/mxbai-embed-large-v1, used here with either float32 or binary embeddings (see Usage).
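
To make the cosine-similarity ranking concrete, here is a minimal pure-Python sketch. It is not the code in app.py: in the real app Milvus performs this search, and the function names and toy 3-dimensional vectors below are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=3):
    """Return the k (paper_id, score) pairs most similar to the query vector."""
    scored = [(paper_id, cosine_similarity(query, vec))
              for paper_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real embeddings (hypothetical data).
corpus = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
print(results)
```

A vector database such as Milvus does the same ranking with an index, so it does not have to compare the query against every stored vector.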

Requirements

- Python 3.10+
- Gradio for the frontend.
- Milvus for vector similarity search.
- Node.js for server-side rendering (SSR).

Installation

Clone the repository:

```
git clone https://github.com/mitanshu7/PaperMatch.git
cd PaperMatch
```

Create a virtual environment (optional but recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Set up app.py:

To create embeddings through the Mixedbread API, keep LOCAL=False:
- Get your key from Mixedbread and paste it into the .env file. See .env.sample for the expected config.
Keep FLOAT=True to use float32 embeddings; otherwise the app uses binary embeddings.
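
The difference between float32 and binary embeddings can be sketched as follows. This assumes the common scheme of thresholding each dimension at zero and packing the resulting bits into bytes; the README does not state how app.py or the Mixedbread API produces binary embeddings, and the helper names and sample vectors are hypothetical.

```python
def to_binary(embedding):
    """Quantize a float vector to packed bits: 1 where the value is > 0, else 0."""
    bits = [1 if x > 0 else 0 for x in embedding]
    # Pad to a multiple of 8 so the bits pack evenly into bytes.
    bits += [0] * (-len(bits) % 8)
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

def hamming_distance(a, b):
    """Number of differing bits between two equal-length packed vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Made-up 8-dimensional vectors; real embeddings have far more dimensions.
vec_1 = [0.3, -0.1, 0.8, -0.5, 0.2, 0.0, -0.7, 0.9]
vec_2 = [0.4, 0.2, 0.6, -0.3, -0.1, -0.2, -0.9, 0.5]
bin_1, bin_2 = to_binary(vec_1), to_binary(vec_2)
print(bin_1.hex(), bin_2.hex(), hamming_distance(bin_1, bin_2))
```

Binary vectors store one bit per dimension instead of 32, trading some ranking precision for much smaller storage. They are typically compared by Hamming distance rather than cosine similarity.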

Run the Gradio app:
```
python app.py
```
Interact with the web interface:
- Open your web browser and go to http://localhost:7860 to access the Gradio interface.
- Use the search bar to enter an arXiv ID or an abstract, then view the search results.

Example

Here is a basic example of how to use the search feature:

Search by Abstract:
- Enter the abstract of the paper in the provided text box.
- The system will convert it to a vector, query Milvus, and return the most relevant papers.
Search by ID:
- Input an arXiv ID directly.
- Retrieve and display the corresponding paper details.
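
Since both kinds of input go into the same search bar, the app has to decide which path a query takes. One way to do this is sketched below, assuming the decision is made by matching the input against the arXiv identifier formats. The README does not say how app.py actually does it, and the names `looks_like_arxiv_id` and `route_query` are hypothetical.

```python
import re

# New-style IDs look like 1706.03762 or 0704.0001v2;
# old-style IDs look like hep-th/9901001 or math.GT/0309136.
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
OLD_STYLE = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")

def looks_like_arxiv_id(text):
    """True if the stripped input matches a known arXiv ID pattern."""
    text = text.strip()
    return bool(NEW_STYLE.match(text) or OLD_STYLE.match(text))

def route_query(text):
    """Return which search path a query would take: 'id' or 'abstract'."""
    return "id" if looks_like_arxiv_id(text) else "abstract"

for query in ["1706.03762", "hep-th/9901001", "We propose a new architecture."]:
    print(repr(query), "->", route_query(query))
```

Anything that does not match an ID pattern is treated as abstract text and would be embedded before querying Milvus.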

Run at startup (systemd):

Create the folder with mkdir -p ~/.config/systemd/user/ if it doesn't already exist.
Create a service file with nano ~/.config/systemd/user/papermatch.service and the following contents. This assumes the Miniforge package manager with an environment named papermatch. Replace $USER with your username, since systemd does not expand environment variables in WorkingDirectory:

```
[Unit]
Description=PaperMatch App
After=network.target

[Service]
WorkingDirectory=/home/$USER/PaperMatch/
ExecStart=/bin/bash -c "source /home/$USER/miniforge3/bin/activate papermatch && python app.py"
Restart=always

[Install]
WantedBy=default.target
```

Run systemctl --user daemon-reload to reload systemd.
Run systemctl --user start papermatch.service to start the app.
Run systemctl --user enable papermatch.service to start the app automatically at startup.

Contributing

Feel free to contribute to the project by submitting issues, pull requests, or suggestions.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or feedback, please contact [email protected].
