arxiv auto workflow

This is an automatic workflow for managing arXiv papers with Notion.

If you like this project, please give it a star ✨!

background & motivation

arXiv has become a popular platform for sharing scientific papers. As an AI researcher, I get almost all of my papers from arXiv.
There is no efficient way to manage a paper's files, citations, notes, and other information in one place. EndNote mainly manages citations. ReadPaper does a good job with notes, files, and in-place translation, but lacks user-defined fields and efficient search.
Some papers are first released on arXiv and are soon accepted by a conference or journal. How do you update the BibTeX and other information when you want to cite them in your own paper?
My solution is to build a visual database for papers in Notion, define my own fields and tags for papers, and use ReadPaper to read them.

The problem is: when I come across an interesting title of a new paper, I usually have to:

open a URL to search for it
create a new page in Notion
copy and paste the title, abstract, and other information manually
download the PDF manually and store it in a local directory

This is very time-consuming and error-prone.

solution and features

My solution is to build a workflow: give it a title or an arXiv id, and the program automatically searches arXiv for the paper information, creates a new page in Notion with that information, and downloads the PDF file to a local directory.
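The lookup step can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes the public arXiv Atom API at `export.arxiv.org/api/query`, and the helper names `build_query_url` and `parse_first_entry` are made up for this example.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(arxiv_id=None, title=None):
    """Build an arXiv API query URL from an arXiv id or a paper title."""
    if arxiv_id:
        params = {"id_list": arxiv_id}
    elif title:
        # Search the title field and keep only the best match.
        params = {"search_query": f'ti:"{title}"', "max_results": 1}
    else:
        raise ValueError("either arxiv_id or title is required")
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_first_entry(atom_xml):
    """Extract basic metadata from the first <entry> of an Atom feed."""
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", ATOM)
    if entry is None:
        return None

    def text(tag):
        # Collapse the line breaks arXiv puts inside titles and abstracts.
        return " ".join(entry.findtext(f"atom:{tag}", "", ATOM).split())

    pdf_url = None
    for link in entry.findall("atom:link", ATOM):
        if link.get("title") == "pdf":
            pdf_url = link.get("href")

    return {
        "title": text("title"),
        "abstract": text("summary"),
        "authors": [a.findtext("atom:name", "", ATOM)
                    for a in entry.findall("atom:author", ATOM)],
        "abs_url": text("id"),
        "pdf_url": pdf_url,
    }
```

To use it, fetch `build_query_url(...)` with any HTTP client and pass the response body to `parse_first_entry`. The returned dict holds the fields you would then copy into a Notion page.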

1. search arxiv and get paper information

When you find a paper at an abs/pdf URL, just modify the URL using the predefined API, and the auto workflow will be launched:

[screenshots: before / after]

The files will be downloaded and the metadata will be uploaded to Notion!
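The exact URL rewrite is only shown in the screenshots, so the sketch below covers just the part that any such route needs: pulling the arXiv id out of an abs/pdf URL. The function name `extract_arxiv_id` and the regular expression are assumptions for illustration, and the ids in the usage note are placeholders.

```python
import re

# New-style ids look like 2301.01234 (optionally followed by a version, e.g. v2).
# Old-style ids look like hep-th/9901001 or math.AG/0601001.
_ARXIV_ID = re.compile(
    r"arxiv\.org/(?:abs|pdf)/"
    r"((?:\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?)",
    re.IGNORECASE,
)


def extract_arxiv_id(url):
    """Return the arXiv id found in an abs/pdf URL, or None if there is none."""
    match = _ARXIV_ID.search(url)
    return match.group(1) if match else None
```

For example, `extract_arxiv_id("https://arxiv.org/pdf/2301.01234v2.pdf")` gives `"2301.01234v2"`, and a URL that is not an arXiv abs/pdf link gives `None`.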

2. auto bibtex refresh

Access 127.0.0.1:8000/bibtex?refresh=true to refresh BibTeX entries through the Semantic Scholar API and update the bibtex field in Notion. Because the Semantic Scholar API is rate limited, the refresh runs in a new background thread with a long sleep interval between requests. Access 127.0.0.1:8000/bibtex?refresh=true&all=true to refresh the BibTeX of every item in the database, whether or not the item already has a bib entry.
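The background refresh idea can be sketched like this. It is an illustration under assumptions, not the server's actual code: `refresh_one` is a hypothetical callback that would call Semantic Scholar and update Notion for one item, and the defaults mirror the documented 200s interval with a random offset of -40 to 40s.

```python
import logging
import random
import threading
import time

log = logging.getLogger("refresh")


def next_sleep(base=200.0, jitter=40.0):
    """Return a delay in seconds drawn uniformly from [base - jitter, base + jitter]."""
    return base + random.uniform(-jitter, jitter)


def refresh_in_background(items, refresh_one, base=200.0, jitter=40.0,
                          sleep=time.sleep):
    """Refresh items one by one on a daemon thread, sleeping between requests."""
    def worker():
        for index, item in enumerate(items):
            if index:
                # Wait before every request except the first one.
                sleep(next_sleep(base, jitter))
            try:
                refresh_one(item)
            except Exception:
                # One failing item should not stop the whole refresh.
                log.exception("refresh failed for %r", item)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The HTTP handler can return immediately after calling `refresh_in_background(...)`, which is why a log file is the natural place to check progress. The jitter keeps requests from landing at perfectly regular intervals.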

[screenshots: start refresh / check refresh]

Check `fetch.log` to see whether the refresh succeeded.

3. export bibtex file for all your papers

Export a BibTeX file for all your papers by accessing 127.0.0.1:8000/bibtex.
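Conceptually, the export joins the bibtex field of every database item into one `.bib` text. The sketch below shows one way to do that. The function name `merge_bibtex` and the choice to drop blank fields and exact duplicates are assumptions, not a description of what the server does.

```python
def merge_bibtex(entries):
    """Join BibTeX entries into one .bib text, skipping blanks and exact duplicates."""
    seen = set()
    kept = []
    for entry in entries:
        text = (entry or "").strip()
        if text and text not in seen:
            seen.add(text)
            kept.append(text)
    # Separate entries with a blank line and end the file with a newline.
    return "\n\n".join(kept) + ("\n" if kept else "")
```

Items without a bib entry yet (an empty or missing field) are simply left out, so the resulting file stays valid for LaTeX.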

how to use

prepare notion database and notion token

Refer to my released Notion template and add it to your workspace.
Get the database id according to the Notion docs.
Get the Notion access token according to the Notion docs.
Test the Notion API with a curl command:

curl -X GET https://api.notion.com/v1/databases/{database_id} \
  -H "Authorization: Bearer {token}" \
  -H "Notion-Version: 2021-08-16"
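If you prefer to run the same check from Python, the request can be built with the standard library. This is a sketch equivalent to the curl command above. The function names are made up for this example, and `fetch_database` needs a real token and database id to succeed.

```python
import json
import urllib.request

NOTION_VERSION = "2021-08-16"  # same version header as the curl command


def build_database_request(database_id, token):
    """Build (but do not send) a GET request for a Notion database."""
    return urllib.request.Request(
        f"https://api.notion.com/v1/databases/{database_id}",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
        },
        method="GET",
    )


def fetch_database(database_id, token):
    """Send the request and return the decoded JSON body."""
    request = build_database_request(database_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A failed call raises `urllib.error.HTTPError`. A 401 status usually points at the token, and a 404 at the database id or at a database that has not been shared with the integration.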

from source code

pip install -r requirements.txt
export NOTION_TOKEN=<your_notion_token>
export NOTION_DATABASE_ID=<your_notion_database_id>
export DOWNLOAD_DIR=<your_download_directory>
export SS_KEY=<your_semanticscholar_api_key> # use a Semantic Scholar API key to get a higher rate limit
export SS_SLEEP_INTERVAL=<your_semanticscholar_api_sleep_interval> # default 200s with a random offset of -40 to 40s
fastapi run server.py
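For reference, here is one way a server could read these variables at startup. It is a sketch under assumptions: the `Settings` class and `load_settings` are hypothetical names, and treating `SS_KEY` as optional and `SS_SLEEP_INTERVAL` as defaulting to 200 follows the comments above rather than the actual code in `server.py`.

```python
import os
from dataclasses import dataclass
from typing import Optional

REQUIRED = ("NOTION_TOKEN", "NOTION_DATABASE_ID", "DOWNLOAD_DIR")


@dataclass(frozen=True)
class Settings:
    notion_token: str
    notion_database_id: str
    download_dir: str
    ss_key: Optional[str]
    ss_sleep_interval: float


def load_settings(env=None):
    """Read settings from environment variables and fail early if one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        notion_token=env["NOTION_TOKEN"],
        notion_database_id=env["NOTION_DATABASE_ID"],
        download_dir=env["DOWNLOAD_DIR"],
        ss_key=env.get("SS_KEY") or None,  # assumed optional
        ss_sleep_interval=float(env.get("SS_SLEEP_INTERVAL") or 200),
    )
```

Failing at startup with the list of missing names is easier to debug than a failed Notion request later on.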

using docker

docker build -t arxiv-workflow .

export NOTION_TOKEN=<your_notion_token>
export NOTION_DATABASE_ID=<your_notion_database_id>
export DOWNLOAD_DIR=<your_download_directory>
export SS_KEY=<your_semanticscholar_api_key>
export SS_SLEEP_INTERVAL=<your_semanticscholar_api_sleep_interval> # default 200s with a random offset of -40 to 40s

docker run -it --rm -e NOTION_TOKEN=$NOTION_TOKEN \
    -e NOTION_DATABASE_ID=$NOTION_DATABASE_ID \
    -e SS_KEY=$SS_KEY \
    -e SS_SLEEP_INTERVAL=$SS_SLEEP_INTERVAL \
    -e DOWNLOAD_DIR=/download \
    -v $DOWNLOAD_DIR:/download \
    -p 8000:8000 \
    arxiv-workflow

TODOs

release my Notion database template
BibTeX auto refresh
export a BibTeX file for all your papers
support exporting a BibTeX file for a specific paper by an alias you've added
REST API documentation and CLI tools if needed
add a config system if the system becomes complex

