This is an automatic workflow for managing arXiv papers with Notion.
If you like this project, please give it a star✨!
- arXiv has become a popular platform for sharing scientific papers; as an AI researcher, I get almost all of my papers from arXiv.
- There is no efficient way to manage a paper's files, citations, notes, and other information in one place. EndNote mainly manages citations; ReadPaper does a good job with notes, files, and in-place translation, but lacks user-defined fields and efficient search.
- Some papers are first released on arXiv and are soon accepted by a conference or journal. How do you update the BibTeX and other information when you want to cite them in your own paper?
- My solution is to build a visual database for papers in Notion, define my own fields and tags for papers, and use ReadPaper to read them.
The problem is that when I come across an interesting new paper title, I have to:
- open a URL to search for it
- create a new page in Notion
- copy and paste the title, abstract, and other information manually
- download the PDF manually and store it in a local directory
This is very time-consuming and error-prone.
My solution is to build a workflow: drop in a title or an arXiv ID, and the program automatically searches arXiv for the paper information, creates a new page in Notion with that information, and downloads the PDF file to a local directory.
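To make the lookup step concrete, here is a minimal sketch of how paper metadata could be read from the public arXiv API, which answers with an Atom feed. It is an illustration only and not necessarily how this project does it: the helper names `build_query_url` and `parse_first_entry` and the returned field names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint
ATOM = {"atom": "http://www.w3.org/2005/Atom"}   # Atom namespace used by the feed


def build_query_url(arxiv_id: str) -> str:
    """Build an arXiv API URL that looks a paper up by its identifier."""
    return f"{ARXIV_API}?{urlencode({'id_list': arxiv_id})}"


def parse_first_entry(atom_xml: str) -> dict:
    """Extract title, abstract, authors, and URL from the first Atom entry."""
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", ATOM)
    if entry is None:
        raise ValueError("no entry found in arXiv response")

    def text(tag: str) -> str:
        # Collapse the line breaks and extra spaces found in long fields.
        return " ".join(entry.findtext(f"atom:{tag}", "", ATOM).split())

    return {
        "title": text("title"),
        "abstract": text("summary"),
        "authors": [
            author.findtext("atom:name", "", ATOM)
            for author in entry.findall("atom:author", ATOM)
        ],
        "url": text("id"),
    }
```

The response body for `build_query_url(...)` can be fetched with any HTTP client and passed to `parse_first_entry` as a string.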
When you find a paper at an abs/pdf URL, just modify the URL using the predefined API, and the workflow is launched automatically.
Files will be downloaded and the meta information will be uploaded to Notion!
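On the server side, a handler launched from an abs/pdf URL first needs the arXiv identifier. The sketch below shows one way to recover it; the function name `arxiv_id_from_url` is hypothetical, and it only handles new-style identifiers such as 1706.03762, not old-style ones like hep-th/9901001.

```python
import re
from urllib.parse import urlparse

# New-style identifiers: YYMM.NNNNN, optionally followed by a version such as v2.
_NEW_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_id_from_url(url: str) -> str:
    """Return the arXiv identifier contained in an abs/ or pdf/ URL."""
    path = urlparse(url).path
    for prefix in ("/abs/", "/pdf/"):
        if path.startswith(prefix):
            candidate = path[len(prefix):]
            # PDF links may carry a trailing ".pdf" extension.
            if candidate.endswith(".pdf"):
                candidate = candidate[:-4]
            if _NEW_ID.match(candidate):
                return candidate
    raise ValueError(f"not a recognised arXiv abs/pdf URL: {url}")
```

For example, `arxiv_id_from_url("https://arxiv.org/pdf/1706.03762v5.pdf")` yields the versioned identifier, which can then be used for both the metadata lookup and the PDF download.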
Access 127.0.0.1:8000/bibtex?refresh=true to refresh the BibTeX via the Semantic Scholar API and update the bibtex field in Notion.
Because the Semantic Scholar API is rate limited, the refresh runs in a new background thread with a long sleep interval between requests.
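The background refresh can be pictured roughly as follows. This is a sketch under stated assumptions rather than the project's actual code: `next_sleep` and `refresh_in_background` are hypothetical names, `refresh_one` stands in for whatever performs a single Semantic Scholar lookup, and the 200s base with a ±40s random offset mirrors the documented default of SS_SLEEP_INTERVAL.

```python
import random
import threading
import time
from typing import Callable, Iterable


def next_sleep(base: float = 200.0, jitter: float = 40.0) -> float:
    """Return the base interval shifted by a random offset in [-jitter, jitter]."""
    return max(0.0, base + random.uniform(-jitter, jitter))


def refresh_in_background(
    paper_ids: Iterable[str],
    refresh_one: Callable[[str], None],
    base: float = 200.0,
    jitter: float = 40.0,
) -> threading.Thread:
    """Refresh papers one by one on a daemon thread, sleeping between requests."""

    def worker() -> None:
        for paper_id in paper_ids:
            refresh_one(paper_id)
            # Spread requests out so the API rate limit is not exceeded.
            time.sleep(next_sleep(base, jitter))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Running the loop on a daemon thread lets the HTTP request that triggered the refresh return immediately, while the random offset keeps the requests from arriving at a fixed rhythm.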
Access 127.0.0.1:8000/bibtex?refresh=true&all=true to refresh the BibTeX of every item in the database, whether or not the item already has a bib entry.
Check fetch.log to see whether the refresh succeeded.
Export a BibTeX file for all your papers by accessing 127.0.0.1:8000/bibtex.
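If you prefer to script these calls instead of opening them in a browser, a small client could look like the sketch below. The helper names `bibtex_url` and `export_bibtex` are hypothetical, and `export_bibtex` assumes the /bibtex endpoint returns the BibTeX text directly in the response body.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # address used throughout this README


def bibtex_url(refresh: bool = False, refresh_all: bool = False) -> str:
    """Build the /bibtex URL for exporting, refreshing, or refreshing everything."""
    params = {}
    if refresh or refresh_all:
        params["refresh"] = "true"
    if refresh_all:
        params["all"] = "true"
    query = urlencode(params)
    return f"{BASE_URL}/bibtex" + (f"?{query}" if query else "")


def export_bibtex(path: str) -> None:
    """Download the exported BibTeX to `path` (the server must be running)."""
    with urlopen(bibtex_url()) as response, open(path, "wb") as out:
        out.write(response.read())
```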
- Refer to my released Notion template and add it to your workspace.
- Get the database ID according to the Notion docs.
- Get the Notion access token according to the Notion docs.
- Test the Notion API with the curl command:
curl -X GET https://api.notion.com/v1/databases/{database_id} \
-H "Authorization: Bearer {token}" \
-H "Notion-Version: 2021-08-16"
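The same check can be done from Python with only the standard library. This is a sketch, not part of the project: `build_database_request` and `check_database` are hypothetical helper names, and the version header simply repeats the one from the curl command.

```python
import json
from urllib.request import Request, urlopen

NOTION_VERSION = "2021-08-16"  # same version header as the curl command


def build_database_request(database_id: str, token: str) -> Request:
    """Build the GET request that retrieves a Notion database."""
    return Request(
        f"https://api.notion.com/v1/databases/{database_id}",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
        },
        method="GET",
    )


def check_database(database_id: str, token: str) -> str:
    """Send the request and return the 'object' field of the JSON response."""
    with urlopen(build_database_request(database_id, token)) as response:
        return json.load(response).get("object", "")
```

A successful call should report a database object; an HTTP error raised by `urlopen` usually means the token is wrong or the integration has not been given access to the database.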
pip install -r requirements.txt
export NOTION_TOKEN=<your_notion_token>
export NOTION_DATABASE_ID=<your_notion_database_id>
export DOWNLOAD_DIR=<your_download_directory>
export SS_KEY=<your_semanticscholar_api_key> # use a Semantic Scholar API key to get a higher rate limit
export SS_SLEEP_INTERVAL=<your_semanticscholar_api_sleep_interval> # default 200s, with a random offset of -40 to 40s
fastapi run server.py
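For reference, the environment variables above could be read on the Python side roughly like this. The sketch is illustrative: the `Settings` class and `load_settings` function are hypothetical, and treating the three Notion/download variables as required and the two Semantic Scholar variables as optional is an assumption, not something the project guarantees.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("NOTION_TOKEN", "NOTION_DATABASE_ID", "DOWNLOAD_DIR")  # assumed required


@dataclass(frozen=True)
class Settings:
    notion_token: str
    notion_database_id: str
    download_dir: str
    ss_key: Optional[str]      # optional Semantic Scholar API key
    ss_sleep_interval: float   # seconds between Semantic Scholar requests


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the workflow settings from environment variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        notion_token=env["NOTION_TOKEN"],
        notion_database_id=env["NOTION_DATABASE_ID"],
        download_dir=env["DOWNLOAD_DIR"],
        ss_key=env.get("SS_KEY") or None,
        # Fall back to the documented default of 200 seconds.
        ss_sleep_interval=float(env.get("SS_SLEEP_INTERVAL") or 200),
    )
```

Failing early with a list of the missing variables makes a misconfigured shell easier to diagnose than an error at the first Notion request.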
docker build -t arxiv-workflow .
export NOTION_TOKEN=<your_notion_token>
export NOTION_DATABASE_ID=<your_notion_database_id>
export DOWNLOAD_DIR=<your_download_directory>
export SS_KEY=<your_semanticscholar_api_key>
export SS_SLEEP_INTERVAL=<your_semanticscholar_api_sleep_interval> # default 200s, with a random offset of -40 to 40s
docker run -it --rm -e NOTION_TOKEN=$NOTION_TOKEN \
-e NOTION_DATABASE_ID=$NOTION_DATABASE_ID \
-e DOWNLOAD_DIR=/download \
-e SS_KEY \
-e SS_SLEEP_INTERVAL \
-v $DOWNLOAD_DIR:/download \
-p 8000:8000 \
arxiv-workflow
- release my Notion database template
- BibTeX auto refresh
- export a BibTeX file for all your papers
- support exporting a BibTeX file for a specific paper by the alias you've added
- REST API documentation and CLI tools, if needed
- add a config system if the system becomes complex