Archive your web page.
- Python >= 3.8
- Playwright
pip
pip install pagesaver
playwright install --with-deps
✨🍰✨
Or you can use pip install git+https://github.com/ZhaoQi99/PageSaver.git
to install the latest version.
- Init PageSaver:
pagesaver init
- Start HTTP Server:
pagesaver server
nohup pagesaver server >> server.log 2>&1 &
- Examples:
~$ curl 'http://127.0.0.1:8001/api/record/https://www.baidu.com/?format=MHTML&format=PDF' -H 'Authorization: <API_TOKEN>'
~$ curl 'http://127.0.0.1:8001/api/record/notion/https://www.baidu.com/?format=MHTML&format=PDF&api_token=api_token&database_id=1&token_v2=token_v2&title=test' -H 'Authorization: <API_TOKEN>'
pagesaver export https://www.baidu.com -o . -f MHTML,PDF
Authenticate with the Authorization header. The format is: Authorization: <API_TOKEN>
- GET
api/record/{url}?format=MHTML&format=PDF
Parameter | Type | Required | Description |
---|---|---|---|
format | string | No | Storage format: MHTML or PDF. Repeat the parameter to request both; defaults to all formats. |
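
The endpoint above can also be called from Python. The following is a minimal sketch using only the standard library; `build_record_request` is a hypothetical helper (not part of PageSaver), and the base address assumes the default server bind. The target URL is embedded unescaped, as in the curl example; how the server treats target URLs that carry their own query string is not covered here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_record_request(target_url, api_token, formats=("MHTML", "PDF"),
                         base="http://127.0.0.1:8001"):
    """Build a GET request for api/record/{url} (hypothetical helper)."""
    # Repeat the `format` key once per requested format, as in the curl example.
    query = urlencode([("format", f) for f in formats])
    url = f"{base}/api/record/{target_url}"
    if query:
        url += f"?{query}"
    # The header value is the raw token, in the format given above.
    return Request(url, headers={"Authorization": api_token})


req = build_record_request("https://www.baidu.com/", "<API_TOKEN>")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request to a running server.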
- GET
api/record/notion/{url}?format=MHTML&format=PDF&api_token=<NOTION_API_TOKEN>&database_id=<NOTION_DATABASE_ID>&token_v2=<NOTION_TOKEN_V2>&title=test
- Notion API Token
- Notion Token V2: F12 -> Application -> Cookies -> token_v2
- Database ID: https://www.notion.so/{USERNAME}/{DATABASE_ID}
- Connection with: Notion -> Top right corner -> More -> Connections -> Connect to -> Your Integration
Parameter | Type | Required | Description |
---|---|---|---|
format | string | No | Storage format: MHTML or PDF. Repeat the parameter to request both; defaults to all formats. |
api_token* | string | Yes | Notion API Token |
database_id* | string | Yes | Notion Database ID |
title | string | No | Title stored in Notion. |
token_v2 | string | No | Obtained from Browser -> Cookies -> token_v2. Required if files are to be stored in Notion. |
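
The parameter table above can be sketched as a small URL builder that enforces the two required parameters and sends the optional ones only when given. `build_notion_record_url` is a hypothetical helper, and the base address assumes the default server bind.

```python
from urllib.parse import urlencode


def build_notion_record_url(target_url, api_token, database_id,
                            title=None, token_v2=None, formats=(),
                            base="http://127.0.0.1:8001"):
    """Build the api/record/notion/{url} request URL (hypothetical helper)."""
    # api_token and database_id are marked as required in the table.
    if not api_token or not database_id:
        raise ValueError("api_token and database_id are required")
    params = [("format", f) for f in formats]
    params += [("api_token", api_token), ("database_id", database_id)]
    # Optional parameters are only included when provided.
    if title is not None:
        params.append(("title", title))
    if token_v2 is not None:
        params.append(("token_v2", token_v2))
    return f"{base}/api/record/notion/{target_url}?{urlencode(params)}"


print(build_notion_record_url("https://www.baidu.com/", "api_token", "1", title="test"))
```

The Authorization header is still needed when the request is sent, as in the curl example.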
~$ pagesaver export -h
Usage: pagesaver export [OPTIONS] URL
Export page to the output file
Options:
-f, --format [MHTML,PDF] Format which you want to export [required]
-o, --output DIRECTORY Output directory of the file [required]
-n, --name TEXT Name of the exported file [default: exported]
-h, --help Show this message and exit.
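
To drive the export command from a script, one option is to assemble the argument list and hand it to `subprocess`. This is a sketch that assumes `pagesaver` is on the PATH; `export_command` and `export_page` are hypothetical wrapper names, and the comma-separated format value follows the `-f MHTML,PDF` example earlier in this README.

```python
import subprocess


def export_command(url, output_dir, formats, name=None):
    """Return the argv list for `pagesaver export` (hypothetical wrapper)."""
    cmd = ["pagesaver", "export", url, "-o", output_dir, "-f", ",".join(formats)]
    # -n is optional; the CLI falls back to its own default name when omitted.
    if name is not None:
        cmd += ["-n", name]
    return cmd


def export_page(url, output_dir, formats, name=None):
    """Run the export; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(export_command(url, output_dir, formats, name), check=True)


print(export_command("https://www.baidu.com", ".", ["MHTML", "PDF"]))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.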
~$ pagesaver init
~$ pagesaver server -h
Usage: pagesaver server [OPTIONS]
Run PageSaver HTTP server
Options:
-h, --help Show this message and exit.
-b, --bind TEXT The TCP host/address to bind to. [default: 0.0.0.0:8001]
PageSaver reads its configuration from config.py automatically.
- type: storage type. The only currently supported value is "local".
- path: storage path. Only used when type is set to "local".
The TCP host/address to bind to.
Default: 0.0.0.0:8001
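
The bind value combines host and port in one string. A helper script that needs the two parts separately, for example to check whether the server is accepting connections, might split it as sketched below. `parse_bind` and `is_listening` are hypothetical helpers, not part of PageSaver, and the sketch does not handle bracketed IPv6 addresses.

```python
import socket


def parse_bind(bind):
    """Split a "host:port" bind string into (host, port)."""
    # Split on the last colon so the port is always the final component.
    host, _, port = bind.rpartition(":")
    return host, int(port)


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = parse_bind("0.0.0.0:8001")
print(host, port)
```

Since 0.0.0.0 means "listen on all interfaces", a local client would typically connect to 127.0.0.1 on the same port, e.g. `is_listening("127.0.0.1", port)`.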
The property name in Notion to use for the title of a page.
Default: title
The property name in Notion to use for the link of a page.
Default: link
The property name in Notion to use for the MHTML file of a page.
Default: mhtml
GNU General Public License v3.0
- Qi Zhao([email protected])