Repository for the SemUN project. It is composed of a docker-compose stack with:

- An API (`un-semun-api`)
- A frontend (`un-semun-front`)
- An NLP pipeline (`un-ml-pipeline`)
- A Neo4j graph database (the `neo4j` service in `docker-compose.yml`)
- Scripts to populate the database (`un-semun-misc`)
- A scraper for the United Nations Digital Library

For more information on the project, please refer to the project proposal. For more details about the final result, please refer to the paper.
You also need to have Docker installed. I'm using OrbStack as a Docker desktop client for macOS, but a regular Docker installation works perfectly fine as well.

Once Docker is set up, you just have to run:
```sh
# Start the containers
docker-compose up -d
```
Open the frontend at http://localhost:8080/ if using Docker Desktop or http://un-semun-frontend.un-semun.orb.local/ if using OrbStack.
To stop the stack, just run:
```sh
docker-compose down
```
You are all set! 🎉
To ingest documents, you can use the ML pipeline API. You can find more information about it in the `README.md` of the `un-ml-pipeline` folder.

You basically need to send a `POST` request to the `/run` endpoint at `http://un-semun-api.un-semun.orb.local`, with a JSON body containing the following fields:
```json
[
  {"recordId": "<record_id_0>"},
  {"recordId": "<record_id_1>"},
  {"recordId": "<record_id_2>"},
  ...
]
```
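As a minimal Python sketch of such a request (assuming the OrbStack hostname above; the record IDs here are placeholders, not real UN Digital Library records):

```python
import json
import urllib.request

API_URL = "http://un-semun-api.un-semun.orb.local"

# JSON body: one object per record to ingest (placeholder IDs).
payload = [
    {"recordId": "ID-0"},
    {"recordId": "ID-1"},
]

# Build the POST request to the /run endpoint.
req = urllib.request.Request(
    f"{API_URL}/run",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it and start the ingestion.
```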
You can also send a `POST` request to the `/run_search` endpoint, at the same URL, with a natural language query for the UN Digital Library. The API will then scrape the results and ingest them into the database:
```json
{
  "q": "<query>"
}
```
You can also limit the number of results to scrape by adding a field `"n": <value>` to the payload.
For instance:
```json
{
  "q": "Women in peacekeeping",
  "n": 256
}
```
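The same request can be sketched in Python (again assuming the OrbStack hostname; the query and limit are just the example values above):

```python
import json
import urllib.request

API_URL = "http://un-semun-api.un-semun.orb.local"

# Natural-language query plus the optional "n" result limit.
payload = {"q": "Women in peacekeeping", "n": 256}

# Build the POST request to the /run_search endpoint.
req = urllib.request.Request(
    f"{API_URL}/run_search",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would trigger the scraping and ingestion.
```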