Moved documentation to Wiki

JonasGLund99 committed Dec 8, 2023 (commit 24a41c3, parent 8ba66b1).
1 changed file (README.md) with 12 additions and 222 deletions.
# PreprocessingLayer_TripleConstruction

A repository for groups C (Relation Extraction) and D (Concept Linking) | KNOX 2023

PreprocessingLayer TripleConstruction is responsible for creating triples that can be utilised by group E to construct a knowledge graph.
Each triple consists of a subject (an entity IRI), a predicate, and an object (an entity IRI), where the predicate describes how the subject relates to the object.
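To illustrate the subject/predicate/object format, the sketch below models a triple as a small Python dataclass and serialises it as one N-Triples statement. The `Triple` class and its `to_ntriples` method are hypothetical and not part of this repository, and all IRIs are placeholders (the predicate IRI in particular is made up). N-Triples only accepts absolute IRIs, so the placeholders include a scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical container for one triple; subject and object are entity IRIs."""

    subject: str    # IRI of the subject entity
    predicate: str  # IRI of the relation between subject and object
    obj: str        # IRI of the object entity ("object" would shadow the builtin)

    def to_ntriples(self) -> str:
        """Serialise the triple as a single N-Triples statement (IRIs in angle brackets)."""
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


# Placeholder IRIs for illustration only.
triple = Triple(
    subject="http://example.org/Barack_Obama",
    predicate="http://example.org/ontology/spouse",
    obj="http://example.org/Michelle_Obama",
)
print(triple.to_ntriples())
```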

## How to get started

### Helpful links:

1. <a href="https://github.com/Knox-AAU/PreprocessingLayer_TripleConstruction">Shared GitHub Repository for relation extraction and concept linking</a>.
2. <a href="https://www.overleaf.com/read/bqcxqfmhtvkx">The report on relation extraction.</a> Specifically, read the abstract and the sections 'Requirements', 'Input and Output', 'Architecture for the multilingual RE solution utilising Llama', and 'Future works'. Key topics for relation extraction are RDF, knowledge graphs, ontologies, Turtle, and data sets for relation extraction.
3. <a href="https://www.overleaf.com/4114212188xccnszjmqrdx#4089f4">Overleaf document that is shared between all groups.</a> Specifically, read the sections on the KNOX pipeline and the pre-processing layer, because this solution relies heavily on groups A, B, and E, and to a minor extent on group D.

## Prerequisites

If you wish to run the solution locally (not through Docker), run `pip install -r requirements.txt` to install the required libraries and modules.\
To run the solution in a container, install Docker first (<a href="https://www.docker.com/products/docker-desktop/">download Docker Desktop here</a>).

## Running the solution

### Run the Docker container on a local machine using this command

`docker-compose up --build`

### Access the KNOX server via SSH

`ssh <your_email>@knox-preproc01.srv.aau.dk -L <your_port>:localhost:4444`

The `-L` option forwards `<your_port>` on your own machine to port 4444 on the server, so while the SSH session is open, the service can be reached locally at `localhost:<your_port>`.

### Deploy new version manually

Deployment is normally handled by Watchtower when changes are pushed to main. If a manual deployment is needed, run:

`docker run --name tc_api -p 0.0.0.0:4444:<your_port> --add-host=host.docker.internal:host-gateway -e API_SECRET=*** -e ACCESS_SECRET=*** -d ghcr.io/knox-aau/preprocessinglayer_tripleconstruction:main`

### Access through the access API endpoint

`knox-proxy01.srv.aau.dk/tripleconstruction-api/tripleconstruction`

### Direct access to the endpoint

`http://knox-preproc01.srv.aau.dk:4444/tripleconstruction`
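For a quick manual test, a request can be sent to the direct endpoint with Python's standard library. This is a minimal sketch, assuming the endpoint accepts a JSON body with `Content-Type: application/json`; `build_request` and `send` are hypothetical helpers, and any authentication the server may require is not documented here and is not handled.

```python
import json
import urllib.request

# Direct endpoint from this README. When going through the SSH tunnel, use
# f"http://localhost:{your_port}/tripleconstruction" instead.
DIRECT_ENDPOINT = "http://knox-preproc01.srv.aau.dk:4444/tripleconstruction"


def build_request(payload: list, url: str = DIRECT_ENDPOINT) -> urllib.request.Request:
    """Build, without sending, a POST request whose body is the payload encoded as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: the server expects JSON
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx statuses such as 422,
    so only 2xx codes (e.g. 200) are returned here.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage: `send(build_request(payload))`, where `payload` is a Python list shaped like the example query under Server documentation.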

## Testing
The testing framework <a href="https://docs.python.org/3/library/unittest.html">unittest</a> is used to test the solution. It discovers all directories that follow the naming convention `test_` and runs every Python file inside those directories whose name begins with `test_`.

## Naming conventions

This repository uses the snake_case naming convention.

## File structure

### /relation_extraction

The solution developed by group C to perform relation extraction on sentences with entity mentions and IRIs pointing to the entities.

### /concept_linking

The solution developed by group D to perform concept linking on sentences with entity mentions and IRIs pointing to the entities.

### /test
The directory containing the tests for the solutions. The tests use the testing framework <a href="https://docs.python.org/3/library/unittest.html">unittest</a>, which discovers the directories inside `/test` that follow the naming convention `test_` and runs every Python file in them whose name begins with `test_`.

## Server documentation

### POST endpoints

`/tripleconstruction`

#### Summary

The `/tripleconstruction` endpoint expects JSON-formatted data and starts concept linking (`concept_linking`) and relation extraction (`relation_extraction`) in parallel.
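One way to start both components in parallel in Python is `concurrent.futures.ThreadPoolExecutor`; the sketch below assumes that approach. `run_concept_linking`, `run_relation_extraction`, and `handle_tripleconstruction` are hypothetical placeholders, not the actual entry points of `/concept_linking` or `/relation_extraction`, and the server may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concept_linking(documents: list) -> str:
    # Hypothetical placeholder for the concept linking step.
    return f"concept_linking processed {len(documents)} document(s)"


def run_relation_extraction(documents: list) -> str:
    # Hypothetical placeholder for the relation extraction step.
    return f"relation_extraction processed {len(documents)} document(s)"


def handle_tripleconstruction(documents: list) -> list:
    """Start both components concurrently and wait until both have finished."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(run_concept_linking, documents),
            pool.submit(run_relation_extraction, documents),
        ]
        # result() blocks until each task is done and re-raises any exception from its thread.
        return [future.result() for future in futures]
```

This sketch waits for both components to finish, whereas the server responds with 200 once a correctly formatted request has been received, so a real handler would likely start the work in the background instead.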

#### Example query

```json
[
  {
    "filename": "path/to/Artikel.txt",
    "language": "en",
    "sentences": [
      {
        "sentence": "Barrack Obama is married to Michelle Obama.",
        "sentenceStartIndex": 20,
        "sentenceEndIndex": 62,
        "entityMentions": [
          {
            "name": "Barrack Obama",
            "startIndex": 0,
            "endIndex": 12,
            "iri": "knox-kb01.srv.aau.dk/Barack_Obama"
          },
          {
            "name": "Michelle Obama",
            "startIndex": 28,
            "endIndex": 41,
            "iri": "knox-kb01.srv.aau.dk/Michele_Obama"
          }
        ]
      }
    ]
  }
]
```
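In the example, `startIndex` and `endIndex` appear to be character offsets into `sentence`, with `endIndex` pointing at the last character of the mention (inclusive): "Barrack Obama" has 13 characters and spans 0 to 12. Assuming that convention, the sketch below checks that every entity mention matches its span, which catches off-by-one errors before a request is sent. `check_mentions` is a hypothetical helper, not part of the solution.

```python
def check_mentions(sentence_obj: dict) -> list:
    """Return mismatch messages; an empty list means every mention matches its span.

    Assumes startIndex/endIndex are inclusive character offsets into "sentence".
    """
    text = sentence_obj["sentence"]
    problems = []
    for mention in sentence_obj["entityMentions"]:
        start, end = mention["startIndex"], mention["endIndex"]
        span = text[start:end + 1]  # +1 because endIndex is assumed to be inclusive
        if span != mention["name"]:
            problems.append(f"{mention['name']!r}: span [{start}, {end}] covers {span!r}")
    return problems


example = {
    "sentence": "Barrack Obama is married to Michelle Obama.",
    "entityMentions": [
        {"name": "Barrack Obama", "startIndex": 0, "endIndex": 12},
        {"name": "Michelle Obama", "startIndex": 28, "endIndex": 41},
    ],
}
print(check_mentions(example))  # → []
```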

#### Responses

| Code | Description | Result |
| ---- | ----------- | ------ |
| 200 | The POST request was correctly formatted and has been received by the server. | `concept_linking` and `relation_extraction` are run in parallel. |
| 422 | The POST request was incorrectly formatted, so the server could not parse the data. | Nothing is executed. |
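The table does not specify which fields the server requires. As a sketch only, assuming every field in the example query is required, a client could check a payload's shape before sending it to avoid a 422 response. The `REQUIRED_*` sets, `_missing`, and `validate_payload` are hypothetical, and the server's actual validation rules may differ.

```python
# Assumption: every key used in the example query is required.
REQUIRED_DOCUMENT_KEYS = {"filename", "language", "sentences"}
REQUIRED_SENTENCE_KEYS = {"sentence", "sentenceStartIndex", "sentenceEndIndex", "entityMentions"}
REQUIRED_MENTION_KEYS = {"name", "startIndex", "endIndex", "iri"}


def _missing(obj, required: set, where: str) -> list:
    """Report required keys that obj lacks, or that obj is not a JSON object at all."""
    if not isinstance(obj, dict):
        return [f"{where}: expected a JSON object"]
    return [f"{where}: missing {key!r}" for key in sorted(required - set(obj))]


def validate_payload(payload) -> list:
    """Return a list of problems; an empty list means the payload has the example's shape."""
    if not isinstance(payload, list):
        return ["top level: expected a JSON array of documents"]
    problems = []
    for d, document in enumerate(payload):
        where_doc = f"document[{d}]"
        found = _missing(document, REQUIRED_DOCUMENT_KEYS, where_doc)
        if not found and not isinstance(document["sentences"], list):
            found = [f"{where_doc}.sentences: expected a JSON array"]
        problems += found
        if found:
            continue  # skip the contents of a malformed document
        for s, sentence in enumerate(document["sentences"]):
            where_sent = f"{where_doc}.sentences[{s}]"
            found = _missing(sentence, REQUIRED_SENTENCE_KEYS, where_sent)
            if not found and not isinstance(sentence["entityMentions"], list):
                found = [f"{where_sent}.entityMentions: expected a JSON array"]
            problems += found
            if found:
                continue  # skip the mentions of a malformed sentence
            for m, mention in enumerate(sentence["entityMentions"]):
                problems += _missing(mention, REQUIRED_MENTION_KEYS, f"{where_sent}.entityMentions[{m}]")
    return problems
```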

## Configuration management

The solution utilises Docker images that are built from the latest changes pushed to GitHub. The images are built and published through GitHub workflows.

### Workflow for testing and building project

The workflow for testing and building (continuous integration) is defined as follows:

```yml
name: test-and-build

on:
  push:
    branches: ["**"]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Setup python
        uses: actions/setup-python@v3
        with:
          python-version: 3.12.0

      - name: Install dependencies
        run: |
          echo "Installing dependencies"
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run tests
        run: |
          echo "Testing..."
          # Run all tests
          python -m unittest || exit 1

  build:
    needs: test
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Setup python
        uses: actions/setup-python@v3
        with:
          python-version: 3.12.0

      - name: Install dependencies
        run: |
          echo "Installing dependencies"
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Build both projects if test pass
        run: |
          echo "Building projects"
          python3 relation_extraction/main.py
          python3 concept_linking/main.py
```
This workflow runs the tests discovered in the /test directory. If those tests pass, the solution is built by running `relation_extraction/main.py` and `concept_linking/main.py`. The workflow runs on pushes to every branch in the repository.

### Workflow for building and deploying the docker image

Every time something is pushed to main, a new Docker image is built and deployed (continuous deployment).
```yml
name: build-and-deploy-docker-image

on:
  push:
    branches: ["main"]

env:
  # Use docker.io for Docker Hub if empty
  REGISTRY: ghcr.io
  # github.repository as <account>/<repo>
  IMAGE_NAME: ${{ github.repository }}

jobs:
  docker_build_and_deploy_image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v3
      - name: Log into registry ${{ env.REGISTRY }}
        uses: docker/login-action@28218f9b04b4f3f62068d7b6ce6ca5b26e35336c
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
        with:
          context: ./
          file: ./Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
```

This workflow is responsible for creating and deploying the new Docker image. The workflow runs only on pushes to the main branch.

## Documentation

[Relation Extraction | Knox Wiki](http://wiki.knox.aau.dk/en/relation-extraction)

[Concept Linking | Knox Wiki](http://wiki.knox.aau.dk/en/concept-linking)

## Authors

**Group C (Relation Extraction)**
- Anders Andersen Toft
- Asbjørn Juncker Christensen
- Casper Korfitz Mortensen
- Johannes Karstoft Pedersen
- Jonas Geertsen Lund
- Rasmus Rytter Sørensen

**Group D (Concept Linking)**
