Concept linking (#28)
* Update README for concept-linking
Change structure of requirements for entire project

* Fully integrated prompt-engineering solution in concept linking to new shared repo.

* Integrated UntrainedSpacy solution
Changed gitignore
Refactored some duplicate code for untrainedspacy and promptEng sol.

* Integrated StringComparison solution

* Minor bugfixing

* Fixed missing output.json

* Test requirements fix

* Test requirements fix 2

* Fix empty folders not being committed

* Added support for outputting sentence when doing test_run

* ML Solution refactor

* ML Solution refactor

* Deleted obsolete test files

* Evaluation files

* Fixed small label error

* Fixed ont mapping mistakes

* Evaluation Script

* Generated output for Promptbased

* Generated output for Promptbased

* ML UnitTest

* ML UnitTest

* Minor fixes

* String comparison test (#27)

* String comparison test

* Fixed requirements

* Update test-and-build.yml

* Added error handling for no triples generated.

* Changed python 3.12 to 3.11

* Fixed test_server.py for concept_linking.

---------

Co-authored-by: denBruneBarone
Co-authored-by: Vi Thien Le
Co-authored-by: Gamma
Co-authored-by: Mikkel Wissing
5 people authored Dec 12, 2023
1 parent 716049f commit 140cff6
Showing 56 changed files with 24,737 additions and 23,113 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -15,7 +15,7 @@ jobs:
- name: Setup python
uses: actions/setup-python@v3
with:
python-version: 3.12.0
python-version: 3.11.0

- name: Install dependencies
run: |
5 changes: 2 additions & 3 deletions Dockerfile
@@ -2,9 +2,8 @@ FROM python:3.11-slim

WORKDIR /code

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
RUN pip install --no-cache-dir -r requirements_docker.txt


CMD ["python", "-u", "-m", "server.server", "--host", "0.0.0.0", "--port", "4444", "--reload"]
7 changes: 2 additions & 5 deletions concept_linking/.gitignore
@@ -153,15 +153,12 @@ dmypy.json
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# JetBrains specific template is maintained in a separate JetBrains..gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# and can be added to the global .gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

#Ignoring NDA classified files
data/files/PromptEngineering/
data/files/EvaluationData/output.json

#Llama model
tools/LlamaServer/llama-2-7b-chat.Q2_K.gguf
127 changes: 96 additions & 31 deletions concept_linking/README.md
@@ -1,61 +1,126 @@
# D: Preprocessing Layer - Concept Linking
# Concept Linking

Repository of group D in KNOX pipeline.
---
## Background
Group D has implemented four different solutions, which are described in the Solutions section below.
By default, the solution that runs is PromptEngineering.

## Description
To change which solution runs in the pipeline, make the following changes.

This repository is for type classification of already provided sentences with given entity mentions. Several different solutions were created in order to find the best one.
First, change directory to the 'server' folder in the root directory.

### Dependencies
Next, open the server.py file.
On line 24, under the text "#Begin ConceptLinking",
change the code to run the desired solution.
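
One way such a switch could be organized is a lookup table from solution name to entry point. This is only a sketch under stated assumptions: the four functions, the `SOLUTIONS` table, `ACTIVE_SOLUTION` and `run_concept_linking` are hypothetical names for illustration and are not taken from server.py.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the four solution entry points.
# The real entry points live in the repository and will differ.
def prompt_engineering(sentences: List[str]) -> str:
    return "PromptEngineering"

def machine_learning(sentences: List[str]) -> str:
    return "MachineLearning"

def string_comparison(sentences: List[str]) -> str:
    return "StringComparison"

def untrained_spacy(sentences: List[str]) -> str:
    return "UntrainedSpacy"

# Hypothetical lookup table: solution name -> entry point.
SOLUTIONS: Dict[str, Callable[[List[str]], str]] = {
    "PromptEngineering": prompt_engineering,
    "MachineLearning": machine_learning,
    "StringComparison": string_comparison,
    "UntrainedSpacy": untrained_spacy,
}

# PromptEngineering is the default, as stated above.
ACTIVE_SOLUTION = "PromptEngineering"

def run_concept_linking(sentences: List[str], solution: str = ACTIVE_SOLUTION) -> str:
    """Run the selected solution, failing loudly on an unknown name."""
    if solution not in SOLUTIONS:
        raise ValueError(
            f"Unknown solution {solution!r}; expected one of {sorted(SOLUTIONS)}"
        )
    return SOLUTIONS[solution](sentences)
```

With a table like this, switching solutions would mean changing a single name instead of editing the call site.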

- Python
- PIP
- Git

### Installing
## Requirements
For any of the four solutions, install the requirements listed in that solution's
own file at /{Solution_Name}/requirements.txt

However, since this is a joint codebase for both relation-extraction and concept-linking,
there is also a global requirements.txt file in the root directory.
It follows a nested structure, meaning that installing only the file in the root folder
will install all the rest.

This installs the necessary requirements for both groups' solutions.
However, since it is possible to change which of the four concept-linking solutions runs,
the matching requirements must be selected accordingly.
This is done by navigating to

```
git clone https://github.com/Knox-AAU/PreprocessingLayer_Concept-Linking
./concept_linking/requirements.txt
```
This file lists references to the four different requirements.txt files.
Remove the leading # (comment marker) from the line referencing the solution you want to run.
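
To illustrate the idea, the sketch below lists which nested files are active in such a requirements file. It assumes the references use pip's `-r <file>` include syntax and that the paths look like the ones in the sample text. Both are assumptions, since the actual file contents are not shown here, and `active_references` is a hypothetical helper.

```python
def active_references(requirements_text: str) -> list:
    """Return the files referenced by uncommented '-r' lines."""
    refs = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out references.
        if not line or line.startswith("#"):
            continue
        if line.startswith("-r "):
            refs.append(line[3:].strip())
    return refs

# Hypothetical file contents: only PromptEngineering is uncommented.
example = """\
#-r solutions/MachineLearning/requirements.txt
-r solutions/PromptEngineering/requirements.txt
#-r solutions/StringComparison/requirements.txt
#-r solutions/UntrainedSpacy/requirements.txt
"""

print(active_references(example))
```

Only the uncommented line is picked up, which mirrors what pip would install from a file laid out this way.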

### Example
Install the requirements for the PromptEngineering solution

### Initial Setup
Navigate to the following directory

- Navigate to root folder
- Run the following command for installing all requirements:
```
../concept_linking/solutions/PromptEngineering/
```

And run the following command
```
pip install -r requirements.txt
```

### Adding modules
## Solutions

- Navigate to root folder.
- Run the following command to add all installed modules:
Below is a brief description of each of the four solutions and how to get started.

---

```
pip freeze > requirements.txt
```

### Executing program
### Machine Learning
description WIP

- Navigate to main.py in Program directory
- Run main.py with Python
### Prompt Engineering
Uses the LLM Llama2. A prompt is given to the model.

```
python .\program\main.py
prompt_template = {
"system_message": ("The input sentence is all your knowledge. \n"
"Do not answer if it can't be found in the sentence. \n"
"Do not use bullet points. \n"
"Do not identify entity mentions yourself, use the provided ones \n"
"Given the input in the form of the content from a file: \n"
"[Sentence]: {content_sentence} \n"
"[EntityMention]: {content_entity} \n"),
"user_message": ("Classify the [EntityMention] in regards to ontology classes: {ontology_classes} \n"
"The output answer must be in JSON in the following format: \n"
"{{ \n"
"'Entity': 'Eiffel Tower', \n"
"'Class': 'ArchitecturalStructure' \n"
"}} \n"),
"max_tokens": 4092
}
```

or:
The variables {content_sentence} and {content_entity} are produced by a previous part of the KNOX pipeline.
The variable {ontology_classes} is fetched from the Ontology endpoint provided by group E (Database Layer).
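
A minimal sketch of how these placeholders might be filled, assuming Python's `str.format` is used (the doubled braces `{{` and `}}` in the template suggest this, but the actual code may differ). The template here is a trimmed copy of the one above, and `build_prompt` and the sample inputs are hypothetical.

```python
# Trimmed copy of the template above, kept short for illustration.
prompt_template = {
    "system_message": (
        "The input sentence is all your knowledge. \n"
        "[Sentence]: {content_sentence} \n"
        "[EntityMention]: {content_entity} \n"
    ),
    "user_message": (
        "Classify the [EntityMention] in regards to ontology classes: {ontology_classes} \n"
        "The output answer must be in JSON in the following format: \n"
        "{{ \n"
        "'Entity': 'Eiffel Tower', \n"
        "'Class': 'ArchitecturalStructure' \n"
        "}} \n"
    ),
}

def build_prompt(sentence, entity, ontology_classes):
    """Fill the placeholders; str.format turns '{{' and '}}' into literal braces."""
    system = prompt_template["system_message"].format(
        content_sentence=sentence,
        content_entity=entity,
    )
    user = prompt_template["user_message"].format(
        ontology_classes=", ".join(ontology_classes),
    )
    return system, user

# Hypothetical sample inputs.
system, user = build_prompt(
    "The Eiffel Tower is in Paris.",
    "Eiffel Tower",
    ["Place", "ArchitecturalStructure"],
)
print(system)
print(user)
```

The doubled braces matter because a single `{` or `}` in the JSON example would otherwise be read by `str.format` as the start or end of a placeholder.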


#### Using LocalLlama API server
It is possible to use a local LlamaServer. It can be found in ../concept_linking/tools/LlamaServer.
A README for setting up an instance of this server can be found in the directory given above.

#### Using the Llama API server hosted in the KNOX pipeline
WIP
Go to the directory /concept_linking/PromptEngineering/main
and set the api_url accordingly:
```
api_url={domain or ip+port of llama server hosted in the knox pipeline}
```
cd .\program\
python .\main.py
```
Refer to the <a href="https://docs.google.com/spreadsheets/d/1dvVQSEvw15ulNER8qvl1P8Ufq-p3vLU0PswUeahhThg/edit#gid=0" target="_blank">Server Distribution document</a>
for specific DNS and IP+port information.

## Report
### String Comparison
description WIP

Description of the project can be found in the report on [Overleaf](https://www.overleaf.com/project/65000513b10b4521e8907099) (requires permission)

## Authors
### Untrained Spacy
description WIP



---

## Tools

### LlamaServer

Lucas, Gamma, Vi, Mikkel, Caspar & Rune
### OntologyGraphBuilder

---

## Report
Description of the project can be found in the report on Overleaf (requires permission)

## Authors
Lucas, Gamma, Vi, Mikkel, Caspar & Rune