Code repository for negation cue detection. Project for Applied Text Mining course @ VU
This repository contains data, code, trained models, and results for the negation cue detection task.
We present an approach based on fine-tuning a pretrained BERT model on the SEM-2012 Shared Task dataset for negation cue detection.
We use BERT for token classification, in the same way as for the Named Entity Recognition task described in the original BERT paper.
Besides the baseline model, we add POS tags and prefixes/suffixes as extra features to improve the model's performance (the baseline+lexicals model).
The repository also contains code for generating other lexical features, as well as the results of an annotation study (annotations folder).
- Python >= 3.6
- Python libraries:
- transformers - pretrained BERT model
- SpaCy - lexical features
- PyTorch - deep learning backend, data loaders
- pandas - data processing
- scikit-learn - evaluation tools
- numpy - math
- nltk - stemmers
- tqdm - progress bar
- Optional: git-lfs - if you want to use the pretrained models
All dependencies can be installed with:
pip install -r requirements.txt
If you run into installation problems, please consult the official websites of the respective libraries.
Run the end-to-end experiment pipeline: data preprocessing, feature generation, training of the baseline and baseline+lexicals models, and evaluation on the dev and test sets.
python main.py
Generate features and store them as *-features.tsv inside the data folder. They are already precomputed and stored in this repository.
python run_generate_features.py
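To give a feel for what prefix/suffix features can look like, here is a minimal sketch. It is not the code from run_generate_features.py: the affix lists, the function names (`affix_features`, `sentence_features`) and the output columns are assumptions made for illustration, and POS tags (which the repository obtains with SpaCy) are left out to keep the example dependency-free.

```python
from typing import Dict, List

# Hypothetical affix inventories chosen for illustration; not taken from the repository.
NEG_PREFIXES = ("un", "in", "im", "ir", "dis", "non")
NEG_SUFFIXES = ("less", "n't")


def affix_features(token: str) -> Dict[str, str]:
    """Return the matching negation-like prefix and suffix of a token ('' if none)."""
    lower = token.lower()
    # Require a few remaining characters so that short words such as "in" are not flagged.
    prefix = next(
        (p for p in NEG_PREFIXES if lower.startswith(p) and len(lower) > len(p) + 2),
        "",
    )
    suffix = next(
        (s for s in NEG_SUFFIXES if lower.endswith(s) and len(lower) > len(s)),
        "",
    )
    return {"token": token, "prefix": prefix, "suffix": suffix}


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Compute affix features for every token of a sentence."""
    return [affix_features(t) for t in tokens]


if __name__ == "__main__":
    # Print one tab-separated row per token, similar in spirit to a TSV feature file.
    for row in sentence_features(["He", "was", "unhappy", "and", "careless", "."]):
        print("\t".join(row.values()))
```

Surface matching like this also fires on words such as "interesting", so the features are only hints; the fine-tuned model still has to decide from context whether a token is a negation cue.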
Train both the baseline and the baseline+lexicals models.
python train.py
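Fine-tuning BERT for token-level tagging requires mapping word-level cue labels onto subword pieces. The sketch below shows one common scheme: the first piece of each word keeps the label and the remaining pieces are masked out of the loss. It is an assumption that train.py does something similar; `toy_wordpiece` is a stand-in for a real BERT tokenizer and `align_labels` is a hypothetical helper name.

```python
from typing import Callable, List, Sequence, Tuple

# PyTorch's CrossEntropyLoss ignores targets equal to -100 by default.
IGNORE_INDEX = -100


def toy_wordpiece(word: str) -> List[str]:
    """Stand-in tokenizer: splits off a leading 'un' to imitate subword pieces."""
    if word.startswith("un") and len(word) > 4:
        return ["un", "##" + word[2:]]
    return [word]


def align_labels(
    words: Sequence[str],
    labels: Sequence[int],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[int]]:
    """Expand word-level labels to subword pieces (first piece labeled, rest ignored)."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    pieces: List[str] = []
    piece_labels: List[int] = []
    for word, label in zip(words, labels):
        sub = tokenize(word)
        if not sub:
            continue  # skip words the tokenizer drops entirely
        pieces.extend(sub)
        # The first piece carries the word's label; the others are excluded from the loss.
        piece_labels.extend([label] + [IGNORE_INDEX] * (len(sub) - 1))
    return pieces, piece_labels


if __name__ == "__main__":
    # 1 marks a negation cue, 0 marks any other token.
    print(align_labels(["I", "am", "unhappy"], [0, 0, 1], toy_wordpiece))
```

At prediction time the same mapping is used in reverse: only the prediction for the first piece of each word is read back as the word's label.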
Generate error analysis reports and calculate metrics. Results are stored in the reports folder. Our results are included in the repo.
python run_evaluate.py
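For reference, token-level precision, recall and F1 for the cue class can be computed as in the sketch below. The repository relies on scikit-learn for evaluation; this plain-Python version only illustrates the arithmetic, and the function name `cue_prf` and the 0/1 label encoding are assumptions.

```python
from typing import Dict, Sequence


def cue_prf(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Token-level precision, recall and F1 for the positive (cue) label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Fall back to 0.0 when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [0, 1, 0, 1, 1]
    pred = [0, 1, 1, 0, 1]
    print(cue_prf(gold, pred))
```

Because most tokens are not cues, scores for the cue class are more informative than overall accuracy.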
We include pre-trained models in the repository with git-lfs.
- Baseline model: neg_cue_detection_model_baseline
- Baseline+lexicals model: neg_cue_detection_model_lex
All results can be found in reports/*metrics.txt files.
Although we achieved very good F1 scores, our models still make errors. You can inspect them in the reports/*PPerror_analysis.txt files.