(Code for above visualization not included in repo)
Official PyTorch implementation of the paper Fine-Grained Propaganda Detection with Fine-Tuned BERT.
pip install -r requirements.txt
python -m spacy download en
- To create train and dev sets from the training data:
sh tools/split-train.sh
- Run preprocess.py on the raw dataset to convert it into intermediate pickle files. This generates the train and dev files.
For example:
python preprocess.py -d [path to articles and labels directory] -o [name of output file] -l
The -l flag preserves labels if included (needed even when labels aren't available).
- Run the trainer, for example:
python train.py --expID test_run1 --trainDataset train-train.p --evalDataset train-dev.p --model bert --LR 3e-5 --trainBatch 32 --nEpochs 5 --classType all_class --nLabels 21 --testDataset train-split/tasks-2-3/train-dev/ --train True --lowerCase True &
Here, the train and dev pickle files (train-train.p and train-dev.p in the command above) are obtained by running preprocess.py.
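Before starting a training run, it can help to confirm that the pickle files from preprocess.py load correctly. The sketch below is not part of the repository: `describe_pickle` is a hypothetical helper, and it makes no assumption about the structure preprocess.py stores, reporting only the top-level type and length. If the files contain project-specific classes, run it from the repository root so those classes can be imported. Only unpickle files you created yourself.

```python
import pickle
import sys
from collections.abc import Sized


def describe_pickle(path):
    """Return (type name, length or None) for the object stored in a pickle file."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    # Not every pickled object has a length, so report None in that case.
    length = len(obj) if isinstance(obj, Sized) else None
    return type(obj).__name__, length


if __name__ == "__main__":
    # Usage: python describe_pickle.py train-train.p train-dev.p
    for path in sys.argv[1:]:
        kind, length = describe_pickle(path)
        print(f"{path}: {kind}, length={length}")
```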
- The ./exp directory contains the logs and model states for training runs.
A trained model can be tested on a dataset using
python train.py --expID test_run1 --trainDataset train-train.p --evalDataset train-dev.p --model bert --LR 3e-5 --trainBatch 32 --nEpochs 5 --classType all_class --nLabels 21 --testDataset train-split/tasks-2-3/train-dev/ --lowerCase True --loadModel exp/all_class/test_run1/ &
Doing so will use the model with the best F1 score on the validation set during training. To produce predictions on a test set, first create its binarized pickle file with preprocess.py and then run the previous command. The output is written to the directory containing the model state, labelled pred.[test dir name].
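The training and testing commands differ only in their last flag (`--train True` versus `--loadModel`), so building them from an argument list avoids spacing mistakes between flags. The sketch below is not part of the repository. Every flag and value is copied from the commands above, while `build_command` and `expected_prediction_path` are hypothetical helpers. The prediction path is an assumption read off the description above (model directory plus `pred.` and the last component of the test directory) and should be checked against what train.py actually writes.

```python
import os
import shlex


def build_command(exp_id, test_dir, load_model=None):
    """Build the train.py argument list; pass load_model to test a saved model."""
    args = [
        "python", "train.py",
        "--expID", exp_id,
        "--trainDataset", "train-train.p",
        "--evalDataset", "train-dev.p",
        "--model", "bert",
        "--LR", "3e-5",
        "--trainBatch", "32",
        "--nEpochs", "5",
        "--classType", "all_class",
        "--nLabels", "21",
        "--testDataset", test_dir,
        "--lowerCase", "True",
    ]
    if load_model is None:
        args += ["--train", "True"]          # training run
    else:
        args += ["--loadModel", load_model]  # evaluation of a saved model
    return args


def expected_prediction_path(model_dir, test_dir):
    """ASSUMPTION: the output is named 'pred.' + last component of test_dir."""
    test_name = os.path.basename(os.path.normpath(test_dir))
    return os.path.join(model_dir, "pred." + test_name)


if __name__ == "__main__":
    test_dir = "train-split/tasks-2-3/train-dev/"
    model_dir = "exp/all_class/test_run1/"
    command = build_command("test_run1", test_dir, load_model=model_dir)
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(part) for part in command))
    print(expected_prediction_path(model_dir, test_dir))
```

The list returned by `build_command` can be passed directly to `subprocess.run`, which sidesteps shell quoting altogether.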
QCRI dataset V2 (NLP4IF)
huggingface/pytorch-pretrained-BERT 1.0
Pandas 0.25.3
Spacy 2.0.18
Torch 1.3.1
Python 3.7
CUDA 10.1
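To compare an environment against the versions listed above, a short check like the following can be run after installation. This is a sketch rather than a script shipped with the repository: `installed_version` and `check_versions` are hypothetical helpers, and the module names `pandas`, `spacy` and `torch` are assumed to be the import names for the listed packages. The comparison is a prefix match, so a build string such as a CUDA suffix on the Torch version does not count as a mismatch.

```python
import importlib

# Versions taken from the list above; keys are the assumed import names.
EXPECTED = {"pandas": "0.25.3", "spacy": "2.0.18", "torch": "1.3.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(expected=EXPECTED):
    """Map each module name to (wanted, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        matches = found is not None and str(found).startswith(wanted)
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, matches) in check_versions().items():
        status = "ok" if matches else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
```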