
Sequence classification for propaganda dataset (QCRI)

[visualization]

(Code for the above visualization is not included in the repo.)
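Since the visualization code isn't included, here is a minimal sketch of one way to produce something similar: wrap each predicted (start, end, technique) span in an HTML <mark> tag. The (start, end, technique) span format mirrors the task's label files; this is an illustration, not the repo's actual code.

    import html

    def highlight_spans(text, spans):
        """Render article text as HTML with (start, end, technique) spans highlighted."""
        out, last = [], 0
        for start, end, technique in sorted(spans):
            out.append(html.escape(text[last:start]))
            # Highlight the span; the technique name appears as a hover tooltip.
            out.append('<mark title="%s">%s</mark>'
                       % (html.escape(technique), html.escape(text[start:end])))
            last = end
        out.append(html.escape(text[last:]))
        return ''.join(out)

    article = 'They are destroying our country, and everyone knows it.'
    print(highlight_spans(article, [(0, 31, 'Loaded_Language')]))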

Official PyTorch implementation of the paper Fine-Grained Propaganda Detection with Fine-Tuned BERT.

Setup

  1. pip install -r requirements.txt
  2. python -m spacy download en
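To sanity-check the install, the spaCy model should load via the en shortcut (assuming spaCy 2.x, as listed under Tested on):

    import spacy

    # `python -m spacy download en` installs the model behind the 'en' shortcut.
    nlp = spacy.load('en')
    print([t.text for t in nlp('BERT detects propaganda.')])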

Train

  1. To create train and dev splits from the training data: sh tools/split-train.sh
  2. Convert the raw dataset into intermediate pickle files with preprocess.py, run once for the train split and once for the dev split (see the example invocations after this list):
    python preprocess.py -d [path to articles and labels directory] -o [name of output file] -l
    The -l flag preserves labels; include it even when labels aren't available.
  3. Run the trainer, for example:

python train.py --expID test_run1 --trainDataset train-train.p --evalDataset train-dev.p --model bert --LR 3e-5 --trainBatch 32 --nEpochs 5 --classType all_class --nLabels 21 --testDataset train-split/tasks-2-3/train-dev/ --train True --lowerCase True &
Here, train-train.p and train-dev.p are the pickle files produced by preprocess.py.
  4. The ./exp directory contains the logs and model states for training runs.
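For concreteness, here is a hypothetical pair of preprocess.py invocations matching the file names in the training command above; the input directories assume the layout produced by tools/split-train.sh and may differ:

    python preprocess.py -d train-split/tasks-2-3/train-train/ -o train-train.p -l
    python preprocess.py -d train-split/tasks-2-3/train-dev/ -o train-dev.p -l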

Evaluation and Testing

A trained model can be tested on a dataset with:

python train.py --expID test_run1 --trainDataset train-train.p --evalDataset train-dev.p --model bert --LR 3e-5 --trainBatch 32 --nEpochs 5 --classType all_class --nLabels 21 --testDataset train-split/tasks-2-3/train-dev/ --lowerCase True --loadModel exp/all_class/test_run1/ &

This uses the checkpoint with the best F1 score on the validation set during training. To produce predictions on a test set, first create its binarized pickle file with preprocess.py, then run the command above. The output is written to the directory containing the model state, labelled pred.[test dir name].
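For orientation, a minimal sketch of what loading a fine-tuned checkpoint for prediction could look like with the pytorch-pretrained-BERT 1.0 API listed under Tested on. The checkpoint file name and the bert-base-uncased starting point are assumptions (suggested by --lowerCase True); this is not the repo's train.py.

    import torch
    from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

    # 21 labels and lowercasing match --nLabels 21 and --lowerCase True above.
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
    model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=21)

    # Hypothetical checkpoint file; the name train.py actually writes may differ.
    state = torch.load('exp/all_class/test_run1/best_model.pth', map_location='cpu')
    model.load_state_dict(state)
    model.eval()

    tokens = ['[CLS]'] + tokenizer.tokenize('This is blatant propaganda.') + ['[SEP]']
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    with torch.no_grad():
        logits = model(input_ids)   # shape: (1, 21)
    print(logits.argmax(dim=-1))    # predicted class index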

Tested on:

QCRI dataset V2 (NLP4IF)
huggingface/pytorch-pretrained-BERT 1.0
Pandas 0.25.3
Spacy 2.0.18
Torch 1.3.1
Python 3.7
CUDA 10.1
