BERT is a state-of-the-art natural language processing model from Google. Using its latent representations, it can be repurposed for various NLP tasks, such as sentiment analysis.
This simple wrapper, built on the Transformers library (for managing the BERT model) and PyTorch, achieves 92% accuracy at classifying IMDB reviews as positive or negative.
First, you need to prepare the IMDB data, which is publicly available. The format used here is one review per line, with the first 12500 lines being positive, followed by 12500 negative lines. Alternatively, you can simply download the dataset from my Google Drive here. The default folder read by the script is data/.
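As a sketch of this layout, loading the file might look like the following (the `load_imdb` helper and its arguments are illustrative, not part of the actual script):

```python
# Minimal sketch of reading the IMDB file described above: one review per
# line, the first 12500 lines labelled positive (1), the rest negative (0).
# Function name and parameters are hypothetical -- the script may differ.
def load_imdb(path, n_positive=12500):
    reviews, labels = [], []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            reviews.append(line.strip())
            labels.append(1 if i < n_positive else 0)
    return reviews, labels
```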
Training with default parameters can be performed simply by:
python script.py --train
Optionally, you can change the output directory for weights or the input directory for the dataset.
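The command-line interface could be wired up roughly like this with argparse. Only `--train`, `--evaluate`, and `--predict` appear in this README; the directory flag names and defaults below are my guesses, so check the script's `--help` for the real ones:

```python
import argparse

# Rough sketch of the script's CLI. The directory flags (--output-dir,
# --data-dir) and the weights default are hypothetical assumptions; only
# the data/ default folder is stated in this README.
def build_parser():
    parser = argparse.ArgumentParser(description="BERT sentiment wrapper")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true")
    mode.add_argument("--evaluate", action="store_true")
    mode.add_argument("--predict", metavar="TEXT")
    parser.add_argument("--output-dir", default="weights/")  # hypothetical
    parser.add_argument("--data-dir", default="data/")       # default from README
    return parser
```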
You can find out how great you are (until your grandma gets her hands on BERT as well) simply by running:
python script.py --evaluate
Of course, you need to train the model first or get the weights from my drive.
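Evaluation boils down to comparing predicted labels against the known ones; a figure like the 92% above comes from a plain accuracy computation of this kind (a sketch, independent of the actual script's internals):

```python
# Plain accuracy: fraction of predictions that match the true labels,
# e.g. over the IMDB test split. A sketch, not the script's own code.
def accuracy(predicted, actual):
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```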
python script.py --predict "It was a truly amazing experience."
or
python script.py --predict "It was as terrible and disgusting as coffee topped with ketchup."
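Under the hood, prediction reduces the model's two output logits to a human-readable label; the last step might look like this sketch (the label ordering is an assumption, the actual script may use the opposite convention):

```python
# Map a pair of classifier logits to a sentiment label.
# Assumes index 0 = negative, index 1 = positive -- an assumption,
# not confirmed by this README.
def label_from_logits(logits):
    return "positive" if logits[1] > logits[0] else "negative"
```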