ESC403 Introduction to Data Science Project
In this project, we analyse a Kaggle dataset of more than 27,000 tweets, each labelled with a sentiment (negative, neutral, or positive), and explore several methods for sentiment analysis.
Through this project we hope to gain experience with Natural Language Processing and GitHub, as well as practise what we have learned in the course.
Method | Accuracy |
---|---|
Random forest (bootstrap=False) | 0.7005 |
DistilBERT (3 epochs) | 0.7845 |
DistilBERT (2 epochs) | 0.7890 |
BERT | 0.7903 |
Tom | 0.6415 |
Jessica |
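
The Tom and Jessica rows presumably report human-level accuracy, measured with the short script mentioned in the tasks. A rough sketch of what such a script could look like is given here; the function names `accuracy` and `human_accuracy`, the sample size, and the prompt format are assumptions made for illustration, not the project's actual code.

```python
import random

LABELS = ("negative", "neutral", "positive")


def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def human_accuracy(tweets, gold, ask=input, sample_size=200, seed=0):
    """Show a random sample of tweets to a person and score their guesses.

    `ask` is any function that takes a prompt and returns the typed answer;
    it defaults to `input`, and can be swapped out for testing.
    """
    rng = random.Random(seed)
    indices = rng.sample(range(len(tweets)), min(sample_size, len(tweets)))
    guesses = []
    for i in indices:
        answer = ""
        # Keep asking until the answer is one of the three valid labels.
        while answer not in LABELS:
            answer = ask(f"{tweets[i]}\n[{'/'.join(LABELS)}]: ").strip().lower()
        guesses.append(answer)
    return accuracy(guesses, [gold[i] for i in indices])
```

The same `accuracy` function can score model predictions, so the human and model rows would be measured the same way, provided they are evaluated on comparable samples.
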
Tasks:
Tom - Finish BERT (possibly add another model, if there is time left), write a short script to test human-level accuracy - DONE
Plot all methods in one graph, create the slides (possibly discuss over Skype first) - maybe one slide could be a pros/cons table of all models.
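
For the plotting task, a minimal sketch could look like the following. It assumes matplotlib is installed and copies the accuracies from the results table by hand; the Jessica row is left out because it has no recorded value. The names `RESULTS`, `sorted_results`, `plot_accuracies`, and the output file name are hypothetical.

```python
# Accuracies copied from the results table (rows with a recorded value only).
RESULTS = {
    "Random forest (bootstrap=False)": 0.7005,
    "DistilBERT (3 epochs)": 0.7845,
    "DistilBERT (2 epochs)": 0.7890,
    "BERT": 0.7903,
    "Tom": 0.6415,
}


def sorted_results(results):
    """Return (method, accuracy) pairs ordered from lowest to highest accuracy."""
    return sorted(results.items(), key=lambda item: item[1])


def plot_accuracies(results, path="accuracies.png"):
    """Draw one horizontal bar per method and save the figure to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, so no display is needed
    import matplotlib.pyplot as plt

    pairs = sorted_results(results)
    names = [name for name, _ in pairs]
    values = [value for _, value in pairs]

    fig, ax = plt.subplots(figsize=(7, 3))
    ax.barh(names, values)
    # Write the exact accuracy next to each bar.
    for y, value in enumerate(values):
        ax.text(value + 0.01, y, f"{value:.4f}", va="center")
    ax.set_xlim(0, 1)
    ax.set_xlabel("Accuracy")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


if __name__ == "__main__":
    plot_accuracies(RESULTS)
```

Sorting the bars by accuracy keeps the human baseline and the models easy to compare at a glance on a single slide.
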