- Project Mentor
- Dr Uthayasanker Thayasivam
- Contributors
- Piruntha Navanesan
- Jarsigan Vickneswaran
- Vahesan Vijayaratnam
We are developing an effective and efficient system that recognizes emotions in conversational texts. Effectiveness refers to the high accuracy of the results, while efficiency means delivering the best possible results from a comparatively small data set.
We obtained a well-structured data set from Microsoft through the “EmoContext” competition. The data collection process followed by the competition organizers is explained at the link below.
Source of Dataset:
https://competitions.codalab.org/competitions/19790#learn_the_details-data-set-format
- The models were trained using Keras with a TensorFlow backend.
- TensorFlow
- Keras
- NLTK
- Python 3.5 or above
- OS: Ubuntu
We use a customized fastText embedding as the pre-trained embedding model. It is a context-free word embedding trained on 322M mostly emotion-related tweets. Because fastText works with character n-grams, it generates better embeddings for rare words, and even for words not seen during training.
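As an illustration of how such an embedding can be wired into Keras, here is a minimal sketch that builds an embedding matrix from a fastText `.vec` file. The file name, embedding dimension, and helper name are assumptions made for the example, not values taken from this project.

```python
# Minimal sketch: load a fastText .vec file into a Keras embedding matrix.
# EMBEDDING_FILE and EMBEDDING_DIM are hypothetical values, not the
# project's actual settings.
import numpy as np
from keras.preprocessing.text import Tokenizer

EMBEDDING_FILE = "emotion_fasttext.vec"  # assumed file name
EMBEDDING_DIM = 300                      # assumed vector size

def build_embedding_matrix(tokenizer: Tokenizer) -> np.ndarray:
    """Map every word in the tokenizer's vocabulary to its fastText vector."""
    vectors = {}
    with open(EMBEDDING_FILE, encoding="utf-8") as f:
        next(f)  # .vec files begin with a "<vocab_size> <dim>" header line
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")

    # Rows stay zero for words the embedding file does not cover.
    matrix = np.zeros((len(tokenizer.word_index) + 1, EMBEDDING_DIM))
    for word, index in tokenizer.word_index.items():
        if word in vectors:
            matrix[index] = vectors[word]
    return matrix
```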
1. Install all necessary requirements.
2. Download the source code from GitHub and place it in a folder.
3. Download a pre-trained word embedding and add it to the same folder.
4. Specify the embedding file name in the baseline file.
5. Run the model with the following command:
python baseline_with_eval_With_Nltk.py -config testBaseline.config
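Here, the `-config` flag points the script at its parameter file. As a rough, hypothetical sketch of how such a flag could be parsed (the project's actual argument handling may differ):

```python
# Hypothetical sketch of how baseline_with_eval_With_Nltk.py might read
# its -config argument; the actual parsing in the project may differ.
import argparse
import configparser

parser = argparse.ArgumentParser(description="Emotion analysis baseline")
parser.add_argument("-config", required=True,
                    help="Path to the parameter file, e.g. testBaseline.config")
args = parser.parse_args()

config = configparser.ConfigParser()
config.read(args.config)  # main parameters (paths, hyperparameters, etc.)
```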
File Name | Description |
---|---|
baseline_with_eval_With_Nltk.py | Contains the core code for the model |
testBaseline.config | Contains the main parameters |
Train.txt | Contains the training data |
Devwithoutlabels.txt | Contains the test data |
SolFile.txt | Contains the result data |
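For orientation, in the EmoContext data format each line of `Train.txt` is, to the best of our knowledge, a tab-separated record of a conversation id, three conversation turns, and an emotion label. A small reading sketch under that assumption (the function name is hypothetical):

```python
# Sketch for reading EmoContext-style training data, assuming each line is
# a tab-separated record: id, turn1, turn2, turn3, label (with a header row).
import csv

def read_train_file(path="Train.txt"):
    """Yield ((turn1, turn2, turn3), label) pairs from the training file."""
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row, if present
        for row in reader:
            conv_id, turn1, turn2, turn3, label = row
            yield (turn1, turn2, turn3), label
```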
Emotion | Precision | Recall | Micro F1 |
---|---|---|---|
Happy | 0.696 | 0.750 | 0.722 |
Sad | 0.472 | 0.760 | 0.751 |
Angry | 0.716 | 0.795 | 0.754 |
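For reference, a per-emotion F1 score is the harmonic mean of that emotion's precision and recall:

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.696, 0.750), 3))  # Happy row: 0.722
```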
- Emoji prediction is weak in our model.
- Overfitting on some emotion-related words.
- Censored words are not handled.
- Achieved a best micro F1 score of 0.7420, which betters the third-quartile value of 0.7317 and places our model in the top quarter of the EmoContext competition leaderboard.
- When the models are ranked by recall for the happy emotion, our model outperforms all the others.
- Provides a simple and easily referable emotion prediction model for future researchers.
Emotion Analysis for Conversational Texts
Apache License 2.0
Please read our code of conduct document here.