During my internship at Dtac, my task was to use Facebook comments to gain insight into how Dtac was performing against its two main competitors. The approach was to build a semi-supervised sentiment analysis model. The call-center team at Dtac tagged ten percent of all collected comments. A sentiment analysis model was then trained on this labeled subset and used to label the rest of the data.
Slide: [Link]
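The workflow (train on the labeled subset, then predict the rest) can be sketched as follows. This is a minimal illustration, not the project's code. It swaps in a multinomial Naive Bayes classifier for the LSTM so that it runs on the Python standard library alone. It also assumes whitespace-separated tokens. Text without spaces between words, such as Thai, would need a proper word segmenter. `train_nb`, `predict_nb`, and `label_rest` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization; real comments may need a word segmenter
    return text.lower().split()


def train_nb(labeled):
    """Fit a multinomial Naive Bayes model on (text, label) pairs.

    Stand-in for the LSTM used in the project.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for one comment."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the probability
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def label_rest(labeled, unlabeled):
    """Train on the hand-tagged comments, then label every untagged comment."""
    model = train_nb(labeled)
    return [(text, predict_nb(model, text)) for text in unlabeled]
```

In the project itself, the trained LSTM would take the place of `train_nb` and `predict_nb`. The surrounding flow stays the same.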
The project was built from scratch and consists of the following components:
- Facebook comments data crawler: [Link]
- Labelling platform: [Link]
- Sentiment Analysis with LSTM: [Link]
- Visualization Dashboard (sample below)
- Project report: [Link]
Although the comments were tagged at the comment level, we were able to provide word-level predictions by taking the output of the LSTM at each word. The model appeared to work well even for comments containing multiple phrases or sentences with different sentiments. One trick that helped was to include only short comments in the labelling and training data. This forced the model to learn from comments that each carry a single sentiment.
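Reading a prediction off every time step, rather than only the last one, is what turns a comment-level classifier into a word-level one. The plain-Python sketch below shows the forward pass of a single-layer, unidirectional LSTM. A sigmoid head is applied to the hidden state after each word, so each score reflects the text up to and including that word. The weights are random placeholders, and training is omitted. The dimensions, `TinyLSTM`, and the `is_short` filter with its 10-word cutoff are all hypothetical, since the write-up does not give the architecture or the length threshold.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain-Python matrix-vector product
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def is_short(comment, max_words=10):
    # Hypothetical cutoff for keeping only short, single-sentiment comments
    return len(comment.split()) <= max_words


class TinyLSTM:
    """Single-layer LSTM with a sigmoid sentiment head applied at every step.

    Weights are random placeholders; a real model would be trained first.
    """

    GATES = ("i", "f", "o", "g")

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.embed = {w: [rng.uniform(-0.5, 0.5) for _ in range(embed_dim)] for w in vocab}
        self.unk = [0.0] * embed_dim  # embedding for out-of-vocabulary words
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}]
        self.W = {k: rand_matrix(hidden_dim, embed_dim + hidden_dim) for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def per_word_scores(self, tokens):
        """Return one score in (0, 1) per token, read from the hidden state at that token."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        scores = []
        for tok in tokens:
            z = self.embed.get(tok, self.unk) + h  # list concatenation [x_t ; h_{t-1}]
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]       # input gate
            f = [sigmoid(v) for v in pre["f"]]       # forget gate
            o = [sigmoid(v) for v in pre["o"]]       # output gate
            cand = [math.tanh(v) for v in pre["g"]]  # candidate cell values
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            # Word-level prediction: the sentiment head applied to this step's hidden state
            scores.append(sigmoid(sum(w * hv for w, hv in zip(self.w_out, h))))
        return scores
```

A common setup, assumed here rather than taken from the project, is to train against the comment label using only the final step's output. At inference time, `per_word_scores` then exposes the intermediate steps for free.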
We used the resulting sentiment model to label all of the Facebook comments we had collected. We then plotted the results to compare how Dtac had been performing against its competitors.
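Comparing brands over time comes down to grouping the predicted labels by brand and by day. The following is a minimal sketch. It assumes each comment carries the brand page it came from, a posting date as an ISO `YYYY-MM-DD` string, and the model's predicted label. `daily_counts`, `negative_share`, and the brand keys are hypothetical, because the write-up does not name the competitors or the plotting setup.

```python
from collections import defaultdict


def daily_counts(predictions):
    """Count predicted labels per brand and per day.

    predictions: iterable of (brand, day, label) tuples, where day is an
    ISO date string ("YYYY-MM-DD") and label is the model's output.
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for brand, day, label in predictions:
        counts[brand][day][label] += 1
    return counts


def negative_share(counts, brand):
    """Return [(day, fraction of negative comments)] for one brand, sorted by day."""
    series = []
    per_day = counts.get(brand, {})
    for day in sorted(per_day):  # ISO date strings sort chronologically
        by_label = per_day[day]
        total = sum(by_label.values())  # never zero: a day exists only if it was counted
        series.append((day, by_label.get("negative", 0) / total))
    return series
```

Plotting the share rather than the raw count keeps days with unusually high comment volume from dominating the comparison. Each brand's series can be passed directly to any line-plotting library.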
Interestingly, we noticed a surge of negative comments about Dtac on one particular date. On further investigation, it appeared that a large network maintenance operation had caused the network across the whole area to shut down. This showed that our project was able to reflect real-world events through digital data.