Sentiment analysis

Sentiment analysis is the task of classifying the polarity of a given text.

IMDb

The IMDb dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb), each labeled as positive or negative. The dataset contains an equal number of positive and negative reviews. Only highly polarizing reviews are included: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. Models are evaluated based on accuracy.
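The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the script used to build the dataset. The input format (hypothetical `(movie_id, text, score)` tuples), the helper names, and the first-come choice of which 30 reviews to keep per movie are all assumptions. The class-balancing step is not shown.

```python
from collections import defaultdict
from typing import Iterable, Optional


def polarity_from_score(score: int) -> Optional[str]:
    """Map a 1-10 IMDb rating to a polarity label, or None if it is not polarizing."""
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None  # ratings 5 and 6 are excluded


def build_dataset(reviews: Iterable[tuple[str, str, int]], max_per_movie: int = 30):
    """Keep polarizing reviews, capping how many are kept per movie.

    `reviews` is a hypothetical iterable of (movie_id, text, score) tuples;
    the first `max_per_movie` polarizing reviews of each movie are kept.
    """
    per_movie = defaultdict(int)
    kept = []
    for movie_id, text, score in reviews:
        label = polarity_from_score(score)
        if label is None or per_movie[movie_id] >= max_per_movie:
            continue
        per_movie[movie_id] += 1
        kept.append((text, label))
    return kept


def accuracy(gold, pred) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```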

Model	Accuracy	Paper / Source
ULMFiT (Howard and Ruder, 2018)	95.4	Universal Language Model Fine-tuning for Text Classification
Block-sparse LSTM (Gray et al., 2017)	94.99	GPU Kernels for Block-Sparse Weights
oh-LSTM (Johnson and Zhang, 2016)	94.1	Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Virtual adversarial training (Miyato et al., 2016)	94.1	Adversarial Training Methods for Semi-Supervised Text Classification
BCN+Char+CoVe (McCann et al., 2017)	91.8	Learned in Translation: Contextualized Word Vectors

SST

The Stanford Sentiment Treebank contains 215,154 phrases with fine-grained sentiment labels in the parse trees of 11,855 sentences from movie reviews. Models are evaluated on either fine-grained (five-way) or binary classification, based on accuracy.
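A common convention for the binary setting, assumed here rather than stated above, is to drop neutral examples and merge the two negative and the two positive classes. The following minimal sketch assumes the five fine-grained classes are encoded as integers 0 (very negative) to 4 (very positive). The function names are hypothetical.

```python
from typing import Optional

# Five-way SST class names, indexed by the assumed integer encoding 0-4.
FINE_LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def to_binary(fine_label: int) -> Optional[int]:
    """Collapse a five-way label (0-4) to binary: 0 = negative, 1 = positive.

    Neutral examples (label 2) have no binary label, so None is returned.
    """
    if fine_label not in range(len(FINE_LABELS)):
        raise ValueError(f"expected a label in 0..4, got {fine_label}")
    if fine_label == 2:
        return None
    return int(fine_label > 2)


def binarize(examples):
    """Turn (text, fine_label) pairs into (text, binary_label) pairs, skipping neutrals."""
    out = []
    for text, label in examples:
        b = to_binary(label)
        if b is not None:
            out.append((text, b))
    return out
```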

Fine-grained classification:

Model	Accuracy	Paper / Source
BCN+ELMo (Peters et al., 2018)	54.7	Deep contextualized word representations
BCN+Char+CoVe (McCann et al., 2017)	53.7	Learned in Translation: Contextualized Word Vectors

Binary classification:

Model	Accuracy	Paper / Source
Block-sparse LSTM (Gray et al., 2017)	93.2	GPU Kernels for Block-Sparse Weights
bmLSTM (Radford et al., 2017)	91.8	Learning to Generate Reviews and Discovering Sentiment
BCN+Char+CoVe (McCann et al., 2017)	90.3	Learned in Translation: Contextualized Word Vectors
Neural Semantic Encoder (Munkhdalai and Yu, 2017)	89.7	Neural Semantic Encoders
BLSTM-2DCNN (Zhou et al., 2017)	89.5	Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling

Yelp

The Yelp Review dataset consists of more than 500,000 Yelp reviews. There are both binary and fine-grained (five-class) versions of the dataset. Models are evaluated based on error rate in percent (100 - accuracy; lower is better).
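Because the Yelp tables report error as a percentage rather than accuracy, the following minimal sketch shows how that number is obtained from predictions. The `error_rate` name is hypothetical.

```python
def error_rate(gold, pred) -> float:
    """Percentage of misclassified examples, i.e. 100 * (1 - accuracy)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    wrong = sum(g != p for g, p in zip(gold, pred))
    return 100.0 * wrong / len(gold)
```

For example, an error of 2.16 corresponds to an accuracy of 97.84%.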

Fine-grained classification:

Model	Error	Paper / Source
ULMFiT (Howard and Ruder, 2018)	29.98	Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017)	30.58	Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016)	32.39	Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015)	37.95	Character-level Convolutional Networks for Text Classification

Binary classification:

Model	Error	Paper / Source
ULMFiT (Howard and Ruder, 2018)	2.16	Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017)	2.64	Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016)	2.90	Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015)	4.88	Character-level Convolutional Networks for Text Classification

SemEval

SemEval (International Workshop on Semantic Evaluation) has a dedicated task for sentiment analysis. An overview of the most recent edition of this task (Task 4, 2017) is available at: http://www.aclweb.org/anthology/S17-2088

SemEval-2017 Task 4 consists of five subtasks, each offered for both Arabic and English:

Subtask A: Given a tweet, decide whether it expresses POSITIVE, NEGATIVE or NEUTRAL sentiment.
Subtask B: Given a tweet and a topic, classify the sentiment conveyed towards that topic on a two-point scale: POSITIVE vs. NEGATIVE.
Subtask C: Given a tweet and a topic, classify the sentiment conveyed in the tweet towards that topic on a five-point scale: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE.
Subtask D: Given a set of tweets about a topic, estimate the distribution of tweets across the POSITIVE and NEGATIVE classes.
Subtask E: Given a set of tweets about a topic, estimate the distribution of tweets across the five classes: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE.
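Subtasks D and E are quantification tasks: the output is a class distribution per topic rather than a label per tweet. One simple baseline, sketched below, is "classify and count": label each tweet with any per-tweet classifier and report the share of each label. This is not necessarily how participating systems worked. For the ordinal five-point scale of Subtask E, the sketch also includes an earth mover's distance that assumes unit distance between adjacent classes. The official evaluation measures are described in the task overview linked above. All function names are hypothetical.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Estimate class prevalence for one topic by counting per-tweet predictions.

    Returns a dict mapping each class in `classes` to its estimated share.
    """
    total = len(predicted_labels)
    if total == 0:
        raise ValueError("need at least one tweet")
    counts = Counter(predicted_labels)
    unknown = set(counts) - set(classes)
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    return {c: counts[c] / total for c in classes}


def emd_ordinal(p, q, classes):
    """Earth mover's distance between two distributions over ordered classes.

    Assumes `classes` is ordered and neighbouring classes are one unit apart,
    so the distance is the sum of absolute differences of the cumulative shares.
    """
    cum_p = cum_q = 0.0
    total = 0.0
    for c in classes[:-1]:
        cum_p += p[c]
        cum_q += q[c]
        total += abs(cum_p - cum_q)
    return total
```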

Subtask A results:

Model	F1-score	Paper / Source
LSTMs+CNNs ensemble with multiple conv. ops (Cliche, 2017)	0.685	BB twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
Deep Bi-LSTM+attention (Baziotis et al., 2017)	0.677	DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis
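The table reports an F1-score without stating how it is averaged. The sketch below assumes it is F1^PN, the average of the F1 scores of the POSITIVE and NEGATIVE classes that the SemEval-2017 Task 4 overview reports for Subtask A. NEUTRAL is not averaged in, but neutral errors still lower the other two classes' precision and recall. Labels are assumed to be the strings POSITIVE, NEGATIVE and NEUTRAL, and the function names are hypothetical.

```python
def f1_for_class(gold, pred, cls) -> float:
    """F1 score of a single class, treating it as the positive class."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_pn(gold, pred) -> float:
    """Average of the POSITIVE and NEGATIVE F1 scores (NEUTRAL is not averaged in)."""
    return (f1_for_class(gold, pred, "POSITIVE") + f1_for_class(gold, pred, "NEGATIVE")) / 2
```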

Aspect-based sentiment analysis

Sentihood

Sentihood is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences, 3,862 of which contain a single target, while the remainder contain multiple targets. F1 is used as the evaluation metric for aspect detection and accuracy as the evaluation metric for sentiment analysis.
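The two Sentihood metrics can be sketched as follows. The details are assumptions, not taken from the papers below: aspect detection is scored as micro-averaged F1 over the set of (target, aspect) pairs predicted for each sentence, and sentiment accuracy is computed only over gold (target, aspect) pairs, keyed by a hypothetical (sentence_id, target, aspect) tuple. The function names are hypothetical.

```python
def aspect_f1(gold_sets, pred_sets) -> float:
    """Micro-averaged F1 of aspect detection.

    Each element of gold_sets / pred_sets is the set of (target, aspect)
    pairs mentioned in one sentence.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sentiment_accuracy(gold_labels, pred_labels) -> float:
    """Accuracy of polarity predictions on the gold (target, aspect) pairs.

    Both arguments map (sentence_id, target, aspect) -> polarity; a gold pair
    with no prediction counts as wrong.
    """
    if not gold_labels:
        raise ValueError("no gold aspects to score")
    correct = sum(pred_labels.get(k) == v for k, v in gold_labels.items())
    return correct / len(gold_labels)
```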

Model	Aspect	Sentiment	Paper / Source
Liu et al. (2018)	78.5	91.0	Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis
SenticLSTM (Ma et al., 2018)	78.2	89.3	Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM
LSTM-LOC (Saeidi et al., 2016)	69.3	81.9	Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods
