Sentiment analysis is the task of classifying the polarity of a given text.
The IMDb dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. Models are evaluated based on accuracy.
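As a rough illustration of the accuracy metric mentioned above, the sketch below computes the fraction of reviews whose predicted label matches the gold label. The function name `accuracy` and the example labels are hypothetical and are not taken from IMDb or from any particular evaluation script.

```python
def accuracy(gold, predicted):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical labels for five reviews (illustrative only, not IMDb data).
gold = ["pos", "neg", "pos", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neg", "pos"]
print(accuracy(gold, predicted))  # 4 of 5 correct -> 0.8
```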
The Stanford Sentiment Treebank contains 215,154 phrases with fine-grained sentiment labels in the parse trees of 11,855 sentences from movie reviews. Models are evaluated on either fine-grained (five-way) or binary classification, based on accuracy.
Fine-grained classification:
| Model | Accuracy | Paper / Source |
|---|---|---|
| BCN+ELMo (Peters et al., 2018) | 54.7 | Deep contextualized word representations |
| BCN+Char+CoVe (McCann et al., 2017) | 53.7 | Learned in Translation: Contextualized Word Vectors |
Binary classification:
| Model | Accuracy | Paper / Source |
|---|---|---|
| bmLSTM (Radford et al., 2017) | 91.8 | Learning to Generate Reviews and Discovering Sentiment |
| BCN+Char+CoVe (McCann et al., 2017) | 90.3 | Learned in Translation: Contextualized Word Vectors |
| Neural Semantic Encoder (Munkhdalai and Yu, 2017) | 89.7 | Neural Semantic Encoders |
| BLSTM-2DCNN (Zhou et al., 2017) | 89.5 | Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling |
The Yelp Review dataset consists of more than 500,000 Yelp reviews. The dataset comes in both a binary and a fine-grained (five-class) version. Models are evaluated based on error (1 - accuracy; lower is better).
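The relation between error and accuracy can be sketched as follows. The values in the Yelp tables appear to be percentages, so this sketch works in percent; that reading, the function names, and the example star ratings are assumptions made for illustration.

```python
def error_rate(gold, predicted):
    """Error in percent: 100 * (1 - accuracy), i.e. the share of wrong predictions."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(g != p for g, p in zip(gold, predicted))
    return 100.0 * wrong / len(gold)


def accuracy_from_error(error_percent):
    """Convert an error given in percent back to accuracy in percent."""
    return 100.0 - error_percent


# Hypothetical five-class star ratings (illustrative only, not Yelp data).
gold = [5, 1, 3, 4, 2]
predicted = [5, 1, 3, 4, 1]
print(error_rate(gold, predicted))  # 1 of 5 wrong -> 20.0

# An error of 2.16 corresponds to an accuracy of about 97.84.
print(round(accuracy_from_error(2.16), 2))
```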
Fine-grained classification:
| Model | Error | Paper / Source |
|---|---|---|
| ULMFiT (Howard and Ruder, 2018) | 29.98 | Universal Language Model Fine-tuning for Text Classification |
| DPCNN (Johnson and Zhang, 2017) | 30.58 | Deep Pyramid Convolutional Neural Networks for Text Categorization |
| CNN (Johnson and Zhang, 2016) | 32.39 | Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings |
| Char-level CNN (Zhang et al., 2015) | 37.95 | Character-level Convolutional Networks for Text Classification |
Binary classification:
| Model | Error | Paper / Source |
|---|---|---|
| ULMFiT (Howard and Ruder, 2018) | 2.16 | Universal Language Model Fine-tuning for Text Classification |
| DPCNN (Johnson and Zhang, 2017) | 2.64 | Deep Pyramid Convolutional Neural Networks for Text Categorization |
| CNN (Johnson and Zhang, 2016) | 2.90 | Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings |
| Char-level CNN (Zhang et al., 2015) | 4.88 | Character-level Convolutional Networks for Text Classification |
Sentihood is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences, 3,862 of which contain a single target, and the remainder multiple targets. F1 is used as the evaluation metric for aspect detection and accuracy as the evaluation metric for sentiment analysis.
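A minimal sketch of an F1 computation for aspect detection is given below. It assumes each sentence is annotated with a set of (target, aspect) pairs and uses micro-averaging over all sentences; the averaging scheme actually used for Sentihood is not stated here, and the function name, target names, and aspect names are hypothetical.

```python
def precision_recall_f1(gold_sets, predicted_sets):
    """Micro-averaged precision, recall and F1 over per-sentence label sets."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, predicted_sets):
        tp += len(gold & pred)   # pairs predicted and present in gold
        fp += len(pred - gold)   # pairs predicted but absent from gold
        fn += len(gold - pred)   # gold pairs the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (target, aspect) annotations for two sentences (not Sentihood data).
gold_sets = [
    {("LOC1", "price")},
    {("LOC1", "safety"), ("LOC2", "price")},
]
predicted_sets = [
    {("LOC1", "price")},
    {("LOC1", "safety"), ("LOC2", "safety")},
]
precision, recall, f1 = precision_recall_f1(gold_sets, predicted_sets)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

In this example two of the three predicted pairs are correct and two of the three gold pairs are found, so precision, recall, and F1 all come out at two thirds.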
