Detecting Sarcasm is Extremely Easy ;-), Parde and Nielsen, 2018

Paper, Tags: #sarcasm-detection

We analyze the performance of a domain-general sarcasm detection system on datasets from 2 very different domains: Twitter and Amazon product reviews. We categorize the errors we identify with each, and make recommendations for addressing these issues.

Since most prior work in this area has been domain-specific, the findings may not be broadly applicable. Twitter's length constraint might make the sarcasm in tweets different from that found in a domain that allows lengthy posts.

We found that by applying a domain adaptation step prior to training the model, we were able to achieve higher performance in predicting sarcasm in Amazon product reviews than models trained on reviews alone or on a simple combination of reviews and tweets.

Sarcasm detection methods

We train our sarcasm detection approach on the same training data used in our previous work (4k tweets and 1k Amazon product reviews) and apply it to two test datasets: Amazon, a 251-instance set of sarcastic (87) and non-sarcastic (164) Amazon product reviews, and Twitter, a 1k-instance set of sarcastic (391) and non-sarcastic (609) tweets containing the hashtags #sarcasm (sarcastic class) or #happiness, #sadness, #anger, #surprise, #fear, #disgust (non-sarcastic class).

All features are extracted from each instance, and the feature space is then transformed using the domain adaptation approach originally outlined by Daumé III (2007). We use a Naïve Bayes classifier, following our earlier work.
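Below is a minimal sketch of the feature-augmentation idea from Daumé III (2007), assuming a simple bag-of-words feature space and scikit-learn's Naïve Bayes; the paper's actual feature set and implementation are richer, and the toy examples here are hypothetical.

```python
# Sketch of Daumé III (2007) "frustratingly easy" feature augmentation,
# assuming bag-of-words features; the paper's real feature set is richer.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def augment(X, n_domains, d_index):
    """Map each row x to [general copy | domain_0 copy | ... | domain_k copy],
    where only the general copy and the instance's own domain copy are non-zero."""
    n, d = X.shape
    out = np.zeros((n, d * (n_domains + 1)))
    out[:, :d] = X                                    # shared ("general") copy
    out[:, d * (d_index + 1): d * (d_index + 2)] = X  # domain-specific copy
    return out

# Hypothetical toy data standing in for the tweets and reviews (1 = sarcastic).
tweets = ["i just love waiting in line for hours #sarcasm", "great day at the park"]
tweet_labels = [1, 0]
reviews = ["best purchase ever, broke after one day", "solid blender, works well"]
review_labels = [1, 0]

vec = CountVectorizer()
X_all = vec.fit_transform(tweets + reviews).toarray()
X_tw, X_rev = X_all[: len(tweets)], X_all[len(tweets):]

# Augment: domain 0 = Twitter, domain 1 = Amazon.
X_train = np.vstack([augment(X_tw, 2, 0), augment(X_rev, 2, 1)])
y_train = tweet_labels + review_labels

clf = MultinomialNB().fit(X_train, y_train)

# At test time, an Amazon instance gets the general + Amazon-specific copies.
X_test = augment(vec.transform(["love waiting in line for this blender"]).toarray(), 2, 1)
print(clf.predict(X_test))
```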

Model performance

We compute precision, recall, and F1 score on the positive class for both datasets, and report results relative to the performance of other systems on the same data. Our results on Amazon are identical to those reported in our previous paper. That paper reported results on Twitter when training only on Twitter data; here we instead apply the same model used for Amazon and achieve slightly higher results.
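For reference, a quick sketch of the scoring described above (precision, recall, and F1 on the positive, i.e. sarcastic, class) using scikit-learn; the labels below are made up and are not the paper's data.

```python
# Illustrative only: precision, recall, and F1 on the positive (sarcastic) class.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 1, 0, 0, 1, 0, 0, 1]   # 1 = sarcastic, 0 = non-sarcastic
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average="binary"
)
print(f"P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")
```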

Error analysis

We conduct our error analysis on all misclassified instances, 402 total, in both datasets. For both datasets, there were more false positives than false negatives.

Among false negatives, in many cases the sarcasm expressed could only be inferred using world knowledge. Some tweets didn't convey sarcasm once the #sarcasm hashtag was removed, and some contained sarcastic content only in other hashtags associated with the tweet. Other tweets were found not to be sarcastic upon manual analysis.

Among false positives, many tweets (109) included excessive punctuation, a trait commonly associated with sarcastic text. Others (29 tweets and 5 reviews) contained a mix of positive and negative sentiment, which the model mistook for sarcasm.

Based on our analysis, we recommend that the following factors be taken into account in future systems:

  • World knowledge: for many false negatives, the sarcasm expressed was detectable only through knowledge of the world. Frame-semantic resources could be used to detect some sarcasm instantiated through script-based inconsistencies.
  • Text normalization: word splitting algorithms should be applied to disambiguate compound hashtags into their constituent words (a toy sketch follows this list), and spelling correction algorithms should be applied to normalize text.
  • Enhanced lexicon of sentiment and situation phrases: some of the errors could have been addressed if the system had understood that the misclassified instances described negative situations in positive terms, or vice versa.
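As a rough illustration of the text normalization point above, here is a toy hashtag splitter based on dynamic-programming word segmentation over a small, hypothetical vocabulary; the paper does not prescribe a specific algorithm or lexicon, and a real system would use a large unigram lexicon with frequencies.

```python
# Toy hashtag splitter: DP word segmentation over a tiny, hypothetical vocabulary.
VOCAB = {"so", "much", "fun", "best", "day", "ever", "not", "sarcasm", "love", "mondays"}

def split_hashtag(tag: str) -> list[str]:
    text = tag.lstrip("#").lower()
    n = len(text)
    # best[i] = segmentation of text[:i] with the fewest words, or None if impossible
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and text[j:i] in VOCAB:
                cand = best[j] + [text[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n] if best[n] is not None else [text]

print(split_hashtag("#somuchfun"))    # ['so', 'much', 'fun']
print(split_hashtag("#bestdayever"))  # ['best', 'day', 'ever']
```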