Skip to content

Latest commit

 

History

History
47 lines (27 loc) · 3.1 KB

1912.33742.md

File metadata and controls

47 lines (27 loc) · 3.1 KB

A2Text-Net: A Novel Deep Neural Network for Sarcasm Detection, Liu et al., 2019

Paper, Tags: #sarcasm-detection

We employ a new DNN, A2Text-Net to mimic face-to-face speech, integrating punctuation, POS, numeral, emoji, etc. to increase classification performance.

  • Rule based: use specific pieces of evidence, positive verb negative nouns.
  • ML and DL: bag of words, word embeddings.

3 layers in A2Text-Net:

  1. Hypothesis test layer: decide if the auxiliary variable is suitable to add to the text. These auxiliary verbs can be selected using a statistical hypothesis test.
  2. Feature processing layer: a word embedding layer, convert unstructred data to structured.
  3. Neural network layer, whose input is the concatenation of the previous 2.

History

Riloff et al. collected 3k tweets and used a rule-based method to detect sarcasm. Automatically detect the positive sentiment phrases and negative situation phrases. Choosing own features > standard features (n-gram). Sarcasm is often signaled by hyperbole, using intensifiers as exclamation.

Maynard et al. proposed using the polarity of tweets. Also extracted rules that enabled them to improve the accuracy. Involved hashtags.

More recently, DL has been used with emoji, personality features, etc.

Our model can include both rule-based hypotheses and DL approaches. We include lgistic reglression, SVM, random forest, DNNs, LSTMs, GRUs as baseline models.

In the first layer of the model, we choose the auxiliary verb. In this study we propose two variables: punctuation and POS, the chi-squared test is used to determine if there's a significant difference between the expected frequencies and the observed ones in 1+ categories.

The second layer converts the unstructred data to structured data, and connects text features with auxiliary ones. Then we train the word embedding model, using word2vec, GloVe. We train the word embedding model instead of using the pre-trained models.

In the last layer we use a back-propagation NN.

Datasets

  • News headlines dataset: 27k news headlines, 15k not sarcastic vs. 12k sarcastic
  • Tweets dataset A: Riloff et al. 2013
  • Tweets dataset B: Ghosh et al. 55k records, 26k sarcastic vs. 29k not. The dataset creators labeled the data
  • Reddit dataset: 5k reddit comments, balanced dataset

Evaluation metrics

We employ ROC AUC, recall, precision, F1 score as evaluation metrics. Also 5-cross satisfied validation.

Conclusion

The experiment results show our proposed method could achieve better performance compared with other baseline models.

A2Text-Net is a suitable model to detect sarcasm that allows us to add more relevant auxiliary features other than only use text features. Our method is also easy to adjust. In future work, we'll study how to use the output of the current one to improve other sentiment analysis projects.