This repository holds the data for comparing Sentiment Analysis methods on Danish literature - specifically fairy tales and religious hymns of the 19th century. Our study compares human annotations to the continuous valence scores of both transformer- and dictionary-based sentiment analysis methods to assess their performance, seeking to understand how distinct methods handle the language of Danish prose and poetry.
The data include:
- Original and modernized Danish text
- Continuous valence annotation (0-10) by human annotators (n=2-3) per sentence/verse
- Automatic annotation scores per sentence/verse (using dictionary- and transformer-based Sentiment Analysis tools)
This data allows for the comparison of human/human and human/model sentiment scoring on Danish literary texts.
We use two datasets: i) H.C. Andersen fairy tales (HCA), and ii) religious hymns.
| | No. texts | No. annotations | No. words | Mean no. verses/sents per text | Period |
|---|---|---|---|---|---|
| HCA | 3 | 791 | 18,910 | 263.7 | 1837-1847 |
| Hymns | 65 | 1,914 | 10,303 | 32.9 | 1798-1873 |
Code for the hymns and fairy tales analyses (run separately), covering annotator agreement and human/model correlations, is available in this folder, while the SHAP value analysis of the RoBERTa scores is available in a separate GitHub repository.
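As a rough illustration of what a human/human and human/model comparison can look like, here is a minimal sketch in plain Python. It is not the analysis code from this folder: the toy scores, the variable names, and the choice of Spearman rank correlation are all assumptions made for the example. Rank correlation is used here only because human annotations (0-10) and model scores may sit on different scales.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: one value per sentence/verse (not taken from the dataset).
annotator_1 = [2.0, 7.5, 5.0, 9.0, 3.5]   # human valence, 0-10
annotator_2 = [3.0, 8.0, 4.5, 8.5, 2.0]   # human valence, 0-10
model_score = [-0.6, 0.9, 0.1, 0.4, -0.2]  # assumed model output on its own scale

# Mean human score per sentence/verse, used as the reference for the model.
human_mean = [mean(pair) for pair in zip(annotator_1, annotator_2)]

human_human = spearman(annotator_1, annotator_2)
human_model = spearman(human_mean, model_score)

print(round(human_human, 3))  # 0.9
print(round(human_model, 3))  # 0.9
```

With real data, the three lists would be replaced by the annotation and model-score columns for a given text; the actual column names in the data files are not assumed here.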
| | |
|---|---|
| 📄 Paper | Link to our paper comparing SA resources on Danish literary texts. |
| 🔬 CHC | Center for Humanities Computing, hosting the project. |
| ✉️ Contact | Contact the authors. |