The Fiction2 Danish Literature Corpus

🌡️📖 Danish literary sentiment

This repository holds the data for comparing Sentiment Analysis methods on Danish literature, specifically fairy tales and religious hymns of the 19th century. Our study compares human annotations with the continuous valence scores of both transformer- and dictionary-based Sentiment Analysis methods, to assess their performance and to understand how distinct methods handle the language of Danish prose and poetry.

🔎 What is included

  • Original and modernized Danish text
  • Continuous valence annotation (0-10) by human annotators (n=2-3) per sentence/verse
  • Automatic annotation scores per sentence/verse (using dictionary- and transformer-based Sentiment Analysis tools)

This data allows for the comparison of human/human and human/model sentiment scoring on Danish literary texts.
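
As a rough illustration of the human/human and human/model comparison, here is a minimal sketch using Spearman rank correlation. The choice of Spearman, the variable names, and all scores are assumptions made for this example; none of the values come from the corpus, and the analysis code in this repository may use a different measure.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy scores for five sentences (NOT taken from the corpus).
annotator_a = [2.0, 5.0, 7.5, 9.0, 4.0]    # human valence, 0-10
annotator_b = [3.0, 3.5, 7.0, 8.0, 5.5]    # human valence, 0-10
model = [-0.6, 0.1, 0.9, 0.4, -0.2]        # model score on its own scale

# Average the annotators per sentence before comparing with the model.
human_mean = [mean(pair) for pair in zip(annotator_a, annotator_b)]

human_human = spearman(annotator_a, annotator_b)
human_model = spearman(human_mean, model)

print(f"human/human: {human_human:.2f}")
print(f"human/model: {human_model:.2f}")
```

Because rank correlation depends only on the ordering of scores, the model output does not need to be rescaled to the 0-10 annotation range before comparison.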

🔬 Data

We use two datasets: i) H.C. Andersen (HCA) fairy tales, and ii) religious hymns.

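| Dataset | No. texts | No. annotations | No. words | Mean no. verses/sentences per text | Period    |
|---------|-----------|-----------------|-----------|------------------------------------|-----------|
| HCA     | 3         | 791             | 18,910    | 263.7                              | 1837-1847 |
| Hymns   | 65        | 1,914           | 10,303    | 32.9                               | 1798-1873 |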

📖 Documentation

Code for the hymns and fairy tales analyses (run separately), covering annotator agreement and human/model correlations, is available in this folder. The SHAP values analysis of RoBERTa scores is available in a separate GitHub repository.

📄 Paper: Link to our paper comparing SA resources on Danish literary texts.
🔬 CHC: Center for Humanities Computing, hosting the project.
✉️ Contact: Contact the authors.