
centre-for-humanities-computing/Danish_literary_sentiment


The Fiction2 Danish Literature Corpus

🌡️📖 Danish literary sentiment

This repository holds the data for comparing sentiment analysis (SA) methods on Danish literature, specifically 19th-century fairy tales and religious hymns. Our study compares human annotations with the continuous valence scores of both transformer- and dictionary-based SA methods to assess their performance and to understand how different methods handle the language of Danish prose and poetry.

🔎 What is included

  • Original and modernized Danish text
  • Continuous valence annotation (0-10) by human annotators (n=2-3) per sentence/verse
  • Automatic annotation scores per sentence/verse (using dictionary- and transformer-based Sentiment Analysis tools)

This data allows for the comparison of human/human and human/model sentiment scoring on Danish literary texts.
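As a minimal sketch of the human/model comparison, assume each sentence's human ratings are averaged into a single gold score and compared with the model scores using Spearman's rank correlation. Averaging and Spearman are choices made here for illustration; the repository's own scripts may aggregate or correlate differently. The function names, the in-memory data layout, and the toy numbers are hypothetical and not taken from the corpus.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def human_model_correlation(human_scores, model_scores):
    """human_scores: one list of annotator ratings (0-10) per sentence/verse.
    model_scores: one continuous valence score per sentence/verse, on any scale."""
    gold = [mean(ratings) for ratings in human_scores]
    return spearman(gold, model_scores)


# Hypothetical toy data: 4 sentences rated by 2-3 annotators, plus model scores.
human = [[2, 3], [7, 8, 6], [5, 5], [9, 8]]
model = [-0.6, 0.4, 0.0, 0.8]
print(round(human_model_correlation(human, model), 3))  # 1.0
```

Because Spearman's rho works on ranks, the human 0-10 scale and a model's own valence scale do not need to be rescaled to match. The same `spearman` helper can be applied to two annotators' rating lists for a human/human comparison.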

🔬 Data

We use two datasets: i) H.C. Andersen fairy tales and ii) religious hymns.

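| Dataset | No. texts | No. annotations | No. words | Mean no. verses/sents per text | Period |
|---|---|---|---|---|---|
| HCA (H.C. Andersen fairy tales) | 3 | 791 | 18,910 | 263.7 | 1837-1847 |
| Hymns | 65 | 1,914 | 10,303 | 32.9 | 1798-1873 |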

📖 Documentation

Code for the analyses of the hymns and the fairy tales (run separately), covering annotator agreement and human/model correlations, is available in this folder. The SHAP value analysis of the RoBERTa scores is available in a separate GitHub repository.
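For annotator agreement on continuous ratings with a varying number of annotators per unit (here 2-3 per sentence/verse), one widely used statistic is Krippendorff's alpha with the interval (squared-difference) distance. The following is a minimal pure-Python sketch, assuming every rating within a unit is present. It is an illustration only, not the repository's own analysis code, which may use a different agreement measure. The function name and toy ratings are hypothetical.

```python
def krippendorff_alpha_interval(units):
    """Krippendorff's alpha for interval data.

    units: one list of ratings per sentence/verse (e.g. 2-3 annotators each).
    Units with fewer than two ratings cannot be paired and are skipped.
    """
    units = [list(u) for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    # Observed disagreement: squared differences between every ordered pair of
    # ratings within a unit, each unit weighted by 1 / (m_u - 1).
    observed = 0.0
    for u in units:
        m = len(u)
        pair_sum = sum((u[i] - u[j]) ** 2 for i in range(m) for j in range(m) if i != j)
        observed += pair_sum / (m - 1)
    # Expected disagreement: squared differences over all ordered pairs of pooled
    # ratings, via the identity sum_{i,j} (v_i - v_j)^2 = 2n*sum(v^2) - 2*(sum v)^2.
    expected = 2 * n * sum(v * v for v in values) - 2 * sum(values) ** 2
    if expected == 0:
        raise ValueError("alpha is undefined when all pooled ratings are identical")
    return 1 - (n - 1) * observed / expected


# Hypothetical ratings for three verses, two annotators each.
print(round(krippendorff_alpha_interval([[2, 3], [7, 8], [5, 5]]), 3))  # 0.936
```

Alpha is 1 for perfect agreement and falls toward 0 (or below) as within-unit disagreement approaches what would be expected if ratings were pooled at random. The closed-form expected term avoids looping over every pair of the several thousand pooled ratings.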

📄 Paper Link to our paper comparing SA resources on Danish literary texts.
🔬 CHC Center for Humanities Computing, hosting the project.
✉️ Contact Contact the authors.
