The Fiction2 Danish Literature Corpus

🌡️📖 Danish literary sentiment

This repository holds the data for comparing sentiment analysis (SA) methods on Danish literature, specifically fairy tales and religious hymns of the 19th century. Our study compares human annotations with the continuous valence scores produced by transformer- and dictionary-based SA methods, both to assess their performance and to understand how the different methods handle the language of Danish prose and poetry.

🔎 What is included

Original and modernized Danish text
Continuous valence annotations (0-10) by human annotators (n=2-3) per sentence/verse
Automatic sentiment scores per sentence/verse (from dictionary- and transformer-based sentiment analysis tools)

This data allows for the comparison of human/human and human/model sentiment scoring on Danish literary texts.
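As a rough illustration of a human/model comparison, the sketch below averages the human scores for each sentence and computes a Spearman rank correlation against model scores. It is not the analysis code from this repository: the toy values, the variable names, and the assumed model score range of -1 to 1 are all hypothetical, and Spearman correlation is only one of several reasonable measures.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: 2-3 human scores (0-10) per sentence,
# and one model score per sentence (assumed range -1 to 1).
human_scores = [[7, 8], [2, 3, 2], [5, 5], [9, 8, 9]]
model_scores = [0.6, -0.7, 0.1, 0.9]

human_mean = [mean(scores) for scores in human_scores]
rho = spearman(human_mean, model_scores)
print(f"Spearman rho (human mean vs. model): {rho:.3f}")
```

Because the correlation is rank-based, the human (0-10) and model scales do not need to be aligned first. Passing two annotators' scores for the same sentences to `spearman` gives a human/human comparison in the same way.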

🔬 Data

We use two datasets: (i) H.C. Andersen fairy tales (HCA) and (ii) religious hymns.

Dataset	No. texts	No. annotations	No. words	Mean no. verses/sentences per text	Period
HCA	3	791	18,910	263.7	1837-1847
Hymns	65	1,914	10,303	32.9	1798-1873

📖 Documentation

Code for the hymn and fairy tale analyses (run separately), covering annotator agreement and human/model correlations, is available in this folder. The SHAP value analysis of the RoBERTa scores is available in a separate GitHub repository.


📄 Paper	Link to our paper comparing SA resources on Danish literary texts.
🔬 CHC	Center for Humanities Computing, hosting the project.
✉️ Contact	Contact the authors.
