For each of the 100 novels in ELTeC-fra, 10 sentences have been randomly sampled, for a total of 1,000 sentences (around 18,600 words).
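A minimal sketch of this kind of per-novel sampling, for illustration only. The function name `sample_sentences`, the input shape (a dict mapping a novel identifier to its list of sentences) and the fixed seed are assumptions; the source does not describe how sentences were segmented or drawn.

```python
import random


def sample_sentences(novels, per_novel=10, seed=0):
    """Draw `per_novel` sentences at random from each novel.

    `novels` is assumed to map a novel identifier to a list of
    sentence strings (hypothetical input format).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = []
    for novel_id in sorted(novels):
        sentences = novels[novel_id]
        # Guard against novels with fewer sentences than requested.
        k = min(per_novel, len(sentences))
        for sentence in rng.sample(sentences, k):
            sample.append((novel_id, sentence))
    return sample
```

With 100 novels and `per_novel=10`, this yields the 1,000 (novel, sentence) pairs described above.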
Each sentence has been annotated with one of four speech categories:
- n = narrator speech, including indirect character speech
- c = direct character speech
- x = mixed speech, including character and narrator speech
- u = undecidable / other, e.g. free indirect speech or thought as well as letters.
The annotation was done by a single annotator, without annotation guidelines and without inter-annotator agreement checks.
- Overall percentages of the four categories in the sample.
- Percentages for the four categories by decade.
- Sentence length by speech category.
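The three summaries listed above could be computed along the following lines. This is a sketch under stated assumptions, not the procedure actually used: the row fields `year`, `label` and `text` are hypothetical, and sentence length is measured here in whitespace-separated tokens, which the source does not specify.

```python
from collections import Counter, defaultdict
from statistics import mean

CATEGORIES = ("n", "c", "x", "u")  # narrator, character, mixed, undecidable


def category_percentages(labels):
    """Percentage of each speech category among `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        cat: (100 * counts[cat] / total) if total else 0.0
        for cat in CATEGORIES
    }


def percentages_by_decade(rows):
    """Category percentages grouped by decade of publication.

    Each row is assumed to be a dict with hypothetical keys
    "year" (int) and "label" (one of CATEGORIES).
    """
    by_decade = defaultdict(list)
    for row in rows:
        decade = row["year"] // 10 * 10  # e.g. 1847 -> 1840
        by_decade[decade].append(row["label"])
    return {d: category_percentages(ls) for d, ls in sorted(by_decade.items())}


def mean_length_by_category(rows):
    """Mean sentence length per category, in whitespace-separated tokens."""
    lengths = defaultdict(list)
    for row in rows:
        lengths[row["label"]].append(len(row["text"].split()))
    return {cat: mean(values) for cat, values in lengths.items()}
```

Categories that do not occur in a given decade are reported as 0.0 percent, while `mean_length_by_category` only returns categories that actually appear in the data.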