A German Temporally Annotated News Corpus
KRAUTS (Korpus of newspapeR Articles with Underlined Temporal expressionS) is a German temporally annotated news corpus accompanied by TimeML annotation guidelines for German. It was developed at Fondazione Bruno Kessler, Trento, Italy, and at the Max Planck Institute for Informatics, Saarbrücken, Germany. Our goal is to boost temporal tagging research [1] for German.
The corpus is available under a CC-BY-NC license and is described in:
- Jannik Strötgen, Anne-Lyse Minard, Lukas Lange, Manuela Speranza, Bernardo Magnini:
KRAUTS: A German Temporally Annotated News Corpus, LREC'18 (to appear)
KRAUTS contains articles from the daily newspaper Dolomiten and from the weekly newspaper Die Zeit.
The annotation guidelines are strongly based on the guidelines defined for Italian, i.e., the It-TimeML guidelines [2]. Our Annex to the It-TimeML guidelines contains (annotated) examples in German and extensions needed to adapt the It-TimeML guidelines to the specific morpho-syntactic features of German. It is available on the It-TimeML website.
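To give a rough idea of what TimeML-style temporal annotations look like in practice, the following minimal sketch reads TIMEX3 elements from an inline-annotated string. The sample sentence, its attribute values, and the helper name `extract_timex3` are invented for illustration and are not taken from KRAUTS; the sketch also assumes inline XML markup, which may differ from the format in which the corpus is actually distributed.

```python
import xml.etree.ElementTree as ET

# Hypothetical German sentence with inline TIMEX3 markup (not from KRAUTS).
SAMPLE = (
    '<TimeML><TEXT>Die Sitzung fand am '
    '<TIMEX3 tid="t1" type="DATE" value="2017-03-14">14. März 2017</TIMEX3> '
    'statt und dauerte '
    '<TIMEX3 tid="t2" type="DURATION" value="PT2H">zwei Stunden</TIMEX3>.'
    '</TEXT></TimeML>'
)


def extract_timex3(xml_text):
    """Return (tid, type, value, surface text) for every TIMEX3 element."""
    root = ET.fromstring(xml_text)
    return [
        (t.get("tid"), t.get("type"), t.get("value"), "".join(t.itertext()))
        for t in root.iter("TIMEX3")
    ]


for row in extract_timex3(SAMPLE):
    print(row)
```

Each tuple pairs the annotated surface expression with its normalized value, which is the information a temporal tagger is typically evaluated on.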
A publicly available temporal tagger for German is HeidelTime, which can be found on the HeidelTime GitHub page.
[1] Jannik Strötgen and Michael Gertz: Domain-Sensitive Temporal Tagging, Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2016.
[2] Tommaso Caselli and Rachele Sprugnoli: It-TimeML, TimeML Annotation Guidelines for Italian, version 1.4, 2015.