Original dataset | https://zenodo.org/record/6046853 |
Document type | newspaper (mid-19th to mid-20th century) |
Languages | French |
Annotation guidelines | Aligned with the Quaero guidelines; see the related publication. |
Annotation tool | brat |
Original format and tagging scheme | brat format |
Annotations | NERC |
Version (used in HIPE-2022) | v1.0 |
Related publication | Diachronic evaluation of NER systems on old newspapers |
License |
Coarse-grained tagset | Fine-grained tagset | Nesting applies | Linking applies |
---|---|---|---|
pers | pers.ind | yes | no |
pers | pers.coll | yes | no |
loc | loc.adm.town | yes | no |
loc | loc.adm.reg | yes | no |
loc | loc.adm.nat | yes | no |
loc | loc.adm.sup | yes | no |
loc | loc.phys.geo | yes | no |
loc | loc.phys.hydro | yes | no |
loc | loc.oro | yes | no |
loc | loc.fac | yes | no |
loc | loc.add.phys | yes | no |
loc | loc.add.elec | yes | no |
loc | loc.unk | yes | no |
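
In the tagset above, each fine-grained tag begins with its coarse-grained type, followed by a dot. The following is a minimal sketch of how fine-grained labels could be collapsed to coarse-grained ones on that basis. The function name `to_coarse` is hypothetical, and the handling of `B-`/`I-` prefixes and of `O` assumes IOB-style labels, which this section does not specify.

```python
def to_coarse(label: str) -> str:
    """Collapse a fine-grained label to its coarse-grained type.

    Assumption: labels are either bare tags ("loc.adm.town"), the outside
    label "O", or IOB-prefixed tags ("B-loc.adm.town"). The IOB handling
    is illustrative only.
    """
    if label == "O":
        return label
    prefix, sep, tag = label.partition("-")
    if sep and prefix in {"B", "I"}:
        # Keep the IOB prefix, truncate the tag at the first dot.
        return f"{prefix}-{tag.split('.', 1)[0]}"
    # Bare tag: the coarse type is everything before the first dot.
    return label.split(".", 1)[0]


# Fine-grained tags listed in the table above.
FINE_TAGS = [
    "pers.ind", "pers.coll",
    "loc.adm.town", "loc.adm.reg", "loc.adm.nat", "loc.adm.sup",
    "loc.phys.geo", "loc.phys.hydro", "loc.oro", "loc.fac",
    "loc.add.phys", "loc.add.elec", "loc.unk",
]

# Group the fine-grained tags by their coarse-grained type.
by_coarse: dict[str, list[str]] = {}
for fine in FINE_TAGS:
    by_coarse.setdefault(to_coarse(fine), []).append(fine)
```

With the tags from the table, `by_coarse` has exactly two keys, `pers` and `loc`.
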
The letemps dataset can be used for:
- Tasks: NERC-Coarse, NERC-Fine.
- Challenges: Multilingual Newspaper Coarse, Multilingual Newspaper Fine, Global Adaptation Coarse.
- Guidelines: the letemps annotation guidelines are compatible with the Quaero guidelines and, to some extent, with the HIPE-2020 impresso guidelines.
- Documents: letemps documents correspond to newspaper articles.
- Sentence splitting: performed automatically on OCRed text, so sentence boundaries are not always accurate.
- Person functions and titles: letemps annotations are very similar to those of hipe2020 and newseye. A noticeable difference for the PERS type is that letemps annotations exclude functions (e.g. general, mayor) from person names but include titles (e.g. M., Madame), whereas hipe2020 and newseye include both functions and titles in person name annotations.