Merge branch 'release-v2.1' into main
Matteo Romanello committed Apr 15, 2022
2 parents 7178928 + 1e6e85e commit a7a063e
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions documentation/README-ajmc.md
@@ -62,6 +62,9 @@ The *ajmc* dataset can be used for:

### Data release notes

**HIPE-2022-data v2.1**
- Thorough data cleaning: added missing OCR transcriptions, added some missing Wikidata IDs, fixed some erroneous entity types, added some missing mentions.

**HIPE-2022-data v2.0**
- This release contains dev and train sets for all languages (EN, DE, FR).
- It also includes the mappings of OCR/gold transcript for those entities that are affected by OCR noise. These mappings will be of particular use for entity linking, where the impact of OCR noise on short entities is much higher than on other entities. These mappings are contained in three files named `ajmc-entity-ocr-correction-{LANG}.tsv` and located in the corpus' root directory. Each file contains the following columns: