HyperLex: a gold standard resource for measuring and evaluating how well semantic models capture graded or soft lexical entailment

ivulic/hyperlex

 
 

HyperLex

HyperLex is a gold standard resource for measuring and evaluating how well semantic models capture graded or soft lexical entailment (also known as the type-of, is-a, or hypernymy-hyponymy relation) rather than semantic similarity or relatedness. It quantifies the degree to which semantic category membership and the lexical entailment (LE) relation hold between two words.

HyperLex provides 2616 word pairs (2163 noun pairs and 453 verb pairs) rated on a scale of 0 to 6. Annotators answered the question "To what degree is X a type of Y?". Here are some examples:

Pair                        Rating
girl / person               5.91
citizen / person            5.18
person / citizen            3.10
idol / person               2.57
plant / animal              0.08
to talk / to communicate    5.55
to pray / to communicate    2.90
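
The ratings above are directional: citizen / person scores 5.18, while person / citizen scores 3.10. A model is typically scored against such graded ratings by rank correlation. The following is a minimal sketch, not the official evaluation script, that computes Spearman's correlation between the gold ratings and a set of model scores using only the Python standard library. The gold values are copied from the noun examples above. The `MODEL` scores and all function names are hypothetical and invented purely for illustration.

```python
from statistics import mean

# Gold ratings copied from the noun examples above (scale 0 to 6).
GOLD = {
    ("girl", "person"): 5.91,
    ("citizen", "person"): 5.18,
    ("person", "citizen"): 3.10,
    ("idol", "person"): 2.57,
    ("plant", "animal"): 0.08,
}

# Hypothetical model scores, invented for illustration only.
MODEL = {
    ("girl", "person"): 0.80,
    ("citizen", "person"): 0.90,
    ("person", "citizen"): 0.40,
    ("idol", "person"): 0.50,
    ("plant", "animal"): 0.10,
}


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(gold, predicted):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    pairs = sorted(gold)  # fix one ordering of the (X, Y) pairs
    gold_ranks = average_ranks([gold[p] for p in pairs])
    pred_ranks = average_ranks([predicted[p] for p in pairs])
    return pearson(gold_ranks, pred_ranks)


if __name__ == "__main__":
    print(round(spearman(GOLD, MODEL), 3))
```

Because the pairs are ordered, (citizen, person) and (person, citizen) are separate keys. A symmetric similarity model would assign both the same score and could not reproduce the gap between their gold ratings.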

HyperLex covers plenty of normed word types from the USF free-association database, and provides annotated examples of different WordNet-based lexical relations (i.e., hyponymy-hypernymy at different levels, co-hyponymy, synonymy, antonymy, meronymy-holonymy, no-relation). It also contains examples of different concreteness levels.

Download

Download HyperLex by clicking here.

All design details are described in the following paper. Please cite it if you use HyperLex in your own work:

HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment
Ivan Vulić, Daniela Gerz, Douwe Kiela, Felix Hill, and Anna Korhonen. Computational Linguistics, volume 43, number 4, pages 781-835, 2017.
[pdf] [bib]

The provided archive includes the full HyperLex dataset, noun and verb subsets, as well as two different data splits (random and lexical) into training, development and test data. Please see the accompanying readme file for the file formats and further details.
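
The difference between the two splits can be illustrated with a small check. This sketch assumes that a lexical split means no word occurs in both the training and the test pairs, whereas a random split allows such overlap. That reading is an assumption here, and the accompanying readme remains the authoritative description. The toy pairs are taken from the examples above, their assignment to train and test sets is invented, and the function names are hypothetical.

```python
def vocabulary(pairs):
    """Collect every word that occurs in any (X, Y) pair."""
    return {word for pair in pairs for word in pair}


def shared_words(train_pairs, test_pairs):
    """Return the words that occur in both the training and the test pairs."""
    return vocabulary(train_pairs) & vocabulary(test_pairs)


# Toy data for illustration only; not the actual HyperLex splits.
train = [("girl", "person"), ("citizen", "person")]
test_with_overlap = [("idol", "person"), ("plant", "animal")]
test_without_overlap = [("plant", "animal")]

if __name__ == "__main__":
    # A random split may share words with the training data.
    print(shared_words(train, test_with_overlap))
    # Under the assumption above, a lexical split shares none.
    print(shared_words(train, test_without_overlap))
```

Under this assumption, a supervised model evaluated on the lexical split cannot succeed merely by memorising that a particular word, such as "person", tends to appear as the broader category.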

HyperLex in Other Languages and Cross-Lingual HyperLex

Similar resources for three other languages (German, Italian, Croatian), based on the original English HyperLex, are also available. You can download multilingual and cross-lingual HyperLex by clicking here.

The multilingual and cross-lingual extensions of the original HyperLex data set are described in the following paper. Please cite it if you use the data in your own work:

Multilingual and Cross-Lingual Graded Lexical Entailment
Ivan Vulić, Simone Paolo Ponzetto, and Goran Glavaš. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), pages 4963-4974, 2019.
[pdf] [bib]

Contact

Please contact the first author (Ivan Vulić) if you have any questions not addressed in the referenced papers and the accompanying repo README files.
