Releases · nert-nlp/streusle · GitHub

16 Jun 03:03

nschneid

STREUSLE 4.5 Latest

Update SNACS annotations to the v2.6 standard (automatically rename p.Causer -> p.Force and p.RateUnit -> p.SetIteration).
Update UD to v2.10. This affects many UPOS tags and lemmas, especially for proper names. The UD update also introduces lines encoding multiword tokens (not to be confused with multiword expressions) for clitics.
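Code that reads the updated files may need to account for both changes above. The following is a minimal sketch, not the project's own implementation: it assumes the token ID is the first tab-separated column, as in CoNLL-U, where a multiword token line carries a range ID such as 1-2. The names rename_label, is_multiword_token_line, and word_lines are hypothetical helpers introduced here for illustration.

```python
import re

# Label renames listed in the release notes for SNACS v2.6.
SNACS_V26_RENAMES = {
    "p.Causer": "p.Force",
    "p.RateUnit": "p.SetIteration",
}


def rename_label(label: str) -> str:
    """Return the v2.6 name for a supersense label, or the label unchanged."""
    return SNACS_V26_RENAMES.get(label, label)


# In CoNLL-U, "3" identifies a word and "3-4" a multiword token range.
_RANGE_ID = re.compile(r"^\d+-\d+$")


def is_multiword_token_line(line: str) -> bool:
    """True if a tab-separated token line has a range ID such as "3-4".

    Assumption: the ID is the first column, as in CoNLL-U.
    """
    if not line.strip() or line.startswith("#"):
        return False
    return bool(_RANGE_ID.match(line.split("\t", 1)[0]))


def word_lines(lines):
    """Yield token lines, skipping comments, blank lines, and range lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        if is_multiword_token_line(line):
            continue
        yield line
```

A reader that predates v4.5 and assumes every token line has an integer ID could apply a filter like word_lines before parsing, so that clitic range lines are not mistaken for words.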

05 Nov 03:58

nschneid

STREUSLE 4.4

Update govobj.py to recognize a different style of annotation for preposition stranding.
Update UD to v2.6.
Link from README to a new paper on converting STREUSLE annotations to UCCA (Universal Conceptual Cognitive Annotation), which uses this version of the data in experiments.

02 May 01:27

nschneid

STREUSLE 4.3

Updated preposition/possessive annotations to SNACS v2.5 guidelines, which include changes in the set of labels.
Added a sentence that had been omitted from a document in the training set.
Updated UD parses to the latest dev version (post-v2.5). This improves lemmas for misspelled words and adds paragraph boundaries.
Link from README to new Pepper converter module.
Link from README to online search tool using ANNIS.

02 Jan 03:37

nschneid

STREUSLE 4.2

Annotations

Manually corrected all tokens with the placeholder lexcat symbol !!@ (introduced in v4.0) to have a real lexcat and, if appropriate, a supersense (issue #15).
A number of revisions to SNACS (preposition/possessive supersense) annotations coordinated with updated guidelines ([5], specifically SNACS v2.4, https://arxiv.org/abs/1704.02134v5; this incorporates updates for SNACS v2.3 as well).
Minor corrections in the data and validation improvements.
Updated UD parses to the latest dev version (post-v2.5). Among other things, this improves lemmas for words with nonstandard spellings.

Utilities and data formats

Added streuseval.py, a unified evaluation script for MWEs + supersenses (issue #31).
Added streusvis.py, for viewing sentences with their MWE and supersense annotations.
Added supdate.py (sentence-wise) and tupdate.py (token-wise) for editing lexical semantic annotations (issue #54).
Added format conversion scripts conllulex2json.py, conllulex2UDlextag.py, and UDlextag2json.py.
Normalized the way MWEs within a sentence are numbered in markup (normalize_mwe_numbering.py, issue #42).
Several improvements to govobj.py (most notably issue #35, affecting 184 tokens, and a small fix in 58db569 which affected 53 tokens).
Subdirectories for splits (train/, dev/, test/) now include .json and .govobj.json files alongside the source .conllulex.
Added release preparation scripts under releaseutil/.
Added setup.py.
Fixed a minor bug in tquery.py affecting the display of sentence-final matches. Also made minor changes to its handling of null values and negative constraints and of token-level attributes of multiword expressions, and added an option to filter by sentence length.

03 Jul 00:40

nschneid

STREUSLE 4.1

Added subtypes to verbal MWEs (871 tokens) per PARSEME Shared Task 1.1 guidelines; some MWE groupings revised in the process.
Minor improvements to SNACS (preposition/possessive supersense) annotations coordinated with updated guidelines.
Implementation of SNACS (preposition/possessive supersense) target identification heuristics from:

Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, and Omri Abend. Comprehensive supersense disambiguation of English prepositions and possessives. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, July 15–20, 2018. http://people.cs.georgetown.edu/nschneid/p/pssdisambig.pdf
New utility scripts for listing/filtering tokens (tquery.py) and converting to and from an Excel-compatible CSV format.

11 Feb 03:19

nschneid

STREUSLE 4.0

Updated preposition supersenses to new annotation scheme (4398 tokens).
Annotated possessives (1117 tokens) using preposition supersenses.
Revised a considerable number of MWEs involving prepositions.
Added lexical category for every single-word or strong multiword expression.
New data format (.conllulex) integrates gold syntactic annotations from the Universal Dependencies project.

10 Dec 01:41

nschneid

STREUSLE 3.0

Originally released 2016-08-23. Also available at http://www.ark.cs.cmu.edu/LexSem/
