Dependency parsing is the task of extracting a dependency parse of a sentence that represents its grammatical structure and defines the relationships between "head" words and the words that modify those heads.
Example:

         root
          |
          | +-------dobj---------+
          | |                    |
    nsubj | |   +------det-----+ | +-----nmod------+
    +--+  | |   |              | | |               |
    |  |  | |   |      +-nmod-+| | |      +-case-+ |
    +  |  + |   +      +      || + |      +      | +
    I  prefer  the  morning   flight  through  Denver

Relations among the words are illustrated above the sentence with directed, labeled arcs from heads to dependents (+ indicates the dependent).
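The same arcs can be written as one head index and one relation label per word, the convention CoNLL-style treebank files use, with 0 standing for an artificial root. The minimal sketch below encodes the example sentence that way and checks that the head indices form a single tree. The names `WORDS`, `HEADS`, `LABELS`, `arcs` and `is_tree` are illustrative choices, not part of any library.

```python
from typing import List, Tuple

# The example sentence, one token per position.
WORDS = ["I", "prefer", "the", "morning", "flight", "through", "Denver"]
# HEADS[i] is the 1-based index of the head of token i+1; 0 is the artificial root.
HEADS = [2, 0, 5, 5, 2, 7, 5]
# LABELS[i] is the relation on the arc from HEADS[i] to token i+1.
LABELS = ["nsubj", "root", "det", "nmod", "dobj", "case", "nmod"]


def arcs(words: List[str], heads: List[int], labels: List[str]) -> List[Tuple[str, str, str]]:
    """Return (head word, label, dependent word) triples."""
    out = []
    for dep, (head, label) in enumerate(zip(heads, labels), start=1):
        head_word = "ROOT" if head == 0 else words[head - 1]
        out.append((head_word, label, words[dep - 1]))
    return out


def is_tree(heads: List[int]) -> bool:
    """True if exactly one token attaches to the root and following the
    head links from every token reaches the root without a cycle."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen or not 1 <= node <= n:
                return False  # cycle or out-of-range head index
            seen.add(node)
            node = heads[node - 1]
    return True


for head, label, dep in arcs(WORDS, HEADS, LABELS):
    print(f"{head} --{label}--> {dep}")
print(is_tree(HEADS))  # True
```

Storing exactly one head per word already guarantees that every word has a single head; `is_tree` checks the remaining conditions (a single root and no cycles).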
Models are evaluated on the Stanford Dependency conversion (v3.3.0) of the Penn Treebank with predicted POS tags. Punctuation symbols are excluded from the evaluation. The evaluation metrics are unlabeled attachment score (UAS) and labeled attachment score (LAS). UAS counts a word as correct if it is attached to the correct head, regardless of the relation label (e.g., nsubj) on that attachment, while LAS additionally requires the correct relation label for each attachment. Here, we also mention the predicted POS tagging accuracy.
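A minimal sketch of how UAS and LAS are computed, assuming gold and predicted analyses are aligned token by token (as in Penn Treebank evaluation, where tokenization is given). Evaluation scripts differ in how they identify punctuation; here we assume a token is punctuation if its gold PTB POS tag is in `PUNCT_TAGS`, an illustrative choice. The function `attachment_scores` and the toy prediction are hypothetical.

```python
from typing import Sequence, Tuple

# Assumption: a token counts as punctuation if its gold PTB POS tag is in this set.
# Real evaluation scripts differ in exactly how they identify punctuation.
PUNCT_TAGS = {"``", "''", ":", ",", "."}


def attachment_scores(
    gold_heads: Sequence[int],
    gold_labels: Sequence[str],
    pred_heads: Sequence[int],
    pred_labels: Sequence[str],
    gold_tags: Sequence[str],
) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent over the non-punctuation tokens."""
    n = len(gold_heads)
    if not (len(gold_labels) == len(pred_heads) == len(pred_labels) == len(gold_tags) == n):
        raise ValueError("gold and predicted analyses must be token-aligned")
    total = correct_heads = correct_arcs = 0
    for gh, gl, ph, pl, tag in zip(gold_heads, gold_labels, pred_heads, pred_labels, gold_tags):
        if tag in PUNCT_TAGS:
            continue  # punctuation is excluded from both scores
        total += 1
        if gh == ph:
            correct_heads += 1  # UAS only needs the right head
            if gl == pl:
                correct_arcs += 1  # LAS needs the right head and the right label
    if total == 0:
        raise ValueError("no non-punctuation tokens to score")
    return 100.0 * correct_heads / total, 100.0 * correct_arcs / total


# Toy example: the sentence from the diagram plus a final period (hypothetical prediction).
tags = ["PRP", "VBP", "DT", "NN", "NN", "IN", "NNP", "."]
gold_heads = [2, 0, 5, 5, 2, 7, 5, 2]
gold_labels = ["nsubj", "root", "det", "nmod", "dobj", "case", "nmod", "punct"]
pred_heads = [2, 0, 5, 5, 2, 7, 2, 5]  # "Denver" and "." get the wrong head
pred_labels = ["nsubj", "root", "det", "compound", "dobj", "case", "nmod", "punct"]  # wrong label on "morning"

uas, las = attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels, tags)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 85.71, LAS = 71.43
```

Because every correctly labeled attachment is also a correct unlabeled attachment, LAS can never exceed UAS.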
The CoNLL 2018 Shared Task focuses on learning syntactic dependency parsers that work in a real-world setting, starting from raw text, and across many typologically different languages, including low-resource languages with little or no training data, by exploiting a common syntactic annotation standard. The task was made possible by the Universal Dependencies initiative (UD, http://universaldependencies.org), which has developed treebanks for 60+ languages with cross-linguistically consistent annotation and recoverability of the original raw texts.
Participating systems had to find labeled syntactic dependencies between words, i.e., a syntactic head for each word and a label classifying the type of the dependency relation. In addition to syntactic dependencies, morphology and lemma prediction were evaluated. There were multiple test sets in various languages, but all data sets adhered to the common UD annotation style. Participants were asked to parse raw text for which no gold-standard preprocessing (tokenization, lemmas, morphology) was available. Data preprocessed by a baseline system (UDPipe, https://ufal.mff.cuni.cz/udpipe) was provided so that participants could focus on improving just one part of the processing pipeline, which the organizers believed made the task reasonably accessible for everyone. Besides LAS, the results below report MLAS (morphology-aware labeled attachment score, which also takes POS tags and morphological features into account) and BLEX (bilexical dependency score, which also takes lemmas into account).
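UD treebanks, as well as the shared-task inputs and outputs, use the CoNLL-U format: one word per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines, and blank lines between sentences. HEAD and DEPREL hold the dependency tree, while LEMMA, UPOS and FEATS carry the lemmas and morphology that are also evaluated. The reader below is a simplified sketch, not the official evaluation script: it keeps only basic word lines and skips multiword-token ranges (e.g. `1-2`) and empty nodes (e.g. `8.1`). The function name `read_conllu` and the sample annotation (UD v2-style labels such as `obj` and `compound`) are illustrative assumptions.

```python
from typing import Dict, Iterator, List

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of word dicts keyed by FIELDS.

    Comment lines are ignored, and multiword-token ranges ("1-2") and
    empty nodes ("8.1") are skipped so that only basic-tree words remain.
    """
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:  # the last sentence may not be followed by a blank line
        yield sentence


# Illustrative UD v2-style annotation of the example sentence (not from a real treebank).
rows = [
    ["1", "I", "I", "PRON", "PRP", "_", "2", "nsubj"],
    ["2", "prefer", "prefer", "VERB", "VBP", "_", "0", "root"],
    ["3", "the", "the", "DET", "DT", "_", "5", "det"],
    ["4", "morning", "morning", "NOUN", "NN", "_", "5", "compound"],
    ["5", "flight", "flight", "NOUN", "NN", "_", "2", "obj"],
    ["6", "through", "through", "ADP", "IN", "_", "7", "case"],
    ["7", "Denver", "Denver", "PROPN", "NNP", "_", "5", "nmod"],
]
sample = "# text = I prefer the morning flight through Denver\n"
sample += "\n".join("\t".join(r + ["_", "_"]) for r in rows) + "\n"

for sent in read_conllu(sample):
    for w in sent:
        print(w["id"], w["form"], w["lemma"], w["upos"], w["head"], w["deprel"])
```

Keeping HEAD as a string mirrors the file; converting it with `int()` gives the head-index encoding used by attachment-score computations.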
Model | LAS | MLAS | BLEX | Paper / Source | Code |
---|---|---|---|---|---|
Stanford (Qi et al.) | 74.16 | 62.08 | 65.28 | Universal Dependency Parsing from Scratch | Official |
UDPipe Future (Straka) | 73.11 | 61.25 | 64.49 | UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task | Official |
HIT-SCIR (Che et al.) | 75.84 | 59.78 | 65.33 | Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation | |
TurkuNLP (Kanerva et al.) | 73.28 | 60.99 | 66.09 | Turku Neural Parser Pipeline: An End-to-End System for the CoNLL 2018 Shared Task | Official |
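The following results are listed for reference only, since they were obtained with constituent parsers or with a different version (v3.5.0) of the Stanford conversion: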
Model | UAS | LAS | Note | Paper / Source |
---|---|---|---|---|
Stack-only RNNG (Kuncoro et al., 2017) | 95.8 | 94.6 | Constituent parser | What Do Recurrent Neural Network Grammars Learn About Syntax? |
Deep Biaffine (Dozat and Manning, 2017) | 95.75 | 94.22 | Stanford conversion v3.5.0 | Deep Biaffine Attention for Neural Dependency Parsing |
Semi-supervised LSTM-LM (Choe and Charniak, 2016) | 95.9 | 94.1 | Constituent parser | Parsing as Language Modeling |
Cross-lingual zero-shot parsing is the task of inferring the dependency parse of sentences in a target language without any labeled training trees for that language.
Models are evaluated on the Universal Dependency Treebank v2.0. For each of the 6 target languages, models may use the trees of English and all the other languages, and are evaluated by UAS and LAS on the target language. The final score is the average over the 6 target languages. The most common evaluation setup uses gold POS tags.
Model | UAS | LAS | Paper / Source | Code |
---|---|---|---|---|
Cross-Lingual ELMo (Schuster et al., 2019) | 84.2 | 77.3 | Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing | Official |
MALOPA (Ammar et al., 2016) | | 70.5 | Many Languages, One Parser | Official |
Guo et al. (2016) | 76.7 | 69.9 | A representation learning framework for multi-source transfer parsing | |
Unsupervised dependency parsing is the task of inferring the dependency parse of sentences without any labeled training data.
As with supervised parsing, models are evaluated against the Penn Treebank. The most common evaluation setup is to use gold POS-tags as input and to evaluate systems using the unlabeled attachment score (also called 'directed dependency accuracy').
Model | UAS | Paper / Source |
---|---|---|
Iterative reranking (Le & Zuidema, 2015) | 66.2 | Unsupervised Dependency Parsing - Let’s Use Supervised Parsers |
Combined System (Spitkovsky et al., 2013) | 64.4 | Breaking Out of Local Optima with Count Transforms and Model Recombination - A Study in Grammar Induction |
Tree Substitution Grammar DMV (Blunsom & Cohn, 2010) | 55.7 | Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing |
Shared Logistic Normal DMV (Cohen & Smith, 2009) | 41.4 | Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction |
DMV (Klein & Manning, 2004) | 35.9 | Corpus-Based Induction of Syntactic Structure - Models of Dependency and Constituency |