What’s Going On in Neural Constituency Parsers? An Analysis, Gaddy et al., 2018

Paper, Repo, Tags: #nlp

In recent years constituency parsing has shifted from grammars and feature-rich lexicons to RNNs. This work analyzes the extent to which the information provided directly by model structure in classical systems is still captured by neural methods.

We propose a parser that implicitly learns to encode much of the same information that was explicitly provided by grammars and lexicons in the past. We replace externally predicted PoS tags with character-level word representations.

1. Past work

Past work in constituency parsing used context-free grammars with production rules governing adjacent labels to propagate information and capture correlations between output decisions. Many recent parsers no longer have explicit grammar production rules, but still use information about other predictions, allowing them to capture output correlations.

We also show that we can completely remove output correlations from our model with a variant of our parser that makes independent span label decisions without any tree constraints.

We find that the same sources of information that were effective for grammar-driven parsers are also captured by parsers based on recurrent neural networks.

2. Model

We propose a span-based parsing model which combines components from several recent neural architectures. The model has only a single scoring function s(i, j, l), assigning a real-valued score to every label l for each span (i, j) in an input sentence.

In contrast to other chart parsers, our model can directly score n-ary trees without binarization or other tree transformations. The parsing problem is to find the tree with the highest score. Our implementation of s(i, j, l) proceeds in three steps:

  1. Word representation
  2. Span representation
  3. Label scoring

2.1. Word representation

We have a separate embedding for each word type in the training vocabulary and map all unknown words at test time to a single token.

Character-level representations alleviate the unknown word problem by working with smaller, more frequent units. We obtain a character-level representation for a word by running it through a bidirectional character LSTM and concatenating the final forward and backward outputs.

Complete representation of a word = concatenation of its word embedding + its character LSTM representation.
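A minimal sketch of this step in PyTorch (module names, vocabulary sizes, and dimensions are illustrative placeholders, not the paper's exact hyperparameters): each word gets an embedding for its word type, plus the concatenated final forward and backward outputs of a character-level BiLSTM.

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    def __init__(self, word_vocab_size, char_vocab_size,
                 word_dim=100, char_dim=50, char_lstm_dim=50):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab_size, word_dim)   # one vector per word type
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_lstm_dim,
                                 bidirectional=True, batch_first=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (n,) word-type indices; char_ids: list of n (word_len,) tensors
        word_vecs = self.word_emb(word_ids)                        # (n, word_dim)
        char_vecs = []
        for chars in char_ids:
            out, _ = self.char_lstm(self.char_emb(chars).unsqueeze(0))
            h = out.size(2) // 2
            fwd = out[0, -1, :h]    # final forward output of the char LSTM
            bwd = out[0, 0, h:]     # final backward output of the char LSTM
            char_vecs.append(torch.cat([fwd, bwd]))
        # complete representation = word embedding ++ character-LSTM representation
        return torch.cat([word_vecs, torch.stack(char_vecs)], dim=-1)
```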

2.2. Span representation

To build up to spans, we first run a bidirectional LSTM over the sequence of word representations for an input sentence to obtain context-sensitive forward and backward representations fi and bi for each fencepost i. The representation of a span (i, j) is then the concatenation of the forward and backward span differences, [f_j − f_i; b_i − b_j].
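A hedged sketch of this step (PyTorch; the dimensions are placeholders): the forward and backward outputs of the sentence-level BiLSTM are padded with zero vectors at the sentence boundaries, so fencepost indices run from 0 to n.

```python
import torch
import torch.nn as nn

INPUT_DIM, LSTM_DIM = 200, 250   # illustrative sizes
sentence_lstm = nn.LSTM(INPUT_DIM, LSTM_DIM, bidirectional=True, batch_first=True)

def make_span_fn(word_reps):
    """word_reps: (n, INPUT_DIM) word representations for one sentence."""
    out, _ = sentence_lstm(word_reps.unsqueeze(0))      # (1, n, 2*LSTM_DIM)
    zero = torch.zeros(1, LSTM_DIM)
    f = torch.cat([zero, out[0, :, :LSTM_DIM]])         # f[0..n], with f[0] = 0
    b = torch.cat([out[0, :, LSTM_DIM:], zero])         # b[0..n], with b[n] = 0

    def span(i, j):
        # span (i, j) = concatenation of forward and backward differences
        return torch.cat([f[j] - f[i], b[i] - b[j]])
    return span
```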

2.3. Label scoring

We feed the span representation through a one-layer feedforward NN whose output dimensionality equals the number of possible labels. The score of a specific label l is the corresponding component of the output vector (formula in paper).

We introduce an auxiliary empty label ∅ with s(i, j, ∅) = 0 for all (i, j) to handle spans that are not constituents.
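As a sketch (PyTorch; the hidden size is illustrative), the scorer is a small feedforward network over the span representation, with the empty label's score hard-coded to zero:

```python
import torch
import torch.nn as nn

class LabelScorer(nn.Module):
    def __init__(self, span_dim, num_labels, hidden_dim=250):
        super().__init__()
        # one hidden layer; output dimensionality = number of real labels
        self.ffn = nn.Sequential(nn.Linear(span_dim, hidden_dim),
                                 nn.ReLU(),
                                 nn.Linear(hidden_dim, num_labels))

    def forward(self, span_vec):
        scores = self.ffn(span_vec)                 # s(i, j, l) for each real label l
        # prepend the empty label, whose score is fixed at 0 for every span
        return torch.cat([torch.zeros(1), scores])
```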

We can independently select the best label for the current span and the best split point, where the score of a split is the sum of the best scores for the corresponding subtrees.

To parse the full sentence, we compute s_best(0, n) using a bottom-up chart decoder, then traverse backpointers to recover the tree achieving that score.
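A hedged sketch of the decoder (plain Python/NumPy; `scores[i][j]` is assumed to hold the label-score vector for span (i, j), with index 0 the zero-scored empty label). It fills the chart bottom-up, independently picking the best label and the best split for each span, and stores backpointers so the best tree can be reconstructed from (0, n):

```python
import numpy as np

def decode(n, scores):
    best = {}                                  # (i, j) -> s_best(i, j)
    back = {}                                  # (i, j) -> (label, split or None)
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            label = int(np.argmax(scores[i][j]))      # best label (may be the empty label)
            label_score = scores[i][j][label]
            if length == 1:                           # single word: no split point
                best[i, j] = label_score
                back[i, j] = (label, None)
            else:
                # best split point; child scores are already optimal (computed bottom-up)
                split = max(range(i + 1, j), key=lambda k: best[i, k] + best[k, j])
                best[i, j] = label_score + best[i, split] + best[split, j]
                back[i, j] = (label, split)
    return best[0, n], back       # traverse `back` starting from (0, n) to rebuild the tree
```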

Training

We use margin-based training. More in paper.
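Roughly, the objective is of the usual structured hinge form for span parsers (this is a sketch; the paper spells out the exact loss-augmented decoding), where T* is the gold tree and Δ is a Hamming-style loss over labeled spans:

```latex
\mathcal{L} = \max\Bigl(0,\; \max_{T} \bigl[\, s(T) + \Delta(T, T^{*}) \,\bigr] - s(T^{*}) \Bigr)
```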

Results

More in paper.

3. Output correlations

Output correlations are information about compatibility between outputs in a structured prediction model.

4. Lexical representations

We investigate several common choices for lexical representations of words and their role in neural parsing.

4.1. Alternate word representation

4.2. Predicting word features

5. Context in the sentence LSTM

In this section, we analyze where the information in the sentence-level LSTM hidden vectors comes from.