Learning Semantic Representations for Novel Words: Leveraging Both Form and Context, Schick and Schütze, 2018
Paper, Tags: #nlp, #embeddings
Currently there are two approaches to learning embeddings for novel words:
- Learning an embedding from the novel word's surface form (subword n-grams); a sketch of this follows below.
- Learning an embedding from the contexts in which it occurs.
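To make the form-based approach concrete, here is a minimal fastText-style sketch in Python/numpy: the novel word's embedding is the average of its character n-gram vectors. The `ngram_vectors` lookup and the n-gram range are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Extract character n-grams with boundary markers (fastText-style),
    so prefixes and suffixes stay distinguishable from word-internal n-grams."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def form_embedding(word, ngram_vectors, dim=300):
    """Embed a novel word as the average of its n-gram vectors.
    `ngram_vectors` is a hypothetical dict: n-gram -> np.ndarray of shape (dim,)."""
    vecs = [ngram_vectors[ng] for ng in char_ngrams(word) if ng in ngram_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```

With such a lookup in hand, `form_embedding("unemployable", ngram_vectors)` produces a vector even though the full word was never seen during training.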
One problem with word embeddings is that they need many observations of a word before the embedding becomes reliable, so they struggle with small corpora and infrequent words. And because models are trained with a fixed vocabulary, they can't deal with out-of-vocabulary (OOV) words at inference time. Some solutions include:
- using subword information, exploiting what can be extracted from the word's surface form
  - e.g. unemployable -> un, employ, able
- using context information, inferring embeddings for novel words from the words surrounding them (sketched after this list)
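A minimal additive sketch of the context-based approach: average the embeddings of all words that co-occur with the novel word across its observed contexts. The `word_vectors` lookup is again a hypothetical stand-in for the given embedding set.

```python
import numpy as np

def context_embedding(contexts, word_vectors, dim=300):
    """Embed a novel word as the average vector of its surrounding words.
    `contexts` is a list of tokenized sentences containing the novel word
    (the novel word itself should be excluded, since it has no vector);
    `word_vectors` is a hypothetical dict: token -> np.ndarray of shape (dim,)."""
    vecs = [word_vectors[tok]
            for sentence in contexts
            for tok in sentence
            if tok in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```

The more contexts are available, the more this average converges toward a stable estimate, which is exactly why purely context-based estimates are weak for very rare words.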
The paper leverages both sources of information and shows that combining them yields large gains in embedding quality. As input it only requires a set of pre-trained word embeddings and an unlabeled corpus. A sketch of one way to combine the two estimates follows.
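One plausible combination, roughly in the spirit of the paper's gated form-context idea, is a learned gate that decides how much to trust the context estimate versus the form estimate. The gating function and its parameters (`w`, `b`) here are illustrative assumptions; in practice they would be trained to mimic the gold embeddings of frequent words, for which both estimates and a reliable target vector exist.

```python
import numpy as np

def combined_embedding(v_form, v_context, w, b):
    """Gated combination of the two estimates:
        alpha = sigmoid(w . [v_form; v_context] + b)
        v     = alpha * v_context + (1 - alpha) * v_form
    `w` has shape (2 * dim,) and `b` is a scalar; both are learned."""
    z = np.dot(w, np.concatenate([v_form, v_context])) + b
    alpha = 1.0 / (1.0 + np.exp(-z))  # how much weight the context estimate gets
    return alpha * v_context + (1.0 - alpha) * v_form
```

Intuitively, the gate can lean on the surface form when few contexts are available and shift toward the context estimate as evidence accumulates.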