Learning Semantic Representations for Novel Words: Leveraging Both Form and Context, Schick and Schütze, 2018

Paper, Tags: #nlp, #embeddings

There are currently two approaches to learning embeddings for novel words:

  • Learning an embedding from the novel word's surface form (subword n-grams).
  • Learning an embedding from the context in which it occurs.

One problem with embeddings is that they need many observations of a word before the embedding becomes reliable, so they struggle with small corpora and infrequent words. And because models are trained with a fixed vocabulary, they cannot handle out-of-vocabulary (OOV) words at inference time. Some solutions include:

  • using subword information, exploiting what can be extracted from the word's surface form
    • e.g. unemployable → un, employ, able
  • using context information, inferring embeddings for novel words from the words surrounding them (both strategies are sketched below)
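As a minimal, hypothetical sketch of the two strategies (the embedding tables, dimensionality, and n-gram range below are assumptions for illustration, not taken from the paper): the surface-form estimate averages character n-gram vectors, fastText-style, while the context estimate averages the vectors of the known words surrounding the novel word.

```python
import numpy as np

# Hypothetical lookup tables (assumptions, not from the paper):
# ngram_vecs maps character n-grams to vectors, word_vecs maps known words.
ngram_vecs = {}   # e.g. {"<un": np.array([...]), "emp": np.array([...]), ...}
word_vecs = {}    # e.g. {"company": np.array([...]), ...}
DIM = 300         # assumed embedding dimensionality

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of the padded surface form, fastText-style."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def form_embedding(word):
    """Surface-form estimate: average the vectors of known n-grams."""
    vecs = [ngram_vecs[g] for g in char_ngrams(word) if g in ngram_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

def context_embedding(contexts):
    """Context estimate: average the vectors of surrounding known words.

    `contexts` is a list of token lists, one per observed occurrence.
    """
    vecs = [word_vecs[w] for ctx in contexts for w in ctx if w in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)
```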

This paper leverages both sources of information and shows that combining them yields large increases in embedding quality. The only inputs required are a set of word embeddings and an unlabeled corpus.
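Continuing the sketch above, one hedged way to combine the two estimates, loosely in the spirit of the paper's form-context idea, is a learned gate that weighs one estimate against the other. The gate parameters below are random stand-ins; in the paper's setting they would be trained on the unlabeled corpus.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical gate parameters (random stand-ins, not trained values).
w_gate = np.random.randn(2 * DIM)
b_gate = 0.0

def combined_embedding(word, contexts):
    """Gate between the form and context estimates from the sketch above."""
    v_form = form_embedding(word)
    v_ctx = context_embedding(contexts)
    alpha = sigmoid(w_gate @ np.concatenate([v_form, v_ctx]) + b_gate)
    # alpha near 1 trusts the context estimate, near 0 the surface form.
    return alpha * v_ctx + (1 - alpha) * v_form
```

The design intuition: for a word seen in many informative contexts, the gate can lean on the context estimate; for a rare word with a transparent surface form (like "unemployable"), it can lean on the subword estimate.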