Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia, Yamada et al., 2018
Wikipedia2Vec is an open source tool for learning the embeddings of words and entities from Wikipedia. We provide pretrained embeddings for 12 languages in the repository.
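As a quick reference, here is a small usage sketch along the lines of the Python API shown in the repository; the model file name is illustrative and a pretrained model must be downloaded separately.

```python
from wikipedia2vec import Wikipedia2Vec

# Load a pretrained model downloaded from the repository
# (the file name below is illustrative).
wiki2vec = Wikipedia2Vec.load("enwiki_20180420_100d.pkl")

# Words and entities are queried from the same vector space.
word_vec = wiki2vec.get_word_vector("tokyo")
entity_vec = wiki2vec.get_entity_vector("Tokyo")

# Nearest neighbors of an entity (can return both words and entities).
print(wiki2vec.most_similar(wiki2vec.get_entity("Tokyo"), 5))
```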
In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets.
Entity embeddings represent rich information (or knowledge) about the entities in a knowledge base (KB) as fixed continuous vectors.
These embeddings can be used to enhance the performance of state-of-the-art contextualized word embeddings like BERT.
Our proposed tool jointly learns the embeddings of words and entities, and places semantically similar words and entities close to one another in the vector space.
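To make "close in the vector space" concrete, one simple check is to compare the vector of a word with the vector of the entity it typically refers to. The sketch below assumes a pretrained model file as above; the word/entity pair is only an example.

```python
import numpy as np
from wikipedia2vec import Wikipedia2Vec

wiki2vec = Wikipedia2Vec.load("enwiki_20180420_100d.pkl")  # illustrative file name

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Because words and entities share one vector space, a word and the entity it
# commonly refers to should receive a high similarity (pair chosen for illustration).
print(cosine(wiki2vec.get_word_vector("tokyo"),
             wiki2vec.get_entity_vector("Tokyo")))
```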
Traditionally, entity embeddings learned from a KB have been based on conventional word embedding models such as skip-gram.
The model is the sum of three sub-models:
- word-based skip-gram model
  - the neighboring words of each word are used as contexts
  - predicts the neighboring words of a given word
- anchor context model
  - the neighboring words of each hyperlink (anchor) pointing to an entity are used as contexts
  - places similar words and entities close to one another in the vector space using hyperlinks and their neighboring words in Wikipedia
  - predicts the surrounding words given each entity
- link graph model
  - the neighboring entities of each entity in Wikipedia's link graph are used as contexts
  - predicts the neighboring entities of each entity in the link graph, an undirected graph whose nodes are entities and whose edges represent the presence of hyperlinks between the entities
The three losses are summed and the combined objective is optimized with stochastic gradient descent; a minimal sketch follows.
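The sketch below is not the paper's implementation, only a self-contained toy illustration of the joint objective: words and entities share one embedding table, three skip-gram-style losses with negative sampling (word context, anchor context, link graph) are summed, and the sum is optimized with SGD. All names, sizes, and training pairs are illustrative.

```python
import torch
import torch.nn.functional as F

# Toy vocabulary: words and entities share one index space (illustrative).
vocab = ["tokyo", "is", "the", "capital", "of", "japan",
         "ENTITY/Tokyo", "ENTITY/Japan"]
idx = {t: i for i, t in enumerate(vocab)}
dim = 16

emb_in = torch.nn.Embedding(len(vocab), dim)   # target ("input") embeddings
emb_out = torch.nn.Embedding(len(vocab), dim)  # context ("output") embeddings
opt = torch.optim.SGD(list(emb_in.parameters()) + list(emb_out.parameters()), lr=0.05)

def sg_loss(target, context, n_neg=2):
    """Skip-gram loss with negative sampling for one (target, context) pair."""
    t = emb_in(torch.tensor([idx[target]]))
    c = emb_out(torch.tensor([idx[context]]))
    neg = emb_out(torch.randint(len(vocab), (n_neg,)))  # random negative contexts
    pos_score = (t * c).sum()
    neg_score = neg @ t.squeeze(0)
    return -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum())

# Illustrative training pairs for the three sub-models:
word_pairs   = [("tokyo", "capital"), ("capital", "japan")]               # word -> neighboring word
anchor_pairs = [("ENTITY/Tokyo", "capital"), ("ENTITY/Tokyo", "japan")]   # entity -> words around its anchor
link_pairs   = [("ENTITY/Tokyo", "ENTITY/Japan")]                          # entity -> linked entity

for step in range(100):
    opt.zero_grad()
    # Sum of the three skip-gram-style losses, optimized jointly with SGD.
    loss = (sum(sg_loss(t, c) for t, c in word_pairs)
            + sum(sg_loss(t, c) for t, c in anchor_pairs)
            + sum(sg_loss(t, c) for t, c in link_pairs))
    loss.backward()
    opt.step()
```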