Skip to content

Latest commit

 

History

History
32 lines (21 loc) · 2.02 KB

1812.06280.md

File metadata and controls

32 lines (21 loc) · 2.02 KB

Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia, Yamada et al., 2018

Paper, Demo, Repo, Tags: #nlp, #embeddings

Wikipedia2Vec is an open source tool for learning the embeddings of words and entities from Wikipedia. We have pretrained embeddings for 12 languages in the repo.

In our experiments, our tool achieved a stateof-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets.

Entity embeddings provide rich information (or knowledge) regarding entities available in KB using fixed continuous vectors.

These embeddings can be used to enhance the performance of state-of-the-art contextualized word embeddings like BERT.

Our proposed tool jointly learns the embeddings of words and entities, and places semantically similar words and entities close to one another in the vector space.

Traditionally, entity embeddings from a knowledge base KB have been based on conventional word embedding models like skip-gram.

Model

The model is:

  • word-based skip-gram model +
    • the neighboring words of each word are used as contexts
    • predict the neighboring words of the given word
  • anchor context model +
    • the neighboring words of a hyperlink pointing to an entity are used as contexts
    • attempts to place similar words and entities close to one another in the vector space using hyperlinks and their neighboring words in Wikipedia
    • predict surrounding words given eache ntity
  • link graph model
    • the neighboring entities of each entity in Wikipedia's link graph are used as contexts
    • predict the neighboring entities of each entity in the Wikipedia’s link graph – an undirected graph whose nodes are entities and the edges represent the presence of hyperlinks between the entities

The three losses are added and optimized with stochastig gradient descent.