# BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA, Poerner et al., 2019

Paper, Tags: #nlp, #embeddings

We argue that BERT's good performance is partly due to reasoning about the surface form of entity names, e.g., guessing that a person with an Italian-sounding name speaks Italian.

We show that BERT's precision drops dramatically when such easy-to-guess facts are filtered out.

## LAMA-UHN

The benchmark for unsupervised QA is LAMA, which queries the LM with cloze-style statements derived from KB triples and on which BERT is competitive with knowledge bases built by relation extraction. It is supposed to probe for the 'factual and commonsense knowledge' inherent in LMs.
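
For concreteness, a LAMA-style cloze query can be issued through a fill-mask head; a minimal sketch (the model name and example triple are illustrative, not the paper's code):

```python
from transformers import pipeline

# LAMA probes a masked LM with cloze statements derived from KB triples,
# e.g. (Dante, place-of-birth, Florence) -> "Dante was born in [MASK]."
fill_mask = pipeline("fill-mask", model="bert-base-cased")
for pred in fill_mask("Dante was born in [MASK].")[:3]:
    print(pred["token_str"], pred["score"])
```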

We construct LAMA-UHN (LAMA unhelpful names), a subset of LAMA that better probes factual knowledge, by filtering out queries that are easy to answer from entity names alone. Two filters are used (see the sketch after the list):

  1. The string-match filter deletes all KB triples where the correct answer is a case-insensitive substring of the subject entity name. This deletes up to 81% of triples.
  2. The person-name filter uses cloze-style questions to elicit the name associations inherent in BERT, and deletes KB triples whose answers correlate with them.
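
A minimal sketch of the first heuristic (the function name and triple format are my assumptions, not the paper's code):

```python
def string_match_filter(triples):
    """Drop KB triples whose correct answer is a case-insensitive
    substring of the subject entity name (heuristic 1)."""
    return [
        (subj, rel, obj)
        for subj, rel, obj in triples
        if obj.lower() not in subj.lower()
    ]

# e.g. ("Apple Watch", "manufacturer", "Apple") is filtered out,
# since "apple" occurs inside "apple watch".
triples = [("Apple Watch", "manufacturer", "Apple"),
           ("Dante", "place-of-birth", "Florence")]
print(string_match_filter(triples))  # keeps only the Dante triple
```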

## E-BERT

We propose E-BERT, an extension of BERT that replaces entity mentions with symbolic Wikipedia2Vec entity embeddings, aligned into BERT's wordpiece vector space.
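
The alignment can be fitted as a linear map from Wikipedia2Vec's space into BERT's wordpiece embedding space, using the vocabulary shared by both models; a minimal least-squares sketch (array shapes and function names are my assumptions):

```python
import numpy as np

def fit_alignment(X, Y):
    """Least-squares linear map W such that X @ W ~ Y, where X holds
    Wikipedia2Vec word vectors and Y the BERT wordpiece embeddings
    of the same words (rows paired by the shared vocabulary)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def align_entity(entity_vec, W):
    # Map a Wikipedia2Vec entity vector into BERT's input space so it
    # can stand in for the entity's wordpiece embeddings.
    return entity_vec @ W
```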

We ensemble BERT and E-BERT by (a) mean-pooling their outputs or (b) concatenating the entity embedding and the entity's wordpiece name, separated by a slash symbol.
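
Schematically, the two strategies look like this (function names hypothetical; in the paper the slash concatenation happens at the input-embedding level, not as a plain string):

```python
import numpy as np

def ensemble_mean_pool(p_bert, p_ebert):
    # (a) average the two models' output distributions over [MASK]
    return 0.5 * (p_bert + p_ebert)

def concat_query(entity_name):
    # (b) present both the entity (its slot filled with the aligned
    # Wikipedia2Vec embedding at the input layer) and its wordpiece
    # name, joined by a slash
    return f"<ENTITY:{entity_name}> / {entity_name} was born in [MASK]."
```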