Rare Words: A Major Problem for Contextualized Embeddings and How to Fix it by Attentive Mimicking, Schick et al., 2019
Paper, Tags: #embeddings
We demonstrate that deep LMs still struggle to understand rare words. We adapt Attentive Mimicking (AM) to deep LMs by using one-token approximation (OTA), a procedure that lets us apply AM even when the underlying LM uses subword-based tokenization.
OTA finds an embedding for a multi-token word or phrase w that is similar to the embedding w would have received if it had been a single token. This allows us to train AM in the usual way by simply mimicking the OTA-based embeddings of multi-token words.
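A minimal sketch of the OTA idea, not the paper's exact procedure: treat the missing one-token embedding as a free parameter and optimize it so that, in sampled contexts, a frozen LM's output with this single placeholder embedding approximates its output when the word is fed as its actual multi-token sequence. The toy encoder, the names (`encoder`, `embed`, `ota_embedding`), and the loss over context positions are all illustrative assumptions standing in for a real subword-based deep LM.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy frozen "deep LM": an embedding table plus a small transformer encoder.
# These stand in for a real subword-based LM; all names are illustrative.
vocab_size, dim = 1000, 64
embed = nn.Embedding(vocab_size, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
embed.eval(), encoder.eval()
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad_(False)

# A word that the tokenizer splits into several subword ids (hypothetical ids).
word_subtokens = torch.tensor([[17, 256, 731]])  # shape (1, 3)

def sample_context(batch=8, left=4, right=4):
    # Random token ids acting as contexts with a slot for the word.
    return (torch.randint(0, vocab_size, (batch, left)),
            torch.randint(0, vocab_size, (batch, right)))

# OTA: a single trainable vector that should behave like a one-token embedding of the word.
ota_embedding = nn.Parameter(torch.randn(dim) * 0.02)
optimizer = torch.optim.Adam([ota_embedding], lr=1e-2)

for step in range(200):
    left_ids, right_ids = sample_context()
    batch, n_left, n_right = left_ids.size(0), left_ids.size(1), right_ids.size(1)

    # Target: run the frozen LM with the word's real multi-token embeddings.
    target_embeds = torch.cat(
        [embed(left_ids), embed(word_subtokens).expand(batch, -1, -1), embed(right_ids)], dim=1
    )
    with torch.no_grad():
        target_out = encoder(target_embeds)
    # Compare only the context positions so sequence lengths line up.
    target_ctx = torch.cat([target_out[:, :n_left], target_out[:, -n_right:]], dim=1)

    # Candidate: replace the multi-token span with the single OTA embedding.
    cand_embeds = torch.cat(
        [embed(left_ids), ota_embedding.expand(batch, 1, -1), embed(right_ids)], dim=1
    )
    cand_out = encoder(cand_embeds)
    cand_ctx = torch.cat([cand_out[:, :n_left], cand_out[:, -n_right:]], dim=1)

    loss = ((cand_ctx - target_ctx) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The optimized ota_embedding can then serve as the mimicking target: an AM model
# producing f(word) from surface form and contexts is trained to minimize
# ||f(word) - ota_embedding||^2, as described in the note above.
```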