From English To Foreign Languages: Transferring Pre-trained Language Models, Tran, 2020

Paper, Tags: #nlp

We transfer an existing pretrained model from English to other languages under a limited computational budget. With a single GPU, our approach obtains a foreign BERT-base model within 20 hours and a foreign BERT-large model within 46 hours.

Among prior work, the only publicly available multilingual pretrained models to date are multilingual BERT (mBERT), the cross-lingual LM XLM-R, and LASER.

First, we build a bilingual LM of English and a target language. Starting from a pretrained English LM, we learn the target-language-specific parameters (i.e., word embeddings) while keeping the encoder layers of the pretrained English LM fixed. We then fine-tune both the English and target-language parameters to obtain the bilingual LM.
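
The two-phase recipe above can be illustrated with a minimal PyTorch sketch using HuggingFace Transformers. This is not the paper's exact implementation: the target vocabulary size, learning rates, and training loops are placeholder assumptions, and the paper's specific initialization of the foreign embeddings is not reproduced.

```python
import torch
import torch.nn as nn
from transformers import BertModel

TARGET_VOCAB_SIZE = 32000  # hypothetical target-language vocabulary size

# Pretrained English encoder; its English embeddings and layers stay as-is.
english_bert = BertModel.from_pretrained("bert-base-uncased")
hidden_size = english_bert.config.hidden_size

# New, randomly initialized word embeddings for the target language.
target_embeddings = nn.Embedding(TARGET_VOCAB_SIZE, hidden_size)

# Phase 1: freeze the pretrained English encoder and learn only the
# target-language embeddings.
for param in english_bert.parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(target_embeddings.parameters(), lr=1e-4)
# ... masked-LM training on target-language text, feeding
#     inputs_embeds=target_embeddings(input_ids) into english_bert ...

# Phase 2: unfreeze the encoder and fine-tune everything jointly on
# English + target-language data to obtain the bilingual LM.
for param in english_bert.parameters():
    param.requires_grad = True
optimizer = torch.optim.AdamW(
    list(english_bert.parameters()) + list(target_embeddings.parameters()),
    lr=2e-5,
)
# ... continue masked-LM training on the bilingual corpus ...
```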

Concurrently, Artetxe et al. transfer pretrained English models to other languages by fine-tuning only randomly initialized target word embeddings while keeping the Transformer encoder fixed. Their approach is simpler than ours but requires more compute to achieve good results.

In this work we present a simple and effective approach for rapidly building a bilingual LM under a limited computational budget. Using BERT as the starting point, we demonstrate that our approach performs better than mBERT on two cross-lingual zero-shot tasks: sentence classification and dependency parsing. We also find that our bilingual LM is a powerful feature extractor for supervised dependency parsing.