Paper, Tags: #nlp
We transfer an existing pretrained model from English to other languages under a limited computational budget. With a single GPU, our approach obtains a foreign BERT-base model within 20 hours and a foreign BERT-large within 46 hours.
Among prior work, the only publicly available multilingual pretrained models to date are multilingual BERT (mBERT), the cross-lingual LM XLM-R, and LASER.
First, we build a bilingual LM of English and a target language. Starting from a pretrained English LM, we learn the target-language-specific parameters (i.e., the word embeddings) while keeping the encoder layers of the pretrained English LM fixed. Then we fine-tune both the English and target parameters together to obtain the bilingual LM, as sketched below.
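A minimal sketch of the first stage, assuming HuggingFace Transformers and PyTorch; the target vocabulary size is hypothetical, and how the new embeddings are initialized is left out. This is an illustration of freezing the encoder while learning only word embeddings, not the authors' exact code.

```python
import torch
from transformers import BertForMaskedLM

TARGET_VOCAB_SIZE = 30000  # hypothetical target-language vocabulary size

# Start from a pretrained English LM.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Swap in a target-language embedding matrix (the tied MLM output layer
# follows the word embeddings).
model.resize_token_embeddings(TARGET_VOCAB_SIZE)

# Train only the word embeddings; keep the Transformer encoder layers,
# position embeddings, and the rest of the English LM fixed.
for name, param in model.named_parameters():
    param.requires_grad = "word_embeddings" in name

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# ... train with the masked-LM objective on target-language text ...
# Second stage: unfreeze all parameters and fine-tune on both English
# and target-language data to obtain the bilingual LM.
```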
Concurrently, Artetxe et al. transfer pretrained English models to other languages by fine-tuning only randomly initialized target word embeddings while keeping the Transformer encoder fixed. Their approach is simpler than ours but requires more compute to achieve good results.
In this work, we present a simple and effective approach for rapidly building a bilingual LM under a limited computational budget. Using BERT as the starting point, we demonstrate that our approach outperforms mBERT on two cross-lingual zero-shot tasks: sentence classification and dependency parsing. We also find that our bilingual LM is a powerful feature extractor for supervised dependency parsing.
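For the feature-extractor use, a hedged sketch of pulling word-level representations from a frozen encoder to feed a downstream parser; the English `bert-base-uncased` checkpoint stands in for the bilingual LM, whose checkpoint name is not given here, and the first-subword pooling is one common choice, not necessarily the authors'.

```python
import torch
from transformers import BertModel, BertTokenizerFast

# Stand-in checkpoint; replace with the bilingual LM in practice.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()

sentence = ["The", "cat", "sat", "on", "the", "mat"]
enc = tokenizer(sentence, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**enc).last_hidden_state  # (1, num_subwords, hidden_size)

# Pool subwords back to word level (first subword per word); these vectors
# can then be fed as features to a supervised dependency parser.
word_ids = enc.word_ids()
first_subword = [word_ids.index(i) for i in range(len(sentence))]
word_features = hidden[0, first_subword]  # (num_words, hidden_size)
```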