Paper, Tags: #nlp
We transfer an existing pretrained model from English to other languages under a limited computational budget. With a single GPU, our approach obtains a foreign BERT-base model within 20 hours and a foreign BERT-large within 46 hours.
Among prior work, the only publicly available multilingual pretrained models to date are multilingual BERT (mBERT), the cross-lingual LM XLM-R, and LASER.
First, we build a bilingual LM of English and a target language. Starting from a pretrained English LM, we learn the target-language-specific parameters (i.e., the word embeddings) while keeping the encoder layers of the pretrained English LM fixed. Then we fine-tune both the English and target parameters together to obtain the bilingual LM, as sketched below.
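A minimal sketch of the first stage, assuming HuggingFace Transformers and PyTorch; the target vocabulary size is hypothetical, and how the new embeddings are initialized is left out. This is an illustration of freezing the encoder while learning only word embeddings, not the authors' exact code.

```python
import torch
from transformers import BertForMaskedLM

TARGET_VOCAB_SIZE = 30000  # hypothetical target-language vocabulary size

# Start from a pretrained English LM.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Swap in a target-language embedding matrix (the tied MLM output layer
# follows the word embeddings).
model.resize_token_embeddings(TARGET_VOCAB_SIZE)

# Train only the word embeddings; keep the Transformer encoder layers,
# position embeddings, and the rest of the English LM fixed.
for name, param in model.named_parameters():
    param.requires_grad = "word_embeddings" in name

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# ... train with the masked-LM objective on target-language text ...
# Second stage: unfreeze all parameters and fine-tune on both English
# and target-language data to obtain the bilingual LM.
```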
Concurrently, Artetxe et al. transfer pretrained English models to other languages by fine-tuning only randomly initialized target word embeddings while keeping the Transformer encoder fixed. Their approach is simpler than ours but requires more compute to achieve good results.
In this work, we present a simple and effective approach for rapidly building a bilingual LM under a limited computational budget. Using BERT as the starting point, we demonstrate that our approach outperforms mBERT on two cross-lingual zero-shot tasks: sentence classification and dependency parsing. We also find that our bilingual LM is a powerful feature extractor for supervised dependency parsing.
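For the feature-extractor use, a hedged sketch of pulling word-level representations from a frozen encoder to feed a downstream parser; the English `bert-base-uncased` checkpoint stands in for the bilingual LM, whose checkpoint name is not given here, and the first-subword pooling is one common choice, not necessarily the authors'.

```python
import torch
from transformers import BertModel, BertTokenizerFast

# Stand-in checkpoint; replace with the bilingual LM in practice.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()

sentence = ["The", "cat", "sat", "on", "the", "mat"]
enc = tokenizer(sentence, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**enc).last_hidden_state  # (1, num_subwords, hidden_size)

# Pool subwords back to word level (first subword per word); these vectors
# can then be fed as features to a supervised dependency parser.
word_ids = enc.word_ids()
first_subword = [word_ids.index(i) for i in range(len(sentence))]
word_features = hidden[0, first_subword]  # (num_words, hidden_size)
```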