This page includes instructions for reproducing results from the paper Scaling Neural Machine Translation (Ott et al., 2018).
Model | Description | Dataset | Download |
---|---|---|---|
`transformer.wmt14.en-fr` | Transformer (Ott et al., 2018) | WMT14 English-French | model: download (.tar.bz2) <br> newstest2014: download (.tar.bz2) |
`transformer.wmt16.en-de` | Transformer (Ott et al., 2018) | WMT16 English-German | model: download (.tar.bz2) <br> newstest2014: download (.tar.bz2) |
First download the preprocessed WMT'16 En-De data provided by Google.
Then:
```bash
TEXT=wmt16_en_de_bpe32k
mkdir -p $TEXT
tar -xzvf wmt16_en_de.tar.gz -C $TEXT

fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref $TEXT/train.tok.clean.bpe.32000 \
    --validpref $TEXT/newstest2013.tok.bpe.32000 \
    --testpref $TEXT/newstest2014.tok.bpe.32000 \
    --destdir data-bin/wmt16_en_de_bpe32k \
    --nwordssrc 32768 --nwordstgt 32768 \
    --joined-dictionary \
    --workers 20
```
Next, train a big Transformer model on this data:

```bash
fairseq-train \
    data-bin/wmt16_en_de_bpe32k \
    --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --dropout 0.3 --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 3584 \
    --fp16
```
Note that the `--fp16` flag requires that you have CUDA 9.1 or greater and a Volta GPU or newer.
If you want to train the above model with big batches (assuming your machine has 8 GPUs):

- add `--update-freq 16` to simulate training on 8x16=128 GPUs
- increase the learning rate; 0.001 works well for big batches
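Putting those two changes together, a big-batch variant of the training command above might look like the following sketch (same data, architecture, and remaining flags as before):

```bash
# Big-batch training on a single 8-GPU machine: gradient accumulation via
# --update-freq 16 simulates 8 x 16 = 128 GPUs, so each update sees roughly
# 3584 tokens/GPU x 8 GPUs x 16 accumulation steps ~ 459k tokens.
fairseq-train \
    data-bin/wmt16_en_de_bpe32k \
    --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 0.001 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --dropout 0.3 --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 3584 \
    --update-freq 16 \
    --fp16
```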
Once the model is trained, translate the test set with:

```bash
fairseq-generate \
    data-bin/wmt16_en_de_bpe32k \
    --path checkpoints/checkpoint_best.pt \
    --beam 4 --lenpen 0.6 --remove-bpe
```
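If you prefer to save the generation output and score it separately, `fairseq-score` can compute BLEU from extracted hypothesis and reference files. The snippet below is a sketch: the file names (`gen.out`, `gen.out.sys`, `gen.out.ref`) are placeholders, and it assumes the usual `fairseq-generate` log format where hypotheses appear on `H-` lines and references on `T-` lines:

```bash
# Save the generation log, then split out hypotheses (H- lines: id, score,
# text) and references (T- lines: id, text) before scoring.
fairseq-generate data-bin/wmt16_en_de_bpe32k \
    --path checkpoints/checkpoint_best.pt \
    --beam 4 --lenpen 0.6 --remove-bpe > gen.out
grep ^H gen.out | cut -f3- > gen.out.sys
grep ^T gen.out | cut -f2- > gen.out.ref
fairseq-score --sys gen.out.sys --ref gen.out.ref
```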
Please cite as:

```bibtex
@inproceedings{ott2018scaling,
  title     = {Scaling Neural Machine Translation},
  author    = {Ott, Myle and Edunov, Sergey and Grangier, David and Auli, Michael},
  booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)},
  year      = {2018},
}
```