Dual Learning for Machine Translation

TLDR; The authors finetune an FR -> EN NMT model using an RL-based dual game: 1. Pick a French sentence from a monolingual corpus and translate it to EN. 2. Use an EN language model to get a reward for the translation. 3. Translate the translation back into FR using an EN -> FR system. 4. Get a reward based on the consistency between the original and the reconstructed sentence. By training this architecture with policy gradient, the authors make efficient use of monolingual data and show that a system trained on only 10% of the parallel data and finetuned with monolingual data achieves BLEU scores comparable to those of a system trained on the full set of parallel data.

Key Points

Making efficient use of monolingual data to improve NMT systems is a challenge
Two-agent communication game: agent A only knows language A and agent B only knows language B. A sends a message through a noisy translation channel; B receives the message, checks its correctness, and sends it back through another noisy translation channel. A checks whether it is consistent with the original message. The translation channels are then improved based on the feedback.
Pieces required: LanguageModel(A), LanguageModel(B), TranslationModel(A->B), TranslationModel(B->A). Monolingual Data.
Total reward is a linear combination of: r1 = LM(translated_message), r2 = log(P(original_message | translated_message))
Samples are drawn using beam search, and the gradient is approximated by averaging over the sampled translations
EN -> FR pretrained on 100% of parallel data: 29.92 to 32.06 BLEU
EN -> FR pretrained on 10% of parallel data: 25.73 to 28.73 BLEU
FR -> EN pretrained on 100% of parallel data: 27.49 to 29.78 BLEU
FR -> EN pretrained on 10% of parallel data: 22.27 to 27.50 BLEU
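
The reward combination and the averaged gradient from the key points above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` container, the function names, and the mixing weight `alpha` are assumptions (the notes only say "linear combination"), and the gradient vectors are taken as plain lists that some autodiff framework would supply in practice.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One translated message sampled via beam search (hypothetical container)."""
    lm_score: float       # r1: language-model score of the translated message
    recon_logprob: float  # r2: log P(original_message | translated_message)


def total_reward(c: Candidate, alpha: float = 0.5) -> float:
    # alpha is an assumed mixing weight between the two rewards.
    return alpha * c.lm_score + (1.0 - alpha) * c.recon_logprob


def averaged_policy_gradient(
    rewards: Sequence[float],
    grad_logprobs: Sequence[Sequence[float]],
) -> List[float]:
    """Average the reward-weighted gradients of log P(sample) over the K samples."""
    if not rewards or len(rewards) != len(grad_logprobs):
        raise ValueError("need one gradient vector per reward")
    k = len(rewards)
    dim = len(grad_logprobs[0])
    return [
        sum(r * g[i] for r, g in zip(rewards, grad_logprobs)) / k
        for i in range(dim)
    ]
```

Each beam-search candidate contributes its own reward-weighted gradient, and the mean over the K candidates stands in for the expected gradient.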

Some Notes

I think the idea is very interesting and we'll see a lot of related work coming out of this. It would be even more amazing if the architecture was trained from scratch using monolingual data only. Due to the high variance of RL methods this is probably quite hard to do though.
I think the key issue is that the rewards are quite noisy, as is the case with MT in general. Neither the language model nor the BLEU scores give good feedback for the "correctness" of a translation.
I wonder why there is such a huge jump in BLEU scores for FR->EN on 10% of data, but not for EN->FR on the same amount of data.
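
To make the size of that jump concrete, the gains can be computed from the BLEU numbers listed under Key Points. This is just arithmetic on the figures quoted above; the dictionary layout and variable names are my own.

```python
# BLEU before and after dual learning, copied from the key points above.
results = {
    ("EN->FR", "100%"): (29.92, 32.06),
    ("EN->FR", "10%"): (25.73, 28.73),
    ("FR->EN", "100%"): (27.49, 29.78),
    ("FR->EN", "10%"): (22.27, 27.50),
}

# Absolute BLEU gain for each direction and parallel-data setting.
gains = {
    key: round(after - before, 2)
    for key, (before, after) in results.items()
}

for (direction, share), gain in gains.items():
    print(f"{direction} with {share} of parallel data: +{gain:.2f} BLEU")
```

With these figures, FR->EN on 10% of the data gains 5.23 BLEU, against 3.00 for EN->FR on the same amount of data.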
