
2018 NGT Iterative Back-Translation for Neural Machine Translation #33

Open
IsaacJ60 opened this issue Jul 12, 2023 · 0 comments
Labels
literature-review Summary of the paper related to the work

Main Problem

The main problem addressed in the paper is the limited availability of parallel data for training Neural Machine Translation (NMT) systems. NMT models require large amounts of parallel data to achieve good translation quality, but such data is scarce for many language pairs and domains. Monolingual data, by contrast, is abundant, yet it is difficult to exploit directly for NMT. Back-translation addresses this by generating a synthetic parallel corpus from monolingual data, which can then be appended to the original training set.

Proposed Method

The paper introduces the Iterative Back-Translation (IBT) method to improve NMT performance in scenarios with limited parallel data. IBT generates synthetic parallel data by back-translating target-side monolingual data into the source language. The method then iterates: models in both translation directions are retrained on the augmented data, and the improved models produce higher-quality back-translations for the next round, so translation quality improves with each iteration. Techniques such as length control, confidence thresholding, and model selection are introduced to further enhance the back-translation process.
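The iterative loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `train` and `translate` are hypothetical stand-ins (a toy word-substitution model) for a real NMT toolkit, and the refinement techniques mentioned above (length control, confidence thresholding, model selection) are omitted for brevity.

```python
def train(parallel_pairs):
    """Hypothetical stand-in for NMT training: learn a word-level
    substitution table from (source, target) sentence pairs."""
    table = {}
    for src, tgt in parallel_pairs:
        for s, t in zip(src.split(), tgt.split()):
            table[s] = t
    return table


def translate(model, sentence):
    """Hypothetical stand-in for NMT decoding: substitute known words,
    copy unknown words through unchanged."""
    return " ".join(model.get(w, w) for w in sentence.split())


def iterative_back_translation(parallel, mono_src, mono_tgt, iterations=3):
    """Alternately refine source->target (fwd) and target->source (bwd)
    models; each round augments the training data with synthetic pairs
    produced by the other direction's current model."""
    fwd = train(parallel)                               # source -> target
    bwd = train([(t, s) for s, t in parallel])          # target -> source
    for _ in range(iterations):
        # Back-translate target monolingual data into synthetic (src, tgt)
        # pairs, then retrain the forward model on real + synthetic data.
        synth_fwd = [(translate(bwd, t), t) for t in mono_tgt]
        fwd = train(parallel + synth_fwd)
        # Symmetrically, back-translate source monolingual data to
        # retrain the backward model.
        synth_bwd = [(translate(fwd, s), s) for s in mono_src]
        bwd = train([(t, s) for s, t in parallel] + synth_bwd)
    return fwd, bwd
```

The key design point the sketch captures is that both directions improve together: better backward models yield better synthetic data for the forward model, and vice versa.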

Input/Output

The input to the proposed method is limited parallel data, consisting of source sentences and their translations, along with monolingual data in both the source and target languages. The output is an iteratively refined NMT model, which is trained on augmented datasets comprising the original limited parallel data and the synthetic parallel data generated through back-translation.

Example

The paper presents experimental results on various language pairs, including German-English, English-French, and English-Farsi. It demonstrates the effectiveness of the IBT method in improving translation quality. For instance, in the German-English high-resource scenario, the final IBT systems outperform baseline NMT models and achieve the best reported BLEU scores for the WMT 2017 task.

Related Works & Their Gaps

The paper discusses related works such as traditional back-translation approaches in statistical machine translation and modern NMT research. It mentions previous studies that explored the use of monolingual data, linguistic constraints, or separately trained language models, but highlights that these approaches have not proven as successful as back-translation at leveraging monolingual data to improve translation quality. The paper also discusses the dual learning approach, noting that it did not achieve significant gains in the authors' experiments. The proposed IBT method thus fills the gap by providing a simple and effective way to leverage back-translation for enhancing NMT systems.

@DelaramRajaei DelaramRajaei added the literature-review Summary of the paper related to the work label Jul 13, 2023