2022-ACL 2022-Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation #244

Link: arXiv

Main problem

Previous encoder-only pretraining approaches produce low translation quality, induce over-estimation issues, and yield models that lack robustness.

Proposed method

This paper proposes a simple strategy to overcome these limitations via two key components: in-domain pretraining and input adaptation.
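
The input-adaptation component can be pictured as mixing noised source sentences into the fine-tuning data so that fine-tuning inputs look more like the inputs seen during Seq2Seq pretraining. Below is a minimal Python sketch of that idea; the function names, the span-masking scheme, and the mixing ratio are illustrative assumptions on my part, not the paper's actual implementation.

```python
import random

MASK = "<mask>"

def span_mask(tokens, mask_ratio=0.35):
    """Replace one random contiguous span with a single mask token
    (a rough stand-in for BART/mBART-style text infilling)."""
    if len(tokens) < 2:
        return tokens[:]
    span_len = max(1, int(len(tokens) * mask_ratio))
    start = random.randrange(0, len(tokens) - span_len + 1)
    return tokens[:start] + [MASK] + tokens[start + span_len:]

def adapted_batch(parallel_pairs, noisy_fraction=0.3):
    """Yield (source, target) fine-tuning examples; for a fraction of
    them the source is noised so fine-tuning inputs resemble the
    pretraining distribution. The 0.3 ratio is an assumed value."""
    for src, tgt in parallel_pairs:
        src_tokens = src.split()
        if random.random() < noisy_fraction:
            src_tokens = span_mask(src_tokens)
        yield " ".join(src_tokens), tgt

# Toy usage with a single sentence pair
pairs = [("the cat sat on the mat", "die Katze sass auf der Matte")]
for src, tgt in adapted_batch(pairs, noisy_fraction=1.0):
    print(src, "->", tgt)
```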

My Summary

The authors jointly pretrain the decoder alongside the encoder, which produced more diverse translations in their experiments and reduced adequacy-related translation errors compared with the encoder-only pretraining approach. After applying the proposed method, they observed up to a 19% improvement in performance in some cases (WMT19 EN->DE). The end result is improved translation performance and model robustness. Future work includes validating the findings with more Seq2Seq pretraining models and language pairs.

Datasets

(1) WMT19 English-German
(2) WMT16 English-Romanian (low resource)
(3) IWSLT17 English-French
(4) a subset from WMT19 English-German (for ablation studies)
