2024-LREC, COLING-Learning from Wrong Predictions in Low-Resource Neural Machine Translation #239

Open
thangk opened this issue Jun 24, 2024 · 0 comments
thangk commented Jun 24, 2024

Link: ACL Anthology

Main problem

Translation models for low-resource languages often produce wrong predictions, and data augmentation methods are frequently not available to help improve them.

Proposed method

The paper proposes USKI (Unaligned Sentences Keytokens pre-training), which “leverages the relationships and similarities that exist between unaligned sentences.” The method claims to improve predictions by growing the dataset to the square of its initial size, bringing it closer to the dataset sizes available for high-resource languages and thereby improving performance.
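The squaring claim above can be illustrated with simple counting. The sketch below is only my reading of it, not the paper's implementation: it assumes the enlarged dataset comes from pairing every source sentence with every target sentence, so N aligned pairs give N × N pairings, of which N × (N − 1) are unaligned. The function names and the toy corpus are hypothetical, and the step where USKI extracts key tokens from these pairings is not modelled here.

```python
from itertools import product


def all_pairings(aligned_pairs):
    """Pair every source sentence with every target sentence (assumed construction)."""
    sources = [src for src, _ in aligned_pairs]
    targets = [tgt for _, tgt in aligned_pairs]
    return list(product(sources, targets))


def split_unaligned(aligned_pairs):
    """Return all pairings plus the subset that is not in the original aligned set."""
    aligned = set(aligned_pairs)
    pairings = all_pairings(aligned_pairs)
    unaligned = [pair for pair in pairings if pair not in aligned]
    return pairings, unaligned


# Hypothetical toy corpus with N = 3 aligned sentence pairs.
toy_corpus = [("src_1", "tgt_1"), ("src_2", "tgt_2"), ("src_3", "tgt_3")]
pairings, unaligned = split_unaligned(toy_corpus)

print(len(toy_corpus), len(pairings), len(unaligned))  # 3 9 6
```

Under this assumption, a corpus of 10,000 aligned pairs would yield 100,000,000 pairings, which is where the comparison with high-resource dataset sizes comes from. Whether those unaligned pairings carry a useful training signal is a separate question.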

My Summary

This is an interesting paper. According to the introduction, there are over 7000 spoken languages, and more than half of them are estimated to become extinct by 2100. My mother tongue is a very low-resource language (LRL) as well. However, I am still skeptical about the claims of squaring the LRL dataset and of learning from unaligned sentences in other LRL datasets. The approach seems to assume that every sentence carries some useful translation signal for another sentence even when the two are not direct translations of each other, and I am not sure this holds. That said, the reported performance improvement isn’t very high, so the results look realistic.

Datasets

Selkup-Russian
Evenki-Russian
Griko-Italian
Uzbek-English
Wolof-Ukrainian

@thangk thangk added the literature-review Summary of the paper related to the work label Jun 25, 2024
@thangk thangk changed the title from “Learning from Wrong Predictions in Low-Resource Neural Machine Translation (2024)” to “2024-LREC, COLING-Learning from Wrong Predictions in Low-Resource Neural Machine Translation” Jun 25, 2024
@thangk thangk self-assigned this Jul 24, 2024