
2020, WMT, Document Level NMT of Low-Resource Languages with Back Translation #94

Sepideh-Ahmadian opened this issue Sep 27, 2024
Labels: literature-review (Summary of the paper related to the work)
Paper
Document Level NMT of Low-Resource Languages with Back Translation

Introduction
This paper discusses a submission to the WMT 2020 shared task on similar language translation, focusing on low-resource language pairs such as Marathi-Hindi. The authors explore the use of document-level neural machine translation (NMT), which incorporates contextual information across sentences to improve translation quality.

Main Problem
The main problem is the scarcity of parallel data for low-resource language pairs, such as Marathi-Hindi, which limits the effectiveness of neural machine translation (NMT).

Illustrative Example
Not mentioned

Input
A sentence in Marathi

Output
A sentence in Hindi

Motivation
The authors were motivated by the lack of sufficient parallel data for low-resource languages like Marathi-Hindi, which makes it difficult to train accurate NMT systems.

Related works and their gaps
There is prior work on translation for low-resource language pairs (Pourdamghani and Knight, 2017; Lakew et al., 2018; Costa-jussà, 2017). This paper addresses a gap in sentence-level NMT models, which fail to capture cross-sentence context; previous work often overlooked document-level context in low-resource settings.

Contribution of this paper
The main contributions include:
1. A document-level NMT system for low-resource languages that incorporates context-aware hierarchical attention networks (HANs).
2. Leveraging monolingual data via back-translation to augment the parallel training data (a minimal sketch of this loop is given below).
The authors claim that their results outperform the other systems in this setting.
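To make the second contribution concrete, here is a minimal sketch of the back-translation loop, assuming a pre-trained reverse (Hindi-to-Marathi) model exposed as a simple translate() callable. All function and variable names are hypothetical; the paper does not publish this code, and this only illustrates the general technique.

```python
# Minimal back-translation sketch (illustrative; not the authors' code).
# Assumes a reverse Hindi -> Marathi model is available as a callable.
from typing import Callable, Iterable, List, Tuple


def back_translate(
    mono_hindi: Iterable[str],
    translate_hi_to_mr: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Create synthetic (Marathi, Hindi) pairs from monolingual Hindi text."""
    synthetic = []
    for hi_sentence in mono_hindi:
        mr_synthetic = translate_hi_to_mr(hi_sentence)  # synthetic source side
        synthetic.append((mr_synthetic, hi_sentence))   # real target side
    return synthetic


def build_training_set(real_parallel, synthetic):
    # The forward Marathi -> Hindi model is trained on the union of the real
    # parallel corpus and the synthetic back-translated pairs.
    return list(real_parallel) + list(synthetic)


if __name__ == "__main__":
    # Dummy reverse model; a real system would load a trained NMT model here.
    dummy_reverse = lambda hi: "<synthetic Marathi for: " + hi + ">"
    synth = back_translate(["example Hindi sentence"], dummy_reverse)
    print(build_training_set([("real mr", "real hi")], synth))
```

The key property of back-translation is that the target (Hindi) side of each synthetic pair is genuine human text, so the forward model still learns to produce fluent output even when the synthetic source side is noisy.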

Proposed methods
Not included

Experiments
Dataset: the WMT20 similar language translation task data for the Marathi-Hindi language pair.

Implementation
They used the following tokenizer, though the paper does not give a proper reference for it:
https://github.com/anoopkunchukuttan/indic_nlp_library
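For readers unfamiliar with that library, here is a hedged usage sketch based on its documented trivial_tokenize API; the exact preprocessing calls the authors used are not shown in the paper, so this is an assumption.

```python
# Illustrative tokenization with the Indic NLP Library (assumed usage, not the
# authors' actual preprocessing script).
from indicnlp.tokenize import indic_tokenize

# An illustrative Marathi sentence in Devanagari script.
marathi_text = "ही एक चाचणी आहे."

# trivial_tokenize performs whitespace tokenization with punctuation splitting.
tokens = indic_tokenize.trivial_tokenize(marathi_text, lang='mr')
print(tokens)  # e.g. ['ही', 'एक', 'चाचणी', 'आहे', '.']
```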

Gaps of this work
Document-level data may be limited or unavailable for other low-resource languages. In addition, since back-translation is involved, it is unclear how effectively it is handled: back-translating with models trained on low-resource data could degrade the quality of the synthetic parallel text.
