
2020, WMT, Document Level NMT of Low-Resource Languages with Back Translation #94

Sepideh-Ahmadian opened this issue Sep 27, 2024
Labels: literature-review (Summary of the paper related to the work)
Paper
Document Level NMT of Low-Resource Languages with Back Translation

Introduction
This paper discusses a submission to the WMT 2020 shared task on similar language translation, focusing on low-resource language pairs such as Marathi-Hindi. The authors explore the use of document-level neural machine translation (NMT), which incorporates contextual information across sentences to improve translation quality.

Main Problem
The main problem is the scarcity of parallel data for low-resource language pairs, such as Marathi-Hindi, which limits the effectiveness of neural machine translation (NMT).

Illustrative Example
Not mentioned

Input
A sentence in Marathi

Output
A sentence in Hindi

Motivation
The authors were motivated by the lack of sufficient parallel data for low-resource languages like Marathi-Hindi, which makes it difficult to train accurate NMT systems.

Related works and their gaps
There is prior work on translation for low-resource language pairs (Pourdamghani and Knight, 2017; Lakew et al., 2018; Costa-jussà, 2017). This paper addresses a gap in sentence-level NMT models, which fail to capture cross-sentence context; previous work often overlooked document-level context in low-resource settings.

Contribution of this paper
The main contributions include:
1. A document-level NMT system for low-resource languages that incorporates context-aware hierarchical attention networks (HANs).
2. Leveraging monolingual data via back-translation to augment the parallel training data (a minimal sketch of this loop is given below).
The authors claim that their results outperform the other systems in this setting.
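To make the second contribution concrete, here is a minimal sketch of the back-translation loop, assuming a pre-trained reverse (Hindi-to-Marathi) model exposed as a simple translate() callable. All function and variable names are hypothetical; the paper does not publish this code, and this only illustrates the general technique.

```python
# Minimal back-translation sketch (illustrative; not the authors' code).
# Assumes a reverse Hindi -> Marathi model is available as a callable.
from typing import Callable, Iterable, List, Tuple


def back_translate(
    mono_hindi: Iterable[str],
    translate_hi_to_mr: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Create synthetic (Marathi, Hindi) pairs from monolingual Hindi text."""
    synthetic = []
    for hi_sentence in mono_hindi:
        mr_synthetic = translate_hi_to_mr(hi_sentence)  # synthetic source side
        synthetic.append((mr_synthetic, hi_sentence))   # real target side
    return synthetic


def build_training_set(real_parallel, synthetic):
    # The forward Marathi -> Hindi model is trained on the union of the real
    # parallel corpus and the synthetic back-translated pairs.
    return list(real_parallel) + list(synthetic)


if __name__ == "__main__":
    # Dummy reverse model; a real system would load a trained NMT model here.
    dummy_reverse = lambda hi: "<synthetic Marathi for: " + hi + ">"
    synth = back_translate(["example Hindi sentence"], dummy_reverse)
    print(build_training_set([("real mr", "real hi")], synth))
```

The key property of back-translation is that the target (Hindi) side of each synthetic pair is genuine human text, so the forward model still learns to produce fluent output even when the synthetic source side is noisy.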

Proposed methods
Not included

Experiments
Dataset: the WMT20 similar language translation task data for the Marathi-Hindi language pair.

Implementation
They used the following tokenizer, though the paper does not give a proper reference for it:
https://github.com/anoopkunchukuttan/indic_nlp_library
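For readers unfamiliar with that library, here is a hedged usage sketch based on its documented trivial_tokenize API; the exact preprocessing calls the authors used are not shown in the paper, so this is an assumption.

```python
# Illustrative tokenization with the Indic NLP Library (assumed usage, not the
# authors' actual preprocessing script).
from indicnlp.tokenize import indic_tokenize

# An illustrative Marathi sentence in Devanagari script.
marathi_text = "ही एक चाचणी आहे."

# trivial_tokenize performs whitespace tokenization with punctuation splitting.
tokens = indic_tokenize.trivial_tokenize(marathi_text, lang='mr')
print(tokens)  # e.g. ['ही', 'एक', 'चाचणी', 'आहे', '.']
```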

Gaps of this work
Document-level data may be limited or unavailable for other low-resource languages. In addition, since back-translation is involved, it is unclear how effectively it is handled: back-translating with models trained on low-resource data could degrade the quality of the synthetic parallel text.
