Paper
Document Level NMT of Low-Resource Languages with Back Translation
Introduction
This paper discusses a submission to the WMT 2020 shared task on similar language translation, focusing on low-resource language pairs such as Marathi-Hindi. The authors explore the use of document-level neural machine translation (NMT), which incorporates contextual information across sentences to improve translation quality.
Main Problem
The main problem is the scarcity of parallel data for low-resource language pairs, such as Marathi-Hindi, which limits the effectiveness of neural machine translation (NMT).
Illustrative Example
Not mentioned
Input
A sentence in Marathi
Output
A sentence in Hindi
Motivation
The authors were motivated by the lack of sufficient parallel data for low-resource languages like Marathi-Hindi, which makes it difficult to train accurate NMT systems.
Related works and their gaps
There is prior work on low-resource language pair translation, e.g., Pourdamghani and Knight (2017), Lakew et al. (2018), and Costa-jussà (2017). The paper addresses a gap in sentence-level NMT models, which fail to capture cross-sentence context; previous work has often overlooked document-level context in low-resource settings.
Contribution of this paper
The main contributions include:
Proposing a document-level NMT system for low-resource languages that incorporates context-aware hierarchical attention networks (HANs), and using back-translation of monolingual data to augment the scarce parallel training data. The authors claim their results outperform other reported results for this task.
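To make the back-translation step concrete, below is a minimal, hypothetical sketch of the standard pipeline: train a reverse (Hindi-to-Marathi) model on the small parallel corpus, translate monolingual Hindi into synthetic Marathi, and mix the synthetic pairs into the training data. The helpers train_nmt and translate are placeholders, not the authors' code.

```python
# Hypothetical sketch of standard back-translation augmentation (not the authors' code).

def back_translate_augment(parallel_mr_hi, mono_hi, train_nmt, translate):
    """Augment scarce Marathi->Hindi parallel data with synthetic pairs."""
    # 1. Train a reverse (Hindi -> Marathi) model on the available parallel data.
    reverse_model = train_nmt(src=[hi for _, hi in parallel_mr_hi],
                              tgt=[mr for mr, _ in parallel_mr_hi])

    # 2. Translate monolingual Hindi sentences into synthetic Marathi.
    synthetic_mr = [translate(reverse_model, hi) for hi in mono_hi]

    # 3. Pair synthetic source sentences with genuine targets and mix them in.
    synthetic_pairs = list(zip(synthetic_mr, mono_hi))
    return parallel_mr_hi + synthetic_pairs
```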
Proposed methods
Not included
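Although the method details are not summarized here, the contribution above mentions context-aware hierarchical attention networks (HANs). The sketch below illustrates the general HAN idea (in the spirit of Miculicich et al., 2018): attend over the tokens of each previous sentence, then over the resulting sentence summaries, and gate the document context into the current-sentence representation. This is a generic PyTorch illustration under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class HierarchicalContextAttention(nn.Module):
    """Generic HAN-style context layer: word-level then sentence-level attention."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.word_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sent_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query, context):
        # query:   (batch, tgt_len, d)           states of the current sentence
        # context: (batch, n_prev, src_len, d)   encoded previous sentences
        b, n_prev, src_len, d = context.shape
        tgt_len = query.size(1)

        # Word level: each current-sentence position attends over the tokens of
        # every previous sentence, yielding one summary per previous sentence.
        q = query.unsqueeze(1).expand(b, n_prev, tgt_len, d).reshape(b * n_prev, tgt_len, d)
        k = context.reshape(b * n_prev, src_len, d)
        summaries, _ = self.word_attn(q, k, k)                 # (b*n_prev, tgt_len, d)
        summaries = summaries.reshape(b, n_prev, tgt_len, d)

        # Sentence level: attend over the per-sentence summaries.
        s = summaries.permute(0, 2, 1, 3).reshape(b * tgt_len, n_prev, d)
        q2 = query.reshape(b * tgt_len, 1, d)
        doc_ctx, _ = self.sent_attn(q2, s, s)                  # (b*tgt_len, 1, d)
        doc_ctx = doc_ctx.reshape(b, tgt_len, d)

        # Gate the document context into the sentence-local representation.
        g = torch.sigmoid(self.gate(torch.cat([query, doc_ctx], dim=-1)))
        return g * query + (1 - g) * doc_ctx


# Toy usage with random tensors.
layer = HierarchicalContextAttention(d_model=64)
current = torch.randn(2, 7, 64)       # batch of 2, current sentence with 7 states
previous = torch.randn(2, 3, 11, 64)  # 3 previous sentences of 11 tokens each
print(layer(current, previous).shape)  # torch.Size([2, 7, 64])
```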
Experiments
Dataset: the WMT20 similar language translation task data for the Marathi-Hindi language pair.
Implementation
They used the following tokenizer, but did not cite its source in the paper: https://github.com/anoopkunchukuttan/indic_nlp_library
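For reference, a minimal tokenization example with that library (assuming it is installed as the indic-nlp-library package; the calls shown are, to the best of my knowledge, its documented API):

```python
from indicnlp.tokenize import indic_tokenize
from indicnlp.normalize.indic_normalize import IndicNormalizerFactory

text = "यह एक उदाहरण वाक्य है।"  # "This is an example sentence." (Hindi)

# Normalize script-level variation before tokenizing.
normalizer = IndicNormalizerFactory().get_normalizer("hi")
normalized = normalizer.normalize(text)

# Simple rule-based tokenization for Indic scripts.
tokens = indic_tokenize.trivial_tokenize(normalized, lang="hi")
print(tokens)
```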
Gaps of this work
Document-level data may be scarce for other low-resource languages. In addition, since back-translation is used, it is unclear how well it works here: the low-resource setting could itself degrade the quality of the back-translated data.