Paper
Noising and Denoising Natural Language: Diverse Back Translation for Grammar Correction
Introduction
This research proposes a solution to data sparsity (the shortage of parallel noisy and clean sentence pairs) for grammar correction in the NLP domain. The lack of enough noisy and clean pairs is a bottleneck in developing machine-translation-style correction models. By noising, the authors mean adding grammatical errors to clean sentences; by denoising, they mean refining the noisy sentences back into clean ones using a trained model.
Main Problem
There is a need for a large corpus of parallel noisy and clean sentence pairs in the field of grammar correction. This article suggests alleviating the problem by generating synthetic noisy data from clean data. To generate the data, the authors propose a method inspired by back-translation from machine translation: a reverse (clean-to-noisy) model is trained on the available pairs and then applied to a large clean corpus.
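To make the back-translation idea concrete, here is a minimal sketch, assuming a toy rule-based stand-in for the learned clean-to-noisy model (all class and function names here are hypothetical, not from the paper):

```python
import random
from collections import defaultdict

# Hypothetical stand-in for the paper's learned clean-to-noisy (reverse) model:
# it learns word-level substitutions from position-aligned seed pairs instead
# of training a neural sequence transducer.
class ReverseNoiser:
    def __init__(self, seed_pairs, sub_prob=0.3, drop_prob=0.05):
        self.subs = defaultdict(list)
        self.sub_prob, self.drop_prob = sub_prob, drop_prob
        for noisy, clean in seed_pairs:
            n, c = noisy.split(), clean.split()
            if len(n) == len(c):                      # crude positional alignment
                for nw, cw in zip(n, c):
                    if nw != cw:
                        self.subs[cw].append(nw)      # clean word -> observed error

    def noise(self, clean_sentence):
        out = []
        for w in clean_sentence.split():
            r = random.random()
            if self.subs[w] and r < self.sub_prob:
                out.append(random.choice(self.subs[w]))   # inject a seen error
            elif r < self.sub_prob + self.drop_prob:
                continue                                   # drop the word
            else:
                out.append(w)
        return " ".join(out)

# Small seed corpus of (noisy, clean) pairs; the synthesized (noisy, clean)
# pairs can then be used to train the actual corrector.
seed = [("I goes to school", "I go to school"),
        ("She have a cat", "She has a cat")]
noiser = ReverseNoiser(seed)
clean_corpus = ["I go to the market", "She has a dog"]
synthetic_pairs = [(noiser.noise(s), s) for s in clean_corpus]
print(synthetic_pairs)
```

The real method replaces this stand-in with a neural clean-to-noisy model, but the data flow (train on reversed pairs, then synthesize noisy counterparts for clean text) is the same.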
Illustrative Example
Clean version: "Day after day, I get up at 8 o'clock"
Synthesized noisy version: "I got up at 8 o'clock day after day."
Input
Noisy sentence (having grammatical mistakes)
Output
Clean and grammatically correct sentence
Motivation
The authors were motivated by the need to overcome the data sparsity issue in grammar correction. Grammar correction systems often require a large corpus of parallel noisy and clean sentence pairs, which are hard to come by. The motivation was to generate synthetic noisy sentences from clean ones, which would allow training neural models for grammar correction without the need for extensive manually curated data.
Related works and their gaps
The paper addresses gaps in previous methods for synthesizing noisy data, namely the lack of realistic, diverse error types (Brockett et al., 2006; Felice, 2016). Previous approaches often generated unrealistic noise or were limited to local context windows (Linzen et al., 2016; Sennrich et al., 2015). The authors aim to generate more realistic, diverse noisy sentences through neural sequence transduction and back-translation techniques.
Contribution of this paper
The paper’s main contributions include:
Proposing a neural sequence transduction model for generating synthetic noisy data for grammar correction.
Introducing several beam search noising procedures to produce diverse and realistic noisy sentences (see the sketch after this list).
Demonstrating that the synthesized data improves grammar correction performance, nearly matching the performance of models trained on large parallel corpora of real noisy data.
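To illustrate the general idea of noising beam search (the paper describes several specific procedures; this is only a minimal sketch of one simple variant, perturbing hypothesis scores with random noise before pruning, and all names below are ours):

```python
import math
import random

def next_token_probs(prefix):
    # Toy next-token distribution standing in for the decoder of a learned
    # clean-to-noisy model (hypothetical; a real model conditions on prefix).
    return {"the": 0.4, "a": 0.3, "cat": 0.2, "<eos>": 0.1}

def noisy_beam_search(beam_size=2, max_len=4, noise_scale=1.0):
    beams = [([], 0.0)]                      # each hypothesis: (tokens, log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, score))
                continue
            for tok, p in next_token_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Core noising idea: perturb each hypothesis score with random noise
        # before pruning, so the beam keeps diverse, less probable hypotheses
        # instead of always the most likely (and least noisy) ones.
        candidates.sort(key=lambda c: c[1] - noise_scale * random.random(),
                        reverse=True)
        beams = candidates[:beam_size]
    return beams

random.seed(0)
for tokens, score in noisy_beam_search():
    print(" ".join(tokens), round(score, 3))
```

With noise_scale set to 0 this reduces to ordinary beam search; larger values trade probability for diversity in the synthesized errors.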
Proposed methods
Not included
Experiments
The model is evaluated on the CoNLL 2013 and 2014 datasets for grammar correction and the JFLEG test set, which evaluates fluency in grammar correction.
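For context (a property of these benchmarks rather than something stated above): CoNLL-style grammar correction is conventionally scored with the M2 scorer's F0.5, which weights precision more heavily than recall. A minimal computation from edit counts:

```python
# F_beta from true-positive, false-positive, and false-negative edit counts;
# CoNLL-2014 uses beta = 0.5 (the numbers below are illustrative only).
def f_beta(tp, fp, fn, beta=0.5):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(round(f_beta(tp=40, fp=10, fn=60), 3))  # precision 0.8, recall 0.4 -> 0.667
```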
Implementation
Not mentioned
Gaps of this work
Given the limited training dataset, I believe the synthesized noisy data may not capture all real-world grammatical errors. Therefore, the model may not perform well across various domains.
@Sepideh-Ahmadian
I had an idea of fixing grammatical (or any other type of) errors in a sentence using back-translation in an unsupervised way. This is the same idea, right?
@hosseinfani, The purpose of this research project is to generate additional data using a technique from the machine translation domain, creating a corpus of correct and noisy sentence pairs.
I think we should do some digging in Grammar Correction literature.