Repository to collect and categorize Grammatical Error Correction papers.

gotutiyan/GEC-Info

GEC Information

Policy

  • This repository aims to collect and categorize GEC (Grammatical Error Correction) papers.
  • Unlike NLP-progress, GEC-Info does not consider performance on benchmarks.
    • Authors and conferences are also not considered.
  • The papers are limited to refereed papers in international conferences for now.
    • This is not the case for survey papers.

Contributing

  • Pull requests that add papers are welcome. Please commit only the lines needed to add the papers (and take care that auto-formatting does not change other lines).
  • You can also request to add papers as an issue.

This list can also be viewed on GitHub Pages.

Overview

Surveys

Title Year Page Note
"Automated Grammatical Error Correction: A Comprehensive Review" 2017 [paper]
"A Comprehensive Survey of Grammar Error Correction" 2020 [paper]
"Recent Trends in the Use of Deep Learning Models for Grammar Error Handling" 2020 [paper]
"Grammatical Error Correction: A Survey of the State of the Art" 2022 [paper]

Shared Tasks

Name Year Paper Note
HOO 2011 2011 [paper] [website]
HOO 2012 2012 [paper] [website]
CoNLL-2013 2013 [paper] [website]
CoNLL-2014 2014 [paper] [website] [system outputs]
BEA-2019 2019 [paper] [website] [system outputs]

Datasets

For Training (Real Data)

Name Year Paper Note
EFCamDat 2014 [Automatic Linguistic Annotation of Large Scale L2 Databases: The EF-Cambridge Open Language Database (EFCamDat)] [The EF Cambridge Open Language Database (efcamdat) Information for Users] [download v2]
GitHub Typo Corpus 2019 [GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors] [download]
W&I+LOCNESS on BEA2019 Shared Task 2019 [Developing an Automated Writing Placement System for ESL Learners] [direct download]
FCE 2011 [A New Dataset and Method for Automatically Grading ESOL Texts] [direct download]
NUCLE 2013 [Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English] [download]
ICNALE 2013 [The ICNALE and Sophisticated Contrastive Interlanguage Analysis of Asian Learners of English] [download]
Lang-8 2011 [Mining Revision Log of Language Learning SNS for Automated Japanese Error Correction of Second Language Learners] [website] [download: Fill this form]
Related tools are useful; see [Other Tools] for details.

For Training (Pseudo/Synthetic Data)

Name Year Paper Note
PIE-synthetic 2019 [Parallel Iterative Edit Models for Local Sequence Transduction] [download]

For Evaluation

Name Year Paper Note
KJ 2011 [Creating a manually error-tagged and shallow-parsed learner corpus] [download]
CoNLL-2013 2013 [The CoNLL-2013 Shared Task on Grammatical Error Correction] [direct download]
CoNLL-2014 2014 [The CoNLL-2014 Shared Task on Grammatical Error Correction] [direct download]
10 additional annotations for the CoNLL14 2015 [How Far are We from Fully Automatic High Quality Grammatical Error Correction?] [direct download]
8 additional annotations for the CoNLL14 2016 [Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality] [download]
JFLEG 2017 [JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction] [download]
GMEG-Data 2019 [Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses] [code]
CWEB 2020 [Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses] [download]
ErAConD 2021 [ErAConD : Error Annotated Conversational Dialog Dataset for Grammatical Error Correction] [data]
Training dataset is also included.
RobustGEC 2023 RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation [code]
CSW Lang-8 Dataset 2024 Grammatical Error Correction for Code-Switched Sentences by Learners of English [code/data]

Performance measures

Reference-based

Name Year Paper Note
M^2 Scorer 2012 [Better Evaluation for Grammatical Error Correction] [code]
It is often used to evaluate CoNLL-2013 and CoNLL-2014.
GLEU 2015 [Ground Truth for Grammatical Error Correction Metrics]
[GLEU Without Tuning]
[code]
It is often used to evaluate JFLEG.
I-measure 2015 [Towards a standard evaluation method for grammatical error detection and correction] [code]
Code is available only for Python 2.x.
ERRANT 2016 [Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments]
[Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction]
[code]
It is often used to evaluate BEA-2019.
GMEG-Metric 2019 [Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses] [code]
Ridge regression using existing metrics (e.g. ERRANT, GLEU) as features.
GoToScorer 2019 [Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation] [code]
It evaluates systems while taking error correction difficulty into account.
PT-M2 2022 Revisiting Grammatical Error Correction Evaluation and Beyond [code]
CLEME 2023 CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction [code]
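
Edit-based metrics in this table, such as the M^2 Scorer and ERRANT, report precision, recall, and an F-score over edits. F0.5, which weights precision twice as much as recall, is the value commonly reported on CoNLL-2014 and BEA-2019. The following is a minimal sketch of that final computation only. It assumes edits have already been extracted as `(start, end, correction)` tuples and that there is a single reference annotation. The function name `edit_f_beta` is hypothetical, and this is not the code of either tool, which also handle edit extraction and multiple annotators.

```python
def edit_f_beta(hyp_edits, ref_edits, beta=0.5):
    """Precision, recall, and F-beta over exact-match edits.

    Each edit is assumed to be a hashable (start, end, correction) tuple.
    """
    hyp, ref = set(hyp_edits), set(ref_edits)
    tp = len(hyp & ref)   # edits proposed by the system and present in the reference
    fp = len(hyp - ref)   # edits proposed by the system but not in the reference
    fn = len(ref - hyp)   # reference edits the system missed

    # Assumed convention: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0

    denom = beta ** 2 * precision + recall
    f_score = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Illustrative edits (made up for this example).
hyp = [(1, 2, "is"), (4, 5, "the")]
ref = [(1, 2, "is"), (6, 6, "to")]
p, r, f = edit_f_beta(hyp, ref)
```

Here one of the two system edits matches the reference and one reference edit is missed, so precision and recall are both 0.5.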

Reference-less

Keywords / Overview Year Paper Note
Scoring by counting the errors 2016 [There’s No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction] [code]
Fluency + grammaticality + meaning preservation 2017 [Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems]
USim 2018 [Reference-less Measure of Faithfulness for Grammatical Error Correction] [code]
SOME 2020 [SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction] [code]
Scribendi Score 2021 [Is this the end of the gold standard? A straightforward reference-less grammatical error correction metric] [Unofficial code]
IMPARA 2022 IMPARA: Impact-Based Metric for GEC Using Parallel Data [code]

Quality Estimation

Keywords / Overview Year Paper Note
2022 Proficiency Matters Quality Estimation in Grammatical Error Correction

Models / Architectures

Supervised

Keywords / Overview Year Paper Note
2006 Correcting ESL Errors Using Phrasal SMT Techniques
2009 Using First and Second Language Models to Correct Preposition Errors in Second Language Authoring
2010 Generating Confusion Sets for Context-Sensitive Error Correction
2011 Correcting Semantic Collocation Errors with L1-induced Paraphrases
2012 Tense and Aspect Error Correction for ESL Learners Using Global Context
2012 Exploring Grammatical Error Correction with Not-So-Crummy Machine Translation
2014 Grammatical error correction using hybrid systems and type filtering CoNLL2014: CAMB
2014 The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation CoNLL2014: AMU
2014 The Illinois-Columbia System in the CoNLL-2014 Shared Task CoNLL2014: CUUI
2014 RACAI GEC – A hybrid approach to Grammatical Error Correction CoNLL2014: RAC
2014 Grammatical Error Detection Using Tagger Disagreement CoNLL2014: UFC
2014 CoNLL 2014 Shared Task: Grammatical Error Correction with a Syntactic N-gram Language Model from a Big Corpora CoNLL2014: IPN
2014 Tuning a Grammar Correction System for Increased Precision CoNLL2014: IITB
2014 POSTECH Grammatical Error Correction System in the CoNLL-2014 Shared Task CoNLL2014: POST
2014 Grammatical Error Detection and Correction using a Single Maximum Entropy Model CoNLL2014: SJTU
2014 Factored Statistical Machine Translation for Grammatical Error Correction CoNLL2014: UMC
2014 NTHU at the CoNLL-2014 Shared Task CoNLL2014: NTHU
2014 A Unified Framework for Grammar Error Correction CoNLL2014: PKU
2016 Exploiting N-Best Hypotheses to Improve an SMT Approach to Grammatical Error Correction
2016 Adapting Grammatical Error Correction Based on the Native Language of Writers with Neural Network Joint Models
Phrase-based SMT 2016 [Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction] [code]
Neural reinforcement learning 2017 [Grammatical Error Correction with Neural Reinforcement Learning] [code]
Word-level SMT enhanced NNJMs + char-based SMT 2017 [Connecting the Dots: Towards Human-Level Grammatical Error Correction] [code]
First NMT-based approach 2016 [Grammatical error correction using neural machine translation]
2016 Neural Network Translation Models for Grammatical Error Correction
SMEG 2017 [Systematically Adapting Machine Translation for Grammatical Error Correction] [code]
A nested attention (word and char attention) 2017 [A Nested Attention Neural Hybrid Model for Grammatical Error Correction]
Re-ranking N-best sentence (by SMT) with LSTM-based GED 2017 [Neural Sequence-Labelling Models for Grammatical Error Correction]
CNN-based Encoder-Decoder approach 2018 [A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction] [code]
Fluency boosting learning 2018 [Fluency Boost Learning and Inference for Neural Grammatical Error Correction] [code]
ACL2018
Fluency boosting learning (added round-way error correction) 2018 [Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study] [code]
Microsoft Research Technical Report
Hybrid SMT and NMT 2018 [Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation]
Copy-Augmented Architecture 2019 [Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data] [code]
Consider a few previous sentences 2019 [Cross-Sentence Grammatical Error Correction] [code]
PIE 2019 [Parallel Iterative Edit Models for Local Sequence Transduction] [code]
LaserTagger 2019 [Encode, Tag, Realize: High-Precision Text Editing] [code]
Pretrain by DAE + sequential transfer learning 2019 [A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning] [code]
BEA-2019: Kakao&Brain
Use sentence-level error detection 2019 [The AIP-Tohoku System at the BEA-2019 Shared Task] BEA-2019: AIP-Tohoku
Four CNN + eight Transformer 2019 [The LAIX Systems in the BEA-2019 GEC Shared Task] BEA-2019: LAIX
Combine Transformer+CNN with FST + Re-ranking 2019 [Neural and FST-based approaches to grammatical error correction] BEA-2019: CAMB-CLED
Transformer seq2seq + BERT re-ranker 2019 [TMU Transformer System Using BERT for Re-ranking at BEA 2019 Grammatical Error Correction on Restricted Track] BEA-2019: TMU
Apply noisy channel with BERT and GPT-2 as LM 2019 [Noisy Channel for Low Resource Grammatical Error Correction] BEA-2019: Siteimprove
Use Finite State Transducers 2019 [Neural Grammatical Error Correction with Finite State Transducers]
GECToR 2020 [GECToR – Grammatical Error Correction: Tag, Not Rewrite] [code]
BERT-fuse 2020 [Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction] [code]
Adversarial approach (G:seq2seq D:sentence-pair classification) 2020 [Adversarial Grammatical Error Correction]
Erroneous span correction and detection 2020 [Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction]
Document-level approach 2020 [Document-level grammatical error correction] [code]
Seq2Edits 2020 [Seq2Edits: Sequence Transduction Using Span-level Edit Operations] [code]
Beam search considering copy probability 2020 [Generating Diverse Corrections with Local Beam Search for Grammatical Error Correction]
BART-based 2020 [Stronger Baselines for Grammatical Error Correction Using a Pretrained Encoder-Decoder Model] [code]
VERNet 2021 [Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction] [code]
Shallow Aggressive Decoding 2021 [Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding] [code]
T5-based 2021 [A Simple Recipe for Multilingual Grammatical Error Correction] [code]
GAN-like sequence labeling 2021 [Grammatical Error Correction as GAN-like Sequence Labeling]
Use multiclass GED for Transformer seq2seq and reranking 2021 [Multi-Class Grammatical Error Detection for Correction: A Tale of Two Systems]
GEC for writing improvement model adapted to the writer’s L1 2021 [Beyond Grammatical Error Correction: Improving L1-influenced research writing in English using pre-trained encoder-decoder models] [code]
Contrastive Learning approach 2021 [Grammatical Error Correction with Contrastive Learning in Low Error Density Domains] [code]
Sequence Span Rewriting 2021 [Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting]
Dependent Self-Attention (DSA) 2021 [Grammatical Error Correction with Dependency Distance]
2021 Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction [code]
A GEC model using only 11.6MB 2021 An efficient system for grammatical error correction on mobile devices
2022 Interpretability for Language Learners Using Example-Based Grammatical Error Correction [code]
2022 Type-Driven Multi-Turn Corrections for Grammatical Error Correction [code]
GECToR Large 2022 Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction [code] [Author's Master Thesis]
2022 Position Offset Label Prediction for Grammatical Error Correction
SynGEC 2022 SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser [code]
2022 Improved grammatical error correction by ranking elementary edits [code]
EdiT5 2022 EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start [code]
GEC-DePenD 2023 GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [code]
TemplateGEC 2023 TemplateGEC: Improving Grammatical Error Correction with Detection Template [code]
LET 2023 LET: Leveraging Error Type Information for Grammatical Error Correction
2023 Leveraging Denoised Abstract Meaning Representation for Grammatical Error Correction
Use speech information 2023 Improving Grammatical Error Correction with Multimodal Feature Integration [code]
2023 Improving Autoregressive Grammatical Error Correction with Non-autoregressive Models
2023 Reducing Sequence Length by Predicting Edit Spans with Large Language Models
2024 No Error Left Behind: Multilingual Grammatical Error Correction with Pre-trained Translation Models
EDU Copy Mechanism 2024 Improving Copy-oriented Text Generation via EDU Copy Mechanism
2024 Improving Grammatical Error Correction by Correction Acceptability Discrimination

Unsupervised

Keywords / Overview Year Paper Note
5-gram LM based approach 2018 [Language Model Based Grammatical Error Correction without Annotated Training Data] [code]
Train GRU models for each of five error types 2018 [A Simple but Effective Classification Model for Grammatical Error Correction]
Use Finite State Transducers 2019 [Neural Grammatical Error Correction with Finite State Transducers]
LSTM tagger for word choice task 2019 [Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems] [code]
Use LM (BERT, GPT-1,2) 2019 [The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction]
Create erroneous data from monolingual data 2019 [Minimally-Augmented Grammatical Error Correction] Supervised setting is also performed
LM-Critic 2021 [LM-Critic: Language Models for Unsupervised Grammatical Error Correction] [code]
Supervised setting is also performed
2023 Unsupervised Grammatical Error Correction Rivaling Supervised Methods [code]

Ensemble Methods

Keywords / Overview Year Paper Note
Use MENT 2014 System Combination for Grammatical Error Correction
2016 Grammatical Error Correction: Machine Translation and Classifiers
2019 [Learning to combine Grammatical Error Corrections] [code]
Diversity-Driven Combination (DDC) 2021 [Diversity-Driven Combination for Grammatical Error Correction] [code]
Select a system for each error type with IP 2021 [System Combination for Grammatical Error Correction Based on Integer Programming] [code]
2022 Frustratingly Easy System Combination for Grammatical Error Correction [code]
GRECO 2023 System Combination via Quality Estimation for Grammatical Error Correction [code]

Strategies

Keywords / Overview Year Paper Note
2012 A Beam-Search Decoder for Grammatical Error Correction
2016 Discriminative Reranking for Grammatical Error Correction with Statistical Machine Translation
2016 Candidate re-ranking for SMT-based grammatical error correction
Methods adapted from low-resource neural MT 2018 [Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task] [code]
Iterative decoding 2018 [Weakly Supervised Grammatical Error Correction using Iterative Decoding]
2019 Controlling Grammatical Error Correction Using Word Edit Rate
Add adversarial examples continually 2020 [Improving Grammatical Error Correction Models with Purpose-Built Adversarial Examples]
Cross-lingual Transfer Learning 2020 [Cross-lingual Transfer Learning for Grammatical Error Correction]
Data Weighted Training Strategies 2020 [Data Weighted Training Strategies for Grammatical Error Correction]
Align-and-Predict Decoding 2022 Adjusting the Precision-Recall Trade-Off with Align-and-Predict Decoding for Grammatical Error Correction [code]
2023 Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting [code]
2023 An Extended Sequence Tagging Vocabulary for Grammatical Error Correction [code]
BTR 2023 Bidirectional Transformer Reranker for Grammatical Error Correction [code]
2023 Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule
MainGEC 2023 Grammatical Error Correction via Mixed-Grained Weighted Training
2023 Improving Seq2Seq Grammatical Error Correction via Decoding Interventions [code]

Data Augmentation

Keywords / Overview Year Paper Note
Make artificial errors in a probabilistic manner 2014 [Generating artificial errors for grammatical error correction]
Back translation 2016 [Improving Neural Machine Translation Models with Monolingual Data]
SMT based MT + pattern extraction 2017 [Artificial Error Generation with Machine Translation and Syntactic Patterns]
Diverse back translation with noisy beam search 2018 [Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction]
DirectNoise 2019 [Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data] The method was first called "DirectNoise" by [kiyono+ 2019]?
Substituting words using confusion sets 2019 [Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data] [synthetic data]
BEA-2019: UEDIN-MS
Error+Context Dictionary 2019 [Improving Precision of Grammatical Error Correction with a Cheat Sheet] BEA-2019: Buffalo
Use Google Translate for making pseudo data 2019 [(Almost) Unsupervised Grammatical Error Correction using a Synthetic Comparable Corpus] BEA-2019: TMU in Low Resource
Inverted Spellchecker + Patterns+POS 2019 [A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction]
Methods for erroneous data generation 2019 [Erroneous data generation for Grammatical Error Correction] BEA-2019: Shuyao
Wikipedia revision & Wikipedia round-trip translation 2019 [Corpora Generation for Grammatical Error Correction]
Create confusion sets by edit distance, word embeddings, spell-breaking 2019 [Minimally-Augmented Grammatical Error Correction] Supervised setting is also performed
Explore methods to make pseudo data, seed corpus, training settings 2019 [An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction] [code]
2020 [Massive Exploration of Pseudo Data for Grammatical Error Correction]
Control error rates and error types by rule-based corruption and filtered back-translation 2020 [Controllable Data Synthesis Method for Grammatical Error Correction]
Use machine translation pairs 2020 [Improving Grammatical Error Correction with Machine Translation Pairs]
Edit latent representation 2020 [Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation]
Consider learner’s error tendency 2020 [Grammatical Error Correction Using Pseudo Learner Corpus Considering Learner’s Error Tendency]
Tagged corruption 2021 [Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models] [code]
Use 188 modules 2021 [Various Errors Improve Neural Grammatical Error Correction] [code]
Use real error patterns and linguistic knowledge 2021 [Data Augmentation of Incorporating Real Error Patterns and Linguistic Knowledge for Grammatical Error Correction]
Divide non-English sentence into chunks → translate to English for each of them → concatenate 2021 [Grammatical Error Generation Based on Translated Fragments]
2023 Grammatical Error Correction through Round-Trip Machine Translation
TransGEC 2023 TransGEC: Improving Grammatical Error Correction with Translationese [code]
Focus on gender bias 2023 Gender-Inclusive Grammatical Error Correction through Augmentation [code]
2023 Training for Grammatical Error Correction Without Human-Annotated L2 Learners’ Corpora
MixEdit 2023 MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction [code]

Data Cleaning

Keywords / Overview Year Paper Note
A Self-Refinement Strategy for Noise Reduction 2020 [A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction]
cLang8 (Cleaned Lang-8) 2021 [A Simple Recipe for Multilingual Grammatical Error Correction] [code]

Analyses

Keywords / Overview Year Paper Note
2011 Algorithm Selection and Model Adaptation for ESL Correction Tasks
2012 The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
Re-rank the CoNLL14 systems by human evaluation 2015 [Human Evaluation of Grammatical Error Correction Systems] [code]
2015 [How Far are We from Fully Automatic High Quality Grammatical Error Correction?]
Human annotation focused on fluency 2016 [Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality] [code]
2017 [GEC into the future: Where are we going and how do we get there?]
MAEGE 2018 [Automatic Metric Validation for Grammatical Error Correction] [code]
2018 [Inherent Biases in Reference-based Evaluation for Grammatical Error Correction] [code]
2018 [Assessing Grammatical Correctness in Language Learning]
Reassess M^2, I-measure, GLEU by comparing human evaluation 2018 [A Reassessment of Reference-Based Grammatical Error Correction Metrics] [code]
Quality estimation (and re-ranking using estimated score) 2018 [Neural Quality Estimation of Grammatical Error Correction] [code]
Evaluate four systems (SMT, CNN, LSTM, Transformer) for six corpora (CoNLL13&14, FCE, JFLEG, KJ, ICNALE) 2019 [Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough?]
Compare CNN, Transformer, PRPN, ON-LSTM as back-translation models 2019 [The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction]
GEC for post-processing 2019 Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study
CGOP 2020 [Comparison of the Evaluation Metrics for Neural Grammatical Error Correction With Overcorrection] Metric considering overcorrection
Create new gold data by post-editing system outputs 2021 [How Good (really) are Grammatical Error Correction Systems?]
Explore whether models have grammatical knowledge with Known-setting and Unknown-setting 2021 [Do Grammatical Error Correction Models Realize Grammatical Generalization?]
Compare CNN, LSTM, transformer or combinations of them as BT models 2021 [Comparison of Grammatical Error Correction Using Back-Translation Models]
2022 Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
2022 Grammatical Error Correction: Are We There Yet?
2022 Grammatical Error Correction Systems for Automated Assessment: Are They Susceptible to Universal Adversarial Attacks? [code]
TETRA 2022 Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond
2023 ChatBack: Investigating Methods of Providing Grammatical Error Feedback in a GUI-based Language Learning Chatbot
2023 Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods
2023 A Closer Look at k-Nearest Neighbors Grammatical Error Correction
2023 Grammatical Error Correction for Sentence-level Assessment in Language Learning
2023 Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
2024 Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency
2024 GPT-3.5 for Grammatical Error Correction Target languages: CZ, DE, EN, RU, SV, UA

Spoken Domain

Keywords / Overview Year Paper Note
2019 Automatic Grammatical Error Detection of Non-native Spoken Learner English
2020 Grammatical error detection in transcriptions of spoken English
Disfluency detection (DD) model 2020 Spoken Language ‘Grammatical Error Correction’
2022 On Assessing and Developing Spoken ’Grammatical Error Correction’ Systems

Applications

Name Year Paper Note
GECko+ [GECko+: a Grammatical and Discourse Error Correction Tool] [website] [code]
An English writing assistance tool. It corrects grammatical errors and re-orders sentences automatically.
MiSS 2021 [MiSS: An Assistant for Multi-Style Simultaneous Translation] [website] [demo video]
ALLECS 2023 ALLECS: A Lightweight Language Error Correction System [website] [code]
2023 Doolittle: Benchmarks and Corpora for Academic Writing Formalization [code]

Projects

Name Website
GramFormer [GitHub]

Other Tools

Name Code Note
Lang8-NAIST-extractor [code] Scripts for extracting error-correct pairs from the Lang-8 Corpus.
M2Converter [code] Scripts for converting an M2 file into a source file and a target file.
EFCamDat-Preprocess [code]
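
For readers unfamiliar with the M2 format mentioned above: each sentence block has an `S` line with the tokenized source, followed by `A` lines of the form `start end|||type|||correction|||...|||annotator_id`. The following is a minimal sketch of converting such a file into source/target pairs. It is not the M2Converter code itself; the function name `m2_to_parallel` is hypothetical, and it assumes one correction per edit (no `||`-separated alternatives) and non-overlapping edits.

```python
def m2_to_parallel(m2_text, annotator_id=0):
    """Return (source, target) sentence pairs from M2-formatted text."""
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        src_tokens = lines[0].split()[1:]  # drop the leading "S"
        edits = []
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            etype, correction, annotator = fields[1], fields[2], int(fields[-1])
            # Skip other annotators and "noop" edits (span -1 -1).
            if annotator != annotator_id or etype == "noop" or start < 0:
                continue
            edits.append((start, end, correction))
        tgt_tokens = list(src_tokens)
        # Apply edits right-to-left so earlier token offsets stay valid.
        for start, end, correction in sorted(edits, reverse=True):
            tgt_tokens[start:end] = correction.split()
        pairs.append((" ".join(src_tokens), " ".join(tgt_tokens)))
    return pairs


# Illustrative input (made up for this example).
example_m2 = (
    "S This are a sentence .\n"
    "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
    "\n"
    "S I went school .\n"
    "A 2 2|||M:PREP|||to|||REQUIRED|||-NONE-|||0\n"
    "\n"
    "S I like it very much much .\n"
    "A 5 6|||U:ADV||||||REQUIRED|||-NONE-|||0\n"
)
pairs = m2_to_parallel(example_m2)
```

The three blocks show a replacement, an insertion (empty span `2 2`), and a deletion (empty correction field).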

Other materials

Name Paper Note
NLP-progress [website]
The performance ranking on some datasets.
A Crash Course in Automatic Grammatical Error Correction [paper] [materials]
A tutorial on GEC at COLING 2020.
Chunngai/gec-papers [github]
The papers were compiled around 2019-2020?

Related Tasks

Grammatical Error Detection

Keywords / Overview Year Paper Note
2003 Automatic Error Detection in the Japanese Learners’ English Spoken Data
2006 Detecting errors in English article usage by non-native speakers
2008 The Ups and Downs of Preposition Error Detection in ESL Writing
2010 Evaluating performance of grammatical error detection to maximize learning effect
A weighted measure according to crowdsourcing results (for GED) 2011 [They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems]
2014 Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics
2016 Compositional Sequence Labeling Models for Error Detection in Learner Writing
2017 Grammatical Error Detection Using Error- and Grammaticality-Specific Word Embeddings [code]
2018 [Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection] [code]
Bi-LSTM with contextual word embeddings 2019 [Context is Key: Grammatical Error Detection with Contextual Word Representations]
Multi-head and multi-layer attention 2019 [Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection]
2021 [Exploring the Capacity of a Large-scale Masked Language Model to Recognize Grammatical Errors]
2022 Probing for targeted syntactic knowledge through grammatical error detection [code]

Feedback Comment Generation

Keywords / Overview Year Paper Note
2014 [Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages]
English grammar checker with feedback in Japanese 2018 [Grammatical Error Checker for Japanese Learners of English] This is not research on feedback comment generation as such, but it is classified here for now
2019 [Toward a Task of Feedback Comment Generation for Writing Learning]
2020 [Creating Corpora for Research in Feedback Comment Generation]
2021 [Shared Task on Feedback Comment Generation for Language Learners]
2023 Template-guided Grammatical Error Feedback Comment Generation

Explainable Grammatical Error Correction

  • Studies to explain the reasons for and intentions of error correction.
Keywords / Overview Year Paper Note
EXPECT 2023 Enhancing Grammatical Error Correction Systems with Explanations [code]
XGEC dataset 2024 Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction [data]

Other Languages

Arabic

Keywords / Overview Year Paper Note
Arabic Learner Corpus 2013 [Arabic Learner Corpus v1: A New Resource for Arabic Language Research] [website]
QALB 2014 [Large Scale Arabic Error Annotation: Guidelines and Framework] [QALB Project Website]
QALB 2014 Shared Task 2014 [The First QALB Shared Task on Automatic Text Correction for Arabic] [website]
QALB 2015 Shared Task 2015 [The Second QALB Shared Task on Automatic Text Correction for Arabic]
ARETA 2021 [Automatic Error Type Annotation for Arabic] [code]
2023 Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation [code]
2023 Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction [code]

Bangla

Keywords / Overview Year Paper Note
2021 [Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation]

Chinese

Keywords / Overview Year Paper Note
2013 Chinese Spelling Checker Based on Statistical Machine Translation
2014 Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners
2015 Improving Chinese Grammatical Error Correction with Corpus Augmentation and Hierarchical Phrase-based Statistical Machine Translation
NLPCC-2018 Shared Task 2018 [Overview of the NLPCC 2018 Shared Task: Grammatical Error Correction] [data]
Two-stage: Spell checker → seq2seq 2019 [A Two-Stage Model for Chinese Grammatical Error Correction]
CNN-based seq2seq 2019 [Chinese Grammatical Error Correction Based on Convolutional Sequence to Sequence Model]
MaskGEC 2020 [MaskGEC: Improving Neural Grammatical Error Correction via Dynamic Masking]
2020 [Chinese Grammatical Error Detection Based on BERT Model]
2020 [BERT Enhanced Neural Machine Translation and Sequence Tagging Model for Chinese Grammatical Error Diagnosis]
2020 [Heterogeneous Recycle Generation for Chinese Grammatical Error Correction]
NLPTEA-2020 Shared Task 2020 [Overview of NLPTEA-2020 Shared Task for Chinese Grammatical Error Diagnosis]
Tail-to-Tail Non-Autoregressive Sequence Prediction 2021 [Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction]
2021 "Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction
2022 Pre-Training-Based Grammatical Error Correction Model for the Written Language of Chinese Hearing Impaired Students
2022 MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction [code]
2022 Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation [code]
2022 String Editing Based Chinese Grammatical Error Diagnosis
CLG 2022 Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction [code]
2022 From Spelling to Grammar: A New Framework for Chinese Grammatical Error Correction
FCGEC 2022 FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction [code]
2023 Are Pre-trained Language Models Useful for Model Ensemble in Chinese Grammatical Error Correction? [code]
2023 Focal Training and Tagger Decouple for Grammatical Error Correction
NaSGEC 2023 NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts [code]
TLM 2023 TLM: Token-Level Masking for Transformers [code]
2024 LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction [code]

Czech

Keywords / Overview Year Paper Note
AKCES-GEC dataset 2019 [Grammatical Error Correction in Low-Resource Scenarios] [data]
Grammar Error Correction Corpus for Czech (GECCC) 2022 Czech Grammar Error Correction with a Large and Diverse Corpus [data]

Finnish

Keywords / Overview Year Paper Note
2024 Correcting Challenging Finnish Learner Texts With Claude, GPT-3.5 and GPT-4 Large Language Models

Greek

Keywords / Overview Year Paper Note
Greek Learner Corpus 2018 [Stand-off annotation in learner corpora: compiling the Greek Learner Corpus (GLC)]
ELERRANT 2021 [ELERRANT: Automatic Grammatical Error Type Classification for Greek] [code]

German

Keywords / Overview Year Paper Note
Falko-MERLIN dataset 2018 [Using Wikipedia Edits in Low Resource Grammatical Error Correction] [data]

Hindi

Keywords / Overview Year Paper Note
2014 [Detection and correction of non word spelling errors in Hindi language]
HiWikiEd dataset 2020 [Generating Inflectional Errors for Grammatical Error Correction in Hindi] [data]

Icelandic

Keywords / Overview Year Paper Note
Byte-level approach 2023 Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora [code]

Japanese

Keywords / Overview Year Paper Note
Character-level RNN-based seq2seq 2018 [Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation]
Constructing retrieval system for Japanese GEC 2019 [Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language]
TMU Evaluation Corpus for Japanese Learners 2020 [Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language] [data: Fill this form]
Non-Autoregressive approach 2020 [Non-Autoregressive Grammatical Error Correction Toward a Writing Support System]
2022 Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction

Korean

Keywords / Overview Year Paper Note
KAGAS 2023 Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation [code] [data request form]

Lithuanian

Keywords / Overview Year Paper Note
2022 Towards Lithuanian grammatical error correction [code]

Romanian

Keywords / Overview Year Paper Note
2020 [Neural Grammatical Error Correction for Romanian] [code]

Russian

Keywords / Overview Year Paper Note
RULEC-GEC dataset 2019 [Grammar Error Correction in Morphologically Rich Languages: The Case of Russian] [data]
RU-Lang8 dataset 2021 [New Dataset and Strong Baselines for the Grammatical Error Correction of Russian] [data]
Additional annotations for RULEC and RU-Lang8 2024 Multi-Reference Benchmarks for Russian Grammatical Error Correction [RULEC] [RU-Lang8]
2024 Universal Dependencies for Learner Russian [code]

Spanish

Keywords / Overview Year Paper Note
COWS-L2H 2020 [Developing NLP Tools with a New Corpus of Learner Spanish] [data]

Swedish

Keywords / Overview Year Paper Note
2024 Evaluation of Really Good Grammatical Error Correction [code]

Turkish

Keywords / Overview Year Paper Note
ERRANT-TR 2023 Towards Automatic Grammatical Error Type Classification for Turkish [code]

Ukrainian

Keywords / Overview Year Paper Note
UA-GEC 2023 [UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language] [data]
UNLP 2023 Shared Task 2023 The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian
2023 Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction UNLP-2023: Pravopysnyk
2023 A Low-Resource Approach to the Grammatical Error Correction of Ukrainian UNLP-2023: QC-NLP
2023 RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans UNLP-2023: WebSpellChecker
