ErrorAnalysis_NLGEvaluation

Toward Human-Like Evaluation for Natural Language Generation with Error Analysis (ACL2023)

ErrorAnalysis_NLGEvaluation is an NLG evaluation metric computed by analyzing the explicit and implicit errors in the hypothesis. The metric is interpretable and human-like, since we display how it refines the hypothesis into a better sentence.

We release the metric, BARTScore4NLG, including the scripts for error analysis and the scoring + prompting process, to support replication of this study.

Usage in Python:

from score import BARTScore4NLG_Scorer

scorer = BARTScore4NLG_Scorer(task='MT_WMT20', setting='zh-en', signature='bs:4|model:para')

src = ['Mike goes to the bookstore.', 'The cat is on the mat.']
tgt = ['Jerry goes to bookstore happily.', 'The mat sat on the mat.']
scorer.score(src, tgt)

See test.ipynb for more information.
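For scoring a single source-hypothesis pair, the scorer can be wrapped as in the minimal sketch below. It builds only on the scorer.score call shown above; the score_pair helper name and the assumption that score returns one value per (source, hypothesis) pair are illustrative, not part of the released API, so please check test.ipynb for the actual output format.

```python
from score import BARTScore4NLG_Scorer

# Hypothetical convenience wrapper around the released scorer.
# Assumes scorer.score(src, tgt) accepts parallel lists and returns
# one score per (source, hypothesis) pair; confirm against test.ipynb.
def score_pair(scorer, source: str, hypothesis: str) -> float:
    scores = scorer.score([source], [hypothesis])
    return scores[0]

if __name__ == "__main__":
    scorer = BARTScore4NLG_Scorer(task='MT_WMT20', setting='zh-en',
                                  signature='bs:4|model:para')
    print(score_pair(scorer, 'The cat is on the mat.', 'The mat sat on the mat.'))
```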

If you find this work helpful, please consider citing as follows:

@inproceedings{lu-etal-2023-toward,
    title = "Toward Human-Like Evaluation for Natural Language Generation with Error Analysis",
    author = "Lu, Qingyu  and
      Ding, Liang  and
      Xie, Liping  and
      Zhang, Kanjian  and
      Wong, Derek F.  and
      Tao, Dacheng",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.324",
    pages = "5892--5907",
}
