
# Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries

This repository contains the code, data, and templates for the crowdsourcing protocols described in the paper *Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries*.

## Scripts

- `calculate.ipynb`: computes the score distribution, Krippendorff's alpha reliability, and SHR reliability.
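The notebook itself is not reproduced here; as an illustration of the Krippendorff reliability computation it performs, a minimal sketch of Krippendorff's alpha for nominal ratings (a hypothetical standalone helper, not the repository's code) might look like:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal-level ratings.

    units: list of lists, where each inner list holds the ratings one
    item received (missing ratings are simply omitted). Units with
    fewer than two ratings are ignored, per the standard definition.
    """
    units = [u for u in units if len(u) >= 2]
    if not units:
        return 1.0  # no pairable ratings, so no observed disagreement

    # Build the coincidence matrix o[(c, k)]: each ordered pair of
    # ratings within a unit contributes 1 / (m - 1).
    o = Counter()
    for u in units:
        m = len(u)
        for i, j in permutations(range(m), 2):
            o[(u[i], u[j])] += 1.0 / (m - 1)

    # Category marginals and total number of pairable ratings.
    n_c = Counter()
    for (c, _k), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())

    # Observed vs. expected disagreement (nominal delta: 0 if c == k, else 1).
    d_o = sum(v for (c, k), v in o.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    if d_e == 0:
        return 1.0  # only one category observed: trivially perfect agreement
    return 1.0 - d_o / d_e
```

For example, two raters agreeing on every item yields an alpha of 1.0, while agreement on only half the items (with balanced marginals) can drop alpha to 0.0, i.e. chance-level agreement.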

## Data

We release our evaluation templates and annotations to promote future work on factual consistency evaluation. The repository provides annotations for the CNN/DM data, annotations for the XSUM data, and the evaluation templates.

## Model

The code for BART, ProphetNet, PEGASUS, and BERTSUM is based on Fairseq(-py). Our pretrained models are available for both the CNN/DM and XSUM data.

## Citation

If you use our code in your research, please cite our work:

@inproceedings{tang2022investigating,
   title={Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries},
   author={Tang, Xiangru and Fabbri, Alexander R and Mao, Ziming and Adams, Griffin and Wang, Borui and Li, Haoran and Mehdad, Yashar and Radev, Dragomir},
   booktitle={Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
   year={2022}
}