🚀 Feature Request

To facilitate the evaluation of different interpretability techniques, I propose identifying a set of commonly used datasets from the literature, creating 🤗 Datasets loading scripts so that they share a common format, and hosting them under the Inseq organization on the Hugging Face Hub.

This would provide a shared interface for:

- Faithfulness metrics applied at the dataset level.
- Future support for instance attribution methods.

The following table summarizes some of the datasets used in the literature:
The ExNLP Datasets website summarizes various sources available for NLP explainability; we should verify which of them are relevant to generation.
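To illustrate what a shared interface for dataset-level faithfulness metrics could look like, here is a minimal plain-Python sketch. It uses ERASER-style comprehensiveness (prediction change when rationale tokens are removed) as one example metric; all function names, the column names (`tokens`, `rationale`), and the toy predictor are hypothetical stand-ins, not an existing Inseq or 🤗 Datasets API.

```python
from typing import Callable, Dict, List, Sequence

def comprehensiveness(
    predict: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    rationale: Sequence[int],
) -> float:
    """Score one instance: drop the rationale tokens and measure how
    much the model prediction changes. Higher means the explanation
    covers tokens the model actually relied on."""
    drop = set(rationale)
    kept = [tok for i, tok in enumerate(tokens) if i not in drop]
    return predict(tokens) - predict(kept)

def dataset_comprehensiveness(
    predict: Callable[[Sequence[str]], float],
    examples: List[Dict],
) -> float:
    """Average the per-instance scores over a dataset in the shared
    format (one dict per example with `tokens` and `rationale` columns)."""
    scores = [
        comprehensiveness(pred, ex["tokens"], ex["rationale"])
        for pred, ex in ((predict, ex) for ex in examples)
    ]
    return sum(scores) / len(scores)

# Toy predictor for illustration: score = fraction of "good" tokens.
toy_predict = lambda toks: sum(t == "good" for t in toks) / max(len(toks), 1)

data = [
    {"tokens": ["good", "movie"], "rationale": [0]},
    {"tokens": ["bad", "movie"], "rationale": [0]},
]
print(dataset_comprehensiveness(toy_predict, data))  # → 0.25
```

Because every hosted dataset would expose the same columns, a metric like this could be written once and applied uniformly across all of them.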