[Summary] Add popular NLG faithfulness evaluation datasets #139

Open
gsarti opened this issue Jun 22, 2022 · 0 comments
Labels
enhancement (New feature or request), long shot (Future direction for the development of the library)

Comments

@gsarti
Member

gsarti commented Jun 22, 2022

🚀 Feature Request

To facilitate the evaluation of different interpretability techniques, I propose to identify a set of commonly used datasets from the literature, write 🤗 Datasets loading scripts that expose them in a shared format, and host them under the Inseq organization on the Hugging Face Hub.

This would provide a shared interface for:

  • Faithfulness metrics applied at the dataset level.
  • Future support for instance attribution methods.
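As a rough illustration of the first point, here is a minimal Python sketch of what a shared record format plus a dataset-level metric could look like. Everything in it is an assumption for illustration: `AttributionExample`, `top_k_precision`, `dataset_score` and `toy_attribute` are hypothetical names (not part of Inseq or 🤗 Datasets), and top-k precision of attributed source positions against gold highlights is just a placeholder for whichever metric is eventually adopted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class AttributionExample:
    """Hypothetical shared record format for rationale-annotated generation data."""
    source: List[str]          # tokenized input (source sentence or preceding context)
    target: List[str]          # tokenized output whose generation is being explained
    gold_rationale: List[int]  # indices into `source` that annotators marked as relevant


def top_k_precision(scores: Sequence[float], gold: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scoring source positions found in the gold rationale."""
    if k <= 0:
        raise ValueError("k must be positive")
    k = min(k, len(scores))
    if k == 0:
        return 0.0  # no source positions to rank
    # Rank positions by descending attribution score (ties keep their original order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gold_set = set(gold)
    return sum(1 for i in ranked if i in gold_set) / k


def dataset_score(
    examples: Sequence[AttributionExample],
    attribute: Callable[[AttributionExample], Sequence[float]],
    k: int = 1,
) -> float:
    """Average the per-example metric over the whole dataset."""
    return mean(top_k_precision(attribute(ex), ex.gold_rationale, k) for ex in examples)


# Toy usage: the attribution "method" below is a hypothetical stand-in that
# simply scores each source token by its length.
def toy_attribute(ex: AttributionExample) -> List[float]:
    return [float(len(tok)) for tok in ex.source]


examples = [
    AttributionExample(["the", "kitten", "sat"], ["le", "chaton"], gold_rationale=[1]),
    AttributionExample(["a", "red", "automobile"], ["une", "voiture"], gold_rationale=[1]),
]
score = dataset_score(examples, toy_attribute, k=1)
print(score)  # 0.5
```

Keeping the metric a plain function of attribution scores and gold indices means any attribution method can be plugged in through `attribute` without touching the data format.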

The following table summarizes some of the datasets used in the literature:

| Name | Task | Data source | Paper | Description |
|---|---|---|---|---|
| SCAT | Translation | neulab/contextual-mt | Yin et al. '21 | Contextual coreference in translation, with disambiguating context highlights from translators |
| Lambada + Rationales | Language Modeling | keyonvafa/sequential-rationales | Vafa et al. '21 | Next word prediction with human-annotated previous relevant context |
| Europarl Gold Alignments | Translation | TBD | TBD | Gold alignments for various language pairs in the Europarl corpus |

The ExNLP Datasets website summarizes various data sources for NLP explainability; we should verify which of them are relevant to generation.

gsarti added the enhancement label on Jun 22, 2022
gsarti added this to the Demo Paper Release milestone on Jun 22, 2022
gsarti removed this from the Demo Paper Release milestone on Oct 26, 2022
gsarti added the long shot label on Nov 18, 2022