Expose samples via the CLI #228

Merged
merged 9 commits into from
Aug 1, 2024


clefourrier
Member

@clefourrier clefourrier commented Jul 17, 2024

This will not include few-shot samples, as building them requires a model and its tokenizer so that the correct truncation can be applied.

Fixes #164 and addresses a request by @lewtun.

@clefourrier
Member Author

Example output for the command: lighteval tasks --inspect "lighteval|wmt14:fr-en|0|0" --num_samples 1 --show_config

}
lighteval/sacrebleu_manual wmt14_fr-en
Careful, the task lighteval|wmt14:fr-en is using evaluation data to build the few shot examples.
---------- lighteval|wmt14:fr-en ----------
---------- CONFIG
|           Key            |               Value                |
|--------------------------|------------------------------------|
|name                      |wmt14:fr-en                         |
|prompt_function           |wmt_reverse_alphabetical            |
|hf_repo                   |lighteval/sacrebleu_manual          |
|hf_subset                 |wmt14_fr-en                         |
|metric 0: metric_name     |bleu                                |
|metric 0: higher_is_better|True                                |
|metric 0: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 0: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 0: sample_level_fn |GenerativePreparator.prepare        |
|metric 0: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|metric 1: metric_name     |chrf                                |
|metric 1: higher_is_better|True                                |
|metric 1: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 1: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 1: sample_level_fn |GenerativePreparator.prepare        |
|metric 1: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|metric 2: metric_name     |ter                                 |
|metric 2: higher_is_better|False                               |
|metric 2: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 2: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 2: sample_level_fn |GenerativePreparator.prepare        |
|metric 2: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|hf_avail_splits           |('test',)                           |
|evaluation_splits         |('test',)                           |
|few_shots_split           |None                                |
|few_shots_select          |None                                |
|generation_size           |None                                |
|stop_sequence             |('\n',)                             |
|output_regex              |None                                |
|num_samples               |None                                |
|frozen                    |False                               |
|suite                     |('lighteval', 'sacrebleu')          |
|original_num_docs         |-1                                  |
|effective_num_docs        |-1                                  |
|trust_dataset             |True                                |
|must_remove_duplicate_docs|None                                |
|version                   |0                                   |

---------- SAMPLES
{"query": "French phrase: L'affaire NSA souligne l'absence totale de d\u00e9bat sur le renseignement\nEnglish phrase:", "choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"], "gold_index": 0, "original_query": "", "specific": null, "task_name": "lighteval|wmt14:fr-en", "instruction": null, "target_for_fewshot_sorting": null, "ctx": "", "num_asked_few_shots": -1, "num_effective_few_shots": -1}
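
For anyone who wants to post-process this output: each entry under SAMPLES is a single JSON object, so it can be read back with the standard library. The snippet below is only a sketch under that assumption. `SAMPLE_LINE` is the line printed above, and `gold_answer` is a hypothetical helper (not part of lighteval) that assumes `gold_index` is a single integer, as it is in this sample.

```python
import json

# The sample line printed above, kept verbatim as raw JSON text.
SAMPLE_LINE = (
    r'''{"query": "French phrase: L'affaire NSA souligne l'absence totale de '''
    r'''d\u00e9bat sur le renseignement\nEnglish phrase:", '''
    r'''"choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"], '''
    r'''"gold_index": 0, "original_query": "", "specific": null, '''
    r'''"task_name": "lighteval|wmt14:fr-en", "instruction": null, '''
    r'''"target_for_fewshot_sorting": null, "ctx": "", '''
    r'''"num_asked_few_shots": -1, "num_effective_few_shots": -1}'''
)


def gold_answer(sample: dict) -> str:
    """Hypothetical helper: return the reference answer of a sample.

    Assumes `gold_index` is a single integer index into `choices`,
    which holds for the sample shown in this thread.
    """
    return sample["choices"][sample["gold_index"]]


sample = json.loads(SAMPLE_LINE)  # also decodes \u00e9 and \n escapes
print(sample["query"])
print("gold:", gold_answer(sample))
```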

@NathanHB
Member

Great feature! Small nit, however: I would print the samples in a formatted way using pprint so that they are easier to read.
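
A rough sketch of what that could look like with the standard-library `pprint` module. This is not the code that was merged: `format_sample` is a hypothetical helper name, and the dictionary is a shortened copy of the sample printed earlier in the thread.

```python
from pprint import pformat

# Shortened copy of the sample shown above (fields taken from that output).
sample = {
    "query": "French phrase: L'affaire NSA souligne l'absence totale de "
             "d\u00e9bat sur le renseignement\nEnglish phrase:",
    "choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"],
    "gold_index": 0,
    "specific": None,
    "task_name": "lighteval|wmt14:fr-en",
    "num_asked_few_shots": -1,
    "num_effective_few_shots": -1,
}


def format_sample(sample: dict, width: int = 100) -> str:
    """Hypothetical helper: render one sample with one key per line.

    sort_dicts=False keeps the original key order instead of sorting
    alphabetically (available since Python 3.8).
    """
    return pformat(sample, width=width, sort_dicts=False)


print(format_sample(sample))
```

Because the dictionary does not fit on one line at this width, `pformat` puts each key on its own line, which is the readability gain over a single-line JSON dump.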

@clefourrier
Member Author

Good idea!

@clefourrier clefourrier requested a review from NathanHB July 25, 2024 17:14
@clefourrier clefourrier merged commit cbae17d into main Aug 1, 2024
2 checks passed
Development

Successfully merging this pull request may close these issues.

Expose a few model predictions / gold answers in the logs
2 participants