option to track avg logit per token type #6

IanMagnusson · 2023-10-25T23:31:41Z

This PR is ported from the old LLM repo at allenai/OLMo#334.

This PR uses the new features in Catwalk's perplexity evaluations from allenai/catwalk#155, which report average logits for tokens. It makes WriteOutputsAsRowsMultipleMetrics and PredictAndCalculateMetricsStep capable of surfacing these "token_count_avg_logits_by_domain" results in the new "extra_output" field.

To use this, the task must be configured with the following task kwargs:

task_kwargs: {
    keep_all_instance_fields_except: ["text", "tokens"],
    detailed_output: true
}

keep_all_instance_fields_except is required because this feature depends on a lot of instance-level information. The detailed_output flag indicates that the aggregated results of this instance-level data should be surfaced.
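For readers unfamiliar with the metric, here is a minimal sketch of the kind of aggregation that a name like "token_count_avg_logits_by_domain" suggests: per domain, a count and an average logit for each token type, computed from instance-level data. This is not Catwalk's implementation. The function name, the instance keys ("domain", "token_ids", "token_logits"), and the output shape are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, TypedDict


class Instance(TypedDict):
    # Hypothetical instance-level fields; the real Catwalk field names may differ.
    domain: str
    token_ids: List[int]
    token_logits: List[float]


def aggregate_avg_logits_by_domain(
    instances: Iterable[Instance],
) -> Dict[str, Dict[int, Tuple[int, float]]]:
    """Return {domain: {token_id: (count, average_logit)}}."""
    sums: Dict[str, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))

    for instance in instances:
        domain = instance["domain"]
        # Accumulate a running sum and count for every token occurrence.
        for token_id, logit in zip(instance["token_ids"], instance["token_logits"]):
            sums[domain][token_id] += logit
            counts[domain][token_id] += 1

    # Convert the running sums into (count, mean) pairs per token type.
    return {
        domain: {
            token_id: (count, sums[domain][token_id] / count)
            for token_id, count in token_counts.items()
        }
        for domain, token_counts in counts.items()
    }
```

The sketch also shows why the instance fields have to be kept: the per-token aggregation needs the domain and per-token values of every instance, not just a single summary number per task.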

option to track avg logit per token type

241863d

IanMagnusson marked this pull request as ready for review October 26, 2023 00:49

IanMagnusson requested a review from AkshitaB October 26, 2023 00:49

IanMagnusson added 3 commits October 26, 2023 15:07

still print other results to gsheet

a7054ee

Merge remote-tracking branch 'origin/main' into token-ppls

2420fdd

satisfy checks

0cb71bd

IanMagnusson marked this pull request as draft October 27, 2023 20:41

Merge remote-tracking branch 'origin/main' into token-ppls

54cff72

IanMagnusson marked this pull request as ready for review October 27, 2023 22:51

AkshitaB approved these changes Oct 27, 2023

IanMagnusson merged commit e324a88 into main Oct 30, 2023
9 checks passed

IanMagnusson deleted the token-ppls branch October 30, 2023 18:44
