Implementing lessons from OLMES #2002

Open
lintangsutawika opened this issue Jun 20, 2024 · 0 comments

@lintangsutawika
Contributor

The OLMES paper is a pretty interesting read and is complementary to LM-Eval. I think there are a few features we can consider implementing based on lessons and recommendations from the paper.

  1. Implement a mode switch between the multiple-choice formulation and the completion/cloze formulation. Empirically, models respond to the two versions differently depending on how many tokens they have been trained on, with the former showing a stronger signal in later stages (400B tokens and above). The recommendation is to evaluate on both and take the higher score. LM-Eval would benefit from being able to write one prompt format and have it used automatically in both formulations.
  2. Normalization should be a configuration feature. We use both non-normalized accuracy and normalized accuracy, the latter computed by dividing the log-probability by the length in characters. It would be great to be able to choose the normalization and to add more options (e.g., normalization based on token length or on pointwise mutual information).
  3. Add further support for few-shot selection. We have some support for hardcoding few-shot samples, but we have never really offered an easy way to select few-shot examples under conditions such as "make sure the answer choices are not all A", or to set a list of indices that point directly to the samples.
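As a rough illustration of item 1, here is a minimal Python sketch in which a single question/choices record is rendered in either formulation. The `Doc` dataclass, the `render` and `best_of_both` functions, and the prompt layout (`Question:` / lettered choices / `Answer:`) are all hypothetical; they are not LM-Eval's actual task schema or templates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

LETTERS = "ABCDEFGHIJ"


# Hypothetical minimal record; not LM-Eval's real document format.
@dataclass
class Doc:
    question: str
    choices: List[str]
    gold: int  # index of the correct choice


def render(doc: Doc, formulation: str) -> Tuple[str, List[str], int]:
    """Return (context, continuations, gold_index) for one formulation.

    "mc":    choices are listed with letters in the prompt; the model scores the letters.
    "cloze": only the question is in the prompt; the model scores each answer text.
    """
    if formulation == "mc":
        lines = [f"Question: {doc.question}"]
        lines += [f" {LETTERS[i]}. {c}" for i, c in enumerate(doc.choices)]
        lines.append("Answer:")
        context = "\n".join(lines)
        continuations = [f" {LETTERS[i]}" for i in range(len(doc.choices))]
    elif formulation == "cloze":
        context = f"Question: {doc.question}\nAnswer:"
        continuations = [f" {c}" for c in doc.choices]
    else:
        raise ValueError(f"unknown formulation: {formulation!r}")
    return context, continuations, doc.gold


def best_of_both(acc_by_formulation: Dict[str, float]) -> float:
    # Report the higher of the per-formulation scores, as recommended above.
    return max(acc_by_formulation.values())
```

Both formulations are derived from the same record, so the prompt is written once; only the scored continuations differ (answer letters versus full answer strings).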
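For item 2, a sketch of choosing the normalization by name from a config value. The `ChoiceScore` fields, the `NORMALIZERS` registry, and `predict` are hypothetical names. The per-choice log-probabilities are assumed to be computed by the model already; for PMI we assume an extra unconditional log-probability of each answer under a neutral, question-free prompt, so the score is log P(answer | prompt) minus log P(answer | neutral prompt).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ChoiceScore:
    logprob: float      # sum of log P(answer tokens | prompt)
    text: str           # answer continuation as a string
    num_tokens: int     # answer length in tokens
    uncond_logprob: Optional[float] = None  # log P(answer | neutral prompt), PMI only


def _pmi(s: ChoiceScore) -> float:
    if s.uncond_logprob is None:
        raise ValueError("PMI normalization needs uncond_logprob")
    return s.logprob - s.uncond_logprob


# Hypothetical registry: a task config would only need to name one of these keys.
NORMALIZERS: Dict[str, Callable[[ChoiceScore], float]] = {
    "none": lambda s: s.logprob,                  # raw log-probability
    "char": lambda s: s.logprob / len(s.text),    # per-character
    "token": lambda s: s.logprob / s.num_tokens,  # per-token
    "pmi": _pmi,                                  # pointwise mutual information
}


def predict(choices: List[ChoiceScore], normalization: str = "none") -> int:
    """Return the index of the highest-scoring choice under the chosen normalization."""
    if normalization not in NORMALIZERS:
        raise ValueError(f"unknown normalization: {normalization!r}")
    score = NORMALIZERS[normalization]
    scores = [score(c) for c in choices]
    return max(range(len(scores)), key=lambda i: scores[i])
```

With `[ChoiceScore(-2.0, " no", 1, -1.0), ChoiceScore(-3.0, " absolutely", 3, -4.0)]`, `"none"` picks index 0 while `"char"`, `"token"`, and `"pmi"` each pick index 1, showing how the normalization alone can change the prediction.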
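For item 3, a sketch of few-shot selection that either takes an explicit list of indices or resamples until a user-supplied condition holds. `select_fewshot`, `not_all_same_answer`, and the `answer` field are hypothetical names; seeded rejection sampling is just one simple way to enforce a condition like "the answers are not all A".

```python
import random
from typing import Callable, List, Optional, Sequence


def not_all_same_answer(shots: List[dict]) -> bool:
    """Example constraint: reject sets whose answers are all identical (e.g. all "A")."""
    return len({d["answer"] for d in shots}) > 1


def select_fewshot(
    pool: Sequence[dict],
    k: int,
    indices: Optional[List[int]] = None,
    constraint: Optional[Callable[[List[dict]], bool]] = None,
    seed: int = 1234,
    max_tries: int = 1000,
) -> List[dict]:
    """Pick few-shot examples from pool.

    If indices is given, return exactly those samples in that order (k is ignored).
    Otherwise draw k samples with a seeded RNG, redrawing until the optional
    constraint accepts the set (simple rejection sampling).
    """
    if indices is not None:
        return [pool[i] for i in indices]
    rng = random.Random(seed)
    for _ in range(max_tries):
        shots = rng.sample(list(pool), k)
        if constraint is None or constraint(shots):
            return shots
    raise RuntimeError(f"no few-shot set satisfied the constraint in {max_tries} tries")
```

Fixing the seed keeps the selected examples reproducible across runs, and raising after `max_tries` avoids looping forever on a constraint the pool cannot satisfy.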

cc: @haileyschoelkopf @StellaAthena
