Add MMLU prompt variants #484

Merged
merged 10 commits into from
Mar 5, 2024

OyvindTafjord
Contributor

Together with @yulinggu-cs, I added versions of the MMLU datasets with multiple prompt variants, which makes the datasets effectively 7x larger. These are named mmlu_stem_var etc. (the original task name plus a _var suffix).

The prompts are called: [None, "inst", "inst+1", "inst+2", "inst+3", "inst+4", "inst+5"]

where "inst+2" means an instruction line followed by two few-shot examples. We also added logging of the first few examples for each downstream task (we can remove this if it's annoying, but it's a useful sanity check). Here's an example for the "inst+2" prompt:

[2024-03-04 19:06:09] INFO     [olmo.eval.downstream:224, rank=0] Sample doc from (hails/mmlu_no_train, high_school_government_and_politics, inst+2):
doc_text: The following are multiple choice questions (with answers) about high school government and politics:

Question: Uncertainty over the limits to presidential power is caused primarily by the fact that
Answer: the constitutional definition of those powers is broad and unspecific

Question: The term "budget deficit" refers to the
Answer: amount the government spends in excess of its revenues

Question: Which of the following accurately describes congressional committees? I. The committee chairpersons always belong to the majority party. II. Seats on each committee are divided between the two major parties in exact proportion to the parties' representation in Congress. III. They recommend whether Congress should pass various pieces of legislation, and those recommendations are always approved by the full congressional body. IV. When a committee vote results in a tie, the vice president casts the tie-breaking vote.
Answer:
continuations: [' I only', ' II only', ' I and III only', ' II and III only']
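
As a rough illustration of how the variant names map onto the prompt text shown in the log above, here is a minimal sketch. The function names (`parse_variant`, `build_doc_text`) are hypothetical and not taken from the OLMo code, and the layout for the `None` variant (a bare "Question: ... Answer:" block) is an assumption.

```python
from typing import Optional, Sequence, Tuple

# Variant names as listed in the PR description.
PROMPT_VARIANTS = [None, "inst", "inst+1", "inst+2", "inst+3", "inst+4", "inst+5"]


def parse_variant(variant: Optional[str]) -> Tuple[bool, int]:
    """Return (use_instruction, num_shots) for a variant name (hypothetical helper)."""
    if variant is None:
        return False, 0
    if variant == "inst":
        return True, 0
    prefix, _, shots = variant.partition("+")
    if prefix != "inst" or not shots.isdigit():
        raise ValueError(f"unknown prompt variant: {variant!r}")
    return True, int(shots)


def build_doc_text(
    subject: str,
    question: str,
    shots: Sequence[Tuple[str, str]],
    variant: Optional[str],
) -> str:
    """Assemble the context string for one question (hypothetical helper)."""
    use_instruction, num_shots = parse_variant(variant)
    if num_shots > len(shots):
        raise ValueError("not enough few-shot examples for this variant")
    parts = []
    if use_instruction:
        parts.append(
            f"The following are multiple choice questions (with answers) about {subject}:"
        )
    # Each few-shot example is a solved question/answer pair.
    for shot_question, shot_answer in shots[:num_shots]:
        parts.append(f"Question: {shot_question}\nAnswer: {shot_answer}")
    # The final question is left open; continuations are scored after "Answer:".
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)
```

Building one context per entry in `PROMPT_VARIANTS` for every question is what gives the 7x increase in dataset size.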

Things to note:

  • There was a bug when an instance goes beyond the max context length, because the computation in update uses batch["ctx_len"], which might be larger than 2048. I tried to fix this in f64b9ce, but it's a bit iffy (the context length is processed separately in prep_examples and in collate_fn, which could be confusing).
  • While logging the instances, we noticed a tiny issue with the SciQ formatting, fixed here.
  • As mentioned, we log the first 5 instances in each dataset. If this happens in some sort of parallel processing, the method of doing self.log_instances -= 1 might not be very robust? If this is too noisy in the logging, we can comment these lines out.
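
To make the context-length issue in the first bullet concrete, here is a minimal sketch of one way to keep the recorded context length consistent with the truncated token sequence. This is not the change made in f64b9ce; the function name `truncate_example` and the choice to drop tokens from the left of the context are assumptions for illustration.

```python
from typing import List, Tuple


def truncate_example(
    ctx_tokens: List[int],
    cont_tokens: List[int],
    max_len: int = 2048,
) -> Tuple[List[int], int, int]:
    """Fit context + continuation into max_len (hypothetical helper).

    Returns (tokens, ctx_len, cont_len), where ctx_len is the length of the
    context that was actually kept, so it can never exceed max_len.
    """
    if len(cont_tokens) > max_len:
        raise ValueError("continuation alone exceeds the maximum length")
    budget = max_len - len(cont_tokens)
    # Keep the rightmost `budget` context tokens; guard budget == 0 because
    # ctx_tokens[-0:] would return the whole list.
    kept_ctx = ctx_tokens[-budget:] if budget > 0 else []
    return kept_ctx + cont_tokens, len(kept_ctx), len(cont_tokens)
```

Computing the lengths once, in the same place as the truncation, avoids the situation described above where two separate steps disagree about how long the context is.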

@OyvindTafjord
Contributor Author

Huh, interesting, a test is failing with an error I also ran into in oe-eval-internal yesterday, regarding downloading datasets from the HF hub. This will probably affect any OLMo run regardless of this PR. After some mad testing of Python dependencies, I eventually found that pinning fsspec==2023.5.0 resolved the issue. I assume this will be fixed quickly, but I didn't see any chatter about it when I did a quick search yesterday.

Member

@epwalsh epwalsh left a comment

As mentioned, we log the first 5 instances in each dataset. If this happens in some sort of parallel processing, the method of doing self.log_instances -= 1 might not be very robust? If this is too noisy in the logging, we can comment these lines out.

We should be okay here since we generally run the eval data loaders with num_workers=0.
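
For reference, the counter pattern under discussion looks roughly like the sketch below. The class `LoggedTask` is hypothetical and only illustrates the `self.log_instances -= 1` idea; it is not the OLMo task class.

```python
import logging
from typing import List, Sequence

log = logging.getLogger("sample_logger")


class LoggedTask:
    """Toy dataset that logs its first few documents (hypothetical class)."""

    def __init__(self, docs: Sequence[str], log_instances: int = 5) -> None:
        self.docs: List[str] = list(docs)
        self.log_instances = log_instances  # how many more docs to log

    def __len__(self) -> int:
        return len(self.docs)

    def __getitem__(self, idx: int) -> str:
        doc = self.docs[idx]
        if self.log_instances > 0:
            # Plain attribute decrement: fine in a single process.
            self.log_instances -= 1
            log.info("Sample doc: %s", doc)
        return doc
```

With num_workers=0 the dataset is read in the main process, so the counter behaves as intended. With worker processes, each worker holds its own copy of the dataset object, so each copy would count down independently and more than 5 instances could be logged overall.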

@epwalsh epwalsh mentioned this pull request Mar 5, 2024
@epwalsh epwalsh merged commit 493c0b8 into main Mar 5, 2024
10 of 11 checks passed
@epwalsh epwalsh deleted the mmlu-variants branch March 5, 2024 20:13