
Change the eos condition for GSM8K #85

Merged: 10 commits into main on Mar 6, 2024
Conversation

clefourrier (Member) commented Mar 4, 2024

Will likely require updating the test suite; pending until we figure out the datasets bug in the CI.
Linked to #82

clefourrier (Member, Author) commented:

@NathanHB wdyt of having 6 tasks called `leaderboard|task|...`?
That way we could differentiate the modifications we make for more general setups from the pinned leaderboard versions.
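
For concreteness, here is a hedged sketch of what the two task specifications could look like, following the `suite|task|num_fewshot|truncate_fewshot` pattern that comes up later in this thread. The task name and few-shot count below are illustrative assumptions, not the exact values from this PR:

```python
# Hypothetical task specifications -- the task name and few-shot count are
# illustrative assumptions following the suite|task|num_fewshot|truncate_fewshot pattern.
PINNED_LEADERBOARD_TASK = "leaderboard|gsm8k|5|0"  # pinned: reproduces Open LLM Leaderboard scores
UPDATED_LIGHTEVAL_TASK = "lighteval|gsm8k|5|0"     # updated: follows lighteval's own EOS logic
```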

NathanHB (Member) commented Mar 4, 2024

> @NathanHB wdyt of having 6 tasks called `leaderboard|task|...`? That way we could differentiate the modifications we make for more general setups from the pinned leaderboard versions.

Oh, good idea, though I'm not sure anyone will use the other versions if we have the leaderboard versions. We want to be able to compare results with as many models as possible.

clefourrier (Member, Author) commented:

Well, at the moment, the leaderboard versions use a pinned, very old version of the harness, which led to the problems mentioned by @lewtun (for EOS tokens, for example).
I think we should both address these problems and provide a cool version of our evals, but also allow people to reproduce leaderboard scores, wdyt?

NathanHB (Member) commented Mar 4, 2024

I agree!

NathanHB linked an issue Mar 4, 2024 that may be closed by this pull request
clefourrier requested a review from NathanHB March 4, 2024 15:20
lewtun (Member) commented Mar 4, 2024

> Well, at the moment, the leaderboard versions use a pinned, very old version of the harness, which led to the problems mentioned by @lewtun (for EOS tokens, for example). I think we should both address these problems and provide a cool version of our evals, but also allow people to reproduce leaderboard scores, wdyt?

Just so I understand: in the new format there is `leaderboard|tasks|num_fewshot|0`, but will `lighteval` still be a valid suite for e.g. `gsm8k`?

In other words, the `leaderboard` suite keeps the same logic as the old pinned version of the harness, but `lighteval` will have various improvements, etc.?

clefourrier (Member, Author) commented:

@lewtun You understood perfectly! `leaderboard|task` should allow you to reproduce the current scores of the Open LLM Leaderboard. `lighteval|task` will follow our own logic for the task, in terms of EOS tokens and generation length.
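
To make the EOS discussion concrete, here is a minimal sketch of what a stop condition for GSM8K generations could look like. The stop strings and EOS token below are illustrative assumptions, not lighteval's actual configuration:

```python
# Minimal sketch of an EOS/stop-sequence check for GSM8K generations.
# The stop strings and EOS token are illustrative assumptions, not lighteval's real config.
GSM8K_STOP_SEQUENCES = ["Question:", "\n\n"]  # e.g. stop when the model starts a new problem

def should_stop(generated_text: str, eos_token: str = "</s>") -> bool:
    """Return True once the generation contains the EOS token or any stop sequence."""
    if eos_token in generated_text:
        return True
    return any(stop in generated_text for stop in GSM8K_STOP_SEQUENCES)
```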

clefourrier merged commit 9b3813f into main on Mar 6, 2024; 2 checks passed.
Development

Successfully merging this pull request may close these issues.

Anomalously small values gemma-2b-it on GSM8k