Add mt-bench #75

NathanHB · 2024-02-28T12:10:41Z

What this PR does:

Uses custom metrics and tasks to add llm a as judge
adds multi turn generation
Adds mt-bench metric

This implementation uses mt-bench prompts from InflectionAI. The code is inspired from the original implementation of mt-bench with notable differences.

mt-bench uses a custom-made chat templating system, we use the tokenizer
mt-bench uses an old version of the openai API, we use the newest one, with very simplified logic for chat prompt formating. We can easily add more models to act as judge.
We do not use varying temperature based on the sample we are evaluating. All samples are generated using do_sample=False and temperature set to 0.0.

src/lighteval/models/base_model.py

clefourrier · 2024-03-08T13:20:01Z

@NathanHB feel free to ping me once it's merged with main so we can integrate it :)

clefourrier · 2024-03-20T12:40:32Z

Careful with the deletion of task_examples, quite sure some of these files are needed by the nanotron team.

clefourrier · 2024-03-20T12:42:19Z

src/lighteval/models/base_model.py

            override_bs=override_bs,
        )

+    def greedy_until_multi_turn(self, requests: list[GreedyUntilMultiTurnRequest], override_bs: Optional[int] = None) -> GenerateMultiTurnReturn:


You'll also need to update the other model launchers, or at least add a function in the abstract model which indicates that it needs to be implemented

clefourrier

Quite cool overall!
Some code reorgs are need imo (management of multi turn context with all the other contexts, finding a better way to pass judgements for logging, implementing greedy_until_multi_turn in the other models, or at least the LightevalModel class), plus some doc.

Nice that the code is so neat, it looks like it will be easy to switch to "using a judge as metric" in the future!

extended_tasks/mt_bench/judge_prompts.jsonl

extended_tasks/mt_bench/judges.py

extended_tasks/mt_bench/main.py

src/lighteval/tasks/lighteval_task.py

extended_tasks/mt_bench/judges.py

extended_tasks/mt_bench/main.py

src/lighteval/logging/info_loggers.py

src/lighteval/models/base_model.py

clefourrier

LGTM, minor questions.
Overall, very good job in adding this mechanism! Looking forward having it as an actual metric!

src/lighteval/evaluator.py

src/lighteval/models/base_model.py

src/lighteval/tasks/extended/mt_bench/main.py

src/lighteval/tasks/lighteval_task.py

What this PR does: - Uses custom metrics and tasks to add llm a as judge - adds multi turn generation - Adds mt-bench metric This implementation uses mt-bench prompts from [InflectionAI](https://github.com/InflectionAI/Inflection-Benchmarks). The code is inspired from the original implementation of mt-bench with notable differences. - mt-bench uses a custom-made chat templating system, we use the tokenizer - mt-bench uses an old version of the openai API, we use the newest one, with very simplified logic for chat prompt formating. We can easily add more models to act as judge. - We do not use varying temperature based on the sample we are evaluating. All samples are generated using `do_sample=False` and temperature set to `0.0`.

clefourrier and others added 6 commits February 20, 2024 14:12

init ifeval, now need to add loading custom metric system

89e7fda

Merge branch 'main' into clem_customizable_metrics

96aa81b

custom metrics working! need to update the readme

2fdceb8

update doc

0e30b21

fix eos token + eval script

1ba178f

init

6233af7

NathanHB changed the title ~~Add mt bench~~ Add mt-bench Feb 28, 2024

clefourrier reviewed Feb 28, 2024

View reviewed changes

src/lighteval/models/base_model.py Outdated Show resolved Hide resolved

Nathan Habib added 4 commits February 28, 2024 14:32

remove ifeval

5cc9c2c

revert README

b9045e1

revert README

ff79480

better context management

a234bf6

NathanHB linked an issue Mar 4, 2024 that may be closed by this pull request

Add MT-Bench #88

Closed

NathanHB added 3 commits March 6, 2024 16:24

working state

1357c10

fix

bb5cca2

:Merge branch 'nathan_fix_push_details' into nathan-add-mt-bench

6b74a68

NathanHB and others added 7 commits March 9, 2024 10:27

continue

f548902

continue

2e2b15d

commit

339f1f6

:Merge remote-tracking branch 'origin/main' into nathan-add-mt-bench

aba90b3

Update README.md

5bc5b98

commit

cd1300d

commit

1fd755e

clefourrier reviewed Mar 20, 2024

View reviewed changes

NathanHB added 4 commits March 20, 2024 14:14

commit

4b00eb7

commit

4903755

commit

9ff0707

commit

ff177a1

NathanHB added 5 commits March 21, 2024 11:25

commit

e5b6ea8

update readme

0dcdb1e

commti

703741b

commit

6e8026f

format

588fb2f

NathanHB requested review from clefourrier and lewtun March 22, 2024 11:44

format

8cb4894

clefourrier reviewed Mar 22, 2024

View reviewed changes

commit

c08a8f6

clefourrier mentioned this pull request Mar 26, 2024

Add EQ Bench #114

Open

NathanHB added 8 commits March 27, 2024 14:46

fixes for review

64ceee5

make style

46d7dd8

fix

e2f7fa8

revert generate_response in base model

3260147

Merge remote-tracking branch 'origin/main' into nathan-add-mt-bench

323188a

merge

33eb252

fix tests

b2e5895

fix format

c42e65d

clefourrier reviewed Mar 28, 2024

View reviewed changes

src/lighteval/logging/info_loggers.py Outdated Show resolved Hide resolved

clefourrier reviewed Mar 28, 2024

View reviewed changes

src/lighteval/models/base_model.py Show resolved Hide resolved

NathanHB added 2 commits March 29, 2024 12:44

commit

aa6c6f8

make style

bb4b133

NathanHB requested a review from clefourrier March 29, 2024 12:48

clefourrier approved these changes Mar 29, 2024

View reviewed changes

NathanHB and others added 3 commits March 29, 2024 14:23

fix from review

2d3a04c

fix

0819ac7

Merge branch 'main' into nathan-add-mt-bench

b2bf514

NathanHB merged commit af24080 into main Mar 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add mt-bench #75

Add mt-bench #75

Uh oh!

NathanHB commented Feb 28, 2024 •

edited

Loading

Uh oh!

Uh oh!

clefourrier commented Mar 8, 2024

Uh oh!

clefourrier commented Mar 20, 2024

Uh oh!

clefourrier Mar 20, 2024

Uh oh!

clefourrier left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

clefourrier left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add mt-bench #75

Add mt-bench #75

Uh oh!

Conversation

NathanHB commented Feb 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

clefourrier commented Mar 8, 2024

Uh oh!

clefourrier commented Mar 20, 2024

Uh oh!

clefourrier Mar 20, 2024

Choose a reason for hiding this comment

Uh oh!

clefourrier left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

clefourrier left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

NathanHB commented Feb 28, 2024 •

edited

Loading

clefourrier left a comment •

edited

Loading