LLM Evaluation Tutorials with Evalverse #76

Open
jihoo-kim opened this issue May 27, 2024 · 2 comments
@jihoo-kim

Suggestion for LLM Evaluation Tutorials with Evalverse


@mlabonne
Owner

Hey, thanks for the suggestion; this is quite exciting. I've been looking for something like this for a while.

I tried it yesterday and ran into some issues:

  • Results couldn't be written to disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench results
  • I couldn't use EQ-Bench with anything other than the default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template

In general, I would really appreciate it if we could have an example with Llama 3.

@jihoo-kim
Author

Thanks for accepting my suggestion and trying it out, @mlabonne.

Issue 1

Results couldn't be written to disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench results

Could you tell me which script you ran? If you specify the output_path argument, the results will be saved to disk. The default value of output_path is the directory where evalverse is located.

Please try again with your own output_path.

CLI

python3 evaluator.py \
    --ckpt_path {your_model} \
    --mt_bench \
    --num_gpus_total 8 \
    --parallel_api 4 \
    --output_path {your_path}

Library

import evalverse as ev

evaluator = ev.Evaluator()
evaluator.run(
    model={your_model},
    benchmark="mt_bench",
    num_gpus_total=8,
    parallel_api=4,
    output_path={your_path}
)
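
If you want to double-check that a run actually wrote something, you can simply list whatever ends up under your output_path once the run finishes. This is only a sketch; the exact file names and directory layout depend on the benchmark, so treat them as assumptions rather than the real output structure.

# Sketch: confirm that the benchmark run left result files under output_path.
# The directory layout and file names are assumptions; they vary by benchmark.
from pathlib import Path

output_path = Path("{your_path}")  # the same path passed via --output_path / output_path
for result_file in sorted(output_path.rglob("*")):
    if result_file.is_file():
        print(result_file.relative_to(output_path))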

Issue 2

I couldn't use EQ-Bench with anything other than the default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template

I will fix it as soon as possible and let you know once it's done. Thank you!
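
In the meantime, here is a minimal sketch of the behavior you're describing, assuming the prompt formatting would come from the model's own tokenizer (via transformers' apply_chat_template) rather than a hard-coded ChatML template. The model name and this approach are illustrative assumptions, not the current evalverse API.

# Sketch only: Llama 3 ships its chat template with its tokenizer, so formatting
# prompts through apply_chat_template yields Llama 3's <|start_header_id|> markup
# instead of ChatML's <|im_start|> markers. This illustrates the idea; it is not evalverse code.
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "What is 2 + 2?"}]

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)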
