LLM Evaluation Tutorials with Evalverse #76

Open
jihoo-kim opened this issue May 27, 2024 · 2 comments
@jihoo-kim

Suggestion for LLM Evaluation Tutorials with Evalverse


@mlabonne
Owner

Hey, thanks for the suggestion; this is quite exciting. I've been looking for something like this for a while.

I tried it yesterday and ran into some issues:

  • Results couldn't be written to disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench results
  • I couldn't use EQ-Bench with anything other than the default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template

In general, I would really appreciate it if we could have an example with Llama 3.

@jihoo-kim
Author

Thanks for accepting my suggestion and trying it out, @mlabonne.

Issue 1

Results couldn't be written to disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench results

Could you tell me which script you ran? If you specify the output_path argument, the results will be saved to disk. The default value of output_path is the directory where evalverse is located.

Please try again with your own output_path.

CLI

python3 evaluator.py \
    --ckpt_path {your_model} \
    --mt_bench \
    --num_gpus_total 8 \
    --parallel_api 4 \
    --output_path {your_path}

Library

import evalverse as ev

evaluator = ev.Evaluator()
evaluator.run(
    model={your_model},
    benchmark="mt_bench",
    num_gpus_total=8,
    parallel_api=4,
    output_path={your_path}
)
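
If you want to double-check that a run actually wrote something, you can simply list whatever ends up under your output_path once the run finishes. This is only a sketch; the exact file names and directory layout depend on the benchmark, so treat them as assumptions rather than the real output structure.

# Sketch: confirm that the benchmark run left result files under output_path.
# The directory layout and file names are assumptions; they vary by benchmark.
from pathlib import Path

output_path = Path("{your_path}")  # the same path passed via --output_path / output_path
for result_file in sorted(output_path.rglob("*")):
    if result_file.is_file():
        print(result_file.relative_to(output_path))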

Issue 2

I couldn't use EQ-Bench with anything other than the default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template

I will fix it as soon as possible and let you know once it's done. Thank you!
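
In the meantime, here is a minimal sketch of the behavior you're describing, assuming the prompt formatting would come from the model's own tokenizer (via transformers' apply_chat_template) rather than a hard-coded ChatML template. The model name and this approach are illustrative assumptions, not the current evalverse API.

# Sketch only: Llama 3 ships its chat template with its tokenizer, so formatting
# prompts through apply_chat_template yields Llama 3's <|start_header_id|> markup
# instead of ChatML's <|im_start|> markers. This illustrates the idea; it is not evalverse code.
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "What is 2 + 2?"}]

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)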
