
A bug in lmms-eval/api/task.py and a suggestion #445

Open
real-lurEn opened this issue Dec 5, 2024 · 1 comment

Comments

@real-lurEn

1. There is a bug in lmms-eval/api/task.py, lines 736 and 737:

if self.config.fewshot_config is not None:
    self.sampler = samplers.get_sampler(
        self.config.fewshot_config.get("sampler", "default") if self.config.fewshot_config else "default"
    )(list(self.fewshot_docs()), self, rnd=random.Random(1234))

When evaluating some benchmarks, such as GSM8K, fewshot_config is None, so the branch is skipped and no default sampler is ever set. The fix may be to remove the 'if'; I hope I'm right. See the sketch after this list.

2. Could you add more text benchmarks, such as 'cmmlu' or 'HumanEval'?
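
A minimal sketch of the suggested fix, which just drops the guard so the "default" sampler is used whenever fewshot_config is missing. It assumes the surrounding code in task.py (samplers, random, self.config, self.fewshot_docs) is unchanged; the sampler_name variable is only introduced here for readability:

# always build a sampler; fall back to "default" when no fewshot_config is given
sampler_name = (
    self.config.fewshot_config.get("sampler", "default")
    if self.config.fewshot_config
    else "default"
)
self.sampler = samplers.get_sampler(sampler_name)(
    list(self.fewshot_docs()), self, rnd=random.Random(1234)
)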

@kcz358
Collaborator

kcz358 commented Dec 25, 2024

  1. There are a lot of bugs in few-shot evaluation, as the multi-modality setting makes it much more complicated for us to support.
  2. For text benchmarks, we recommend using lm-evaluation-harness.
