
A bug in lmms-eval/api/task.py and a suggestion #445

Open
real-lurEn opened this issue Dec 5, 2024 · 1 comment

Comments

@real-lurEn

1. There is a bug in lmms-eval/api/task.py, lines 736 and 737:

if self.config.fewshot_config is not None:
    self.sampler = samplers.get_sampler(
        self.config.fewshot_config.get("sampler", "default") if self.config.fewshot_config else "default"
    )(list(self.fewshot_docs()), self, rnd=random.Random(1234))

When evaluating some benchmarks, such as GSM8K, fewshot_config is None, so the branch is skipped and no default sampler is ever set. The fix may be to remove the 'if'; I hope I'm right. See the sketch after this list.

2. Could you add more text benchmarks, such as 'cmmlu' or 'HumanEval'?
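
A minimal sketch of the suggested fix, which just drops the guard so the "default" sampler is used whenever fewshot_config is missing. It assumes the surrounding code in task.py (samplers, random, self.config, self.fewshot_docs) is unchanged; the sampler_name variable is only introduced here for readability:

# always build a sampler; fall back to "default" when no fewshot_config is given
sampler_name = (
    self.config.fewshot_config.get("sampler", "default")
    if self.config.fewshot_config
    else "default"
)
self.sampler = samplers.get_sampler(sampler_name)(
    list(self.fewshot_docs()), self, rnd=random.Random(1234)
)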

@kcz358
Collaborator

kcz358 commented Dec 25, 2024

  1. There are a lot of bugs in few-shot evaluation, as the multi-modality setting makes it much more complicated for us to support.
  2. For text benchmarks, we recommend using lm-evaluation-harness.
