Add VL-RewardBench dataset #484
Also, does lmms-eval support majority voting evaluation?
Hi, thank you for your contribution. I think most of the changes are good to go. It would be better if you could provide a screenshot of a successful evaluation result in this PR.
I think the commit history includes some duplicate commits, which should be removed to keep the change history clean. I can do that directly on your branch, but it requires a force push, so if you are okay with that I can help you clean it up.
For the filtering option, you can try adding
filter_list:
  - name: "xxx"
    filter:
      - function: "majority_vote"
to your yaml file and see if it works. We copied this filter directly from lm-eval-harness, so we are not sure whether it works here. For custom filters, you can check examples such as realworldqa.
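For reference, here is a minimal sketch of what a majority-vote step does conceptually: given several sampled answers to the same question, keep the most frequent one. This is only an illustration, not the lmms-eval or lm-eval-harness filter implementation; the function name `majority_vote` and the plain-list input are assumptions made for the example.

```python
from collections import Counter


def majority_vote(responses):
    """Return the most frequent response among repeated samples.

    Illustrative helper only (hypothetical, not the library's filter).
    Ties are resolved in favor of the answer that appeared first,
    which is how Counter.most_common orders equal counts.
    """
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    return counts.most_common(1)[0][0]


# Five sampled answers for one question; "A" appears three times.
samples = ["A", "B", "A", "A", "B"]
print(majority_vote(samples))
```

In a real task the responses would usually need to be normalized first (for example, by extracting the chosen option from free-form text) so that equivalent answers are counted together.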
Here are the saved evaluation results for GPT-4o mini and GPT-4o, which are consistent with our previous results, with small variance.
For the duplicate commits: please go ahead and clean them up directly on the branch. Thanks a lot! For majority voting: our results indicate that the extra inference computation makes little difference, so I think we can leave it for future integration.
17a2e19 to 1896045
Thanks, I will merge this PR now.
Hi lmms-eval team,
Thanks for your great project, which accelerates our LMM workflow a lot.
This PR incorporates our recently released VL-RewardBench.
Example evaluation script: