Hello, how many shots were used to evaluate each of the benchmark datasets on the official Qwen2.5 leaderboard? #978
Unanswered
13416157913
asked this question in
Q&A
Replies: 3 comments 1 reply
-
For instruction-tuned models, we choose the best setting for each dataset.
We do not report BBH score for instruction models. For base models, we use 3-shot.
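For readers unfamiliar with the term, "3-shot" means three solved demonstrations are placed in the prompt before the unsolved test question. The sketch below only illustrates that idea; the function name `build_few_shot_prompt`, the `Q:`/`A:` template, and the toy arithmetic demonstrations are assumptions made for illustration, not the prompt format or harness actually used for the Qwen2.5 results.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demos: List[Tuple[str, str]], question: str, k: int = 3
) -> str:
    """Build a k-shot prompt: k solved demonstrations, then the open question.

    The "Q:/A:" layout is a hypothetical template chosen for this example.
    """
    if k < 0 or k > len(demos):
        raise ValueError(f"k must be between 0 and {len(demos)}, got {k}")
    # Each demonstration shows the model a question together with its answer.
    blocks = [f"Q: {q}\nA: {a}" for q, a in demos[:k]]
    # The test question is left unanswered so the model completes it.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Toy demonstrations (made up for this example, not taken from any benchmark).
demos = [
    ("2 + 2 = ?", "4"),
    ("3 * 3 = ?", "9"),
    ("10 - 7 = ?", "3"),
    ("8 / 2 = ?", "4"),
]

prompt = build_few_shot_prompt(demos, "5 + 6 = ?", k=3)
print(prompt)
```

With `k=0` the same function yields a zero-shot prompt containing only the test question, which is why the shot count has to be stated alongside a reported score.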
-
Thanks a lot.
-
https://arxiv.org/pdf/2407.10671
-
Description
Hello, how many shots were used for each of the evaluation sets on the official Qwen2.5 leaderboard? For example: MMLU, MMLU-Pro, GPQA, MATH, GSM8K, HumanEval, IFEval, MBPP, BBH, and so on.