-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fail to reproduce Deepseek-math result #2555
Comments
Hi! can you provide the reference? |
I ran the following command According to the official deekseek-math repo, gsm-8k-cot should be 82.9%. |
I am not familiar with unicode, but according to an online unicode translator, the So, "<\uff5cbegin\u2581of\u2581sentence\uff5c>User" = "<|begin_of_sentence>User" which is exactly the But I am not sure how to fix this. |
I think by "avoiding", they mean if you want to format the chats manually. Otherwise looks like the chat_template does format the messages in a similar way. Seems like their implementation is similar to gsm8k_cot_llama except for the fewshots, instead of The Their answer extraction is also a quite different. You can remove the
where the The Hope this is helpful! |
A quick update of the reproduced results. On 50 samples, I can get the reported result (82.x), I am going to run experiment on the whole benchmark later today but currently I am out of GPUs.
Where extract_gsm_few_shot_cot_answer and is_correct are copied from official deepseek-math repro. The experiment command is Devices: 2 x A6000 |
I will close this issue once getting expected result on the full gsm8k-test benchmark. |
Hi there, I failed to reproduce the reported deepseek-math result on gsm-8k benchmark with 8shot cot. But the result is significantly much lower than the reported result (0.82 vs 0.64).
The text was updated successfully, but these errors were encountered: