Model evaluation reproduction results are incorrect #982

Aslan-yulong · 2025-03-21T01:59:28Z

I cannot reproduce the reported evaluation results. I evaluated the Qwen2.5-VL series on the OCRBenchV2 evaluation set, and the scores differ from those in your paper and on the homepage. My scores are lower than yours, and the gap is significant.
I deployed the model on a vLLM server and ran the evaluation with the default parameters. What evaluation environment did you use, and what could explain such a large discrepancy in the scores?
