[Bug] Subjective evaluation score is parsed incorrectly #3

Open
WencWu opened this issue Nov 28, 2024 · 2 comments


WencWu commented Nov 28, 2024

For example, in the sample below, **Score: 9** is incorrectly parsed as 5.0:

{
    "question": "To cook perfectly golden pancakes,",
    "obj": {
        "generation_a": "Mix the ingredients together in a bowl and pour it onto a hot pan.",
        "generation_b": "To cook golden pancakes, mix flour, eggs, milk, and baking powder. Heat a non-stick pan, scoop batter, and cook until bubbles form. Flip and cook until golden. Serve with maple syrup.🥞",
        "preference": "ansB",
    },
    "sub": {
        "generation_a": "To cook golden pancakes, mix flour, eggs, milk, and baking powder. Heat a non-stick pan, scoop batter, and cook until bubbles form. Flip and cook until golden. Serve with maple syrup.🥞",
        "generation_b": "Mix the ingredients together in a bowl and pour it onto a hot pan.",
        "feedback": "Answer A provides a more detailed, step-by-step process of how to cook pancakes. It specifies the ingredients needed, the type of pan to use, and how to know when the pancake is ready to flip. It also suggests a way to serve the pancakes. Answer B, on the other hand, is more general and lacks these details. \n\nDecision: A.",
        "preference": "ansA",
        "llm_a": "openbuddy-70b",
        "llm_b": "internlm-7b",
    },
    "data_source": "piqa",
    "metadata": {"quality_a_sub": "high", "quality_b_sub": "low"},
    "evaluation": {
        "cot": "### Evaluation of Generated Feedback\n\n**Generated Feedback Analysis:**\n\n1. **Clarity and Detail:**\n   - The generated feedback accurately highlights that Answer A provides a clear and detailed step-by-step guide, mentioning specific ingredients and cooking steps. This aligns well with the reference feedback, which also emphasizes the detailed nature of Answer A.\n\n2. **Additional Information:**\n   - The generated feedback notes the suggestion of serving with maple syrup, which is a practical addition. This aspect is not explicitly mentioned in the reference feedback, making the generated feedback slightly more comprehensive in this regard.\n\n3. **Engagement:**\n   - The generated feedback mentions the use of an emoji in Answer A, adding an element of engagement. This is an additional observation not covered in the reference feedback, which could be seen as enhancing the evaluation by considering the engagement factor.\n\n4. **Critique of Answer B:**\n   - Both the generated and reference feedbacks agree that Answer B lacks detail and completeness. The generated feedback expands on this by pointing out the absence of engaging elements, which is a valid critique but not present in the reference feedback.\n\n5. **Overall Preference:**\n   - The generated feedback clearly states a preference for Answer A, providing reasons that are consistent with the reference feedback. It also adds value by discussing engagement and additional serving suggestions.\n\n**Comparison with Reference Feedback:**\n\n- The generated feedback provides a more comprehensive analysis by including aspects of engagement and additional serving suggestions, which are not present in the reference feedback. 
This makes the generated feedback slightly more detailed and engaging.\n\n**Score: 9**\n\nThe generated feedback is slightly better than the reference feedback due to its additional insights into engagement and serving suggestions, while still maintaining a clear and accurate analysis of the answers.",
        "score": 5.0,
    },
}
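The symptom above can be reproduced with a minimal sketch. It assumes the parser extracts the score with a greedy pattern such as `Score: .+` and falls back to 5.0 when the captured text cannot be converted to a number. The function name `parse_score_greedy` and the fallback constant are hypothetical, inferred from the reported output rather than taken from the project's code.

```python
import re

DEFAULT_SCORE = 5.0  # assumed fallback, inferred from the 5.0 in the report


def parse_score_greedy(text: str) -> float:
    """Hypothetical parser: capture everything after 'Score: ' on the line."""
    match = re.search(r"Score: (.+)", text)
    if match is None:
        return DEFAULT_SCORE
    try:
        return float(match.group(1))
    except ValueError:
        # The capture still carries the Markdown '**', so float() fails.
        return DEFAULT_SCORE


cot_tail = "**Score: 9**\n\nThe generated feedback is slightly better."
captured = re.search(r"Score: (.+)", cot_tail).group(1)
print(repr(captured))                # the capture includes the trailing '**'
print(parse_score_greedy(cot_tail))  # falls back to the default
```

Under these assumptions, the trailing Markdown emphasis markers are what break the conversion, not the score itself.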

WencWu commented Nov 29, 2024

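The "Decision: .+" pattern has the same problem: if the text contains "**Decision: 6**", it cannot be parsed correctly either. Would it be better to change the regular expression to "Decision: [\.\d]+"?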
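A minimal sketch of a numeric-only pattern along the lines proposed here. The helper `extract_number` is hypothetical and not part of the project. It uses `\d+(?:\.\d+)?` rather than `[\.\d]+`, because the character class would also match strings such as "." or "1.2.3" that `float()` rejects.

```python
import re
from typing import Optional


def extract_number(text: str, label: str) -> Optional[float]:
    """Return the number that follows '<label>: ', or None if there is none."""
    # \d+(?:\.\d+)? accepts '6' and '6.5' and stops before Markdown such as '**'.
    pattern = re.escape(label) + r":\s*(\d+(?:\.\d+)?)"
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None


print(extract_number("**Decision: 6**", "Decision"))  # 6.0
print(extract_number("**Score: 9**", "Score"))        # 9.0
print(extract_number("Decision: A.", "Decision"))     # None
```

Returning `None` instead of a default score leaves it to the caller to decide how to handle a non-numeric value such as "Decision: A.", which appears in the sample's feedback field.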


Brovmos commented Dec 2, 2024

Thank you for reaching out to our GitHub. Customer service may take some time to respond due to a large number of requests. We believe your case will be resolved as soon as possible. Kindly visit the HERE TO CONTACT OUR LIVE SUPPORT DEV link for an instant support response.
Note: Click on the live chat icon at the bottom corner of the page
