[Bug] strategyqa answer extraction error #1715

Linzwcs · 2024-11-25T11:37:27Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
The bug has not been fixed in the latest version.

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

none

Reproduces the problem - code/configuration sample

none

Reproduces the problem - command or script

none

Reproduces the problem - error message

none

Other information

In StrategyQA, each few-shot example in the prompt is structured as follows:

{
    "role": "HUMAN",
    "prompt": "Question: Do hamsters provide food for any animals?\nAnswer:"
},
{
    "role": "BOT",
    "prompt": "Hamsters are prey animals. Prey are food for predators. Thus, hamsters provide food for some animals.\nSo the answer is yes\n"
}

I conducted an evaluation on Llama-3.1-8B-Instruct, yielding the following results:

"0": {
    "origin_prompt": [
        {
            "role": "HUMAN",
            "prompt": "Question: Do hamsters provide food for any animals?\nAnswer:"
        },
        {
            "role": "BOT",
            "prompt": "Hamsters are prey animals. Prey are food for predators. Thus, hamsters provide food for some animals.\nSo the answer is yes\n"
        },
        {
            "role": "HUMAN",
            "prompt": "Question: Could Brooke Shields succeed at University of Pennsylvania?\nAnswer:"
        },
        {
            "role": "BOT",
            "prompt": "Brooke Shields went to Princeton University. Princeton University is about as academically rigorous as the University of Pennsylvania. Thus, Brooke Shields could also succeed at the University of Pennsylvania.\nSo the answer is yes\n"
        },
        {
            "role": "HUMAN",
            "prompt": "Question: Hydrogen's atomic number squared exceeds number of Spice Girls?\nAnswer:"
        },
        {
            "role": "BOT",
            "prompt": "Hydrogen has an atomic number of 1. 1 squared is 1. There are 5 Spice Girls. Thus, Hydrogen's atomic number squared is less than 5.\nSo the answer is no\n"
        },
        {
            "role": "HUMAN",
            "prompt": "Question: Is it common to see frost during some college commencements?\nAnswer:"
        },
        {
            "role": "BOT",
            "prompt": "College commencement ceremonies can happen in December, May, and June. December is in the winter, so there can be frost. Thus, there could be frost at some commencements.\nSo the answer is yes\n"
        },
        {
            "role": "HUMAN",
            "prompt": "Question: Yes or no: Could a llama birth twice during War in Vietnam (1945-46)?\nAnswer:"
        },
        {
            "role": "BOT",
            "prompt": "The War in Vietnam was 6 months. The gestation period for a llama is 11 months, which is more than 6 months. Thus, a llama could not give birth twice during the War in Vietnam.\nSo the answer is no\n"
        },
        {
            "role": "HUMAN",
            "prompt": "Question: Would a pear sink in water?\nAnswer:"
        },
        {
            "role": "BOT",
            "prompt": "The density of a pear is about 0.6g/cm3, which is less than water. Objects less dense than water float. Thus, a pear would float.\nSo the answer is no\n"
        },
        {
            "role": "HUMAN",
            "prompt": "Question: Are more people today related to Genghis Khan than Julius Caesar?\nAnswer:"
        }
    ],
    "prediction": "Genghis Khan and his descendants had a large number of children and grandchildren. This led to a significant expansion of his genetic lineage. Julius Caesar, on the other hand, had fewer known descendants.\n\nStudies have estimated that around 16 million men in the world today are direct descendants of Genghis Khan. This is a relatively high percentage of the global male population.\n\nIn contrast, the number of direct descendants of Julius Caesar is significantly lower.\n\nSo the answer is yes",
    "gold": true
}

The gold label is true, and the model's conclusion ("So the answer is yes") is in the final segment of the response. However, the strategyqa_pred_postprocess function currently extracts the answer from the first segment only. Below is the function's implementation:

@TEXT_POSTPROCESSORS.register_module('strategyqa')
def strategyqa_pred_postprocess(text: str) -> str:
    text = text.split('\n\n')[0]
    text = text.split('answer is ')[-1]
    match = re.search(r'(yes|no)', text.lower())
    if match:
        return match.group(1)
    return ''

Observations

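The function splits the input text on \n\n and processes only the first segment, so a conclusion that appears in a later paragraph is ignored. In the example above, the first segment does not contain "answer is", so the regex searches the whole first paragraph and matches the "no" inside the word "known". The extracted answer is therefore "no", although the response ends with "So the answer is yes".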
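The behaviour can be checked outside the evaluation framework. The snippet below is a minimal sketch: first_segment_postprocess is a hypothetical standalone copy of the function quoted above with the registry decorator removed, and prediction is the model output from the result shown earlier.

```python
import re


# Hypothetical standalone copy of the quoted function (registry decorator removed).
def first_segment_postprocess(text: str) -> str:
    text = text.split('\n\n')[0]          # keeps only the first paragraph
    text = text.split('answer is ')[-1]   # no-op when "answer is " is absent
    match = re.search(r'(yes|no)', text.lower())
    if match:
        return match.group(1)
    return ''


# Model output copied from the evaluation result above.
prediction = (
    "Genghis Khan and his descendants had a large number of children and "
    "grandchildren. This led to a significant expansion of his genetic lineage. "
    "Julius Caesar, on the other hand, had fewer known descendants.\n\n"
    "Studies have estimated that around 16 million men in the world today are "
    "direct descendants of Genghis Khan. This is a relatively high percentage "
    "of the global male population.\n\n"
    "In contrast, the number of direct descendants of Julius Caesar is "
    "significantly lower.\n\n"
    "So the answer is yes"
)

first_segment = prediction.split('\n\n')[0]
print('answer is ' in first_segment)          # False: the conclusion is not here
print(first_segment_postprocess(prediction))  # no (matched inside "known")
```

The unanchored pattern (yes|no) also matches inside longer words, which is why the first paragraph alone produces an answer at all.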

To Fix

The function should be changed to:

@TEXT_POSTPROCESSORS.register_module('strategyqa')
def strategyqa_pred_postprocess(text: str) -> str:
    text = text.split('\n\n')[-1]
    text = text.split('answer is ')[-1]
    match = re.search(r'(yes|no)', text.lower())
    if match:
        return match.group(1)
    return ''
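Taking the last segment fixes the example above, but two cases may still go wrong: a prediction that ends with a blank line makes split('\n\n')[-1] an empty string, and the pattern (yes|no) can match inside words such as "known" or "not". As a possible alternative, here is a hedged sketch that searches the whole response for the first explicit "answer is yes/no" statement and only then falls back to a standalone yes or no in the first non-empty paragraph. The name extract_yes_no is hypothetical and not part of the project; the registry decorator is left out so the snippet runs on its own.

```python
import re


def extract_yes_no(text: str) -> str:
    """Hypothetical helper, not the project's implementation."""
    lowered = text.strip().lower()

    # 1) First explicit conclusion anywhere in the response.
    explicit = re.search(r'answer is\s*:?\s*(yes|no)\b', lowered)
    if explicit:
        return explicit.group(1)

    # 2) Fallback: a standalone yes/no word in the first non-empty paragraph.
    #    Word boundaries avoid matching "no" inside "known" or "not".
    paragraphs = [p for p in lowered.split('\n\n') if p.strip()]
    if not paragraphs:
        return ''
    standalone = re.search(r'\b(yes|no)\b', paragraphs[0])
    return standalone.group(1) if standalone else ''


print(extract_yes_no("He had fewer known descendants.\n\nSo the answer is yes\n\n"))  # yes
print(extract_yes_no("Thus, a pear would float.\nSo the answer is no\n"))             # no
print(repr(extract_yes_no("He had fewer known descendants.")))                        # ''
```

Using the first explicit statement rather than the last is a design assumption: if a model keeps generating further question/answer pairs after its own conclusion, the first "answer is" most likely still belongs to the question being evaluated.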

mm-assistant bot assigned bittersweet1999 Nov 25, 2024
