How to evaluate correctness and errors in multiple-choice tasks? #2529

WuXnkris · 2024-12-01T10:49:18Z

"gen_args_0": {
    "arg_0": "Question: a shopkeeper sold an article offering a discount of 5 % and earned a profit of 31.1 % . what would have been the percentage of profit earned if no discount had been offered ?\nAnswer:",
    "arg_1": " 38"
},
"gen_args_1": {
    "arg_0": "Question: a shopkeeper sold an article offering a discount of 5 % and earned a profit of 31.1 % . what would have been the percentage of profit earned if no discount had been offered ?\nAnswer:",
    "arg_1": " 27.675"
},
"gen_args_2": {
    "arg_0": "Question: a shopkeeper sold an article offering a discount of 5 % and earned a profit of 31.1 % . what would have been the percentage of profit earned if no discount had been offered ?\nAnswer:",
    "arg_1": " 30"
}

............
"filtered_resps": [
["-7.680142164230347", "False"],
["-17.299633383750916", "False"],

Why are there two gen_args, and why are the resps numerical values?
How can we determine which answer the LLM chose, and whether it is correct?
How can I get the model's original response?
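
For readers with the same question, here is a minimal sketch of how such a log could be read. It assumes that each pair in filtered_resps is (log-likelihood of one answer choice given the question, is_greedy flag), and that the choice with the highest log-likelihood counts as the model's answer. That is my understanding of how multiple-choice tasks are scored in this setup, not something the post confirms. The helper names and `gold_index` are hypothetical, and only the two pairs visible in the post are used because the rest of the list is cut off there.

```python
def predicted_choice(filtered_resps):
    """Return the index of the answer choice with the highest log-likelihood.

    Assumption: each entry is [log_likelihood, is_greedy], both logged as strings.
    """
    scores = [float(logprob) for logprob, _is_greedy in filtered_resps]
    return max(range(len(scores)), key=scores.__getitem__)


def is_correct(filtered_resps, gold_index):
    """Compare the highest-scoring choice with the gold choice index.

    gold_index is a hypothetical name for the position of the reference answer.
    """
    return predicted_choice(filtered_resps) == gold_index


# The two pairs visible in the post; the remaining entries are truncated there.
resps = [
    ["-7.680142164230347", "False"],
    ["-17.299633383750916", "False"],
]
best = predicted_choice(resps)
print("highest-scoring choice index:", best)
```

Under these assumptions the model is never asked to write free text for this kind of task: each candidate answer (arg_1) is scored as a continuation of the prompt (arg_0), so the numbers are scores per choice and there may be no separate "original response" to recover.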
