Why does unsloth grpo still need 'answer' as input data? #1722

zhangxuefeng · 2025-02-15T23:48:20Z

As described in DeepSeek related papers, there is no need to provide 'answer' in input data for training.

However I saw in unsloth grop implemnetation, we still need to provide 'answer' in input data as below:

data = data.map(lambda x: { # type: ignore
'prompt': [
{'role': 'system', 'content': SYSTEM_PROMPT},
{'role': 'user', 'content': x['question']}
],
'answer': extract_hash_answer(x['answer'])
}) # type: ignore

Did I miss something or just misunderstand?

Thanks !
Xuefeng.

kallewoof · 2025-02-16T15:32:37Z

That is just one use case which happens to include an 'answer' field which is used when scoring the response by the model. It's not "the generated tokens should equal this", but rather "the final answer given by the model should equal this (and this is custom code in a function that scores the output)".

Yazooliu · 2025-02-17T08:53:03Z

That is just one use case which happens to include an 'answer' field which is used when scoring the response by the model. It's not "the generated tokens should equal this", but rather "the final answer given by the model should equal this (and this is custom code in a function that scores the output)".

In reward functions, there are some super paramaters like 0.125.....and so on. How to set these key socres? based on experience or try during the training?

kallewoof · 2025-02-17T12:37:38Z

You pick them yourself based on how important it is that the model gets that particular aspect correctly. Something minor would get a low value (like 0.1) whereas something very important (such as getting the correct integer answer to a GSM8K question) would get a high value (like 2.0).

zhangxuefeng changed the title ~~Why unsloth grpo still need 'answer' as input data?~~ Why does unsloth grpo still need 'answer' as input data? Feb 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why does unsloth grpo still need 'answer' as input data? #1722

Why does unsloth grpo still need 'answer' as input data? #1722

zhangxuefeng commented Feb 15, 2025 •

edited

Loading

kallewoof commented Feb 16, 2025

Yazooliu commented Feb 17, 2025

kallewoof commented Feb 17, 2025

Why does unsloth grpo still need 'answer' as input data? #1722

Why does unsloth grpo still need 'answer' as input data? #1722

Comments

zhangxuefeng commented Feb 15, 2025 • edited Loading

data = data.map(lambda x: { # type: ignore 'prompt': [ {'role': 'system', 'content': SYSTEM_PROMPT}, {'role': 'user', 'content': x['question']} ], 'answer': extract_hash_answer(x['answer']) }) # type: ignore

kallewoof commented Feb 16, 2025

Yazooliu commented Feb 17, 2025

kallewoof commented Feb 17, 2025

zhangxuefeng commented Feb 15, 2025 •

edited

Loading

data = data.map(lambda x: { # type: ignore
'prompt': [
{'role': 'system', 'content': SYSTEM_PROMPT},
{'role': 'user', 'content': x['question']}
],
'answer': extract_hash_answer(x['answer'])
}) # type: ignore