You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The text was updated successfully, but these errors were encountered:
zhangxuefeng
changed the title
Why unsloth grpo still need 'answer' as input data?
Why does unsloth grpo still need 'answer' as input data?
Feb 16, 2025
That is just one use case which happens to include an 'answer' field which is used when scoring the response by the model. It's not "the generated tokens should equal this", but rather "the final answer given by the model should equal this (and this is custom code in a function that scores the output)".
That is just one use case which happens to include an 'answer' field which is used when scoring the response by the model. It's not "the generated tokens should equal this", but rather "the final answer given by the model should equal this (and this is custom code in a function that scores the output)".
In reward functions, there are some super paramaters like 0.125.....and so on. How to set these key socres? based on experience or try during the training?
You pick them yourself based on how important it is that the model gets that particular aspect correctly. Something minor would get a low value (like 0.1) whereas something very important (such as getting the correct integer answer to a GSM8K question) would get a high value (like 2.0).
As described in DeepSeek related papers, there is no need to provide 'answer' in input data for training.
However I saw in unsloth grop implemnetation, we still need to provide 'answer' in input data as below:
data = data.map(lambda x: { # type: ignore
'prompt': [
{'role': 'system', 'content': SYSTEM_PROMPT},
{'role': 'user', 'content': x['question']}
],
'answer': extract_hash_answer(x['answer'])
}) # type: ignore
Did I miss something or just misunderstand?
Thanks !
Xuefeng.
The text was updated successfully, but these errors were encountered: