Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does unsloth grpo still need 'answer' as input data? #1722

Open
zhangxuefeng opened this issue Feb 15, 2025 · 3 comments
Open

Why does unsloth grpo still need 'answer' as input data? #1722

zhangxuefeng opened this issue Feb 15, 2025 · 3 comments

Comments

@zhangxuefeng
Copy link

zhangxuefeng commented Feb 15, 2025

As described in DeepSeek related papers, there is no need to provide 'answer' in input data for training.

However I saw in unsloth grop implemnetation, we still need to provide 'answer' in input data as below:


data = data.map(lambda x: { # type: ignore
'prompt': [
{'role': 'system', 'content': SYSTEM_PROMPT},
{'role': 'user', 'content': x['question']}
],
'answer': extract_hash_answer(x['answer'])
}) # type: ignore

Did I miss something or just misunderstand?

Thanks !
Xuefeng.

@zhangxuefeng zhangxuefeng changed the title Why unsloth grpo still need 'answer' as input data? Why does unsloth grpo still need 'answer' as input data? Feb 16, 2025
@kallewoof
Copy link

That is just one use case which happens to include an 'answer' field which is used when scoring the response by the model. It's not "the generated tokens should equal this", but rather "the final answer given by the model should equal this (and this is custom code in a function that scores the output)".

@Yazooliu
Copy link

That is just one use case which happens to include an 'answer' field which is used when scoring the response by the model. It's not "the generated tokens should equal this", but rather "the final answer given by the model should equal this (and this is custom code in a function that scores the output)".

In reward functions, there are some super paramaters like 0.125.....and so on. How to set these key socres? based on experience or try during the training?

@kallewoof
Copy link

You pick them yourself based on how important it is that the model gets that particular aspect correctly. Something minor would get a low value (like 0.1) whereas something very important (such as getting the correct integer answer to a GSM8K question) would get a high value (like 2.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants