noahhaon changed the title "Misc. bug:" to "Misc. bug: Deepseek R1 incompatible with grammars or structured output" on Jan 21, 2025
noahhaon changed the title "Misc. bug: Deepseek R1 incompatible with grammars or structured output" to "Misc. bug: Deepseek R1 incompatible with grammars / structured output" on Jan 21, 2025
Name and Version
Using llama.cpp at b4516 and ollama at 7bb35
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
Apologies if this is misfiled as a bug; I'm not sure whether it is really an enhancement. But grammars are a commonly used feature of llama.cpp and the Deepseek R1 model is very popular, so I wanted to raise it and perhaps start thinking about solutions.
Reasoning-based models are trained to emit their own chain of thought into the context between `<think>` and `</think>` tokens. These tokens get suppressed by design when using the sampler with a JSON grammar, causing model performance to suffer significantly.
As suppressing the output of the tokens inside the `<think></think>` block is currently being discussed in #11325, I wanted to mention that using a grammar to effectively suppress these tokens causes the model to perform badly.
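To make the mechanism concrete, here is a rough, self-contained sketch (not llama.cpp internals; the function and the toy vocabulary are illustrative assumptions). A grammar-constrained sampler conceptually masks the logits of every token that cannot legally continue the grammar; with a JSON grammar the `<think>` token is never a legal continuation, so the chain of thought can never be emitted:

```cpp
// Illustrative sketch only -- not actual llama.cpp code.
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

// Mask out every token the grammar would reject so it can never be sampled.
void apply_grammar_mask(std::vector<float> & logits,
                        const std::function<bool(int)> & grammar_allows) {
    for (int tok = 0; tok < (int) logits.size(); ++tok) {
        if (!grammar_allows(tok)) {
            logits[tok] = -INFINITY;
        }
    }
}

int main() {
    // Toy vocab: 0 = "<think>", 1 = "{", 2 = "\"".
    std::vector<float> logits = { 5.0f, 1.0f, 0.5f };   // the model strongly prefers "<think>"
    // A JSON grammar only allows tokens that can start a JSON value.
    auto json_allows = [](int tok) { return tok == 1 || tok == 2; };
    apply_grammar_mask(logits, json_allows);
    printf("logit of <think> after masking: %f\n", logits[0]);   // -inf: reasoning is suppressed
    return 0;
}
```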
Looking at https://github.com/ggerganov/llama.cpp/blob/master/common/sampling.cpp#L235 , the third parameter to common_sampler_accept appears to be a boolean that enables constraining the sampler by the grammar. As the `</think>` end token is declared at least in Deepseek R1's tokens.json, perhaps it would be a generic-enough solution to disable the grammar for these models until that token is reached, if it is a declared token in the loaded model? Or perhaps a user-provided parameter like start_grammar_after_token? Thanks for all your great work! <3
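A minimal sketch of that gating idea, purely illustrative: none of these names (think_gate, end_think_token, should the vocab declare one) exist in llama.cpp, and start_grammar_after_token is only a proposed parameter. The grammar would only begin constraining sampling once `</think>` has been accepted, or immediately for models that declare no such token:

```cpp
// Hedged sketch of "start the grammar after </think>" -- illustrative only.
#include <cstdio>

struct think_gate {
    int  end_think_token = -1;   // id of "</think>" if the vocab declares one, else -1
    bool in_reasoning    = true; // R1 begins its reply inside <think> ... </think>

    // Call on every accepted token; returns whether the grammar should
    // constrain sampling from this point on.
    bool accept(int token) {
        if (end_think_token < 0) {
            return true;                      // non-reasoning model: grammar applies as today
        }
        if (in_reasoning && token == end_think_token) {
            in_reasoning = false;             // "</think>" seen, start constraining
        }
        return !in_reasoning;
    }
};

int main() {
    think_gate gate;
    gate.end_think_token = 42;                // pretend 42 is the "</think>" id
    int stream[] = { 7, 9, 42, 13 };          // reasoning tokens, then "</think>", then output
    for (int tok : stream) {
        printf("token %d -> apply grammar: %s\n", tok, gate.accept(tok) ? "yes" : "no");
    }
    return 0;
}
```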
First Bad Commit
No response
Relevant log output