[Model] Add Reasoning Parser for Granite Models #14202
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed from 0affa66 to f1d3367
Nice use of this new feature! Will try in a bit, cc @gaocegege
Thanks for the contribution!

Could you please rebase onto upstream? In a previous PR to support reasoning outputs in structured outputs (https://github.com/vllm-project/vllm/pull/12955/files#diff-ea8b8ff63961713ccb62d78e53e96404b587b7828cb9fee08a9e5576bf563673R1065), we moved the CLI argument `--reasoning-parser` to https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L1076. Thus you may need to add a new choice there, roughly as sketched below.
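A rough sketch of what registering that new choice might look like. The flag name and the existing `deepseek_r1` choice come from this thread; the exact registration call, default, and help text are assumptions about the surrounding code:

```python
# vllm/engine/arg_utils.py (sketch only; the real signature and help
# text upstream may differ)
parser.add_argument(
    "--reasoning-parser",
    type=str,
    choices=["deepseek_r1", "granite"],  # "granite" is the new choice
    default=None,
    help="Select the reasoning parser depending on the model you are using.")
```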
Hi, I updated the docs in this PR #14114. Maybe you should rebase the docs too. Just FYI.
Force-pushed from 4f4cbe1 to ecadd2d
```diff
@@ -19,6 +19,10 @@ def get_reasoner(tokenizer: PreTrainedTokenizer,
         return None
     elif reasoning_backend == "deepseek_r1":
         return DeepSeekReasoner.from_tokenizer(tokenizer)
+    elif reasoning_backend == "granite":
+        logger.warning(
```
Adding a warning for now since this is already a large PR, but I think adding a `GraniteReasoner` for guided decoding could be a follow-up later?
SGTM!
Awesome, thanks @gaocegege! It's been rebased 😄
Thanks for your contribution! 🎉 👍
@mgoin Please give it another review, thanks!
```python
                                  response_start)
        reasoning_content = current_text[
            start_reasoning_content:end_reasoning_content]
        response_content = current_text[current_chunk_end + 1:]
```
Suggested change:

```diff
 response_content = current_text[current_chunk_end + 1:]
+parsed_content = True
```
The `parsed_content` flag doesn't seem to be updated, so it may be helpful to set it? Very minor suggestion, totally optional.
Hey @b8zhong, thanks for the suggestion! For now, I'd prefer to keep it as is since it returns immediately after parsing the response content. I.e., once this condition is met, there is no need to keep going, so updating the flag won't do anything 🙂
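For illustration, a hypothetical condensed version of that control flow. The variable names come from the excerpt above; the function wrapper and surrounding structure are assumptions, not the actual PR code:

```python
def _extract_reasoning(current_text: str,
                       start_reasoning_content: int,
                       end_reasoning_content: int,
                       current_chunk_end: int) -> tuple[str, str]:
    # Once the response marker has been located, both pieces are sliced
    # and returned immediately; nothing runs after this, so setting a
    # parsed_content flag on this path would never be observed.
    reasoning_content = current_text[start_reasoning_content:end_reasoning_content]
    response_content = current_text[current_chunk_end + 1:]
    return reasoning_content, response_content
```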
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @alex-jw-brooks, could you please resolve the conflicts?
Force-pushed from ecadd2d to 75dcdd2
Thanks for the nudge @gaocegege! Rebased 😄
This PR adds a reasoning parser for Granite 3.2 models! These models have an optional chat template kwarg `thinking` that changes the system prompt to enable reasoning. 😄 The format of the text is expected to be:
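A sketch of that expected shape, reconstructed from the `Here is` markers discussed just below rather than copied from the original:

```text
Here is my thought process:
... (model reasoning) ...
Here is my response:
... (final answer) ...
```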
There have been reports of quantized versions of the model using `Here's` instead of `Here is` though, so this PR matches on both.

Examples
Start the server with a granite (3.2) language model that has reasoning and the `granite` parser:
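A minimal sketch of the serve command, assuming the `--enable-reasoning`/`--reasoning-parser` flags vLLM used at the time of this PR and `ibm-granite/granite-3.2-8b-instruct` as an example model:

```bash
# Launch the OpenAI-compatible server with the Granite reasoning parser
vllm serve ibm-granite/granite-3.2-8b-instruct \
    --enable-reasoning --reasoning-parser granite
```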
Snippets are copied from the docs, with the only change being adding `chat_template_kwargs` with `thinking=True`. Without this, reasoning will be disabled, and it'll generally parse everything into `content`.
No streaming:
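A minimal client sketch following the pattern in the vLLM reasoning-outputs docs; the base URL, API key, and prompt are placeholder assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Which is larger, 9.9 or 9.11?"}],
    # Enable Granite reasoning; without this, everything lands in `content`.
    extra_body={"chat_template_kwargs": {"thinking": True}},
)
message = response.choices[0].message
print("reasoning_content:", message.reasoning_content)
print("content:", message.content)
```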
With streaming:
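Likewise, a streaming sketch under the same assumptions; reasoning deltas arrive in `reasoning_content` and the final answer in `content`:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = client.models.list().data[0].id

stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Which is larger, 9.9 or 9.11?"}],
    stream=True,
    extra_body={"chat_template_kwargs": {"thinking": True}},
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Reasoning tokens stream in reasoning_content; the answer in content.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)
print()
```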
Example output (run from the streaming snippet above)