[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use #10164
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
cc: @K-Mistele and @maxdebayser for review. Thanks!
cc @K-Mistele see if this chat template still looks good to you for tool use.
Thanks for the ping! I'm getting ready for some travel but can take a look while I'm on the plane tomorrow.
Possibly related #9859
{%- if not image_ns.has_images %}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
{{- "Environment: ipython\n" }}
I think this is fine for JSON tool calling, but is it also true that the pythonic tool calling is incompatible with images?
Tool Calling does NOT work with images in the prompt as of now.
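The branch quoted in this review thread can be sketched in plain Python to make the behavior easier to follow. This is only an illustration of the four template lines shown above, not the full template: the helper names `has_images` and `system_header` are hypothetical, and the assumption that image parts appear as `{"type": "image"}` items in a list-valued `content` is mine, not something stated in the thread.

```python
from typing import Any, Dict, List, Optional


def has_images(messages: List[Dict[str, Any]]) -> bool:
    """Return True if any message has an image part (assumed shape: {"type": "image"})."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image":
                    return True
    return False


def system_header(messages: List[Dict[str, Any]],
                  tools: Optional[List[Dict[str, Any]]] = None) -> str:
    """Mirror the quoted branch: no system header when images are present,
    and the ipython environment line only when tools were passed."""
    if has_images(messages):
        return ""
    out = "<|start_header_id|>system<|end_header_id|>\n\n"
    if tools is not None:  # same check as `tools is not none` in the template
        out += "Environment: ipython\n"
    return out
```

Under these assumptions, a conversation containing an image never receives the tool environment line, which matches the statement that tool calling does not work with images in the prompt.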
Signed-off-by: Travis Johnson <[email protected]>
806db87 to 77e82ac
I pushed some additional changes:
I also see that #9919 was merged to handle detecting what format of
The chat content format detection doesn't distinguish between text and vision inputs (only the format of message["content"][int]). This change LGTM!
Since the chat templates now support "openai" format, we should update the tests accordingly.
Head branch was pushed to by a user without write access
For our use-case, we want to serve the Llama 3.2 Vision models while also supporting non-vision requests that use tools. The current recommended/example chat template assumes tool use: it injects a tool-use system prompt even when tools are not requested, and it does not support image inputs. This PR updates the template to support tool use, vision inputs, and plain chat generation, depending on the input conversation.
Examples below show the results of templating for a few different use-cases. This was done using the meta-llama/Llama-3.2-11B-Vision-Instruct model's tokenizer. "New" refers to the template in this PR, "Old" is the current vLLM example template from main, and "Base" is the template from the tokenizer_config.json in HF Hub.
FIX #10324
Basic Chat
Input
Old
New
Base
System Prompt
Input
Old
New
Base
NB: vLLM transforms the system prompt's string content into a JSON object for mllama, but the base template assumes it will always be a string.
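To illustrate the mismatch described in this note, here is a minimal sketch of a helper that accepts both shapes of `content`. The function name `content_to_text`, the `{"type": "text", "text": ...}` part layout, and the choice to join multiple text parts with a newline are all assumptions made for illustration; they are not taken from the PR's template or from vLLM's code.

```python
from typing import Any, Dict, List, Union

# Content is either a plain string or a list of typed parts (assumed layout).
Content = Union[str, List[Dict[str, Any]]]


def content_to_text(content: Content) -> str:
    """Return the text of a message whether content is a string or a list of parts."""
    if isinstance(content, str):
        # The shape the base template expects.
        return content
    # The list-of-parts shape: keep only text parts, skip anything else.
    texts = [part["text"] for part in content if part.get("type") == "text"]
    return "\n".join(texts)
```

A template that only handles the string case would fail or render incorrectly on the list case, which is the situation the note describes for the base template.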
Image
Input
Old
New
Base
Image with System Prompt
Input
Old
New
Throws exception:
Base
Throws exception:
Tool Use Request
Input
Old
New
Base
NB: vLLM transforms the string content into a JSON object for mllama, but the base template assumes it will be a string when merging the user message with the tool info.