[Bugfix] Allow prefill of assistant response when using mistral_common
#9446
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
@patrickvonplaten can you verify whether this is reasonable? I am not familiar with using the Mistral tokenizer.
Just a minor nit:
Co-authored-by: Cyrus Leung <[email protected]>
Nice catch! If the last message is an assistant message, we should indeed set `prefix=True`.
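For readers unfamiliar with the mechanism, here is a minimal hedged sketch of that idea (the helper name and structure are illustrative, not the actual code changed in this PR): when converting OpenAI-style message dicts into `mistral_common` messages, a trailing assistant message is flagged with `prefix=True` so that `ChatCompletionRequest` validation passes and the model continues that message.

```python
# Hypothetical sketch of the conversion step; names are illustrative,
# not the actual vLLM code touched by this PR.
from mistral_common.protocol.instruct.messages import (
    AssistantMessage, SystemMessage, UserMessage)


def _to_mistral_messages(messages: list[dict]) -> list:
    """Convert OpenAI-style dicts, flagging a trailing assistant message
    as a prefix so mistral_common accepts it for prefill."""
    converted = []
    for i, msg in enumerate(messages):
        is_last = i == len(messages) - 1
        role, content = msg["role"], msg["content"]
        if role == "system":
            converted.append(SystemMessage(content=content))
        elif role == "user":
            converted.append(UserMessage(content=content))
        elif role == "assistant":
            # prefix=True is only valid (and required) on the final message.
            converted.append(AssistantMessage(content=content, prefix=is_last))
        else:
            raise ValueError(f"Unsupported role: {role}")
    return converted
```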
The `mistral_common` tokenizer supports prefilling of an assistant message when `prefix=True` is set on the last message (`role="assistant"` only).
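For illustration, a hedged end-to-end sketch of that prefill mechanism using `mistral_common` directly (the v3 tokenizer choice and the example content are assumptions):

```python
# Sketch of assistant prefill with mistral_common alone.
from mistral_common.protocol.instruct.messages import AssistantMessage, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()

request = ChatCompletionRequest(
    messages=[
        UserMessage(content="Name one prime number."),
        # prefix=True marks this as a partial assistant message to continue.
        AssistantMessage(content="Sure, one prime number is", prefix=True),
    ],
)

tokenized = tokenizer.encode_chat_completion(request)
# The encoded prompt ends inside the assistant turn, so generation
# continues the prefilled text instead of starting a new reply.
print(tokenized.text)
```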
However, `continue_final_message=False` makes no sense for the `mistral_common` tokenizer because a `ChatCompletionRequest` cannot have a last message with `role="assistant"` unless it has `prefix=True`.

Also, AFAIK, OpenAI uses `continue_final_message=True`-like behavior (it continues the last assistant message), so I think this behavior should be replicated in vLLM (potentially replacing `add_generation_prompt`/`continue_final_message` in the chat completion endpoint?), but that's beyond the scope of this PR. (I'm only fixing the error that vLLM raises when an application sends a message chain in which the last message is an assistant message.)

(Also, the warnings about `add_generation_prompt`/`continue_final_message` when using `mistral_common` are annoying and should only be logged once (or not at all), but that too is beyond the scope of this PR.)
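For completeness, a hedged sketch of how an application might exercise this through the OpenAI-compatible endpoint once the fix is in, assuming a locally running vLLM server and that the `add_generation_prompt`/`continue_final_message` extensions are accepted via `extra_body` (the URL, model name, and prompt are placeholders):

```python
# Client-side assistant prefill against a vLLM OpenAI-compatible server;
# URL, model name, and message content are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {"role": "user", "content": "Write a haiku about autumn."},
        # The trailing assistant message is the prefill to be continued.
        {"role": "assistant", "content": "Golden leaves drift down"},
    ],
    extra_body={
        # vLLM-specific extensions: skip the fresh generation prompt and
        # continue the last (assistant) message instead.
        "add_generation_prompt": False,
        "continue_final_message": True,
    },
)
print(response.choices[0].message.content)
```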