[Feature]: Add OpenAI server prompt_logprobs support #6508 #7453
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
It would also be nice if you added this to the Completions API as well (not just the Chat Completions API).
Sure, I can get on that as well.
@DarkLight1337 I have implemented the requested changes in the fixup. I did keep the default value in the response object as None, since that appears to be the default for all unused/unfilled response values. Otherwise, I put the default as 0, per your suggestion not to default to None. However, the SamplingParams object does default to None, so it would make more sense to also change the default value in SamplingParams for consistency.
I meant that the default should not be ...
vllm/entrypoints/openai/protocol.py (outdated)

```python
prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
    default=None)
```
Suggested change:

```python
prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = None
```

Using `Field` is unnecessary here.
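For context, a minimal sketch showing that the two declarations are equivalent for an optional field (the `Logprob` class here is a hypothetical stand-in for vLLM's own type, used only for illustration):

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, Field


class Logprob(BaseModel):
    """Hypothetical stand-in for vLLM's Logprob type, for illustration only."""
    logprob: float


class WithField(BaseModel):
    prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
        default=None)


class WithPlainDefault(BaseModel):
    prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = None


# Both declarations behave the same: the field is optional and defaults to None.
assert WithField().prompt_logprobs is None
assert WithPlainDefault().prompt_logprobs is None
```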
vllm/entrypoints/openai/protocol.py (outdated)

```diff
@@ -627,6 +634,8 @@ class ChatCompletionResponse(OpenAIBaseModel):
     model: str
     choices: List[ChatCompletionResponseChoice]
     usage: UsageInfo
+    prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
+        default=None)
```
Ditto
LGTM, thanks for implementing this!
This commit adds a prompt_logprobs option in the extra body field of the chat completions API. When set to a value higher than 0, the response will return the log probabilities of the decoded input tokens. The same option has been included for the completions API. Note that prompt_logprobs will be included for every prompt that the completions request contains, which is why prompt_logprobs in the completions response is nested further than in the chat completions response.

This option was not included in the streaming API. Since streaming is meant for real-time feedback with reduced latency, it doesn't make much sense to include the same prompt log probabilities with every chunk. This can be added later if it is deemed useful. Currently, the server will report an error if stream is enabled and prompt_logprobs is set to a value higher than 0.

The return value in the chat completions API was modeled after the prompt_logprobs return value during offline inference, to reduce coding complexity when switching between online and offline inference. It was already possible to get the prompt_logprobs if echo and top_logprobs were enabled; this behavior was kept the same so as not to break any existing configurations.

FIX vllm-project#6508
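As a concrete illustration of the streaming restriction described above, a hedged sketch (the base URL, model name, and the exact error class surfaced by the client are assumptions, not taken from this PR):

```python
from openai import BadRequestError, OpenAI

# Assumes a local vLLM OpenAI-compatible server on the default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

try:
    stream = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
        # Combining streaming with prompt_logprobs is rejected by the server.
        extra_body={"prompt_logprobs": 1},
    )
    for _chunk in stream:
        pass
except BadRequestError as exc:
    print("Rejected as expected:", exc)
```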
Head branch was pushed to by a user without write access: force-pushed from 23fc4b3 to a8e0511.
/ready
Is the correct way to make the request to a vLLM server to use ...
Yes, please check the test cases for some examples.
Thank you. I did find it (I was doing it wrong on my end).
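For reference, a rough sketch of such a request with the OpenAI Python client, passing the new option through extra_body (the base URL, model name, and the way the extra response field is read back are assumptions; the project's test cases are the authoritative examples):

```python
from openai import OpenAI

# Assumes a local vLLM OpenAI-compatible server on the default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    # Non-standard parameters are forwarded to the server via extra_body.
    extra_body={"prompt_logprobs": 1},
)

# The client preserves unknown response fields, so the new field should be
# visible in the serialized response.
print(response.model_dump().get("prompt_logprobs"))
```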
(vllm-project#7453) Signed-off-by: Alvant <[email protected]>
This commit adds a prompt_logprobs option in the extra body field of the chat completions API. When set to true, it will return the log probabilities of the decoded input tokens.
This option was not included in the streaming API. Since streaming is meant for real-time feedback with reduced latency, it doesn't make much sense to include the same prompt log probabilities with every chunk. This can be added later if it is deemed useful.
Currently, the server will report an error if stream and prompt_logprobs are both enabled.
The return value in the chat completions API was modeled after the prompt_logprobs return value during offline inference, to reduce coding complexity when switching between online and offline inference.
It was already possible to get the prompt_logprobs if echo and top_logprobs were enabled; this behavior was kept the same so as not to break any existing configurations.
FIX #6508
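A similarly hedged sketch for the Completions API, where the PR description says prompt_logprobs is reported once per input prompt and is nested deeper than in the chat response (server URL and model name are placeholders; the response is dumped rather than indexed because the exact field path is not shown in this thread):

```python
from openai import OpenAI

# Assumes a local vLLM OpenAI-compatible server on the default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
    prompt=["Hello, my name is", "The capital of France is"],
    max_tokens=8,
    extra_body={"prompt_logprobs": 1},
)

# Per the PR description, prompt log probabilities are returned for each of the
# two prompts above; inspect the serialized response to see where they land.
print(completion.model_dump())
```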