[Feature]: Add OpenAI server prompt_logprobs support #6508 #7453

Merged 2 commits on Aug 16, 2024

Conversation

gnpinkert (Contributor)

This commit adds a prompt_logprobs option in the extra_body field of the Chat Completions API. When set to true, the response will include the log probabilities of the decoded input tokens.

This option was not included in the streaming API. Since streaming is meant for real-time feedback with reduced latency, it doesn't make much sense to resend the same prompt log probabilities with every chunk. This can be added later if it is deemed useful.

Currently, the server will report an error if stream and prompt_logprobs are both enabled.

The return value in the Chat Completions API was modeled after the prompt_logprobs return value during offline inference, to reduce coding complexity when switching between online and offline.

It was previously possible to get the prompt_logprobs if echo and top_logprobs were enabled. That behavior was kept the same so as not to break any existing configurations.

FIX #6508
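For reference, the offline-inference behavior mentioned above can be reproduced with a short sketch like the one below. It assumes vLLM's LLM/SamplingParams API; the model name is only a placeholder.

from vllm import LLM, SamplingParams

# Sketch only: the model name is a placeholder, adjust to whatever you serve.
llm = LLM(model="facebook/opt-125m")

# prompt_logprobs=N requests the top-N log probabilities for each prompt token.
params = SamplingParams(max_tokens=16, prompt_logprobs=1)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    # out.prompt_logprobs is a list with one entry per prompt token; each entry
    # maps a token id to its Logprob (the first entry is typically None).
    print(out.prompt_logprobs)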


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small but essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

vllm/entrypoints/openai/protocol.py (outdated review thread, resolved)
vllm/entrypoints/openai/serving_chat.py (outdated review thread, resolved)
@DarkLight1337 (Member)

It would also be nice if you add this to the Completions API as well (not just Chat Completions API).

@gnpinkert (Contributor, Author)

> It would also be nice if you add this to the Completions API as well (not just Chat Completions API).

Sure, I can get on that as well

@gnpinkert (Contributor, Author)

@DarkLight1337 I have implemented the requested changes in the fixup. I did keep the default value in the response object as 'None' since it appears to be the default value for all unused/unfilled response values. Otherwise, I put the default as 0, as per your suggestion to not default to None.

However, the SamplingParams object does default to None. It would make more sense to also change the default value in SamplingParams to keep consistency.

@DarkLight1337 (Member)

> @DarkLight1337 I have implemented the requested changes in the fixup. I did keep the default value in the response object as 'None' since it appears to be the default value for all unused/unfilled response values. Otherwise, I put the default as 0, as per your suggestion to not default to None.
>
> However, the SamplingParams object does default to None. It would make more sense to also change the default value in SamplingParams to keep consistency.

I meant that the default should not be None when the value would otherwise only be a boolean. Now that it can be an integer, it is fine to default to None again.

Comment on lines 540 to 541
prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
default=None)

Suggested change:
-prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
-    default=None)
+prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = None

Using Field is unnecessary here.
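For context, a bare default and Field(default=...) are equivalent in Pydantic when no extra metadata is attached. A minimal standalone sketch (using float in place of vLLM's Logprob type for brevity):

from typing import Dict, List, Optional

from pydantic import BaseModel, Field


class WithField(BaseModel):
    prompt_logprobs: Optional[List[Optional[Dict[int, float]]]] = Field(default=None)


class WithoutField(BaseModel):
    # Equivalent declaration without Field(); the default is still None.
    prompt_logprobs: Optional[List[Optional[Dict[int, float]]]] = None


assert WithField().prompt_logprobs is None
assert WithoutField().prompt_logprobs is None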

@@ -627,6 +634,8 @@ class ChatCompletionResponse(OpenAIBaseModel):
     model: str
     choices: List[ChatCompletionResponseChoice]
     usage: UsageInfo
+    prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
+        default=None)

Ditto

@DarkLight1337 (Member) left a review:

LGTM, thanks for implementing this!

@DarkLight1337 enabled auto-merge (squash) on August 16, 2024 00:04
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Aug 16, 2024
This commit adds a prompt_logprobs option in the extra body field of the
chat completions API. When set to a value higher than 0, the response
will return the log probabilities of the decoded input tokens.

The same option has been included for the Completions API. Note that
prompt_logprobs will be included for every prompt that the completions
request contains. This is why the prompt_logprobs in the completions
response is nested one level deeper than in the chat completions response.

This option was not included in the streaming API. Since streaming is
meant for real-time feedback with reduced latency, it doesn't make much
sense to resend the same prompt log probabilities with every chunk. This
can be added later if it is deemed useful.

Currently, the server will report an error if stream is enabled and
prompt_logprobs is set to a value higher than 0.

The return value in the chat completions API was modeled after the
prompt_logprobs return value during offline inference, to reduce coding
complexity when switching between online and offline.

It was possible to get the prompt_logprobs earlier if echo and
top_logprobs were enabled. This behavior was kept the same to not break
any existing configurations.

FIX vllm-project#6508
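A rough sketch of the request shape described above against the Completions endpoint, sending the parameter directly in the JSON body (which is what the OpenAI client's extra_body does under the hood). The server address and model name are placeholders, and the exact location of prompt_logprobs in the completions response (assumed here to sit on each choice) should be checked against the tests.

import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",                # placeholder
        "prompt": ["Hello world", "Goodbye world"],  # one prompt_logprobs block per prompt
        "max_tokens": 8,
        "prompt_logprobs": 1,                        # must be > 0; incompatible with stream
    },
).json()

# Chat completions return prompt_logprobs at the top level of the response;
# plain completions nest it per prompt (assumed here: one block per choice).
for choice in resp["choices"]:
    print(choice.get("prompt_logprobs"))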
auto-merge was automatically disabled August 16, 2024 00:13

Head branch was pushed to by a user without write access

@gnpinkert force-pushed the add_prompt_logprobs branch from 23fc4b3 to a8e0511 on August 16, 2024 00:13
@gnpinkert (Contributor, Author)

/ready

@DarkLight1337 enabled auto-merge (squash) on August 16, 2024 00:18
@DarkLight1337 merged commit f878c8f into vllm-project:main on Aug 16, 2024
51 checks passed
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
@SrGonao commented on Aug 29, 2024

Is the correct way to make the request to a vLLM server to use kwargs["extra_body"] = {"prompt_logprobs": 15}? I haven't managed to generate a request that returns the prompt logprobs using the Chat Completions API. I looked at the tests, and this is what they suggest, but it does not seem to work on my end. I am using the most recent version of vLLM.

@DarkLight1337 (Member)

> Is the correct way to make the request to a vLLM server to use kwargs["extra_body"] = {"prompt_logprobs": 15}? I haven't managed to generate a request that returns the prompt logprobs using the Chat Completions API. I looked at the tests, and this is what they suggest, but it does not seem to work on my end. I am using the most recent version of vLLM.

Yes, please check the test cases for some examples.
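For anyone landing here later, a minimal sketch of the request pattern discussed above, using the OpenAI Python client against a local vLLM server (base URL, API key, and model name are placeholders; streaming must stay disabled):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholders

completion = client.chat.completions.create(
    model="facebook/opt-125m",  # placeholder
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"prompt_logprobs": 15},
)

# prompt_logprobs is not part of the official OpenAI schema, so read it from the
# serialized response rather than relying on a typed attribute.
print(completion.model_dump().get("prompt_logprobs"))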

@SrGonao commented on Aug 30, 2024

Thank you. I did find it (I was doing it wrong on my end)

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: None yet

Successfully merging this pull request may close these issues:
[Feature]: Add OpenAI server prompt_logprobs support

3 participants