[Feature]: Add OpenAI server prompt_logprobs support #6508 #7453

Merged 2 commits on Aug 16, 2024

Conversation

gnpinkert (Contributor)

This commit adds a prompt_logprobs option in the extra_body field of the Chat Completions API. When set to true, the response will include the log probabilities of the decoded input tokens.

This option was not included in the streaming API. Since streaming is meant for real-time feedback with reduced latency, it doesn't make much sense to resend the same prompt log probabilities with every chunk. This can be added later if it is deemed useful.

Currently, the server will report an error if stream and prompt_logprobs are both enabled.

The return value in the Chat Completions API was modeled after the prompt_logprobs return value during offline inference, to reduce coding complexity when switching between online and offline.

It was previously possible to get the prompt_logprobs if echo and top_logprobs were enabled. That behavior was kept the same so as not to break any existing configurations.

FIX #6508
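For reference, the offline-inference behavior mentioned above can be reproduced with a short sketch like the one below. It assumes vLLM's LLM/SamplingParams API; the model name is only a placeholder.

from vllm import LLM, SamplingParams

# Sketch only: the model name is a placeholder, adjust to whatever you serve.
llm = LLM(model="facebook/opt-125m")

# prompt_logprobs=N requests the top-N log probabilities for each prompt token.
params = SamplingParams(max_tokens=16, prompt_logprobs=1)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    # out.prompt_logprobs is a list with one entry per prompt token; each entry
    # maps a token id to its Logprob (the first entry is typically None).
    print(out.prompt_logprobs)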


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small but essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

vllm/entrypoints/openai/protocol.py (outdated review thread, resolved)
vllm/entrypoints/openai/serving_chat.py (outdated review thread, resolved)
@DarkLight1337 (Member)

It would also be nice if you add this to the Completions API as well (not just Chat Completions API).

@gnpinkert (Contributor, Author)

> It would also be nice if you add this to the Completions API as well (not just Chat Completions API).

Sure, I can get on that as well

@gnpinkert (Contributor, Author)

@DarkLight1337 I have implemented the requested changes in the fixup. I did keep the default value in the response object as 'None' since it appears to be the default value for all unused/unfilled response values. Otherwise, I put the default as 0, as per your suggestion to not default to None.

However, the SamplingParams object does default to None. It would make more sense to also change the default value in SamplingParams to keep consistency.

@DarkLight1337 (Member)

> @DarkLight1337 I have implemented the requested changes in the fixup. I did keep the default value in the response object as 'None' since it appears to be the default value for all unused/unfilled response values. Otherwise, I put the default as 0, as per your suggestion to not default to None.
>
> However, the SamplingParams object does default to None. It would make more sense to also change the default value in SamplingParams to keep consistency.

I meant that the default should not be None when the value would otherwise only be a boolean. Now that it can be an integer, it is fine to default to None again.

Comment on lines 540 to 541
prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
default=None)

Suggested change:
-prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
-    default=None)
+prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = None

Using Field is unnecessary here.
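For context, a bare default and Field(default=...) are equivalent in Pydantic when no extra metadata is attached. A minimal standalone sketch (using float in place of vLLM's Logprob type for brevity):

from typing import Dict, List, Optional

from pydantic import BaseModel, Field


class WithField(BaseModel):
    prompt_logprobs: Optional[List[Optional[Dict[int, float]]]] = Field(default=None)


class WithoutField(BaseModel):
    # Equivalent declaration without Field(); the default is still None.
    prompt_logprobs: Optional[List[Optional[Dict[int, float]]]] = None


assert WithField().prompt_logprobs is None
assert WithoutField().prompt_logprobs is None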

@@ -627,6 +634,8 @@ class ChatCompletionResponse(OpenAIBaseModel):
     model: str
     choices: List[ChatCompletionResponseChoice]
     usage: UsageInfo
+    prompt_logprobs: Optional[List[Optional[Dict[int, Logprob]]]] = Field(
+        default=None)

Ditto

@DarkLight1337 (Member) left a review:

LGTM, thanks for implementing this!

@DarkLight1337 enabled auto-merge (squash) on August 16, 2024 00:04
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Aug 16, 2024
This commit adds a prompt_logprobs option in the extra body field of the
chat completions API. When set to a value higher than 0, the response
will return the log probabilities of the decoded input tokens.

The same option has been included for the Completions API. Note that
prompt_logprobs will be included for every prompt that the completions
request contains. This is why the prompt_logprobs in the completions
response is nested one level deeper than in the chat completions response.

This option was not included in the streaming API. Since streaming is
meant for real-time feedback with reduced latency, it doesn't make much
sense to resend the same prompt log probabilities with every chunk. This
can be added later if it is deemed useful.

Currently, the server will report an error if stream is enabled and
prompt_logprobs is set to a value higher than 0.

The return value in the chat completions API was modeled after the
prompt_logprobs return value during offline inference, to reduce coding
complexity when switching between online and offline.

It was possible to get the prompt_logprobs earlier if echo and
top_logprobs were enabled. This behavior was kept the same to not break
any existing configurations.

FIX vllm-project#6508
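A rough sketch of the request shape described above against the Completions endpoint, sending the parameter directly in the JSON body (which is what the OpenAI client's extra_body does under the hood). The server address and model name are placeholders, and the exact location of prompt_logprobs in the completions response (assumed here to sit on each choice) should be checked against the tests.

import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",                # placeholder
        "prompt": ["Hello world", "Goodbye world"],  # one prompt_logprobs block per prompt
        "max_tokens": 8,
        "prompt_logprobs": 1,                        # must be > 0; incompatible with stream
    },
).json()

# Chat completions return prompt_logprobs at the top level of the response;
# plain completions nest it per prompt (assumed here: one block per choice).
for choice in resp["choices"]:
    print(choice.get("prompt_logprobs"))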
auto-merge was automatically disabled August 16, 2024 00:13

Head branch was pushed to by a user without write access

@gnpinkert force-pushed the add_prompt_logprobs branch from 23fc4b3 to a8e0511 on August 16, 2024 00:13
@gnpinkert (Contributor, Author)

/ready

@DarkLight1337 enabled auto-merge (squash) on August 16, 2024 00:18
@DarkLight1337 merged commit f878c8f into vllm-project:main on Aug 16, 2024
51 checks passed
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
@SrGonao commented on Aug 29, 2024

Is the correct way to make the request to a vLLM server to use kwargs["extra_body"] = {"prompt_logprobs": 15}? I haven't managed to generate a request that returns the prompt logprobs using the Chat Completions API. I looked at the tests, and this is what they suggest, but it does not seem to work on my end. I am using the most recent version of vLLM.

@DarkLight1337 (Member)

> Is the correct way to make the request to a vLLM server to use kwargs["extra_body"] = {"prompt_logprobs": 15}? I haven't managed to generate a request that returns the prompt logprobs using the Chat Completions API. I looked at the tests, and this is what they suggest, but it does not seem to work on my end. I am using the most recent version of vLLM.

Yes, please check the test cases for some examples.
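For anyone landing here later, a minimal sketch of the request pattern discussed above, using the OpenAI Python client against a local vLLM server (base URL, API key, and model name are placeholders; streaming must stay disabled):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholders

completion = client.chat.completions.create(
    model="facebook/opt-125m",  # placeholder
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"prompt_logprobs": 15},
)

# prompt_logprobs is not part of the official OpenAI schema, so read it from the
# serialized response rather than relying on a typed attribute.
print(completion.model_dump().get("prompt_logprobs"))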

@SrGonao commented on Aug 30, 2024

Thank you. I did find it (I was doing it wrong on my end)

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: None yet

Successfully merging this pull request may close these issues:
[Feature]: Add OpenAI server prompt_logprobs support

3 participants