-
-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Frontend] Add readiness and liveness endpoints to OpenAI API server #7078
Closed
mfournioux
wants to merge
18
commits into
vllm-project:main
from
mfournioux:add_readiness_liveness_k8s_probes
Closed
Changes from 3 commits
Commits
Show all changes
18 commits
Select commit
Hold shift + click to select a range
bac2c90
add readiness and liveness k8s probes for openai api_server
mfournioux d65bf58
update naming for pydantic classes from openai protocol
mfournioux 0c7945d
update naming for pydantic classes from openai protocol and remove aw…
mfournioux fa1c549
add tests for readiness and liveness endpoints
mfournioux 2fbaa2f
correct syntax pydantic class in protocol
mfournioux 27ef5ac
correct ruff errors
mfournioux 7fa6a37
correct ruff errors
mfournioux 18a9f2c
fixing isort issues
mfournioux 32e030b
update some typo
mfournioux 5127e91
correct some yapf errors
mfournioux c698d76
correct readiness probe regarding its http status
mfournioux ea8be80
replace liveness endpoint by health endpoint and renaming readiness e…
mfournioux c0baaea
clean some imports and configure error response for readiness endpoint
mfournioux 3a8b227
correct model response in readiness endpoint
mfournioux ac095c1
add return response 500 for readiness if model weights not loaded
mfournioux 14b2b91
Update test_basic.py
mfournioux b06d686
update the readiness endpoint with a try clause
mfournioux e950836
add check if KV cache has been set up in readiness endpoint
mfournioux File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the confusion stems from here. It seems that the readiness response incorrectly returns a 200 response (with value "ko", not sure whether it means anything to Kubernetes) even when the model hasn't finished loading yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see, thanks for the clarification, the readiness probe should not return 200 if not ready, I correct it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if
None
is returned from the function, then 200 OK is still returned. You should return an error response (or whatever Kubernetes expects) explicitly.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, Kubernetes should expect any other code less than 200 and greater or equal to 400 indicates failure.