[Misc] Minimum requirements for SageMaker compatibility #11576
Conversation
…ort 8080 Signed-off-by: Nathan Azrak <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Signed-off-by: Nathan Azrak <[email protected]>
…upported Signed-off-by: Nathan Azrak <[email protected]>
Signed-off-by: Nathan Azrak <[email protected]>
Signed-off-by: Nathan Azrak <[email protected]>
Looks good to me now, thanks for adding support for this!
Please let me know once you have tested this as per #11576 (comment)
Signed-off-by: Nathan Azrak <[email protected]>
Thanks @DarkLight1337! One limitation I just realised: I don't think SageMaker allows specifying launcher args to the container, only environment variables. I'm waiting on our AWS specialist to confirm. As far as I can tell from the vLLM source, engine args can only be specified via CLI args, not env vars (please correct me if this is not true). I wrote an extension to this PR, which adds a separate entrypoint that parses any environment variables with a designated prefix and converts them into CLI args. Let me know your thoughts. This is similarly non-invasive, as it's all isolated to the SageMaker image and a new entrypoint file. If you're happy with this pattern I can merge that into this branch, and use that for testing directly on SageMaker.
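A minimal sketch of what such an entrypoint could look like. The `SM_VLLM_` prefix is an assumption for illustration (the actual prefix name was lost from this comment); the server module path is vLLM's OpenAI-compatible server:

```bash
#!/bin/bash
# Sketch: map env vars with an assumed prefix to CLI args.
# e.g. SM_VLLM_MAX_MODEL_LEN=4096 -> --max-model-len 4096
PREFIX="SM_VLLM_"
ARGS=(--port 8080)

# Iterate over the environment, translating prefixed variables
# into lower-cased, dash-separated CLI flags.
while IFS='=' read -r key value; do
    if [[ "${key}" == "${PREFIX}"* ]]; then
        arg_name=$(echo "${key#"${PREFIX}"}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')
        ARGS+=("--${arg_name}" "${value}")
    fi
done < <(env)

exec python3 -m vllm.entrypoints.openai.api_server "${ARGS[@]}"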
Yes, this is correct. You can also specify the CLI args by passing a config file, but the file path itself still needs to be passed via CLI args.
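For reference, a hedged sketch of that config-file route (the flag name `--config` and the YAML keys here are assumptions about vLLM's config-file support, not taken from this thread):

```bash
# Assumed flow: engine args live in a YAML file, but the file path
# itself still has to be passed on the command line.
cat > /tmp/vllm_config.yaml <<'EOF'
port: 8080
max-model-len: 4096
EOF
python3 -m vllm.entrypoints.openai.api_server --config /tmp/vllm_config.yaml
```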
This looks good to me!
Add custom entrypoint mapping env vars to cli args
…ssues Signed-off-by: Nathan Azrak <[email protected]>
Is this ready to merge now?
Signed-off-by: Nathan Azrak <[email protected]>
Not yet, I just fixed a bug in the Dockerfile. Docker builds take a while, so iteration isn't very rapid. I should have time to test by Friday, but will ping you when ready @DarkLight1337
Signed-off-by: Nathan Azrak <[email protected]>
@DarkLight1337 I was able to successfully build and deploy to SageMaker with custom model args, and did some basic testing with the new endpoints. Should be ready for additional CI/merging.
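A rough sketch of the kind of smoke test this involves, using the standard AWS CLI (the endpoint name and payload fields below are placeholders, not from this PR):

```bash
# Invoke a deployed SageMaker endpoint; "my-vllm-endpoint" is a placeholder.
aws sagemaker-runtime invoke-endpoint \
    --endpoint-name my-vllm-endpoint \
    --content-type application/json \
    --cli-binary-format raw-in-base64-out \
    --body '{"prompt": "Hello, world!", "max_tokens": 16}' \
    response.json
cat response.json
```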
@DarkLight1337 The failed test in CI looks unrelated to this PR. Possibly related to this recently merged PR? #11663
I'll get someone to force-merge this.
Please move `sagemaker-entrypoint.sh` into `examples/`, so the root directory remains clean (we are in the process of cleaning it up further). Thank you!
Signed-off-by: Nathan Azrak <[email protected]>
Head branch was pushed to by a user without write access
@simon-mo Done :) Just a note that the Dockerfile will obviously have to be changed if Docker files are put in their own directory or if the `examples/` layout changes.
Fixes #11557
Implements `/ping` and `/invocations`, and creates an alternate Dockerfile target, identical to `vllm-openai` but with an entrypoint that sets the port to 8080.

Since the OpenAI server is more "production-ready", we use this functionality and its handlers as the base.
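A quick local sanity check of the health endpoint, assuming the container is up and listening on 8080 (a minimal sketch, not taken from the PR itself):

```bash
# SageMaker's health checks expect a 200 response from /ping.
curl -i http://localhost:8080/ping
```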
Considerations:
Dockerfile

The Dockerfile order has changed, defining the `vllm-sagemaker` image first, then building from that for `vllm-openai`. This avoids repeating the additional dependencies, and still defines `vllm-openai` last, so that it remains the default for `docker build`. If we don't like using `vllm-sagemaker` as the base for `vllm-openai`, we can simply repeat the additional requirements in both and revert to `from vllm-base as vllm-openai`.
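For illustration, selecting between the two stages at build time would look something like this (stage names as described above; the flags are standard `docker build` options):

```bash
# Build the SageMaker image explicitly by target...
docker build . --target vllm-sagemaker -t vllm-sagemaker

# ...while a plain build still yields vllm-openai, since it is the
# last stage defined in the Dockerfile.
docker build . -t vllm-openai
```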
Routing

The `/invocations` handler checks whether `messages` is in the request to determine whether it is a chat input, then uses `model_validate` to parse the payload into the corresponding request type (see the example requests below).

Note that these changes do not affect any other images or APIs. IMO it should be OK to integrate them for the purpose of expanding to SageMaker use cases, without offering the full flexibility of being able to make requests to all the endpoints.
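For example, two illustrative payloads (field values are placeholders; depending on the deployment, a `model` field may also be required):

```bash
# No "messages" key -> treated as a text completion request.
curl http://localhost:8080/invocations \
    -H "Content-Type: application/json" \
    -d '{"prompt": "San Francisco is a", "max_tokens": 16}'

# "messages" key present -> treated as a chat completion request.
curl http://localhost:8080/invocations \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 16}'
```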
I have tested the new endpoints locally. I will be able to test building and deploying on SageMaker some time in the next couple of weeks, but welcome feedback.