
[Misc] Minimum requirements for SageMaker compatibility #11576

Merged: 12 commits, Jan 2, 2025

Conversation

@nathan-az (Contributor) commented Dec 28, 2024

Fixes #11557

Implements /ping and /invocations, and adds an alternate Dockerfile, identical to vllm-openai but with an entrypoint that sets the port to 8080.

Since the OpenAI server is more "production-ready", we use its functionality and handlers as the base.

Considerations:

Dockerfile

The Dockerfile order has changed, defining the vllm-sagemaker image first, then building from that for vllm-openai.

This avoids repeating the additional dependencies while still defining vllm-openai last, so that it remains the default target for docker build. If we don't want vllm-sagemaker as the base for vllm-openai, we can simply repeat the additional requirements in both stages and revert to FROM vllm-base AS vllm-openai.
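The stage ordering described above might look roughly like the following sketch (stage names are taken from the description; the installed extras and entrypoint module are illustrative, not the PR's exact contents):

```dockerfile
# vllm-sagemaker is defined first, on top of the shared base, so the
# additional dependencies are installed exactly once.
FROM vllm-base AS vllm-sagemaker
# SageMaker expects the container to serve on port 8080.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", "--port", "8080"]

# vllm-openai builds FROM vllm-sagemaker to avoid repeating the extra
# dependencies, and is defined last so `docker build` targets it by default.
FROM vllm-sagemaker AS vllm-openai
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```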

Routing

  • The app state now also includes the model task, to aid inferring the correct task for a request
  • We lose FastAPI's automatic casting of the incoming request to the correct Pydantic model, so we do this explicitly with model_validate
  • We use the presence of a messages key in the request to determine whether it is a chat input
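A minimal sketch of the dispatch step the bullets describe (the function and the follow-up model names are illustrative, not the PR's actual code):

```python
def classify_invocation(body: dict) -> str:
    """Decide how to handle a raw /invocations payload.

    Without FastAPI's automatic body-to-model casting, the handler inspects
    the raw JSON itself: the presence of a "messages" key marks a chat
    request, anything else is treated as a completion request.
    """
    return "chat" if "messages" in body else "completion"

# In the real handler one would then validate explicitly, e.g. (pydantic v2):
#   ChatCompletionRequest.model_validate(body)   # chat branch
#   CompletionRequest.model_validate(body)       # completion branch
```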

Note that these changes do not affect other images or APIs. IMO it should be OK to integrate them for the purpose of expanding to SageMaker use cases, without offering the full flexibility of being able to make requests to all the endpoints.

I have tested the new endpoints locally. I will be able to test building and deploying on SageMaker some time in the next couple of weeks, but welcome feedback.


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

Signed-off-by: Nathan Azrak <[email protected]>
@DarkLight1337 (Member) left a comment

Looks good to me now, thanks for adding support for this!

Please let me know once you have tested this as per #11576 (comment)

@nathan-az (Contributor, Author) commented Dec 29, 2024

Thanks @DarkLight1337! One limitation I just realised: I don't think SageMaker allows specifying launcher args to the container, only environment variables.

I'm waiting on our AWS specialist to confirm.

As far as I can tell from the vLLM source, engine args can only be specified via CLI args, not env vars (please correct me if this is not true).

I wrote an extension to this PR, which adds a separate entrypoint that parses any environment variables prefixed with SM_VLLM_ into launcher args. I made a PR in my fork (to make it easier to review).

Let me know your thoughts. This is similarly non-invasive, as it's isolated to the SageMaker image and a new entrypoint file. If you're happy with this pattern, I can merge that into this branch and use it for testing directly on SageMaker.
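The env-var-to-flag translation could be sketched roughly as follows (the entrypoint in the linked PR is a shell script; this Python version, including the function name, is illustrative):

```python
PREFIX = "SM_VLLM_"

def env_to_cli_args(environ: dict) -> list:
    """Map SM_VLLM_* environment variables to vLLM launcher flags.

    SageMaker only lets you set environment variables on the container,
    so an entrypoint can bridge them to CLI args at startup, e.g.
    SM_VLLM_MAX_MODEL_LEN=4096 becomes --max-model-len 4096.
    """
    args = []
    for key, value in environ.items():
        if key.startswith(PREFIX):
            # Strip the prefix, lower-case, and swap underscores for dashes.
            flag = "--" + key[len(PREFIX):].lower().replace("_", "-")
            args.extend([flag, value])
    return args
```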

@DarkLight1337 (Member) commented Dec 30, 2024

As far as I can tell from the vLLM source, engine args can only be specified via CLI args, not env vars (please correct me if this is not true).

Yes, this is correct. You can also specify the CLI args by passing a config file, but the file path itself still needs to be passed via CLI args.
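For reference, such a config file is a YAML mapping of CLI option names to values, passed via the --config flag (the option values below are purely illustrative):

```yaml
# config.yaml -- passed as: vllm serve <model> --config config.yaml
port: 8080
max-model-len: 4096
tensor-parallel-size: 1
```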

@DarkLight1337 (Member) commented:

I wrote an extension to this PR, which involves a separate entrypoint which parses any environment variables prefixed with SM_VLLM_ to launcher args. I made a PR (nathan-az#1) to make it easier to review.

This looks good to me!

@DarkLight1337 (Member) commented:

Is this ready to merge now?

@nathan-az (Contributor, Author) commented:

Is this ready to merge now?

Not yet; I just fixed a bug in the Dockerfile. Docker builds take a while, so iteration isn't very rapid.

I should have time to test by Friday, but will ping you when ready @DarkLight1337

@nathan-az (Contributor, Author) commented Jan 1, 2025

@DarkLight1337 I was able to successfully build and deploy to SageMaker with custom model args, and did some basic testing with generate in both the messages and completions formats without issues :)

Should be ready for additional CI/merging.

@DarkLight1337 enabled auto-merge (squash) January 2, 2025 03:25
The github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 2, 2025
@nathan-az (Contributor, Author) commented:

@DarkLight1337 Failed test in CI looks unrelated to this PR.

It may be related to this recently merged PR? #11663

@DarkLight1337 (Member) commented:

I'll get someone to force-merge this.

@simon-mo (Collaborator) left a comment

Please move sagemaker-entrypoint.sh into examples/, so the root directory remains clean (we are in the process of cleaning it up further). Thank you!

auto-merge was automatically disabled January 2, 2025 22:44

Head branch was pushed to by a user without write access

@nathan-az (Contributor, Author) commented:

@simon-mo Done :) Just a note that the Dockerfile will obviously have to be changed if Dockerfiles are moved into their own directory or if the examples directory is restructured/reorganised.

@simon-mo merged commit 68d3780 into vllm-project:main on Jan 2, 2025
9 of 13 checks passed
Labels: ci/build, frontend, ready

Successfully merging this pull request may close these issues:

  • [Feature]: Support for SageMaker-required endpoints

3 participants