Dynamic Engine Argument Configuration for vLLM Runtime in WebUI #3230

Open
hoyajigi opened this issue Dec 9, 2024 · 0 comments

Main idea

Hello, I am a user who enjoys Backend.AI and uses it regularly.

In Model Serving there is a vLLM Runtime Variant, but there is no way to dynamically specify engine arguments (such as max_num_batched_tokens); they must be baked into the Docker image. Since these values are highly workload-dependent and need to be tuned per deployment, I hope you add a way to specify them in the UI when launching the model.
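
For illustration, here is a minimal sketch of how a runtime entrypoint could forward UI-supplied engine arguments to vLLM's OpenAI-compatible server. The `VLLM_ENGINE_ARGS` environment variable and the model-path default are assumptions made up for this sketch, not existing Backend.AI behavior; the vLLM entrypoint and its dashed CLI flags (e.g. `--max-num-batched-tokens`) do exist.

```python
import json
import os
import shlex
import subprocess

# Hypothetical contract: the WebUI serializes the user-specified engine args
# as JSON into VLLM_ENGINE_ARGS (this variable name is an assumption, not an
# existing Backend.AI convention), e.g.:
#   VLLM_ENGINE_ARGS='{"max_num_batched_tokens": 8192, "max_model_len": 4096}'
engine_args = json.loads(os.environ.get("VLLM_ENGINE_ARGS", "{}"))

cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    # The model path below is illustrative; the runtime would substitute the
    # actual mount point of the served model.
    "--model", os.environ.get("MODEL_PATH", "/models/served-model"),
]
for name, value in engine_args.items():
    # vLLM exposes engine args as dashed CLI flags, so max_num_batched_tokens
    # becomes --max-num-batched-tokens.
    cmd += [f"--{name.replace('_', '-')}", str(value)]

print("launching:", shlex.join(cmd))
subprocess.run(cmd, check=True)
```

With this kind of indirection, tuning a parameter like max_num_batched_tokens becomes a redeploy from the UI instead of a Docker image rebuild.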

Alternative ideas

No response

Anything else?

No response
