Main idea
Hello, I am a user who loves backend.ai and uses it happily.
In Model Serving there is a vLLM Runtime Variant, but there is no way to specify engine arguments (such as `max_num_batched_tokens`) dynamically; they must be baked into the Docker image. Since these values are highly workload-dependent and need to be tuned per deployment, I hope you add a way to launch the model with such arguments specified in the UI.
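For illustration, a minimal sketch of what I mean: the UI (or service config) could expose engine arguments as environment variables that the runtime translates into vLLM's CLI flags. The environment variable names here are hypothetical, not an existing Backend.AI contract; `--max-num-batched-tokens` is a real vLLM server flag.

```python
import os
import subprocess
import sys

# Hypothetical: the UI writes user-supplied engine args into env vars
# (variable names are illustrative only).
cmd = [
    sys.executable, "-m", "vllm.entrypoints.openai.api_server",
    "--model", os.environ.get("MODEL_PATH", "/models/my-model"),
]

# Forward the tunable engine arg only when the user set it, so the
# value no longer has to be embedded in the Docker image.
if "VLLM_MAX_NUM_BATCHED_TOKENS" in os.environ:
    cmd += ["--max-num-batched-tokens", os.environ["VLLM_MAX_NUM_BATCHED_TOKENS"]]

subprocess.run(cmd, check=True)
```

With something like this, operators could experiment with different `max_num_batched_tokens` values from the UI without rebuilding the image each time.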
Alternative ideas
No response
Anything else?
No response