Main idea
Hello, I am a user who loves backend.ai and uses it happily.
In Model Serving there is a vLLM Runtime Variant, but there is no way to specify engine arguments (such as `max_num_batched_tokens`) dynamically; they must be baked into the Docker image. Since these values are highly workload-dependent and need to be tuned per deployment, I hope you add a way to launch the model with such arguments specified in the UI.
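For illustration, a minimal sketch of what I mean: the UI (or service config) could expose engine arguments as environment variables that the runtime translates into vLLM's CLI flags. The environment variable names here are hypothetical, not an existing Backend.AI contract; `--max-num-batched-tokens` is a real vLLM server flag.

```python
import os
import subprocess
import sys

# Hypothetical: the UI writes user-supplied engine args into env vars
# (variable names are illustrative only).
cmd = [
    sys.executable, "-m", "vllm.entrypoints.openai.api_server",
    "--model", os.environ.get("MODEL_PATH", "/models/my-model"),
]

# Forward the tunable engine arg only when the user set it, so the
# value no longer has to be embedded in the Docker image.
if "VLLM_MAX_NUM_BATCHED_TOKENS" in os.environ:
    cmd += ["--max-num-batched-tokens", os.environ["VLLM_MAX_NUM_BATCHED_TOKENS"]]

subprocess.run(cmd, check=True)
```

With something like this, operators could experiment with different `max_num_batched_tokens` values from the UI without rebuilding the image each time.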
Alternative ideas
No response
Anything else?
No response