When I scale up my deployment that uses a LoRA adapter, all traffic to the LoRA adapter goes to the pod that existed when the ModelAdapter was created.
Requests to the base model are still load balanced across the new backends.
Additionally, after scaling down, if the initial pod is removed I see errors from the system when requesting the LoRA module, BUT `kubectl describe modeladapter lora-name` still shows it as Running (though it is likely pointing to a dead resource).
Steps to Reproduce
Deploy a LoRA adapter with an HPA on the deployment
Run a load test to trigger a scale-up
Observe that all running requests land on a single pod
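The steps above can be sketched roughly as follows. This is an illustrative sequence, not the exact commands used; the deployment name `my-deployment`, the label selector, and the HPA thresholds are placeholders, and `lora-name` is the ModelAdapter from the report:

```shell
# Assume a base-model Deployment and a ModelAdapter named lora-name already exist.

# 1. Attach an HPA so the deployment scales under load (thresholds are arbitrary examples):
kubectl autoscale deployment my-deployment --min=1 --max=4 --cpu-percent=50

# 2. Drive traffic at the LoRA adapter endpoint until the HPA scales up,
#    then confirm new pods are Running:
kubectl get hpa my-deployment
kubectl get pods -l app=my-deployment

# 3. Check where requests actually go: per-pod running-request metrics show
#    all LoRA traffic pinned to the original pod, while base-model traffic
#    is spread across all pods.
kubectl describe modeladapter lora-name
```

Step 3 also reproduces the stale-status symptom: after scaling down past the original pod, the describe output still reports Running.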
Expected behavior
Load should be balanced across all running pods
Environment
2.1.0