When I scale up my deployment that uses a LoRA adapter, all traffic to the LoRA adapter goes to the pod that existed when the ModelAdapter was created.
Requests to the base model are still load balanced across the new backends.
Additionally, after scaling down, if the initial pod is removed I see errors from the system when requesting the LoRA module, BUT `kubectl describe modeladapter lora-name` still shows it as Running (though it is likely pointing to a dead resource).
Steps to Reproduce
Deploy a LoRA adapter with an HPA on the deployment
Run a load test to trigger a scale-up
Observe that all running requests land on a single pod
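The steps above can be sketched roughly as follows. This is an illustrative sequence, not the exact commands used; the deployment name `my-deployment`, the label selector, and the HPA thresholds are placeholders, and `lora-name` is the ModelAdapter from the report:

```shell
# Assume a base-model Deployment and a ModelAdapter named lora-name already exist.

# 1. Attach an HPA so the deployment scales under load (thresholds are arbitrary examples):
kubectl autoscale deployment my-deployment --min=1 --max=4 --cpu-percent=50

# 2. Drive traffic at the LoRA adapter endpoint until the HPA scales up,
#    then confirm new pods are Running:
kubectl get hpa my-deployment
kubectl get pods -l app=my-deployment

# 3. Check where requests actually go: per-pod running-request metrics show
#    all LoRA traffic pinned to the original pod, while base-model traffic
#    is spread across all pods.
kubectl describe modeladapter lora-name
```

Step 3 also reproduces the stale-status symptom: after scaling down past the original pod, the describe output still reports Running.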
Expected behavior
Load should be balanced across all running pods
Environment
2.1.0