ModelAdapters do not dynamically route to new pods #1095

Open
TheCodeWrangler opened this issue May 16, 2025 · 2 comments · May be fixed by #1132

Comments

@TheCodeWrangler

🐛 Describe the bug

When I scale up my deployment that uses a LoRA adapter, all traffic to the LoRA adapter always goes to the pod that was already there when the ModelAdapter was created.

Requests to the base model are still load-balanced across the new backends.

Additionally, after scaling down, if that initial pod is removed I see errors from the system when requesting the LoRA module, BUT `kubectl describe modeladapter lora-name` still shows it as Running (though it is likely pointing to a dead resource).

Steps to Reproduce

  1. Deploy a LoRA adapter with an HPA on the Deployment
  2. Load test to trigger a scale-up
  3. Observe that all running requests land on a single pod
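For reference, the setup in step 1 looks roughly like the following ModelAdapter. All names, labels, and the artifact URL below are placeholders, and the exact field names should be checked against the AIBrix CRD reference:

```yaml
apiVersion: model.aibrix.ai/v1alpha1
kind: ModelAdapter
metadata:
  name: lora-name
spec:
  baseModel: base-model               # placeholder: name of the base model deployment
  podSelector:
    matchLabels:
      model.aibrix.ai/name: base-model
  artifactURL: huggingface://org/adapter   # placeholder adapter artifact location
```

The HPA (or KEDA ScaledObject) targets the base model Deployment, so new pods matching `podSelector` appear after scale-up.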

Expected behavior

Load should be balanced across all running pods.

Environment

2.1.0

@TheCodeWrangler
Author

Thinking about this more: I am using a KEDA autoscaler on my backends. Do I need to use AIBrix autoscalers?

@dittops
Contributor

dittops commented May 23, 2025

I'm facing the same issue with v3.0.0. The controller is not triggering the adapter load for the new pods.
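For what it's worth, the behavior we expect from the controller can be sketched as a simple set-difference reconcile over the pods matching the adapter's selector. All names here are illustrative, not the actual AIBrix implementation:

```python
def reconcile(matching_pods: set[str], loaded_pods: set[str]) -> tuple[set[str], set[str]]:
    """Given the pods currently matching the adapter's podSelector and the
    pods the adapter is already loaded on, return (to_load, to_unload)."""
    to_load = matching_pods - loaded_pods    # new pods from scale-up: load adapter, add routes
    to_unload = loaded_pods - matching_pods  # pods gone after scale-down: drop stale routes
    return to_load, to_unload

# Scale-up: pod-b and pod-c appear, but the adapter is only loaded on pod-a
to_load, to_unload = reconcile({"pod-a", "pod-b", "pod-c"}, {"pod-a"})
# to_load == {"pod-b", "pod-c"}, to_unload == set()

# Scale-down: pod-a is gone but is still marked loaded (the stale Running state)
to_load, to_unload = reconcile({"pod-b"}, {"pod-a", "pod-b"})
# to_unload == {"pod-a"}
```

The reported bug is consistent with this loop never running on pod add/delete events, only at ModelAdapter creation time.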

@dittops dittops linked a pull request May 23, 2025 that will close this issue