
Support autoscaling for the Ollama model server #123

Closed
nstogner opened this issue Aug 24, 2024 · 1 comment · Fixed by #261
Labels
enhancement New feature or request

Comments

@nstogner (Contributor) commented Aug 24, 2024

Currently KubeAI only supports initial scale-from-zero for the Ollama backend. Autoscaling for vLLM is implemented by scraping metrics directly from the vLLM Pods; ideally we would implement the same approach for Ollama.

Waiting on metrics support (ollama/ollama#3144) to land in the upstream project.
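For context, the scraping-based approach described above could look roughly like the Go sketch below. This is not KubeAI's actual implementation: the Pod URLs, the metric name (`ollama_requests_in_flight`), and the target-per-replica value are all hypothetical placeholders, since Ollama does not yet expose a metrics endpoint (which is exactly what ollama/ollama#3144 is about).

```go
// Minimal sketch of metrics-driven autoscaling: poll each backend Pod's
// Prometheus-style /metrics endpoint, sum an "in-flight requests" gauge,
// and derive a desired replica count from the total load.
package main

import (
	"bufio"
	"fmt"
	"math"
	"net/http"
	"strconv"
	"strings"
)

// scrapeGauge fetches a Prometheus text-format endpoint and returns the value
// of the first sample whose name starts with metricName.
func scrapeGauge(url, metricName string) (float64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip comments and blank lines
		}
		// Sample lines look like: metric_name{labels} 42
		if !strings.HasPrefix(line, metricName) {
			continue
		}
		fields := strings.Fields(line)
		return strconv.ParseFloat(fields[len(fields)-1], 64)
	}
	return 0, fmt.Errorf("metric %q not found at %s", metricName, url)
}

// desiredReplicas sums the in-flight request gauge across all Pods and scales
// so that each replica handles at most targetPerReplica requests.
func desiredReplicas(podURLs []string, metricName string, targetPerReplica float64) int {
	var total float64
	for _, u := range podURLs {
		v, err := scrapeGauge(u, metricName)
		if err != nil {
			continue // an unreachable Pod simply contributes zero load
		}
		total += v
	}
	replicas := int(math.Ceil(total / targetPerReplica))
	if replicas < 1 {
		replicas = 1 // this sketch keeps at least one replica; scale-to-zero is out of scope
	}
	return replicas
}

func main() {
	// Hypothetical Pod endpoints; in a real controller these would come from the Kubernetes API.
	pods := []string{"http://10.0.0.12:8000/metrics", "http://10.0.0.13:8000/metrics"}
	// "ollama_requests_in_flight" is purely illustrative; no such metric exists upstream yet.
	fmt.Println("desired replicas:", desiredReplicas(pods, "ollama_requests_in_flight", 4))
}
```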

@nstogner (Contributor, Author) commented:

Still no movement on the Ollama metrics.
