
Support autoscaling for the Ollama model server #123

Closed
nstogner opened this issue Aug 24, 2024 · 1 comment · Fixed by #261
Labels
enhancement New feature or request

Comments

@nstogner (Contributor) commented Aug 24, 2024

Currently KubeAI only supports initial scale-from-zero for the Ollama backend. Autoscaling for vLLM is implemented by scraping metrics directly from the vLLM Pods; ideally we would implement the same approach for Ollama.

Waiting on metrics support (ollama/ollama#3144) to land in the upstream project.
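For context, the scraping-based approach described above could look roughly like the Go sketch below. This is not KubeAI's actual implementation: the Pod URLs, the metric name (`ollama_requests_in_flight`), and the target-per-replica value are all hypothetical placeholders, since Ollama does not yet expose a metrics endpoint (which is exactly what ollama/ollama#3144 is about).

```go
// Minimal sketch of metrics-driven autoscaling: poll each backend Pod's
// Prometheus-style /metrics endpoint, sum an "in-flight requests" gauge,
// and derive a desired replica count from the total load.
package main

import (
	"bufio"
	"fmt"
	"math"
	"net/http"
	"strconv"
	"strings"
)

// scrapeGauge fetches a Prometheus text-format endpoint and returns the value
// of the first sample whose name starts with metricName.
func scrapeGauge(url, metricName string) (float64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip comments and blank lines
		}
		// Sample lines look like: metric_name{labels} 42
		if !strings.HasPrefix(line, metricName) {
			continue
		}
		fields := strings.Fields(line)
		return strconv.ParseFloat(fields[len(fields)-1], 64)
	}
	return 0, fmt.Errorf("metric %q not found at %s", metricName, url)
}

// desiredReplicas sums the in-flight request gauge across all Pods and scales
// so that each replica handles at most targetPerReplica requests.
func desiredReplicas(podURLs []string, metricName string, targetPerReplica float64) int {
	var total float64
	for _, u := range podURLs {
		v, err := scrapeGauge(u, metricName)
		if err != nil {
			continue // an unreachable Pod simply contributes zero load
		}
		total += v
	}
	replicas := int(math.Ceil(total / targetPerReplica))
	if replicas < 1 {
		replicas = 1 // this sketch keeps at least one replica; scale-to-zero is out of scope
	}
	return replicas
}

func main() {
	// Hypothetical Pod endpoints; in a real controller these would come from the Kubernetes API.
	pods := []string{"http://10.0.0.12:8000/metrics", "http://10.0.0.13:8000/metrics"}
	// "ollama_requests_in_flight" is purely illustrative; no such metric exists upstream yet.
	fmt.Println("desired replicas:", desiredReplicas(pods, "ollama_requests_in_flight", 4))
}
```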

@nstogner (Contributor, Author) commented:

Still no movement on the Ollama metrics.
