RAG deployment sometimes needs a 3rd g2-standard-24 instance #572

Open
andrewsykim opened this issue Apr 5, 2024 · 0 comments

Comments

@andrewsykim
Collaborator

The RAG deployment creates a CPU node pool of size 2 and a GPU node pool of size 2, both with autoscaling enabled. The GPU node pool uses g2-standard-24 machines with 2 L4 GPUs each, and the CPU node pool uses n1-standard-16 machines.

With the current default configuration, the cluster autoscaler sometimes scales the GPU node pool up to a 3rd node to fit everything in the RAG deployment. The default node pool settings should work without scaling up additional nodes. GPUs are expensive, so we should avoid adding a 3rd g2-standard-24 when possible. GPUs are also scarce, so needing a 3rd GPU node increases the failure rate of the RAG deployment.

I haven't thoroughly investigated why a 3rd g2-standard-24 instance is needed, but I suspect it's because various components of the stack have had their CPU/memory limits increased over the past several months. We should consider reducing CPU/memory requests wherever possible so we can run everything on the default node counts.
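To make the suspected cause concrete, here is a rough capacity check. It is a hypothetical Python sketch, not code from this repo: the per-pod requests are made up for illustration, and it uses raw machine sizes (g2-standard-24: 24 vCPUs, 96 GB, 2 L4 GPUs; n1-standard-16: 16 vCPUs, 60 GB) rather than the smaller Kubernetes allocatable values. It computes a lower bound on nodes per pool and shows how CPU requests alone can push the GPU pool past 2 nodes even when the GPU count would fit.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Resources:
    cpu: float        # vCPUs
    memory_gb: float  # memory in GB
    gpu: int = 0      # number of GPUs


# Raw machine sizes. Kubernetes allocatable is somewhat lower because GKE
# reserves CPU and memory for the kubelet and system daemons.
G2_STANDARD_24 = Resources(cpu=24, memory_gb=96, gpu=2)  # 2x NVIDIA L4
N1_STANDARD_16 = Resources(cpu=16, memory_gb=60)


def min_nodes(requests: Sequence[Resources], node: Resources) -> int:
    """Lower bound on the nodes needed to hold `requests`.

    Ignores bin-packing fragmentation, DaemonSets and system reservations,
    so the real count can only be equal or higher.
    """
    total_cpu = sum(r.cpu for r in requests)
    total_mem = sum(r.memory_gb for r in requests)
    total_gpu = sum(r.gpu for r in requests)
    bounds = [
        math.ceil(total_cpu / node.cpu),
        math.ceil(total_mem / node.memory_gb),
    ]
    if total_gpu:
        if node.gpu == 0:
            raise ValueError("GPU requests cannot be placed on a CPU-only node")
        bounds.append(math.ceil(total_gpu / node.gpu))
    return max(bounds)


if __name__ == "__main__":
    # Hypothetical GPU workloads: 4 pods, each asking for 1 L4 GPU.
    heavy = [Resources(cpu=13, memory_gb=40, gpu=1)] * 4    # 52 vCPUs in total
    trimmed = [Resources(cpu=11, memory_gb=40, gpu=1)] * 4  # 44 vCPUs in total
    print("heavy requests:", min_nodes(heavy, G2_STANDARD_24))      # 3
    print("trimmed requests:", min_nodes(trimmed, G2_STANDARD_24))  # 2
```

In the "heavy" case the 4 GPUs would fit on 2 nodes, but 52 requested vCPUs exceed the 48 available on 2 machines, so the GPU pool would need at least 3 nodes. Trimming each pod's CPU request to 11 brings the bound back to 2.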

This will likely also be fixed when we default all deployments to Autopilot, but even then we should try to reduce resource requests wherever possible.
