Initializing Azure instances is very slow #328
+1. I had this slow initialization issue too. I might be missing something, but why is …
Because ray-autoscaler is using it. For GCP and AWS, their CLIs are already installed.
Oh, I see. So it's used on the head node to provision further resources for the worker nodes? Is that correct?
Hmm, it is mostly used by the ray autoscaler for monitoring.
I tried revisiting this issue briefly. For a CPU node:
I hacked the template to use the same resource group per region, but got no speedup. So the root cause seems to be Azure's Python SDK being much slower than their console. We can take a deeper look. Typical output:
It might be good to verify this hypothesis by using their pure Python SDK (without ray autoscaler) to provision a VM and measuring the time. Here's an example.
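A minimal sketch of such a measurement, assuming the track-2 Azure SDK packages (azure-identity, azure-mgmt-compute) are installed, AZURE_SUBSCRIPTION_ID is set, and a complete VM definition (including an existing NIC) is supplied by the caller. The helper names `timed` and `provision_vm_with_sdk` are hypothetical, not from ray or this project.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the `with` block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the wrapped call raises, so failed attempts are timed too.
        results[label] = time.perf_counter() - start


def provision_vm_with_sdk(resource_group, vm_name, vm_parameters):
    """Create one VM with the Azure Python SDK directly, without ray autoscaler.

    Assumptions (not from the thread): azure-identity and azure-mgmt-compute are
    installed, AZURE_SUBSCRIPTION_ID is set, the resource group exists, and
    vm_parameters is a complete VM definition that references an existing NIC.
    """
    # Imported lazily so the timing helper works without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(
        DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
    )
    poller = client.virtual_machines.begin_create_or_update(
        resource_group, vm_name, vm_parameters
    )
    # Block until Azure reports the long-running create operation as finished.
    return poller.result()
```

Usage would look like `results = {}` followed by `with timed("sdk_create_vm", results): provision_vm_with_sdk("my-rg", "timing-test-vm", params)` (all three names are placeholders). If `results["sdk_create_vm"]` is close to the console time, the slowdown is more likely elsewhere; if it is close to the autoscaler time, the SDK hypothesis holds.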
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
We should keep this one open unless we are satisfied with the speed on Azure.
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.
Can this be reopened? It is still very slow today. For reference, a simple vLLM setup takes 18 minutes.
Related #3695
This issue should be mitigated by #3704. Closing for now.
It takes me 14 minutes to spin up a cluster with 2 CPU nodes. The most time-consuming part is installing pip packages, especially azure-cli. This may be addressed by releasing images with azure-cli pre-installed.
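To illustrate why a pre-installed image would help, here is a hypothetical setup-step helper (the names `needs_azure_cli_install` and `ensure_azure_cli` are made up for illustration, not taken from ray or this project). It assumes that an `az` executable on PATH means azure-cli is already present, and only runs the slow pip install otherwise.

```python
import shutil
import subprocess
import sys


def needs_azure_cli_install(which=shutil.which):
    """Return True if no `az` executable is on PATH (i.e. the image lacks azure-cli)."""
    return which("az") is None


def ensure_azure_cli(which=shutil.which):
    """Install azure-cli with pip only when it is missing; return True if installed now."""
    if not needs_azure_cli_install(which):
        # Image already ships azure-cli: skip the slow pip step entirely.
        return False
    # Hypothetical fallback for images without azure-cli: install into this interpreter.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "azure-cli"], check=True
    )
    return True
```

The `which` parameter is injectable only so the check can be exercised without touching PATH; with a pre-installed image, `ensure_azure_cli()` returns immediately and the pip step costs nothing.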