-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to Cloud SQL & cluster creation #328
Conversation
imreddy13
commented
Mar 11, 2024
•
edited
Loading
edited
- Pass already constructed cloud SQL instance connection name to Ray workers so users don't have to set it in the notebook
- Set Cloud SQL region in frontend
- Fix bug with cluster creation where random zones without GPU availability were being selected
- Reduce SA name length (gets truncated when IM deployment prefix is added for Marketplace)
1fabd8e
to
bc016a0
Compare
bc016a0
to
3d56c3e
Compare
/gcbrun |
3d56c3e
to
00faf09
Compare
/gcbrun |
infrastructure/main.tf
Outdated
zone = length(split("-", var.cluster_location)) > 2 ? split(",", var.cluster_location) : [] | ||
# zone needs to be set even for regional clusters, otherwise this module picks random zones that don't have GPU availability: | ||
# https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/af354afdf13b336014cefbfe8f848e52c17d4415/main.tf#L46 | ||
zone = length(split("-", var.cluster_location)) > 2 ? split(",", var.cluster_location) : split(",", local.gpu_l4_t4_location[local.region]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, good catch. It's kind of unexpected that it does this
00faf09
to
227cdfc
Compare
bypassing pr-review-trigger after updating 2 files with tf fmt passed on previous iteration: https://github.com/GoogleCloudPlatform/ai-on-gke/pull/328/checks?check_run_id=22591229809 |
@@ -62,7 +62,9 @@ locals { | |||
subnetwork_name = var.create_network ? module.custom-network[0].subnets_names[0] : var.subnetwork_name | |||
region = length(split("-", var.cluster_location)) == 2 ? var.cluster_location : "" | |||
regional = local.region != "" ? true : false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case regional should be false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so complicated.
I think region = var.cluster_location if it's a regional cluster.
so local.region != "" is true for regional clusters. so this is correct.
@umeshkumhar this naming is super confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify.. we take cluster_location input only.
Then we check if user provided region or zone by splitting on hypen. Once done, we check if region is set that means cluster will be regional, so make it true, else false.
1) Pass already constructed cloud SQL instance connection name to Ray workers so users don't have to set it in the notebook 2) Set Cloud SQL region in frontend 3) Fix bug with cluster creation where random zones without GPU availability were being selected 4) Reduce SA name length (gets truncated when IM deployment prefix is added for Marketplace)
227cdfc
to
b556c85
Compare
Improvements to Cloud SQL & cluster creation: 1) Pass already constructed cloud SQL instance connection name to Ray workers so users don't have to set it in the notebook 2) Set Cloud SQL region in frontend 3) Fix bug with cluster creation where random zones without GPU availability were being selected 4) Reduce SA name length (gets truncated when IM deployment prefix is added for Marketplace)