Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to Cloud SQL & cluster creation #328

Merged
merged 1 commit into from
Mar 13, 2024
Merged

Conversation

imreddy13
Copy link
Collaborator

@imreddy13 imreddy13 commented Mar 11, 2024

  1. Pass already constructed cloud SQL instance connection name to Ray workers so users don't have to set it in the notebook
  2. Set Cloud SQL region in frontend
  3. Fix bug with cluster creation where random zones without GPU availability were being selected
  4. Reduce SA name length (gets truncated when IM deployment prefix is added for Marketplace)

applications/rag/main.tf Outdated Show resolved Hide resolved
@imreddy13 imreddy13 changed the title Pass cloud SQL instance connection name to Ray workers Improvements to Cloud SQL & cluster creation Mar 13, 2024
@imreddy13
Copy link
Collaborator Author

/gcbrun

@imreddy13
Copy link
Collaborator Author

/gcbrun

zone = length(split("-", var.cluster_location)) > 2 ? split(",", var.cluster_location) : []
# zone needs to be set even for regional clusters, otherwise this module picks random zones that don't have GPU availability:
# https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/af354afdf13b336014cefbfe8f848e52c17d4415/main.tf#L46
zone = length(split("-", var.cluster_location)) > 2 ? split(",", var.cluster_location) : split(",", local.gpu_l4_t4_location[local.region])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, good catch. It's kind of unexpected that it does this

@imreddy13
Copy link
Collaborator Author

bypassing pr-review-trigger after updating 2 files with tf fmt

passed on previous iteration: https://github.com/GoogleCloudPlatform/ai-on-gke/pull/328/checks?check_run_id=22591229809

@@ -62,7 +62,9 @@ locals {
subnetwork_name = var.create_network ? module.custom-network[0].subnets_names[0] : var.subnetwork_name
region = length(split("-", var.cluster_location)) == 2 ? var.cluster_location : ""
regional = local.region != "" ? true : false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case regional should be false?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so complicated.
I think region = var.cluster_location if it's a regional cluster.
so local.region != "" is true for regional clusters. so this is correct.

@umeshkumhar this naming is super confusing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify.. we take cluster_location input only.

Then we check if user provided region or zone by splitting on hypen. Once done, we check if region is set that means cluster will be regional, so make it true, else false.

1) Pass already constructed cloud SQL instance connection name to Ray workers so users don't have to set it in the notebook
2) Set Cloud SQL region in frontend
3) Fix bug with cluster creation where random zones without GPU availability were being selected
4) Reduce SA name length (gets truncated when IM deployment prefix is added for Marketplace)
@imreddy13 imreddy13 merged commit 169a45b into main Mar 13, 2024
5 of 6 checks passed
annapendleton pushed a commit to annapendleton/ai-on-gke that referenced this pull request Mar 26, 2024
Improvements to Cloud SQL & cluster creation:

1) Pass already constructed cloud SQL instance connection name to Ray workers so users don't have to set it in the notebook
2) Set Cloud SQL region in frontend
3) Fix bug with cluster creation where random zones without GPU availability were being selected
4) Reduce SA name length (gets truncated when IM deployment prefix is added for Marketplace)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants