The infrastructure module creates the GKE cluster and the related resources on which AI applications and workloads are deployed.
- Update the `platform.tfvars` file with the required configuration. Refer to `tfvars_examples` for a sample configuration.
- Run `terraform init` and `terraform apply --var-file=platform.tfvars`.
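The two steps above revolve around `platform.tfvars`. The excerpt below is a minimal, hypothetical sketch of the top-level settings such a file might contain. The variable names come from the Inputs table, but every value is a placeholder, and the file is deliberately incomplete because the module has many more required inputs. Treat `tfvars_examples` as the authoritative sample.

```hcl
# platform.tfvars (excerpt): hypothetical sketch, not a complete file.
# All values are placeholders; see tfvars_examples for real samples.
project_id        = "my-gcp-project" # placeholder; overrides the module default
cluster_name      = "ml-cluster"     # placeholder cluster name
cluster_region    = "us-central1"
create_network    = true
create_cluster    = true
private_cluster   = true  # same as the module default
autopilot_cluster = false # assumption: false selects a Standard cluster
enable_gpu        = true  # same as the module default
enable_tpu        = false # same as the module default
```

Values set in the file passed with `--var-file` take precedence over the defaults declared in the module's variables.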
The GCP project in which the infrastructure resources are created must meet the following prerequisites:
- Billing is enabled.
- GPU quotas are in place.
- The following service APIs are enabled:
  - `container.googleapis.com`
  - `gkehub.googleapis.com`
  - `servicenetworking.googleapis.com`
  - `cloudresourcemanager.googleapis.com`

If the APIs are not already enabled, enable them with the following command:
```shell
gcloud services enable container.googleapis.com gkehub.googleapis.com \
  servicenetworking.googleapis.com cloudresourcemanager.googleapis.com
```
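If you would rather manage the required APIs as code than run `gcloud` by hand, the Google provider's `google_project_service` resource can enable the same four services. This is a hypothetical alternative, not part of this module: the module declares no providers of its own, so the block would have to live in a configuration that configures the `google` provider, and `var.project_id` is assumed to hold the target project ID.

```hcl
# Hypothetical alternative to the gcloud command above.
# Assumes the google provider is configured and var.project_id exists.
locals {
  required_apis = [
    "container.googleapis.com",
    "gkehub.googleapis.com",
    "servicenetworking.googleapis.com",
    "cloudresourcemanager.googleapis.com",
  ]
}

# One google_project_service instance per API.
resource "google_project_service" "required" {
  for_each = toset(local.required_apis)

  project = var.project_id
  service = each.value

  # Leave the API enabled even if this resource is destroyed.
  disable_on_destroy = false
}
```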
The default configuration in `platform.tfvars` creates a private GKE cluster with internal endpoints and registers the cluster with a project-scoped Anthos fleet. Administrative access to the cluster goes through Anthos Connect Gateway.

Clusters with external endpoints can be accessed by configuring authorized networks. The VPC network (10.100.0.0/16) is already configured as a control plane authorized network.
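To allow an additional range to reach the control plane, it would be added to the `master_authorized_networks` input. The sketch below is hypothetical: the variable's full object type is not shown in this README, so the `cidr_block` and `display_name` fields are assumed from the upstream `terraform-google-modules/kubernetes-engine` module, and `203.0.113.0/24` is a documentation-only placeholder range.

```hcl
# platform.tfvars (excerpt): hypothetical master_authorized_networks value.
# Assumption: entries use cidr_block and display_name, as in the upstream
# terraform-google-modules/kubernetes-engine module.
master_authorized_networks = [
  {
    cidr_block   = "10.100.0.0/16" # the VPC range described above
    display_name = "vpc-network"   # placeholder label
  },
  {
    cidr_block   = "203.0.113.0/24" # placeholder admin network
    display_name = "admin-network"  # placeholder label
  },
]
```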
- cluster-name
- region
- project_id
Name | Version |
---|---|
helm | ~> 2.8.0 |
kubernetes | 2.18.1 |
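The version requirements above translate into a `required_providers` block along these lines. This is a sketch: the `hashicorp/helm` and `hashicorp/kubernetes` source addresses are assumed rather than taken from this README.

```hcl
# Sketch of provider constraints matching the Requirements table.
# The hashicorp/* source addresses are an assumption.
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.8.0" # any 2.8.x release (>= 2.8.0, < 2.9.0)
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.18.1" # exact version pin
    }
  }
}
```

The `~>` operator only lets the rightmost specified component increase, so `helm` can pick up patch releases while `kubernetes` stays fixed at 2.18.1.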
No providers.
Name | Source | Version |
---|---|---|
cloud-nat | terraform-google-modules/cloud-nat/google | 5.0.0 |
custom-network | terraform-google-modules/network/google | 8.0.0 |
private-gke-autopilot-cluster | ../modules/gke-autopilot-private-cluster | n/a |
private-gke-standard-cluster | ../modules/gke-standard-private-cluster | n/a |
public-gke-autopilot-cluster | ../modules/gke-autopilot-public-cluster | n/a |
public-gke-standard-cluster | ../modules/gke-standard-public-cluster | n/a |
No resources.
Name | Description | Type | Default | Required |
---|---|---|---|---|
all_node_pools_labels | n/a | map(string) | n/a | yes |
all_node_pools_metadata | n/a | map(string) | n/a | yes |
all_node_pools_oauth_scopes | n/a | list(string) | n/a | yes |
all_node_pools_tags | n/a | list(string) | n/a | yes |
autopilot_cluster | n/a | bool | n/a | yes |
cluster_labels | GKE cluster labels | map | n/a | yes |
cluster_name | n/a | string | n/a | yes |
cluster_region | n/a | string | n/a | yes |
cluster_regional | n/a | bool | n/a | yes |
cluster_zones | n/a | list(string) | n/a | yes |
cpu_pools | n/a | list(map(any)) | n/a | yes |
create_cluster | n/a | bool | n/a | yes |
create_network | n/a | bool | n/a | yes |
deletion_protection | n/a | bool | false | no |
enable_gpu | Set to true to create GPU node pool | bool | true | no |
enable_tpu | Set to true to create TPU node pool | bool | false | no |
gcs_fuse_csi_driver | n/a | bool | false | no |
gpu_pools | n/a | list(map(any)) | n/a | yes |
ip_range_pods | n/a | string | n/a | yes |
ip_range_services | n/a | string | n/a | yes |
kubernetes_version | n/a | string | "latest" | no |
master_authorized_networks | n/a | list(object({ | [] | no |
monitoring_enable_managed_prometheus | n/a | bool | false | no |
network_name | n/a | string | n/a | yes |
network_secondary_ranges | n/a | map(list(object({ range_name = string, ip_cidr_range = string }))) | n/a | yes |
private_cluster | n/a | bool | true | no |
project_id | GCP project ID | string | "umeshkumhar" | no |
region | GCP project region or zone | string | "us-central1" | no |
subnetwork_cidr | n/a | string | n/a | yes |
subnetwork_description | n/a | string | n/a | yes |
subnetwork_name | n/a | string | n/a | yes |
subnetwork_private_access | n/a | string | n/a | yes |
subnetwork_region | n/a | string | n/a | yes |
tpu_pools | n/a | list(map(any)) | n/a | yes |
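The network inputs in the table above fit together roughly as follows. This is a hypothetical sketch: every name and CIDR is a placeholder, and two points are assumptions rather than documented behavior: that the keys of `network_secondary_ranges` are subnetwork names, and that `ip_range_pods` and `ip_range_services` refer to secondary ranges by `range_name`. The shape of `network_secondary_ranges` itself follows its declared type.

```hcl
# platform.tfvars (excerpt): hypothetical network settings.
network_name              = "ml-network"  # placeholder
subnetwork_name           = "ml-subnet"   # placeholder
subnetwork_cidr           = "10.10.0.0/16" # placeholder primary range
subnetwork_region         = "us-central1"
subnetwork_private_access = "true" # declared as string, not bool
subnetwork_description    = "Subnet for the GKE cluster"

# Type: map(list(object({ range_name = string, ip_cidr_range = string })))
# Assumption: the map key is the subnetwork name.
network_secondary_ranges = {
  "ml-subnet" = [
    { range_name = "ml-pods", ip_cidr_range = "10.150.0.0/16" },
    { range_name = "ml-services", ip_cidr_range = "10.160.0.0/16" },
  ]
}

# Assumption: these name the secondary ranges defined above.
ip_range_pods     = "ml-pods"
ip_range_services = "ml-services"
```

The secondary ranges are chosen so they do not overlap the primary subnet range or each other, which GCP requires for secondary ranges on the same subnetwork.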
Name | Description |
---|---|
cluster_name | n/a |
cluster_region | n/a |
project_id | n/a |
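A configuration that consumes this module could use the three outputs to look the cluster up with the Google provider's `google_container_cluster` data source. In this hypothetical sketch, `module.infra` is an assumed instance name for this module, and `cluster_region` is assumed to be a valid `location` (a region for regional clusters).

```hcl
# Hypothetical consumer: look up the cluster created by this module.
# module.infra is an assumed name for an instance of this module.
data "google_container_cluster" "ai" {
  name     = module.infra.cluster_name
  location = module.infra.cluster_region
  project  = module.infra.project_id
}

# Expose the control plane endpoint without printing it in plan output.
output "cluster_endpoint" {
  value     = data.google_container_cluster.ai.endpoint
  sensitive = true
}
```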