TPU - Add reserved property for nodeset_tpu
jvilarru committed Jan 8, 2024
1 parent a308f7d commit 93292b1
Showing 6 changed files with 28 additions and 2 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -2,6 +2,10 @@

 All notable changes to this project will be documented in this file.

+## \[6.3.1\]
+
+- Add reserved property for nodeset_tpu
+
 ## \[6.3.0\]

 - Upgrade installed Slurm to 23.02.7
5 changes: 5 additions & 0 deletions scripts/util.py
@@ -1148,6 +1148,10 @@ def enable_public_ip(self):
     def preemptible(self):
         return self._nodeset.preemptible

+    @property
+    def reserved(self):
+        return self._nodeset.reserved
+
     @property
     def service_account(self):
         return self._nodeset.service_account
@@ -1277,6 +1281,7 @@ def create_node(self, nodename):
 node.service_account.email = self.nodeset.service_account.email
 node.service_account.scope = self.nodeset.service_account.scopes
 node.scheduling_config.preemptible = self.preemptible
+node.scheduling_config.reserved = self.reserved
 if self.nodeset.network:
 node.network_config.network = self.nodeset.network
 if self.nodeset.subnetwork:
Original file line number Diff line number Diff line change
@@ -59,6 +59,7 @@ No modules.
| <a name="input_preemptible"></a> [preemptible](#input\_preemptible) | Specify whether TPU-vms in this nodeset are preemptible, see https://cloud.google.com/tpu/docs/preemptible for details. | `bool` | `false` | no |
| <a name="input_preserve_tpu"></a> [preserve\_tpu](#input\_preserve\_tpu) | Specify whether TPU-vms are preserved on suspend: if true, the VM is stopped on suspend; if false, it is deleted. | `bool` | `true` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project ID to create resources in. | `string` | n/a | yes |
+| <a name="input_reserved"></a> [reserved](#input\_reserved) | Specify whether TPU-vms in this nodeset are created under a reservation. | `bool` | `false` | no |
| <a name="input_service_account"></a> [service\_account](#input\_service\_account) | Service account to attach to the TPU-vm.<br>If none is given, the default service account and scopes will be used. | <pre>object({<br> email = string<br> scopes = set(string)<br> })</pre> | `null` | no |
| <a name="input_subnetwork"></a> [subnetwork](#input\_subnetwork) | The name of the subnetwork to attach the TPU-vm of this nodeset to. | `string` | `null` | no |
| <a name="input_tf_version"></a> [tf\_version](#input\_tf\_version) | Nodeset Tensorflow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | n/a | yes |
4 changes: 4 additions & 0 deletions terraform/slurm_cluster/modules/slurm_nodeset_tpu/main.tf
@@ -111,6 +111,10 @@ resource "null_resource" "nodeset_tpu" {
       condition     = sum([var.node_count_dynamic_max, var.node_count_static]) > 0
       error_message = "Sum of node_count_dynamic_max and node_count_static must be > 0."
     }
+    precondition {
+      condition     = !(var.preemptible && var.reserved)
+      error_message = "Nodeset cannot be preemptible and reserved at the same time."
+    }
     precondition {
       condition     = !(var.subnetwork == null && !var.enable_public_ip)
       error_message = "Using the default subnetwork for the TPU nodeset requires enable_public_ip set to true."
Original file line number Diff line number Diff line change
@@ -77,6 +77,12 @@ variable "preemptible" {
   default     = false
 }

+variable "reserved" {
+  description = "Specify whether TPU-vms in this nodeset are created under a reservation."
+  type        = bool
+  default     = false
+}
+
 variable "preserve_tpu" {
   description = "Specify whether TPU-vms are preserved on suspend: if true, the VM is stopped on suspend; if false, it is deleted."
   type        = bool
10 changes: 8 additions & 2 deletions terraform/slurm_cluster/modules/slurm_nodeset_tpu/versions.tf
@@ -18,7 +18,13 @@ terraform {
   required_version = "~> 1.2"

   required_providers {
-    google = ">= 3.53, < 5.0"
-    null   = "~> 3.0"
+    google = {
+      source  = "hashicorp/google"
+      version = ">= 3.53, < 5.0"
+    }
+    null = {
+      source  = "hashicorp/null"
+      version = "~> 3.0"
+    }
   }
 }
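
Reviewer note: the sketch below illustrates how the new `reserved` flag travels from the nodeset configuration into the node's scheduling config, and how the Terraform precondition's mutual-exclusion rule would look if mirrored in Python. It is only a sketch under stated assumptions: `SchedulingConfig`, `Node`, `TPUSketch`, and `validate_nodeset` are hypothetical stand-ins, not the real classes in `scripts/util.py` or the Cloud TPU client library, and this commit enforces the rule in Terraform only.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class SchedulingConfig:
    """Hypothetical stand-in for the node's scheduling_config object."""
    preemptible: bool = False
    reserved: bool = False


@dataclass
class Node:
    """Hypothetical stand-in for the TPU node built in create_node."""
    scheduling_config: SchedulingConfig = field(default_factory=SchedulingConfig)


def validate_nodeset(nodeset):
    """Assumed Python analogue of the precondition added in main.tf."""
    if nodeset.preemptible and nodeset.reserved:
        raise ValueError(
            "Nodeset cannot be preemptible and reserved at the same time."
        )


class TPUSketch:
    """Trimmed-down illustration of the wrapper touched by this commit."""

    def __init__(self, nodeset):
        self._nodeset = nodeset

    @property
    def preemptible(self):
        return self._nodeset.preemptible

    @property
    def reserved(self):
        # Same pass-through pattern as the property added in util.py.
        return self._nodeset.reserved

    def create_node(self):
        node = Node()
        node.scheduling_config.preemptible = self.preemptible
        node.scheduling_config.reserved = self.reserved
        return node


# Usage: a reserved, non-preemptible nodeset passes validation.
nodeset = SimpleNamespace(preemptible=False, reserved=True)
validate_nodeset(nodeset)
node = TPUSketch(nodeset).create_node()
print(node.scheduling_config)
```

Keeping the check separate from the wrapper class mirrors the commit's split: validation happens at configuration time, while `util.py` simply forwards the value.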
