Merge branch 'dev' into 'master'
6.3.1

See merge request SchedMD/slurm-gcp!82
jvilarru committed Jan 9, 2024
2 parents a308f7d + 464a613 commit 96f057e
Showing 7 changed files with 30 additions and 3 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,11 @@

 All notable changes to this project will be documented in this file.

+## \[6.3.1\]
+
+- Add reserved property for nodeset_tpu
+- update lustre repository url
+
 ## \[6.3.0\]

 - Upgrade installed Slurm to 23.02.7
2 changes: 1 addition & 1 deletion ansible/roles/lustre/vars/redhat-8.yml
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-lustre_repo_url: https://downloads.whamcloud.com/public/lustre/latest-release/el8.8/client
+lustre_repo_url: https://downloads.whamcloud.com/public/lustre/latest-release/el8.9/client

 lustre_packages:
 - lustre-client
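The only change in this file moves the Whamcloud client repository from the `el8.8` path to the `el8.9` path, so the URL tracks the RHEL 8 minor release. A minimal sketch, assuming the path always has the shape `latest-release/el<major>.<minor>/client` (inferred from the two URLs in this hunk, not from Whamcloud documentation); `lustre_client_repo_url` is a hypothetical helper, not part of slurm-gcp.

```python
import re

# Base path copied from the two URLs in this hunk.
BASE = "https://downloads.whamcloud.com/public/lustre/latest-release"


def lustre_client_repo_url(el_release: str) -> str:
    """Build the client repo URL for an EL release such as '8.9' (hypothetical helper)."""
    if not re.fullmatch(r"\d+\.\d+", el_release):
        raise ValueError(f"expected '<major>.<minor>', got {el_release!r}")
    return f"{BASE}/el{el_release}/client"


if __name__ == "__main__":
    print(lustre_client_repo_url("8.8"))  # value before this commit
    print(lustre_client_repo_url("8.9"))  # value after this commit
```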
5 changes: 5 additions & 0 deletions scripts/util.py
@@ -1148,6 +1148,10 @@ def enable_public_ip(self):
     def preemptible(self):
         return self._nodeset.preemptible

+    @property
+    def reserved(self):
+        return self._nodeset.reserved
+
     @property
     def service_account(self):
         return self._nodeset.service_account
@@ -1277,6 +1281,7 @@ def create_node(self, nodename):
         node.service_account.email = self.nodeset.service_account.email
         node.service_account.scope = self.nodeset.service_account.scopes
         node.scheduling_config.preemptible = self.preemptible
+        node.scheduling_config.reserved = self.reserved
         if self.nodeset.network:
             node.network_config.network = self.nodeset.network
         if self.nodeset.subnetwork:
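The two util.py hunks follow the existing `preemptible` pattern: a read-only property forwards `reserved` from the nodeset settings, and `create_node` copies it into the node's scheduling config. The sketch below mimics that flow with plain dataclasses standing in for the real nodeset and TPU node objects; `NodesetConfig`, `SchedulingConfig`, `TPUNode`, and `TPUSketch` are hypothetical names, not slurm-gcp or Cloud TPU client classes.

```python
from dataclasses import dataclass, field


@dataclass
class NodesetConfig:
    # Hypothetical stand-in for the nodeset_tpu settings that util.py reads.
    preemptible: bool = False
    reserved: bool = False


@dataclass
class SchedulingConfig:
    preemptible: bool = False
    reserved: bool = False


@dataclass
class TPUNode:
    # Hypothetical stand-in for the TPU node object built in create_node.
    scheduling_config: SchedulingConfig = field(default_factory=SchedulingConfig)


class TPUSketch:
    def __init__(self, nodeset: NodesetConfig):
        self._nodeset = nodeset

    @property
    def preemptible(self) -> bool:
        return self._nodeset.preemptible

    @property
    def reserved(self) -> bool:
        # Same delegation pattern as the property added in this commit.
        return self._nodeset.reserved

    def create_node(self) -> TPUNode:
        node = TPUNode()
        # Both flags are copied side by side, as in the create_node hunk.
        node.scheduling_config.preemptible = self.preemptible
        node.scheduling_config.reserved = self.reserved
        return node


if __name__ == "__main__":
    node = TPUSketch(NodesetConfig(reserved=True)).create_node()
    print(node.scheduling_config)
    # SchedulingConfig(preemptible=False, reserved=True)
```

Keeping `reserved` behind a property, like `preemptible`, means `create_node` never reads the nodeset dict directly, so any future defaulting logic lives in one place.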
@@ -59,6 +59,7 @@ No modules.
 | <a name="input_preemptible"></a> [preemptible](#input\_preemptible) | Specify whether TPU-vms in this nodeset are preemtible, see https://cloud.google.com/tpu/docs/preemptible for details. | `bool` | `false` | no |
 | <a name="input_preserve_tpu"></a> [preserve\_tpu](#input\_preserve\_tpu) | Specify whether TPU-vms will get preserve on suspend, if set to true, on suspend vm is stopped, on false it gets deleted | `bool` | `true` | no |
 | <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project ID to create resources in. | `string` | n/a | yes |
+| <a name="input_reserved"></a> [reserved](#input\_reserved) | Specify whether TPU-vms in this nodeset are created under a reservation. | `bool` | `false` | no |
 | <a name="input_service_account"></a> [service\_account](#input\_service\_account) | Service account to attach to the TPU-vm.<br>If none is given, the default service account and scopes will be used. | <pre>object({<br> email = string<br> scopes = set(string)<br> })</pre> | `null` | no |
 | <a name="input_subnetwork"></a> [subnetwork](#input\_subnetwork) | The name of the subnetwork to attach the TPU-vm of this nodeset to. | `string` | `null` | no |
 | <a name="input_tf_version"></a> [tf\_version](#input\_tf\_version) | Nodeset Tensorflow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | n/a | yes |
4 changes: 4 additions & 0 deletions terraform/slurm_cluster/modules/slurm_nodeset_tpu/main.tf
@@ -111,6 +111,10 @@ resource "null_resource" "nodeset_tpu" {
       condition     = sum([var.node_count_dynamic_max, var.node_count_static]) > 0
       error_message = "Sum of node_count_dynamic_max and node_count_static must be > 0."
     }
+    precondition {
+      condition     = !(var.preemptible && var.reserved)
+      error_message = "Nodeset cannot be preemptible and reserved at the same time."
+    }
     precondition {
       condition     = !(var.subnetwork == null && !var.enable_public_ip)
       error_message = "Using the default subnetwork for the TPU nodeset requires enable_public_ip set to true."
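The new `precondition` rejects a nodeset that sets both `preemptible` and `reserved`, next to the existing node-count and subnetwork checks. As a rough sketch, the three conditions visible in this hunk can be mirrored in Python for a pre-flight check of a plain settings dict; `validate_nodeset_tpu` and the dict layout are hypothetical, the fallback values passed to `cfg.get` are assumptions (not the module's declared defaults), and only the error messages are copied from main.tf.

```python
from typing import Any, List, Mapping


def validate_nodeset_tpu(cfg: Mapping[str, Any]) -> List[str]:
    """Return the messages the visible preconditions would raise (hypothetical helper)."""
    errors = []
    # Fallbacks below (0, False, None) are assumptions for this sketch.
    if cfg.get("node_count_dynamic_max", 0) + cfg.get("node_count_static", 0) <= 0:
        errors.append("Sum of node_count_dynamic_max and node_count_static must be > 0.")
    # New in this commit: preemptible and reserved are mutually exclusive.
    if cfg.get("preemptible", False) and cfg.get("reserved", False):
        errors.append("Nodeset cannot be preemptible and reserved at the same time.")
    if cfg.get("subnetwork") is None and not cfg.get("enable_public_ip", False):
        errors.append(
            "Using the default subnetwork for the TPU nodeset requires "
            "enable_public_ip set to true."
        )
    return errors


if __name__ == "__main__":
    cfg = {
        "node_count_static": 1,
        "preemptible": True,
        "reserved": True,
        "subnetwork": "tpu-subnet",  # hypothetical subnetwork name
    }
    for msg in validate_nodeset_tpu(cfg):
        print(msg)
    # Nodeset cannot be preemptible and reserved at the same time.
```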
@@ -77,6 +77,12 @@ variable "preemptible" {
   default     = false
 }

+variable "reserved" {
+  description = "Specify whether TPU-vms in this nodeset are created under a reservation."
+  type        = bool
+  default     = false
+}
+
 variable "preserve_tpu" {
   description = "Specify whether TPU-vms will get preserve on suspend, if set to true, on suspend vm is stopped, on false it gets deleted"
   type        = bool
10 changes: 8 additions & 2 deletions terraform/slurm_cluster/modules/slurm_nodeset_tpu/versions.tf
@@ -18,7 +18,13 @@ terraform {
   required_version = "~> 1.2"

   required_providers {
-    google = ">= 3.53, < 5.0"
-    null   = "~> 3.0"
+    google = {
+      source  = "hashicorp/google"
+      version = ">= 3.53, < 5.0"
+    }
+    null = {
+      source  = "hashicorp/null"
+      version = "~> 3.0"
+    }
   }
 }
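This hunk moves the provider requirements from the shorthand string form to the object form with an explicit `source`, the syntax Terraform has recommended since 0.13; the version constraints themselves are unchanged. To make those constraints concrete, here is a hypothetical `satisfies` helper that approximates Terraform's documented rules for `>=`, `<`, `=` and the pessimistic `~>` operator (only the rightmost given component may increase). It is a sketch, not Terraform's implementation, and it ignores pre-release suffixes.

```python
import operator
import re
from typing import List, Tuple

# Plain comparison operators a clause may use (besides "~>").
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "!=": operator.ne,
}
CLAUSE = re.compile(r"\s*(~>|>=|<=|!=|>|<|=)?\s*(\d+(?:\.\d+)*)\s*")


def _parts(version: str) -> List[int]:
    return [int(p) for p in version.split(".")]


def _padded(parts: List[int], width: int = 3) -> Tuple[int, ...]:
    # Pad with zeros so "5.0" and "5.0.0" compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, constraint: str) -> bool:
    """Return True if a plain X.Y.Z version meets every comma-separated clause."""
    v = _padded(_parts(version))
    for clause in constraint.split(","):
        m = CLAUSE.fullmatch(clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = m.group(1) or "=", _parts(m.group(2))
        if op == "~>":
            if len(bound) < 2:
                raise ValueError("this sketch needs at least MAJOR.MINOR after ~>")
            # "~> 3.0" means ">= 3.0, < 4.0"; "~> 1.2.3" means ">= 1.2.3, < 1.3.0".
            upper = bound[:-2] + [bound[-2] + 1]
            if not (_padded(bound) <= v < _padded(upper)):
                return False
        elif not OPS[op](v, _padded(bound)):
            return False
    return True


if __name__ == "__main__":
    for v in ("3.53.0", "4.84.0", "5.0.0"):
        print(v, satisfies(v, ">= 3.53, < 5.0"))
    # 3.53.0 True
    # 4.84.0 True
    # 5.0.0 False
```

Under these rules, `required_version = "~> 1.2"` accepts any 1.x release from 1.2 onward, and `null` with `"~> 3.0"` accepts any 3.x release.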
