[GCP] Fix GCP labels for TPU (#3652)
* [GCP] initial take for dws support with migs

* fix lint errors

* dependency and format fix

* refactor mig instance creation

* fix

* remove unnecessary instance creation code for mig

* Fix deletion

* Fix instance template logic

* Restart

* format

* format

* move to REST APIs instead of python APIs

* add multi-node back

* Fix multi-node

* Avoid spot

* format

* format

* fix scheduling

* fix cancel

* Add smoke test

* revert some changes

* fix smoke

* Fix

* fix

* Fix smoke

* [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5)

* Fix config fields

* fix cancel

* Add logging

* remove useless code

* Fix labels for GCP TPU

* format

* fix key

---------

Co-authored-by: Gurcan Gercek <[email protected]>
Co-authored-by: Zhanghao Wu <[email protected]>
Co-authored-by: Gurcan Gercek <[email protected]>
4 people committed Aug 23, 2024
1 parent 3cd26e2 commit 385232f
Showing 2 changed files with 5 additions and 1 deletion.
4 changes: 4 additions & 0 deletions sky/clouds/gcp.py
@@ -511,6 +511,10 @@ def make_deploy_resources_variables(
             ('gcp', 'managed_instance_group'), None)
         use_mig = managed_instance_group_config is not None
         resources_vars['gcp_use_managed_instance_group'] = use_mig
+        # Convert boolean to 0 or 1 in string, as GCP does not support boolean
+        # value in labels for TPU VM APIs.
+        resources_vars['gcp_use_managed_instance_group_value'] = str(
+            int(use_mig))
         if use_mig:
             resources_vars.update(managed_instance_group_config)
         return resources_vars
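
The conversion added in this hunk can be sketched in isolation. The helper name `mig_label_value` and the sample config are hypothetical and not part of SkyPilot; the body only mirrors the `str(int(use_mig))` expression from the diff.

```python
from typing import Any, Dict, Optional


def mig_label_value(mig_config: Optional[Dict[str, Any]]) -> str:
    """Return '1' if a managed instance group config is present, else '0'.

    Hypothetical helper for illustration only; it mirrors the
    `str(int(use_mig))` expression in the diff above.
    """
    # The diff treats any config that is not None as "MIG enabled".
    use_mig = mig_config is not None
    # bool -> int -> str, so the label value is always a plain string.
    return str(int(use_mig))


# Sample inputs are made up for the example.
print(mig_label_value({'run_duration': 3600}))
print(mig_label_value(None))
```

Note that, as in the diff, an empty config dict still counts as enabled, because the check is `is not None` rather than truthiness.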
2 changes: 1 addition & 1 deletion sky/templates/gcp-ray.yml.j2
@@ -80,7 +80,7 @@ available_node_types:
 {%- for label_key, label_value in labels.items() %}
 {{ label_key }}: {{ label_value|tojson }}
 {%- endfor %}
-managed-instance-group: {{ gcp_use_managed_instance_group }}
+use-managed-instance-group: {{ gcp_use_managed_instance_group_value|tojson }}
 {%- if gcp_use_managed_instance_group %}
 managed-instance-group:
   run_duration: {{ run_duration }}
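
A rough sketch of what the label line renders to before and after this change. It assumes that Jinja prints a Python `True` as the bare text `True`, and that for a short ASCII string the `tojson` filter yields the same text as `json.dumps`. The function `render_label` is a hypothetical stand-in, not the real template engine.

```python
import json
from typing import Any


def render_label(key: str, value: Any) -> str:
    """Hypothetical stand-in for one rendered template line.

    json.dumps approximates Jinja's `tojson` filter for simple values.
    """
    return f'{key}: {json.dumps(value)}'


use_mig = True

# Old template line: the boolean is interpolated directly, so the YAML
# output carries a bare boolean-like token.
old_line = f'managed-instance-group: {use_mig}'

# New template line: the value is already '0' or '1' and is JSON-quoted,
# so the YAML output carries a quoted string.
new_line = render_label('use-managed-instance-group', str(int(use_mig)))

print(old_line)
print(new_line)
```

The quoted form keeps the label value a string after the YAML is parsed, which is the property the commit relies on for TPU VM labels.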