[TPU] tpu-v5p-8 failed to create and says 'Insufficient reserved capacity' when no reservation is used #4579

Open
cblmemo opened this issue Jan 17, 2025 · 0 comments

cblmemo commented Jan 17, 2025

$ sky launch --gpus tpu-v5p-8 --region us-central1 -c ttpu
Missing runtime_version in accelerator_args, using default (v2-alpha-tpuv5)
Considered resources (1 node):
-----------------------------------------------------------------------------------------
 CLOUD   INSTANCE   vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN   
-----------------------------------------------------------------------------------------
 GCP     TPU-VM     -       -         tpu-v5p-8:1    us-central1-a   16.80         ✔     
-----------------------------------------------------------------------------------------
Launching a new cluster 'ttpu'. Proceed? [Y/n]: 
⚙︎ Launching on GCP us-central1 (us-central1-a).
W 01-16 17:42:36 instance_utils.py:112] Got return code 8 in us-central1-a: 'Insufficient reserved capacity. Contact customer support to increase your reservation. [EID: 0x7adf6d58c08b0aa4]'
sky.exceptions.ResourcesUnavailableError: Failed to acquire resources in us-central1-a for {GCP({'tpu-v5p-8': 1}, accelerator_args={'runtime_version': 'v2-alpha-tpuv5'})}. 

↺ Trying other potential resources.
⨯ Failed to provision resources. View logs at: ~/sky_logs/sky-2025-01-16-17-42-20-623267/provision.log

sky.exceptions.ResourcesUnavailableError: Failed to provision all possible launchable resources. Relax the task's resource requirements: 1x GCP({'tpu-v5p-8': 1}, accelerator_args={'runtime_version': 'v2-alpha-tpuv5'})
To keep retrying until the cluster is up, use the `--retry-until-up` flag.