-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core][TPU] Support TPU V5 #3814
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @cblmemo! Let's get it in soon, as there are some users asking for this.
@@ -546,7 +671,7 @@ def get_catalog_df(region_prefix: str) -> 'pd.DataFrame': | |||
region_prefix)] if not gpu_df.empty else gpu_df | |||
|
|||
gcp_tpu_skus = get_skus(TPU_SERVICE_ID) | |||
tpu_df = get_tpu_df(gcp_tpu_skus) | |||
tpu_df = get_tpu_df(gcp_skus, gcp_tpu_skus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a gcp_skus
here? Can we add a comment for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TPU V5 is not in the tpu skus but in the gcp skus. Added a comment here. Thanks!
A note to myself: check price info is correct before merge this. |
@Michaelvll I've checked the documentation again and ensured the pricing calculation is aligned with the doc. The price estimation is still not available on the GCP console - it would be great if we could contact the user for them to double check as I suspect the price will only be shown for those who have TPU v5 quota, but the current impl should be ready to go as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the support of v5! LGTM except for several comments below.
# TODO(tian): Seems like there is no 'Pod' kw in the v5 description. | ||
# Does that means v5 only have TPU Node (instead of VM)? Another | ||
# possibility is that the price shown for v5 TPU is for one core. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we confirm this TODO before merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is due to its pricing unit - per chip hour instead of the whole pod. TPU Node is deprecated by GCP now. Reference: https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_architectures
Just updated the comment. Thanks for catching this!
Closes #2661.
TODO:
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh