[Catalog] Bump catalog schema version #4470

Merged (2 commits), Dec 13, 2024


@Michaelvll Michaelvll commented Dec 13, 2024

Due to the recent GCP catalog update that adds H100 instances with accelerator counts of 1, 2, and 4 (#4456), our gcp_catalog.py hit a backward compatibility issue: the newly added H100:1 instances cause a KeyError.

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sky/clouds/service_catalog/gcp_catalog.py", line 448, in <lambda>
    lambda x: _get_host_instance_type(x['AcceleratorName'],
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sky/clouds/service_catalog/gcp_catalog.py", line 391, in _get_host_instance_type
    instance_types = _ACC_INSTANCE_TYPE_DICTS[acc_name][acc_count]
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 1

We have to bump the catalog schema version so that older SkyPilot versions keep working: the original catalog version is kept without H100:1 instances.
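A minimal sketch of why the bump protects older releases, assuming each client reads only the catalog published under its own pinned schema version. The `CATALOGS` mapping, the `load_catalog` helper, the `v6` label, and the pre-existing H100:8 row are hypothetical stand-ins for illustration, not SkyPilot's actual loader or data.

```python
# Hypothetical in-memory stand-in for versioned catalog files.
# Each row is (accelerator name, accelerator count).
CATALOGS = {
    # Old schema version: frozen, so it never gains the new counts.
    "v5": [("H100", 8)],
    # New schema version (label assumed): carries the added counts.
    "v6": [("H100", 1), ("H100", 2), ("H100", 4), ("H100", 8)],
}


def load_catalog(schema_version):
    """Return the rows published under the given schema version."""
    return CATALOGS[schema_version]


# An older release stays pinned to its version and never sees H100:1.
old_rows = load_catalog("v5")
# A newer release reads the bumped version and sees every count.
new_rows = load_catalog("v6")
```

Under this assumption the old release cannot hit the KeyError, because the rows that trigger it are never in the file it reads.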

This should go in after: skypilot-org/skypilot-catalog#105

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Comment on lines +295 to +297
instance_types = _ACC_INSTANCE_TYPE_DICTS[acc_name].get(acc_count, None)
if instance_types is None:
    return None, []
Collaborator

Do we still need the catalog schema bump if we use .get(acc_count, None)? Won't it work with v5?

Collaborator Author

No. Since this change will not affect older SkyPilot versions, we have to bump the catalog version to ensure old SkyPilot releases keep working. We add this .get for future-proofing. :)
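For reference, a small sketch of the difference between the original lookup and the .get variant. The table contents and both function names are hypothetical, and the return value is simplified to a plain list rather than the real function's return shape.

```python
# Hypothetical lookup table; the real _ACC_INSTANCE_TYPE_DICTS has more entries.
ACC_INSTANCE_TYPES = {"H100": {8: ["a3-highgpu-8g"]}}


def host_types_strict(acc_name, acc_count):
    # Original style: direct indexing raises KeyError for an unknown count.
    return ACC_INSTANCE_TYPES[acc_name][acc_count]


def host_types_tolerant(acc_name, acc_count):
    # Patched style: an unknown count yields an empty result instead.
    instance_types = ACC_INSTANCE_TYPES[acc_name].get(acc_count)
    if instance_types is None:
        return []
    return instance_types


try:
    host_types_strict("H100", 1)
except KeyError as err:
    print(f"strict lookup failed: KeyError: {err}")

print(host_types_tolerant("H100", 1))
```

Only code that already contains the tolerant lookup benefits from it, which is why releases shipped with the strict lookup still depend on the catalog version bump.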

Collaborator

Oh nvm, I think I understand - we still need the catalog version bump for back compat.

@Michaelvll Michaelvll merged commit 3466469 into master Dec 13, 2024
19 checks passed
@Michaelvll Michaelvll deleted the catalog-backward-compatibility branch December 13, 2024 19:33