Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Relax cluster name restrictions and process cluster names #2743

Closed

Conversation

dtran24
Copy link
Contributor

@dtran24 dtran24 commented Oct 29, 2023

Relax cluster name restrictions and process cluster names

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@concretevitamin
Copy link
Member

concretevitamin commented Oct 31, 2023

Hello @dtran24! Thanks for taking a stab at this.

A few things we need to take care of:

cluster_name_on_cloud = common_utils.make_cluster_name_on_cloud(
cluster_name, max_length=cloud.max_cluster_name_length())

def make_cluster_name_on_cloud(cluster_name: str,

Apologies that #2709 was under-specified. From the above we need to design a more involved solution.


Update: some example errors from GCP and Azure if _ is used:

GCP

I 11-01 21:48:29 cloud_vm_ray_backend.py:759] Got googleapiclient.errors.HttpError: <HttpError 400 when requesting https://compute.googleapis.com/compute/v1/projects/XXXXXXzones/us-central1-f/instances?alt=json returned "Invalid value for field 'resource.name': 'ray-__-8a39-head-7b5f5104-compute'. Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)'". Details: "[{'message': "Invalid value for field 'resource.name': 'ray-__-8a39-head-7b5f5104-compute'. Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)'", 'domain': 'global', 'reason': 'invalid'}]">

Azure

Exception Details:      (BadRequest) {
          "error": {
            "code": "InvalidParameter",
            "message": "The entity name 'resourceGroupName' is invalid according to its validation rule: ^[^_\\W][\\w-._()]{0,89}(?<![-.])$.",
            "target": "resourceGroupName"
          }
        }
        Code: BadRequest
        Message: {
          "error": {
            "code": "InvalidParameter",
            "message": "The entity name 'resourceGroupName' is invalid according to its validation rule: ^[^_\\W][\\w-._()]{0,89}(?<![-.])$.",
            "target": "resourceGroupName"
          }
        }

@dtran24
Copy link
Contributor Author

dtran24 commented Nov 2, 2023

I see. So one approach could be: if _ is used, we'd want to create a cloud cluster name that doesn't use _ (at least if it doesn't work on the given cloud provider), but would still use _ as the local cluster name

I'll look into setting up GCP and Azure to better reproduce these errors

@dtran24
Copy link
Contributor Author

dtran24 commented Nov 3, 2023

@concretevitamin Let me know if I'm on the right track here, and I can start proposing possible solutions!

If I should be more patient with feedback, please don't hesitate to share. Don't want to be too overbearing :)

@concretevitamin
Copy link
Member

concretevitamin commented Nov 3, 2023

Moving the discussion to the issue: #2709 (comment)

@dtran24 dtran24 changed the title Remove Cluster Name Validation Relax cluster name restrictions and process local cluster names Nov 15, 2023
@dtran24
Copy link
Contributor Author

dtran24 commented Nov 15, 2023

Hey @concretevitamin, could this get another review when you have a free cycle?

As per #2709 (comment),

  • I pass the processed cluster name to the generic cluster name regex. I didn't feed it to cloud-specific checks since I didn't see any, but definitely could've missed it. I also moved the method for performing the generic cluster name check since calling it in the new context led to a circular import, and after removing its uses in the Cloud class, the check_cluster_name_is_valid didn't seem necessary to be in that class anymore
  • This PR maps -c LoRA, -c lora, and -c lo.ra all to the same internal name and treats these arguments as referring to the same cluster. I decided to go with this approach first since treating them differently would further complicate the make_cluster_name_on_cloud method (code). Am happy to treat the args as referring to different clusters if the community finds it useful, though it may be good to limit the scope for this PR
    --

I tried running some of the smoke tests in my container, but the container seems to just stall out. I did manually test that these changes work for AWS and GCP.

Edit: Did some further testing with AWS and am seeing multiple clusters in INIT status. Will look further into this Just had some leftover clusters from the smoke tests in the container. Should be fine for review!

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the awesome PR @dtran24! Left a comment for the internal cluster name representation.

sky/utils/common_utils__tests.py Outdated Show resolved Hide resolved
sky/utils/common_utils.py Outdated Show resolved Hide resolved
@dtran24 dtran24 requested a review from Michaelvll December 2, 2023 21:08
@dtran24 dtran24 changed the title Relax cluster name restrictions and process local cluster names Relax cluster name restrictions and process cluster names Dec 2, 2023
@concretevitamin
Copy link
Member

We should pick this back up at some point, as users frequently complain about _ and . in cluster names being invalid.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @dtran24! This is a very important feature to add. The main concern I have is what the cluster_name displayed should look like in our sky status. There are two options:

  1. Adjust the cluster name and show the adjusted name in all our outputs
  2. Only adjust the cluster name when launching and querying the cloud, and keep the all of our outputs align with what users specified.

We can bring this to the discussion.

sky/utils/common_utils.py Show resolved Hide resolved
sky/utils/common_utils.py Show resolved Hide resolved
tests/test_common_utils.py Outdated Show resolved Hide resolved
sky/backends/backend.py Outdated Show resolved Hide resolved
sky/backends/backend.py Show resolved Hide resolved
@dtran24 dtran24 changed the title Relax cluster name restrictions and process cluster names [Core] Relax cluster name restrictions and process cluster names Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS cluster name restrictions should've been relaxed
3 participants