[UX] warning before launching jobs/serve when using a reauth required credentials #4479

Open
wants to merge 10 commits into
base: master


Contributor

@weih1121 weih1121 commented Dec 18, 2024

issue link: #4433

Test Case:
Test for ENV with leakage warning:

~/skypilot/sky on dev/hong/controller wip >1 !1 > aws configure list                               took 19s py sky at 15:50:01
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ****************F5MQ              env
secret_key     ****************TWEG              env
    region                <not set>             None    None
[Screenshot 2024-12-18 15:46:01]

Test for ENV without leakage warning:
[Screenshot 2024-12-18 17:07:16]

Test GCP SHARED_CREDENTIALS_FILE

-> gcloud auth describe [email protected]

expired: false
expiry: 12-19-2024 06:37:31
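
The `expiry` field in the output above could be read programmatically. The following is a minimal sketch, not the PR's implementation: `parse_expiry` and `expires_within` are hypothetical helper names, and the timestamp layout is assumed from the single sample shown here (the time zone is not stated in the output, so the value is treated as a naive timestamp).

```python
from datetime import datetime, timedelta
from typing import Optional

# Layout inferred from the sample "12-19-2024 06:37:31" (an assumption; it
# may differ across gcloud versions).
_EXPIRY_FORMAT = '%m-%d-%Y %H:%M:%S'


def parse_expiry(describe_output: str) -> Optional[datetime]:
    """Return the `expiry` timestamp found in the describe text, if any."""
    for line in describe_output.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip() == 'expiry':
            return datetime.strptime(value.strip(), _EXPIRY_FORMAT)
    return None


def expires_within(expiry: datetime, now: datetime,
                   window: timedelta) -> bool:
    """True if the credential expires no later than `now + window`."""
    return expiry - now <= window
```

A caller could pass the captured command output to `parse_expiry` and warn when `expires_within` is true for the expected job duration.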

Launch jobs with warning

~/skypilot/sky on dev/hong/controller wip > python cli.py jobs launch ~/hello-sky/hello_sky.yaml --use-spot

Task from YAML spec: /Users/hong/hello-sky/hello_sky.yaml
Managed job 'sky-8ace-hong' will be launched on (estimated):
Considered resources (1 node):
--------------------------------------------------------------------------------------------------
 CLOUD   INSTANCE             vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE    COST ($)   CHOSEN
--------------------------------------------------------------------------------------------------
 GCP     n1-highmem-8[Spot]   8       52        V100:1         asia-east1-c   0.83          ✔
--------------------------------------------------------------------------------------------------
Launching a managed job 'sky-8ace-hong'. Proceed? [Y/n]: y
Expiring credentials detected for [GCP].Clusters may be leaked if the credentials expire while jobs are running. It is recommended to use credentials that never expire or a service account.
⚙︎ Translating workdir to SkyPilot Storage...
  Workdir: '.' -> storage: 'skypilot-workdir-hong-820545aa'.
  Created GCS bucket 'skypilot-workdir-hong-820545aa' in ASIA-EAST1 with storage class STANDARD
  Excluded files to sync to cluster based on .gitignore.

TEST GCP service account
active service account

~/skypilot/sky on dev/hong/controller wip > gcloud auth list             py sky gcloud sky-dev at 14:34:23
                   Credentialed Accounts
ACTIVE  ACCOUNT
*       [email protected]
        [email protected]

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Launch job with service account without warning

~/skypilot/sky on dev/hong/controller wip > python cli.py jobs launch ~/hello-sky/hello_sky.yaml --use-spot

Task from YAML spec: /Users/hong/hello-sky/hello_sky.yaml
Managed job 'sky-f7b8-hong' will be launched on (estimated):
Considered resources (1 node):
--------------------------------------------------------------------------------------------------
 CLOUD   INSTANCE             vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE    COST ($)   CHOSEN
--------------------------------------------------------------------------------------------------
 GCP     n1-highmem-8[Spot]   8       52        V100:1         asia-east1-c   0.83          ✔
--------------------------------------------------------------------------------------------------
Launching a managed job 'sky-f7b8-hong'. Proceed? [Y/n]: y
⚙︎ Translating workdir to SkyPilot Storage...

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@weih1121 weih1121 marked this pull request as draft December 18, 2024 05:40
@weih1121 weih1121 changed the title [UX] warning before launching jobs/serve when using a auth required credentials [UX] warning before launching jobs/serve when using a reauth required credentials Dec 18, 2024
sky/cli.py (two outdated review threads, resolved)
    def can_credential_expire(self) -> bool:
        """Check if the AWS identity type can expire."""
        expirable_types = {
            AWSIdentityType.SSO, AWSIdentityType.ENV, AWSIdentityType.IAM_ROLE,
Collaborator

Are IAM_ROLE and ENV guaranteed to be expirable? Is there some quick command we can run to check if credentials are expiring instead?
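
For the ENV case, one cheap signal is whether a session token is set: temporary credentials issued by STS include a session token, while long-term IAM user access keys do not. The sketch below illustrates that check only; it is not what the PR does, and `env_credentials_look_temporary` is a hypothetical name. A missing token does not prove the keys are never rotated or revoked.

```python
import os
from typing import Mapping, Optional


def env_credentials_look_temporary(
        environ: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic: env credentials with a session token are temporary."""
    env = os.environ if environ is None else environ
    has_keys = bool(env.get('AWS_ACCESS_KEY_ID')) and bool(
        env.get('AWS_SECRET_ACCESS_KEY'))
    # AWS_SESSION_TOKEN is only present for temporary (STS-issued) credentials.
    return has_keys and bool(env.get('AWS_SESSION_TOKEN'))
```

The check reads only environment variables, so it adds no noticeable latency to a launch.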

Contributor Author

@weih1121 weih1121 Dec 18, 2024

According to https://d-926790655e.awsapps.com/start/#/?tab=accounts,
IAM_ROLE has a longer validity period after being configured with aws configure sso.

@weih1121 weih1121 marked this pull request as ready for review December 18, 2024 06:40
Comment on lines 112 to 115
        expirable_types = {
            AWSIdentityType.SSO, AWSIdentityType.IAM_ROLE,
            AWSIdentityType.CONTAINER_ROLE
        }
Collaborator

Are we sure these credential types expire? We may want to use a CLI command to check credential expiry, e.g., Claude suggests sts get-session-token --query 'Credentials.Expiration', but we should double-check that this actually works.

Collaborator

Same for gcloud CLI.

Collaborator

In addition, we may want to trigger the CLI check only if IdentityType is one of AWSIdentityType.SSO, AWSIdentityType.IAM_ROLE, AWSIdentityType.CONTAINER_ROLE. We may also want to use it with functools.lru_cache().

This is because CLI check can be expensive and we don't want to slow down sky jobs launch.

Contributor Author

@weih1121 weih1121 Dec 19, 2024

Claude's suggestion does not work:

~/skypilot on dev/hong/controller wip > aws sts get-session-token --query 'Credentials.Expiration'           py sky at 16:28:58

An error occurred (AccessDenied) when calling the GetSessionToken operation: Cannot call GetSessionToken with session credentials

We can find the session information in ~/.aws/sso/cache/*.json:

~/skypilot on dev/hong/controller wip > cat ~/.aws/sso/cache/                                            254 py sky at 16:29:05
7505d64a54e061b7acd54ccd58b49dc43500b635.json  d45209530cbfb256d65dc516ece32996c9054fed.json

Each SSO login generates two JSON files: one is aws_sso_credentials.json, the other is sso_cache.json.
Either file contains an expiry field such as "expiresAt": "2024-12-19T08:22:34Z".
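
Reading that field could be sketched as follows. This is an assumption-laden illustration, not the PR's code: `earliest_sso_expiry` is a hypothetical helper, the timestamp layout is taken from the single sample above, and the earliest value is returned as a conservative choice in case the files carry different expiry times.

```python
import glob
import json
import os
from datetime import datetime, timezone
from typing import Optional


def earliest_sso_expiry(
        cache_dir: str = '~/.aws/sso/cache') -> Optional[datetime]:
    """Return the earliest `expiresAt` among the JSON files in `cache_dir`."""
    earliest = None
    pattern = os.path.join(os.path.expanduser(cache_dir), '*.json')
    for path in glob.glob(pattern):
        try:
            with open(path, encoding='utf-8') as f:
                data = json.load(f)
        except (OSError, json.JSONDecodeError):
            continue  # Unreadable or malformed file: ignore it.
        raw = data.get('expiresAt') if isinstance(data, dict) else None
        if not isinstance(raw, str):
            continue
        try:
            # Layout assumed from the sample "2024-12-19T08:22:34Z" (UTC).
            expiry = datetime.strptime(raw, '%Y-%m-%dT%H:%M:%SZ').replace(
                tzinfo=timezone.utc)
        except ValueError:
            continue
        if earliest is None or expiry < earliest:
            earliest = expiry
    return earliest
```

This only touches local files, so it avoids the cost of an AWS API or CLI call.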

Contributor Author

Regarding get-session-token, see https://docs.aws.amazon.com/cli/latest/reference/sts/get-session-token.html:
Returns a set of temporary credentials for an Amazon Web Services account or IAM user. The credentials consist of an access key ID, a secret access key, and a security token. Typically, you use GetSessionToken if you want to use MFA to protect programmatic calls to specific Amazon Web Services API operations like Amazon EC2 StopInstances.

MFA-enabled IAM users must call GetSessionToken and submit an MFA code that is associated with their MFA device.

@@ -536,6 +536,10 @@ def get_credential_file_mounts(self) -> Dict[str, str]:
        """
        raise NotImplementedError

    def can_credential_expire(self) -> bool:
Collaborator

nit:

Suggested change
-    def can_credential_expire(self) -> bool:
+    def can_credentials_expire(self) -> bool:

Contributor Author

This checks the single active credential (only one), so I think the original name makes sense.

2 participants