Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aws] cache user identity by 'aws configure list' #4507

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

aylei
Copy link

@aylei aylei commented Dec 25, 2024

Cache aws sts get-caller-identity result, indexed by hashing the output of aws configure list. This will speed up sky launch by ~1s in subsequent runs.

original discussion: #3159 (comment)

A simple benchmark (enabled clouds: [aws, gcp, azure, kubernetes, runpod, cudo, paperspace], the code is slightly modified to return before the confirmation is prompted):

# master, avg: 2.558s 
$ /usr/bin/time bash -c 'for i in {1..10}; do SKYPILOT_DISABLE_USAGE_COLLECTION=1 python -m sky.cli launch --gpus H100:8> /dev/null; done;'
       25,58 real        16,57 user        25,29 sys

# PR branch, avg: 1.596s
$ /usr/bin/time bash -c 'for i in {1..10}; do SKYPILOT_DISABLE_USAGE_COLLECTION=1 python -m sky.cli launch --gpus H100:8> /dev/null; done;'
       15,96 real        16,84 user        24,06 sys

# PR branch with https://github.com/skypilot-org/skypilot/pull/4483 picked, avg: 1.321s
$ /usr/bin/time bash -c 'for i in {1..10}; do SKYPILOT_DISABLE_USAGE_COLLECTION=1 python -m sky.cli launch --gpus H100:8> /dev/null; done;'
       13,21 real        16,28 user        11,18 sys

@Michaelvll Please kindly take a look

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (see above benchmark)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aylei! This PR looks good to me!

@@ -773,6 +784,38 @@ def get_user_identities(cls) -> Optional[List[List[str]]]:
# automatic switching for AWS. Currently we only support one identity.
return [user_ids]

@classmethod
@functools.lru_cache(maxsize=1) # Cache since getting identity is slow.
def get_user_identities(cls) -> Optional[List[List[str]]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a docstr for this function for the behavior of caching and returning identity. It might also be good to move the docstr from _sts_get_caller_identity to here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! just moved the docstring here and added the description of caching behavior

@Michaelvll
Copy link
Collaborator

I assume the time tests listed in the PR description has the launch to return immediately after the optimizer has been run, i.e. not going to the confirmation line.

@aylei
Copy link
Author

aylei commented Jan 2, 2025

I assume the time tests listed in the PR description has the launch to return immediately after the optimizer has been run, i.e. not going to the confirmation line.

Yes, I updated the PR description to make the result clearer~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants