[Core] make per-cloud catalog lookup parallel #4483
base: master
Conversation
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
Thanks @aylei! Adding the parallelization for all cloud methods makes sense to me. Left some minor comments mainly for interfaces.
Investigating AWS and Kubernetes latency should be a great next step. IIRC, fetching the mapping for AWS zone names can cause large latency even when AWS is not enabled.
@Michaelvll Thanks for your suggestions ❤️. I reproduced the AWS zone name issue by cleaning the catalog cache, excluding …
My idea is to apply the same semantics to other commands like launch, wdyt? @Michaelvll As for normal cases, the zone mapping is usually cached with the AWS user ID as the cache key (get-caller-identity) in daily usage, with the assumption that the AWS account used rarely changes. Therefore, for the mainstream scenario, I currently think there might be no optimization method for the …
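For illustration, a minimal sketch of the caching pattern described above. The helper names and the sample data are placeholders, not SkyPilot code; the point is only that the slow zone-mapping fetch is keyed by the AWS user ID, so daily usage only pays for a cheap identity call:

```python
import functools


def _get_aws_user_identity() -> str:
    """Placeholder for the cheap `aws sts get-caller-identity` call."""
    return "123456789012"


def _fetch_zone_mapping(user_id: str) -> dict:
    """Placeholder for the slow network fetch of the AZ-id -> AZ-name mapping."""
    return {f"use1-az{i}": f"us-east-1{c}" for i, c in enumerate("abc", 1)}


@functools.lru_cache(maxsize=None)
def _zone_mapping_for_identity(user_id: str) -> dict:
    # Cached per AWS user ID: re-fetched only when the account changes.
    return _fetch_zone_mapping(user_id)


def get_zone_mapping() -> dict:
    return _zone_mapping_for_identity(_get_aws_user_identity())
```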
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
@Michaelvll I've addressed the comments and added sub-thread status support. Please kindly take a look. I will move further work on #3159 to separate PRs so each of them can be focused and coherent. Current look when sub-thread status is involved:

PS: fetching the zone mapping is indeed slow on a cache miss 😂
Thanks @aylei! It looks good to me. Just some minor nits. Let's proceed with the optimization of the second step. : )
feasible_list = subprocess_utils.run_in_parallel(
    lambda cloud, r=resources, n=task.num_nodes:
    (cloud, cloud.get_feasible_launchable_resources(r, n)),
    clouds_list)
for cloud, feasible_resources in feasible_list:
very minor nit: run_in_parallel keeps the original order of the arguments, so no need to have cloud in the output; zip(cloud_list, feasible_list) should be fine.
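For illustration, a sketch of the zip-based variant the comment suggests, reusing the names from the snippet quoted above (the loop body is elided):

```python
feasible_list = subprocess_utils.run_in_parallel(
    lambda cloud, r=resources, n=task.num_nodes:
    cloud.get_feasible_launchable_resources(r, n),
    clouds_list)
# run_in_parallel preserves argument order, so results can be re-paired
# with their clouds via zip instead of returning (cloud, result) tuples.
for cloud, feasible_resources in zip(clouds_list, feasible_list):
    ...
```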
Good catch! Even with this new knowledge, I personally still tend to avoid depending on the ordering behavior of the run_in_parallel function, for maintainability. Wdyt?
sky/utils/rich_utils.py (Outdated)
@@ -1,16 +1,21 @@
"""Rich status spinner utils."""
This is great! However, since this can affect many other parts of the code, let's move it to another PR and keep this PR clean, containing only the parallelism changes.
Sure, but since the status of sub-threads is silent on master, there would be a UX degradation in between. Is that okay?
sky/utils/subprocess_utils.py (Outdated)
@@ -113,6 +114,12 @@ def run_in_parallel(func: Callable,
        A list of the return values of the function func, in the same order as the
        arguments.
    """
if isinstance(args, collections.abc.Sized): |
nit: how about we change the input args to be List[Any] and change the invocations to use lists, to avoid this collections.abc.Sized check?
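A rough sketch of what the suggested List-based signature could look like; this is illustrative only, not the actual SkyPilot helper, and the thread-pool details are a simplification:

```python
from multiprocessing import pool
from typing import Any, Callable, List, Optional


def run_in_parallel(func: Callable, args: List[Any],
                    num_threads: Optional[int] = None) -> List[Any]:
    """Illustrative only: run func over args, keeping the input order."""
    if not args:
        return []
    # Short-circuit for a single element: no thread pool needed.
    if len(args) == 1:
        return [func(args[0])]
    with pool.ThreadPool(processes=num_threads or len(args)) as p:
        # imap preserves the order of `args` in the returned results.
        return list(p.imap(func, args))
```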
Sounds good, I will try this~
Signed-off-by: Aylei <[email protected]>
Signed-off-by: Aylei <[email protected]>
@Michaelvll ready for the next review
Signed-off-by: Aylei <[email protected]>
Thanks @aylei! LGTM. Could you share a bit about how the UX looks without the thread-based handling for the spinner?
        return method(*args, **kwargs)

    results = subprocess_utils.run_in_parallel(_execute_catalog_method,
                                               list(clouds), len(clouds))
style nit: use kwargs for non-obvious argument meaning : )
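For instance, something along these lines (the keyword name num_threads is an assumption here, not taken from the diff):

```python
results = subprocess_utils.run_in_parallel(_execute_catalog_method,
                                           list(clouds),
                                           num_threads=len(clouds))  # assumed kwarg name
```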
        return []
    # Short-circuit for single element
    if len(args) == 1:
        return [func(next(iter(args)))]
Suggested change:
-    return [func(next(iter(args)))]
+    return [func(args[0])]
Part of #3159
This is a draft PR with a known issue to be solved.
The test is done locally by timing the duration of launch (aylei@e662df0). On my laptop, this PR improves the performance in 4 test cases with 7 clouds enabled, compared to main (f0ebf13):
- sky launch --gpus H100:8, from 2.38s to 2.06s
- rm -rf ~/.sky/catalogs/** && sky launch --gpus H100:8 (cold start), from 12.15s to 6.29s
- sky launch test.yaml, ordered resources with non-canonical names, from 5.75s to 5.07s
- rm -rf ~/.sky/catalogs/** && sky launch test.yaml, same as above but cold start, from 15.75s to 10.17s

test.yaml:
Cold start and test.yaml are chosen deliberately, because I found that for the --gpus H100:8 case the improvement was smaller than I expected (I expected a ~1s improvement according to my earlier analysis).
The reason is the GIL. Through profiling (flame graph: gil.svg), I found that each cloud spends 10–100+ ms on init, which cannot be parallelized due to the GIL.
Actually, in my setup only K8S and AWS issue network I/O, where the K8S ListNode call costs ~0.5s and AWS get-caller-identity costs ~1s. So I tested the cold-start case, which shows that in scenarios with more I/O calls, the performance gain from concurrency is more significant.
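To make the GIL point concrete, a small standalone illustration (not SkyPilot code): a thread pool overlaps blocking I/O, but pure-Python CPU work serializes on the GIL, so per-cloud init cost is not hidden by parallelism:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_bound(_):
    time.sleep(0.5)  # stands in for ListNode / get-caller-identity


def cpu_bound(_):
    sum(i * i for i in range(2_000_000))  # stands in for per-cloud init


for name, fn in [("I/O-bound", io_bound), ("CPU-bound", cpu_bound)]:
    start = time.time()
    with ThreadPoolExecutor(max_workers=7) as ex:
        list(ex.map(fn, range(7)))
    print(f"{name}: {time.time() - start:.2f}s with 7 threads")
# The I/O-bound run finishes in ~0.5s (calls overlap); the CPU-bound run
# takes roughly the sum of the individual times because of the GIL.
```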
For test.yaml, I deliberately included non-canonical GPU names to cover the non-short-circuit path, where the I/O calls are advanced to get_accelerators.

I thought carefully about whether this PR is a premature optimization, and my final thought is that processing each cloud concurrently in multi-cloud scenarios is necessary for scalability, as SkyPilot is a multi-cloud solution at its core (IIUC). This is why I did not parallelize the single get_accelerators function but instead parallelized all catalog lookup calls in _map_clouds_catalog. @Michaelvll What do you think?

Note: since sub-thread status is silent for now, sky launch will lose catalog progress after this PR is merged. Follow-up: #4491
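A condensed sketch of that approach, based on the snippet quoted in the review above; the helper name _get_catalog_module and the argument handling are simplified placeholders, not the exact code:

```python
def _map_clouds_catalog(clouds, method_name, *args, **kwargs):
    def _execute_catalog_method(cloud):
        # Resolve the per-cloud catalog module and call the requested method.
        catalog = _get_catalog_module(cloud)  # hypothetical helper name
        return getattr(catalog, method_name)(*args, **kwargs)

    # One catalog lookup per cloud, executed concurrently; run_in_parallel
    # returns results in the same order as `clouds`.
    return subprocess_utils.run_in_parallel(_execute_catalog_method,
                                            list(clouds), len(clouds))
```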
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh