[k8s] Remove SSH jump pod for port-forward mode #3657

romilbhardwaj · 2024-06-11T20:43:04Z

Closes #3566. SSH jump pod is not required when using port-forward mode. This PR directly kubectl port-forwards to the head pod.

Also removes the sleep in our ProxyCommand. The sleep was previously required for thread-safe concurrent SSH connections when SkyPilot was using SSHCommandRunner for Kubernetes (#2628), but as of #3157, SSH is no longer used there. Together with removing the jump pod, this cuts SSH connection latency significantly: mean time for ssh test ls drops from about 3.5 s to about 1.8 s in the benchmarks below. Up to 5 concurrent SSH connection requests work fine without the sleep, which should be enough for most uses of SSH outside of SkyPilot.
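As a rough illustration of the direct approach, an SSH config entry can route the connection through a kubectl port-forward to the head pod, with no jump pod in between. The sketch below only builds the config text. The helper script path, pod name, and key path are hypothetical placeholders, not the values SkyPilot actually writes.

```python
def build_ssh_config_entry(host_alias: str, ssh_user: str, pod_name: str,
                           proxy_script: str, private_key: str) -> str:
    """Return an ssh_config Host block that tunnels through a port-forward.

    `proxy_script` is a hypothetical helper that is assumed to run
    `kubectl port-forward pod/<pod_name> :22` and relay stdin/stdout to the
    forwarded local port. It is not SkyPilot's actual script.
    """
    proxy_command = f"{proxy_script} {pod_name}"
    lines = [
        f"Host {host_alias}",
        # With a ProxyCommand, the traffic goes through the tunnel, so the
        # host name here is only nominal.
        "  HostName 127.0.0.1",
        f"  User {ssh_user}",
        f"  IdentityFile {private_key}",
        f"  ProxyCommand {proxy_command}",
    ]
    return "\n".join(lines) + "\n"


entry = build_ssh_config_entry(
    host_alias="test",
    ssh_user="sky",
    pod_name="test-head",                    # hypothetical pod name
    proxy_script="/path/to/port-forward.sh",  # hypothetical helper script
    private_key="~/.ssh/example-key",         # hypothetical key path
)
print(entry)
```

With an entry like this in place, ssh test would start the port-forward on demand rather than connecting through a separate pod.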

Also lays the groundwork for easy switching between kubecontexts/kubeconfigs while retaining SSH functionality (requested by user).

Benchmarks

======= This branch - After removing SSH Jump pod and sleep =======

1: multitime -n 5 ssh test ls
            Mean        Std.Dev.    Min         Median      Max
real        1.801       0.113       1.732       1.751       2.027
user        0.021       0.002       0.019       0.021       0.024
sys         0.008       0.001       0.007       0.008       0.010

======= Master - with SSH Jump pod =======

1: multitime -n 5 ssh test ls
            Mean        Std.Dev.    Min         Median      Max
real        3.466       0.123       3.278       3.500       3.605
user        0.024       0.004       0.019       0.022       0.029
sys         0.008       0.002       0.006       0.007       0.010
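For a quick read of the two tables, the snippet below works out the difference in mean wall-clock time. The two constants are the "real" means copied from the multitime output above; nothing else is assumed.

```python
# Mean "real" (wall-clock) seconds for `ssh test ls`, from the tables above.
branch_mean = 1.801  # this branch: no jump pod, no sleep
master_mean = 3.466  # master: with SSH jump pod

saved = master_mean - branch_mean    # seconds saved per connection
speedup = master_mean / branch_mean  # relative improvement

print(f"saved per connection: {saved:.3f} s")
print(f"speedup: {speedup:.2f}x")
```

That is roughly 1.7 s saved per connection, a bit under a 2x improvement.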

Tested (run the relevant ones):

Code formatting: bash format.sh
Manual tests: sky launch -c test --num-nodes 2 --cloud kubernetes, followed by ssh test ls and ssh test-worker1 ls
Backward compatibility tests
Kubernetes smoke tests
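The description notes that up to 5 concurrent SSH connection requests work without the sleep. A minimal way to repeat that kind of check is sketched below. The helper name and the stand-in command are assumptions for illustration and are not part of this PR's test suite.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

# Harmless stand-in command for a dry run without a cluster.
SELF_CHECK_CMD = [sys.executable, "-c", "pass"]


def run_concurrently(cmd: Sequence[str], n: int,
                     timeout: float = 60.0) -> List[int]:
    """Start n copies of cmd at roughly the same time; return exit codes."""

    def _run(_: int) -> int:
        result = subprocess.run(list(cmd),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout,
                                check=False)
        return result.returncode

    # One worker per command so that all of them are in flight together.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(_run, range(n)))


codes = run_concurrently(SELF_CHECK_CMD, n=5)
print(codes)
```

Against a real cluster, the command would be ["ssh", "test", "ls"], where test is the cluster name from the manual tests above; an exit code of 0 for every entry means all concurrent connections succeeded.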


Michaelvll

This is awesome @romilbhardwaj! This should improve the robustness of our Kubernetes support. The code looks mostly good to me.
Can we test the backward compatibility for existing clusters launched in master?

sky/provision/kubernetes/network_utils.py

romilbhardwaj · 2024-06-19T23:02:19Z

Thanks @Michaelvll! Ran manual backward compatibility tests: launched a cluster from master, switched to this branch, tried ssh on the master-launched cluster, launched a new cluster, then verified ssh and sky exec for both the new and old clusters.

Running smoke tests.


romilbhardwaj · 2024-06-20T05:26:47Z

Ran into an issue with custom images that use a default username other than sky. For such images, the ssh proxy command fails because authentication.py hardcodes the user@ip used for jumping, with sky as the user. This is slightly tricky since the proxy command is populated before the pod is even started. Looking into a solution for this...

romilbhardwaj · 2024-06-21T01:41:13Z

Fixed the custom image support by dynamically updating ProxyCommand once the ssh_user is determined and updated in the cluster handle.
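One way to picture this fix: the ProxyCommand is first written with a placeholder where the username goes, and the placeholder is filled in once the real ssh_user is known. The sketch below is an assumption about the general shape, not the code in this PR. The placeholder token, function names, host, and key path are all hypothetical.

```python
SSH_USER_PLACEHOLDER = "__SSH_USER__"  # hypothetical token, not SkyPilot's


def proxy_command_template(jump_host: str, private_key: str) -> str:
    """Build a ProxyCommand before the login user is known."""
    return (f"ssh -i {private_key} -W %h:%p "
            f"{SSH_USER_PLACEHOLDER}@{jump_host}")


def resolve_proxy_command(template: str, ssh_user: str) -> str:
    """Fill in the user once it has been determined from the running pod."""
    if not ssh_user:
        raise ValueError("ssh_user must be known before resolving")
    return template.replace(SSH_USER_PLACEHOLDER, ssh_user)


# Written at config time, before the pod exists.
template = proxy_command_template("127.0.0.1", "~/.ssh/example-key")
# Updated later, after the image's default user is discovered.
resolved = resolve_proxy_command(template, "ubuntu")
print(resolved)
```

Deferring the substitution keeps the config usable for images whose default user is not sky, since nothing about the user is fixed until the pod is up.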

Running smoke tests.

Kubernetes smoke tests

Michaelvll

Thanks for updating the PR @romilbhardwaj! LGTM with a minor comment.

sky/authentication.py

Michaelvll · 2024-06-21T23:19:55Z

sky/backends/cloud_vm_ray_backend.py

+            auth_config = backend_utils.ssh_credential_from_yaml(
+                handle.cluster_yaml,
+                ssh_user=handle.ssh_user,
+                docker_user=handle.docker_user)


Could we test this for other clouds with image_id specified with docker:xxx, just to make sure changing this will not affect ssh for those?

Or, if we have passed in the ssh_user and docker_user here, should we remove the argument of handle.docker_user and handle.ssh_user in the add_cluster function below?

Good point - tested with pytest tests/test_smoke.py::test_job_queue_with_docker --gcp.

romilbhardwaj · 2024-06-29T17:35:27Z

Thanks! Tested:

pytest tests/test_smoke.py::test_job_queue_with_docker --gcp
pytest tests/test_smoke.py --kubernetes

* working prototype of direct-to-pod port-forwarding
* lint
* switch to using head as jump
* removed ssh jump pod
* remove sleep
* update note
* comments
* remove vestiges
* updates
* remove slash
* add ssh_user placeholder
* fix private key
* lint

romilbhardwaj added 11 commits May 21, 2024 16:44

working prototype of direct-to-pod port-forwarding

98dc040

lint

f74517e

Merge branch 'master' of https://github.com/skypilot-org/skypilot int…

0f69135

…o k8s_sshjump_remove

switch to using head as jump

c27cd2f

Merge branch 'master' of https://github.com/skypilot-org/skypilot int…

4361bae

…o k8s_sshjump_remove

removed ssh jump pod

4ca0aae

remove sleep

0fa4591

update note

5bb1f22

comments

4209e9d

remove vestiges

73a391f

updates

8453971

This was referenced Jun 11, 2024

[k8s] Remove SSH jump pod for port-forward mode #3577

Closed

[k8s] Add checks for shuf dependency #3009

Closed

Michaelvll reviewed Jun 18, 2024


sky/provision/kubernetes/network_utils.py (outdated, resolved)

remove slash

268a32b

Merge branch 'master' of https://github.com/skypilot-org/skypilot int…

632d41e

…o k8s_sshjump_remove_v2

add ssh_user placeholder

50515e5

Michaelvll approved these changes Jun 21, 2024


romilbhardwaj added this to the v0.6.1 milestone Jun 25, 2024

romilbhardwaj added 2 commits June 29, 2024 09:02

fix private key

f6af12c

lint

17b4519

romilbhardwaj mentioned this pull request Jun 30, 2024

[k8s] Show nicer errors if ssh jump pod fails #3261

Closed

romilbhardwaj merged commit 7633d2e into master Jun 30, 2024
20 checks passed

romilbhardwaj deleted the k8s_sshjump_remove_v2 branch June 30, 2024 17:50
