[K8s] Zero config networking for Kubernetes #2500

Merged: 211 commits merged into master from k8s_zeroconf_networking on Sep 16, 2023


@romilbhardwaj (Collaborator) commented Aug 31, 2023

This PR introduces new networking features for our Kubernetes support. In particular, we no longer need to open many ports on the Kubernetes cluster nodes. We now support two modes of operation:

  1. portforward: Opens no ports; we use kubectl port-forward under the hood to reach the pods. This requires zero configuration on the user's end and is only marginally worse (~10%) in performance (see benchmarks). Given the significantly better UX, this will be the default mode of operation.
  2. nodeport: Opens one port, on which we run an SSH jump pod to reach the other pods. This requires opening one port on any one node in the Kubernetes cluster, and offers the highest performance while minimizing the number of open ports needed.
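To make the portforward mode concrete, here is a minimal Python sketch of how a client could reach a pod's SSH server through kubectl port-forward. The helper names, the "sky" login user, the pod name and the local port are assumptions for illustration and not SkyPilot's actual implementation; only the kubectl port-forward TYPE/NAME LOCAL:REMOTE syntax is standard kubectl.

```python
import shlex
from typing import List

# Assumptions for illustration: sshd listens on port 22 in each pod and
# the login user is "sky". Neither helper below is SkyPilot's real code.
POD_SSH_PORT = 22
SSH_USER = "sky"


def port_forward_argv(pod: str, namespace: str, local_port: int) -> List[str]:
    """kubectl command that tunnels local_port to the pod's sshd via the API server."""
    return [
        "kubectl", "port-forward",
        "--namespace", namespace,
        f"pod/{pod}",
        f"{local_port}:{POD_SSH_PORT}",
    ]


def ssh_argv(local_port: int, key_path: str) -> List[str]:
    """ssh command that connects to the local end of the tunnel."""
    return [
        "ssh", "-i", key_path,
        "-p", str(local_port),
        f"{SSH_USER}@127.0.0.1",
    ]


if __name__ == "__main__":
    # The tunnel process must be running (e.g. via subprocess.Popen) before ssh connects.
    print(shlex.join(port_forward_argv("test-k8s-head", "default", 10022)))
    print(shlex.join(ssh_argv(10022, "sky-key.pem")))
```

No node ports need to be opened here because the traffic rides on the API server connection that kubectl already uses, which is what makes this mode zero-configuration.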

Users who don't want to use portforward can switch to nodeport by modifying their ~/.sky/config file:

kubernetes:
  networking: nodeport
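The switch above amounts to reading a single key with a default. A minimal sketch, assuming the config file has already been parsed into a dict (for example with a YAML loader); the function name and the error handling are hypothetical, not SkyPilot's code:

```python
from typing import Any, Dict

DEFAULT_MODE = "portforward"
VALID_MODES = ("portforward", "nodeport")


def get_networking_mode(config: Dict[str, Any]) -> str:
    """Return kubernetes.networking from a parsed ~/.sky/config, defaulting to portforward."""
    # A bare "kubernetes:" key parses to None in YAML, so fall back to {}.
    k8s_section = config.get("kubernetes") or {}
    mode = k8s_section.get("networking", DEFAULT_MODE)
    if mode not in VALID_MODES:
        raise ValueError(
            f"Unsupported kubernetes.networking value {mode!r}; "
            f"expected one of {VALID_MODES}")
    return mode


print(get_networking_mode({}))                                          # portforward
print(get_networking_mode({"kubernetes": {"networking": "nodeport"}}))  # nodeport
```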

Note that we currently create one jump pod per user. Eventually, we want to share the jump pod across many users (see #2499).


This PR also includes other bug fixes, such as populating k8s env vars when the user connects over SSH (this PR will also close #2287 and #2453).

Thanks to @landscapepainter, @aviweit and @hemildesai for their contributions.

Tested (run the relevant ones):

@romilbhardwaj (Collaborator, Author)

Blocked on #2556. This will likely need minor changes after it is merged. The rest can still be reviewed.

@romilbhardwaj mentioned this pull request on Sep 14, 2023
@romilbhardwaj removed the "blocked" label (PR blocked by other issues) on Sep 15, 2023
landscapepainter and others added 4 commits September 14, 2023 19:30
* surface provision failure message

* nit

* nit

* format

* nit

* CPU message fix

* update Insufficient memory handling

* nit

* nit

* Update sky/skylet/providers/kubernetes/node_provider.py

Co-authored-by: Romil Bhardwaj <[email protected]>

* Update sky/skylet/providers/kubernetes/node_provider.py

Co-authored-by: Romil Bhardwaj <[email protected]>

* Update sky/skylet/providers/kubernetes/node_provider.py

Co-authored-by: Romil Bhardwaj <[email protected]>

* Update sky/skylet/providers/kubernetes/node_provider.py

Co-authored-by: Romil Bhardwaj <[email protected]>

* format

* update gpu failure message and condition

* fix GPU handling cases

* fix

* comment

* nit

* add try except block with general error handling

---------

Co-authored-by: Romil Bhardwaj <[email protected]>
…roconf_networking

# Conflicts:
#	sky/clouds/kubernetes.py
@Michaelvll (Collaborator) left a comment

Thanks for the PR @romilbhardwaj, @landscapepainter, @aviweit and @hemildesai! I just tested with a newly launched GKE cluster (1 T4, 2 n2-highmem-8) without any network configuration.
I tried the following commands and they work like magic:

sky launch -c test-k8s --memory 60+ echo hi
sky launch -c test-k8s-2 --memory 60+ echo hi
sky launch -c test-k8s-3 --gpus t4 nvidia-smi
ssh test-k8s-3; nvidia-smi

The code looks mostly good to me. One question I have is whether we would like to preserve the old NodePort way, as it seems we have removed some NodePort related code, not sure if it will still work. Also, for code simplicity, it would be nice if we can remove the old mode, if there is no strong need for it. ; )

sky/authentication.py (outdated, resolved)
Comment on lines -2328 to -2337
svc_name = f'{self.cluster_name_on_cloud}-ray-head-ssh'
retry_cnt = 0
while True:
    try:
        head_ssh_port = clouds.Kubernetes.get_port(svc_name)
        break
    except Exception:  # pylint: disable=broad-except
        retry_cnt += 1
        if retry_cnt >= max_attempts:
            raise
Collaborator


Does removing this mean the NodePort mode will not work?

@romilbhardwaj (Collaborator, Author), Sep 15, 2023

No, NodePort still works. It's just that everything now goes through an SSH jump pod, so the SSH port stays fixed at 22 and we don't need to fetch the port here. Note that the jump pod's port is dynamic and is fetched in kubernetes_utils.get_ssh_proxy_command at provisioning time.
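The reply describes a two-hop SSH path in nodeport mode: the client first reaches the jump pod on a node's IP at the dynamically assigned port, then hops to the target pod on the fixed port 22. A minimal sketch of building such a command with OpenSSH's standard ProxyCommand and -W options; the helper names, the "sky" user and the addresses are hypothetical, and this is not the actual kubernetes_utils.get_ssh_proxy_command.

```python
import shlex
from typing import List

POD_SSH_PORT = 22  # fixed: the target pod's sshd always listens on 22


def nodeport_proxy_command(node_ip: str, node_port: int, key_path: str,
                           user: str = "sky") -> str:
    """ProxyCommand that reaches the jump pod via the node port."""
    # "-W %h:%p" makes the jump host forward stdin/stdout to the final target;
    # the outer ssh substitutes %h and %p with the target pod's host and port.
    return shlex.join([
        "ssh", "-i", key_path,
        "-p", str(node_port),
        "-W", "%h:%p",
        f"{user}@{node_ip}",
    ])


def ssh_via_jump_pod(pod_ip: str, node_ip: str, node_port: int,
                     key_path: str, user: str = "sky") -> List[str]:
    """Outer ssh command: only the jump port varies, the pod port stays 22."""
    proxy = nodeport_proxy_command(node_ip, node_port, key_path, user)
    return [
        "ssh", "-i", key_path,
        "-o", f"ProxyCommand={proxy}",
        "-p", str(POD_SSH_PORT),
        f"{user}@{pod_ip}",
    ]


if __name__ == "__main__":
    print(shlex.join(ssh_via_jump_pod("10.8.0.12", "203.0.113.10", 30022, "sky-key.pem")))
```

Because the pod side is always port 22, the only value that has to be discovered at provisioning time is the jump pod's node port, which matches why the per-cluster port lookup above could be removed.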

sky/skylet/providers/kubernetes/node_provider.py (outdated, resolved)
sky/templates/kubernetes-ray.yml.j2 (outdated, resolved)
sky/utils/command_runner.py (outdated, resolved)
sky/utils/kubernetes/sshjump_lcm.py (outdated, resolved)
sky/utils/kubernetes/sshjump_lcm.py (outdated, resolved)
sky/utils/kubernetes_utils.py (resolved)
sky/utils/kubernetes_utils.py (outdated, resolved)
sky/utils/kubernetes_utils.py (outdated, resolved)
@romilbhardwaj (Collaborator, Author) commented Sep 15, 2023

Thanks for the reviews @Michaelvll! This is ready for another look.

Running smoke tests on GKE now:

  • pytest tests/test_smoke.py --kubernetes -k "not TestStorageWithCredentials" with the default port-forward mode
  • pytest tests/test_smoke.py --kubernetes -k "not TestStorageWithCredentials" with nodeport mode set in ~/.sky/config
  • Tested jump pod lifecycle management by making sure the SSH jump pod terminates after 10 minutes with no SkyPilot pods running in the cluster.
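The lifecycle check in the last bullet boils down to an idle timer: the jump pod terminates once no SkyPilot pods have been running for a continuous 10 minutes. A minimal sketch of that decision logic with the Kubernetes API calls left out; the class name and the injected pod count are hypothetical, and this is not the code in sky/utils/kubernetes/sshjump_lcm.py.

```python
from typing import Optional

IDLE_TIMEOUT_SECONDS = 10 * 60  # terminate after 10 minutes with no SkyPilot pods


class JumpPodIdleTracker:
    """Decides when the SSH jump pod should terminate itself."""

    def __init__(self, timeout_s: float = IDLE_TIMEOUT_SECONDS) -> None:
        self.timeout_s = timeout_s
        # Time at which the SkyPilot pod count was first observed at 0.
        self.idle_since: Optional[float] = None

    def should_terminate(self, now: float, skypilot_pod_count: int) -> bool:
        if skypilot_pod_count > 0:
            # Any running SkyPilot pod resets the idle timer.
            self.idle_since = None
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.timeout_s


tracker = JumpPodIdleTracker()
print(tracker.should_terminate(0, 2))    # False: pods are running
print(tracker.should_terminate(60, 0))   # False: idle timer starts at t=60
print(tracker.should_terminate(659, 0))  # False: idle for 599 s
print(tracker.should_terminate(660, 0))  # True: idle for 600 s
```

In a real loop, the caller would poll periodically, passing time.monotonic() and a pod count obtained from the Kubernetes API.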

> One question I have is whether we would like to preserve the old NodePort way, as it seems we have removed some NodePort related code, not sure if it will still work. Also, for code simplicity, it would be nice if we can remove the old mode, if there is no strong need for it. ; )

That's a good point. The NodePort method is preserved for now since some may consider the port-forward mode a hack (it relies on tunneling over the API server, and that tunnel is designed only for development work). I was thinking we could collect feedback from users and deprecate NodePort in the future if port-forward works fine. In the meantime, users have an easy way to switch methods if port-forward doesn't work for them. We are also not documenting the NodePort option for now, so that users only turn to it if they really need to.

@Michaelvll (Collaborator) left a comment

Thanks for the quick fix @romilbhardwaj! The code looks pretty good to me.

@romilbhardwaj (Collaborator, Author)

Thanks for the fast reviews @Michaelvll! Waiting for the nodeport smoke tests to pass; will merge after that.

…roconf_networking

# Conflicts:
#	tests/kubernetes/README.md
@romilbhardwaj merged commit f0d3dfc into master on Sep 16, 2023
18 checks passed
@romilbhardwaj deleted the k8s_zeroconf_networking branch on September 16, 2023 at 17:48
Successfully merging this pull request may close these issues:
  • [k8s] CUDA envvars don't work in ssh
  • [k8s] Kubernetes environment variables don't show up in SkyPilot tasks
5 participants