[k8s] Kubernetes environment variables don't show up in SkyPilot tasks #2287

Closed
romilbhardwaj opened this issue Jul 21, 2023 · 2 comments · Fixed by #2500

@romilbhardwaj
Collaborator

romilbhardwaj commented Jul 21, 2023

Background

Kubernetes automatically populates containers with environment variables for discovering services running in the cluster. See documentation.

These look like:

# My custom service:
SKY_9DA1_ROMILB_RAY_HEAD_SSH_PORT_22_TCP_ADDR=10.96.70.90
SKY_9DA1_ROMILB_RAY_HEAD_SERVICE_HOST=10.96.67.126
SKY_9DA1_ROMILB_RAY_HEAD_PORT_10001_TCP_PORT=10001

# Service to connect to Kubernetes API server:
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_SERVICE_PORT=443
....

These variables can also be seen when you kubectl exec into the pod. Applications running inside the pod use these environment variables to get the IP address and ports of services they need to connect to.
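For illustration, here is a rough sketch of how an application might read these variables. The helper names (`service_env_prefix`, `lookup_service`) are made up for this example, and it assumes the usual Kubernetes convention of upper-casing the service name and replacing dashes with underscores:

```python
import os
from typing import Mapping, Optional, Tuple


def service_env_prefix(service_name: str) -> str:
    """Map a service name to its env var prefix (upper-case, '-' becomes '_')."""
    return service_name.upper().replace("-", "_")


def lookup_service(
    service_name: str, env: Optional[Mapping[str, str]] = None
) -> Optional[Tuple[str, int]]:
    """Return (host, port) from the service env vars, or None if they are absent."""
    env = os.environ if env is None else env
    prefix = service_env_prefix(service_name)
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        # This is the situation described in this issue: the variables are
        # simply not present in the task environment.
        return None
    return host, int(port)
```

Inside a pod, `lookup_service("kubernetes")` would be expected to return the API server address; in a SkyPilot task affected by this issue it would return `None`.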

Problem

In SkyPilot, these environment variables do not show up when you run a task (e.g., sky launch -- printenv) or when you ssh into the cluster.

This may be problematic for users trying to run SkyPilot tasks that connect to other non-SkyPilot services running in the Kubernetes cluster.

Our code also runs into this issue when we try to call load_incluster_config() to set up Kubernetes auth, since it uses the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT variables.
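A quick way to confirm this is the cause is to check for the two variables before attempting in-cluster auth. This is only a diagnostic sketch; `missing_incluster_vars` is a hypothetical helper, not something that exists in SkyPilot or the Kubernetes client:

```python
import os
from typing import List, Mapping, Optional

# The two variables named above that in-cluster auth depends on.
REQUIRED_INCLUSTER_VARS = ("KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT")


def missing_incluster_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_INCLUSTER_VARS if not env.get(name)]
```

If this returns a non-empty list inside a task, load_incluster_config() should not be expected to succeed there.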

Note that this is unlikely to be a problem for multi-node support, since we will take care of populating the SKYPILOT_NODE_IPS environment variable, which users can then use directly.
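As a rough idea of how a task could consume that variable, here is a sketch that assumes SKYPILOT_NODE_IPS holds one IP address per line. The format is an assumption in this example, and `node_ips` is a hypothetical helper:

```python
import os
from typing import List, Mapping, Optional


def node_ips(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Parse SKYPILOT_NODE_IPS, assumed here to contain one IP per line."""
    env = os.environ if env is None else env
    raw = env.get("SKYPILOT_NODE_IPS", "")
    # Drop blank lines and surrounding whitespace.
    return [line.strip() for line in raw.splitlines() if line.strip()]
```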

Workaround

For now, we can ask users to use the DNS discovery mechanism instead of envvars. This is also how we work around the issue to make load_incluster_config() work.
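A minimal sketch of what the DNS-based approach could look like. It assumes the standard in-cluster name `kubernetes.default.svc` for the API server, port 443, and the default `cluster.local` cluster domain; both helper names are hypothetical and this is not necessarily how the SkyPilot code does it:

```python
import os
from typing import MutableMapping, Optional


def service_dns_name(
    service: str, namespace: str = "default", cluster_domain: str = "cluster.local"
) -> str:
    """Build the cluster-internal DNS name for a service.

    The default cluster domain is an assumption; clusters can override it.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def apply_api_server_dns_fallback(
    env: Optional[MutableMapping[str, str]] = None,
) -> None:
    """Fill in the API server variables from DNS defaults when they are unset."""
    env = os.environ if env is None else env
    # setdefault leaves any values Kubernetes already injected untouched.
    env.setdefault("KUBERNETES_SERVICE_HOST", "kubernetes.default.svc")
    env.setdefault("KUBERNETES_SERVICE_PORT", "443")
```

Calling `apply_api_server_dns_fallback()` before load_incluster_config() would let the client find the API server through DNS. For other services, users can connect to the name returned by `service_dns_name(...)` instead of reading `*_SERVICE_HOST`.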

@hemildesai
Contributor

I can help with this.

@romilbhardwaj
Collaborator Author

Thanks @hemildesai for getting #2347 merged! This was particularly important for GPU support, since GKE sets CUDA envvars through Kubernetes. We can now access these envvars in the setup and run sections of our YAML.

Can we also extend this to support ssh? For example, if a user runs ssh <cluster-name>, can we make the same envvars available there? This would be super useful for folks running GPU jobs on Kubernetes and wanting to debug them through ssh.
