[k8s] Kubernetes environment variables don't show up in SkyPilot tasks #2287

romilbhardwaj · 2023-07-21T21:29:09Z

Background

Kubernetes automatically populates containers with environment variables for discovering services running in the cluster. See the Kubernetes documentation on Services.

These look like:

# My custom service:
SKY_9DA1_ROMILB_RAY_HEAD_SSH_PORT_22_TCP_ADDR=10.96.70.90
SKY_9DA1_ROMILB_RAY_HEAD_SERVICE_HOST=10.96.67.126
SKY_9DA1_ROMILB_RAY_HEAD_PORT_10001_TCP_PORT=10001

# Service to connect to Kubernetes API server:
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_SERVICE_PORT=443
....

These variables can also be seen when you kubectl exec into the pod. Applications running inside the pod use these environment variables to get the IP address and ports of services they need to connect to.
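As an illustration of how an application might consume these variables, here is a minimal sketch. It assumes the standard Kubernetes naming convention (Service name upper-cased, dashes converted to underscores, followed by `_SERVICE_HOST` and `_SERVICE_PORT`). The helper name `service_endpoint` is hypothetical and not part of SkyPilot or Kubernetes.

```python
import os
from typing import Mapping, Optional, Tuple


def service_endpoint(
    service_name: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, int]]:
    """Return (host, port) for a Service from Kubernetes-style env vars.

    Hypothetical helper for illustration. Returns None when the variables
    are absent from the environment.
    """
    env = os.environ if env is None else env
    # Kubernetes upper-cases the Service name and turns dashes into underscores.
    prefix = service_name.upper().replace("-", "_")
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None
    return host, int(port)


# Values copied from the listing above; a plain dict stands in for the pod's
# environment so the sketch can run anywhere.
pod_env = {
    "KUBERNETES_SERVICE_HOST": "10.96.0.1",
    "KUBERNETES_SERVICE_PORT": "443",
}
print(service_endpoint("kubernetes", pod_env))
print(service_endpoint("kubernetes", {}))
```

With the variables present the helper returns the host and port. With an empty environment it returns `None`, so the caller has to handle the missing case explicitly.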

Problem

In SkyPilot, these environment variables do not show up when you run a task (e.g., sky launch -- printenv) or when you ssh into the cluster.

This may be problematic for users trying to run SkyPilot tasks that connect to other non-SkyPilot services running in the Kubernetes cluster.

Our code also runs into this issue when we call load_incluster_config() to set up Kubernetes auth, since it reads the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT variables.

Note that this is likely not going to be a problem for multi-node support, since we will take care of populating the SKYPILOT_NODE_IPS environment variable, which users can then use directly.

Workaround

For now, we can ask users to use the DNS discovery mechanism instead of envvars. This is also how we work around the issue to make load_incluster_config work.
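A rough sketch of what this workaround could look like, under a few assumptions: the cluster uses the default DNS domain `cluster.local`, the API server is reachable through the `kubernetes` Service in the `default` namespace on port 443 (matching the listing in Background), and the API server certificate is valid for that DNS name. The helper names `service_dns_name` and `ensure_api_server_env` are hypothetical, and this is not necessarily how SkyPilot implements it.

```python
import os
from typing import MutableMapping, Optional


def service_dns_name(
    service: str,
    namespace: str = "default",
    cluster_domain: str = "cluster.local",  # Assumed default cluster domain.
) -> str:
    """Build the cluster-internal DNS name for a Service (hypothetical helper)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def ensure_api_server_env(
    env: Optional[MutableMapping[str, str]] = None,
) -> None:
    """Fill in the API server variables from DNS defaults if they are missing.

    Existing values are left untouched, so this is harmless when Kubernetes
    has already injected the variables.
    """
    env = os.environ if env is None else env
    env.setdefault("KUBERNETES_SERVICE_HOST", "kubernetes.default.svc")
    env.setdefault("KUBERNETES_SERVICE_PORT", "443")


# A plain dict stands in for os.environ so the sketch has no side effects.
task_env: dict = {}
ensure_api_server_env(task_env)
print(task_env)
print(service_dns_name("my-service", "my-namespace"))
```

Inside a pod, calling `ensure_api_server_env()` with no argument before `kubernetes.config.load_incluster_config()` should let the loader find the host and port. It still needs the mounted service account token and CA certificate. User tasks can connect to other services by name using `service_dns_name(...)` rather than the `*_SERVICE_HOST` variables.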


hemildesai · 2023-07-30T01:05:00Z

I can help with this.

romilbhardwaj · 2023-08-07T05:45:37Z

Thanks @hemildesai for getting #2347 merged! This was particularly important for GPU support, since GKE sets CUDA envvars through Kubernetes. We can now access these envvars in the setup and run sections of our YAML.

Can we also extend this to support ssh? For example, if a user runs ssh <cluster-name>, can we make the same envvars available there? This would be super useful for folks running GPU jobs on Kubernetes and wanting to debug them through ssh.

romilbhardwaj assigned hemildesai Jul 30, 2023

This was referenced Aug 1, 2023

Query cloud specific env vars in task setup #2334

Closed

Query cloud specific env vars in task setup #2347

Merged

romilbhardwaj added this to the k8s milestone Aug 16, 2023

This was referenced Aug 18, 2023

[k8s] GPU Support and fractional resources for Kubernetes #2328

Merged

[k8s] Zero config networking for Kubernetes #2435

Closed

[k8s] CUDA envvars don't work in ssh #2453

Closed

romilbhardwaj mentioned this issue Aug 31, 2023

[K8s] Zero config networking for Kubernetes #2500

Merged

3 tasks

romilbhardwaj modified the milestones: k8s, 0.4 Sep 11, 2023

romilbhardwaj linked a pull request Sep 11, 2023 that will close this issue

[K8s] Zero config networking for Kubernetes #2500

Merged

3 tasks

romilbhardwaj closed this as completed in #2500 Sep 16, 2023
