Background
Kubernetes automatically populates containers with environment variables for discovering services running in the cluster. See documentation.
These look like:
# My custom service:
SKY_9DA1_ROMILB_RAY_HEAD_SSH_PORT_22_TCP_ADDR=10.96.70.90
SKY_9DA1_ROMILB_RAY_HEAD_SERVICE_HOST=10.96.67.126
SKY_9DA1_ROMILB_RAY_HEAD_PORT_10001_TCP_PORT=10001
# Service to connect to Kubernetes API server:
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_SERVICE_PORT=443
....
These variables can also be seen when you kubectl exec into the pod. Applications running inside the pod use these environment variables to get the IP address and ports of services they need to connect to.
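For example, an application could use these variables roughly as follows (a minimal sketch; the SKY_9DA1_ROMILB_RAY_HEAD variable names are taken from the listing above, everything else is illustrative):

import os
import socket

# Read the host/port that Kubernetes injected for the Ray head service
# (variable names follow the <SERVICE>_SERVICE_HOST / <SERVICE>_..._TCP_PORT convention).
host = os.environ["SKY_9DA1_ROMILB_RAY_HEAD_SERVICE_HOST"]
port = int(os.environ["SKY_9DA1_ROMILB_RAY_HEAD_PORT_10001_TCP_PORT"])

# Connect to the service at the discovered address.
with socket.create_connection((host, port), timeout=5):
    print(f"Connected to {host}:{port}")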
Problem
In SkyPilot, these environment variables do not show up when you run a task (e.g., sky launch -- printenv) or when you ssh into the cluster.
This may be problematic for users trying to run SkyPilot tasks that connect to other non-SkyPilot services running in the Kubernetes cluster.
Our code also runs into this issue when we try to call load_incluster_config() to set up Kubernetes auth, since it uses the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT variables.
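Roughly, this is where the failure shows up (a hedged sketch using the kubernetes Python client; the exact exception message may vary across client versions):

import os
from kubernetes import config

# load_incluster_config() locates the API server via KUBERNETES_SERVICE_HOST
# and KUBERNETES_SERVICE_PORT. If the task environment drops these variables,
# the call fails.
print(os.environ.get("KUBERNETES_SERVICE_HOST"))  # None inside a SkyPilot task
try:
    config.load_incluster_config()
except config.ConfigException as e:
    print(f"In-cluster auth failed: {e}")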
Note that this is likely not going to be a problem for multi-node support, since we will take care of populating the SKYPILOT_NODE_IPS environment variable, which users can then use directly.
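For reference, a task could consume that variable along these lines (a sketch, assuming SKYPILOT_NODE_IPS is a newline-separated list of node IPs, as in SkyPilot's existing multi-node environment variables):

import os

# SKYPILOT_NODE_IPS: newline-separated IPs of all nodes in the cluster,
# populated by SkyPilot rather than by Kubernetes service discovery.
node_ips = os.environ.get("SKYPILOT_NODE_IPS", "").splitlines()
if node_ips:
    head_ip, *worker_ips = node_ips
    print(f"head: {head_ip}, workers: {worker_ips}")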
Workaround
For now, we can ask users to use the DNS-based service discovery mechanism instead of envvars. This is also how we work around the issue to make load_incluster_config work.
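Concretely, instead of reading <SERVICE>_SERVICE_HOST / _SERVICE_PORT, a task can use the service's cluster DNS name; the same idea can be applied to load_incluster_config. A sketch of both (my-service, my-namespace and the port are placeholders, and the load_incluster_config part shows one possible form of the workaround, not necessarily SkyPilot's exact implementation):

import os
import socket
from kubernetes import config

# 1) Reach another in-cluster service via cluster DNS instead of injected envvars:
#    <service>.<namespace>.svc.cluster.local resolves to the service's ClusterIP.
addr = socket.gethostbyname("my-service.my-namespace.svc.cluster.local")
socket.create_connection((addr, 8080), timeout=5).close()

# 2) Make load_incluster_config() work without the injected variables by
#    pointing it at the API server's well-known DNS name.
os.environ.setdefault("KUBERNETES_SERVICE_HOST", "kubernetes.default.svc")
os.environ.setdefault("KUBERNETES_SERVICE_PORT", "443")
config.load_incluster_config()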
Thanks @hemildesai for getting #2347 merged! This was particularly important for GPU support, since GKE sets CUDA envvars through Kubernetes. We can now access these envvars in the setup and run sections of our YAML.
Can we also extend this to support ssh? For example, if a user runs ssh <cluster-name>, can we make the same envvars available there? This would be super useful for folks running GPU jobs on Kubernetes and wanting to debug them through ssh.
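One possible mechanism for the ssh case (a sketch, not a committed design): have the pod's startup step dump the Kubernetes-injected variables into a file that login shells source, so ssh sessions inherit them. The path and the variable filter below are illustrative only.

import os
import shlex

# Hypothetical profile script sourced by login shells started over ssh.
PROFILE_PATH = "/etc/profile.d/k8s_env.sh"

with open(PROFILE_PATH, "w") as f:
    # Export the Kubernetes service-discovery variables seen by the pod's
    # main process so that interactive ssh shells also see them.
    for key, value in os.environ.items():
        if key.startswith("KUBERNETES_") or "_SERVICE_" in key or "_PORT_" in key:
            f.write(f"export {key}={shlex.quote(value)}\n")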