diff --git a/docs/source/cloud-setup/cloud-permissions/kubernetes.rst b/docs/source/cloud-setup/cloud-permissions/kubernetes.rst index 5318d76b1a3..34d29b49f0a 100644 --- a/docs/source/cloud-setup/cloud-permissions/kubernetes.rst +++ b/docs/source/cloud-setup/cloud-permissions/kubernetes.rst @@ -31,7 +31,7 @@ SkyPilot can operate using either of the following three authentication methods: remote_identity: SERVICE_ACCOUNT For details on the permissions that are granted to the service account, - refer to the `Permissions required for SkyPilot`_ section below. + refer to the `Minimum Permissions Required for SkyPilot`_ section below. 3. **Using a custom service account**: If you have a custom service account with the `necessary permissions `__, you can configure @@ -53,8 +53,8 @@ Below are the permissions required by SkyPilot and an example service account YA .. _k8s-permissions: -Permissions required for SkyPilot ---------------------------------- +Minimum Permissions Required for SkyPilot +----------------------------------------- SkyPilot requires permissions equivalent to the following roles to be able to manage the resources in the Kubernetes cluster: @@ -62,12 +62,12 @@ SkyPilot requires permissions equivalent to the following roles to be able to ma # Namespaced role for the service account # Required for creating pods, services and other necessary resources in the namespace. - # Note these permissions only apply in the namespace where SkyPilot is deployed. + # Note these permissions only apply in the namespace where SkyPilot is deployed, and the namespace can be changed below. kind: Role apiVersion: rbac.authorization.k8s.io/v1 metadata: - name: sky-sa-role - namespace: default + name: sky-sa-role # Can be changed if needed + namespace: default # Change to your namespace if using a different one. 
rules: - apiGroups: ["*"] resources: ["*"] @@ -77,49 +77,104 @@ SkyPilot requires permissions equivalent to the following roles to be able to ma kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: - name: sky-sa-cluster-role - namespace: default - labels: - parent: skypilot + name: sky-sa-cluster-role # Can be changed if needed + namespace: default # Change to your namespace if using a different one. + labels: + parent: skypilot rules: - - apiGroups: [""] - resources: ["nodes"] # Required for getting node resources. - verbs: ["get", "list", "watch"] - - apiGroups: ["rbac.authorization.k8s.io"] - resources: ["clusterroles", "clusterrolebindings"] # Required for launching more SkyPilot clusters from within the pod. - verbs: ["get", "list", "watch"] - - apiGroups: ["node.k8s.io"] - resources: ["runtimeclasses"] # Required for autodetecting the runtime class of the nodes. - verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: ["nodes"] # Required for getting node resources. + verbs: ["get", "list", "watch"] + - apiGroups: ["node.k8s.io"] + resources: ["runtimeclasses"] # Required for autodetecting the runtime class of the nodes. + verbs: ["get", "list", "watch"] + + +.. tip:: + + If you are using a different namespace than ``default``, make sure to change the namespace in the above manifests. + +These roles must apply to both the user account configured in the kubeconfig file and the service account used by SkyPilot (if configured). + +If your tasks use object store mounting or require access to ingress resources, you will need to grant additional permissions as described below. + +Permissions for Object Store Mounting +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If your tasks use object store mounting (e.g., S3, GCS, etc.), SkyPilot will need to run a DaemonSet to expose the FUSE device as a Kubernetes resource to SkyPilot pods. 
+ +To allow this, you will need to also create a ``skypilot-system`` namespace which will run the DaemonSet and grant the necessary permissions to the service account in that namespace. + + +.. code-block:: yaml + + # Required only if using object store mounting + # Create namespace for SkyPilot system + apiVersion: v1 + kind: Namespace + metadata: + name: skypilot-system # Do not change this + labels: + parent: skypilot --- - # Optional: If using ingresses, role for accessing ingress service IP + # Role for the skypilot-system namespace to create FUSE device manager and + # any other system components required by SkyPilot. + # This role must be bound in the skypilot-system namespace to the service account used for SkyPilot. + kind: Role + apiVersion: rbac.authorization.k8s.io/v1 + metadata: + name: skypilot-system-service-account-role # Can be changed if needed + namespace: skypilot-system # Do not change this namespace + labels: + parent: skypilot + rules: + - apiGroups: ["*"] + resources: ["*"] + verbs: ["*"] + + +Permissions for using Ingress +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If your tasks use :ref:`Ingress ` for exposing ports, you will need to grant the necessary permissions to the service account in the ``ingress-nginx`` namespace. + +.. code-block:: yaml + + # Required only if using ingresses + # Role for accessing ingress service IP apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: - namespace: ingress-nginx - name: sky-sa-role-ingress-nginx + namespace: ingress-nginx # Do not change this + name: sky-sa-role-ingress-nginx # Can be changed if needed rules: - - apiGroups: [""] - resources: ["services"] - verbs: ["list", "get"] + - apiGroups: [""] + resources: ["services"] + verbs: ["list", "get"] -These roles must apply to both the user account configured in the kubeconfig file and the service account used by SkyPilot (if configured). .. 
_k8s-sa-example: Example using Custom Service Account ------------------------------------ -To create a service account that has the necessary permissions for SkyPilot, you can use the following YAML: +To create a service account that has all necessary permissions for SkyPilot (including for accessing object stores), you can use the following YAML. + +.. tip:: + + In this example, the service account is named ``sky-sa`` and is created in the ``default`` namespace. + Change the namespace and service account name as needed. + .. code-block:: yaml + :linenos: # create-sky-sa.yaml kind: ServiceAccount apiVersion: v1 metadata: - name: sky-sa - namespace: default + name: sky-sa # Change to your service account name + namespace: default # Change to your namespace if using a different one. labels: parent: skypilot --- @@ -127,8 +182,8 @@ To create a service account that has the necessary permissions for SkyPilot, you kind: Role apiVersion: rbac.authorization.k8s.io/v1 metadata: - name: sky-sa-role - namespace: default + name: sky-sa-role # Can be changed if needed + namespace: default # Change to your namespace if using a different one. labels: parent: skypilot rules: @@ -140,85 +195,126 @@ To create a service account that has the necessary permissions for SkyPilot, you kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: - name: sky-sa-rb - namespace: default + name: sky-sa-rb # Can be changed if needed + namespace: default # Change to your namespace if using a different one. 
labels: parent: skypilot subjects: - - kind: ServiceAccount - name: sky-sa + - kind: ServiceAccount + name: sky-sa # Change to your service account name roleRef: - kind: Role - name: sky-sa-role - apiGroup: rbac.authorization.k8s.io + kind: Role + name: sky-sa-role # Use the same name as the role at line 14 + apiGroup: rbac.authorization.k8s.io --- - # Role for accessing ingress resources + # ClusterRole for the service account + kind: ClusterRole + apiVersion: rbac.authorization.k8s.io/v1 + metadata: + name: sky-sa-cluster-role # Can be changed if needed + namespace: default # Change to your namespace if using a different one. + labels: + parent: skypilot + rules: + - apiGroups: [""] + resources: ["nodes"] # Required for getting node resources. + verbs: ["get", "list", "watch"] + - apiGroups: ["node.k8s.io"] + resources: ["runtimeclasses"] # Required for autodetecting the runtime class of the nodes. + verbs: ["get", "list", "watch"] + - apiGroups: ["networking.k8s.io"] # Required for exposing services through ingresses + resources: ["ingressclasses"] + verbs: ["get", "list", "watch"] + --- + # ClusterRoleBinding for the service account apiVersion: rbac.authorization.k8s.io/v1 + kind: ClusterRoleBinding + metadata: + name: sky-sa-cluster-role-binding # Can be changed if needed + namespace: default # Change to your namespace if using a different one. + labels: + parent: skypilot + subjects: + - kind: ServiceAccount + name: sky-sa # Change to your service account name + namespace: default # Change to your namespace if using a different one. 
+ roleRef: + kind: ClusterRole + name: sky-sa-cluster-role # Use the same name as the cluster role at line 43 + apiGroup: rbac.authorization.k8s.io + --- + # Optional: If using object store mounting, create the skypilot-system namespace + apiVersion: v1 + kind: Namespace + metadata: + name: skypilot-system # Do not change this + labels: + parent: skypilot + --- + # Optional: If using object store mounting, create role in the skypilot-system + # namespace to create FUSE device manager. kind: Role + apiVersion: rbac.authorization.k8s.io/v1 metadata: - namespace: ingress-nginx - name: sky-sa-role-ingress-nginx + name: skypilot-system-service-account-role # Can be changed if needed + namespace: skypilot-system # Do not change this namespace + labels: + parent: skypilot rules: - - apiGroups: [""] - resources: ["services"] - verbs: ["list", "get", "watch"] - - apiGroups: ["rbac.authorization.k8s.io"] - resources: ["roles", "rolebindings"] - verbs: ["list", "get", "watch"] + - apiGroups: ["*"] + resources: ["*"] + verbs: ["*"] --- - # RoleBinding for accessing ingress resources + # Optional: If using object store mounting, create rolebinding in the skypilot-system + # namespace to create FUSE device manager. 
apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: - name: sky-sa-rolebinding-ingress-nginx - namespace: ingress-nginx + name: sky-sa-skypilot-system-role-binding + namespace: skypilot-system # Do not change this namespace + labels: + parent: skypilot subjects: - - kind: ServiceAccount - name: sky-sa - namespace: default + - kind: ServiceAccount + name: sky-sa # Change to your service account name + namespace: default # Change this to the namespace where the service account is created roleRef: kind: Role - name: sky-sa-role-ingress-nginx + name: skypilot-system-service-account-role # Use the same name as the role at line 88 apiGroup: rbac.authorization.k8s.io --- - # ClusterRole for the service account - kind: ClusterRole + # Optional: Role for accessing ingress resources apiVersion: rbac.authorization.k8s.io/v1 + kind: Role metadata: - name: sky-sa-cluster-role - namespace: default + name: sky-sa-role-ingress-nginx # Can be changed if needed + namespace: ingress-nginx # Do not change this namespace labels: parent: skypilot rules: - - apiGroups: [""] - resources: ["nodes"] # Required for getting node resources. - verbs: ["get", "list", "watch"] - - apiGroups: ["rbac.authorization.k8s.io"] - resources: ["clusterroles", "clusterrolebindings"] # Required for launching more SkyPilot clusters from within the pod. - verbs: ["get", "list", "watch"] - - apiGroups: ["node.k8s.io"] - resources: ["runtimeclasses"] # Required for autodetecting the runtime class of the nodes. - verbs: ["get", "list", "watch"] - - apiGroups: ["networking.k8s.io"] # Required for exposing services. 
- resources: ["ingressclasses"] - verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: ["services"] + verbs: ["list", "get", "watch"] + - apiGroups: ["rbac.authorization.k8s.io"] + resources: ["roles", "rolebindings"] + verbs: ["list", "get", "watch"] --- - # ClusterRoleBinding for the service account + # Optional: RoleBinding for accessing ingress resources apiVersion: rbac.authorization.k8s.io/v1 - kind: ClusterRoleBinding + kind: RoleBinding metadata: - name: sky-sa-cluster-role-binding - namespace: default + name: sky-sa-rolebinding-ingress-nginx # Can be changed if needed + namespace: ingress-nginx # Do not change this namespace labels: - parent: skypilot + parent: skypilot subjects: - - kind: ServiceAccount - name: sky-sa - namespace: default + - kind: ServiceAccount + name: sky-sa # Change to your service account name + namespace: default # Change this to the namespace where the service account is created roleRef: - kind: ClusterRole - name: sky-sa-cluster-role - apiGroup: rbac.authorization.k8s.io + kind: Role + name: sky-sa-role-ingress-nginx # Use the same name as the role at line 119 + apiGroup: rbac.authorization.k8s.io Create the service account using the following command: @@ -226,9 +322,12 @@ Create the service account using the following command: $ kubectl apply -f create-sky-sa.yaml -After creating the service account, configure SkyPilot to use it through ``~/.sky/config.yaml``: +After creating the service account, the cluster admin may distribute kubeconfigs with the ``sky-sa`` service account to users who need to access the cluster. + +Users should also configure SkyPilot to use the ``sky-sa`` service account through ``~/.sky/config.yaml``: .. 
code-block:: yaml + # ~/.sky/config.yaml kubernetes: remote_identity: sky-sa # Or your service account name diff --git a/docs/source/reference/kubernetes/kubernetes-getting-started.rst b/docs/source/reference/kubernetes/kubernetes-getting-started.rst index c2162da3779..99a777ce1c0 100644 --- a/docs/source/reference/kubernetes/kubernetes-getting-started.rst +++ b/docs/source/reference/kubernetes/kubernetes-getting-started.rst @@ -19,8 +19,8 @@ To connect and use a Kubernetes cluster, SkyPilot needs: In a typical workflow: -1. A cluster administrator sets up a Kubernetes cluster. Detailed admin guides for - different deployment environments (Amazon EKS, Google GKE, On-Prem and local debugging) are included in the :ref:`Kubernetes cluster setup guide `. +1. A cluster administrator sets up a Kubernetes cluster. Refer to the admin guides for + :ref:`Kubernetes cluster setup ` in different deployment environments (Amazon EKS, Google GKE, On-Prem and local debugging) and :ref:`required permissions `. 2. Users who want to run SkyPilot tasks on this cluster are issued Kubeconfig files containing their credentials (`kube-context `_). diff --git a/docs/source/reference/kubernetes/kubernetes-setup.rst b/docs/source/reference/kubernetes/kubernetes-setup.rst index 3ed1b8c89f0..4acf271bdca 100644 --- a/docs/source/reference/kubernetes/kubernetes-setup.rst +++ b/docs/source/reference/kubernetes/kubernetes-setup.rst @@ -18,7 +18,7 @@ SkyPilot's Kubernetes support is designed to work with most Kubernetes distribut To connect to a Kubernetes cluster, SkyPilot needs: * An existing Kubernetes cluster running Kubernetes v1.20 or later. -* A `Kubeconfig `_ file containing access credentials and namespace to be used. +* A `Kubeconfig `_ file containing access credentials and namespace to be used. To reduce the permissions for a user, check the :ref:`required permissions guide`. 
Deployment Guides diff --git a/sky/provision/kubernetes/config.py b/sky/provision/kubernetes/config.py index 65c494fcebf..c4c834d85fe 100644 --- a/sky/provision/kubernetes/config.py +++ b/sky/provision/kubernetes/config.py @@ -46,6 +46,21 @@ def bootstrap_instances( _configure_autoscaler_cluster_role(namespace, config.provider_config) _configure_autoscaler_cluster_role_binding(namespace, config.provider_config) + # SkyPilot system namespace is required for FUSE mounting. Here we just + # create the namespace and set up the necessary permissions. + # + # We need to set up the namespace outside the + # if config.provider_config.get('fuse_device_required') block below + # because if we put it in the if block, the following happens: + # 1. User launches job controller on Kubernetes with SERVICE_ACCOUNT. No + # namespace is created at this point since the controller does not + # require FUSE. + # 2. User submits a job requiring FUSE. + # 3. The namespace is created here, but since the job controller is + # using DEFAULT_SERVICE_ACCOUNT_NAME, it does not have the necessary + # permissions to create a role for itself to create the FUSE manager. + # 4. The job fails to launch. + _configure_skypilot_system_namespace(config.provider_config) if config.provider_config.get('port_mode', 'loadbalancer') == 'ingress': logger.info('Port mode is set to ingress, setting up ingress role ' 'and role binding.') @@ -69,26 +84,8 @@ def bootstrap_instances( elif requested_service_account != 'default': logger.info(f'Using service account {requested_service_account!r}, ' 'skipping role and role binding setup.') - - # SkyPilot system namespace is required for FUSE mounting. Here we just - # create the namespace and set up the necessary permissions. - # - # We need to setup the namespace outside the if block below because if - # we put in the if block, the following happens: - # 1. User launches job controller on Kubernetes with SERVICE_ACCOUNT. 
No - # namespace is created at this point since the controller does not - # require FUSE. - # 2. User submits a job requiring FUSE. - # 3. The namespace is created here, but since the job controller is using - # SERVICE_ACCOUNT, it does not have the necessary permissions to create - # a role for itself to create the FUSE device manager. - # 4. The job fails to launch. - _configure_skypilot_system_namespace(config.provider_config, - requested_service_account) - if config.provider_config.get('fuse_device_required', False): _configure_fuse_mounting(config.provider_config) - return config @@ -502,8 +499,7 @@ def _configure_ssh_jump(namespace, config: common.ProvisionConfig): def _configure_skypilot_system_namespace( - provider_config: Dict[str, - Any], service_account: Optional[str]) -> None: + provider_config: Dict[str, Any]) -> None: """Creates the namespace for skypilot-system mounting if it does not exist. Also patches the SkyPilot service account to have the necessary permissions @@ -513,34 +509,28 @@ def _configure_skypilot_system_namespace( skypilot_system_namespace = provider_config['skypilot_system_namespace'] kubernetes_utils.create_namespace(skypilot_system_namespace) - # Setup permissions if using the default service account. - # If the user has requested a different service account (via - # remote_identity in ~/.sky/config.yaml), we assume they have already set - # up the necessary roles and role bindings. - if service_account == kubernetes_utils.DEFAULT_SERVICE_ACCOUNT_NAME: - # Note - this must be run only after the service account has been - # created in the cluster (in bootstrap_instances). - # Create the role in the skypilot-system namespace if it does not exist. - _configure_autoscaler_role(skypilot_system_namespace, - provider_config, - role_field='autoscaler_skypilot_system_role') - # We must create a unique role binding per-namespace that SkyPilot is - # running in, so we override the name with a unique name identifying - # the namespace. 
This is required for multi-tenant setups where - # different SkyPilot instances may be running in different namespaces. - override_name = provider_config[ - 'autoscaler_skypilot_system_role_binding']['metadata'][ - 'name'] + '-' + svc_account_namespace - - # Create the role binding in the skypilot-system namespace, and have - # the subject namespace be the namespace that the SkyPilot service - # account is created in. - _configure_autoscaler_role_binding( - skypilot_system_namespace, - provider_config, - binding_field='autoscaler_skypilot_system_role_binding', - override_name=override_name, - override_subject_namespace=svc_account_namespace) + # Note - this must be run only after the service account has been + # created in the cluster (in bootstrap_instances). + # Create the role in the skypilot-system namespace if it does not exist. + _configure_autoscaler_role(skypilot_system_namespace, + provider_config, + role_field='autoscaler_skypilot_system_role') + # We must create a unique role binding per-namespace that SkyPilot is + # running in, so we override the name with a unique name identifying + # the namespace. This is required for multi-tenant setups where + # different SkyPilot instances may be running in different namespaces. + override_name = provider_config['autoscaler_skypilot_system_role_binding'][ + 'metadata']['name'] + '-' + svc_account_namespace + + # Create the role binding in the skypilot-system namespace, and have + # the subject namespace be the namespace that the SkyPilot service + # account is created in. 
+ _configure_autoscaler_role_binding( + skypilot_system_namespace, + provider_config, + binding_field='autoscaler_skypilot_system_role_binding', + override_name=override_name, + override_subject_namespace=svc_account_namespace) def _configure_fuse_mounting(provider_config: Dict[str, Any]) -> None: diff --git a/sky/provision/kubernetes/utils.py b/sky/provision/kubernetes/utils.py index 9a3a82d5924..1149a737f1b 100644 --- a/sky/provision/kubernetes/utils.py +++ b/sky/provision/kubernetes/utils.py @@ -699,6 +699,12 @@ def get_current_kube_config_context_namespace() -> str: the default namespace. """ k8s = kubernetes.kubernetes + # Get namespace if using in-cluster config + ns_path = '/var/run/secrets/kubernetes.io/serviceaccount/namespace' + if os.path.exists(ns_path): + with open(ns_path, encoding='utf-8') as f: + return f.read().strip() + # If not in-cluster, get the namespace from kubeconfig try: _, current_context = k8s.config.list_kube_config_contexts() if 'namespace' in current_context['context']: diff --git a/sky/utils/kubernetes/generate_static_kubeconfig.sh b/sky/utils/kubernetes/generate_static_kubeconfig.sh index 30ea929177a..3b0c331584d 100755 --- a/sky/utils/kubernetes/generate_static_kubeconfig.sh +++ b/sky/utils/kubernetes/generate_static_kubeconfig.sh @@ -1,26 +1,38 @@ #!/bin/bash # This script creates a new k8s Service Account and generates a kubeconfig with -# its credentials. This Service Account has all the necessary permissions for +# its credentials. This Service Account has the minimal permissions necessary for # SkyPilot. The kubeconfig is written in the current directory. # -# You must configure your local kubectl to point to the right k8s cluster and -# have admin-level access. +# Before running this script, you must configure your local kubectl to point to +# the right k8s cluster and have admin-level access. # -# Note: all of the k8s resources are created in namespace "skypilot". 
If you -# delete any of these objects, SkyPilot will stop working. +# By default, this script will create a service account "sky-sa" in "default" +# namespace. If you want to use a different namespace or service account name: # -# You can override the default namespace "skypilot" using the -# SKYPILOT_NAMESPACE environment variable. -# You can override the default service account name "skypilot-sa" using the -# SKYPILOT_SA_NAME environment variable. +# * Specify SKYPILOT_NAMESPACE env var to override the default namespace +# * Specify SKYPILOT_SA_NAME env var to override the default service account name +# * Specify SKIP_SA_CREATION=1 to skip creating the service account and use an existing one +# +# Usage: +# # Create "sky-sa" service account with minimal permissions in "default" namespace and generate kubeconfig +# $ ./generate_static_kubeconfig.sh +# +# # Create "my-sa" account with minimal permissions in "my-namespace" namespace and generate kubeconfig +# $ SKYPILOT_SA_NAME=my-sa SKYPILOT_NAMESPACE=my-namespace ./generate_static_kubeconfig.sh +# +# # Use an existing service account "my-sa" in "my-namespace" namespace and generate kubeconfig +# $ SKIP_SA_CREATION=1 SKYPILOT_SA_NAME=my-sa SKYPILOT_NAMESPACE=my-namespace ./generate_static_kubeconfig.sh set -eu -o pipefail # Allow passing in common name and username in environment. If not provided, # use default. -SKYPILOT_SA=${SKYPILOT_SA_NAME:-skypilot-sa} +SKYPILOT_SA=${SKYPILOT_SA_NAME:-sky-sa} NAMESPACE=${SKYPILOT_NAMESPACE:-default} +echo "Service account: ${SKYPILOT_SA}" +echo "Namespace: ${NAMESPACE}" + # Set OS specific values. if [[ "$OSTYPE" == "linux-gnu" ]]; then BASE64_DECODE_FLAG="-d" @@ -33,41 +45,165 @@ else exit 1 fi -echo "Creating the Kubernetes Service Account with minimal RBAC permissions." -kubectl apply -f - <