[k8s] Robust service account and namespace support #3632

Merged 4 commits on Jun 6, 2024.
263 changes: 181 additions & 82 deletions docs/source/cloud-setup/cloud-permissions/kubernetes.rst
@@ -31,7 +31,7 @@ SkyPilot can operate using either of the following three authentication methods:
remote_identity: SERVICE_ACCOUNT

   For details on the permissions that are granted to the service account,
   refer to the `Minimum Permissions Required for SkyPilot`_ section below.

3. **Using a custom service account**: If you have a custom service account
   with the `necessary permissions <k8s-permissions_>`__, you can configure
@@ -53,21 +53,21 @@ Below are the permissions required by SkyPilot and an example service account YA

.. _k8s-permissions:

Minimum Permissions Required for SkyPilot
-----------------------------------------

SkyPilot requires permissions equivalent to the following roles to be able to manage the resources in the Kubernetes cluster:

.. code-block:: yaml

   # Namespaced role for the service account
   # Required for creating pods, services and other necessary resources in the namespace.
   # Note these permissions only apply in the namespace where SkyPilot is deployed, and the namespace can be changed below.
   kind: Role
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: sky-sa-role  # Can be changed if needed
     namespace: default  # Change to your namespace if using a different one.
   rules:
     - apiGroups: ["*"]
       resources: ["*"]
       verbs: ["*"]
   ---
   kind: ClusterRole
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: sky-sa-cluster-role  # Can be changed if needed
     namespace: default  # Change to your namespace if using a different one.
     labels:
       parent: skypilot
   rules:
     - apiGroups: [""]
       resources: ["nodes"]  # Required for getting node resources.
       verbs: ["get", "list", "watch"]
     - apiGroups: ["node.k8s.io"]
       resources: ["runtimeclasses"]  # Required for autodetecting the runtime class of the nodes.
       verbs: ["get", "list", "watch"]


.. tip::

   If you are using a different namespace than ``default``, make sure to change the namespace in the above manifests.

These roles must be bound to both the user account configured in the kubeconfig file and the service account used by SkyPilot (if one is configured).
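
For example, the Role and ClusterRole above could be granted to a user account with bindings like the following. This is a minimal sketch: ``my-user`` is a hypothetical username (the real value depends on how your cluster authenticates users), and the binding names ``sky-user-rb`` and ``sky-user-crb`` are likewise hypothetical.

.. code-block:: yaml

   # RoleBinding granting sky-sa-role to a user account (hypothetical user and binding names)
   kind: RoleBinding
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: sky-user-rb  # Hypothetical name
     namespace: default  # Must match the namespace of sky-sa-role
   subjects:
     - kind: User
       name: my-user  # Hypothetical; replace with the username your cluster authenticates
       apiGroup: rbac.authorization.k8s.io
   roleRef:
     kind: Role
     name: sky-sa-role
     apiGroup: rbac.authorization.k8s.io
   ---
   # ClusterRoleBinding granting sky-sa-cluster-role to the same user
   kind: ClusterRoleBinding
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: sky-user-crb  # Hypothetical name
   subjects:
     - kind: User
       name: my-user  # Hypothetical username
       apiGroup: rbac.authorization.k8s.io
   roleRef:
     kind: ClusterRole
     name: sky-sa-cluster-role
     apiGroup: rbac.authorization.k8s.io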

If your tasks use object store mounting or require access to ingress resources, you will need to grant additional permissions as described below.

Permissions for Object Store Mounting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your tasks use object store mounting (e.g., S3 or GCS), SkyPilot will need to run a DaemonSet to expose the FUSE device as a Kubernetes resource to SkyPilot pods.

To allow this, you will also need to create a ``skypilot-system`` namespace, in which the DaemonSet runs, and grant the service account the necessary permissions in that namespace.


.. code-block:: yaml

   # Required only if using object store mounting
   # Create namespace for SkyPilot system
   apiVersion: v1
   kind: Namespace
   metadata:
     name: skypilot-system  # Do not change this
     labels:
       parent: skypilot
   ---
   # Role for the skypilot-system namespace to create FUSE device manager and
   # any other system components required by SkyPilot.
   # This role must be bound in the skypilot-system namespace to the service account used for SkyPilot.
   kind: Role
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: skypilot-system-service-account-role  # Can be changed if needed
     namespace: skypilot-system  # Do not change this namespace
     labels:
       parent: skypilot
   rules:
     - apiGroups: ["*"]
       resources: ["*"]
       verbs: ["*"]
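
A Role grants nothing until it is bound. As a minimal sketch, assuming the service account is ``sky-sa`` in the ``default`` namespace (adjust both to your setup), the binding in ``skypilot-system`` could look like this; it mirrors the corresponding RoleBinding in the full example further below.

.. code-block:: yaml

   # RoleBinding granting the skypilot-system role to SkyPilot's service account
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: sky-sa-skypilot-system-role-binding  # Can be changed if needed
     namespace: skypilot-system  # Must match the namespace of the Role
   subjects:
     - kind: ServiceAccount
       name: sky-sa  # Assumed service account name
       namespace: default  # Namespace where the service account was created
   roleRef:
     kind: Role
     name: skypilot-system-service-account-role
     apiGroup: rbac.authorization.k8s.io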


Permissions for using Ingress
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your tasks use :ref:`Ingress <kubernetes-ingress>` for exposing ports, you will need to grant the necessary permissions to the service account in the ``ingress-nginx`` namespace.

.. code-block:: yaml

   # Required only if using ingresses
   # Role for accessing ingress service IP
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     namespace: ingress-nginx  # Do not change this
     name: sky-sa-role-ingress-nginx  # Can be changed if needed
   rules:
     - apiGroups: [""]
       resources: ["services"]
       verbs: ["list", "get"]
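
As with the other Roles, this one must be bound to take effect. A minimal sketch, again assuming the service account ``sky-sa`` in the ``default`` namespace, following the naming used in the full example further below:

.. code-block:: yaml

   # RoleBinding granting the ingress-nginx role to SkyPilot's service account
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: sky-sa-rolebinding-ingress-nginx  # Can be changed if needed
     namespace: ingress-nginx  # Must match the namespace of the Role
   subjects:
     - kind: ServiceAccount
       name: sky-sa  # Assumed service account name
       namespace: default  # Namespace where the service account was created
   roleRef:
     kind: Role
     name: sky-sa-role-ingress-nginx
     apiGroup: rbac.authorization.k8s.io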


.. _k8s-sa-example:

Example using Custom Service Account
------------------------------------

To create a service account that has all necessary permissions for SkyPilot (including for accessing object stores), you can use the following YAML.

.. tip::

   In this example, the service account is named ``sky-sa`` and is created in the ``default`` namespace.
   Change the namespace and service account name as needed.


.. code-block:: yaml
:linenos:

   # create-sky-sa.yaml
   kind: ServiceAccount
   apiVersion: v1
   metadata:
     name: sky-sa  # Change to your service account name
     namespace: default  # Change to your namespace if using a different one.
     labels:
       parent: skypilot
   ---
   # Role for the service account
   kind: Role
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: sky-sa-role  # Can be changed if needed
     namespace: default  # Change to your namespace if using a different one.
     labels:
       parent: skypilot
   rules:
     - apiGroups: ["*"]
       resources: ["*"]
       verbs: ["*"]
   ---
   # RoleBinding for the service account
   kind: RoleBinding
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: sky-sa-rb  # Can be changed if needed
     namespace: default  # Change to your namespace if using a different one.
     labels:
       parent: skypilot
   subjects:
     - kind: ServiceAccount
       name: sky-sa  # Change to your service account name
   roleRef:
     kind: Role
     name: sky-sa-role  # Use the same name as the role at line 14
     apiGroup: rbac.authorization.k8s.io
   ---
   # ClusterRole for the service account
   kind: ClusterRole
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: sky-sa-cluster-role  # Can be changed if needed
     namespace: default  # Change to your namespace if using a different one.
     labels:
       parent: skypilot
   rules:
     - apiGroups: [""]
       resources: ["nodes"]  # Required for getting node resources.
       verbs: ["get", "list", "watch"]
     - apiGroups: ["node.k8s.io"]
       resources: ["runtimeclasses"]  # Required for autodetecting the runtime class of the nodes.
       verbs: ["get", "list", "watch"]
     - apiGroups: ["networking.k8s.io"]  # Required for exposing services through ingresses
       resources: ["ingressclasses"]
       verbs: ["get", "list", "watch"]
   ---
   # ClusterRoleBinding for the service account
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: sky-sa-cluster-role-binding  # Can be changed if needed
     namespace: default  # Change to your namespace if using a different one.
     labels:
       parent: skypilot
   subjects:
     - kind: ServiceAccount
       name: sky-sa  # Change to your service account name
       namespace: default  # Change to your namespace if using a different one.
   roleRef:
     kind: ClusterRole
     name: sky-sa-cluster-role  # Use the same name as the cluster role at line 43
     apiGroup: rbac.authorization.k8s.io
   ---
   # Optional: If using object store mounting, create the skypilot-system namespace
   apiVersion: v1
   kind: Namespace
   metadata:
     name: skypilot-system  # Do not change this
     labels:
       parent: skypilot
   ---
   # Optional: If using object store mounting, create role in the skypilot-system
   # namespace to create FUSE device manager.
   kind: Role
   apiVersion: rbac.authorization.k8s.io/v1
   metadata:
     name: skypilot-system-service-account-role  # Can be changed if needed
     namespace: skypilot-system  # Do not change this namespace
     labels:
       parent: skypilot
   rules:
     - apiGroups: ["*"]
       resources: ["*"]
       verbs: ["*"]
   ---
   # Optional: If using object store mounting, create rolebinding in the skypilot-system
   # namespace to create FUSE device manager.
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: sky-sa-skypilot-system-role-binding
     namespace: skypilot-system  # Do not change this namespace
     labels:
       parent: skypilot
   subjects:
     - kind: ServiceAccount
       name: sky-sa  # Change to your service account name
       namespace: default  # Change this to the namespace where the service account is created
   roleRef:
     kind: Role
     name: skypilot-system-service-account-role  # Use the same name as the role at line 88
     apiGroup: rbac.authorization.k8s.io
   ---
   # Optional: Role for accessing ingress resources
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     name: sky-sa-role-ingress-nginx  # Can be changed if needed
     namespace: ingress-nginx  # Do not change this namespace
     labels:
       parent: skypilot
   rules:
     - apiGroups: [""]
       resources: ["services"]
       verbs: ["list", "get", "watch"]
     - apiGroups: ["rbac.authorization.k8s.io"]
       resources: ["roles", "rolebindings"]
       verbs: ["list", "get", "watch"]
   ---
   # Optional: RoleBinding for accessing ingress resources
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: sky-sa-rolebinding-ingress-nginx  # Can be changed if needed
     namespace: ingress-nginx  # Do not change this namespace
     labels:
       parent: skypilot
   subjects:
     - kind: ServiceAccount
       name: sky-sa  # Change to your service account name
       namespace: default  # Change this to the namespace where the service account is created
   roleRef:
     kind: Role
     name: sky-sa-role-ingress-nginx  # Use the same name as the role at line 119
     apiGroup: rbac.authorization.k8s.io

Create the service account using the following command:

.. code-block:: bash

   $ kubectl apply -f create-sky-sa.yaml

After creating the service account, the cluster admin may distribute kubeconfigs with the ``sky-sa`` service account to users who need to access the cluster.
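
As a rough sketch, such a kubeconfig might look like the following. The cluster name, context name, server address, CA data, and token are placeholders, not values from this guide. On Kubernetes v1.24 or later, a time-limited token for the service account can be requested with ``kubectl create token sky-sa --namespace default``.

.. code-block:: yaml

   # Hypothetical kubeconfig for the sky-sa service account (all <...> values are placeholders)
   apiVersion: v1
   kind: Config
   clusters:
     - name: my-cluster  # Hypothetical cluster name
       cluster:
         server: https://<api-server-address>
         certificate-authority-data: <base64-encoded-ca-certificate>
   users:
     - name: sky-sa
       user:
         token: <service-account-token>  # e.g., output of `kubectl create token sky-sa`
   contexts:
     - name: sky-sa-context  # Hypothetical context name
       context:
         cluster: my-cluster
         user: sky-sa
         namespace: default  # Namespace SkyPilot will use for its resources
   current-context: sky-sa-context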

Users should also configure SkyPilot to use the ``sky-sa`` service account through ``~/.sky/config.yaml``:
Comment on lines +325 to +327
Collaborator


Can we change the sky-sa name above to match the default name in our codebase, so that after creation the user does not need to specify it in ~/.sky/config.yaml? (We can still mention that a user can change the name of the service account and specify it in the config YAML, but the service account creation example should use the same name as the hardcoded one.)

Collaborator Author


Using the default service account skypilot-service-account requires additional permissions because our code is self-correcting: it automatically fixes any service account misconfigurations by creating or patching resources (e.g., when a user is upgrading versions or has accidentally deleted k8s RBAC resources created by SkyPilot).

1. Our code will inspect and create/update the necessary roles and rolebindings. This requires additional "create" and "patch" permissions on "clusterroles" and "clusterrolebindings".
2. Our code will inspect and create the skypilot-system namespace if it doesn't exist. This requires additional "list" and "create" permissions on namespaces.

These extra permissions on "clusterroles", "clusterrolebindings", "namespaces" may be considered too permissive in some environments (e.g., shared clusters). Perhaps we should keep the permissions here limited to minimal permissions required?


.. code-block:: yaml

   # ~/.sky/config.yaml
   kubernetes:
     remote_identity: sky-sa  # Or your service account name
@@ -19,8 +19,8 @@ To connect and use a Kubernetes cluster, SkyPilot needs:

In a typical workflow:

1. A cluster administrator sets up a Kubernetes cluster. Refer to the admin guides on
   :ref:`Kubernetes cluster setup <kubernetes-setup>` for different deployment environments (Amazon EKS, Google GKE, On-Prem and local debugging) and on :ref:`required permissions <cloud-permissions-kubernetes>`.

2. Users who want to run SkyPilot tasks on this cluster are issued Kubeconfig
   files containing their credentials (`kube-context <https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#define-clusters-users-and-contexts>`_).
2 changes: 1 addition & 1 deletion docs/source/reference/kubernetes/kubernetes-setup.rst
@@ -18,7 +18,7 @@ SkyPilot's Kubernetes support is designed to work with most Kubernetes distribut
To connect to a Kubernetes cluster, SkyPilot needs:

* An existing Kubernetes cluster running Kubernetes v1.20 or later.
* A `Kubeconfig <kubeconfig>`_ file containing access credentials and namespace to be used. To limit a user's permissions to the minimum SkyPilot needs, see the :ref:`required permissions guide <cloud-permissions-kubernetes>`.


Deployment Guides