
Commit

Merge remote-tracking branch 'origin/master' into serve-proxy-prototype
cblmemo committed May 8, 2024
2 parents 906e423 + 7f30ce5 commit 5d276b5
Showing 73 changed files with 2,163 additions and 683 deletions.
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -11,4 +11,4 @@ sky_logs/
sky/clouds/service_catalog/data_fetchers/*.csv
.vscode/
.idea/

.env
1 change: 1 addition & 0 deletions docs/source/cloud-setup/cloud-permissions/index.rst
Expand Up @@ -20,3 +20,4 @@ Table of Contents
aws
gcp
vsphere
kubernetes
234 changes: 234 additions & 0 deletions docs/source/cloud-setup/cloud-permissions/kubernetes.rst
@@ -0,0 +1,234 @@
.. _cloud-permissions-kubernetes:

Kubernetes
==========

When running outside your Kubernetes cluster, SkyPilot uses your local ``~/.kube/config`` file
for authentication and creating resources on your Kubernetes cluster.

When running inside your Kubernetes cluster (e.g., as a Spot controller or Serve controller),
SkyPilot can operate using any of the following three authentication methods:

1. **Using your local kubeconfig file**: In this case, SkyPilot will
   copy your local ``~/.kube/config`` file to the controller pod and use it for
   authentication. This is the default method when running inside the cluster,
   and no additional configuration is required.

   .. note::

       If your cluster uses exec-based authentication in your ``~/.kube/config`` file
       (e.g., GKE uses exec auth by default), SkyPilot may not be able to authenticate using this method. In this case,
       consider using the service account methods below.

2. **Creating a service account**: SkyPilot can automatically create the service
   account and roles for itself to manage resources in the Kubernetes cluster.
   To use this method, add ``remote_identity: SERVICE_ACCOUNT`` to your
   Kubernetes configuration in the :ref:`~/.sky/config.yaml <config-yaml>` file:

   .. code-block:: yaml

       kubernetes:
         remote_identity: SERVICE_ACCOUNT

   For details on the permissions that are granted to the service account,
   refer to the `Permissions required for SkyPilot`_ section below.

3. **Using a custom service account**: If you have a custom service account
   with the `necessary permissions <k8s-permissions_>`__, you can configure
   SkyPilot to use it by adding this to your :ref:`~/.sky/config.yaml <config-yaml>` file:

   .. code-block:: yaml

       kubernetes:
         remote_identity: your-service-account-name

.. note::

    Service account based authentication applies only when the remote SkyPilot
    cluster (including spot and serve controller) is launched inside the
    Kubernetes cluster. When running outside the cluster (e.g., on AWS),
    SkyPilot will use the local ``~/.kube/config`` file for authentication.
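
The three ``remote_identity`` modes described above can be summarized in a small sketch. This is illustrative only and not SkyPilot's implementation: the ``AuthPlan`` class and ``plan_kubernetes_auth`` function are hypothetical names, and the assumption that a custom service account also means the kubeconfig is not uploaded to pods is inferred from the description rather than confirmed.

```python
from dataclasses import dataclass
from typing import Optional

# Special values of `remote_identity` named in the documentation.
LOCAL_CREDENTIALS = "LOCAL_CREDENTIALS"
SERVICE_ACCOUNT = "SERVICE_ACCOUNT"


@dataclass(frozen=True)
class AuthPlan:
    """Hypothetical summary of how a pod would authenticate (not a SkyPilot type)."""
    upload_kubeconfig: bool
    auto_create_service_account: bool
    service_account_name: Optional[str]


def plan_kubernetes_auth(remote_identity: str = LOCAL_CREDENTIALS) -> AuthPlan:
    """Map a `remote_identity` value to one of the three documented methods."""
    if remote_identity == LOCAL_CREDENTIALS:
        # Method 1 (default): copy the local ~/.kube/config to the pod.
        return AuthPlan(True, False, None)
    if remote_identity == SERVICE_ACCOUNT:
        # Method 2: a service account and roles are created automatically.
        return AuthPlan(False, True, None)
    # Method 3: any other string is treated as the name of an existing
    # service account (assumption: the kubeconfig is then not uploaded).
    return AuthPlan(False, False, remote_identity)


if __name__ == "__main__":
    for value in (LOCAL_CREDENTIALS, SERVICE_ACCOUNT, "your-service-account-name"):
        print(value, "->", plan_kubernetes_auth(value))
```
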

Below are the permissions required by SkyPilot and an example service account YAML that you can use to create a service account with the necessary permissions.

.. _k8s-permissions:

Permissions required for SkyPilot
---------------------------------

SkyPilot requires permissions equivalent to the following roles to be able to manage the resources in the Kubernetes cluster:

.. code-block:: yaml

    # Namespaced role for the service account
    # Required for creating pods, services and other necessary resources in the namespace.
    # Note these permissions only apply in the namespace where SkyPilot is deployed.
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: sky-sa-role
      namespace: default
    rules:
      - apiGroups: ["*"]
        resources: ["*"]
        verbs: ["*"]
    ---
    # ClusterRole for accessing cluster-wide resources. Details for each resource below:
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: sky-sa-cluster-role
      namespace: default
      labels:
        parent: skypilot
    rules:
      - apiGroups: [""]
        resources: ["nodes"]  # Required for getting node resources.
        verbs: ["get", "list", "watch"]
      - apiGroups: ["rbac.authorization.k8s.io"]
        resources: ["clusterroles", "clusterrolebindings"]  # Required for launching more SkyPilot clusters from within the pod.
        verbs: ["get", "list", "watch"]
      - apiGroups: ["node.k8s.io"]
        resources: ["runtimeclasses"]  # Required for autodetecting the runtime class of the nodes.
        verbs: ["get", "list", "watch"]
    ---
    # Optional: If using ingresses, role for accessing ingress service IP
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: ingress-nginx
      name: sky-sa-role-ingress-nginx
    rules:
      - apiGroups: [""]
        resources: ["services"]
        verbs: ["list", "get"]

These roles must apply to both the user account configured in the kubeconfig file and the service account used by SkyPilot (if configured).
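
One way to spot-check that an identity actually holds these permissions is ``kubectl auth can-i`` with impersonation. The sketch below only builds the command lines; it is not part of SkyPilot. The helper name ``can_i_command`` and the list of checks are hypothetical, and the service account name ``sky-sa`` in the ``default`` namespace is taken from the example in this document. Running the commands requires that your own user is allowed to impersonate service accounts.

```python
from typing import List, Optional


def can_i_command(verb: str,
                  resource: str,
                  service_account: str,
                  sa_namespace: str = "default",
                  target_namespace: Optional[str] = None) -> List[str]:
    """Build a `kubectl auth can-i` command that impersonates a service account."""
    cmd = [
        "kubectl", "auth", "can-i", verb, resource,
        "--as", f"system:serviceaccount:{sa_namespace}:{service_account}",
    ]
    if target_namespace is not None:
        # Check the permission in a specific namespace.
        cmd += ["--namespace", target_namespace]
    return cmd


# A few illustrative checks derived from the roles listed above:
# (verb, resource, namespace to check in or None).
CHECKS = [
    ("list", "nodes", None),
    ("list", "runtimeclasses.node.k8s.io", None),
    ("create", "pods", "default"),
    ("get", "services", "ingress-nginx"),
]

if __name__ == "__main__":
    for verb, resource, namespace in CHECKS:
        # Print each command so it can be reviewed or pasted into a shell.
        print(" ".join(can_i_command(verb, resource, "sky-sa",
                                     target_namespace=namespace)))
```

Each printed command answers ``yes`` or ``no`` when run against the cluster.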

.. _k8s-sa-example:

Example using Custom Service Account
------------------------------------

To create a service account that has the necessary permissions for SkyPilot, you can use the following YAML:

.. code-block:: yaml

    # create-sky-sa.yaml
    kind: ServiceAccount
    apiVersion: v1
    metadata:
      name: sky-sa
      namespace: default
      labels:
        parent: skypilot
    ---
    # Role for the service account
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: sky-sa-role
      namespace: default
      labels:
        parent: skypilot
    rules:
      - apiGroups: ["*"]  # Required for creating pods, services, secrets and other necessary resources in the namespace.
        resources: ["*"]
        verbs: ["*"]
    ---
    # RoleBinding for the service account
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: sky-sa-rb
      namespace: default
      labels:
        parent: skypilot
    subjects:
      - kind: ServiceAccount
        name: sky-sa
    roleRef:
      kind: Role
      name: sky-sa-role
      apiGroup: rbac.authorization.k8s.io
    ---
    # Role for accessing ingress resources
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: ingress-nginx
      name: sky-sa-role-ingress-nginx
    rules:
      - apiGroups: [""]
        resources: ["services"]
        verbs: ["list", "get", "watch"]
      - apiGroups: ["rbac.authorization.k8s.io"]
        resources: ["roles", "rolebindings"]
        verbs: ["list", "get", "watch"]
    ---
    # RoleBinding for accessing ingress resources
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: sky-sa-rolebinding-ingress-nginx
      namespace: ingress-nginx
    subjects:
      - kind: ServiceAccount
        name: sky-sa
        namespace: default
    roleRef:
      kind: Role
      name: sky-sa-role-ingress-nginx
      apiGroup: rbac.authorization.k8s.io
    ---
    # ClusterRole for the service account
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: sky-sa-cluster-role
      namespace: default
      labels:
        parent: skypilot
    rules:
      - apiGroups: [""]
        resources: ["nodes"]  # Required for getting node resources.
        verbs: ["get", "list", "watch"]
      - apiGroups: ["rbac.authorization.k8s.io"]
        resources: ["clusterroles", "clusterrolebindings"]  # Required for launching more SkyPilot clusters from within the pod.
        verbs: ["get", "list", "watch"]
      - apiGroups: ["node.k8s.io"]
        resources: ["runtimeclasses"]  # Required for autodetecting the runtime class of the nodes.
        verbs: ["get", "list", "watch"]
      - apiGroups: ["networking.k8s.io"]  # Required for exposing services.
        resources: ["ingressclasses"]
        verbs: ["get", "list", "watch"]
    ---
    # ClusterRoleBinding for the service account
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: sky-sa-cluster-role-binding
      namespace: default
      labels:
        parent: skypilot
    subjects:
      - kind: ServiceAccount
        name: sky-sa
        namespace: default
    roleRef:
      kind: ClusterRole
      name: sky-sa-cluster-role
      apiGroup: rbac.authorization.k8s.io

Create the service account using the following command:

.. code-block:: bash

    $ kubectl apply -f create-sky-sa.yaml

After creating the service account, configure SkyPilot to use it through ``~/.sky/config.yaml``:

.. code-block:: yaml

    kubernetes:
      remote_identity: sky-sa   # Or your service account name
68 changes: 63 additions & 5 deletions docs/source/reference/config.rst
Expand Up @@ -109,7 +109,7 @@ Available fields and semantics:
# permission to create a security group.
security_group_name: my-security-group
- # Identity to use for all AWS instances (optional).
+ # Identity to use for AWS instances (optional).
#
# LOCAL_CREDENTIALS: The user's local credential files will be uploaded to
# AWS instances created by SkyPilot. They are used for accessing cloud
Expand All @@ -120,6 +120,21 @@ Available fields and semantics:
# instances. SkyPilot will auto-create and reuse a service account (IAM
# role) for AWS instances.
#
# Customized service account (IAM role): <string> or <list of single-element dict>
# - <string>: apply the service account with the specified name to all instances.
# Example:
# remote_identity: my-service-account-name
# - <list of single-element dict>: A list of single-element dict mapping from the cluster name (pattern)
# to the service account name to use. The matching of the cluster name is done in the same order
# as the list.
# NOTE: If none of the wildcard expressions in the dict match the cluster name, LOCAL_CREDENTIALS will be used.
# To specify your default, use "*" as the wildcard expression.
# Example:
# remote_identity:
# - my-cluster-name: my-service-account-1
# - sky-serve-controller-*: my-service-account-2
# - "*": my-default-service-account
#
# Two caveats of SERVICE_ACCOUNT for multicloud users:
#
# - This only affects AWS instances. Local AWS credentials will still be
Expand Down Expand Up @@ -190,21 +205,21 @@ Available fields and semantics:
# Reserved capacity (optional).
#
#
# Whether to prioritize reserved instance types/locations (considered as 0
# cost) in the optimizer.
#
#
# If you have "automatically consumed" reservations in your GCP project:
# Setting this to true guarantees the optimizer will pick any matching
# reservation and GCP will auto consume your reservation, and setting to
# false means optimizer uses regular, non-zero pricing in optimization (if
# by chance any matching reservation is selected, GCP still auto consumes
# the reservation).
#
#
# If you have "specifically targeted" reservations (set by the
# `specific_reservations` field below): This field will automatically be set
# to true.
#
#
# Default: false.
prioritize_reservations: false
#
Expand Down Expand Up @@ -283,6 +298,30 @@ Available fields and semantics:
# Default: loadbalancer
ports: loadbalancer
# Identity to use for all Kubernetes pods (optional).
#
# LOCAL_CREDENTIALS: The user's local ~/.kube/config will be uploaded to the
# Kubernetes pods created by SkyPilot. It is used for authenticating with
# the Kubernetes API server and launching new pods (e.g., for
# spot/serve controllers).
#
# SERVICE_ACCOUNT: Local ~/.kube/config is not uploaded to Kubernetes pods.
# SkyPilot will auto-create and reuse a service account with necessary roles
# in the user's namespace.
#
# <string>: The name of a service account to use for all Kubernetes pods.
# This service account must exist in the user's namespace and have all
# necessary permissions. Refer to https://skypilot.readthedocs.io/en/latest/cloud-setup/cloud-permissions/kubernetes.html
# for details on the roles required by the service account.
#
# Using SERVICE_ACCOUNT or a custom service account only affects Kubernetes
# instances. Local ~/.kube/config will still be uploaded to non-Kubernetes
# instances (e.g., a serve controller on GCP or AWS may need to provision
# Kubernetes resources).
#
# Default: 'LOCAL_CREDENTIALS'.
remote_identity: my-k8s-service-account
# Attach custom metadata to Kubernetes objects created by SkyPilot
#
# Uses the same schema as Kubernetes metadata object: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#objectmeta-v1-meta
Expand Down Expand Up @@ -313,6 +352,25 @@ Available fields and semantics:
# Default: 10 seconds
provision_timeout: 10
# Autoscaler configured in the Kubernetes cluster (optional)
#
# This field informs SkyPilot about the cluster autoscaler used in the
# Kubernetes cluster. Setting this field disables pre-launch checks for
# GPU capacity in the cluster and SkyPilot relies on the autoscaler to
# provision nodes with the required GPU capacity.
#
# Remember to set provision_timeout accordingly when using an autoscaler.
#
# Supported values: gke, karpenter, generic
# gke: uses cloud.google.com/gke-accelerator label to identify GPUs on nodes
# karpenter: uses karpenter.k8s.aws/instance-gpu-name label to identify GPUs on nodes
# generic: uses skypilot.co/accelerator labels to identify GPUs on nodes
# Refer to https://skypilot.readthedocs.io/en/latest/reference/kubernetes/kubernetes-setup.html#setting-up-gpu-support
# for more details on setting up labels for GPU support.
#
# Default: null (no autoscaler, autodetect label format for GPU nodes)
autoscaler: gke
# Additional fields to override the pod fields used by SkyPilot (optional)
#
# Any key:value pairs added here would get added to the pod spec used to
18 changes: 17 additions & 1 deletion docs/source/reference/kubernetes/kubernetes-setup.rst
Expand Up @@ -382,7 +382,7 @@ To use this mode:
# ingress-nginx-controller LoadBalancer 10.24.4.254 35.202.58.117 80:31253/TCP,443:32699/TCP
.. note::
- If the ``EXTERNAL-IP`` field is ``<none>``, you must manually assign an External IP.
+ If the ``EXTERNAL-IP`` field is ``<none>``, you may manually assign it an External IP.
This can be done by patching the service with an IP that can be accessed from outside the cluster.
If the service type is ``NodePort``, you can set the ``EXTERNAL-IP`` to any node's IP address:

Expand All @@ -395,6 +395,22 @@ To use this mode:
If the ``EXTERNAL-IP`` field is left as ``<none>``, SkyPilot will use ``localhost`` as the external IP for the Ingress,
and the endpoint may not be accessible from outside the cluster.

.. note::
If you cannot update the ``EXTERNAL-IP`` field of the service, you can also
specify the Ingress IP or hostname through the ``skypilot.co/external-ip``
annotation on the ``ingress-nginx-controller`` service. In this case,
having a valid ``EXTERNAL-IP`` field is not required.

For example, if your ``ingress-nginx-controller`` service is ``NodePort``:

.. code-block:: bash
# Add skypilot.co/external-ip annotation to the nginx ingress service.
# Replace <IP> in the following command with the IP you select.
# Can be any node's IP if using NodePort service type.
$ kubectl annotate service ingress-nginx-controller skypilot.co/external-ip=<IP> -n ingress-nginx
3. Update the :ref:`SkyPilot config <config-yaml>` at :code:`~/.sky/config.yaml` to use the ingress mode.

.. code-block:: yaml