diff --git a/docs/source/reference/kubernetes/index.rst b/docs/source/reference/kubernetes/index.rst index b411f3558f6..c1a1ec1c74e 100644 --- a/docs/source/reference/kubernetes/index.rst +++ b/docs/source/reference/kubernetes/index.rst @@ -21,7 +21,10 @@ Why use SkyPilot on Kubernetes? .. grid-item-card:: ✅ Ease of Use :text-align: center - No complex kubernetes manifests - write a simple SkyPilot YAML and run ``sky launch``. + .. + TODO(romilb): We should have a comparison of a popular Kubernetes manifest vs a SkyPilot YAML in terms of LoC in a mini blog and link it here. + + No complex kubernetes manifests - write a simple SkyPilot YAML and run with one command ``sky launch``. .. grid-item-card:: 📋 Interactive development on Kubernetes :text-align: center @@ -58,247 +61,49 @@ Why use SkyPilot on Kubernetes? .. grid-item-card:: 👀 Observability :text-align: center - Works with your existing tools, such as the :ref:`Kubernetes Dashboard `. + Works with your existing observability and monitoring tools, such as the :ref:`Kubernetes Dashboard `. .. grid-item-card:: 🍽️ Self-serve infra for your teams :text-align: center - .. - This point should maybe talk about quotas + sharing through kueue/native k8s quotas. - - Reduce operational overhead by letting your teams provision their own resources on Kubernetes, while you retain control over the cluster. - - -Kubernetes Cluster Requirements -------------------------------- - -To connect and use a Kubernetes cluster, SkyPilot needs: - -* An existing Kubernetes cluster running Kubernetes v1.20 or later. -* A `Kubeconfig `_ file containing access credentials and namespace to be used. - -**Supported Kubernetes deployments:** - -* Hosted Kubernetes services (EKS, GKE) -* On-prem clusters (Kubeadm, Rancher, K3s) -* Local development clusters (KinD, minikube) - -In a typical workflow: - -1. A cluster administrator sets up a Kubernetes cluster. 
Detailed admin guides for - different deployment environments (Amazon EKS, Google GKE, On-Prem and local debugging) are included in the :ref:`Kubernetes cluster setup guide `. - -2. Users who want to run SkyPilot tasks on this cluster are issued Kubeconfig - files containing their credentials (`kube-context `_). - SkyPilot reads this Kubeconfig file to communicate with the cluster. - -Getting Started ---------------- -.. _kubernetes-instructions: - -Once your cluster administrator has :ref:`setup a Kubernetes cluster ` and provided you with a kubeconfig file: - -0. Make sure `kubectl `_, ``socat`` and ``nc`` (netcat) are installed on your local machine. - - .. code-block:: console - - $ # MacOS - $ brew install kubectl socat netcat - - $ # Linux (may have socat already installed) - $ sudo apt-get install kubectl socat netcat - - -1. Place your kubeconfig file at ``~/.kube/config``. - - .. code-block:: console - - $ mkdir -p ~/.kube - $ cp /path/to/kubeconfig ~/.kube/config - - You can verify your credentials are setup correctly by running :code:`kubectl get pods`. - -2. Run :code:`sky check` and verify that Kubernetes is enabled in SkyPilot. - - .. code-block:: console - - $ sky check - - Checking credentials to enable clouds for SkyPilot. - ... - Kubernetes: enabled - ... - - - .. note:: - :code:`sky check` will also check if GPU support is available on your cluster. If GPU support is not available, it - will show the reason. - To setup GPU support on the cluster, refer to the :ref:`Kubernetes cluster setup guide `. - -.. _kubernetes-optimizer-table: - -4. You can now run any SkyPilot task on your Kubernetes cluster. - - .. 
code-block:: console - - $ sky launch --cpus 2+ task.yaml - == Optimizer == - Target: minimizing cost - Estimated cost: $0.0 / hour - - Considered resources (1 node): - --------------------------------------------------------------------------------------------------- - CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN - --------------------------------------------------------------------------------------------------- - Kubernetes 2CPU--2GB 2 2 - kubernetes 0.00 ✔ - AWS m6i.large 2 8 - us-east-1 0.10 - Azure Standard_D2s_v5 2 8 - eastus 0.10 - GCP n2-standard-2 2 8 - us-central1 0.10 - IBM bx2-8x32 8 32 - us-east 0.38 - Lambda gpu_1x_a10 30 200 A10:1 us-east-1 0.60 - ---------------------------------------------------------------------------------------------------. - - -.. note:: - SkyPilot will use the cluster and namespace set in the ``current-context`` in the - kubeconfig file. To manage your ``current-context``: - - .. code-block:: console - - $ # See current context - $ kubectl config current-context - - $ # Switch current-context - $ kubectl config use-context mycontext - - $ # Set a specific namespace to be used in the current-context - $ kubectl config set-context --current --namespace=mynamespace - + Reduce operational overhead by letting your teams provision their own resources, while you retain control over the Kubernetes cluster. -Using Custom Images -------------------- -By default, we use and maintain a SkyPilot container image that has conda and a few other basic tools installed. -To use your own image, add :code:`image_id: docker:` to the :code:`resources` section of your task YAML. +Table of Contents +----------------- -.. code-block:: yaml +.. grid:: 3 + :gutter: 3 - resources: - image_id: docker:myrepo/myimage:latest - ... + .. grid-item-card:: 👋 Get Started + :link: kubernetes-getting-started + :link-type: ref + :text-align: center -Your image must satisfy the following requirements: + Already have a kubeconfig? 
Launch your first SkyPilot task on Kubernetes - it's as simple as ``sky launch``. -* Image must be **debian-based** and must have the apt package manager installed. -* The default user in the image must have root privileges or passwordless sudo access. + .. grid-item-card:: ⚙️ Cluster Configuration + :link: kubernetes-setup + :link-type: ref + :text-align: center -.. note:: + Are you a cluster admin? Find cluster deployment guides and setup instructions here. - If your cluster runs on non-x86_64 architecture (e.g., Apple Silicon), your image must be built natively for that architecture. Otherwise, your job may get stuck at :code:`Start streaming logs ...`. See `GitHub issue `_ for more. + .. grid-item-card:: 🔍️ Troubleshooting + :link: kubernetes-troubleshooting + :link-type: ref + :text-align: center -Using Images from Private Repositories -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -To use images from private repositories (e.g., Private DockerHub, Amazon ECR, Google Container Registry), create a `secret `_ in your Kubernetes cluster and edit your :code:`~/.sky/config.yaml` to specify the secret like so: + Running into problems with SkyPilot on your Kubernetes cluster? Find common issues and solutions here. -.. code-block:: yaml - kubernetes: - pod_config: - spec: - imagePullSecrets: - - name: your-secret-here - -.. tip:: - - If you use Amazon ECR, your secret credentials may expire every 12 hours. Consider using `k8s-ecr-login-renew `_ to automatically refresh your secrets. - - -Opening Ports -------------- - -Opening ports on SkyPilot clusters running on Kubernetes is supported through two modes: - -1. `LoadBalancer services `_ (default) -2. `Nginx IngressController `_ - -One of these modes must be supported and configured on your cluster. Refer to the :ref:`setting up ports on Kubernetes guide ` on how to do this. - -.. 
tip:: - - On Google GKE, Amazon EKS or other cloud-hosted Kubernetes services, the default LoadBalancer services mode is supported out of the box and no additional configuration is needed. - -Once your cluster is configured, launch a task which exposes services on a port by adding :code:`ports` to the :code:`resources` section of your task YAML. - -.. code-block:: yaml - - # task.yaml - resources: - ports: 8888 - - run: | - python -m http.server 8888 - -After launching the cluster with :code:`sky launch -c myclus task.yaml`, you can get the URL to access the port using :code:`sky status --endpoints myclus`. - -.. code-block:: bash - - # List all ports exposed by the cluster - $ sky status --endpoints myclus - 8888: 34.173.13.241:8888 - - # curl a specific port's endpoint - $ curl $(sky status --endpoint 8888 myclus) - ... - -.. tip:: - - To learn more about opening ports in SkyPilot tasks, see :ref:`Opening Ports `. - -FAQs ----- - -* **Are autoscaling Kubernetes clusters supported?** - - To run on an autoscaling cluster, you may need to adjust the resource provisioning timeout (:code:`Kubernetes.TIMEOUT` in `clouds/kubernetes.py`) to a large value to give enough time for the cluster to autoscale. We are working on a better interface to adjust this timeout - stay tuned! - -* **Can SkyPilot provision a Kubernetes cluster for me? Will SkyPilot add more nodes to my Kubernetes clusters?** - - The goal of Kubernetes support is to run SkyPilot tasks on an existing Kubernetes cluster. It does not provision any new Kubernetes clusters or add new nodes to an existing Kubernetes cluster. - -* **I have multiple users in my organization who share the same Kubernetes cluster. How do I provide isolation for their SkyPilot workloads?** - - For isolation, you can create separate Kubernetes namespaces and set them in the kubeconfig distributed to users. SkyPilot will use the namespace set in the kubeconfig for running all tasks. 
- -* **How do I view the pods created by SkyPilot on my Kubernetes cluster?** - - You can use your existing observability tools to filter resources with the label :code:`parent=skypilot` (:code:`kubectl get pods -l 'parent=skypilot'`). As an example, follow the instructions :ref:`here ` to deploy the Kubernetes Dashboard on your cluster. - -* **How can I specify custom configuration for the pods created by SkyPilot?** - - You can override the pod configuration used by SkyPilot by setting the :code:`pod_config` key in :code:`~/.sky/config.yaml`. - The value of :code:`pod_config` should be a dictionary that follows the `Kubernetes Pod API `_. - - For example, to set custom environment variables and attach a volume on your pods, you can add the following to your :code:`~/.sky/config.yaml` file: - - .. code-block:: yaml +.. toctree:: + :hidden: - kubernetes: - pod_config: - spec: - containers: - - env: - - name: MY_ENV_VAR - value: MY_ENV_VALUE - volumeMounts: # Custom volume mounts for the pod - - mountPath: /foo - name: example-volume - volumes: - - name: example-volume - hostPath: - path: /tmp - type: Directory + kubernetes-getting-started + kubernetes-setup + kubernetes-troubleshooting - For more details refer to :ref:`config-yaml`. Features and Roadmap -------------------- @@ -311,11 +116,4 @@ Kubernetes support is under active development. Some features are in progress an * Multi-node tasks - ✅ Available * Custom images - ✅ Available * Opening ports and exposing services - ✅ Available -* Multiple Kubernetes Clusters - 🚧 In progress - - -.. 
toctree:: - :hidden: - - kubernetes-setup - kubernetes-troubleshooting +* Multiple Kubernetes Clusters - 🚧 In progress \ No newline at end of file diff --git a/docs/source/reference/kubernetes/kubernetes-getting-started.rst b/docs/source/reference/kubernetes/kubernetes-getting-started.rst new file mode 100644 index 00000000000..c2162da3779 --- /dev/null +++ b/docs/source/reference/kubernetes/kubernetes-getting-started.rst @@ -0,0 +1,235 @@ +.. _kubernetes-getting-started: + +Getting Started on Kubernetes +============================= + +Prerequisites +------------- + +To connect and use a Kubernetes cluster, SkyPilot needs: + +* An existing Kubernetes cluster running Kubernetes v1.20 or later. +* A `Kubeconfig `_ file containing access credentials and namespace to be used. + +**Supported Kubernetes deployments:** + +* Hosted Kubernetes services (EKS, GKE) +* On-prem clusters (Kubeadm, Rancher, K3s) +* Local development clusters (KinD, minikube) + +In a typical workflow: + +1. A cluster administrator sets up a Kubernetes cluster. Detailed admin guides for + different deployment environments (Amazon EKS, Google GKE, On-Prem and local debugging) are included in the :ref:`Kubernetes cluster setup guide `. + +2. Users who want to run SkyPilot tasks on this cluster are issued Kubeconfig + files containing their credentials (`kube-context `_). + SkyPilot reads this Kubeconfig file to communicate with the cluster. + +Launching your first task +------------------------- +.. _kubernetes-instructions: + +Once your cluster administrator has :ref:`set up a Kubernetes cluster ` and provided you with a kubeconfig file: + +0. Make sure `kubectl `_, ``socat`` and ``nc`` (netcat) are installed on your local machine. + + .. code-block:: console + + $ # macOS + $ brew install kubectl socat netcat + + $ # Linux (may have socat already installed) + $ sudo apt-get install kubectl socat netcat + + +1. Place your kubeconfig file at ``~/.kube/config``. + + .. 
code-block:: console + + $ mkdir -p ~/.kube + $ cp /path/to/kubeconfig ~/.kube/config + + You can verify your credentials are set up correctly by running :code:`kubectl get pods`. + +2. Run :code:`sky check` and verify that Kubernetes is enabled in SkyPilot. + + .. code-block:: console + + $ sky check + + Checking credentials to enable clouds for SkyPilot. + ... + Kubernetes: enabled + ... + + + .. note:: + :code:`sky check` will also check if GPU support is available on your cluster. If GPU support is not available, it + will show the reason. + To set up GPU support on the cluster, refer to the :ref:`Kubernetes cluster setup guide `. + +.. _kubernetes-optimizer-table: + +3. You can now run any SkyPilot task on your Kubernetes cluster. + + .. code-block:: console + + $ sky launch --cpus 2+ task.yaml + == Optimizer == + Target: minimizing cost + Estimated cost: $0.0 / hour + + Considered resources (1 node): + --------------------------------------------------------------------------------------------------- + CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN + --------------------------------------------------------------------------------------------------- + Kubernetes 2CPU--2GB 2 2 - kubernetes 0.00 ✔ + AWS m6i.large 2 8 - us-east-1 0.10 + Azure Standard_D2s_v5 2 8 - eastus 0.10 + GCP n2-standard-2 2 8 - us-central1 0.10 + IBM bx2-8x32 8 32 - us-east 0.38 + Lambda gpu_1x_a10 30 200 A10:1 us-east-1 0.60 + ---------------------------------------------------------------------------------------------------. + + +.. note:: + SkyPilot will use the cluster and namespace set in the ``current-context`` in the + kubeconfig file. To manage your ``current-context``: + + .. 
code-block:: console + + $ # See current context + $ kubectl config current-context + + $ # Switch current-context + $ kubectl config use-context mycontext + + $ # Set a specific namespace to be used in the current-context + $ kubectl config set-context --current --namespace=mynamespace + + +Using Custom Images +------------------- +By default, we use and maintain a SkyPilot container image that has conda and a few other basic tools installed. + +To use your own image, add :code:`image_id: docker:` to the :code:`resources` section of your task YAML. + +.. code-block:: yaml + + resources: + image_id: docker:myrepo/myimage:latest + ... + +Your image must satisfy the following requirements: + +* Image must be **Debian-based** and must have the apt package manager installed. +* The default user in the image must have root privileges or passwordless sudo access. + +.. note:: + + If your cluster runs on a non-x86_64 architecture (e.g., Apple Silicon), your image must be built natively for that architecture. Otherwise, your job may get stuck at :code:`Start streaming logs ...`. See `GitHub issue `_ for more. + +Using Images from Private Repositories +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +To use images from private repositories (e.g., private DockerHub, Amazon ECR, Google Container Registry), create a `secret `_ in your Kubernetes cluster and edit your :code:`~/.sky/config.yaml` to specify the secret like so: + +.. code-block:: yaml + + kubernetes: + pod_config: + spec: + imagePullSecrets: + - name: your-secret-here + +.. tip:: + + If you use Amazon ECR, your secret credentials may expire every 12 hours. Consider using `k8s-ecr-login-renew `_ to automatically refresh your secrets. + + +Opening Ports +------------- + +Opening ports on SkyPilot clusters running on Kubernetes is supported through two modes: + +1. `LoadBalancer services `_ (default) +2. `Nginx IngressController `_ + +One of these modes must be supported and configured on your cluster. 
Refer to the :ref:`setting up ports on Kubernetes guide ` on how to do this. + +.. tip:: + + On Google GKE, Amazon EKS or other cloud-hosted Kubernetes services, the default LoadBalancer services mode is supported out of the box and no additional configuration is needed. + +Once your cluster is configured, launch a task which exposes services on a port by adding :code:`ports` to the :code:`resources` section of your task YAML. + +.. code-block:: yaml + + # task.yaml + resources: + ports: 8888 + + run: | + python -m http.server 8888 + +After launching the cluster with :code:`sky launch -c myclus task.yaml`, you can get the URL to access the port using :code:`sky status --endpoints myclus`. + +.. code-block:: bash + + # List all ports exposed by the cluster + $ sky status --endpoints myclus + 8888: 34.173.13.241:8888 + + # curl a specific port's endpoint + $ curl $(sky status --endpoint 8888 myclus) + ... + +.. tip:: + + To learn more about opening ports in SkyPilot tasks, see :ref:`Opening Ports `. + +FAQs +---- + +* **Are autoscaling Kubernetes clusters supported?** + + To run on an autoscaling cluster, you may need to adjust the resource provisioning timeout (:code:`Kubernetes.TIMEOUT` in `clouds/kubernetes.py`) to a large value to give enough time for the cluster to autoscale. We are working on a better interface to adjust this timeout - stay tuned! + +* **Can SkyPilot provision a Kubernetes cluster for me? Will SkyPilot add more nodes to my Kubernetes clusters?** + + The goal of Kubernetes support is to run SkyPilot tasks on an existing Kubernetes cluster. It does not provision any new Kubernetes clusters or add new nodes to an existing Kubernetes cluster. + +* **I have multiple users in my organization who share the same Kubernetes cluster. How do I provide isolation for their SkyPilot workloads?** + + For isolation, you can create separate Kubernetes namespaces and set them in the kubeconfig distributed to users. 
SkyPilot will use the namespace set in the kubeconfig for running all tasks. + +* **How do I view the pods created by SkyPilot on my Kubernetes cluster?** + + You can use your existing observability tools to filter resources with the label :code:`parent=skypilot` (:code:`kubectl get pods -l 'parent=skypilot'`). As an example, follow the instructions :ref:`here ` to deploy the Kubernetes Dashboard on your cluster. + +* **How can I specify custom configuration for the pods created by SkyPilot?** + + You can override the pod configuration used by SkyPilot by setting the :code:`pod_config` key in :code:`~/.sky/config.yaml`. + The value of :code:`pod_config` should be a dictionary that follows the `Kubernetes Pod API `_. + + For example, to set custom environment variables and attach a volume on your pods, you can add the following to your :code:`~/.sky/config.yaml` file: + + .. code-block:: yaml + + kubernetes: + pod_config: + spec: + containers: + - env: + - name: MY_ENV_VAR + value: MY_ENV_VALUE + volumeMounts: # Custom volume mounts for the pod + - mountPath: /foo + name: example-volume + volumes: + - name: example-volume + hostPath: + path: /tmp + type: Directory + + For more details refer to :ref:`config-yaml`. diff --git a/llm/gemma/serve.yaml b/llm/gemma/serve.yaml index 4c5a2c984c5..bd5658e8e2f 100644 --- a/llm/gemma/serve.yaml +++ b/llm/gemma/serve.yaml @@ -24,21 +24,9 @@ resources: ports: 8000 disk_tier: best -setup: | - conda activate gemma - if [ $? -ne 0 ]; then - conda create -n gemma -y python=3.10 - conda activate gemma - fi - pip install vllm==0.3.2 - pip install transformers==4.38.1 - python -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')" - run: | conda activate gemma export PATH=$PATH:/sbin - # --max-model-len is set to 1024 to avoid taking too much GPU memory on L4 and - # A10g with small memory. python -u -m vllm.entrypoints.openai.api_server \ --host 0.0.0.0 \ --model $MODEL_NAME \