From 1336ed2d183382112c67de101d47118311b59fff Mon Sep 17 00:00:00 2001
From: davidmirror-ops
Date: Mon, 18 Mar 2024 14:08:11 -0500
Subject: [PATCH] Update K8s operator install instructions

Signed-off-by: davidmirror-ops
---
 docs/deployment/plugins/k8s/index.rst | 116 +++++++-------------------
 1 file changed, 29 insertions(+), 87 deletions(-)

diff --git a/docs/deployment/plugins/k8s/index.rst b/docs/deployment/plugins/k8s/index.rst
index c78af9acae..640f091a2e 100644
--- a/docs/deployment/plugins/k8s/index.rst
+++ b/docs/deployment/plugins/k8s/index.rst
@@ -6,23 +6,18 @@ Configure Kubernetes Plugins
 .. tags:: Kubernetes, Integration, Spark, AWS, GCP, Advanced
 
-This guide help you configure the Flyte integrations that spin up resources on Kubernetes.
-The steps are defined in terms of the `deployment method `_ you used to install Flyte.
+This guide helps you configure the Flyte integrations that spin up resources on Kubernetes.
+The steps are defined in terms of the `deployment method `__ you used to install Flyte.
 
 Install the Kubernetes operator
 -------------------------------
 
+Select the integration you need and follow the steps to install the corresponding Kubernetes operator:
+
 .. tabs::
 
    .. group-tab:: PyTorch/TensorFlow/MPI
 
-      First, `install kustomize `__.
-
-      Build and apply the training-operator.
-
-      .. code-block:: bash
-
-         export KUBECONFIG=$KUBECONFIG:~/.kube/config:~/.flyte/k3s/k3s.yaml
-         kustomize build "https://github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=v1.5.0" | kubectl apply -f -
+      1. Install the Kubeflow ``training-operator`` using `manifests <https://github.com/kubeflow/training-operator?tab=readme-ov-file#installation>`__.
 
       **Optional: Using a gang scheduler**
 
@@ -30,78 +25,38 @@ Install the Kubernetes operator
       due to resource constraints, you can opt for a gang scheduler. This ensures that all worker pods are scheduled
       simultaneously, reducing the likelihood of job failures caused by timeout errors.
 
-      To `enable gang scheduling for the Kubeflow training-operator `__,
-      you can install the `Kubernetes scheduler plugins `__
-      or the `Apache YuniKorn scheduler `__.
-
-      1. Install the `scheduler plugin `_ or
-         `Apache YuniKorn `_ as a second scheduler.
-      2. Configure the Kubeflow training-operator to use the new scheduler:
-
-         Create a manifest called ``kustomization.yaml`` with the following content:
-
-         .. code-block:: yaml
-
-            apiVersion: kustomize.config.k8s.io/v1beta1
-            kind: Kustomization
-
-            resources:
-            - github.com/kubeflow/training-operator/manifests/overlays/standalone
-
-            patchesStrategicMerge:
-            - patch.yaml
-
-         Create a patch file called ``patch.yaml`` with the following content:
-
-         .. code-block:: yaml
-
-            apiVersion: apps/v1
-            kind: Deployment
-            metadata:
-              name: training-operator
-            spec:
-              template:
-                spec:
-                  containers:
-                  - name: training-operator
-                    command:
-                    - /manager
-                    - --gang-scheduler-name=
-
-         Install the patched kustomization with the following command:
+      To enable gang scheduling for the ``training-operator``, you can either use the
+      `Kubernetes scheduler plugins with co-scheduling `__
+      or `Apache YuniKorn `__ as a second scheduler.
 
-         .. code-block:: bash
+      2. Configure a Flyte ``PodTemplate`` to use a gang scheduler for your tasks:
+
+         **K8s scheduler plugins with co-scheduling**
 
-            kustomize build path/to/overlay/directory | kubectl apply -f -
+         .. code-block:: yaml
 
-         (Only for Apache YuniKorn) To configure gang scheduling with Apache YuniKorn,
-         make sure to set the following annotations in Flyte pod templates:
+            template:
+              spec:
+                schedulerName: "scheduler-plugins-scheduler"
 
-         - ``template.metadata.annotations.yunikorn.apache.org/task-group-name``
-         - ``template.metadata.annotations.yunikorn.apache.org/task-groups``
-         - ``template.metadata.annotations.yunikorn.apache.org/schedulingPolicyParameters``
+         **Apache YuniKorn**
 
-         For more configuration details,
-         refer to the `Apache YuniKorn Gang-Scheduling documentation
-         `__.
+         .. code-block:: yaml
 
-      3. Use a Flyte pod template with ``template.spec.schedulerName: scheduler-plugins-scheduler``
-         to use the new gang scheduler for your tasks.
-
-      See :ref:`deployment-configuration-general` for more information on pod templates in Flyte.
+            template:
+              metadata:
+                annotations:
+                  yunikorn.apache.org/task-group-name: ""
+                  yunikorn.apache.org/task-groups: ""
+                  yunikorn.apache.org/schedulingPolicyParameters: ""
+
+
+      See :ref:`deployment-configuration-general` for more information about Pod templates in Flyte.
       You can set the scheduler name in the pod template passed to the ``@task`` decorator. However, to prevent the
       two different schedulers from competing for resources, it is recommended to set the scheduler name in the pod template
-      in the ``flyte`` namespace which is applied to all tasks. Non distributed training tasks can be scheduled by the
+      in the ``flyte`` namespace, which is applied to all tasks. Non-distributed training tasks can be scheduled by the
       gang scheduler as well.
 
-
-      For more information on pod templates in Flyte, see :ref:`deployment-configuration-general`.
-      You can set the scheduler name in the pod template passed to the ``@task`` decorator.
-      However, to avoid resource competition between the two different schedulers,
-      it is recommended to set the scheduler name in the pod template in the ``flyte`` namespace,
-      which is applied to all tasks. This allows non-distributed training tasks to be
-      scheduled by the gang scheduler as well.
-
    .. group-tab:: Ray
 
       To install the Ray Operator, run the following commands:
@@ -114,7 +69,7 @@ Install the Kubernetes operator
 
    .. group-tab:: Spark
 
-      To add the Spark repository, run the following commands:
+      To add the Spark Helm repository, run the following commands:
 
       .. code-block:: bash
 
@@ -128,7 +83,7 @@ Install the Kubernetes operator
 
    .. group-tab:: Dask
 
-      To add the Dask repository, run the following command:
+      To add the Dask Helm repository, run the following command:
 
       .. code-block:: bash
 
@@ -140,19 +95,6 @@ Install the Kubernetes operator
 
          helm install dask-operator dask/dask-kubernetes-operator --namespace dask-operator --create-namespace
 
-
-
-
-
-
-
-
-
-
-
-
-
-
 
 Spin up a cluster
 -----------------
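The YuniKorn pod template in this patch shows the three gang-scheduling annotations with empty values. The hypothetical Python helper below is a rough sketch of values in the shape that the Apache YuniKorn gang-scheduling guide describes. It assumes that `task-groups` is a JSON array of groups with `name`, `minMember` and `minResource`, and that `schedulingPolicyParameters` is a string of space-separated `key=value` pairs. The field names, the `Soft`/`Hard` styles and the 60-second placeholder timeout are assumptions to verify against the YuniKorn version you deploy. The group name and resource figures are made up.

```python
import json
from typing import Dict


def yunikorn_gang_annotations(
    group_name: str,
    min_member: int,
    cpu: str,
    memory: str,
    placeholder_timeout_s: int = 60,
    style: str = "Soft",
) -> Dict[str, str]:
    """Build YuniKorn gang-scheduling annotations for a single task group.

    Hypothetical helper: key names are assumed from the Apache YuniKorn
    gang-scheduling guide; check them against your YuniKorn version.
    """
    if min_member < 1:
        raise ValueError("min_member must be at least 1")
    if style not in ("Soft", "Hard"):
        raise ValueError("style must be 'Soft' or 'Hard'")

    task_groups = [
        {
            "name": group_name,
            # Gang size: minimum number of pods YuniKorn should reserve room for.
            "minMember": min_member,
            # Resources assumed to be reserved per placeholder pod.
            "minResource": {"cpu": cpu, "memory": memory},
        }
    ]
    return {
        # Associates the pod with the task group declared below.
        "yunikorn.apache.org/task-group-name": group_name,
        # The task-groups annotation holds a JSON array serialized as a string.
        "yunikorn.apache.org/task-groups": json.dumps(task_groups),
        # Space-separated key=value pairs.
        "yunikorn.apache.org/schedulingPolicyParameters": (
            f"placeholderTimeoutInSeconds={placeholder_timeout_s} "
            f"gangSchedulingStyle={style}"
        ),
    }


if __name__ == "__main__":
    # Hypothetical job: one master plus two workers, 1 CPU and 2Gi each.
    for key, value in yunikorn_gang_annotations("pytorch-job", 3, "1", "2Gi").items():
        print(f"{key}: {value}")
```

You could pass the resulting dictionary as the ``annotations`` argument of flytekit's ``PodTemplate``. Alternatively, copy its values into the namespace-wide pod template that the patch recommends, so that tasks on the default scheduler do not compete with the gang scheduler.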