diff --git a/ml-platform/04_setup_clusters/README.md b/ml-platform/04_setup_clusters/README.md
index 5613a0b52..64f8240df 100644
--- a/ml-platform/04_setup_clusters/README.md
+++ b/ml-platform/04_setup_clusters/README.md
@@ -54,7 +54,7 @@ You just followed `GitOps` to promote changes from `dev` to higher environments.
 Open the configsync repo and go to `manifests/clusters`, you will see there is a cluster selector created for each cluster via yaml files.
 
 ### Install a cluster scoped software
-This section describes how platform admins will use the configsync repo to manage cluster scoped software or cluster level objects. These softwares could be used by multiple teams in their namespaces. An example of such softwares is [kuberay][kuberay] that can manage ray clusters in multiple namespace.
+This section describes how platform admins will use the configsync repo to manage cluster scoped software or cluster level objects. These software could be used by multiple teams in their namespaces. An example of such software is [kuberay][kuberay] that can manage ray clusters in multiple namespace.
 
 Let's install [Kuberay][kuberay] as a cluster level software that includes CRDs and deployments. Kuberay has a component called operator that facilitates `ray` on Kubernetes. We will install Kuberay operator in default namespace. The operator will then orchestrate `ray clusters` created in different namespace by different teams in the future.
 
diff --git a/ml-platform/README.md b/ml-platform/README.md
index 5b9d74241..54037e5b2 100644
--- a/ml-platform/README.md
+++ b/ml-platform/README.md
@@ -35,26 +35,22 @@ It addresses following personae and provides means to automate and simplify thei
 
 **CUJ 1** : Use ML tools like `ray` to perform their day to day tasks like data pre-processing, ML training etc.
 
-**CUJ 2** : Use a development environment like Jupyter Notebook for faster inner loop of ML development.
+**CUJ 2** : Use a development environment like Jupyter Notebook for faster inner loop of ML development. **[TBD]**
 
 ### Operators
 
-**CUJ 1**: Act as a bridge between the Platform admins and the ML Engineers by providing and maintaining softwares needed by the ML engineers so they can focus on their job.
+**CUJ 1**: Act as a bridge between the Platform admins and the ML Engineers by providing and maintaining software needed by the ML engineers so they can focus on their job.
 
-**CUJ 2**: Deploying the models.
+**CUJ 2**: Deploying the models. **[TBD]**
 
-**CUJ 3**: Building observability on the models.
+**CUJ 3**: Building observability on the models. **[TBD]**
 
-**CUJ 4**: Operationalizing the models.
+**CUJ 4**: Operationalizing the models. **[TBD]**
 
 ## Prerequistes
 
 1. This tutorial has been tested on [Cloud Shell](https://shell.cloud.google.com) which comes preinstalled with [Google Cloud SDK](https://cloud.google.com/sdk) is required to complete this tutorial.
 
-2. It is recommended to start the tutorial in a fresh project since the easiest way to clean up once complete is to delete the project. See [here](https://cloud.google.com/resource-manager/docs/creating-managing-projects) for more details.
-
-3. This tutorial requires a number of different GCP Quotas (>= 60 T4 GPUs and 400 CPU cores) in the region of your choosing. Please visit the [IAM -> Quotas page](https://console.cloud.google.com/iam-admin/quotas) in the context of your project and region to request additional quota before proceeding with this tutorial.
-
 ## Deploy resources.
 
 Follow these steps in order to build the platform and use it.
@@ -69,7 +65,7 @@ Follow these steps in order to build the platform and use it.
 
 - Run steps in [05_setup_teams][setup-teams]. This modules walks through how as platform admin you can set up spaces for ML teams on the cluster and transfer ownership to operators to maintain that space.
 
-- Run steps in [06_operating_teams][operating-teams]. This module walks through how as an operator you will provide the softwares required by ML engineers.
+- Run steps in [06_operating_teams][operating-teams]. This module walks through how as an operator you will provide the software required by ML engineers.
 
 [projects]: ./01_gcp_project/README.md