diff --git a/best-practices/ml-platform/README.md b/best-practices/ml-platform/README.md index b5bc89c9c..9261ba049 100644 --- a/best-practices/ml-platform/README.md +++ b/best-practices/ml-platform/README.md @@ -51,329 +51,13 @@ This reference architecture demonstrates how to build a GKE platform that facili - [Google Configuration Management repo-sync][repo-sync] - [GitHub][github] -## Deploy a single environment reference architecture +## Deploy the platform -README +[Sandbox Reference Architecture Guide](examples/platform/sandbox/README.md): Set up an environment to familiarize yourself with the architecture and get an understanding of the concepts. -This is the quick-start deployment guide. It can be used to set up an environment to familiarize yourself with the architecture and get an understanding of the concepts. +## Use cases -### Requirements - -In this guide you can choose to bring your project (BYOP) or have Terraform create a new project for you. The requirements are difference based on the option that you choose. 
- -#### Bring your own project (BYOP) - -- Project ID of a new Google Cloud Project, preferably with no APIs enabled -- `roles/owner` IAM permissions on the project -- GitHub Personal Access Token, steps to create the token are provided below - -#### Terraform managed project - -- Billing account ID -- Organization or folder ID -- `roles/billing.user` IAM permissions on the billing account specified -- `roles/resourcemanager.projectCreator` IAM permissions on the organization or folder specified -- GitHub Personal Access Token, steps to create the token are provided below - -### Pull the source code - -- Clone the repository and change directory to the guide directory - - ``` - git clone https://github.com/GoogleCloudPlatform/ai-on-gke - cd ai-on-gke/ml-platform - ``` - -- Set environment variables - - ``` - export MLP_BASE_DIR=$(pwd) && \ - echo "export MLP_BASE_DIR=${MLP_BASE_DIR}" >> ${HOME}/.bashrc - ``` - -### GitHub Configuration - -- Create a [Personal Access Token][personal-access-token] in [GitHub][github]: - - Note: It is recommended to use a [machine user account][machine-user-account] for this but you can use a personal user account just to try this reference architecture. - - **Fine-grained personal access token** - - - Go to https://github.com/settings/tokens and login using your credentials - - Click "Generate new token" >> "Generate new token (Beta)". - - Enter a Token name. - - Select the expiration. - - Select the Resource owner. - - Select All repositories - - Set the following Permissions: - - Repository permissions - - Administration: Read and write - - Content: Read and write - - Click "Generate token" - - **Personal access tokens (classic)** - - - Go to https://github.com/settings/tokens and login using your credentials - - Click "Generate new token" >> "Generate new token (classic)". - - You will be directed to a screen to created the new token. Provide the note and expiration. 
- - Choose the following two access: - - [x] repo - Full control of private repositories - - [x] delete_repo - Delete repositories - - Click "Generate token" - -- Store the token in a secure file. - - ``` - # Create a secure directory - mkdir -p ${HOME}/secrets/ - chmod go-rwx ${HOME}/secrets - - # Create a secure file - touch ${HOME}/secrets/mlp-github-token - chmod go-rwx ${HOME}/secrets/mlp-github-token - - # Put the token in the secure file using your preferred editor - nano ${HOME}/secrets/mlp-github-token - ``` - -- Set the GitHub environment variables in Cloud Shell - - Replace the following values: - - - `` is the GitHub organization or user namespace to use for the repositories - - `` is the GitHub account to use for authentication - - `` is the email address to use for commit - - ``` - export MLP_GITHUB_ORG="" - export MLP_GITHUB_USER="" - export MLP_GITHUB_EMAIL="" - ``` - -- Set the configuration variables - - ``` - sed -i "s/YOUR_GITHUB_EMAIL/${MLP_GITHUB_EMAIL}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars - sed -i "s/YOUR_GITHUB_ORG/${MLP_GITHUB_ORG}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars - sed -i "s/YOUR_GITHUB_USER/${MLP_GITHUB_USER}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars - ``` - -### Project Configuration - -You only need to complete the section for the option that you have selected. 
- -#### Bring your own project (BYOP) - -- Set the project environment variables in Cloud Shell - - Replace the following values - - - `` is the ID of your existing Google Cloud project - - ``` - export MLP_PROJECT_ID="" - export MLP_STATE_BUCKET="${MLP_PROJECT_ID}-tf-state" - ``` - -- Set the default `gcloud` project - - ``` - gcloud config set project ${MLP_PROJECT_ID} - ``` - -- Authorize `gcloud` - - ``` - gcloud auth login --activate --no-launch-browser --quiet --update-adc - ``` - -- Create a Cloud Storage bucket to store the Terraform state - - ``` - gcloud storage buckets create gs://${MLP_STATE_BUCKET} --project ${MLP_PROJECT_ID} - ``` - -- Set the configuration variables - - ``` - sed -i "s/YOUR_STATE_BUCKET/${MLP_STATE_BUCKET}/g" ${MLP_BASE_DIR}/terraform/backend.tf - sed -i "s/YOUR_PROJECT_ID/${MLP_PROJECT_ID}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars - ``` - -#### Terraform managed project - -- Set the configuration variables - - ``` - nano ${MLP_BASE_DIR}/terraform/initialize/initialize.auto.tfvars - ``` - - ``` - project = { - billing_account_id = "XXXXXX-XXXXXX-XXXXXX" - folder_id = "############" - name = "mlp" - org_id = "############" - } - ``` - - > `project.billing_account_id` the billing account ID - > - > Enter either `project.folder_id` **OR** `project.org_id` - > `project.folder_id` the folder ID - > `project.org_id` the organization ID - -- Authorize `gcloud` - - ``` - gcloud auth login --activate --no-launch-browser --quiet --update-adc - ``` - -- Create a new project - - ``` - cd ${MLP_BASE_DIR}/terraform/initialize - terraform init && \ - terraform plan -input=false -out=tfplan && \ - terraform apply -input=false tfplan && \ - rm tfplan && \ - terraform init -force-copy -migrate-state && \ - rm -rf state - ``` - -### Run Terraform - -- Create the resources - - ``` - cd ${MLP_BASE_DIR}/terraform && \ - terraform init && \ - terraform plan -input=false -var github_token="$(tr --delete '\n' < ${HOME}/secrets/mlp-github-token)" 
-out=tfplan && \ - terraform apply -input=false tfplan - rm tfplan - ``` - -### Review the resources - -#### GKE clusters and ConfigSync - -- Go to Google Cloud Console, click on the navigation menu and click on Kubernetes Engine > Clusters. You should see one cluster. - -- Go to Google Cloud Console, click on the navigation menu and click on Kubernetes Engine > Config. If you haven't enabled GKE Enterprise in the project earlier, Click `LEARN AND ENABLE` button and then `ENABLE GKE ENTERPRISE`. You should see a RootSync and RepoSync object. - ![configsync](docs/images/configsync.png) - -#### Software installed via RepoSync and RootSync - -Open Cloud Shell to execute the following commands: - -- Store your GKE cluster name in env variable: - - `export GKE_CLUSTER=` - -- Get cluster credentials: - - ``` - gcloud container fleet memberships get-credentials ${GKE_CLUSTER} - ``` - -- Fetch KubeRay operator CRDs - - ``` - kubectl get crd | grep ray - ``` - - The output will be similar to the following: - - ``` - rayclusters.ray.io 2024-02-12T21:19:06Z - rayjobs.ray.io 2024-02-12T21:19:09Z - rayservices.ray.io 2024-02-12T21:19:12Z - ``` - -- Fetch KubeRay operator pod - - ``` - kubectl get pods - ``` - - The output will be similar to the following: - - ``` - NAME READY STATUS RESTARTS AGE - kuberay-operator-56b8d98766-2nvht 1/1 Running 0 6m26s - ``` - -- Check the namespace `ml-team` created: - - ``` - kubectl get ns | grep ml-team - ``` - -- Check the RepoSync object created `ml-team` namespace: - ``` - kubectl get reposync -n ml-team - ``` -- Check the `raycluster` in `ml-team` namespace - - ``` - kubectl get raycluster -n ml-team - ``` - - The output will be similar to the following: - - ``` - NAME DESIRED WORKERS AVAILABLE WORKERS STATUS AGE - ray-cluster-kuberay 1 1 ready 29m - ``` - -- Check the head and worker pods of kuberay in `ml-team` namespace - ``` - kubectl get pods -n ml-team - ``` - The output will be similar to the following: - ``` - NAME READY STATUS 
RESTARTS AGE
-   ray-cluster-kuberay-head-sp6dg                 2/2     Running   0          3m21s
-   ray-cluster-kuberay-worker-workergroup-rzpjw   2/2     Running   0          3m21s
-   ```
-
-### Cleanup
-
-- Destroy the resources
-
-  ```
-  cd ${MLP_BASE_DIR}/terraform && \
-  terraform init && \
-  terraform destroy -auto-approve -var github_token="$(tr --delete '\n' < ${HOME}/secrets/mlp-github-token)" && \
-  rm -rf .terraform .terraform.lock.hcl
-  ```
-
-#### Project
-
-You only need to complete the section for the option that you have selected.
-
-##### Bring your own project (BYOP)
-
-- Delete the project
-
-  ```
-  gcloud projects delete ${MLP_PROJECT_ID}
-  ```
-
-#### Terraform managed project
-
-- Destroy the project
-
-  ```
-  cd ${MLP_BASE_DIR}/terraform/initialize && \
-  TERRAFORM_BUCKET_NAME=$(grep bucket backend.tf | awk -F"=" '{print $2}' | xargs) && \
-  cp backend.tf.local backend.tf && \
-  terraform init -force-copy -lock=false -migrate-state && \
-  gsutil -m rm -rf gs://${TERRAFORM_BUCKET_NAME}/* && \
-  terraform init && \
-  terraform destroy -auto-approve && \
-  rm -rf .terraform .terraform.lock.hcl
-  ```
+- [Distributed Data Processing with Ray](examples/use-case/ray/dataprocessing/README.md): Run a distributed data processing job using Ray.

 [gitops]: https://about.gitlab.com/topics/gitops/
 [repo-sync]: https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields
diff --git a/best-practices/ml-platform/examples/platform/sandbox/README.md b/best-practices/ml-platform/examples/platform/sandbox/README.md
index e4cdb3e6e..0ce83cbc7 100644
--- a/best-practices/ml-platform/examples/platform/sandbox/README.md
+++ b/best-practices/ml-platform/examples/platform/sandbox/README.md
@@ -2,6 +2,8 @@
 This quick-start deployment guide can be used to set up an environment to familiarize yourself with the architecture and get an understanding of the concepts.
 
+**NOTE: This environment is not intended to be long-lived. It is intended for temporary demonstration and learning purposes.**
+
 ### Requirements
 
 In this guide you can choose to bring your project (BYOP) or have Terraform create a new project for you. The requirements are difference based on the option that you choose.
@@ -25,8 +27,8 @@ In this guide you can choose to bring your project (BYOP) or have Terraform crea
 - Clone the repository and change directory to the guide directory
 
   ```
-  git clone https://github.com/GoogleCloudPlatform/ai-on-gke
-  cd ai-on-gke/ml-platform
+  git clone https://github.com/GoogleCloudPlatform/ai-on-gke && \
+  cd ai-on-gke/best-practices/ml-platform
   ```
 
 - Set environment variables
@@ -34,6 +36,10 @@ In this guide you can choose to bring your project (BYOP) or have Terraform crea
   ```
   export MLP_BASE_DIR=$(pwd) && \
   echo "export MLP_BASE_DIR=${MLP_BASE_DIR}" >> ${HOME}/.bashrc
+
+  cd examples/platform/sandbox && \
+  export MLP_TYPE_BASE_DIR=$(pwd) && \
+  echo "export MLP_TYPE_BASE_DIR=${MLP_TYPE_BASE_DIR}" >> ${HOME}/.bashrc
   ```
 
 ### GitHub Configuration
@@ -98,9 +104,9 @@ In this guide you can choose to bring your project (BYOP) or have Terraform crea
 - Set the configuration variables
 
   ```
-  sed -i "s/YOUR_GITHUB_EMAIL/${MLP_GITHUB_EMAIL}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars
-  sed -i "s/YOUR_GITHUB_ORG/${MLP_GITHUB_ORG}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars
-  sed -i "s/YOUR_GITHUB_USER/${MLP_GITHUB_USER}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars
+  sed -i "s/YOUR_GITHUB_EMAIL/${MLP_GITHUB_EMAIL}/g" ${MLP_TYPE_BASE_DIR}/mlp.auto.tfvars
+  sed -i "s/YOUR_GITHUB_ORG/${MLP_GITHUB_ORG}/g" ${MLP_TYPE_BASE_DIR}/mlp.auto.tfvars
+  sed -i "s/YOUR_GITHUB_USER/${MLP_GITHUB_USER}/g" ${MLP_TYPE_BASE_DIR}/mlp.auto.tfvars
   ```
 
 ### Project Configuration
@@ -141,8 +147,8 @@ You only need to complete the section for the option that you have selected.
- Set the configuration variables ``` - sed -i "s/YOUR_STATE_BUCKET/${MLP_STATE_BUCKET}/g" ${MLP_BASE_DIR}/terraform/backend.tf - sed -i "s/YOUR_PROJECT_ID/${MLP_PROJECT_ID}/g" ${MLP_BASE_DIR}/terraform/mlp.auto.tfvars + sed -i "s/YOUR_STATE_BUCKET/${MLP_STATE_BUCKET}/g" ${MLP_TYPE_BASE_DIR}/backend.tf + sed -i "s/YOUR_PROJECT_ID/${MLP_PROJECT_ID}/g" ${MLP_TYPE_BASE_DIR}/mlp.auto.tfvars ``` #### Terraform managed project @@ -150,7 +156,7 @@ You only need to complete the section for the option that you have selected. - Set the configuration variables ``` - nano ${MLP_BASE_DIR}/terraform/initialize/initialize.auto.tfvars + nano ${MLP_BASE_DIR}/terraform/features/initialize/initialize.auto.tfvars ``` ``` @@ -177,7 +183,7 @@ You only need to complete the section for the option that you have selected. - Create a new project ``` - cd ${MLP_BASE_DIR}/terraform/initialize + cd ${MLP_BASE_DIR}/terraform/features/initialize terraform init && \ terraform plan -input=false -out=tfplan && \ terraform apply -input=false tfplan && \ @@ -191,7 +197,7 @@ You only need to complete the section for the option that you have selected. - Create the resources ``` - cd ${MLP_BASE_DIR}/terraform && \ + cd ${MLP_TYPE_BASE_DIR} && \ terraform init && \ terraform plan -input=false -var github_token="$(tr --delete '\n' < ${HOME}/secrets/mlp-github-token)" -out=tfplan && \ terraform apply -input=false tfplan @@ -205,7 +211,7 @@ You only need to complete the section for the option that you have selected. - Go to Google Cloud Console, click on the navigation menu and click on Kubernetes Engine > Clusters. You should see one cluster. - Go to Google Cloud Console, click on the navigation menu and click on Kubernetes Engine > Config. If you haven't enabled GKE Enterprise in the project earlier, Click `LEARN AND ENABLE` button and then `ENABLE GKE ENTERPRISE`. You should see a RootSync and RepoSync object. 
- ![configsync](docs/images/configsync.png) + ![configsync](/best-practices/ml-platform/docs/images/configsync.png) #### Software installed via RepoSync and RootSync @@ -287,7 +293,7 @@ Open Cloud Shell to execute the following commands: - Destroy the resources ``` - cd ${MLP_BASE_DIR}/terraform && \ + cd ${MLP_TYPE_BASE_DIR} && \ terraform init && \ terraform destroy -auto-approve -var github_token="$(tr --delete '\n' < ${HOME}/secrets/mlp-github-token)" && \ rm -rf .terraform .terraform.lock.hcl @@ -310,7 +316,7 @@ You only need to complete the section for the option that you have selected. - Destroy the project ``` - cd ${MLP_BASE_DIR}/terraform/initialize && \ + cd ${MLP_BASE_DIR}/terraform/features/initialize && \ TERRAFORM_BUCKET_NAME=$(grep bucket backend.tf | awk -F"=" '{print $2}' | xargs) && \ cp backend.tf.local backend.tf && \ terraform init -force-copy -lock=false -migrate-state && \ diff --git a/best-practices/ml-platform/examples/use-case/ray/dataprocessing/README.md b/best-practices/ml-platform/examples/use-case/ray/dataprocessing/README.md index d573f1c47..9f0e4f6c9 100644 --- a/best-practices/ml-platform/examples/use-case/ray/dataprocessing/README.md +++ b/best-practices/ml-platform/examples/use-case/ray/dataprocessing/README.md @@ -1,20 +1,23 @@ # Distributed Data Processing with Ray on GKE ## Dataset + [This](https://www.kaggle.com/datasets/PromptCloudHQ/flipkart-products) is a pre-crawled public dataset, taken as a subset of a bigger dataset (more than 5.8 million products) that was created by extracting data from [Flipkart](https://www.flipkart.com/), a leading Indian eCommerce store. ## Architecture - ![DataPreprocessing](/ml-platform/docs/images/ray-dataprocessing-workflow.png) + +![DataPreprocessing](/best-practices/ml-platform/docs/images/ray-dataprocessing-workflow.png) ## Data processing steps -The dataset has product information such as id, name, brand, description, image urls, product specifications. 
+The dataset has product information such as id, name, brand, description, image URLs, and product specifications.
 
 The preprocessing.py file does the following:
-* Read the csv from Cloud Storage
-* Clean up the product description text
-* Extract image urls, validate and download the images into cloud storage
-* Cleanup & extract attributes as key-value pairs
+
+- Read the CSV from Cloud Storage
+- Clean up the product description text
+- Extract image URLs, then validate and download the images into Cloud Storage
+- Clean up and extract attributes as key-value pairs
 
 ## How to use this repo:
 
@@ -26,14 +29,12 @@ The preprocessing.py file does the following:
    DOCKER_IMAGE_URL=us-docker.pkg.dev/${PROJECT_ID}/dataprocessing/dp:v0.0.1
    ```
 
-
 2. Create a Cloud Storage bucket to store raw data
 
    ```
    gcloud storage buckets create gs://${PROCESSING_BUCKET} --project ${PROJECT_ID}
    ```
 
-
 3. Download the raw data csv file from above and store into the bucket created in the previous step.
    The kaggle cli can be installed using the following [instructions](https://github.com/Kaggle/kaggle-api#installation)
    To use the cli you must create an API token (Kaggle > User Profile > API > Create New Token), the downloaded file should be stored in HOME/.kaggle/kaggle.json.
@@ -60,6 +61,7 @@ The preprocessing.py file does the following:
    ```
 
 5. Create Artifact Registry repository for your docker image
+
   ```
   gcloud artifacts repositories create dataprocessing \
   --repository-format=docker \
@@ -69,6 +71,7 @@ The preprocessing.py file does the following:
   ```
 
 6. Build container image using Cloud Build and push the image to Artifact Registry
+
   ```
   gcloud builds submit . \
   --tag ${DOCKER_IMAGE_URL}:v0.0.1
@@ -91,13 +94,14 @@ kubectl apply -f job.yaml -n ml-team
 ```
 
 9. Monitor the execution in Ray Dashboard
-   a. Jobs -> Running Job ID
-     i) See the Tasks/actors overview for Running jobs
-     ii) See the Task Table for a detailed view of task and assigned node(s)
-   b. Cluster -> Node List
-     i) See the Ray actors running on the worker process
+    a. Jobs -> Running Job ID
+       i) See the Tasks/actors overview for Running jobs
+       ii) See the Task Table for a detailed view of task and assigned node(s)
+    b. Cluster -> Node List
+       i) See the Ray actors running on the worker process
+
+10. Once the job is completed, both the prepared dataset (as a CSV) and the images are stored in Google Cloud Storage.
 
-11. Once the Job is completed, both the prepared dataset as a CSV and the images are stored in Google Cloud Storage.
 ```
 gcloud storage ls \
     gs://${PROCESSING_BUCKET}/flipkart_preprocessed_dataset/flipkart.csv
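
The description-cleanup and attribute-extraction steps listed in the data-processing README above can be sketched roughly as follows. This is a hypothetical illustration only: the function names are invented here, the sketch assumes the `product_specifications` column parses as a Python-literal dict (the raw Kaggle export may instead use `=>` separators that would need normalizing first), and the repository's actual `preprocessing.py` is the source of truth.

```python
# Hypothetical sketch of the cleanup/extraction steps described above;
# the actual preprocessing.py in the repository may differ.
import ast
import re


def clean_description(text: str) -> str:
    """Strip markup remnants and collapse whitespace in a product description."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML-ish tags
    return re.sub(r"\s+", " ", text).strip()  # normalize whitespace


def extract_attributes(spec: str) -> dict:
    """Parse a product_specifications string into key-value pairs.

    Assumes (hypothetically) a literal-dict shape like:
    {"product_specification": [{"key": "Type", "value": "Round Neck"}, ...]}
    Returns an empty dict for rows that do not parse.
    """
    try:
        parsed = ast.literal_eval(spec)
    except (ValueError, SyntaxError):
        return {}
    if not isinstance(parsed, dict):
        return {}
    pairs = {}
    for item in parsed.get("product_specification", []):
        if "key" in item and "value" in item:
            pairs[item["key"]] = item["value"]
    return pairs
```

In a Ray-based pipeline like the one above, functions of this shape would typically be applied per-row across workers, with malformed rows degrading to empty results rather than failing the whole job.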