ml-platform release from development branch (GoogleCloudPlatform#715)
* Updated terraform providers

* Standardized GitOps scripts and added Kueue

* Added initial test harness

* Added h100 DWS node pool

* Add notebook packaging guide to docs (GoogleCloudPlatform#690)

add notebook packaging guide

* Added enhancements to the dataprocessing use cases

* Updated Kueue to use the 0.7.0 manifests

* Increased the cluster resource limits

* Added products and features outline

* Added Secret Manager add-on to the cluster

* Changed configsync git repository name to allow for easier use of multiple environments

* Added a GitLab project module

* Standardized git variables to support GitHub or GitLab

* Added a100 40GB node pools

* Moved cpu node pool from n2 to n4 machines

* Add environment_name to the Ray dashboard endpoint

* Removed fleet-level configmanagement and Google service accounts for each namespace to allow for multiple environments in a single project

* Added Config Controller Terraform module

* Added NVIDIA DCGM

* Added an allow rule for the KubeRay Operator to the namespace network policy

---------

Co-authored-by: Kent Hua <[email protected]>
Co-authored-by: Jun Sheng <[email protected]>
Co-authored-by: Ishmeet Mehta <[email protected]>
Co-authored-by: Kavitha Rajendran <[email protected]>
Co-authored-by: kenthua <[email protected]>
6 people committed Jul 3, 2024
1 parent 1892f97 commit 0d7231d
Showing 139 changed files with 18,135 additions and 1,159 deletions.
2 changes: 1 addition & 1 deletion benchmarks/infra/stage-1/sample-tfvars/gpu-sample.tfvars
@@ -1,4 +1,4 @@
project_id = "$PROJECT_ID"
project_id = "$PROJECT_ID"
cluster_name = "ai-benchmark"
region = "us-central1"
gke_location = "us-central1-a"
2 changes: 2 additions & 0 deletions best-practices/ml-platform/.gitignore
@@ -0,0 +1,2 @@
test/log/*.log
test/scripts/locks/*.lock
6 changes: 6 additions & 0 deletions best-practices/ml-platform/README.md
@@ -10,6 +10,8 @@ This reference architecture demonstrates how to build a GKE platform that facili
- Platform admins will create a namespace per application and provide the application team members full access to it.
- The namespace-scoped resources will be created by the Application/ML teams either via [Config Sync][config-sync] or through a deployment tool like [Cloud Deploy][cloud-deploy].

For an outline of products and features used in the platform, see the [Platform Products and Features](/best-practices/ml-platform/docs/platform/products-and-features.md) document.

## Critical User Journeys (CUJs)

### Persona : Platform Admin
@@ -60,6 +62,10 @@ This reference architecture demonstrates how to build a GKE platform that facili

- [Distributed Data Processing with Ray](examples/use-case/ray/dataprocessing/README.md): Run a distributed data processing job using Ray.

## Resources

- [Packaging Jupyter notebooks](docs/notebook/packaging.md): Patterns and tools to get your `ipynb` files ready for deployment in a container runtime.

[gitops]: https://about.gitlab.com/topics/gitops/
[repo-sync]: https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields
[root-sync]: https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields
[6 files could not be displayed]
93 changes: 93 additions & 0 deletions best-practices/ml-platform/docs/notebook/packaging.md
@@ -0,0 +1,93 @@
# Packaging a Jupyter notebook as deployable code

Jupyter notebooks are widely used by data scientists and machine learning experts in their day-to-day work to develop interactively and iteratively. However, the `ipynb` format is typically not used as a deployable or packageable artifact. There are two common scenarios in which notebooks are converted to deployable artifacts:
1. Model training tasks need to be converted to batch jobs to scale up with more computational resources
1. Model inference tasks need to be converted to an API server to serve end-user requests

In this guide we showcase two tools that can help facilitate converting your notebook to a deployable, packageable plain Python script.

This process can also be automated utilizing Continuous Integration (CI) tools such as [Cloud Build](https://cloud.google.com/build/).
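
As a minimal sketch of such an automation step, assuming the `Dockerfile` described later in this guide sits next to the converted script (the image path and `PROJECT_ID` below are placeholders):

```sh
# Build the container image from the current directory with Cloud Build and
# push it to Artifact Registry; PROJECT_ID and the repository path are placeholders
gcloud builds submit . \
  --tag us-central1-docker.pkg.dev/PROJECT_ID/ml-images/notebook-job:latest
```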

## Use jupytext to convert notebook to raw python and containerize

1. Update the notebook to `Pair Notebook with Percent Format`

Jupytext ships with recent versions of Jupyter Notebook and JupyterLab. In addition to converting from `ipynb` to Python, it can pair the two formats, so that updates made in the `ipynb` file are propagated to the Python file and vice versa.

To pair the notebook, use the pairing function in the File menu:

![jupyter-pairing](../images/notebook/jupyter-pairing.png)

In this example we use the file [gpt-j-online.ipynb](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/examples/notebooks/gpt-j-online.ipynb):

![jupyter-gpt-j-online-ipynb](../images/notebook/jupyter-gpt-j-online-ipynb.png)

1. After pairing, we get the generated Python file:

![jupyter-gpt-j-online-py](../images/notebook/jupyter-gpt-j-online-py.png)

**NOTE**: This conversion can also be performed via the `jupytext` CLI with the following command:

```sh
jupytext --set-formats ipynb,py:percent \
  --to py gpt-j-online.ipynb
```
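
Once paired, the two files can also be kept in sync from the command line; a small sketch, assuming both formats already exist on disk:

```sh
# Propagate the most recent edits between the paired ipynb and py files
jupytext --sync gpt-j-online.ipynb
```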

1. Extract the module dependencies

In the notebook environment, users typically install required Python modules using `pip install` commands, but in the container environment these dependencies need to be installed into the image before the Python script is executed.

We can use the `pipreqs` tool to generate the dependency list. Add the following snippet in a new cell of your notebook and run it:

```sh
!pip install pipreqs
!pipreqs --scan-notebooks
```

The following is an example output:

![jupyter-generate-requirements](../images/notebook/jupyter-generate-requirements.png)

**NOTE**: The `!cat requirements.txt` line shows an example of the generated `requirements.txt`.
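
The same step can also be run from a terminal rather than a notebook cell; a minimal sketch, assuming a `pipreqs` version that supports notebook scanning:

```sh
# Generate requirements.txt for the current directory; --scan-notebooks
# also inspects ipynb files and --force overwrites an existing file
pip install pipreqs
pipreqs --scan-notebooks --force .
```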

1. Create the Dockerfile

To build a Docker image from the generated Python script, we need to create a `Dockerfile`; below is an example. Replace `_THE_GENERATED_PYTHON_FILE_` with the name of your generated Python file:

```Dockerfile
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
RUN apt-get update && \
    apt-get -y --no-install-recommends install python3-dev gcc python3-pip git && \
    rm -rf /var/lib/apt/lists/*
# Copy the dependency list and the generated script into the image
COPY requirements.txt /requirements.txt
COPY _THE_GENERATED_PYTHON_FILE_ /_THE_GENERATED_PYTHON_FILE_
RUN pip3 install --no-cache-dir -r /requirements.txt
ENV PYTHONUNBUFFERED=1
CMD ["python3", "/_THE_GENERATED_PYTHON_FILE_"]
```
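
To sanity-check the image locally, a build-and-run sketch; the `notebook-job` tag is a placeholder, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host:

```sh
# Build the image from the directory containing the Dockerfile
docker build -t notebook-job:latest .
# Run the script once with GPU access, removing the container on exit
docker run --rm --gpus all notebook-job:latest
```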

1. [Optional] Lint and remove unused code

Using `pylint` to validate the generated code is a good practice. Pylint can detect out-of-order `import` statements and unused code, and provides code readability suggestions.

To use `pylint`, create a new cell in your notebook, run the code below, and replace `_THE_GENERATED_PYTHON_FILE_` with your filename:

```sh
!pip install pylint
!pylint _THE_GENERATED_PYTHON_FILE_
```
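
Pylint only reports issues. To actually remove unused imports and re-order the remaining ones, third-party tools such as `autoflake` and `isort` can help; a sketch of one possible cleanup step, not part of the original workflow:

```sh
!pip install autoflake isort
# Remove unused imports and variables in place
!autoflake --in-place --remove-all-unused-imports --remove-unused-variables _THE_GENERATED_PYTHON_FILE_
# Sort and group the remaining import statements
!isort _THE_GENERATED_PYTHON_FILE_
```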

## Use nbconvert to convert notebook to raw python

We can also convert a Jupyter notebook to a Python script using the nbconvert tool. nbconvert is available inside the Jupyter notebook environment in Google Colab Enterprise. If you are in another environment where it is not available, it can be installed from [PyPI](https://pypi.org/project/nbconvert/).

1. Run the nbconvert command in your notebook. In this example, we use `gsutil` to copy the notebook into the Colab Enterprise environment first.

```sh
!jupyter nbconvert --to python Fine-tune-Llama-Google-Colab.ipynb
```
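
The copy step mentioned above could look like the following; the bucket path is hypothetical:

```sh
# Copy the notebook from a Cloud Storage bucket into the current
# environment; gs://YOUR_BUCKET is a placeholder
!gsutil cp gs://YOUR_BUCKET/Fine-tune-Llama-Google-Colab.ipynb .
```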

Below is an example of the commands:
![jupyter-nbconvert](../images/notebook/jupyter-nbconvert.png)