ml-platform release from development branch (GoogleCloudPlatform#715)
* Updated terraform providers

* Standardized GitOps scripts and added Kueue

* Added initial test harness

* Added h100 DWS node pool

* Add notebook packaging guide to docs (GoogleCloudPlatform#690)

add notebook packaging guide

* Added enhancements to the dataprocessing use cases

* Updated Kueue to use the 0.7.0 manifests

* Increased the cluster resource limits

* Added products and features outline

* Added Secret Manager add-on to the cluster

* Changed configsync git repository name to allow for easier use of multiple environments

* Added a GitLab project module

* Standardized git variables to support GitHub or GitLab

* Added a100 40GB node pools

* Moved cpu node pool from n2 to n4 machines

* Add environment_name to the Ray dashboard endpoint

* Removed fleet-level configmanagement and Google service accounts for each namespace to allow for multiple environments in a single project

* Added Config Controller Terraform module

* Added NVIDIA DCGM

* Added an allow rule for the KubeRay Operator to the namespace network policy

---------

Co-authored-by: Kent Hua <[email protected]>
Co-authored-by: Jun Sheng <[email protected]>
Co-authored-by: Ishmeet Mehta <[email protected]>
Co-authored-by: Kavitha Rajendran <[email protected]>
Co-authored-by: kenthua <[email protected]>
6 people committed Jul 3, 2024
1 parent 1892f97 commit 0d7231d
Showing 139 changed files with 18,135 additions and 1,159 deletions.
2 changes: 1 addition & 1 deletion benchmarks/infra/stage-1/sample-tfvars/gpu-sample.tfvars
@@ -1,4 +1,4 @@
project_id = "$PROJECT_ID"
project_id = "$PROJECT_ID"
cluster_name = "ai-benchmark"
region = "us-central1"
gke_location = "us-central1-a"
2 changes: 2 additions & 0 deletions best-practices/ml-platform/.gitignore
@@ -0,0 +1,2 @@
test/log/*.log
test/scripts/locks/*.lock
6 changes: 6 additions & 0 deletions best-practices/ml-platform/README.md
@@ -10,6 +10,8 @@ This reference architecture demonstrates how to build a GKE platform that facili
- Platform admins will create a namespace per application and provide the application team members full access to it.
- The namespace-scoped resources will be created by the Application/ML teams either via [Config Sync][config-sync] or through a deployment tool like [Cloud Deploy][cloud-deploy].

For an outline of products and features used in the platform, see the [Platform Products and Features](/best-practices/ml-platform/docs/platform/products-and-features.md) document.

## Critical User Journeys (CUJs)

### Persona : Platform Admin
@@ -60,6 +62,10 @@ This reference architecture demonstrates how to build a GKE platform that facili

- [Distributed Data Processing with Ray](examples/use-case/ray/dataprocessing/README.md): Run a distributed data processing job using Ray.

## Resources

- [Packaging Jupyter notebooks](docs/notebook/packaging.md): Patterns and tools to get your `ipynb` files ready for deployment in a container runtime.

[gitops]: https://about.gitlab.com/topics/gitops/
[repo-sync]: https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields
[root-sync]: https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields
[6 files could not be displayed]
93 changes: 93 additions & 0 deletions best-practices/ml-platform/docs/notebook/packaging.md
@@ -0,0 +1,93 @@
# Packaging a Jupyter notebook as deployable code

Jupyter notebooks are widely used by data scientists and machine learning experts in their day-to-day work to develop interactively and iteratively. However, the `ipynb` format is typically not used as a deployable or packageable artifact. There are two common scenarios in which notebooks are converted to deployable artifacts:
1. Model training tasks need to be converted to batch jobs to scale up with more computational resources
1. Model inference tasks need to be converted to an API server to serve end-user requests

In this guide we showcase two tools that can help facilitate converting your notebook to a deployable, packageable plain Python script.

This process can also be automated utilizing Continuous Integration (CI) tools such as [Cloud Build](https://cloud.google.com/build/).
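
As a minimal sketch of such an automation step, assuming the `Dockerfile` described later in this guide sits next to the converted script (the image path and `PROJECT_ID` below are placeholders):

```sh
# Build the container image from the current directory with Cloud Build and
# push it to Artifact Registry; PROJECT_ID and the repository path are placeholders
gcloud builds submit . \
  --tag us-central1-docker.pkg.dev/PROJECT_ID/ml-images/notebook-job:latest
```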

## Use jupytext to convert notebook to raw python and containerize

1. Update the notebook to `Pair Notebook with Percent Format`

Jupytext ships with recent versions of Jupyter Notebook and JupyterLab. In addition to converting from `ipynb` to Python, it can pair the two formats, so that updates made in the `ipynb` file are propagated to the Python file and vice versa.

To pair the notebook, use the pairing function in the File menu:

![jupyter-pairing](../images/notebook/jupyter-pairing.png)

In this example we use the file [gpt-j-online.ipynb](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/examples/notebooks/gpt-j-online.ipynb):

![jupyter-gpt-j-online-ipynb](../images/notebook/jupyter-gpt-j-online-ipynb.png)

1. After pairing, we get the generated Python file:

![jupyter-gpt-j-online-py](../images/notebook/jupyter-gpt-j-online-py.png)

**NOTE**: This conversion can also be performed via the `jupytext` CLI with the following command:

```sh
jupytext --set-formats ipynb,py:percent \
  --to py gpt-j-online.ipynb
```
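
Once paired, the two files can also be kept in sync from the command line; a small sketch, assuming both formats already exist on disk:

```sh
# Propagate the most recent edits between the paired ipynb and py files
jupytext --sync gpt-j-online.ipynb
```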

1. Extract the module dependencies

In the notebook environment, users typically install required Python modules using `pip install` commands, but in the container environment these dependencies need to be installed into the image before the Python script is executed.

We can use the `pipreqs` tool to generate the dependency list. Add the following snippet in a new cell of your notebook and run it:

```sh
!pip install pipreqs
!pipreqs --scan-notebooks
```

The following is an example output:

![jupyter-generate-requirements](../images/notebook/jupyter-generate-requirements.png)

**NOTE**: The `!cat requirements.txt` line shows an example of the generated `requirements.txt`.
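
The same step can also be run from a terminal rather than a notebook cell; a minimal sketch, assuming a `pipreqs` version that supports notebook scanning:

```sh
# Generate requirements.txt for the current directory; --scan-notebooks
# also inspects ipynb files and --force overwrites an existing file
pip install pipreqs
pipreqs --scan-notebooks --force .
```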

1. Create the Dockerfile

To build a Docker image from the generated Python script, we need to create a `Dockerfile`; below is an example. Replace `_THE_GENERATED_PYTHON_FILE_` with the name of your generated Python file:

```Dockerfile
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
RUN apt-get update && \
    apt-get -y --no-install-recommends install python3-dev gcc python3-pip git && \
    rm -rf /var/lib/apt/lists/*
# Copy the dependency list and the generated script into the image
COPY requirements.txt /requirements.txt
COPY _THE_GENERATED_PYTHON_FILE_ /_THE_GENERATED_PYTHON_FILE_
RUN pip3 install --no-cache-dir -r /requirements.txt
ENV PYTHONUNBUFFERED=1
CMD ["python3", "/_THE_GENERATED_PYTHON_FILE_"]
```
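
To sanity-check the image locally, a build-and-run sketch; the `notebook-job` tag is a placeholder, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host:

```sh
# Build the image from the directory containing the Dockerfile
docker build -t notebook-job:latest .
# Run the script once with GPU access, removing the container on exit
docker run --rm --gpus all notebook-job:latest
```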

1. [Optional] Lint and remove unused code

Using `pylint` to validate the generated code is a good practice. Pylint can detect out-of-order `import` statements and unused code, and provides code readability suggestions.

To use `pylint`, create a new cell in your notebook, run the code below, and replace `_THE_GENERATED_PYTHON_FILE_` with your filename:

```sh
!pip install pylint
!pylint _THE_GENERATED_PYTHON_FILE_
```
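
Pylint only reports issues. To actually remove unused imports and re-order the remaining ones, third-party tools such as `autoflake` and `isort` can help; a sketch of one possible cleanup step, not part of the original workflow:

```sh
!pip install autoflake isort
# Remove unused imports and variables in place
!autoflake --in-place --remove-all-unused-imports --remove-unused-variables _THE_GENERATED_PYTHON_FILE_
# Sort and group the remaining import statements
!isort _THE_GENERATED_PYTHON_FILE_
```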

## Use nbconvert to convert notebook to raw python

We can also convert a Jupyter notebook to a Python script using the nbconvert tool. nbconvert is available inside the Jupyter notebook environment in Google Colab Enterprise. If you are in another environment where it is not available, it can be installed from [PyPI](https://pypi.org/project/nbconvert/).

1. Run the nbconvert command in your notebook. In this example, we use `gsutil` to copy the notebook into the Colab Enterprise environment first.

```sh
!jupyter nbconvert --to python Fine-tune-Llama-Google-Colab.ipynb
```
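
The copy step mentioned above could look like the following; the bucket path is hypothetical:

```sh
# Copy the notebook from a Cloud Storage bucket into the current
# environment; gs://YOUR_BUCKET is a placeholder
!gsutil cp gs://YOUR_BUCKET/Fine-tune-Llama-Google-Colab.ipynb .
```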

Below is an example of the commands:
![jupyter-nbconvert](../images/notebook/jupyter-nbconvert.png)