Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 25 changed files with 172 additions and 104 deletions.
@@ -0,0 +1,31 @@
# Ray on GKE

This repository contains a Terraform template for installing a Standard or Autopilot GKE cluster in your GCP project.
It sets up the cluster to work seamlessly with the `ray-on-gke` and `jupyter-on-gke` modules in the `ai-on-gke` repo.

Platform resources:
* GKE cluster
* NVIDIA GPU drivers
* KubeRay operator and CRDs

## Installation

Preinstall the following on your computer:
* `kubectl`
* Terraform
* Helm
* `gcloud`

> **_NOTE:_** Terraform keeps state metadata in a local file called `terraform.tfstate`. If you need to reinstall any resources, make sure to delete this file as well.

### Platform

1. Run `git clone https://github.com/GoogleCloudPlatform/ai-on-gke`

2. Run `cd ai-on-gke/gke-platform`

3. Edit `variables.tf` with your GCP settings.

4. Run `terraform init`

5. Run `terraform apply`
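The platform steps can be sketched as a small shell script. This is only an illustration, not part of the repository: the helper names `check_tools`, `run`, `fetch_platform` and `apply_platform` are made up here, and the `terraform plan` step is an optional addition that previews changes before `terraform apply`. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: check prerequisites, then walk through the platform install steps.
# Commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: report which CLI tools are missing from PATH.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1 and 2: fetch the repository and enter the platform module.
fetch_platform() {
  run git clone https://github.com/GoogleCloudPlatform/ai-on-gke &&
    run cd ai-on-gke/gke-platform
}

# Steps 4 and 5, with an optional plan in between (an addition, not in the steps above).
# Step 3, editing variables.tf, is manual and must happen before this function runs.
apply_platform() {
  run terraform init &&
    run terraform plan &&
    run terraform apply
}

check_tools kubectl terraform helm gcloud || echo "Install the missing tools before a real run."
fetch_platform
apply_platform
```

For a real run, set `DRY_RUN=0`, call `fetch_platform`, edit `variables.tf`, and only then call `apply_platform`.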
File renamed without changes.
@@ -40,4 +40,4 @@ variable "enable_tpu" {
   type = bool
   description = "Set to true to create TPU node pool"
   default = false
-}
+}
File renamed without changes.
@@ -0,0 +1,119 @@
# Ray on GKE

This repository contains a Terraform template for running [JupyterHub](https://jupyter.org/hub) on Google Kubernetes Engine.
We've also included some example notebooks (`ai-on-gke/ray-on-gke/example_notebooks`), including one that serves a GPT-J-6B model with Ray AIR (see the
[original notebook](https://docs.ray.io/en/master/ray-air/examples/gptj_serving.html)).

This module assumes you already have a functional GKE cluster. If not, follow the instructions in `ai-on-gke/gke-platform/README.md`
to install a Standard or Autopilot GKE cluster, then follow the instructions in this module to install JupyterHub.

This module deploys the following resources, once per user:
* JupyterHub deployment
* User namespace
* Kubernetes service accounts

## Installation

Preinstall the following on your computer:
* `kubectl`
* Terraform
* Helm
* `gcloud`

> **_NOTE:_** Terraform keeps state metadata in a local file called `terraform.tfstate`. If you need to reinstall any resources, make sure to delete this file as well.

### JupyterHub

> **_NOTE:_** Currently, `cd/user/jupyterhub/jupyter_config/config.yaml` defines 3 profiles that all use the same Jupyter images. You can change the images as well as the profile descriptions.

1. Run `cd ai-on-gke/jupyter-on-gke/jupyterhub`

2. Edit `variables.tf` with your GCP settings. The `<your user name>` that you specify will become a K8s namespace for your JupyterHub services.
   Note:
   If using this with the Ray module (`ai-on-gke/ray-on-gke/`), it is recommended to use the same K8s namespace
   for both, i.e. set this to the same namespace as in `ai-on-gke/ray-on-gke/user/variables.tf`.
   If not, set `enable_create_namespace` to `true` so a new K8s namespace is created for the Jupyter resources.

3. Run `terraform init`

4. Find the name and location of the GKE cluster you want to use.
   Run `gcloud container clusters list --project=<your GCP project>` to see all the available clusters.
   Note: If you created the GKE cluster via `ai-on-gke/gke-platform`, you can get the cluster info from `ai-on-gke/gke-platform/variables.tf`.

5. Run `gcloud container clusters get-credentials %gke_cluster_name% --location=%location%`
   If `gcloud` is not configured yet, see the [initialization instructions](https://cloud.google.com/sdk/docs/initializing).

6. Run `terraform apply`

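Steps 4 to 6 can be sketched as one script. Treat it as an illustration under assumptions: `PROJECT`, `CLUSTER` and `LOCATION` hold placeholder values, the `run` and `connect_and_apply` helpers are invented for this example, and the `--project` flag on `get-credentials` and the `kubectl config current-context` check are additions to the steps above. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: point kubectl at an existing GKE cluster, then apply the module.
# The defaults below are placeholders; replace them with your own values.
PROJECT="${PROJECT:-my-gcp-project}"
CLUSTER="${CLUSTER:-my-gke-cluster}"
LOCATION="${LOCATION:-us-central1}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

connect_and_apply() {
  # List clusters so you can confirm the name and location.
  run gcloud container clusters list --project="$PROJECT" &&
    # Write credentials for the chosen cluster into your kubeconfig.
    run gcloud container clusters get-credentials "$CLUSTER" --location="$LOCATION" --project="$PROJECT" &&
    # Extra check (not in the steps above): show which cluster kubectl now targets.
    run kubectl config current-context &&
    run terraform apply
}

connect_and_apply
```

Run it from `ai-on-gke/jupyter-on-gke/jupyterhub` after `terraform init`, with `DRY_RUN=0` once the printed commands look right.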
## Using Ray with Jupyter

1. Run `kubectl get services -n <namespace>`. The namespace is the user name that you specified above.

2. Copy the external IP for the notebook.

3. Open the external IP in a browser and log in. The default user names and
passwords can be found in the [Jupyter
settings](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/jupyter-on-gke/jupyter_config/jupyter_config/config.yaml) file.

4. The Ray cluster is available at `ray://example-cluster-kuberay-head-svc:10001`. To access the cluster, open one of the sample notebooks under `example_notebooks` (via `File` -> `Open from URL` in the Jupyter notebook window, using the raw file URL from GitHub) and run through the example. Example URL: https://raw.githubusercontent.com/GoogleCloudPlatform/ai-on-gke/main/ray-on-gke/example_notebooks/gpt-j-online.ipynb

5. To use the Ray dashboard, run the following command to port-forward:
```
kubectl port-forward -n <namespace> service/example-cluster-kuberay-head-svc 8265:8265
```

And then open the dashboard using the following URL:
```
http://localhost:8265
```

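A minimal sketch of steps 1, 2 and 5, assuming `NAMESPACE` holds the user name from `variables.tf` (the default here is a placeholder). The helper names are invented for this example, and the `custom-columns` output is just one way to show each service's external IP without reading the full table. Commands are printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch only: list external IPs in the user namespace and forward the Ray dashboard.
NAMESPACE="${NAMESPACE:-my-user}"   # placeholder: the user name set in variables.tf
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# One row per service: its name and any load balancer IP assigned to it.
list_external_ips() {
  run kubectl get services -n "$NAMESPACE" \
    -o 'custom-columns=NAME:.metadata.name,IP:.status.loadBalancer.ingress[*].ip'
}

# Blocks until interrupted; while it runs, the dashboard is at http://localhost:8265
forward_dashboard() {
  run kubectl port-forward -n "$NAMESPACE" service/example-cluster-kuberay-head-svc 8265:8265
}

list_external_ips
forward_dashboard
```

Services without a load balancer IP show `<none>` in the `IP` column.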
## Securing Your Cluster Endpoints

For demo purposes, this repo creates a public IP for the Jupyter notebook with basic dummy authentication. To secure your cluster, it is *strongly recommended* to replace
this with your own secure endpoints.

For more information, please take a look at the following links:
* https://cloud.google.com/iap/docs/enabling-kubernetes-howto
* https://cloud.google.com/endpoints/docs/openapi/get-started-kubernetes-engine
* https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/authenticators-users-basics.html

## Running GPT-J-6B

This example is adapted from the [Ray AIR GPT-J serving example](https://docs.ray.io/en/master/ray-air/examples/gptj_serving.html).

1. Open the `gpt-j-online.ipynb` notebook under `ai-on-gke/ray-on-gke/example_notebooks`.

2. Open a terminal in the Jupyter session and install Ray AIR:
```
pip install "ray[air]"
```

3. Run through the notebook cells. You can change the prompt in the last cell:
```
prompt = (
    ## Input your own prompt here
)
```

4. The notebook should then output a generated text response.

## Logging and Monitoring

This repository comes with out-of-the-box integrations with Google Cloud Logging
and Managed Prometheus for monitoring. To see your Ray cluster logs:

1. In the Cloud Console, open Logging.
2. If using a Jupyter notebook for job submission, use the following query filter, replacing the `%...%` placeholders with your own values:
```
resource.type="k8s_container"
resource.labels.cluster_name=%CLUSTER_NAME%
resource.labels.pod_name=%RAY_HEAD_POD_NAME%
resource.labels.container_name="fluentbit"
```

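The same filter can also be used from the command line with `gcloud logging read`. The sketch below only builds and prints the command; the `build_log_filter` helper is invented for this example, the default cluster and pod names are placeholders, and the explicit `AND` joins and `--limit=20` are choices made here rather than anything the repository prescribes.

```shell
#!/usr/bin/env bash
# Sketch only: build the Cloud Logging filter for the Ray head pod and print
# a matching gcloud command. The defaults are placeholders, not real names.
CLUSTER_NAME="${CLUSTER_NAME:-my-gke-cluster}"
RAY_HEAD_POD_NAME="${RAY_HEAD_POD_NAME:-my-ray-head-pod}"

# Hypothetical helper: $1 is the cluster name, $2 is the Ray head pod name.
build_log_filter() {
  printf 'resource.type="k8s_container" AND resource.labels.cluster_name="%s" AND resource.labels.pod_name="%s" AND resource.labels.container_name="fluentbit"' "$1" "$2"
}

FILTER="$(build_log_filter "$CLUSTER_NAME" "$RAY_HEAD_POD_NAME")"

# Printed rather than executed, so the filter can be reviewed first.
echo "gcloud logging read '$FILTER' --limit=20"
```

Copy the printed command into a shell that is authenticated against your project to fetch the most recent matching entries.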
To see monitoring metrics:
1. In the Cloud Console, open Metrics Explorer.
2. In "Target", select "Prometheus Target" and then "Ray".
3. Select the metric you want to view, and then click "Apply".
File renamed without changes.