Upgrade Ray version for Autopilot; shrink worker resource allocation (GoogleCloudPlatform#299)

Upgrade ray version; shrink worker resource allocation
artemvmin authored and blackzlq committed Mar 7, 2024
1 parent 8444078 commit c3dcf91
Showing 13 changed files with 159 additions and 336 deletions.
10 changes: 10 additions & 0 deletions applications/jupyter/main.tf
@@ -77,6 +77,16 @@ module "namespace" {
create_namespace = true
}

# IAP Section: Enabled the IAP service
resource "google_project_service" "project_service" {
count = var.add_auth ? 1 : 0
project = var.project_id
service = "iap.googleapis.com"

disable_dependent_services = false
disable_on_destroy = false
}

# Creates jupyterhub
module "jupyterhub" {
source = "../../modules/jupyter"
56 changes: 23 additions & 33 deletions applications/rag/README.md
@@ -1,6 +1,6 @@
# RAG-on-GKE Application

**NOTE:** This solution is in beta/a work in progress - please expect friction while using it.
**NOTE:** This solution is in beta. Please expect friction while using it.

This is a sample to deploy a RAG application on GKE. Retrieval Augmented Generation (RAG) is a popular approach for boosting the accuracy of LLM responses, particularly for domain specific or private data sets. The basic idea is to have a semantically searchable knowledge base (often using vector search), which is used to retrieve relevant snippets for a given prompt to provide additional context to the LLM. Augmenting the knowledge base with additional data is typically cheaper than fine tuning and is more scalable when incorporating current events and other rapidly changing data spaces.

@@ -32,7 +32,7 @@ CLUSTER_REGION=us-central1
```
2. Use the following instructions to create a GKE cluster. We recommend using Autopilot for a simpler setup.

##### Autopilot
##### Autopilot (recommended)

RAG requires the latest Autopilot features, available on GKE cluster version `1.29.1-gke.1575000`+
```
@@ -46,7 +46,7 @@ gcloud container clusters create-auto ${CLUSTER_NAME:?} \
--cluster-version ${CLUSTER_VERSION:?}
```

##### Standard (recommended)
##### Standard

1. To create a GKE Standard cluster using Terraform, follow the [instructions here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/infrastructure/README.md). Use the preconfigured node pools in `/infrastructure/platform.tfvars` as this solution requires T4s and L4s.

@@ -105,6 +105,7 @@ gcloud container clusters get-credentials ${CLUSTER_NAME:?} --location ${CLUSTER
```
kubectl port-forward -n ${NAMESPACE:?} deployment/mistral-7b-instruct 8080:8080
```

* In a new terminal, try a few prompts:
```
export USER_PROMPT="How to deploy a container on K8s?"
@@ -119,6 +120,7 @@ curl 127.0.0.1:8080/generate -X POST \
}
EOF
```

* At the end of the smoke test with the TGI server, stop port forwarding by using Ctrl-C on the original terminal.

5. Verify the frontend chat interface is setup:
@@ -145,10 +147,10 @@ This step generates the vector embeddings for your input dataset. Currently, the
1. Create a CloudSQL user to access the database: `gcloud sql users create rag-user-notebook --password=${SQL_PASSWORD:?} --instance=pgvector-instance --host=%`

2. Go to the Jupyterhub service endpoint in a browser:
* IAP disable: `kubectl get services proxy-public -n $NAMESPACE --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`
* IAP enabled: Read terraform output `jupyter_uri` or use commend: `kubectl get managedcertificates jupyter-managed-cert -n $NAMESPACE --output jsonpath='{.status.domainStatus[0].domain}'`
* Remeber login GCP to check if user has role `IAP-secured Web App User`
* Waiting for domain status to be `Active`
* IAP disabled: `kubectl get services proxy-public -n $NAMESPACE --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`
* IAP enabled: Read terraform output `jupyter_uri` or use command: `kubectl get managedcertificates jupyter-managed-cert -n $NAMESPACE --output jsonpath='{.status.domainStatus[0].domain}'`
* Open Google Cloud Console IAM to verify that the user has role `IAP-secured Web App User`
* Wait for the domain status to be `Active`
3. Login with placeholder credentials [TBD: replace with instructions for IAP]:
* username: user
* password: use `terraform output jupyter_password` to fetch the password value
@@ -167,40 +169,28 @@ This step generates the vector embeddings for your input dataset. Currently, the
* `os.environ['KAGGLE_KEY']`

9. Run all the cells in the notebook. This will generate vector embeddings for the input dataset (`denizbilginn/google-maps-restaurant-reviews`) and store them in the `pgvector-instance` via a Ray job.
* Once submitted, Ray will take several minutes to create the runtime environment and optionally scale up Ray worker nodes. During this time, the job status will remain PENDING.
* When the job status is SUCCEEDED, the vector embeddings have been generated and we are ready to launch the frontend chat interface.
* If the Ray job has FAILED, re-run the cell.
* When the Ray job has SUCCEEDED, we are ready to launch the frontend chat interface.

### Launch the Frontend Chat Interface
### Access the Frontend Chat Interface

#### Accessing the Frontend with IAP Disabled
#### With IAP Disabled
1. Setup port forwarding for the frontend: `kubectl port-forward service/rag-frontend -n $NAMESPACE 8080:8080 &`

2. Go to `localhost:8080` in a browser & start chatting! This will fetch context related to your prompt from the vector embeddings in the `pgvector-instance`, augment the original prompt with the context & query the inference model (`mistral-7b`) with the augmented prompt.
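
The retrieve-then-augment flow described in step 2 can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names (`cosine_similarity`, `retrieve`, `augment_prompt`), the in-memory `store`, the toy three-dimensional embeddings, and the prompt template are all assumptions made for the example. The real deployment reads embeddings from the `pgvector-instance` and sends the augmented prompt to the `mistral-7b` inference server.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, store, top_k=2):
    """Return the top_k snippet texts whose embeddings are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def augment_prompt(user_prompt, snippets):
    """Prepend the retrieved snippets to the user's question (hypothetical template)."""
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return f"Context:\n{context}\n\nQuestion: {user_prompt}"


# Hypothetical knowledge base: (snippet, toy embedding) pairs standing in for pgvector rows.
store = [
    ("The ramen shop is open until midnight.", [0.9, 0.1, 0.0]),
    ("Parking is free after 6 pm.", [0.0, 0.2, 0.9]),
    ("The noodles are made fresh every morning.", [0.8, 0.3, 0.1]),
]

# In a real system this vector would come from an embedding model.
query_embedding = [1.0, 0.2, 0.0]

snippets = retrieve(query_embedding, store, top_k=2)
prompt = augment_prompt("Is the ramen place open late?", snippets)
print(prompt)
```

The augmented prompt, rather than the raw user question, is what would be sent to the inference model.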

#### Accessing the Frontend with IAP Enabled
1. Verify IAP is Enabled

* Ensure that IAP is enabled on Google Cloud Platform (GCP) for your application. If you encounter any errors, try re-enabling IAP.

2. Verify User Role

* Make sure you have the role `IAP-secured Web App User` assigned to your user account. This role is necessary to access the application through IAP.

3. Verify Domain is Active
* Make sure the domain is active using commend:
`kubectl get managedcertificates frontend-managed-cert -n rag --output jsonpath='{.status.domainStatus[0].status}'`

3. Retrieve the Domain

* Read terraform output `frontend_uri` or use the following command to find the domain created by IAP for accessing your service:
`kubectl get managedcertificates frontend-managed-cert -n $NAMESPACE --output jsonpath='{.status.domainStatus[0].domain}'`

4. Access the Frontend
#### With IAP Enabled
1. Verify that IAP is enabled on Google Cloud Platform (GCP) for your application. If you encounter any errors, try re-enabling IAP.
2. Verify that you have the role `IAP-secured Web App User` assigned to your user account. This role is necessary to access the application through IAP.
3. Verify the domain is active using command:
`kubectl get managedcertificates frontend-managed-cert -n rag --output jsonpath='{.status.domainStatus[0].status}'`
3. Read terraform output `frontend_uri` or use the following command to find the domain created by IAP for accessing your service:
`kubectl get managedcertificates frontend-managed-cert -n $NAMESPACE --output jsonpath='{.status.domainStatus[0].domain}'`
4. Open your browser and navigate to the domain you retrieved in the previous step to start chatting!

* Open your browser and navigate to the domain you retrieved in the previous step to start chatting!
#### Prompt Examples

#### Prompts Example
3. [TODO: Add some example prompts for the dataset].
*TODO:* Add some example prompts for the dataset.

### Cleanup

@@ -252,7 +252,7 @@
"id": "7ba6c3ff-a25a-4f4d-b58e-68f7fe7d33df",
"metadata": {},
"outputs": [],
"source": [
"source": [
"job_id = client.submit_job(\n",
" entrypoint=\"python test.py\",\n",
" # Path to the local directory that contains the entrypoint file.\n",
@@ -278,10 +278,9 @@
" status = client.get_job_status(job_id)\n",
" if status != prev_status:\n",
" print(\"Job status:\", status)\n",
" print(\"Job info:\", client.get_job_info(job_id).message)\n",
" prev_status = status\n",
" if status.is_terminal():\n",
" if status == 'FAILED':\n",
" print(\"Job info:\", client.get_job_info(job_id))\n",
" break\n",
" time.sleep(5)\n"
]
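
For readers who want to try the polling pattern from the notebook hunk above without a Ray cluster, here is a minimal, self-contained sketch. It assumes a stand-in `JobStatus` enum and a hypothetical `FakeJobClient` that replays a fixed list of statuses. Only the method names `get_job_status`, `get_job_info`, and `is_terminal` mirror the calls visible in the diff; everything else, including the `wait_for_job` helper and its `log` parameter, is invented for illustration.

```python
import time
from enum import Enum


class JobStatus(str, Enum):
    """Stand-in status enum (assumption: only these four states matter here)."""

    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"

    def is_terminal(self) -> bool:
        return self in (JobStatus.SUCCEEDED, JobStatus.FAILED)


class FakeJobClient:
    """Hypothetical client that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self._index = 0

    def get_job_status(self, job_id):
        # Repeat the last status once the sequence is exhausted.
        status = self._statuses[min(self._index, len(self._statuses) - 1)]
        self._index += 1
        return status

    def get_job_info(self, job_id):
        return {"job_id": job_id, "message": "details for " + job_id}


def wait_for_job(client, job_id, poll_seconds=5.0, log=print):
    """Poll until the job reaches a terminal state and return that state."""
    prev_status = None
    while True:
        status = client.get_job_status(job_id)
        if status != prev_status:
            # Log only on transitions so repeated PENDING polls stay quiet.
            log("Job status:", status.value)
            prev_status = status
        if status.is_terminal():
            if status == "FAILED":
                # Fetch the detailed job info only when something went wrong.
                log("Job info:", client.get_job_info(job_id))
            return status
        time.sleep(poll_seconds)
```

Logging only on status transitions and fetching job details only on failure keeps the output short during the minutes a job may spend in `PENDING`.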
46 changes: 14 additions & 32 deletions applications/rag/frontend/main.tf
@@ -15,48 +15,30 @@ data "google_project" "project" {
project_id = var.project_id
}


locals {
instance_connection_name = format("%s:%s:%s", var.project_id, var.region, var.cloudsql_instance)
}

# IAP Section: Enabled the IAP service
resource "google_project_service" "project_service" {
count = var.add_auth ? 1 : 0
project = var.project_id
service = "iap.googleapis.com"

disable_dependent_services = false
disable_on_destroy = false
}

# IAP Section: Creates the OAuth client used in IAP
resource "google_iap_client" "iap_oauth_client" {
count = var.add_auth && var.client_id == "" ? 1 : 0
display_name = "Frontend-Client"
brand = var.brand == "" ? "projects/${data.google_project.project.number}/brands/${data.google_project.project.number}" : var.brand
}

# IAP Section: Creates the GKE components
module "iap_auth" {
count = var.add_auth ? 1 : 0
source = "../../../modules/iap"

project_id = var.project_id
namespace = var.namespace
frontend_add_auth = var.add_auth
frontend_k8s_ingress_name = var.k8s_ingress_name
frontend_k8s_managed_cert_name = var.k8s_managed_cert_name
frontend_k8s_iap_secret_name = var.k8s_iap_secret_name
frontend_k8s_backend_config_name = var.k8s_backend_config_name
frontend_k8s_backend_service_name = var.k8s_backend_service_name
frontend_k8s_backend_service_port = var.k8s_backend_service_port
frontend_client_id = var.client_id != "" ? var.client_id : google_iap_client.iap_oauth_client[0].client_id
frontend_client_secret = var.client_id != "" ? var.client_secret : google_iap_client.iap_oauth_client[0].secret
frontend_url_domain_addr = var.url_domain_addr
frontend_url_domain_name = var.url_domain_name
project_id = var.project_id
namespace = var.namespace
app_name = "frontend"
brand = var.brand
k8s_ingress_name = var.k8s_ingress_name
k8s_managed_cert_name = var.k8s_managed_cert_name
k8s_iap_secret_name = var.k8s_iap_secret_name
k8s_backend_config_name = var.k8s_backend_config_name
k8s_backend_service_name = var.k8s_backend_service_name
k8s_backend_service_port = var.k8s_backend_service_port
client_id = var.client_id
client_secret = var.client_secret
url_domain_addr = var.url_domain_addr
url_domain_name = var.url_domain_name
depends_on = [
google_project_service.project_service,
kubernetes_service.rag_frontend_service
]
}
2 changes: 1 addition & 1 deletion applications/rag/frontend/outputs.tf
@@ -13,5 +13,5 @@
# limitations under the License.

output "frontend_uri" {
value = var.add_auth ? module.iap_auth[0].frontend_domain : (data.kubernetes_service.frontend-ingress.status != null ? (data.kubernetes_service.frontend-ingress.status[0].load_balancer != null ? "${data.kubernetes_service.frontend-ingress.status[0].load_balancer[0].ingress[0].ip}" : "") : "")
value = var.add_auth ? module.iap_auth[0].domain : (data.kubernetes_service.frontend-ingress.status != null ? (data.kubernetes_service.frontend-ingress.status[0].load_balancer != null ? "${data.kubernetes_service.frontend-ingress.status[0].load_balancer[0].ingress[0].ip}" : "") : "")
}
10 changes: 10 additions & 0 deletions applications/rag/main.tf
@@ -123,6 +123,16 @@ module "cloudsql" {
depends_on = [module.namespace]
}

# IAP Section: Enabled the IAP service
resource "google_project_service" "project_service" {
count = var.frontend_add_auth || var.jupyter_add_auth ? 1 : 0
project = var.project_id
service = "iap.googleapis.com"

disable_dependent_services = false
disable_on_destroy = false
}

module "jupyterhub" {
source = "../../modules/jupyter"
providers = { helm = helm.rag, kubernetes = kubernetes.rag }
22 changes: 11 additions & 11 deletions applications/rag/workloads.tfvars
@@ -12,17 +12,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.

project_id = "<your project ID>"
project_id = "zlq-gke-dev"

## this is required for terraform to connect to GKE master and deploy workloads
create_cluster = false # this flag will create a new standard public gke cluster in default network
cluster_name = "<cluster_name>"
cluster_name = "ml-cluster"
cluster_location = "us-central1"

## GKE environment variables
kubernetes_namespace = "rag"
create_gcs_bucket = true
gcs_bucket = "rag-data-xyzu" # Choose a globally unique bucket name.
create_gcs_bucket = false
gcs_bucket = "rag-data-gcs-zlq-gke-dev" # Choose a globally unique bucket name.

cloudsql_instance = "pgvector-instance"
## Service accounts
@@ -44,11 +44,11 @@ jupyter_service_account = "jupyter-system-account"
dataset_embeddings_table_name = "googlemaps_reviews_db"

## IAP config
brand = "projects/<prj-number>/brands/<prj-number>"
brand = "projects/553239239816/brands/553239239816"

## Jupyter IAP Settings
jupyter_add_auth = false # Set to true when using auth with IAP
jupyter_support_email = "<email>"
jupyter_add_auth = true # Set to true when using auth with IAP
jupyter_support_email = "[email protected]"
jupyter_k8s_ingress_name = "jupyter-ingress"
jupyter_k8s_managed_cert_name = "jupyter-managed-cert"
jupyter_k8s_iap_secret_name = "jupyter-iap-secret"
@@ -60,11 +60,11 @@ jupyter_url_domain_addr = ""
jupyter_url_domain_name = ""
jupyter_client_id = ""
jupyter_client_secret = ""
jupyter_members_allowlist = ["allAuthenticatedUsers", "user:<email>"]
jupyter_members_allowlist = ["allAuthenticatedUsers", "user:[email protected]"]

## Frontend IAP Settings
frontend_add_auth = false # Set to true when using auth with IAP
frontend_support_email = "<email>"
frontend_add_auth = true # Set to true when using auth with IAP
frontend_support_email = "[email protected]"
frontend_k8s_ingress_name = "frontend-ingress"
frontend_k8s_managed_cert_name = "frontend-managed-cert"
frontend_k8s_iap_secret_name = "frontend-iap-secret"
@@ -76,4 +76,4 @@ frontend_url_domain_addr = ""
frontend_url_domain_name = ""
frontend_client_id = ""
frontend_client_secret = ""
frontend_members_allowlist = ["allAuthenticatedUsers", "user:<email>"]
frontend_members_allowlist = ["allAuthenticatedUsers", "user:[email protected]"]