Merge pull request #217 from GoogleCloudPlatform/kaggle-update
Update RAG readme with instructions for kaggle & fix frontend to have configurable table name
imreddy13 authored Feb 22, 2024
2 parents 3fc3edf + 15cfaa0 commit b032860
Showing 11 changed files with 53 additions and 20 deletions.
32 changes: 20 additions & 12 deletions applications/rag/README.md
@@ -38,7 +38,7 @@ gcloud container node-pools create g2-standard-24 --cluster <cluster-name> \
--ephemeral-storage-local-ssd=count=2 \
--enable-image-streaming \
--num-nodes=1 --min-nodes=1 --max-nodes=2 \
--node-locations $REGION-a,$REGION-b --region $REGION
--node-locations $REGION-a,$REGION-b --location=$REGION
```

#### Setup Components
@@ -47,7 +47,9 @@ Next, set up the inference server, the `pgvector` instance, Jupyterhub, Kuberay

1. `cd ai-on-gke/applications/rag`

2. Edit `workloads.tfvars` with your project ID, cluster name & location. Optionally choose the k8s namespace, service account and GCS bucket to be used by the application. If not selected, these resources will be created based on the default values set.
2. Edit `workloads.tfvars` with your project ID, cluster name, location and a GCS bucket name.
* The GCS bucket name needs to be globally unique so add some random suffix to it (ensure `gcloud storage buckets describe gs://<bucketname>` returns a 404).
* Optionally choose the k8s namespace & service account to be used by the application. If not selected, these resources will be created based on the default values set.
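Because GCS bucket names live in a single global namespace, a common trick is to append a random suffix. A minimal sketch of such a helper (hypothetical, not part of this repo):

```python
import secrets

def unique_bucket_name(prefix: str) -> str:
    """Append a random hex suffix so the bucket name is likely globally unique.

    GCS bucket names must be 3-63 characters of lowercase letters,
    digits, and hyphens.
    """
    suffix = secrets.token_hex(4)  # 8 lowercase hex characters
    name = f"{prefix}-{suffix}"
    assert 3 <= len(name) <= 63, "GCS bucket names must be 3-63 characters"
    return name

print(unique_bucket_name("rag-demo-bucket"))
```

You can then confirm the name is free with `gcloud storage buckets describe gs://<bucketname>` (a 404 means it is available).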

3. Run `terraform init`

@@ -79,7 +81,7 @@ This filter can auto fetch the templates in your project. Please refer to the fo
4. Verify the inference server is set up:
* Set up port forward
```
kubectl port-forward deployment/mistral-7b-instruct 8080:8080 &
kubectl port-forward -n <namespace> deployment/mistral-7b-instruct 8080:8080 &
```

* Try a few prompts:
@@ -96,32 +98,38 @@ curl 127.0.0.1:8080/generate -X POST \
}
EOF
```
* At the end of the smoke test with the TGI server, close the port forward for 8080.
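The smoke-test prompt above is a plain JSON POST. As an illustrative sketch, the request body can be assembled like this (the `inputs`/`parameters` field names follow the TGI server's request shape; the helper itself is hypothetical):

```python
import json

def build_generate_request(prompt: str, max_new_tokens: int = 128) -> str:
    # A TGI /generate body puts the prompt under "inputs" and
    # sampling knobs under "parameters".
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return json.dumps(body)

print(build_generate_request("What is a RAG pipeline?"))
```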

5. Verify the frontend chat interface is set up:
* Verify the service exists: `kubectl get services rag-frontend -n <namespace>`
* Verify the deployment exists: `kubectl get deployments rag-frontend -n <namespace>` & ensure the deployment is in `READY` state.

### Vector Embeddings for Dataset

This step generates the vector embeddings for your input dataset. Currently, the default dataset is `wiki_dpr`. We will use a Jupyter notebook to run a Ray job that generates the embeddings & populates them into the instance `pgvector-instance` created above.
This step generates the vector embeddings for your input dataset. Currently, the default dataset is [Google Maps Restaurant Reviews](https://www.kaggle.com/datasets/denizbilginn/google-maps-restaurant-reviews). We will use a Jupyter notebook to run a Ray job that generates the embeddings & populates them into the instance `pgvector-instance` created above.

1. Download the provided Jupyter notebook to generate vector embeddings from `ai-on-gke\applications\rag\example_notebooks\ray-hf-cloudsql-latest.ipynb`.
1. Create a CloudSQL user to access the database: `gcloud sql users create rag-user-notebook --password=<choose a password> --instance=pgvector-instance --host=%`

2. Create a CloudSQL user to access the database: `gcloud sql users create rag-user-notebook --password=${PASSWORD:?} --instance=pgvector-instance --host=%`
2. Go to the Jupyterhub service endpoint in a browser: `kubectl get services proxy-public -n <namespace> --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`

3. Go to the Jupyterhub service endpoint in a browser: `kubectl get services proxy-public -n <namespace> --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`

4. Login with placeholder credentials [TBD: replace with instructions for IAP]:
3. Login with placeholder credentials [TBD: replace with instructions for IAP]:
* username: user3
* password: use `terraform output password` to fetch the password value

5. Once logged in, choose the `CPU` preset & use the Upload button to upload the notebook `ray-hf-cloudsql-latest.ipynb`. Replace the variables in the 3rd cell with the following:
4. Once logged in, choose the `CPU` preset. Go to File -> Open From URL & upload the notebook `rag-kaggle-ray-sql-latest.ipynb` from `https://raw.githubusercontent.com/GoogleCloudPlatform/ai-on-gke/main/applications/rag/example_notebooks/rag-kaggle-ray-sql-latest.ipynb`. This path can also be found by going to the [notebook location](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/applications/rag/example_notebooks/rag-kaggle-ray-sql-latest.ipynb) and selecting `Raw`.

5. Replace the variables in the 3rd cell with the following to access the database:
* `INSTANCE_CONNECTION_NAME`: `<project_id>:<region>:pgvector-instance`
* `DB_USER`: `rag-user-notebook`
* `DB_PASS`: password from step 2
* `DB_PASS`: password from step 1

6. Create a Kaggle account, then navigate to https://www.kaggle.com/settings/account and generate an API token (see https://www.kaggle.com/docs/api#authentication for details). This token is used in the notebook to access the [Google Maps Restaurant Reviews dataset](https://www.kaggle.com/datasets/denizbilginn/google-maps-restaurant-reviews).

7. Replace the Kaggle username and API token in the 2nd cell with your credentials (found in the `kaggle.json` file created by Step 6):
* `os.environ['KAGGLE_USERNAME']`
* `os.environ['KAGGLE_KEY']`
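Rather than pasting the values by hand, the same two variables can be populated from the downloaded `kaggle.json`. A small illustrative helper (not part of the notebook), assuming the standard `{"username": ..., "key": ...}` file layout:

```python
import json
import os

def load_kaggle_credentials(path: str) -> None:
    """Read a kaggle.json token file and export the env vars
    the Kaggle client expects."""
    with open(path) as f:
        creds = json.load(f)
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
```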

6. Run all the cells in the notebook. This generates vector embeddings for the input dataset (`wiki-dpr`) and stores them in the `pgvector-instance` via a Ray job.
8. Run all the cells in the notebook. This generates vector embeddings for the input dataset (`denizbilginn/google-maps-restaurant-reviews`) and stores them in the `pgvector-instance` via a Ray job.
* When the last cell says the job has succeeded (e.g. `Job 'raysubmit_APungAw6TyB55qxk' succeeded`), the vector embeddings have been generated and we can launch the frontend chat interface.
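If you prefer to detect that success line programmatically rather than eyeball it, a sketch (hypothetical helper; the `raysubmit_` id format matches the example above):

```python
import re

# Matches lines like: Job 'raysubmit_APungAw6TyB55qxk' succeeded
SUCCESS_RE = re.compile(r"Job '(raysubmit_[A-Za-z0-9]+)' succeeded")

def parse_succeeded_job(log_line: str):
    """Return the Ray submission id if the line reports success, else None."""
    m = SUCCESS_RE.search(log_line)
    return m.group(1) if m else None
```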

### Launch the Frontend Chat Interface
Expand Down
14 changes: 10 additions & 4 deletions applications/rag/example_notebooks/rag-kaggle-ray-sql-latest.ipynb
@@ -83,10 +83,10 @@
"import sqlalchemy\n",
"\n",
"# initialize parameters\n",
"INSTANCE_CONNECTION_NAME = \"saikatroyc-stateful-joonix:us-central1:pgvector-instance\" # Modify the project and region based on your setting\n",
"INSTANCE_CONNECTION_NAME = \"<project-id>:<location>:pgvector-instance\" # Modify the project and region based on your setting\n",
"print(f\"Your instance connection name is: {INSTANCE_CONNECTION_NAME}\")\n",
"DB_USER = \"rag-user\" # Modify this based on your setting\n",
"DB_PASS = \"test123\" # Modify this based on your setting\n",
"DB_USER = \"rag-user-notebook\" # Modify this based on your setting\n",
"DB_PASS = \"<password>\" # Modify this based on your setting\n",
"DB_NAME = \"pgvector-database\"\n",
"\n",
"# initialize Connector object\n",
@@ -209,6 +209,12 @@
" db_conn.commit()\n",
" print(\"Created table=\", TABLE_NAME)\n",
"\n",
" # TODO: Fix workaround access grant for the frontend to access the table.\n",
" grant_access_stmt = \"GRANT SELECT on \" + TABLE_NAME + \" to \\\"rag-user\\\";\"\n",
" db_conn.execute(\n",
" sqlalchemy.text(grant_access_stmt)\n",
" )\n",
" \n",
" query_text = \"INSERT INTO \" + TABLE_NAME + \" (id, text, text_embedding) VALUES (:id, :text, :text_embedding)\"\n",
" insert_stmt = sqlalchemy.text(query_text)\n",
" for output in ds_embed.iter_rows():\n",
@@ -321,7 +327,7 @@
}
],
"source": [
"!ray job status raysubmit_8cQxrAChfX9BYKUW --address \"ray://example-cluster-kuberay-head-svc:10001\" "
"!ray job status {job_id} --address \"ray://example-cluster-kuberay-head-svc:10001\" "
]
},
{
4 changes: 2 additions & 2 deletions applications/rag/frontend/container/main.py
@@ -33,7 +33,7 @@
app.jinja_env.trim_blocks = True
app.jinja_env.lstrip_blocks = True

TABLE_NAME = 'huggingface_db' # CloudSQL table name
TABLE_NAME = os.environ.get('TABLE_NAME', '') # CloudSQL table name
SENTENCE_TRANSFORMER_MODEL = 'intfloat/multilingual-e5-small' # Transformer to use for converting text chunks to vector embeddings
DB_NAME = "pgvector-database"

@@ -126,7 +126,7 @@ def index():

def fetchContext(query_text):
with db.connect() as conn:
results = conn.execute(sqlalchemy.text("SELECT * FROM huggingface_db")).fetchall()
results = conn.execute(sqlalchemy.text("SELECT * FROM " + TABLE_NAME)).fetchall()
log.info(f"query database results:")
for row in results:
print(row)
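Because a table name cannot be passed as a bound SQL parameter, interpolating the env-supplied `TABLE_NAME` into the query string is only safe if the value is constrained. A hedged sketch of such a guard (illustrative, not what the frontend ships):

```python
import re

# Accept only plain SQL identifiers: letters, digits, underscores.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def safe_table_name(name: str) -> str:
    """Validate an env-supplied table name before interpolating it
    into a SQL string, since identifiers cannot be bound parameters."""
    if not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name

query = f"SELECT * FROM {safe_table_name('googlemaps_reviews_db')}"
```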
File renamed without changes.
File renamed without changes.
7 changes: 6 additions & 1 deletion applications/rag/frontend/main.tf
@@ -85,7 +85,7 @@ resource "kubernetes_deployment" "rag_frontend_deployment" {
spec {
service_account_name = var.google_service_account
container {
image = "us-central1-docker.pkg.dev/ai-on-gke/rag-on-gke/frontend@sha256:6ba3a1f8298d6164805dd2a039718d0d8b713ccfc2c3ab9bc8669ac5e30f89ed"
image = "us-central1-docker.pkg.dev/ai-on-gke/rag-on-gke/frontend@sha256:e2dd85e92f42e3684455a316dee5f98f61f1f3fba80b9368bd6f48d5e2e3475e"
name = "rag-frontend"

port {
@@ -102,6 +102,11 @@ resource "kubernetes_deployment" "rag_frontend_deployment" {
value = data.kubernetes_service.inference_service.status.0.load_balancer.0.ingress.0.ip
}

env {
name = "TABLE_NAME"
value = var.dataset_embeddings_table_name
}

env {
name = "DB_USER"
value_from {
5 changes: 5 additions & 0 deletions applications/rag/frontend/variables.tf
@@ -40,6 +40,11 @@ variable "db_secret_namespace" {
default = "rag"
}

variable "dataset_embeddings_table_name" {
type = string
description = "Name of the table that stores vector embeddings for input dataset"
}

variable "inference_service_name" {
type = string
description = "Model inference k8s service name"
1 change: 1 addition & 0 deletions applications/rag/main.tf
@@ -157,4 +157,5 @@ module "frontend" {
inference_service_namespace = module.inference-server.inference_service_namespace
db_secret_name = module.cloudsql.db_secret_name
db_secret_namespace = module.cloudsql.db_secret_namespace
dataset_embeddings_table_name = var.dataset_embeddings_table_name
}
5 changes: 5 additions & 0 deletions applications/rag/variables.tf
@@ -78,6 +78,11 @@ variable "gcs_bucket" {
description = "GCS bucket name to store dataset"
}

variable "dataset_embeddings_table_name" {
type = string
description = "Name of the table that stores vector embeddings for input dataset"
}

variable "default_backend_service" {
type = string
default = "proxy-public"
5 changes: 4 additions & 1 deletion applications/rag/workloads.tfvars
@@ -38,7 +38,7 @@ rag_service_account = "rag-system-account"
create_jupyter_service_account = true
jupyter_service_account = "jupyter-system-account"

# IAP config
## Embeddings table name - change this to the TABLE_NAME used in the notebook.
dataset_embeddings_table_name = "googlemaps_reviews_db"

## IAP config
add_auth = false # Set to true when using auth with IAP
brand = "projects/<prj-number>/brands/<prj-number>"
support_email = "<email>"
