Merge pull request #217 from GoogleCloudPlatform/kaggle-update
Update RAG readme with instructions for kaggle & fix frontend to have configurable table name
imreddy13 authored Feb 22, 2024
2 parents 3fc3edf + 15cfaa0 commit b032860
Showing 11 changed files with 53 additions and 20 deletions.
32 changes: 20 additions & 12 deletions applications/rag/README.md
@@ -38,7 +38,7 @@ gcloud container node-pools create g2-standard-24 --cluster <cluster-name> \
--ephemeral-storage-local-ssd=count=2 \
--enable-image-streaming \
--num-nodes=1 --min-nodes=1 --max-nodes=2 \
--node-locations $REGION-a,$REGION-b --region $REGION
--node-locations $REGION-a,$REGION-b --location=$REGION
```

#### Setup Components
@@ -47,7 +47,9 @@ Next, set up the inference server, the `pgvector` instance, Jupyterhub, Kuberay

1. `cd ai-on-gke/applications/rag`

2. Edit `workloads.tfvars` with your project ID, cluster name & location. Optionally choose the k8s namespace, service account and GCS bucket to be used by the application. If not selected, these resources will be created based on the default values set.
2. Edit `workloads.tfvars` with your project ID, cluster name, location and a GCS bucket name.
* The GCS bucket name needs to be globally unique so add some random suffix to it (ensure `gcloud storage buckets describe gs://<bucketname>` returns a 404).
* Optionally choose the k8s namespace & service account to be used by the application. If not selected, these resources will be created based on the default values set.
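Because GCS bucket names live in a single global namespace, a common trick is to append a random suffix. A minimal sketch of such a helper (hypothetical, not part of this repo):

```python
import secrets

def unique_bucket_name(prefix: str) -> str:
    """Append a random hex suffix so the bucket name is likely globally unique.

    GCS bucket names must be 3-63 characters of lowercase letters,
    digits, and hyphens.
    """
    suffix = secrets.token_hex(4)  # 8 lowercase hex characters
    name = f"{prefix}-{suffix}"
    assert 3 <= len(name) <= 63, "GCS bucket names must be 3-63 characters"
    return name

print(unique_bucket_name("rag-demo-bucket"))
```

You can then confirm the name is free with `gcloud storage buckets describe gs://<bucketname>` (a 404 means it is available).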

3. Run `terraform init`

@@ -79,7 +81,7 @@ This filter can auto fetch the templates in your project. Please refer to the fo
4. Verify the inference server is set up:
* Set up port forward
```
kubectl port-forward deployment/mistral-7b-instruct 8080:8080 &
kubectl port-forward -n <namespace> deployment/mistral-7b-instruct 8080:8080 &
```

* Try a few prompts:
@@ -96,32 +98,38 @@ curl 127.0.0.1:8080/generate -X POST \
}
EOF
```
* At the end of the smoke test with the TGI server, close the port forward for 8080.
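The smoke-test prompt above is a plain JSON POST. As an illustrative sketch, the request body can be assembled like this (the `inputs`/`parameters` field names follow the TGI server's request shape; the helper itself is hypothetical):

```python
import json

def build_generate_request(prompt: str, max_new_tokens: int = 128) -> str:
    # A TGI /generate body puts the prompt under "inputs" and
    # sampling knobs under "parameters".
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return json.dumps(body)

print(build_generate_request("What is a RAG pipeline?"))
```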

5. Verify the frontend chat interface is set up:
* Verify the service exists: `kubectl get services rag-frontend -n <namespace>`
* Verify the deployment exists: `kubectl get deployments rag-frontend -n <namespace>` & ensure the deployment is in `READY` state.

### Vector Embeddings for Dataset

This step generates the vector embeddings for your input dataset. Currently, the default dataset is `wiki_dpr`. We will use a Jupyter notebook to run a Ray job that generates the embeddings & populates them into the instance `pgvector-instance` created above.
This step generates the vector embeddings for your input dataset. Currently, the default dataset is [Google Maps Restaurant Reviews](https://www.kaggle.com/datasets/denizbilginn/google-maps-restaurant-reviews). We will use a Jupyter notebook to run a Ray job that generates the embeddings & populates them into the instance `pgvector-instance` created above.

1. Download the provided Jupyter notebook to generate vector embeddings from `ai-on-gke\applications\rag\example_notebooks\ray-hf-cloudsql-latest.ipynb`.
1. Create a CloudSQL user to access the database: `gcloud sql users create rag-user-notebook --password=<choose a password> --instance=pgvector-instance --host=%`

2. Create a CloudSQL user to access the database: `gcloud sql users create rag-user-notebook --password=${PASSWORD:?} --instance=pgvector-instance --host=%`
2. Go to the Jupyterhub service endpoint in a browser: `kubectl get services proxy-public -n <namespace> --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`

3. Go to the Jupyterhub service endpoint in a browser: `kubectl get services proxy-public -n <namespace> --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`

4. Login with placeholder credentials [TBD: replace with instructions for IAP]:
3. Login with placeholder credentials [TBD: replace with instructions for IAP]:
* username: user3
* password: use `terraform output password` to fetch the password value

5. Once logged in, choose the `CPU` preset & use the Upload button to upload the notebook `ray-hf-cloudsql-latest.ipynb`. Replace the variables in the 3rd cell with the following:
4. Once logged in, choose the `CPU` preset. Go to File -> Open From URL & upload the notebook `rag-kaggle-ray-sql-latest.ipynb` from `https://raw.githubusercontent.com/GoogleCloudPlatform/ai-on-gke/main/applications/rag/example_notebooks/rag-kaggle-ray-sql-latest.ipynb`. This path can also be found by going to the [notebook location](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/applications/rag/example_notebooks/rag-kaggle-ray-sql-latest.ipynb) and selecting `Raw`.

5. Replace the variables in the 3rd cell with the following to access the database:
* `INSTANCE_CONNECTION_NAME`: `<project_id>:<region>:pgvector-instance`
* `DB_USER`: `rag-user-notebook`
* `DB_PASS`: password from step 2
* `DB_PASS`: password from step 1

6. Create a Kaggle account, then navigate to https://www.kaggle.com/settings/account and generate an API token (see https://www.kaggle.com/docs/api#authentication for details). This token is used in the notebook to access the [Google Maps Restaurant Reviews dataset](https://www.kaggle.com/datasets/denizbilginn/google-maps-restaurant-reviews).

7. Replace the Kaggle username and API token in the 2nd cell with your credentials (found in the `kaggle.json` file created by Step 6):
* `os.environ['KAGGLE_USERNAME']`
* `os.environ['KAGGLE_KEY']`
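Rather than pasting the values by hand, the same two variables can be populated from the downloaded `kaggle.json`. A small illustrative helper (not part of the notebook), assuming the standard `{"username": ..., "key": ...}` file layout:

```python
import json
import os

def load_kaggle_credentials(path: str) -> None:
    """Read a kaggle.json token file and export the env vars
    the Kaggle client expects."""
    with open(path) as f:
        creds = json.load(f)
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
```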

6. Run all the cells in the notebook. This generates vector embeddings for the input dataset (`wiki-dpr`) and stores them in the `pgvector-instance` via a Ray job.
8. Run all the cells in the notebook. This generates vector embeddings for the input dataset (`denizbilginn/google-maps-restaurant-reviews`) and stores them in the `pgvector-instance` via a Ray job.
* When the last cell says the job has succeeded (e.g. `Job 'raysubmit_APungAw6TyB55qxk' succeeded`), the vector embeddings have been generated and we can launch the frontend chat interface.
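If you prefer to detect that success line programmatically rather than eyeball it, a sketch (hypothetical helper; the `raysubmit_` id format matches the example above):

```python
import re

# Matches lines like: Job 'raysubmit_APungAw6TyB55qxk' succeeded
SUCCESS_RE = re.compile(r"Job '(raysubmit_[A-Za-z0-9]+)' succeeded")

def parse_succeeded_job(log_line: str):
    """Return the Ray submission id if the line reports success, else None."""
    m = SUCCESS_RE.search(log_line)
    return m.group(1) if m else None
```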

### Launch the Frontend Chat Interface
Expand Down
14 changes: 10 additions & 4 deletions applications/rag/example_notebooks/rag-kaggle-ray-sql-latest.ipynb
@@ -83,10 +83,10 @@
"import sqlalchemy\n",
"\n",
"# initialize parameters\n",
"INSTANCE_CONNECTION_NAME = \"saikatroyc-stateful-joonix:us-central1:pgvector-instance\" # Modify the project and region based on your setting\n",
"INSTANCE_CONNECTION_NAME = \"<project-id>:<location>:pgvector-instance\" # Modify the project and region based on your setting\n",
"print(f\"Your instance connection name is: {INSTANCE_CONNECTION_NAME}\")\n",
"DB_USER = \"rag-user\" # Modify this based on your setting\n",
"DB_PASS = \"test123\" # Modify this based on your setting\n",
"DB_USER = \"rag-user-notebook\" # Modify this based on your setting\n",
"DB_PASS = \"<password>\" # Modify this based on your setting\n",
"DB_NAME = \"pgvector-database\"\n",
"\n",
"# initialize Connector object\n",
@@ -209,6 +209,12 @@
" db_conn.commit()\n",
" print(\"Created table=\", TABLE_NAME)\n",
"\n",
" # TODO: Fix workaround access grant for the frontend to access the table.\n",
" grant_access_stmt = \"GRANT SELECT on \" + TABLE_NAME + \" to \\\"rag-user\\\";\"\n",
" db_conn.execute(\n",
" sqlalchemy.text(grant_access_stmt)\n",
" )\n",
" \n",
" query_text = \"INSERT INTO \" + TABLE_NAME + \" (id, text, text_embedding) VALUES (:id, :text, :text_embedding)\"\n",
" insert_stmt = sqlalchemy.text(query_text)\n",
" for output in ds_embed.iter_rows():\n",
@@ -321,7 +327,7 @@
}
],
"source": [
"!ray job status raysubmit_8cQxrAChfX9BYKUW --address \"ray://example-cluster-kuberay-head-svc:10001\" "
"!ray job status {job_id} --address \"ray://example-cluster-kuberay-head-svc:10001\" "
]
},
{
4 changes: 2 additions & 2 deletions applications/rag/frontend/container/main.py
@@ -33,7 +33,7 @@
app.jinja_env.trim_blocks = True
app.jinja_env.lstrip_blocks = True

TABLE_NAME = 'huggingface_db' # CloudSQL table name
TABLE_NAME = os.environ.get('TABLE_NAME', '') # CloudSQL table name
SENTENCE_TRANSFORMER_MODEL = 'intfloat/multilingual-e5-small' # Transformer to use for converting text chunks to vector embeddings
DB_NAME = "pgvector-database"

@@ -126,7 +126,7 @@ def index():

def fetchContext(query_text):
with db.connect() as conn:
results = conn.execute(sqlalchemy.text("SELECT * FROM huggingface_db")).fetchall()
results = conn.execute(sqlalchemy.text("SELECT * FROM " + TABLE_NAME)).fetchall()
log.info(f"query database results:")
for row in results:
print(row)
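Because a table name cannot be passed as a bound SQL parameter, interpolating the env-supplied `TABLE_NAME` into the query string is only safe if the value is constrained. A hedged sketch of such a guard (illustrative, not what the frontend ships):

```python
import re

# Accept only plain SQL identifiers: letters, digits, underscores.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def safe_table_name(name: str) -> str:
    """Validate an env-supplied table name before interpolating it
    into a SQL string, since identifiers cannot be bound parameters."""
    if not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name

query = f"SELECT * FROM {safe_table_name('googlemaps_reviews_db')}"
```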
File renamed without changes.
File renamed without changes.
7 changes: 6 additions & 1 deletion applications/rag/frontend/main.tf
@@ -85,7 +85,7 @@ resource "kubernetes_deployment" "rag_frontend_deployment" {
spec {
service_account_name = var.google_service_account
container {
image = "us-central1-docker.pkg.dev/ai-on-gke/rag-on-gke/frontend@sha256:6ba3a1f8298d6164805dd2a039718d0d8b713ccfc2c3ab9bc8669ac5e30f89ed"
image = "us-central1-docker.pkg.dev/ai-on-gke/rag-on-gke/frontend@sha256:e2dd85e92f42e3684455a316dee5f98f61f1f3fba80b9368bd6f48d5e2e3475e"
name = "rag-frontend"

port {
@@ -102,6 +102,11 @@ resource "kubernetes_deployment" "rag_frontend_deployment" {
value = data.kubernetes_service.inference_service.status.0.load_balancer.0.ingress.0.ip
}

env {
name = "TABLE_NAME"
value = var.dataset_embeddings_table_name
}

env {
name = "DB_USER"
value_from {
5 changes: 5 additions & 0 deletions applications/rag/frontend/variables.tf
@@ -40,6 +40,11 @@ variable "db_secret_namespace" {
default = "rag"
}

variable "dataset_embeddings_table_name" {
type = string
description = "Name of the table that stores vector embeddings for input dataset"
}

variable "inference_service_name" {
type = string
description = "Model inference k8s service name"
1 change: 1 addition & 0 deletions applications/rag/main.tf
@@ -157,4 +157,5 @@ module "frontend" {
inference_service_namespace = module.inference-server.inference_service_namespace
db_secret_name = module.cloudsql.db_secret_name
db_secret_namespace = module.cloudsql.db_secret_namespace
dataset_embeddings_table_name = var.dataset_embeddings_table_name
}
5 changes: 5 additions & 0 deletions applications/rag/variables.tf
@@ -78,6 +78,11 @@ variable "gcs_bucket" {
description = "GCS bucket name to store dataset"
}

variable "dataset_embeddings_table_name" {
type = string
description = "Name of the table that stores vector embeddings for input dataset"
}

variable "default_backend_service" {
type = string
default = "proxy-public"
5 changes: 4 additions & 1 deletion applications/rag/workloads.tfvars
@@ -38,7 +38,7 @@ rag_service_account = "rag-system-account"
create_jupyter_service_account = true
jupyter_service_account = "jupyter-system-account"

# IAP config
## Embeddings table name - change this to the TABLE_NAME used in the notebook.
dataset_embeddings_table_name = "googlemaps_reviews_db"

## IAP config
add_auth = false # Set to true when using auth with IAP
brand = "projects/<prj-number>/brands/<prj-number>"
support_email = "<email>"
