diff --git a/applications/rag/README.md b/applications/rag/README.md
index cd8c3d016..d5d61fc8a 100644
--- a/applications/rag/README.md
+++ b/applications/rag/README.md
@@ -17,7 +17,7 @@ RAG uses a semantically searchable knowledge base (like vector search) to retrie
 5. A [Jupyter](https://docs.jupyter.org/en/latest/) notebook running on GKE that reads the dataset using GCS fuse driver integrations and runs a Ray job to populate the vector DB.
 3. A front end chat interface running on GKE that prompts the inference server with context from the vector DB.
 
-This tutorial walks you through installing the RAG infrastructure in a GCP project, generating vector embeddings for a sample [Kaggle Netflix shows](https://www.kaggle.com/datasets/shivamb/netflix-shows) dataset and prompting the LLM with context.
+This tutorial walks you through installing the RAG infrastructure in a GCP project, generating vector embeddings for a sample [Kubernetes Docs](https://github.com/dohsimpson/kubernetes-doc-pdf) dataset and prompting the LLM with context.
 
 # Prerequisites
 
@@ -74,7 +74,7 @@ This section sets up the RAG infrastructure in your GCP project using Terraform.
 
 # Generate vector embeddings for the dataset
 
-This section generates the vector embeddings for your input dataset. Currently, the default dataset is [Netflix shows](https://www.kaggle.com/datasets/shivamb/netflix-shows). We will use a Jupyter notebook to run a Ray job that generates the embeddings & populates them into the `pgvector` instance created above.
+This section generates the vector embeddings for your input dataset. Currently, the default dataset is [Kubernetes docs](https://github.com/dohsimpson/kubernetes-doc-pdf). We will use a Jupyter notebook to generate the embeddings & populate them into the `pgvector` instance created above.
 
 Set the namespace, cluster name and location (from `workloads.tfvars`):
 
@@ -108,30 +108,10 @@ gcloud container clusters get-credentials ${CLUSTER_NAME} --location=${CLUSTER_L
 
 2. Load the notebook:
 - Once logged in to JupyterHub, choose the `CPU` preset with `Default` storage.
- - Click [File] -> [Open From URL] and paste: `https://raw.githubusercontent.com/GoogleCloudPlatform/ai-on-gke/main/applications/rag/example_notebooks/rag-kaggle-ray-sql-interactive.ipynb`
-
-3. Configure Kaggle:
- - Create a [Kaggle account](https://www.kaggle.com/account/login?phase=startRegisterTab&returnUrl=%2F).
- - [Generate an API token](https://www.kaggle.com/settings/account). See [further instructions](https://www.kaggle.com/docs/api#authentication). This token is used in the notebook to access the [Kaggle Netflix shows](https://www.kaggle.com/datasets/shivamb/netflix-shows) dataset.
- - Replace the variables in the 1st cell of the notebook with your Kaggle credentials (can be found in the `kaggle.json` file created while generating the API token):
-   * `KAGGLE_USERNAME`
-   * `KAGGLE_KEY`
-
-4. Generate vector embeddings: Run all the cells in the notebook to generate vector embeddings for the Netflix shows dataset (https://www.kaggle.com/datasets/shivamb/netflix-shows) and store them in the `pgvector` CloudSQL instance via a Ray job.
-   * When the last cell says the job has succeeded (eg: `Job 'raysubmit_APungAw6TyB55qxk' succeeded`), the vector embeddings have been generated and we can launch the frontend chat interface. Note that running the job can take up to 10 minutes.
-   * Ray may take several minutes to create the runtime environment. During this time, the job will appear to be missing (e.g. `Status message: PENDING`).
-   * Connect to the Ray dashboard to check the job status or logs:
-     - If IAP is disabled (`ray_dashboard_add_auth = false`):
-       - `kubectl port-forward -n ${NAMESPACE} service/ray-cluster-kuberay-head-svc 8265:8265`
-       - Go to `localhost:8265` in a browser
-     - If IAP is enabled (`ray_dashboard_add_auth = true`):
-       - Fetch the domain: `terraform output ray-dashboard-managed-cert`
-       - If you used a custom domain, ensure you configured your DNS as described above.
-       - Verify the domain status is `Active`:
-         - `kubectl get managedcertificates ray-dashboard-managed-cert -n ${NAMESPACE} --output jsonpath='{.status.domainStatus[0].status}'`
-         - Note: This can take up to 20 minutes to propagate.
-       - Once the domain status is Active, go to the domain in a browser and login with your Google credentials.
-       - To add additional users to your frontend application, go to [Google Cloud Platform IAP](https://console.cloud.google.com/security/iap), select the `rag/ray-cluster-kuberay-head-svc` service and add principals with the role `IAP-secured Web App User`.
+ - Click [File] -> [Open From URL] and paste: `https://raw.githubusercontent.com/GoogleCloudPlatform/ai-on-gke/main/applications/rag/example_notebooks/rag-ray-ingest-with-kubernetes-docs.ipynb`
+
+3. Generate vector embeddings: Run all the cells in the notebook to generate vector embeddings for the [Kubernetes documentation](https://github.com/dohsimpson/kubernetes-doc-pdf) and store them in the `pgvector` CloudSQL instance via a Ray job (a quick verification snippet is sketched below).
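+
+To sanity-check the ingestion afterwards, you can run a quick row count from a notebook cell. The following is a minimal sketch, not part of the notebook itself: it reuses the Cloud SQL connector setup from the ingestion job, and assumes `CLOUDSQL_INSTANCE_CONNECTION_NAME`, `DB_USER` and `DB_PASS` are available in the notebook environment (the job itself reads the credentials from a mounted secret volume).
+
+```python
+import os
+import sqlalchemy
+from google.cloud.sql.connector import Connector, IPTypes
+
+connector = Connector()
+
+def getconn():
+    # Same connection parameters as the ingestion job uses.
+    return connector.connect(
+        os.environ["CLOUDSQL_INSTANCE_CONNECTION_NAME"],
+        "pg8000",
+        user=os.environ["DB_USER"],
+        password=os.environ["DB_PASS"],
+        db="pgvector-database",
+        ip_type=IPTypes.PRIVATE,
+    )
+
+pool = sqlalchemy.create_engine("postgresql+pg8000://", creator=getconn)
+with pool.connect() as conn:
+    count = conn.execute(sqlalchemy.text("SELECT COUNT(*) FROM rag_embeddings_db")).scalar()
+    print(f"{count} embedding rows stored")
+```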
 
 # Launch the frontend chat interface
 
diff --git a/applications/rag/example_notebooks/rag-ray-ingest-with-kubernetes-docs.ipynb b/applications/rag/example_notebooks/rag-ray-ingest-with-kubernetes-docs.ipynb
new file mode 100644
index 000000000..d50a1a1dd
--- /dev/null
+++ b/applications/rag/example_notebooks/rag-ray-ingest-with-kubernetes-docs.ipynb
@@ -0,0 +1,291 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7e14d0f0-2573-4fe4-ba87-7a447f2f511c",
+   "metadata": {},
+   "source": [
+    "# RAG-on-GKE Application\n",
+    "\n",
+    "This is a Python notebook for generating the vector embeddings based on [Kubernetes docs](https://github.com/dohsimpson/kubernetes-doc-pdf/) used by the RAG on GKE application.\n",
+    "For full information, please check out the GitHub documentation [here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/applications/rag/README.md).\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2cba26cf",
+   "metadata": {},
+   "source": [
+    "## Clone the Kubernetes docs repo"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5f9b1fad-537e-425f-a5fc-587a408b1fab",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!mkdir -p /data/kubernetes-docs\n",
+    "!git clone https://github.com/dohsimpson/kubernetes-doc-pdf /data/kubernetes-docs\n"
+   ]
+  },
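+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0e7a1c2b-3d4e-4f5a-8b6c-7d8e9f0a1b2c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sanity check (an editorial addition; assumes the clone above succeeded):\n",
+    "# confirm the PDFs landed under the path the ingestion job expects.\n",
+    "!ls /data/kubernetes-docs/PDFs | head"
+   ]
+  },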
\n", + "For full information, please checkout the GitHub documentation [here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/applications/rag/README.md).\n" + ] + }, + { + "cell_type": "markdown", + "id": "2cba26cf", + "metadata": {}, + "source": [ + "## Clone the kubernetes docs repo" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f9b1fad-537e-425f-a5fc-587a408b1fab", + "metadata": {}, + "outputs": [], + "source": [ + "!mkdir /data/kubernetes-docs -p\n", + "!git clone https://github.com/dohsimpson/kubernetes-doc-pdf /data/kubernetes-docs\n" + ] + }, + { + "cell_type": "markdown", + "id": "b984429c-b65a-47b7-9723-ee3ad81d61db", + "metadata": {}, + "source": [ + "## Install the required packages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "40e4d29d-79c6-4233-a8ed-0f8a42576656", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "!pip install langchain langchain-community sentence_transformers pypdf" + ] + }, + { + "cell_type": "markdown", + "id": "f80cc5af-a1fa-456d-a4ed-fa2ffa3b87a0", + "metadata": {}, + "source": [ + "## Writting job to be used on the Ray Cluster" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36523f3f-0c93-41da-abb9-c113bb456bc1", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a directory to package the contents that need to be downloaded in ray worker\n", + "! mkdir -p rag-app" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69d912e5-2225-4b44-80cd-651f7cc71a40", + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile rag-app/job.py\n", + "\n", + "import os\n", + "import uuid\n", + "import glob\n", + "\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain.embeddings import HuggingFaceEmbeddings\n", + "from langchain_community.document_loaders import PyPDFLoader\n", + "\n", + "from google.cloud.sql.connector import Connector, IPTypes\n", + "import sqlalchemy\n", + "\n", + "from sqlalchemy.ext.declarative import declarative_base\n", + "from sqlalchemy import Column, String, Text, text, JSON\n", + "from sqlalchemy.orm import scoped_session, sessionmaker, mapped_column\n", + "from pgvector.sqlalchemy import Vector\n", + "\n", + "# initialize parameters\n", + "\n", + "INSTANCE_CONNECTION_NAME = os.environ[\"CLOUDSQL_INSTANCE_CONNECTION_NAME\"]\n", + "print(f\"Your instance connection name is: {INSTANCE_CONNECTION_NAME}\")\n", + "VECTOR_EMBEDDINGS_TABLE_NAME = \"rag_embeddings_db\"\n", + "DB_NAME = \"pgvector-database\"\n", + "\n", + "db_username_file = open(\"/etc/secret-volume/username\", \"r\")\n", + "DB_USER = db_username_file.read()\n", + "db_username_file.close()\n", + "\n", + "db_password_file = open(\"/etc/secret-volume/password\", \"r\")\n", + "DB_PASS = db_password_file.read()\n", + "db_password_file.close()\n", + "\n", + "# initialize Connector object\n", + "connector = Connector()\n", + "\n", + "# function to return the database connection object\n", + "def getconn():\n", + " conn = connector.connect(\n", + " INSTANCE_CONNECTION_NAME,\n", + " \"pg8000\",\n", + " user=DB_USER,\n", + " password=DB_PASS,\n", + " db=DB_NAME,\n", + " ip_type=IPTypes.PRIVATE\n", + " )\n", + " return conn\n", + "\n", + "# create connection pool with 'creator' argument to our connection object function\n", + "pool = sqlalchemy.create_engine(\n", + " \"postgresql+pg8000://\",\n", + " creator=getconn,\n", + ")\n", + "\n", + "Base = declarative_base()\n", + "DBSession = 
+  {
+   "cell_type": "markdown",
+   "id": "6b9bc582-50cd-4d7c-b5c4-549626fd2349",
+   "metadata": {},
+   "source": [
+    "## Submitting the job to the Ray cluster"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d5b6acbe-5a14-4bc8-a4ca-58a6b3dd5391",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ray.job_submission import JobSubmissionClient\n",
+    "\n",
+    "# The job submission API is served over HTTP by the Ray dashboard on the head service.\n",
+    "client = JobSubmissionClient(\"http://ray-cluster-kuberay-head-svc:8265\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4eb8eae9-2a20-4c02-ac79-196942ae2783",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Port forward to the Ray dashboard and go to `localhost:8265` in a browser to see job status: kubectl port-forward -n ${NAMESPACE} service/ray-cluster-kuberay-head-svc 8265:8265\n",
+    "import time\n",
+    "\n",
+    "start_time = time.time()\n",
+    "job_id = client.submit_job(\n",
+    "    entrypoint=\"python job.py\",\n",
+    "    # Path to the local directory that contains the entrypoint file.\n",
+    "    runtime_env={\n",
+    "        \"working_dir\": \"/home/jovyan/rag-app\", # upload the local working directory to the Ray workers\n",
+    "        \"pip\": [\n",
+    "            \"langchain\",\n",
+    "            \"langchain-community\",\n",
+    "            \"sentence-transformers\",\n",
+    "            \"pypdf\",\n",
+    "            \"pgvector\"\n",
+    "        ]\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "# The Ray job typically takes 5m-10m to complete.\n",
+    "print(\"Job submitted with ID:\", job_id)\n",
+    "while True:\n",
+    "    status = client.get_job_status(job_id)\n",
+    "    print(\"Job status:\", status)\n",
+    "    print(\"Job info:\", client.get_job_info(job_id).message)\n",
+    "    if status.is_terminal():\n",
+    "        break\n",
+    "    time.sleep(30)\n",
+    "\n",
+    "end_time = time.time()\n",
+    "job_duration = end_time - start_time\n",
+    "print(f\"Job completed in {job_duration} seconds.\")"
+   ]
+  },
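+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b2c3d4e5-f6a7-4b8c-9d0e-1f2a3b4c5d6e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional (an editorial addition): once the job reaches a terminal state, the\n",
+    "# driver logs can be fetched through the same Job Submission API for debugging.\n",
+    "print(client.get_job_logs(job_id))"
+   ]
+  }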
"execution_count": null, + "id": "4eb8eae9-2a20-4c02-ac79-196942ae2783", + "metadata": {}, + "outputs": [], + "source": [ + "# Port forward to the Ray dashboard and go to `localhost:8265` in a browser to see job status: kubectl port-forward -n service/ray-cluster-kuberay-head-svc 8265:8265\n", + "import time\n", + "\n", + "start_time = time.time()\n", + "job_id = client.submit_job(\n", + " entrypoint=\"python job.py\",\n", + " # Path to the local directory that contains the entrypoint file.\n", + " runtime_env={\n", + " \"working_dir\": \"/home/jovyan/rag-app\", # upload the local working directory to ray workers\n", + " \"pip\": [ \n", + " \"langchain\",\n", + " \"langchain-community\",\n", + " \"sentence-transformers\",\n", + " \"pypdf\",\n", + " \"pgvector\"\n", + " ]\n", + " }\n", + ")\n", + "\n", + "# The Ray job typically takes 5m-10m to complete.\n", + "print(\"Job submitted with ID:\", job_id)\n", + "while True:\n", + " status = client.get_job_status(job_id)\n", + " print(\"Job status:\", status)\n", + " print(\"Job info:\", client.get_job_info(job_id).message)\n", + " if status.is_terminal():\n", + " break\n", + " time.sleep(30)\n", + "\n", + "end_time = time.time()\n", + "job_duration = end_time - start_time\n", + "print(f\"Job completed in {job_duration} seconds.\")\n", + "\n", + "ray.shutdown()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/applications/rag/metadata.yaml b/applications/rag/metadata.yaml index 3159e7af5..effe2ffe8 100644 --- a/applications/rag/metadata.yaml +++ b/applications/rag/metadata.yaml @@ -70,7 +70,7 @@ spec: - name: dataset_embeddings_table_name description: Name of the table that stores vector embeddings for input dataset varType: string - defaultValue: netflix_reviews_db + defaultValue: rag_embeddings_db - name: disable_ray_cluster_network_policy description: Disables Kubernetes Network Policy for Ray Clusters for this demo. Defaulting to 'true' aka disabled pending fixes to the kuberay-monitoring module. This should be defaulted to false. varType: bool diff --git a/applications/rag/tests/test_rag.py b/applications/rag/tests/test_rag.py index d7da3a0e2..fbc0e607e 100644 --- a/applications/rag/tests/test_rag.py +++ b/applications/rag/tests/test_rag.py @@ -3,160 +3,125 @@ import requests def test_prompts(prompt_url): - testcases = [ - { - "prompt": "List the cast of Squid Game", - "expected_context": "This is a TV Show in called Squid Game added at September 17, 2021 whose director is and with cast: Lee Jung-jae, Park Hae-soo, Wi Ha-jun, Oh Young-soo, Jung Ho-yeon, Heo Sung-tae, Kim Joo-ryoung, Tripathi Anupam, You Seong-joo, Lee You-mi released at 2021. Its rating is: TV-MA. Its duration is 1 Season. Its description is Hundreds of cash-strapped players accept a strange invitation to compete in children's games. 
Inside, a tempting prize awaits — with deadly high stakes..", - "expected_substrings": ["Lee Jung-jae", "Park Hae-soo", "Wi Ha-jun", "Oh Young-soo", "Jung Ho-yeon", "Heo Sung-tae", "Kim Joo-ryoung", "Tripathi Anupam", "You Seong-joo", "Lee You-mi"], - }, - { - "prompt": "When was Squid Game released?", - "expected_context": "This is a TV Show in called Squid Game added at September 17, 2021 whose director is and with cast: Lee Jung-jae, Park Hae-soo, Wi Ha-jun, Oh Young-soo, Jung Ho-yeon, Heo Sung-tae, Kim Joo-ryoung, Tripathi Anupam, You Seong-joo, Lee You-mi released at 2021. Its rating is: TV-MA. Its duration is 1 Season. Its description is Hundreds of cash-strapped players accept a strange invitation to compete in children's games. Inside, a tempting prize awaits — with deadly high stakes..", - "expected_substrings": ["September 17, 2021"], - }, - { - "prompt": "What is the rating of Squid Game?", - "expected_context": "This is a TV Show in called Squid Game added at September 17, 2021 whose director is and with cast: Lee Jung-jae, Park Hae-soo, Wi Ha-jun, Oh Young-soo, Jung Ho-yeon, Heo Sung-tae, Kim Joo-ryoung, Tripathi Anupam, You Seong-joo, Lee You-mi released at 2021. Its rating is: TV-MA. Its duration is 1 Season. Its description is Hundreds of cash-strapped players accept a strange invitation to compete in children's games. Inside, a tempting prize awaits — with deadly high stakes..", - "expected_substrings": ["TV-MA"], - }, - { - "prompt": "List the cast of Avatar: The Last Airbender", - "expected_context": "This is a TV Show in United States called Avatar: The Last Airbender added at May 15, 2020 whose director is and with cast: Zach Tyler, Mae Whitman, Jack De Sena, Dee Bradley Baker, Dante Basco, Jessie Flower, Mako Iwamatsu released at 2007. Its rating is: TV-Y7. Its duration is 3 Seasons. Its description is Siblings Katara and Sokka wake young Aang from a long hibernation and learn he's an Avatar, whose air-bending powers can defeat the evil Fire Nation..", - "expected_substrings": ["Zach Tyler", "Mae Whitman", "Jack De Sena", "Dee Bradley Baker", "Dante Basco", "Jessie Flower", "Mako Iwamatsu"], - }, - { - "prompt": "When was Avatar: The Last Airbender added on Netflix?", - "expected_context": "This is a TV Show in United States called Avatar: The Last Airbender added at May 15, 2020 whose director is and with cast: Zach Tyler, Mae Whitman, Jack De Sena, Dee Bradley Baker, Dante Basco, Jessie Flower, Mako Iwamatsu released at 2007. Its rating is: TV-Y7. Its duration is 3 Seasons. Its description is Siblings Katara and Sokka wake young Aang from a long hibernation and learn he's an Avatar, whose air-bending powers can defeat the evil Fire Nation..", - "expected_substrings": ["May 15, 2020"], - }, - { - "prompt": "What is the rating of Avatar: The Last Airbender?", - "expected_context": "This is a TV Show in United States called Avatar: The Last Airbender added at May 15, 2020 whose director is and with cast: Zach Tyler, Mae Whitman, Jack De Sena, Dee Bradley Baker, Dante Basco, Jessie Flower, Mako Iwamatsu released at 2007. Its rating is: TV-Y7. Its duration is 3 Seasons. 
Its description is Siblings Katara and Sokka wake young Aang from a long hibernation and learn he's an Avatar, whose air-bending powers can defeat the evil Fire Nation..",
-            "expected_substrings": ["TV-Y7"],
-        },
-    ]
-
-    for testcase in testcases:
-        prompt = testcase["prompt"]
-        expected_context = testcase["expected_context"]
-        expected_substrings = testcase["expected_substrings"]
-
-        print(f"Testing prompt: {prompt}")
-        data = {"prompt": prompt}
-        json_payload = json.dumps(data)
-
-        headers = {'Content-Type': 'application/json'}
-        response = requests.post(prompt_url, data=json_payload, headers=headers)
-        response.raise_for_status()
-
-        response = response.json()
-        context = response['response']['context']
-        text = response['response']['text']
-        user_prompt = response['response']['user_prompt']
-
-        print(f"Reply: {text}")
-
-        assert user_prompt == prompt, f"unexpected user prompt: {user_prompt} != {prompt}"
-        assert context == expected_context, f"unexpected context: {context} != {expected_context}"
-
-        for substring in expected_substrings:
-            assert substring in text, f"substring {substring} not in response:\n {text}"
+    try:
+        testcases = [
+            {
+                "prompt": "What's kubernetes?",
+            },
+            {
+                "prompt": "How to create a kubernetes cluster?",
+            },
+            {
+                "prompt": "What's kubectl?",
+            }
+        ]
+
+        for testcase in testcases:
+            prompt = testcase["prompt"]
+
+            print(f"Testing prompt: {prompt}")
+            data = {"prompt": prompt}
+            json_payload = json.dumps(data)
+
+            headers = {'Content-Type': 'application/json'}
+            response = requests.post(prompt_url, data=json_payload, headers=headers)
+            response.raise_for_status()
+
+            response = response.json()
+            print(response)
+            text = response['response'].get('text')
+
+            print(f"Reply: {text}")
+
+            assert response is not None, f"No response found: {response}"
+            assert text is not None, f"No text field in response: {response}"
+    except Exception as err:
+        print(err)
+        raise err
 
 def test_prompts_nlp(prompt_url):
-    testcases = [
-        {
-            "prompt": "List the cast of Squid Game",
-            "nlpFilterLevel": "0",
-            "expected_context": "This is a TV Show in called Squid Game added at September 17, 2021 whose director is and with cast: Lee Jung-jae, Park Hae-soo, Wi Ha-jun, Oh Young-soo, Jung Ho-yeon, Heo Sung-tae, Kim Joo-ryoung, Tripathi Anupam, You Seong-joo, Lee You-mi released at 2021. Its rating is: TV-MA. Its duration is 1 Season. Its description is Hundreds of cash-strapped players accept a strange invitation to compete in children's games. Inside, a tempting prize awaits — with deadly high stakes..",
-            "expected_substrings": ["Lee Jung-jae", "Park Hae-soo", "Wi Ha-jun", "Oh Young-soo", "Jung Ho-yeon", "Heo Sung-tae", "Kim Joo-ryoung", "Tripathi Anupam", "You Seong-joo", "Lee You-mi"],
-        },
-        {
-            "prompt": "Which movie has a building that blows up?",
-            "nlpFilterLevel": "0",
-            "expected_context": "This is a Movie in United States called Oklahoma City added at April 19, 2017 whose director is Barak Goodman and with cast: released at 2017. Its rating is: TV-14. Its duration is 102 min. Its description is Domestic terrorist Timothy McVeigh's bomb attack on an Oklahoma City federal building in 1995 is chronicled in this examination of the shocking event..",
-            "expected_substrings": ["The movie \"Oklahoma City\" released in 2017, directed by Barak Goodman, and starring a cast, depicts the bombing of an Oklahoma City federal building in 1995. The building that blows up in the movie is the Alfred P. 
Murrah Federal Building."],
-        },
-        {
-            "prompt": "Which movie has a building that blows up?",
-            "nlpFilterLevel": "50",
-            "expected_context": "This is a Movie in United States called Oklahoma City added at April 19, 2017 whose director is Barak Goodman and with cast: released at 2017. Its rating is: TV-14. Its duration is 102 min. Its description is Domestic terrorist Timothy McVeigh's bomb attack on an Oklahoma City federal building in 1995 is chronicled in this examination of the shocking event..",
-            "expected_substrings": ["The response is deemed inappropriate for display."],
-        },
-        {
-            "prompt": "List the cast of Squid Game",
-            "nlpFilterLevel": "100",
-            "expected_context": "This is a TV Show in called Squid Game added at September 17, 2021 whose director is and with cast: Lee Jung-jae, Park Hae-soo, Wi Ha-jun, Oh Young-soo, Jung Ho-yeon, Heo Sung-tae, Kim Joo-ryoung, Tripathi Anupam, You Seong-joo, Lee You-mi released at 2021. Its rating is: TV-MA. Its duration is 1 Season. Its description is Hundreds of cash-strapped players accept a strange invitation to compete in children's games. Inside, a tempting prize awaits — with deadly high stakes..",
-            "expected_substrings": ["The response is deemed inappropriate for display."],
-        }
-    ]
-
-    for testcase in testcases:
-        prompt = testcase["prompt"]
-        nlpFilterLevel = testcase["nlpFilterLevel"]
-        expected_context = testcase["expected_context"]
-        expected_substrings = testcase["expected_substrings"]
-
-        print(f"Testing prompt: {prompt}")
-        data = {"prompt": prompt, "nlpFilterLevel": nlpFilterLevel}
-        json_payload = json.dumps(data)
-
-        headers = {'Content-Type': 'application/json'}
-        response = requests.post(prompt_url, data=json_payload, headers=headers)
-        response.raise_for_status()
-
-        response = response.json()
-        context = response['response']['context']
-        text = response['response']['text']
-        user_prompt = response['response']['user_prompt']
-
-        print(f"Reply: {text}")
-
-        assert user_prompt == prompt, f"unexpected user prompt: {user_prompt} != {prompt}"
-        assert context == expected_context, f"unexpected context: {context} != {expected_context}"
-
-        for substring in expected_substrings:
-            assert substring in text, f"substring {substring} not in response:\n {text}"
+    try:
+        testcases = [
+            {
+                "prompt": "What's kubernetes?",
+                "nlpFilterLevel": "0",
+            },
+            {
+                "prompt": "What's kubernetes?",
+                "nlpFilterLevel": "100",
+            },
+            {
+                "prompt": "How to create a kubernetes cluster?",
+                "nlpFilterLevel": "0",
+            },
+            {
+                "prompt": "What's kubectl?",
+                "nlpFilterLevel": "50",
+            }
+        ]
+
+        for testcase in testcases:
+            prompt = testcase["prompt"]
+            nlpFilterLevel = testcase["nlpFilterLevel"]
+
+            print(f"Testing prompt: {prompt}")
+            data = {"prompt": prompt, "nlpFilterLevel": nlpFilterLevel}
+            json_payload = json.dumps(data)
+
+            headers = {'Content-Type': 'application/json'}
+            response = requests.post(prompt_url, data=json_payload, headers=headers)
+            response.raise_for_status()
+
+            response = response.json()
+            text = response['response']['text']
+
+            print(f"Reply: {text}")
+
+            assert response is not None, f"No response found: {response}"
+            assert text is not None, f"No text field in response: {response}"
+    except Exception as err:
+        print(err)
+        raise err
 
 def test_prompts_dlp(prompt_url):
-    testcases = [
-        {
-            "prompt": "who worked with Robert De Niro and name one film they collaborated?",
-            "inspectTemplate": "projects/gke-ai-eco-dev/locations/global/inspectTemplates/DO-NOT-DELETE-e2e-test-inspect-template",
-            "deidentifyTemplate": 
"projects/gke-ai-eco-dev/locations/global/deidentifyTemplates/DO-NOT-DELETE-e2e-test-de-identify-template", - "expected_context": "This is a Movie in United States called GoodFellas added at January 1, 2021 whose director is Martin Scorsese and with cast: Robert De Niro, Ray Liotta, Joe Pesci, Lorraine Bracco, Paul Sorvino, Frank Sivero, Tony Darrow, Mike Starr, Frank Vincent, Chuck Low released at 1990. Its rating is: R. Its duration is 145 min. Its description is Former mobster Henry Hill recounts his colorful yet violent rise and fall in a New York crime family – a high-rolling dream turned paranoid nightmare..", - "expected_substrings": ["[PERSON_NAME] has worked with many talented actors and directors throughout his career. One film he collaborated with [PERSON_NAME] is \"GoodFellas,\" which was released in 1990. In this movie, [PERSON_NAME] played the role of [PERSON_NAME], a former mobster who recounts his rise and fall in a New York crime family."], - }, - ] - - for testcase in testcases: - prompt = testcase["prompt"] - inspectTemplate = testcase["inspectTemplate"] - deidentifyTemplate = testcase["deidentifyTemplate"] - expected_context = testcase["expected_context"] - expected_substrings = testcase["expected_substrings"] - - print(f"Testing prompt: {prompt}") - data = {"prompt": prompt, "inspectTemplate": inspectTemplate, "deidentifyTemplate": deidentifyTemplate} - json_payload = json.dumps(data) - - headers = {'Content-Type': 'application/json'} - response = requests.post(prompt_url, data=json_payload, headers=headers) - response.raise_for_status() - - response = response.json() - context = response['response']['context'] - text = response['response']['text'] - user_prompt = response['response']['user_prompt'] - - print(f"Reply: {text}") - - assert user_prompt == prompt, f"unexpected user prompt: {user_prompt} != {prompt}" - assert context == expected_context, f"unexpected context: {context} != {expected_context}" - - for substring in expected_substrings: - assert substring in text, f"substring {substring} not in response:\n {text}" - -prompt_url = sys.argv[1] -test_prompts(prompt_url) -test_prompts_nlp(prompt_url) -test_prompts_dlp(prompt_url) + try: + testcases = [ + { + "prompt": "What's kubernetes?", + "inspectTemplate": "projects/gke-ai-eco-dev/locations/global/inspectTemplates/DO-NOT-DELETE-e2e-test-inspect-template", + "deidentifyTemplate": "projects/gke-ai-eco-dev/locations/global/deidentifyTemplates/DO-NOT-DELETE-e2e-test-de-identify-template", + }, + ] + + for testcase in testcases: + prompt = testcase["prompt"] + inspectTemplate = testcase["inspectTemplate"] + deidentifyTemplate = testcase["deidentifyTemplate"] + + print(f"Testing prompt: {prompt}") + data = {"prompt": prompt, "inspectTemplate": inspectTemplate, "deidentifyTemplate": deidentifyTemplate} + json_payload = json.dumps(data) + + headers = {'Content-Type': 'application/json'} + response = requests.post(prompt_url, data=json_payload, headers=headers) + response.raise_for_status() + + response = response.json() + text = response['response']['text'] + + + print(f"Reply: {text}") + + assert response != None, f"Not response found: {response}" + assert text != None, f"Not text" + except Exception as err: + print(err) + raise err + +if __name__ == "__main__": + prompt_url = sys.argv[1] + test_prompts(prompt_url) + test_prompts_nlp(prompt_url) + test_prompts_dlp(prompt_url) diff --git a/applications/rag/variables.tf b/applications/rag/variables.tf index b8cef7bd5..abe74439e 100644 --- a/applications/rag/variables.tf 
diff --git a/applications/rag/variables.tf b/applications/rag/variables.tf
index b8cef7bd5..abe74439e 100644
--- a/applications/rag/variables.tf
+++ b/applications/rag/variables.tf
@@ -99,7 +99,7 @@ variable "gcs_bucket" {
 variable "dataset_embeddings_table_name" {
   type        = string
   description = "Name of the table that stores vector embeddings for input dataset"
-  default     = "netflix_reviews_db"
+  default     = "rag_embeddings_db"
 }
 
 variable "create_brand" {
diff --git a/applications/rag/workloads.tfvars b/applications/rag/workloads.tfvars
index 8c4bbf5d5..07d6623b7 100644
--- a/applications/rag/workloads.tfvars
+++ b/applications/rag/workloads.tfvars
@@ -47,7 +47,7 @@ rag_service_account = "rag-sa"
 jupyter_service_account = "jupyter-rag-sa"
 
 ## Embeddings table name - change this to the TABLE_NAME used in the notebook.
-dataset_embeddings_table_name = "netflix_reviews_db"
+dataset_embeddings_table_name = "rag_embeddings_db"
 
 ##############################################################################################################
 # If you don't want to enable IAP authenticated access for your endpoints, ignore everything below this line. #
diff --git a/cloudbuild.yaml b/cloudbuild.yaml
index 9903b1e4f..5da2813f1 100644
--- a/cloudbuild.yaml
+++ b/cloudbuild.yaml
@@ -196,7 +196,6 @@ steps:
   - id: 'test rag'
     name: 'gcr.io/$PROJECT_ID/terraform'
    entrypoint: 'sh'
-    secretEnv: ['KAGGLE_USERNAME', 'KAGGLE_KEY']
     args:
     - '-c'
     - |
@@ -262,15 +261,13 @@ steps:
       echo "pass" > /workspace/rag_frontend_result.txt
 
       cd /workspace/
-      sed -i "s//$$KAGGLE_USERNAME/g" ./applications/rag/example_notebooks/rag-kaggle-ray-sql-interactive.ipynb
-      sed -i "s//$$KAGGLE_KEY/g" ./applications/rag/example_notebooks/rag-kaggle-ray-sql-interactive.ipynb
-      gsutil cp ./applications/rag/example_notebooks/rag-kaggle-ray-sql-interactive.ipynb gs://gke-aieco-rag-$SHORT_SHA-$_BUILD_ID/
+      gsutil cp ./applications/rag/example_notebooks/rag-ray-ingest-with-kubernetes-docs.ipynb gs://gke-aieco-rag-$SHORT_SHA-$_BUILD_ID/
       kubectl exec -it -n rag-$SHORT_SHA-$_BUILD_ID $(kubectl get pod -l app=jupyterhub,component=hub -n rag-$SHORT_SHA-$_BUILD_ID -o jsonpath="{.items[0].metadata.name}") -- jupyterhub token admin --log-level=CRITICAL | xargs python3 ./applications/rag/notebook_starter.py
       # Wait for jupyterhub to trigger notebook pod startup
       sleep 5s
       kubectl wait --for=condition=Ready pod/jupyter-admin -n rag-$SHORT_SHA-$_BUILD_ID --timeout=500s
-      kubectl exec -it -n rag-$SHORT_SHA-$_BUILD_ID jupyter-admin -c notebook -- jupyter nbconvert --to script /data/rag-kaggle-ray-sql-interactive.ipynb
-      kubectl exec -it -n rag-$SHORT_SHA-$_BUILD_ID jupyter-admin -c notebook -- ipython /data/rag-kaggle-ray-sql-interactive.py
+      kubectl exec -it -n rag-$SHORT_SHA-$_BUILD_ID jupyter-admin -c notebook -- jupyter nbconvert --to python /data/rag-ray-ingest-with-kubernetes-docs.ipynb
+      kubectl exec -it -n rag-$SHORT_SHA-$_BUILD_ID jupyter-admin -c notebook -- ipython /data/rag-ray-ingest-with-kubernetes-docs.py
       python3 ./applications/rag/tests/test_rag.py "http://127.0.0.1:8081/prompt"
       echo "pass" > /workspace/rag_prompt_result.txt
 
@@ -391,10 +388,4 @@ logsBucket: gs://ai-on-gke-build-logs
 options:
   substitutionOption: "ALLOW_LOOSE"
   machineType: "E2_HIGHCPU_8"
-timeout: 5400s
-availableSecrets:
-  secretManager:
-  - versionName: projects/gke-ai-eco-dev/secrets/cloudbuild-kaggle-username/versions/latest
-    env: "KAGGLE_USERNAME"
-  - versionName: projects/gke-ai-eco-dev/secrets/cloudbuild-kaggle-key/versions/latest
-    env: "KAGGLE_KEY"
+timeout: 5400s
\ No newline at end of file