Langgraph integration #52

Draft
wants to merge 5 commits into
base: main
1,054 changes: 1,054 additions & 0 deletions langgraph_agent_app_sample_code/01_data_pipeline.ipynb

Large diffs are not rendered by default.

303 changes: 303 additions & 0 deletions langgraph_agent_app_sample_code/02_agent_setup.ipynb
@@ -0,0 +1,303 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "d0640741-6d84-482a-aa79-f87b04d04023",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"## 👉 START HERE: How to use this notebook\n",
"\n",
"### Step 1: Agent storage configuration\n",
"\n",
    "This notebook initializes an `AgentStorageConfig` Pydantic class to define the locations where the Agent's code/config and its supporting data & metadata are stored in Unity Catalog:\n",
"- **Unity Catalog Model:** Stores staging/production versions of the Agent's code/config\n",
"- **MLflow Experiment:** Stores every development version of the Agent's code/config, each version's associated quality/cost/latency evaluation results, and any MLflow Traces from your development & evaluation processes\n",
"- **Evaluation Set Delta Table:** Stores the Agent's evaluation set\n",
"\n",
"This notebook does the following:\n",
"1. Validates the provided locations exist.\n",
    "2. Serializes this configuration to `configs/agent_storage_config.yaml` so other notebooks can use it."
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7702011a-84dd-4281-bba1-ea9e2b5e551d",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"**Important note:** Throughout this notebook, we indicate which cells you:\n",
"- ✅✏️ *should* customize - these cells contain config settings to change\n",
"- 🚫✏️ *typically will not* customize - these cells contain boilerplate code required to validate / save the configuration\n",
"\n",
"*Cells that don't require customization still need to be run!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "f8963d6e-3123-4095-bb92-9d508c52ed41",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Install Python libraries"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "0a145c3b-d3d9-4b95-b7f6-22e1d8e991c6",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"%pip install -qqqq -U -r requirements.txt\n",
"%restart_python"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "18c01bb2-81d1-41aa-a694-cc939c20652f",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Connect to Databricks\n",
"\n",
    "If running locally in an IDE using Databricks Connect, this connects the Spark client and configures MLflow to use Databricks Managed MLflow. If this is running in a Databricks notebook, these values are already set."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "992cadf0-292e-4ecb-a815-903fe378612d",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from mlflow.utils import databricks_utils as du\n",
"\n",
"if not du.is_in_databricks_notebook():\n",
" from databricks.connect import DatabricksSession\n",
" import os\n",
"\n",
" spark = DatabricksSession.builder.getOrCreate()\n",
" os.environ[\"MLFLOW_TRACKING_URI\"] = \"databricks\""
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "a9feb28c-c72b-49b2-bbc4-a9bd4721a7cd",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Get current user info to set default values"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "7824cc0a-1b29-4cf9-a974-2c5ef885979f",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.databricks_utils import get_current_user_info\n",
"\n",
"user_email, user_name, default_catalog = get_current_user_info(spark)\n",
"\n",
"print(f\"User email: {user_email}\")\n",
"print(f\"User name: {user_name}\")\n",
"print(f\"Default UC catalog: {default_catalog}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "4b684188-d4eb-4944-86ae-9942a68308c2",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### ✅✏️ Configure your Agent's storage locations\n",
"\n",
    "Either review & accept the default values or enter your preferred locations."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "64682c1f-7e61-430e-84c9-4fb9cad8152b",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.config.shared.agent_storage_location import AgentStorageConfig\n",
"from cookbook.databricks_utils import get_mlflow_experiment_url\n",
"import mlflow\n",
"import os\n",
"\n",
    "# Default values below for `AgentStorageConfig`\n",
    "agent_name = \"my_agent_2\"\n",
    "uc_catalog_name = \"shared\"\n",
    "uc_schema_name = \"cookbook_langgraph_udhay\"\n",
"\n",
"# Agent storage configuration\n",
"agent_storage_config = AgentStorageConfig(\n",
" uc_model_name=f\"{uc_catalog_name}.{uc_schema_name}.{agent_name}\", # UC model to store staging/production versions of the Agent's code/config\n",
" evaluation_set_uc_table=f\"{uc_catalog_name}.{uc_schema_name}.{agent_name}_eval_set\", # UC table to store the evaluation set\n",
" mlflow_experiment_name=f\"/Users/{user_email}/{agent_name}_mlflow_experiment\", # MLflow Experiment to store development versions of the Agent and their associated quality/cost/latency evaluation results + MLflow Traces\n",
")\n",
"\n",
    "# Validate the UC catalog and schema for the Agent's model & evaluation table\n",
"is_valid, msg = agent_storage_config.validate_catalog_and_schema()\n",
"if not is_valid:\n",
" raise Exception(msg)\n",
"\n",
"# Set the MLflow experiment, validating the path is valid\n",
"experiment_info = mlflow.set_experiment(agent_storage_config.mlflow_experiment_name)\n",
"# If running in a local IDE, set the MLflow experiment name as an environment variable\n",
"os.environ[\"MLFLOW_EXPERIMENT_NAME\"] = agent_storage_config.mlflow_experiment_name\n",
"\n",
"print(f\"View the MLflow Experiment `{agent_storage_config.mlflow_experiment_name}` at {get_mlflow_experiment_url(experiment_info.experiment_id)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7a49117d-f136-41fa-807d-8be60b863fa9",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Save the configuration for use by other notebooks"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "6dd99015-5b0d-420b-8a3e-067d84b84dc7",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.config import serializable_config_to_yaml_file\n",
"\n",
"serializable_config_to_yaml_file(agent_storage_config, \"./configs/agent_storage_config.yaml\")"
]
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"computePreferences": null,
"dashboards": [],
"environmentMetadata": null,
"language": "python",
"notebookMetadata": {
"pythonIndentUnit": 4
},
"notebookName": "02_agent_setup",
"widgets": {}
},
"kernelspec": {
"display_name": "genai-cookbook-T2SdtsNM-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
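
For readers following the diff outside a Databricks workspace, the naming logic in the "Configure your Agent's storage locations" cell can be sketched in standalone Python. This is a hypothetical, simplified stand-in for the cookbook's `AgentStorageConfig` (which additionally validates the catalog/schema against Unity Catalog); only the composition of the three-level UC names and the MLflow experiment path is mirrored here:

```python
# Hypothetical sketch of the notebook's storage-location naming; the real
# AgentStorageConfig is a Pydantic model in the cookbook package and also
# validates that the UC catalog and schema exist.
from dataclasses import dataclass


@dataclass
class AgentStorageLocations:
    uc_model_name: str             # UC model holding staging/production Agent versions
    evaluation_set_uc_table: str   # Delta table holding the evaluation set
    mlflow_experiment_name: str    # MLflow Experiment for development versions & traces


def build_locations(uc_catalog: str, uc_schema: str,
                    agent_name: str, user_email: str) -> AgentStorageLocations:
    """Compose three-level UC names (catalog.schema.object) plus an experiment path."""
    prefix = f"{uc_catalog}.{uc_schema}"
    return AgentStorageLocations(
        uc_model_name=f"{prefix}.{agent_name}",
        evaluation_set_uc_table=f"{prefix}.{agent_name}_eval_set",
        mlflow_experiment_name=f"/Users/{user_email}/{agent_name}_mlflow_experiment",
    )


locs = build_locations("shared", "cookbook_langgraph_udhay",
                       "my_agent_2", "user@example.com")
print(locs.uc_model_name)  # shared.cookbook_langgraph_udhay.my_agent_2
```

With the notebook's defaults, the eval-set table resolves to `shared.cookbook_langgraph_udhay.my_agent_2_eval_set`, matching the `{agent_name}_eval_set` convention used in the cell above.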