Commit

Refined cookbook (#45)
* Agent config

* Global config

* Global config

* Data pipeline v1

* Eric's updates to refine cookbook

* Move function calling agent config into its own folder

* RAG Only agent

* config folder

* remove Eric's hardcoded VS endpoint

* respect token config in data pipeline

* Configs to a single cell

* Fix print of config

* Speed up parsing by caching the parsed table

* Fix error handling in create VS index

* Remove debug code

* Clean up uses new cookbook config

* Refactor the agent notebooks

* Remove extra init

* fix data pipeline mlflow tag

* Default to llama

* Improve debug of failed records

* Add debug code

* Simplifications

Signed-off-by: Sid Murching <[email protected]>

* WIP updates, made it partway through agent notebooks

Signed-off-by: Sid Murching <[email protected]>

* WIP debugging the function_calling_agent_mlflow_sdk.py

Signed-off-by: Sid Murching <[email protected]>

* WIP. Current state:
1) Dogfood seems to have an issue where tool calling is returning a nonexistent function name. See trace in https://e2-dogfood.staging.cloud.databricks.com/editor/notebooks/2948323364468680?o=6051921418418893#command/397386389372146
2) For the RAG-only agent, getting `AttributeError: 'LLMConfig' object has no attribute 'tools'` because the RAG agent config object doesn't have a tools field. See https://e2-dogfood.staging.cloud.databricks.com/editor/notebooks/2948323364468746?o=6051921418418893#command/397386389372955

Signed-off-by: Sid Murching <[email protected]>

* WIP

Signed-off-by: Sid Murching <[email protected]>

* WIP updating pydantic config structure

Signed-off-by: Sid Murching <[email protected]>

* More progress. Remaining issues:
* utils modules are not importable when agent notebooks are run directly
* Need to fix logging util
* Autologged traces and manual traces in langchain don't interleave properly

Signed-off-by: Sid Murching <[email protected]>

* WIP

Signed-off-by: Sid Murching <[email protected]>

* WIP, more progress, remaining items:
* Restore ModelConfig across examples
* Switch back to OAI SDK
* Document tools class

Signed-off-by: Sid Murching <[email protected]>

* WIP, moving away from ModelConfig due to several devex issues

Signed-off-by: Sid Murching <[email protected]>

* WIP, got most of the code to work

Signed-off-by: Sid Murching <[email protected]>

* Clean up traces for langchain

Signed-off-by: Sid Murching <[email protected]>

* Update documentation

Signed-off-by: Sid Murching <[email protected]>

* Switch back to printing doc string, since it's cleaner than help()

Signed-off-by: Sid Murching <[email protected]>

* Data pipeline - refactor all utils for ease of use & ability to run in local IDE.

* Update git ignore

* locally tested tool code

* working func call agent

* remove mlflow agent

* databricks utils

* data pipeline bugs + switch to content from doc_content

* working data pipeline (except for install of utils)

* refactor agent configs to make room for genie

* removed unused code

* Genie agent

* commit for serializable model with all the old code in comments (just in case)

* remove old code from serializable model

* shared config loader

* ignore tmp configs

* tmp config readme

* initial multi agent

* multi agent works w/ function calling; not tested with genie

* data pipeline configs support serializable model

* all agents work locally

* clean up

* add UC tool

* part 1 refactor

* tools refactor pt 2

* make function call agent work with new dirs

* agent logging works

* add tmp files to see full diff

* fc agent actually works; remove dead code agent that wasn't refactored

* rename fc agent

* rename multi agent

* rename common to shared in agents

* move config base to __init__

* move get_current_user_info to db utils

* missing imports

* multi agent works locally

* fix bug where, if the fc agent is called by the supervisor directly after another agent, it switched the last message from role=assistant to role=user

* local funcs work

* multi agent refactor for better trace & ease of understanding

* mlflow traces show supervisor COT

* improve mlflow traces

* print response vs messages info

* bug fixes in agent supervisor

* local func tracing + dict for uc tool result

* uc tool parsing code for spark exceptions

* uc tool is pytest-able

* sample sku tool works

* refactor stragglers

* fc test code

* remove rag only agent

* simplify errors

* sku translator sample tests

* tools nb

* temp nbs

* pytest

* tests for sample tools

* sample code exec tool

* data pipeline config dumps the actual uc locations

* fix set_model

* remove dependency on index's source table in vector search tool

* remove commented out code

* fix set_model and debug

* load_config refactor

* Tools creation notebook

* Genie agent uses new config loader

* multi agent supervisor works locally

* Tool calling agent

* multi agent

* clear data pipeline outputs

* read me updates

* Multi-agent works with endpoint

* tool calling agent nb

* load config doc string

* update notebooks

* update gitignore

* clean up readme

* remove dead code

* clean up readme

* move new openai sdk to separate folder

* restore existing agent code

* mlflow tracing disabled hacks

* sample tools

* notebook tweaks

* load_config tweaks

* poetry env

* Tools deployment works

* Deployment logic same on all agents

---------

Signed-off-by: Sid Murching <[email protected]>
Co-authored-by: epec254 <epec254>
Co-authored-by: Sid Murching <[email protected]>
epec254 and smurching authored Nov 19, 2024
1 parent 09c476b commit c3f12c7
Showing 68 changed files with 14,795 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -7,3 +7,11 @@ __pycache__

# Exclude `databricks sync` CLI command snapshots
.databricks
openai_sdk_agent_app_sample_code/configs/*/*.yaml
openai_sdk_agent_app_sample_code/configs/*.yaml
dist/
mlruns/

_scratch_pad/
openai_sdk_agent_app_sample_code/_scratch_pad/
.vscode/
947 changes: 947 additions & 0 deletions openai_sdk_agent_app_sample_code/01_data_pipeline.ipynb

Large diffs are not rendered by default.

282 changes: 282 additions & 0 deletions openai_sdk_agent_app_sample_code/02_agent_setup.ipynb
@@ -0,0 +1,282 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "d0640741-6d84-482a-aa79-f87b04d04023",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"## 👉 START HERE: How to use this notebook\n",
"\n",
"### Step 1: Agent storage configuration\n",
"\n",
"This notebook initializes a `AgentStorageConfig` Pydantic class to define the locations where the Agent's code/config and its supporting data & metadata is stored in the Unity Catalog:\n",
"- **Unity Catalog Model:** Stores staging/production versions of the Agent's code/config\n",
"- **MLflow Experiment:** Stores every development version of the Agent's code/config, each version's associated quality/cost/latency evaluation results, and any MLflow Traces from your development & evaluation processes\n",
"- **Evaluation Set Delta Table:** Stores the Agent's evaluation set\n",
"\n",
"This notebook does the following:\n",
"1. Validates the provided locations exist.\n",
"2. Serializes this configuration to `config/agent_storage_config.yaml` so other notebooks can use it"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7702011a-84dd-4281-bba1-ea9e2b5e551d",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"**Important note:** Throughout this notebook, we indicate which cells you:\n",
"- ✅✏️ *should* customize - these cells contain config settings to change\n",
"- 🚫✏️ *typically will not* customize - these cells contain boilerplate code required to validate / save the configuration\n",
"\n",
"*Cells that don't require customization still need to be run!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "f8963d6e-3123-4095-bb92-9d508c52ed41",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Install Python libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "0a145c3b-d3d9-4b95-b7f6-22e1d8e991c6",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"# %pip install -qqqq -U -r requirements.txt\n",
"# %restart_python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🚫✏️ Connect to Databricks\n",
"\n",
"If running locally in an IDE using Databricks Connect, connect the Spark client & configure MLflow to use Databricks Managed MLflow. If this running in a Databricks Notebook, these values are already set."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from mlflow.utils import databricks_utils as du\n",
"\n",
"if not du.is_in_databricks_notebook():\n",
" from databricks.connect import DatabricksSession\n",
" import os\n",
"\n",
" spark = DatabricksSession.builder.getOrCreate()\n",
" os.environ[\"MLFLOW_TRACKING_URI\"] = \"databricks\""
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "a9feb28c-c72b-49b2-bbc4-a9bd4721a7cd",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Get current user info to set default values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7824cc0a-1b29-4cf9-a974-2c5ef885979f",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.databricks_utils import get_current_user_info\n",
"\n",
"user_email, user_name, default_catalog = get_current_user_info(spark)\n",
"\n",
"print(f\"User email: {user_email}\")\n",
"print(f\"User name: {user_name}\")\n",
"print(f\"Default UC catalog: {default_catalog}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "4b684188-d4eb-4944-86ae-9942a68308c2",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### ✅✏️ Configure your Agent's storage locations\n",
"\n",
"Either review & accept the default values or enter your preferred location."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "64682c1f-7e61-430e-84c9-4fb9cad8152b",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.config.shared.agent_storage_location import AgentStorageConfig\n",
"from cookbook.databricks_utils import get_mlflow_experiment_url\n",
"import mlflow\n",
"\n",
"# Default values below for `AgentStorageConfig` \n",
"agent_name = \"my_agent_2\"\n",
"uc_catalog_name = f\"{default_catalog}\"\n",
"uc_schema_name = f\"{user_name}_agents\"\n",
"uc_catalog_name = f\"ep\"\n",
"uc_schema_name = f\"cookbook_local_test\"\n",
"\n",
"# Agent storage configuration\n",
"agent_storage_config = AgentStorageConfig(\n",
" uc_model_name=f\"{uc_catalog_name}.{uc_schema_name}.{agent_name}\", # UC model to store staging/production versions of the Agent's code/config\n",
" evaluation_set_uc_table=f\"{uc_catalog_name}.{uc_schema_name}.{agent_name}_eval_set\", # UC table to store the evaluation set\n",
" mlflow_experiment_name=f\"/Users/{user_email}/{agent_name}_mlflow_experiment\", # MLflow Experiment to store development versions of the Agent and their associated quality/cost/latency evaluation results + MLflow Traces\n",
")\n",
"\n",
"# Validate the UC catalog and schema for the Agent'smodel & evaluation table\n",
"is_valid, msg = agent_storage_config.validate_catalog_and_schema()\n",
"if not is_valid:\n",
" raise Exception(msg)\n",
"\n",
"# Set the MLflow experiment, validating the path is valid\n",
"experiment_info = mlflow.set_experiment(agent_storage_config.mlflow_experiment_name)\n",
"# If running in a local IDE, set the MLflow experiment name as an environment variable\n",
"os.environ[\"MLFLOW_EXPERIMENT_NAME\"] = agent_storage_config.mlflow_experiment_name\n",
"\n",
"print(f\"View the MLflow Experiment `{agent_storage_config.mlflow_experiment_name}` at {get_mlflow_experiment_url(experiment_info.experiment_id)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7a49117d-f136-41fa-807d-8be60b863fa9",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Save the configuration for use by other notebooks"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "6dd99015-5b0d-420b-8a3e-067d84b84dc7",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.config import serializable_config_to_yaml_file\n",
"\n",
"serializable_config_to_yaml_file(agent_storage_config, \"./configs/agent_storage_config.yaml\")"
]
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"dashboards": [],
"environmentMetadata": null,
"language": "python",
"notebookMetadata": {
"pythonIndentUnit": 2
},
"notebookName": "00_shared_config",
"widgets": {}
},
"kernelspec": {
"display_name": "genai-cookbook-T2SdtsNM-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}