Commit

Refined cookbook (#45)
* Agent config

* Global config

* Global config

* Data pipeline v1

* Eric's updates to refine cookbook

* Move function calling agent config into its own folder

* RAG Only agent

* config folder

* remove Eric's hardcoded VS endpoint

* respect token config in data pipeline

* Configs to a single cell

* Fix print of config

* Speed up parsing by caching the parsed table

* Fix error handling in create VS index

* Remove debug code

* Clean up uses new cookbook config

* Refactor the agent notebooks

* Remove extra init

* fix data pipeline mlflow tag

* Default to llama

* Improve debug of failed records

* Add debug code

* Simplifications

Signed-off-by: Sid Murching <[email protected]>

* WIP updates, made it partway through agent notebooks

Signed-off-by: Sid Murching <[email protected]>

* WIP debugging the function_calling_agent_mlflow_sdk.py

Signed-off-by: Sid Murching <[email protected]>

* WIP. Current state:
1) Dogfood seems to have an issue where tool calling is returning a nonexistent function name. See trace in https://e2-dogfood.staging.cloud.databricks.com/editor/notebooks/2948323364468680?o=6051921418418893#command/397386389372146
2) For the RAG-only agent, getting `AttributeError: 'LLMConfig' object has no attribute 'tools'` because the RAG agent config object doesn't have a tools field. See https://e2-dogfood.staging.cloud.databricks.com/editor/notebooks/2948323364468746?o=6051921418418893#command/397386389372955

Signed-off-by: Sid Murching <[email protected]>

* WIP

Signed-off-by: Sid Murching <[email protected]>

* WIP updating pydantic config structure

Signed-off-by: Sid Murching <[email protected]>

* More progress. Remaining issues:
* utils modules are not importable when agent notebooks are run directly
* Need to fix logging util
* Autologged traces and manual traces in langchain don't interleave properly

Signed-off-by: Sid Murching <[email protected]>

* WIP

Signed-off-by: Sid Murching <[email protected]>

* WIP, more progress, remaining items:
* Restore ModelConfig across examples
* Switch back to OAI SDK
* Document tools class

Signed-off-by: Sid Murching <[email protected]>

* WIP, moving away from ModelConfig due to several devex issues

Signed-off-by: Sid Murching <[email protected]>

* WIP, got most of the code to work

Signed-off-by: Sid Murching <[email protected]>

* Clean up traces for langchain

Signed-off-by: Sid Murching <[email protected]>

* Update documentation

Signed-off-by: Sid Murching <[email protected]>

* Switch back to printing doc string, since it's cleaner than help()

Signed-off-by: Sid Murching <[email protected]>

* Data pipeline - refactor all utils for ease of use & ability to run in local IDE.

* Update git ignore

* locally tested tool code

* working func call agent

* remove mlflow agent

* databricks utils

* data pipeline bugs + switch to content from doc_content

* working data pipeline (except for install of utils)

* refactor agent configs to make room for genie

* removed unused code

* Genie agent

* commit for serializable model with all the old code in comments (just in case)

* remove old code from serializable model

* shared config loader

* ignore tmp configs

* tmp config readme

* initial multi agent

* multi agent works w/ function calling; not tested with genie

* data pipeline configs support serializable model

* all agents work locally

* clean up

* add UC tool

* part 1 refactor

* tools refactor pt 2

* make function call agent work with new dirs

* agent logging works

* add tmp files to see full diff

* fc agent actually works; remove dead code agent that wasn't refactored

* rename fc agent

* rename multi agent

* rename common to shared in agents

* move config base to __init__

* move get_current_user_info to db utils

* missing imports

* multi agent works locally

* fix bug where, if the fc agent is called by the supervisor directly after another agent, it switched the last message from role=assistant to role=user

* local funcs work

* multi agent refactor for better trace & ease of understanding

* mlflow traces show supervisor COT

* improve mlflow traces

* print response vs messages info

* bug fixes in agent supervisor

* local func tracing + dict for uc tool result

* uc tool parsing code for spark exceptions

* uc tool is pytest-able

* sample sku tool works

* refactor stragglers

* fc test code

* remove rag only agent

* simplify errors

* sku translator sample tests

* tools nb

* temp nbs

* pytest

* tests for sample tools

* sample code exec tool

* data pipeline config dumps the actual uc locations

* fix set_model

* remove dependency on index's source table in vector search tool

* remove commented out code

* fix set_model and debug

* load_config refactor

* Tools creation notebook

* Genie agent uses new config loader

* multi agent supervisor works locally

* Tool calling agent

* multi agent

* clear data pipeline outputs

* read me updates

* Multi-agent works with endpoint

* tool calling agent nb

* load config doc string

* update notebooks

* update gitignore

* clean up readme

* remove dead code

* clean up readme

* move new openai sdk to separate folder

* restore existing agent code

* mlflow tracing disabled hacks

* sample tools

* notebook tweaks

* load_config tweaks

* poetry env

* Tools deployment works

* Deployment logic same on all agents

---------

Signed-off-by: Sid Murching <[email protected]>
Co-authored-by: epec254 <epec254>
Co-authored-by: Sid Murching <[email protected]>
epec254 and smurching authored Nov 19, 2024
1 parent 09c476b commit c3f12c7
Showing 68 changed files with 14,795 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -7,3 +7,11 @@ __pycache__

# Exclude `databricks sync` CLI command snapshots
.databricks
openai_sdk_agent_app_sample_code/configs/*/*.yaml
openai_sdk_agent_app_sample_code/configs/*.yaml
dist/
mlruns/

_scratch_pad/
openai_sdk_agent_app_sample_code/_scratch_pad/
.vscode/
947 changes: 947 additions & 0 deletions openai_sdk_agent_app_sample_code/01_data_pipeline.ipynb

Large diffs are not rendered by default.

282 changes: 282 additions & 0 deletions openai_sdk_agent_app_sample_code/02_agent_setup.ipynb
@@ -0,0 +1,282 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "d0640741-6d84-482a-aa79-f87b04d04023",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"## 👉 START HERE: How to use this notebook\n",
"\n",
"### Step 1: Agent storage configuration\n",
"\n",
"This notebook initializes a `AgentStorageConfig` Pydantic class to define the locations where the Agent's code/config and its supporting data & metadata is stored in the Unity Catalog:\n",
"- **Unity Catalog Model:** Stores staging/production versions of the Agent's code/config\n",
"- **MLflow Experiment:** Stores every development version of the Agent's code/config, each version's associated quality/cost/latency evaluation results, and any MLflow Traces from your development & evaluation processes\n",
"- **Evaluation Set Delta Table:** Stores the Agent's evaluation set\n",
"\n",
"This notebook does the following:\n",
"1. Validates the provided locations exist.\n",
"2. Serializes this configuration to `config/agent_storage_config.yaml` so other notebooks can use it"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7702011a-84dd-4281-bba1-ea9e2b5e551d",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"**Important note:** Throughout this notebook, we indicate which cells you:\n",
"- ✅✏️ *should* customize - these cells contain config settings to change\n",
"- 🚫✏️ *typically will not* customize - these cells contain boilerplate code required to validate / save the configuration\n",
"\n",
"*Cells that don't require customization still need to be run!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "f8963d6e-3123-4095-bb92-9d508c52ed41",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Install Python libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "0a145c3b-d3d9-4b95-b7f6-22e1d8e991c6",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"# %pip install -qqqq -U -r requirements.txt\n",
"# %restart_python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🚫✏️ Connect to Databricks\n",
"\n",
"If running locally in an IDE using Databricks Connect, connect the Spark client & configure MLflow to use Databricks Managed MLflow. If this running in a Databricks Notebook, these values are already set."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from mlflow.utils import databricks_utils as du\n",
"\n",
"if not du.is_in_databricks_notebook():\n",
" from databricks.connect import DatabricksSession\n",
" import os\n",
"\n",
" spark = DatabricksSession.builder.getOrCreate()\n",
" os.environ[\"MLFLOW_TRACKING_URI\"] = \"databricks\""
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "a9feb28c-c72b-49b2-bbc4-a9bd4721a7cd",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Get current user info to set default values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7824cc0a-1b29-4cf9-a974-2c5ef885979f",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.databricks_utils import get_current_user_info\n",
"\n",
"user_email, user_name, default_catalog = get_current_user_info(spark)\n",
"\n",
"print(f\"User email: {user_email}\")\n",
"print(f\"User name: {user_name}\")\n",
"print(f\"Default UC catalog: {default_catalog}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "4b684188-d4eb-4944-86ae-9942a68308c2",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### ✅✏️ Configure your Agent's storage locations\n",
"\n",
"Either review & accept the default values or enter your preferred location."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "64682c1f-7e61-430e-84c9-4fb9cad8152b",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.config.shared.agent_storage_location import AgentStorageConfig\n",
"from cookbook.databricks_utils import get_mlflow_experiment_url\n",
"import mlflow\n",
"\n",
"# Default values below for `AgentStorageConfig` \n",
"agent_name = \"my_agent_2\"\n",
"uc_catalog_name = f\"{default_catalog}\"\n",
"uc_schema_name = f\"{user_name}_agents\"\n",
"uc_catalog_name = f\"ep\"\n",
"uc_schema_name = f\"cookbook_local_test\"\n",
"\n",
"# Agent storage configuration\n",
"agent_storage_config = AgentStorageConfig(\n",
" uc_model_name=f\"{uc_catalog_name}.{uc_schema_name}.{agent_name}\", # UC model to store staging/production versions of the Agent's code/config\n",
" evaluation_set_uc_table=f\"{uc_catalog_name}.{uc_schema_name}.{agent_name}_eval_set\", # UC table to store the evaluation set\n",
" mlflow_experiment_name=f\"/Users/{user_email}/{agent_name}_mlflow_experiment\", # MLflow Experiment to store development versions of the Agent and their associated quality/cost/latency evaluation results + MLflow Traces\n",
")\n",
"\n",
"# Validate the UC catalog and schema for the Agent'smodel & evaluation table\n",
"is_valid, msg = agent_storage_config.validate_catalog_and_schema()\n",
"if not is_valid:\n",
" raise Exception(msg)\n",
"\n",
"# Set the MLflow experiment, validating the path is valid\n",
"experiment_info = mlflow.set_experiment(agent_storage_config.mlflow_experiment_name)\n",
"# If running in a local IDE, set the MLflow experiment name as an environment variable\n",
"os.environ[\"MLFLOW_EXPERIMENT_NAME\"] = agent_storage_config.mlflow_experiment_name\n",
"\n",
"print(f\"View the MLflow Experiment `{agent_storage_config.mlflow_experiment_name}` at {get_mlflow_experiment_url(experiment_info.experiment_id)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "7a49117d-f136-41fa-807d-8be60b863fa9",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"source": [
"### 🚫✏️ Save the configuration for use by other notebooks"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "6dd99015-5b0d-420b-8a3e-067d84b84dc7",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"from cookbook.config import serializable_config_to_yaml_file\n",
"\n",
"serializable_config_to_yaml_file(agent_storage_config, \"./configs/agent_storage_config.yaml\")"
]
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"dashboards": [],
"environmentMetadata": null,
"language": "python",
"notebookMetadata": {
"pythonIndentUnit": 2
},
"notebookName": "00_shared_config",
"widgets": {}
},
"kernelspec": {
"display_name": "genai-cookbook-T2SdtsNM-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}