update genai_cookbook eval content

Signed-off-by: Prithvi Kannan <[email protected]>
databricks · Oct 2, 2024 · c6cac49
1 parent 288faff
commit c6cac49

Showing 5 changed files with 10 additions and 58 deletions.
diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-generation.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-generation.md
@@ -1,4 +1,3 @@
-<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 #### Debugging generation quality
 
 ##### Debugging generation quality
@@ -11,7 +10,7 @@ The following is a step-by-step process to address **generation quality** issues
 
 
 
-1. Open the [`B_quality_iteration/01_root_cause_quality_issues`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/01_root_cause_quality_issues.py) Notebook
+1. Open the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) Notebook
 
 2. Use the queries to load MLflow traces of the records that have retrieval quality issues.
 

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-retrieval.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-retrieval.md
@@ -1,4 +1,3 @@
-<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 #### Debugging retrieval quality
 
 ##### How to debug retrieval quality
@@ -12,7 +11,7 @@ Retrieval quality is arguably the most important component of a RAG application.
 
 Here's a step-by-step process to address **retrieval quality** issues:
 
-1. Open the [`B_quality_iteration/01_root_cause_quality_issues`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/01_root_cause_quality_issues.py) Notebook
+1. Open the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) Notebook
 
 2. Use the queries to load MLflow traces of the records that have retrieval quality issues.
 

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-1.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-1.md
@@ -1,4 +1,3 @@
-<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 ### **Step 5:** Identify the root cause of quality issues
 
 ```{image} ../images/5-hands-on/workflow_iterate.png
@@ -32,7 +31,7 @@ Each row your evaluation set will be tagged as follows:
 
 The approach depends on whether your evaluation set contains the ground-truth responses to your questions, stored in `expected_response`.  If you have `expected_response` available, use the first table below.  Otherwise, use the second table.
 
-1. Open the [`B_quality_iteration/01_root_cause_quality_issues`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/01_root_cause_quality_issues.py) Notebook
+1. Open the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) Notebook
 2. Run the cells that are relevant to your use case, for example, depending on whether you have `expected_response`
 3. Review the output tables to determine the most frequent root cause in your application
 4. For each root cause, follow the steps below to further debug and identify potential fixes:

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-2-data-pipeline.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-2-data-pipeline.md
@@ -1,50 +1,8 @@
-<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 # **![Data pipeline](../images/5-hands-on/data_pipeline.png)** Implement data pipeline fixes
 
-Follow these steps to modify your data pipeline and run it to:
-1. Create a new Vector Index 
-2. Create an MLflow Run with the data pipeline's metadata
-
-The resulting MLflow Run will be reference by the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook.
-
-There are two approaches to modifying the data pipeline:
-1. [**Implement a single fix at a time:**](#approach-1-implement-a-single-fix-at-a-time) In this approach, you configure and run a single data pipeline at once.  This mode is best if you want to try a single embedding model, test out a single new parser, etc.  We suggest starting here to get familiar with these notebooks.
-2. [**Implement multiple fix at once:**](#approach-2-implement-multiple-fix-at-once) In this approach, also called a sweep, you, in parallel, run multiple data pipelines that each have a different configuration.  This mode is best if you want to "sweep" across many different strategies, for example, evaluate 3 PDF parsers or evaluate many different chunk sizes.
-
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
-```
-
-### Approach 1: Implement a single fix at a time
-
-1. Open the [`B_quality_iteration/data_pipeline_fixes/single_fix/00_config`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/single_fix/00_config.py) Notebook
-2. Either:
-    - Follow the instructions there to implement a [new configuration](#configuration-settings-deep-dive) provided by this Cookbook
-    - Follow these [steps](#implementing-a-custom-parserchunker) to implement custom code for a parsing or chunking.
-3. Run the pipeline, by either:
-    - Opening & running the [00_Run_Entire_Pipeline](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/single_fix/00_Run_Entire_Pipeline.py) Notebook
-    - Following these [steps](#running-the-pipeline-manually) to run each step of the pipeline manually
-4. Add the name of the resulting MLflow Run that is outputted to the `DATA_PIPELINE_FIXES_RUN_NAMES` variable in [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook
-
-
-```{note}
-The data preparation pipeline employs Spark Structured Streaming to incrementally load and process files. This entails that files already loaded and prepared are tracked in checkpoints and won't be reprocessed. Only newly added files will be loaded, prepared, and appended to the corresponding tables.
-
-Therefore, if you wish to __rerun the entire pipeline from scratch__ and reprocess all documents, you need to delete the checkpoints and tables. You can accomplish this by using the [reset_tables_and_checkpoints](./reset_tables_and_checkpoints.py) notebook.
-```
-
-### Approach 2: Implement multiple fix at once
-
-1. Open the [`B_quality_iteration/data_pipeline_fixes/multiple_fixes/00_Run_Multiple_Pipelines`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/multiple_fixes/00_Run_Multiple_Pipelines.py) Notebook
-2. Follow the instructions in the Notebook to add 2+ configurations of the data pipeline to run
-3. Run the Notebook to execute these pipelines
-4. Add the names of the resulting MLflow Runs that are outputted to the `DATA_PIPELINE_FIXES_RUN_NAMES` variable in [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook
-
-### Appendix
-
-```{note}
-You can find the notebooks referenced below in the [`single_fix`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/single_fix) and [`multiple_fixes`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/multiple_fixes) directories depending on whether you are implementing a single fix or multiple fixes at a time.
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code).
 ```
 
 #### Configuration settings deep dive

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-2.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-2.md
@@ -1,4 +1,3 @@
-<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 ### **Step 6:** Iteratively implement & evaluate quality fixes
 
 ```{image} ../images/5-hands-on/workflow_iterate.png
@@ -67,19 +66,17 @@ As a reminder, there are 3 types of potential fixes:
 -->
 
 #### Instructions
-For all types, you will use the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook to evaluate the resulting chain versus your baseline configuration (at first, this is your POC) and pick a "winner".  This notebook will help you pick the winning experiment and deploy it to the Review App or a production-ready, scalable REST API.
+Based on which type of fix you want to make, modify the [`00_global_config`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/00_global_config.py), [`02_data_pipeline`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/02_data_pipeline.py), or the [`03_agent_proof_of_concept`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/03_agent_proof_of_concept.py). Use the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) notebook to evaluate the resulting chain versus your baseline configuration (at first, this is your POC) and pick a "winner".  This notebook will help you pick the winning experiment and deploy it to the Review App or a production-ready, scalable REST API.
 
 1. Open the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook
 2. Based on the type of fix you are implementing:
       - **![Data pipeline](../images/5-hands-on/data_pipeline.png)**
-         1. Follow these [instructions](./5-hands-on-improve-quality-step-2-data-pipeline.md) to create the new data pipeline & get the name of the resulting MLflow Run.
-         2. Add the run name(s) to the `DATA_PIPELINE_FIXES_RUN_NAMES` variable
+         1. Follow these [instructions](./5-hands-on-improve-quality-step-2-data-pipeline.md) to create the new data pipeline.
       - **![Chain config](../images/5-hands-on/chain_config.png)** 
-         1. Follow the instructions in the `Chain configuration` section of the [`02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook to add chain configuration fixes to the `CHAIN_CONFIG_FIXES` variable.
+         1. Modify the [`00_global_config`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/00_global_config.py).
       - **![Chain code](../images/5-hands-on/chain_code.png)**
-         1. Create a modified chain code file and save it to the [`B_quality_iteration/chain_code_fixes`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/B_quality_iteration/chain_code_fixes) folder. Alternatively, select one of the provided chain code fixes from that folder.
-         2. Follow the instructions in the `Chain code` section of the [`02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook to add the chain code file and any additional chain configuration that is required to the `CHAIN_CODE_FIXES` variable
-3. Run the notebook from the `Run evaluation` cell to
+         1. Create a modified chain code file similar to [`agents/function_calling_agent_w_retriever_tool`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/agents/function_calling_agent_w_retriever_tool.py) and reference it from the [`03_agent_proof_of_concept`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/03_agent_proof_of_concept.py) notebook.
+3. Run the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) notebook and use MLflow to
       - Evaluate each fix
       - Determine the fix with the best quality/cost/latency metrics
       - Deploy the best one to the Review App and a production-ready REST API to get stakeholder feedback
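
Note on step 3: the new text asks the reader to use MLflow to determine the fix with the best quality/cost/latency metrics, but it does not say how to weigh them. The sketch below is one possible selection rule, not the cookbook's method. It assumes the per-fix metrics have already been read out of MLflow into plain Python values. The `FixResult` class, the `pick_winner` function, the metric fields, and all numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixResult:
    """Hypothetical summary of one evaluated fix (field names are assumptions)."""
    name: str
    quality: float    # fraction of evaluation rows judged correct; higher is better
    cost_usd: float   # average cost per request; lower is better
    latency_s: float  # average latency in seconds; lower is better


def pick_winner(baseline, candidates, max_cost_usd, max_latency_s):
    """Return the highest-quality candidate that stays within the cost and
    latency budgets and beats the baseline; otherwise keep the baseline."""
    eligible = [
        c for c in candidates
        if c.cost_usd <= max_cost_usd
        and c.latency_s <= max_latency_s
        and c.quality > baseline.quality
    ]
    if not eligible:
        return baseline
    # Break quality ties by preferring lower cost, then lower latency.
    return max(eligible, key=lambda c: (c.quality, -c.cost_usd, -c.latency_s))


# Illustrative numbers only; they are not measurements from the cookbook.
baseline = FixResult("poc_baseline", quality=0.62, cost_usd=0.010, latency_s=2.1)
candidates = [
    FixResult("smaller_chunks", quality=0.71, cost_usd=0.012, latency_s=2.4),
    FixResult("new_prompt", quality=0.74, cost_usd=0.011, latency_s=2.2),
    FixResult("bigger_model", quality=0.80, cost_usd=0.050, latency_s=6.0),
]

winner = pick_winner(baseline, candidates, max_cost_usd=0.02, max_latency_s=3.0)
print(winner.name)
```

Treating cost and latency as hard budgets and quality as the value to maximize keeps the rule easy to explain to stakeholders. A weighted score would be an equally reasonable alternative if the budgets are soft.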