update genai_cookbook eval content
Signed-off-by: Prithvi Kannan <[email protected]>
prithvikannan committed Oct 2, 2024
1 parent 288faff commit c6cac49
Showing 5 changed files with 10 additions and 58 deletions.
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
#### Debugging generation quality

##### Debugging generation quality
@@ -11,7 +10,7 @@ The following is a step-by-step process to address **generation quality** issues



1. Open the [`B_quality_iteration/01_root_cause_quality_issues`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/01_root_cause_quality_issues.py) Notebook
1. Open the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) Notebook

2. Use the queries to load MLflow traces of the records that have retrieval quality issues.

@@ -1,4 +1,3 @@
<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
#### Debugging retrieval quality

##### How to debug retrieval quality
@@ -12,7 +11,7 @@ Retrieval quality is arguably the most important component of a RAG application.

Here's a step-by-step process to address **retrieval quality** issues:

1. Open the [`B_quality_iteration/01_root_cause_quality_issues`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/01_root_cause_quality_issues.py) Notebook
1. Open the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) Notebook

2. Use the queries to load MLflow traces of the records that have retrieval quality issues.

3 changes: 1 addition & 2 deletions genai_cookbook/nbs/5-hands-on-improve-quality-step-1.md
@@ -1,4 +1,3 @@
<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
### **Step 5:** Identify the root cause of quality issues

```{image} ../images/5-hands-on/workflow_iterate.png
@@ -32,7 +31,7 @@ Each row of your evaluation set will be tagged as follows:

The approach depends on whether your evaluation set contains the ground-truth responses to your questions, stored in `expected_response`. If you have `expected_response` available, use the first table below. Otherwise, use the second table.

1. Open the [`B_quality_iteration/01_root_cause_quality_issues`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/01_root_cause_quality_issues.py) Notebook
1. Open the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) Notebook
2. Run the cells that are relevant to your use case, depending on whether or not you have `expected_response`
3. Review the output tables to determine the most frequent root cause in your application
4. For each root cause, follow the steps below to further debug and identify potential fixes:
@@ -1,50 +1,8 @@
<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
# **![Data pipeline](../images/5-hands-on/data_pipeline.png)** Implement data pipeline fixes

Follow these steps to modify your data pipeline and run it to:
1. Create a new Vector Index
2. Create an MLflow Run with the data pipeline's metadata

The resulting MLflow Run will be referenced by the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook.

There are two approaches to modifying the data pipeline:
1. [**Implement a single fix at a time:**](#approach-1-implement-a-single-fix-at-a-time) In this approach, you configure and run one data pipeline at a time. This mode is best if you want to try a single embedding model, test out a single new parser, etc. We suggest starting here to get familiar with these notebooks.
2. [**Implement multiple fix at once:**](#approach-2-implement-multiple-fix-at-once) In this approach, also called a sweep, you run multiple data pipelines in parallel, each with a different configuration. This mode is best if you want to "sweep" across many different strategies, for example, evaluate 3 PDF parsers or evaluate many different chunk sizes.

```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code)
:class: tip
You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
```

### Approach 1: Implement a single fix at a time

1. Open the [`B_quality_iteration/data_pipeline_fixes/single_fix/00_config`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/single_fix/00_config.py) Notebook
2. Either:
- Follow the instructions there to implement a [new configuration](#configuration-settings-deep-dive) provided by this Cookbook
- Follow these [steps](#implementing-a-custom-parserchunker) to implement custom code for parsing or chunking.
3. Run the pipeline, by either:
- Opening & running the [00_Run_Entire_Pipeline](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/single_fix/00_Run_Entire_Pipeline.py) Notebook
- Following these [steps](#running-the-pipeline-manually) to run each step of the pipeline manually
4. Add the name of the resulting MLflow Run, which the pipeline outputs, to the `DATA_PIPELINE_FIXES_RUN_NAMES` variable in the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook


```{note}
The data preparation pipeline uses Spark Structured Streaming to incrementally load and process files. This means that files already loaded and prepared are tracked in checkpoints and won't be reprocessed. Only newly added files will be loaded, prepared, and appended to the corresponding tables.
Therefore, if you wish to __rerun the entire pipeline from scratch__ and reprocess all documents, you need to delete the checkpoints and tables. You can do this with the [reset_tables_and_checkpoints](./reset_tables_and_checkpoints.py) notebook.
```
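As a rough illustration of the checkpoint behavior described in the note, the toy sketch below tracks processed file names in a JSON file. It is not Spark Structured Streaming and not the Cookbook's `reset_tables_and_checkpoints` notebook; `process_new_files`, `reset_checkpoint`, and the JSON checkpoint format are hypothetical names invented for this example.

```python
import json
import tempfile
from pathlib import Path


def process_new_files(source_dir: Path, checkpoint: Path) -> list[str]:
    """Process only files not yet recorded in the checkpoint; return their names."""
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    new_files = sorted(
        p.name for p in source_dir.iterdir() if p.is_file() and p.name not in seen
    )
    # (Parsing, chunking, and appending to tables would happen here.)
    checkpoint.write_text(json.dumps(sorted(seen | set(new_files))))
    return new_files


def reset_checkpoint(checkpoint: Path) -> None:
    """Forget all processed files so the next run starts from scratch."""
    checkpoint.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "docs"
    source_dir.mkdir()
    checkpoint = Path(tmp) / "checkpoint.json"

    (source_dir / "a.pdf").write_text("first")
    first_run = process_new_files(source_dir, checkpoint)   # picks up a.pdf

    (source_dir / "b.pdf").write_text("second")
    second_run = process_new_files(source_dir, checkpoint)  # only the new file

    reset_checkpoint(checkpoint)
    rerun = process_new_files(source_dir, checkpoint)       # everything again
```

The second run sees only the newly added file, and only after the checkpoint is deleted does a run reprocess every document, which is why a full rerun requires a reset.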

### Approach 2: Implement multiple fix at once

1. Open the [`B_quality_iteration/data_pipeline_fixes/multiple_fixes/00_Run_Multiple_Pipelines`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/multiple_fixes/00_Run_Multiple_Pipelines.py) Notebook
2. Follow the instructions in the Notebook to add 2+ configurations of the data pipeline to run
3. Run the Notebook to execute these pipelines
4. Add the names of the resulting MLflow Runs, which the Notebook outputs, to the `DATA_PIPELINE_FIXES_RUN_NAMES` variable in the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook
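
One way to picture a sweep is as the cross product of the options being compared. The sketch below is only an assumption about how such a list could be built: the parser names, the `chunk_size_tokens` key, and the run-name format are hypothetical and are not taken from the `00_Run_Multiple_Pipelines` notebook.

```python
from itertools import product

# Hypothetical options to sweep over (3 parsers x 2 chunk sizes).
parsers = ["parser_a", "parser_b", "parser_c"]
chunk_sizes = [512, 1024]

# One configuration per combination, each with a unique run name.
sweep_configs = [
    {
        "run_name": f"{parser}_chunk_{size}",
        "parser": parser,
        "chunk_size_tokens": size,
    }
    for parser, size in product(parsers, chunk_sizes)
]

# The run names are what would be collected for later evaluation,
# for example in a variable such as DATA_PIPELINE_FIXES_RUN_NAMES.
run_names = [config["run_name"] for config in sweep_configs]
```

Keeping run names unique and derived from the configuration makes it easier to trace an evaluation result back to the pipeline settings that produced it.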

### Appendix

```{note}
You can find the notebooks referenced below in the [`single_fix`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/single_fix) and [`multiple_fixes`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/B_quality_iteration/data_pipeline_fixes/multiple_fixes) directories depending on whether you are implementing a single fix or multiple fixes at a time.
You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code).
```

#### Configuration settings deep dive
13 changes: 5 additions & 8 deletions genai_cookbook/nbs/5-hands-on-improve-quality-step-2.md
@@ -1,4 +1,3 @@
<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
### **Step 6:** Iteratively implement & evaluate quality fixes

```{image} ../images/5-hands-on/workflow_iterate.png
@@ -67,19 +66,17 @@ As a reminder, there are 3 types of potential fixes:
-->

#### Instructions
For all types, you will use the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook to evaluate the resulting chain versus your baseline configuration (at first, this is your POC) and pick a "winner". This notebook will help you pick the winning experiment and deploy it to the Review App or a production-ready, scalable REST API.
Based on which type of fix you want to make, modify the [`00_global_config`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/00_global_config.py), [`02_data_pipeline`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/02_data_pipeline.py), or the [`03_agent_proof_of_concept`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/03_agent_proof_of_concept.py). Use the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) notebook to evaluate the resulting chain versus your baseline configuration (at first, this is your POC) and pick a "winner". This notebook will help you pick the winning experiment and deploy it to the Review App or a production-ready, scalable REST API.

1. Open the [`B_quality_iteration/02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook
2. Based on the type of fix you are implementing:
- **![Data pipeline](../images/5-hands-on/data_pipeline.png)**
1. Follow these [instructions](./5-hands-on-improve-quality-step-2-data-pipeline.md) to create the new data pipeline & get the name of the resulting MLflow Run.
2. Add the run name(s) to the `DATA_PIPELINE_FIXES_RUN_NAMES` variable
1. Follow these [instructions](./5-hands-on-improve-quality-step-2-data-pipeline.md) to create the new data pipeline.
- **![Chain config](../images/5-hands-on/chain_config.png)**
1. Follow the instructions in the `Chain configuration` section of the [`02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook to add chain configuration fixes to the `CHAIN_CONFIG_FIXES` variable.
1. Modify the [`00_global_config`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/00_global_config.py).
- **![Chain code](../images/5-hands-on/chain_code.png)**
1. Create a modified chain code file and save it to the [`B_quality_iteration/chain_code_fixes`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/B_quality_iteration/chain_code_fixes) folder. Alternatively, select one of the provided chain code fixes from that folder.
2. Follow the instructions in the `Chain code` section of the [`02_evaluate_fixes`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/B_quality_iteration/02_evaluate_fixes.py) Notebook to add the chain code file and any additional chain configuration that is required to the `CHAIN_CODE_FIXES` variable
3. Run the notebook from the `Run evaluation` cell to
1. Create a modified chain code file similar to [`agents/function_calling_agent_w_retriever_tool`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/agents/function_calling_agent_w_retriever_tool.py) and reference it from the [`03_agent_proof_of_concept`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/03_agent_proof_of_concept.py) notebook.
3. Run the [`05_evaluate_poc_quality`](https://github.com/databricks/genai-cookbook/blob/v0.2.0/agent_app_sample_code/05_evaluate_poc_quality.py) notebook and use MLflow to
- Evaluate each fix
- Determine the fix with the best quality/cost/latency metrics
- Deploy the best one to the Review App and a production-ready REST API to get stakeholder feedback
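
Choosing the fix with the best quality/cost/latency trade-off could be sketched as follows. This is a minimal example under stated assumptions, not the logic of the `05_evaluate_poc_quality` notebook: the metric names, the budget thresholds, the `pick_winner` helper, and all numbers are made up for illustration and are not real evaluation results.

```python
from typing import Optional

# Illustrative, made-up metrics per evaluated run (not real results).
runs = {
    "baseline_poc": {"quality": 0.70, "cost_per_query": 0.010, "latency_s": 2.0},
    "fix_a": {"quality": 0.80, "cost_per_query": 0.012, "latency_s": 2.5},
    "fix_b": {"quality": 0.85, "cost_per_query": 0.030, "latency_s": 6.0},
}


def pick_winner(
    runs: dict, max_latency_s: float, max_cost_per_query: float
) -> Optional[str]:
    """Return the highest-quality run that fits the latency and cost budgets."""
    eligible = {
        name: metrics
        for name, metrics in runs.items()
        if metrics["latency_s"] <= max_latency_s
        and metrics["cost_per_query"] <= max_cost_per_query
    }
    if not eligible:
        return None
    # Prefer higher quality; break ties with lower cost.
    return max(
        eligible,
        key=lambda name: (eligible[name]["quality"], -eligible[name]["cost_per_query"]),
    )


winner = pick_winner(runs, max_latency_s=3.0, max_cost_per_query=0.02)
```

Treating cost and latency as budgets and quality as the objective is one possible design; a weighted score across all three would be an equally valid choice, depending on what stakeholders care about.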
