
Commit

Update learn section of genai_cookbook site to Agents (#33)
* Update learn section of cookbook to Agents

Signed-off-by: Prithvi Kannan <[email protected]>

* link to db docs

Signed-off-by: Prithvi Kannan <[email protected]>

* fix

Signed-off-by: Prithvi Kannan <[email protected]>

* agent with tool

Signed-off-by: Prithvi Kannan <[email protected]>

* rag to agents

Signed-off-by: Prithvi Kannan <[email protected]>

* fix

Signed-off-by: Prithvi Kannan <[email protected]>

* fix

Signed-off-by: Prithvi Kannan <[email protected]>

* fix

Signed-off-by: Prithvi Kannan <[email protected]>

---------

Signed-off-by: Prithvi Kannan <[email protected]>
prithvikannan authored Oct 9, 2024
1 parent 4669f46 commit 6e3b334
Showing 14 changed files with 82 additions and 69 deletions.
2 changes: 1 addition & 1 deletion genai_cookbook/_toc.yml
@@ -10,7 +10,7 @@ parts:
- caption: "Learn"
numbered: true
chapters:
- file: nbs/1-introduction-to-rag
- file: nbs/1-introduction-to-agents
- file: nbs/2-fundamentals-unstructured
sections:
- file: nbs/2-fundamentals-unstructured-data-pipeline
2 changes: 1 addition & 1 deletion genai_cookbook/index-2.md
@@ -57,7 +57,7 @@ The RAG cookbook is divided into 2 sections:
## Table of contents
<!--
**Table of contents**
1. [RAG overview](./nbs/1-introduction-to-rag): Understand how RAG works at a high-level
1. [RAG overview](./nbs/1-introduction-to-agents): Understand how RAG works at a high-level
2. [RAG fundamentals](./nbs/2-fundamentals-unstructured): Understand the key components in a RAG app
3. [RAG quality knobs](./nbs/3-deep-dive): Understand the knobs Databricks recommends tuning to improve RAG app quality
4. [RAG quality evaluation deep dive](./nbs/4-evaluation): Understand how RAG evaluation works, including creating evaluation sets, the quality metrics that matter, and required developer tooling
2 changes: 1 addition & 1 deletion genai_cookbook/index.md
@@ -58,7 +58,7 @@ The RAG cookbook is divided into 2 sections:
## Table of contents
<!--
**Table of contents**
1. [RAG overview](./nbs/1-introduction-to-rag): Understand how RAG works at a high-level
1. [RAG overview](./nbs/1-introduction-to-agents): Understand how RAG works at a high-level
2. [RAG fundamentals](./nbs/2-fundamentals-unstructured): Understand the key components in a RAG app
3. [RAG quality knobs](./nbs/3-deep-dive): Understand the knobs Databricks recommends tuning to improve RAG app quality
4. [RAG quality evaluation deep dive](./nbs/4-evaluation): Understand how RAG evaluation works, including creating evaluation sets, the quality metrics that matter, and required developer tooling
@@ -1,6 +1,12 @@
# RAG overview
# Agents overview

This section provides an overview of Retrieval-augmented generation (RAG): what it is, how it works, and key concepts.
This section provides an overview of agents: what they are, how they work, and key concepts.

## What are AI agents and tools?

AI agents are systems where models make decisions, often using tools like Databricks' Unity Catalog functions to perform tasks such as retrieving data or interacting with external services.

See the Databricks documentation ([AWS](https://docs.databricks.com/en/generative-ai/ai-agents.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/ai-agents)) for more information.

## What is retrieval-augmented generation?

@@ -10,28 +16,34 @@ For example, suppose you are building a question-and-answer chatbot to help empl

RAG addresses this issue by first retrieving relevant information from the company documents based on a user’s query, and then providing the retrieved information to the LLM as additional context. This allows the LLM to generate a more accurate response by drawing from the specific details found in the relevant documents. In essence, RAG enables the LLM to “consult” the retrieved information to formulate its answer.

## Core components of a RAG application
An agent with a retriever tool is one pattern for RAG, and it has the advantage that the agent decides when it needs to perform retrieval. This cookbook describes how to build such an agent.

A RAG application is an example of a [compound AI system](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/): it expands on the language capabilities of the model alone by combining it with other tools and procedures.
## Core components of an agent application

An agent application is an example of a [compound AI system](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/): it expands on the language capabilities of the model alone by combining it with other tools and procedures.

When using a standalone LLM, a user submits a request, such as a question, to the LLM, and the LLM responds with an answer based solely on its training data.

In its most basic form, the following steps happen in a RAG application:
In its most basic form, the following steps happen in an agent application:

1. **User query understanding**: First, the agent uses an LLM to understand the user's query. This step may also take into account previous turns of the conversation, if provided.

2. **Tool selection**: The agent uses an LLM to determine whether it should use a retriever tool. In the case of a vector search retriever, the LLM writes a retriever query, which is used to retrieve relevant chunks from the vector database. If no tool is selected, the agent skips to step 4 and generates the final response.

1. **Retrieval:** The **user's request** is used to query some outside source of information. This might mean querying a vector store, conducting a keyword search over some text, or querying a SQL database. The goal of the retrieval step is to obtain **supporting data** that will help the LLM provide a useful response.
3. **Tool execution**: The agent will then execute the tool with the parameters determined by the LLM and return the output.

2. **Augmentation:** The **supporting data** from the retrieval step is combined with the **user's request**, often using a template with additional formatting and instructions to the LLM, to create a **prompt**.
4. **LLM Generation**: The LLM will then generate the final response.

3. **Generation:** The resulting **prompt** is passed to the LLM, and the LLM generates a response to the **user's request**.
The image below demonstrates a RAG agent in which a retriever tool is selected.

```{image} ../images/1-introduction-to-rag/1_img.png
```{image} ../images/1-introduction-to-agents/1_img.png
:alt: RAG process
:align: center
```

<br>

This is a simplified overview of the RAG process, but it's important to note that implementing a RAG application involves a number of complex tasks. Preprocessing source data to make it suitable for use in RAG, effectively retrieving data, formatting the augmented prompt, and evaluating the generated responses all require careful consideration and effort. These topics will be covered in greater detail in later sections of this guide.
This is a simplified overview of the RAG process, but it's important to note that implementing an agent application involves a number of complex tasks. Preprocessing source data to make it suitable for retrieval, formatting the augmented prompt, and evaluating the generated responses all require careful consideration and effort. These topics will be covered in greater detail in later sections of this guide.
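
To make these steps concrete, here is a minimal, illustrative sketch of an agent loop with a single retriever tool. The OpenAI Python client and the `gpt-4o-mini` model name are used purely as stand-ins for any tool-calling LLM endpoint, and `search_docs` is a hypothetical placeholder rather than the cookbook's actual retriever.

```python
import json

from openai import OpenAI

client = OpenAI()  # stand-in for any tool-calling LLM endpoint; assumes OPENAI_API_KEY is set

# Tool schema the LLM sees when deciding whether retrieval is needed.
retriever_tool = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Retrieve document chunks relevant to a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def search_docs(query: str) -> str:
    # Hypothetical retriever; a real agent would query a vector index here.
    return "retrieved chunks relevant to: " + query

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    # Steps 1-2: the LLM interprets the query and decides whether to call the retriever tool.
    first = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=[retriever_tool]
    )
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content  # no tool selected: answer directly (step 4)
    # Step 3: execute the tool with the parameters the LLM chose.
    call = reply.tool_calls[0]
    tool_output = search_docs(**json.loads(call.function.arguments))
    messages += [reply, {"role": "tool", "tool_call_id": call.id, "content": tool_output}]
    # Step 4: the LLM generates the final response, grounded in the tool output.
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return final.choices[0].message.content
```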

## Why use RAG?

@@ -53,4 +65,4 @@ The RAG architecture can work with 2 types of **supporting data**:
| **Definition** | Tabular data arranged in rows & columns with a specific schema e.g., tables in a database. | Data without a specific structure or organization, e.g., documents that include text and images or multimedia content such as audio or videos. |
| **Example data sources** | - Customer records in a BI or Data Warehouse system<br>- Transaction data from a SQL database<br>- Data from application APIs (e.g., SAP, Salesforce, etc) | - PDFs<br>- Google/Office documents<br>- Wikis<br>- Images<br>- Videos |

Which data you use with RAG depends on your use case. The remainder of this guide focuses on RAG for unstructured data.
Which data you use for your retriever depends on your use case. The remainder of this guide focuses on agents that use a retriever tool for unstructured data.
21 changes: 10 additions & 11 deletions genai_cookbook/nbs/2-fundamentals-unstructured-chain.md
@@ -1,22 +1,21 @@
## Retrieval, augmentation, and generation (aka RAG Chain)
## Retrieval, augmentation, and generation (aka RAG Agent)

Once the data has been processed by the data pipeline, it is suitable for use in the RAG application. This section describes the process that occurs once the user submits a request to the RAG application in an online setting. The series, or *chain* of steps that are invoked at inference time is commonly referred to as the RAG chain.
Once the data has been processed by the data pipeline, it is suitable for use in a retriever tool. This section describes the process that occurs once the user submits a request to the agent application in an online setting.

<!-- TODO (prithvi): add this back in once updated to agents
```{image} ../images/2-fundamentals-unstructured/3_img.png
:align: center
```
``` -->
<br/>

1. **(Optional) User query preprocessing:** In some cases, the user's query is preprocessed to make it more suitable for querying the vector database. This can involve formatting the query within a template, using another model to rewrite the request, or extracting keywords to aid retrieval. The output of this step is a *retrieval query* which will be used in the subsequent retrieval step.
1. **User query understanding**: First, the agent uses an LLM to understand the user's query. This step may also take into account previous turns of the conversation, if provided.

2. **Retrieval:** To retrieve supporting information from the vector database, the retrieval query is translated into an embedding using *the same embedding model* that was used to embed the document chunks during data preparation. These embeddings enable comparison of the semantic similarity between the retrieval query and the unstructured text chunks, using measures like cosine similarity. Next, chunks are retrieved from the vector database and ranked based on how similar they are to the embedded request. The top (most similar) results are returned.
2. **Tool selection**: The agent uses an LLM to determine whether it should use a retriever tool. In the case of a vector search retriever, the LLM writes a retriever query, which is used to retrieve relevant chunks from the vector database. If no tool is selected, the agent skips to step 4 and generates the final response.

3. **Prompt augmentation:** The prompt that will be sent to the LLM is formed by augmenting the user's query with the retrieved context, in a template that instructs the model how to use each component, often with additional instructions to control the response format. The process of iterating on the right prompt template to use is referred to as [prompt engineering](https://en.wikipedia.org/wiki/Prompt_engineering).
3. **Tool execution**: The agent will then execute the tool with the parameters determined by the LLM and return the output.

4. **LLM Generation**: The LLM takes the augmented prompt, which includes the user's query and retrieved supporting data, as input. It then generates a response that is grounded on the additional context.
4. **LLM Generation**: The LLM will then generate the final response.

5. **(Optional) Post-processing:** The LLM's response may be processed further to apply additional business logic, add citations, or otherwise refine the generated text based on predefined rules or constraints.
As with the retriever data pipeline, there are numerous consequential engineering decisions that can affect the quality of the agent. For example, determining how many chunks to retrieve and when to select the retriever tool can both significantly impact the model's ability to generate quality responses.

As with the RAG application data pipeline, there are numerous consequential engineering decisions that can affect the quality of the RAG chain. For example, determining how many chunks to retrieve in (2) and how to combine them with the user's query in (3) can both significantly impact the model's ability to generate quality responses.

Throughout the chain, various guardrails may be applied to ensure compliance with enterprise policies. This might involve filtering for appropriate requests, checking user permissions before accessing data sources, and applying content moderation techniques to the generated responses.
Throughout the agent's execution, various guardrails may be applied to ensure compliance with enterprise policies. This might involve filtering for appropriate requests, checking user permissions before accessing data sources, and applying content moderation techniques to the generated responses.
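
As a rough sketch of how the retriever tool in step 3 might be backed by Databricks Vector Search, the snippet below queries an existing index with the `databricks-vectorsearch` client. The endpoint name, index name, and column name are placeholders, and the response handling assumes the default `similarity_search` result format.

```python
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()  # assumes Databricks authentication is configured in the environment
index = vsc.get_index(
    endpoint_name="my_vs_endpoint",          # placeholder endpoint name
    index_name="main.cookbook.docs_index",   # placeholder catalog.schema.index
)

def search_docs(query: str, k: int = 5) -> list[str]:
    """Retriever tool: return the top-k chunks most similar to the LLM-written retriever query."""
    results = index.similarity_search(
        query_text=query,
        columns=["chunk_text"],  # placeholder column holding the chunk text
        num_results=k,
    )
    # Matching rows are nested under result.data_array; each row's first value is the
    # requested column.
    return [row[0] for row in results["result"]["data_array"]]
```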
@@ -1,10 +1,10 @@
## Data pipeline

Throughout this guide we will focus on preparing unstructured data for use in RAG applications. *Unstructured* data refers to data without a specific structure or organization, such as PDF documents that might include text and images, or multimedia content such as audio or videos.
Throughout this guide we will focus on preparing unstructured data for use in agent applications. *Unstructured* data refers to data without a specific structure or organization, such as PDF documents that might include text and images, or multimedia content such as audio or videos.

Unstructured data lacks a predefined data model or schema, making it impossible to query on the basis of structure and metadata alone. As a result, unstructured data requires techniques that can understand and extract semantic meaning from raw text, images, audio, or other content.

During data preparation, the RAG application's data pipeline takes raw unstructured data and transforms it into discrete chunks that can be queried based on their relevance to a user's query. The key steps in data preprocessing are outlined below. Each step has a variety of knobs that can be tuned - for a deeper dive discussion on these knobs, please refer to the [deep dive into RAG section.](/nbs/3-deep-dive)
During data preparation, the agent application's data pipeline takes raw unstructured data and transforms it into discrete chunks that can be queried based on their relevance to a user's query. The key steps in data preprocessing are outlined below. Each step has a variety of knobs that can be tuned - for a deeper dive discussion on these knobs, please refer to the [deep dive into RAG section.](/nbs/3-deep-dive)

```{image} ../images/2-fundamentals-unstructured/2_img.png
:align: center
@@ -17,7 +17,7 @@ Semantic search is one of several approaches that can be taken when implementing



The following are the typical steps of a data pipeline in a RAG application using unstructured data:
The following are the typical steps of a data pipeline in an agent application using unstructured data:

1. **Parse the raw documents:** The initial step involves transforming raw data into a usable format. This can include extracting text, tables, and images from a collection of PDFs or employing optical character recognition (OCR) techniques to extract text from images.

@@ -31,6 +31,6 @@ The following are the typical steps of a data pipeline in a RAG application usin

The process of computing similarity can be computationally expensive. Vector indexes, such as [Databricks Vector Search](https://docs.databricks.com/en/generative-ai/vector-search.html), speed this process up by providing a mechanism for efficiently organizing and navigating embeddings, often via sophisticated approximation methods. This enables rapid ranking of the most relevant results without comparing each embedding to the user's query individually.

Each step in the data pipeline involves engineering decisions that impact the RAG application's quality. For example, choosing the right chunk size in step (3) ensures the LLM receives specific yet contextualized information, while selecting an appropriate embedding model in step (4) determines the accuracy of the chunks returned during retrieval.
Each step in the data pipeline involves engineering decisions that impact the agent application's quality. For example, choosing the right chunk size in step (3) ensures the LLM receives specific yet contextualized information, while selecting an appropriate embedding model in step (4) determines the accuracy of the chunks returned during retrieval.

This data preparation process is referred to as *offline* data preparation, as it occurs before the system answers queries, unlike the *online* steps triggered when a user submits a query.
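
For illustration only, a stripped-down version of the chunking and embedding portion of this offline pipeline might look like the sketch below. The character-based chunking and the `all-MiniLM-L6-v2` model are arbitrary choices, and a real pipeline would load the resulting embeddings into a vector index such as Databricks Vector Search.

```python
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split parsed document text into overlapping, character-based chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Parsing is assumed to have already produced plain text from the raw document.
parsed_doc = "text extracted from a PDF by the parsing step ..."

# Chunk the parsed text, then embed each chunk with the same model that will later
# embed user queries, so that similarity comparisons are meaningful.
chunks = chunk_text(parsed_doc)
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
embeddings = model.encode(chunks)  # one vector per chunk, ready to load into a vector index
```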
10 changes: 5 additions & 5 deletions genai_cookbook/nbs/2-fundamentals-unstructured-eval.md
@@ -1,10 +1,10 @@
## Evaluation & monitoring

Evaluation and monitoring are critical components to understand if your RAG application is performing to the *quality*, *cost*, and *latency* requirements dictated by your use case. Technically, **evaluation** happens during development and **monitoring** happens once the application is deployed to production, but the fundamental components are similar.
Evaluation and monitoring are critical components to understand if your agent application is performing to the *quality*, *cost*, and *latency* requirements dictated by your use case. Technically, **evaluation** happens during development and **monitoring** happens once the application is deployed to production, but the fundamental components are similar.

RAG over unstructured data is a complex system with many components that impact the application's quality. Adjusting any single element can have cascading effects on the others. For instance, data formatting changes can influence the retrieved chunks and the LLM's ability to generate relevant responses. Therefore, it's crucial to evaluate each of the application's components in addition to the application as a whole in order to iteratively refine it based on those assessments.
Often, an agent is a complex system with many components that impact the application's quality. Adjusting any single element can have cascading effects on the others. For instance, data formatting changes can influence the retrieved chunks and the LLM's ability to generate relevant responses. Therefore, it's crucial to evaluate each of the application's components in addition to the application as a whole in order to iteratively refine it based on those assessments.

Evaluation and monitoring of Generative AI applications, including RAG, differs from classical machine learning in several ways:
Evaluation and monitoring of Generative AI applications, including agents, differ from classical machine learning in several ways:

| | Classical ML | Generative AI |
|---------|---------|---------|
@@ -19,9 +19,9 @@ Effectively evaluating and monitoring application quality, cost and latency requ
```
<br/>

- **Evaluation set:** To rigorously evaluate your RAG application, you need a curated set of evaluation queries (and ideally outputs) that are representative of the application's intended use. These evaluation examples should be challenging, diverse, and updated to reflect changing usage and requirements.
- **Evaluation set:** To rigorously evaluate your agent application, you need a curated set of evaluation queries (and ideally outputs) that are representative of the application's intended use. These evaluation examples should be challenging, diverse, and updated to reflect changing usage and requirements.

- **Metric definitions**: You can't manage what you don't measure. In order to improve RAG quality, it is essential to define what quality means for your use case. Depending on the application, important metrics might include response accuracy, latency, cost, or ratings from key stakeholders. You'll need metrics that measure each component, how the components interact with each other, and the overall system.
- **Metric definitions**: You can't manage what you don't measure. In order to improve agent quality, it is essential to define what quality means for your use case. Depending on the application, important metrics might include response accuracy, latency, cost, or ratings from key stakeholders. You'll need metrics that measure each component, how the components interact with each other, and the overall system.

- **LLM judges**: Given the open-ended nature of LLM responses, it is not feasible to read every single response each time you evaluate in order to determine whether the output is correct. Using an additional, different LLM to review outputs can help scale your evaluation and compute additional metrics, such as the groundedness of a response to thousands of tokens of context, that would be infeasible for human raters to effectively assess at scale.
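
As a loose illustration of the LLM-judge idea, the sketch below scores an evaluation set for groundedness. `call_judge_llm` is a hypothetical stand-in for whatever model endpoint serves as the judge, and the YES/NO prompt is a simplification rather than an actual judge used by Databricks.

```python
def call_judge_llm(prompt: str) -> str:
    """Hypothetical helper: send the prompt to a judge LLM endpoint and return its text reply."""
    raise NotImplementedError("replace with a call to your judge LLM of choice")

def judge_groundedness(eval_set: list[dict]) -> float:
    """Fraction of agent responses the judge considers grounded in the retrieved context."""
    grounded = 0
    for example in eval_set:  # each example holds the retrieved context and the agent's response
        prompt = (
            "Does the response below rely only on facts stated in the context? "
            "Answer with the single word YES or NO.\n\n"
            f"Context:\n{example['context']}\n\nResponse:\n{example['response']}"
        )
        verdict = call_judge_llm(prompt).strip().upper()
        grounded += verdict.startswith("YES")
    return grounded / len(eval_set)
```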
