---
layout: default
title: Build workflows
nav_order: 15
---

# Build workflows with OpenSearch Flow

The OpenSearch Flow plugin in OpenSearch Dashboards lets you iteratively build and test use cases that use ingest and search pipelines. It simplifies building AI/ML use cases with ML inference processors, such as vector search and retrieval-augmented generation (RAG). Behind the scenes, the plugin uses the [Flow Framework OpenSearch plugin]({{site.url}}{{site.baseurl}}/automating-configurations/index/) for resource management. Once you have built out a use case to your satisfaction, you can export the [workflow template]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-templates/) to re-create identical resources across multiple clusters.

## Background: Ingest pipelines, search pipelines, and ML models

[Ingest pipelines]({{site.url}}{{site.baseurl}}/ingest-pipelines/) and [search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) configure data transformations at different stages of ingest and search operations using individual processors, all running within OpenSearch. An _ingest pipeline_ is composed of a sequence of _ingest processors_, while a _search pipeline_ is composed of a sequence of _search request processors_ and/or _search response processors_. You can stitch processors together into custom pipelines that fit your data processing needs.

In general, these pipelines let you modify data at three different stages:

1. **Ingest**: Transform your documents before they are ingested into an index.
2. **Search request**: Transform the search request before the search is executed against an index.
3. **Search response**: Transform the search response (and documents) after the search is executed but before the response is returned to the user.
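For example, the following minimal sketch defines an ingest pipeline containing a single processor that lowercases a document field before indexing. The pipeline and field names (`my-text-pipeline` and `title`) are illustrative:

```json
PUT /_ingest/pipeline/my-text-pipeline
{
  "description": "Lowercase the title field before indexing",
  "processors": [
    {
      "lowercase": {
        "field": "title"
      }
    }
  ]
}
```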

The [ML Commons OpenSearch plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/) lets you [integrate models hosted on third-party platforms]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) and use their inference capabilities directly within OpenSearch. Both ingest and search pipelines offer [ML inference processors]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/ml-inference/), letting you run remote model inference within your pipelines at both ingest and search time. The OpenSearch Flow plugin helps you build, test, and visualize ingest and search pipelines for your particular use case, including pipelines that use ML inference processors.
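As a rough sketch, an ingest pipeline that uses an ML inference processor to generate vector embeddings might look like the following. The model ID is a placeholder, and the `inputText` and `embedding` mapping keys assume a Titan-style embedding model interface; your model's interface determines the actual keys:

```json
PUT /_ingest/pipeline/embedding-pipeline
{
  "description": "Generate vector embeddings for the passage_text field",
  "processors": [
    {
      "ml_inference": {
        "model_id": "<your deployed model ID>",
        "input_map": [
          {
            "inputText": "passage_text"
          }
        ],
        "output_map": [
          {
            "passage_embedding": "embedding"
          }
        ]
      }
    }
  ]
}
```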

## Getting started

To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **OpenSearch Flow**.

## Preset templates

From the plugin home page, navigate to the **New workflow** tab by selecting it directly or by clicking the **Create workflow** button on the right-hand side. Here you will find a variety of preset templates targeting different use cases, each with its own set of preconfigured ingest and search processors. These templates serve two main purposes:

1. **Quickly proving out basic AI/ML solutions**: If you have deployed models with defined interfaces, you can get a basic use case up and running in your cluster in a few clicks. An end-to-end [example](#example-semantic-search-with-rag) is provided later on this page.
2. **Providing a starting point for custom or advanced solutions**: Each template provides a skeleton and pattern for building out a particular kind of use case. Use the templates as a starting point for your own use case.

## Workflow editor

The workflow editor is where you build and test the ingest and search flows for your use case. The page has three main sections, each of which can be expanded horizontally or vertically, or collapsed altogether, depending on what you want to focus on:

1. **Form**: A multi-step form in which you fill in the details of your ingest and search flows, including selecting and configuring ingest and search processors for your ingest and search pipelines, respectively. You can skip the ingestion configuration if you already have a populated index and only want to build a search flow.
2. **Preview flows**: A read-only view representing, at a high level, how data moves through your ingest and search flows. As you make changes in the **Form**, the components in these flows are automatically updated. Click the **JSON** tab to view the low-level template configuration.

3. **Inspect flows**: Tabbed views for interacting with your workflow in different ways:
   - **Test flow**: Execute your search flow (with or without a configured search pipeline) and view results in tabular or raw JSON format.
   - **Ingest response**: After you update your ingest flows, the API response appears here.
   - **Errors**: The latest error appears here. Errors can come from updating your ingest or search flows, running ingest, or running a search. Similar to an IDE, the plugin automatically opens this tab when a new error occurs.
   - **Resources**: The list of OpenSearch resources associated with this workflow, including up to one ingest pipeline, one index, and one search pipeline. Click **Inspect** on a resource to view its details.

## Example: Semantic search with RAG

The following example uses a deployed [Titan Text Embedding](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) model and an [Amazon Bedrock-hosted Anthropic Claude](https://aws.amazon.com/bedrock/claude/) model to build an [ingest pipeline]({{site.url}}{{site.baseurl}}/ingest-pipelines/), [index]({{site.url}}{{site.baseurl}}/getting-started/intro/#index), and [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) for performing vector search and retrieval-augmented generation (RAG).

We strongly recommend using deployed models with defined model interfaces. A library of example configurations is available [in the plugin repository](https://github.com/opensearch-project/dashboards-flow-framework/blob/main/documentation/models.md).
{: .note}
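For reference, a remote model registered with a model interface might look roughly like the following sketch. The connector ID is a placeholder, and the (heavily simplified) input schema assumes a Titan-style `inputText` parameter; see the library linked above for complete, tested configurations:

```json
POST /_plugins/_ml/models/_register
{
  "name": "Bedrock Titan embedding model",
  "function_name": "remote",
  "connector_id": "<your connector ID>",
  "interface": {
    "input": "{\"type\": \"object\", \"properties\": {\"parameters\": {\"type\": \"object\", \"properties\": {\"inputText\": {\"type\": \"string\"}}}}}"
  }
}
```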

1. On the **Workflows** page, navigate to the **New workflow** tab.

![Workflows list page]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/workflows-list-page.png)

2. Click **Create** on the **RAG with Vector Retrieval** template.
3. In the pop-up, provide some basic details:
   - A unique workflow name and description
   - The embedding model used to generate vector embeddings
   - The large language model (LLM) used to perform RAG
   - Click **Optional configuration** for more options, such as the names of the text and vector fields that will be persisted in the index. You can update these settings later.

![Quick configure modal]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/quick-configure-modal.png)

4. Click **Create**. Some initial configuration is prepopulated, and you are automatically taken to the **Workflow Details** page, which starts with your ingest flow configuration.
5. Provide some sample data by clicking **Import**. In the pop-up, you can manually enter data, upload a local `.json` file, or fetch sample documents from an existing index. The form expects an array of JSON objects, each representing a standalone sample [document]({{site.url}}{{site.baseurl}}/getting-started/intro/#document) (for example, `[{"item_text": "..."}]`). This is similar to the [bulk ingest operation]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/#bulk-indexing). Click **Save** when complete.

![Import data modal]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/import-data-modal.png)

6. Click the **ML Inference Processor** accordion under **Transform data** to see its details. It is prepopulated with processor configurations that map data _to_ the expected model input and _from_ the expected model output. In this example, under **Inputs**, the processor is configured to map the target document field to the model input field so that vector embeddings are generated for that field. Under **Outputs**, it is configured to map the model output field to a new field that is persisted in the index. For information about transform types that accommodate more complex data schemas and model interfaces, see [Advanced data transformations](#advanced-data-transformations).

![Transform data]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/transform-data.png)

7. Click the **Advanced settings** accordion under **Ingest data** to see its details. This is also prepopulated with the index configuration required for this use case, such as the `knn_vector` field mapping for the vector embedding field and the `index.knn: true` index setting (a sketch of such an index configuration appears after this list). You can tune these settings as needed.


![Ingest data]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/ingest-data.png)

8. Click **Update** in the bottom bar to build out the configured ingest pipeline and index and ingest the provided sample data. You can then navigate to **Test flow** under **Inspect flows** to search the newly created index and verify that the transformed documents look as expected. For this example, you should verify that vector embeddings were generated for each ingested document.

![Test ingest flow]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/ingest-test-flow.png)

9. Click **Next: Search pipeline** to begin configuring your search flow. As with ingest, the form is prepopulated with some initial configuration.
10. Click the **ML Inference Processor** accordion under **Transform query** to see its details. This processor parses out the inputs in the search query for which you want to generate vector embeddings. In this example, it passes the value in `query.term.item_text.value` to the embedding model. It also performs a query rewrite to generate a `knn` query using the vector embedding produced by the model (a sketch of such a rewritten query appears after this list). Click **Rewrite** to see its details. This is one way to abstract away complex query details and provide a simple query interface that uses search pipelines to perform advanced query generation.

![Transform query]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/transform-query.png)

![Rewrite query]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/rewrite-query.png)

11. Click the **ML Inference Processor** accordion under **Transform response** to see its details. This is where the Claude LLM collects the returned results and generates a human-readable response. Under **Inputs**, click the pencil icon on the right-hand side of the `prompt` entry. A pop-up appears with a preconfigured prompt template aimed at summarizing the returned documents. You can tune this template as you like, and several presets are provided as starting points. You can also add, update, or remove **Input variables**, which specify the data from the returned documents that you want to dynamically inject into the LLM as contextual information. The preconfigured option collects all of the `item_text` data and summarizes the results. Click **Save** or **Cancel** to exit the pop-up window.

![Transform response]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/transform-response.png)

![Configure prompt]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/configure-prompt.png)

12. Click **Update** to build out the search pipeline. The **Inspect flows** section automatically navigates to **Test flow**. From here, you can test different queries and run searches. Variables wrapped in `{{ }}` (for example, a hypothetical `{{query_text}}` placeholder) let you rapidly test different query values without changing the base query.

![Test search flow]({{site.url}}{{site.baseurl}}/images/dashboards-flow-framework/search-test-flow.png)

13. View the results. After clicking **Search**, you can view the results as either a formatted list of hits or the raw JSON search response.
14. Tune your configuration until it fits your use case. There are several ways to do this:
    - Try different query parameters.
    - Try different queries.
    - Update existing processors under **Enhance query request** or **Enhance query response**.
    - Add or remove processors under **Enhance query request** or **Enhance query response**.
15. Export your workflow. Click **Export** in the header to open a pop-up window. The displayed data represents the [workflow template]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-templates/) containing the end-to-end configuration for building out the OpenSearch resources; in this case, the ingest pipeline, index, and search pipeline. You can download the template in JSON or YAML format using the button on the right-hand side.
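For reference, the index configuration described in step 7 follows the standard OpenSearch k-NN index pattern. A minimal sketch, assuming a 1,536-dimension embedding model (the dimension of Titan Text Embeddings v1) and illustrative index and field names, might look like the following:

```json
PUT /my-rag-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "item_text": {
        "type": "text"
      },
      "item_text_embedding": {
        "type": "knn_vector",
        "dimension": 1536
      }
    }
  }
}
```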
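Likewise, the query rewrite described in step 10 turns the simple term query into a `knn` query against the vector field. The following sketch shows the rewritten query shape, with an abbreviated embedding and the same illustrative field names:

```json
{
  "query": {
    "knn": {
      "item_text_embedding": {
        "vector": [0.012, -0.034, 0.056],
        "k": 10
      }
    }
  }
}
```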

## Advanced data transformations

ML inference processors offer several ways to flexibly transform input data _to_ the model inputs and _from_ the model outputs.

**Inputs** is where you configure the parameters passed _to_ the model. Each input parameter can use one of four transformation types:

1. `Data field`: Uses an existing field from your data as the model input field.
2. `JSONPath expression`: Extracts data from a JSON structure and maps the extracted data to the model input field using [JSONPath](https://en.wikipedia.org/wiki/JSONPath).
3. `Prompt`: A constant value that can include dynamically injected variables. You can think of it as a mix of the `Custom string` and `Data field`/`JSONPath expression` transformation types. This is helpful when building prompts for LLMs.
4. `Custom string`: A constant string value.
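For example, given a document with a nested structure like the following, a JSONPath expression such as `$.reviews[*].text` would extract all of the review texts and map them to the model input field (the document shape and field names here are illustrative):

```json
{
  "product": "OpenSearch mug",
  "reviews": [
    { "text": "Great mug!", "stars": 5 },
    { "text": "Solid, but the handle is small.", "stars": 4 }
  ]
}
```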

**Outputs** is where you configure the values passed _from_ the model. Each output parameter can use one of three transformation types:

1. `Data field`: Copies the model output into a new document field.
2. `JSONPath expression`: Extracts data from a JSON structure and maps the extracted data to one or more new document fields using [JSONPath](https://en.wikipedia.org/wiki/JSONPath).
3. `No transformation`: Leaves the model output field as is, including its name and value.
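For example, if a model returns a nested response like the following, an output JSONPath expression such as `$.data[0].embedding` would extract just the embedding array and map it to a new document field (the response shape shown is illustrative; consult your model's interface for its actual output structure):

```json
{
  "data": [
    {
      "embedding": [0.011, -0.027, 0.093]
    }
  ]
}
```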

## Next steps

For suggested models and model interfaces to use with OpenSearch Flow, see [the model library](https://github.com/opensearch-project/dashboards-flow-framework/blob/main/documentation/models.md).

For more tutorials on using OpenSearch Flow to build different AI/ML use cases, see [the tutorials](https://github.com/opensearch-project/dashboards-flow-framework/blob/main/documentation/tutorial.md).
