Adding files to deploy FinanceAgent application on ROCm vLLM #1890

Open · wants to merge 23 commits into base: main

Commits (23)
fd3824d
Add Config for vLLM
artem-astafev Apr 25, 2025
ee74383
Update compose_vllm.yaml
artem-astafev Apr 25, 2025
6c50388
Update compose_vllm.yaml
artem-astafev Apr 25, 2025
f762e43
Update example config
artem-astafev Apr 25, 2025
277a698
Update set_env_vllm.sh
artem-astafev Apr 25, 2025
c5e3ab2
Update set_env_vllm.sh
artem-astafev Apr 25, 2025
14c2fbb
Update set_env_vllm.sh
artem-astafev Apr 25, 2025
a4f154f
Update compose_vllm.yaml
artem-astafev Apr 25, 2025
8cf4025
Update compose_vllm.yaml
artem-astafev Apr 25, 2025
b38ec3e
Refactor FinanceAgent for rocm
artem-astafev Apr 28, 2025
aaf7f86
adjust rocm example
artem-astafev Apr 28, 2025
b9c3b45
Update test_compose_on_vllm_rocm.sh
artem-astafev Apr 29, 2025
31371e3
Adjust example config
artem-astafev Apr 29, 2025
5176581
Update compose.yaml
artem-astafev Apr 29, 2025
29447e2
Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example
artem-astafev Apr 29, 2025
490ec13
Add README.md for AMD ROCm deployment
artem-astafev Apr 29, 2025
e421ca3
Merge branch 'feature/FinaceAgent-on-AMD-ROCm-example' of https://git…
artem-astafev Apr 29, 2025
6569374
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2025
77a8e85
Update README.md for AMD ROCm
artem-astafev Apr 29, 2025
4fa8093
Merge branch 'feature/FinaceAgent-on-AMD-ROCm-example' of https://git…
artem-astafev Apr 29, 2025
ec64548
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2025
f4763d9
Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example
artem-astafev May 5, 2025
d2d4172
Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example
artem-astafev May 7, 2025
188 changes: 188 additions & 0 deletions FinanceAgent/docker_compose/amd/gpu/rocm/README.md
@@ -0,0 +1,188 @@
# Example Finance Agent deployments on AMD GPU (ROCm)

This document outlines the deployment process for a Finance Agent application utilizing OPEA components on an AMD GPU server.

This example includes the following sections:

- [Finance Agent Quick Start Deployment](#finance-agent-quick-start-deployment): Demonstrates how to quickly deploy a Finance Agent application/pipeline on an AMD GPU platform.
- [Finance Agent Docker Compose Files](#finance-agent-docker-compose-files): Describes some example deployments and their Docker Compose files.
- [How to interact with the agent system with UI](#how-to-interact-with-the-agent-system-with-ui): Guidelines for using the UI.

## Finance Agent Quick Start Deployment

This section describes how to quickly deploy and test the Finance Agent service manually on an AMD GPU platform. The basic steps are:

1. [Access the Code](#access-the-code)
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Test the Pipeline](#test-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)

### Access the Code

Clone the GenAIExamples repository and access the FinanceAgent AMD GPU platform Docker Compose files and supporting scripts:

```bash
mkdir /path/to/your/workspace/
export WORKDIR=/path/to/your/workspace/
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```

Check out a released version, such as v1.4:

```bash
cd GenAIExamples
git checkout v1.4
```

### Generate a HuggingFace Access Token

Some HuggingFace resources, such as gated models, are only accessible with an access token. If you do not already have a HuggingFace access token, create an account by following the steps at [HuggingFace](https://huggingface.co/) and then generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
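The deployment scripts read the token from an environment variable. A minimal sketch (the token value below is a placeholder, not a real token):

```shell
# Export the token so the compose files can pass it to the containers as
# HUGGINGFACEHUB_API_TOKEN. Replace the placeholder with your own token.
export HUGGINGFACEHUB_API_TOKEN="hf_xxxxxxxxxxxxxxxxx"
```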

### Deploy the Services Using Docker Compose

#### 3.1 Launch the vLLM endpoint

Below is the command to launch a vLLM endpoint that serves the `meta-llama/Llama-3.3-70B-Instruct` model on the AMD ROCm platform.

```bash
cd $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/amd/gpu/rocm
bash launch_vllm.sh
```
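The model is large, so the endpoint can take a while to become ready. Rather than polling by hand, a small helper (a hypothetical sketch, not part of the repository) can wait until the `/health` route responds, mirroring the healthcheck in `compose_vllm.yaml`; the host port in the usage line is an assumption based on the container listing later in this document:

```shell
# Hypothetical helper: poll a vLLM health URL until it responds, or give up
# after a fixed number of retries. Returns 0 on success, 1 on timeout.
wait_for_vllm() {
  url="$1"
  retries="${2:-30}"
  interval="${3:-10}"
  i=1
  while [ "$i" -le "$retries" ]; do
    if curl -sf "$url" > /dev/null 2>&1; then
      echo "vLLM endpoint is ready"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "vLLM endpoint did not become healthy" >&2
  return 1
}
```

Example usage: `wait_for_vllm "http://localhost:8086/health"` (adjust the port to your mapping).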

#### 3.2 Prepare knowledge base

The commands below upload some example files into the knowledge base. You can also upload files through the UI.

First, launch the Redis databases and the dataprep microservice.

```bash
# inside $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/amd/gpu/rocm
bash launch_dataprep.sh
```

Validate data ingestion and retrieval from the database:

```bash
python $WORKDIR/GenAIExamples/FinanceAgent/tests/test_redis_finance.py --port 6007 --test_option ingest
python $WORKDIR/GenAIExamples/FinanceAgent/tests/test_redis_finance.py --port 6007 --test_option get
```

#### 3.3 Launch the multi-agent system

The command below will launch 3 agent microservices, 1 docsum microservice, and 1 UI microservice.

```bash
# inside $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/amd/gpu/rocm
bash launch_agents.sh
```

#### 3.4 Check the Deployment Status

After running Docker Compose, check that all the containers it launched have started:

```bash
docker ps -a
```

For this deployment, the following six containers should have started:

```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7e61978c3d75 opea/dataprep:latest "sh -c 'python $( [ …" 31 seconds ago Up 19 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server-finance
0fee87aca791 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 3 hours ago Up 3 hours (healthy) 0.0.0.0:6380->6379/tcp, [::]:6380->6379/tcp, 0.0.0.0:8002->8001/tcp, [::]:8002->8001/tcp redis-kv-store
debd549045f8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 3 hours ago Up 3 hours (healthy) 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
9cff469364d3 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" 3 hours ago Up 3 hours (healthy) 0.0.0.0:10221->80/tcp, [::]:10221->80/tcp tei-embedding-serving
13f71e678dbd opea/vllm-rocm:latest "python3 /workspace/…" 3 hours ago Up 3 hours (healthy) 0.0.0.0:8086->8011/tcp, [::]:8086->8011/tcp vllm-service
e5a219a77c95 opea/llm-docsum:latest "bash entrypoint.sh" 3 hours ago Up 2 seconds 0.0.0.0:33218->9000/tcp, [::]:33218->9000/tcp docsum-llm-server
```

#### 3.5 Validate the agents

FinQA Agent:

```bash
export agent_port="9095"
prompt="What is Gap's revenue in 2024?"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port
```

Research Agent:

```bash
export agent_port="9096"
prompt="generate NVDA financial research report"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port --tool_choice "get_current_date" --tool_choice "get_share_performance"
```

Supervisor Agent, single turn:

```bash
export agent_port="9090"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --stream
```

Supervisor Agent, multi-turn:

```bash
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --multi-turn --stream
```
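The UI section below configures the supervisor agent as an OpenAI-compatible endpoint on port 9090. Under that assumption (the `/v1/chat/completions` path is inferred, not confirmed by the repository, and the model id is arbitrary, as in the UI settings), a direct query might look like:

```shell
# Build an OpenAI-style chat request for the supervisor agent. The URL path
# and model id are assumptions for illustration.
SUPERVISOR_URL="http://${ip_address:-localhost}:9090/v1/chat/completions"
payload='{"model": "opea_agent", "messages": [{"role": "user", "content": "What is Gap revenue in 2024?"}]}'

# Send the request; fall back to a message if the endpoint is not reachable.
curl -sf "$SUPERVISOR_URL" -H 'Content-Type: application/json' -d "$payload" \
  || echo "supervisor endpoint not reachable at $SUPERVISOR_URL"
```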

### Cleanup the Deployment

To stop the containers associated with the deployment, execute the following commands:

```bash
docker compose -f compose.yaml down
docker compose -f compose_vllm.yaml down
docker compose -f dataprep_compose.yaml down
```

All the Finance Agent containers are stopped and removed once the `down` commands complete.

## Finance Agent Docker Compose Files

When deploying a Finance Agent pipeline on an AMD GPU platform, we can pick and choose different large language model serving frameworks. The table below outlines the configurations available as part of the application.

| File                                             | Description                                                                            |
| ------------------------------------------------ | -------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml)                   | Default compose file to run the agent services                                         |
| [compose_vllm.yaml](./compose_vllm.yaml)         | The LLM serving framework is vLLM.                                                     |
| [dataprep_compose.yaml](./dataprep_compose.yaml) | Compose file to run the data prep services, such as the Redis vector DB, reranker, and embedder |

## How to interact with the agent system with UI

The UI microservice is launched in the previous step with the other microservices.
To access the UI, open a web browser to `http://${ip_address}:5175`. Note that `ip_address` here is the host IP of the UI microservice.

1. Create Admin Account with a random value

2. Enter the endpoints in the `Connections` settings

First, click on the user icon in the upper right corner to open `Settings`. Click on `Admin Settings`. Click on `Connections`.

Then, enter the supervisor agent endpoint in the `OpenAI API` section: `http://${ip_address}:9090/v1`. Enter the API key as "empty". Add an arbitrary model id in `Model IDs`, for example, "opea_agent". The `ip_address` here should be the host IP of the agent microservice.

Then, enter the dataprep endpoint in the `Icloud File API` section. You first need to enable `Icloud File API` by clicking the toggle on the right so that it turns green, and then enter the endpoint URL, for example, `http://${ip_address}:6007/v1`. The `ip_address` here should be the host IP of the dataprep microservice.

You should see a screen like the screenshot below when the settings are done.

![opea-agent-setting](../../../../assets/ui_connections_settings.png)

3. Upload documents with UI

Click on the `Workplace` icon in the top left corner. Click `Knowledge`. Click on the "+" sign to the right of `Icloud Knowledge`. You can paste a URL on the left-hand side of the pop-up window, or upload a local file by clicking on the cloud icon on the right-hand side of the pop-up window. Then click on the `Upload Confirm` button. Wait until processing is done; the pop-up window will close on its own when data ingestion is complete. See the screenshot below.

Note: data ingestion may take a few minutes depending on the length of the document. Please wait patiently and do not close the pop-up window.

![upload-doc-ui](../../../../assets/upload_doc_ui.png)

4. Test agent with UI

After the settings are done and the documents are ingested, you can start asking the agent questions. Click on the `New Chat` icon in the top left corner, and type your question in the text box in the middle of the UI.

The UI streams the agent's response tokens. Expand the `Thinking` tab to see the agent's reasoning process. After the agent makes tool calls, you will also see the tool output once each tool returns. Note: it may take a while to get the tool output back if tool execution takes time.

![opea-agent-test](../../../../assets/opea-agent-test.png)
132 changes: 132 additions & 0 deletions FinanceAgent/docker_compose/amd/gpu/rocm/compose.yaml
@@ -0,0 +1,132 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

services:
  worker-finqa-agent:
    image: opea/agent:latest
    container_name: finqa-agent-endpoint
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
      - ${PROMPT_PATH}:/home/user/prompts/
    ports:
      - "9095:9095"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_llama
      with_memory: false
      recursion_limit: ${recursion_limit_worker}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${TEMPERATURE}
      max_new_tokens: ${MAX_TOKENS}
      stream: false
      tools: /home/user/tools/finqa_agent_tools.yaml
      custom_prompt: /home/user/prompts/finqa_prompt.py
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL_VECTOR: $REDIS_URL_VECTOR
      REDIS_URL_KV: $REDIS_URL_KV
      TEI_EMBEDDING_ENDPOINT: $TEI_EMBEDDING_ENDPOINT
      port: 9095

  worker-research-agent:
    image: opea/agent:latest
    container_name: research-agent-endpoint
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
      - ${PROMPT_PATH}:/home/user/prompts/
    ports:
      - "9096:9096"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_llama
      with_memory: false
      recursion_limit: 25
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      stream: false
      tools: /home/user/tools/research_agent_tools.yaml
      custom_prompt: /home/user/prompts/research_prompt.py
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      FINNHUB_API_KEY: ${FINNHUB_API_KEY}
      FINANCIAL_DATASETS_API_KEY: ${FINANCIAL_DATASETS_API_KEY}
      port: 9096

  supervisor-react-agent:
    image: opea/agent:latest
    container_name: supervisor-agent-endpoint
    depends_on:
      - worker-finqa-agent
      - worker-research-agent
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
      - ${PROMPT_PATH}:/home/user/prompts/
    ports:
      - "9090:9090"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_llama
      with_memory: true
      recursion_limit: ${recursion_limit_supervisor}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${TEMPERATURE}
      max_new_tokens: ${MAX_TOKENS}
      stream: true
      tools: /home/user/tools/supervisor_agent_tools.yaml
      custom_prompt: /home/user/prompts/supervisor_prompt.py
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      WORKER_FINQA_AGENT_URL: $WORKER_FINQA_AGENT_URL
      WORKER_RESEARCH_AGENT_URL: $WORKER_RESEARCH_AGENT_URL
      DOCSUM_ENDPOINT: $DOCSUM_ENDPOINT
      REDIS_URL_VECTOR: $REDIS_URL_VECTOR
      REDIS_URL_KV: $REDIS_URL_KV
      TEI_EMBEDDING_ENDPOINT: $TEI_EMBEDDING_ENDPOINT
      port: 9090

  docsum-llm-textgen:
    image: ${REGISTRY:-opea}/llm-docsum:${TAG:-latest}
    container_name: docsum-llm-server
    ports:
      - "${DOCSUM_LLM_SERVER_PORT}:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LLM_ENDPOINT: ${LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
      MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
      LLM_MODEL_ID: ${LLM_MODEL_ID}
      DocSum_COMPONENT_NAME: ${DocSum_COMPONENT_NAME:-OpeaDocSumvLLM}
      LOGFLAG: ${LOGFLAG:-False}
    restart: unless-stopped

  agent-ui:
    image: opea/agent-ui:latest
    container_name: agent-ui
    environment:
      host_ip: ${host_ip}
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    ports:
      - "5175:8080"
    ipc: host
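The compose file above interpolates a number of environment variables, normally provided by the `set_env.sh` scripts in the same directory. A hypothetical, partial excerpt (variable names are taken from the compose file; the values are illustrative assumptions, and the repository's `set_env.sh` remains the source of truth):

```shell
# Illustrative values only -- not copied from the repository's set_env.sh.
export HUGGINGFACEHUB_API_TOKEN="hf_xxxxxxxx"            # placeholder token
export LLM_MODEL_ID="meta-llama/Llama-3.3-70B-Instruct"  # model named in this README
export LLM_ENDPOINT_URL="http://${ip_address:-localhost}:8086"  # assumed vLLM host port
export TEMPERATURE="0.5"
export MAX_TOKENS="4096"
export recursion_limit_worker="12"
export recursion_limit_supervisor="10"
export TOOLSET_PATH="$WORKDIR/GenAIExamples/FinanceAgent/tools/"
export PROMPT_PATH="$WORKDIR/GenAIExamples/FinanceAgent/prompts/"
```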
39 changes: 39 additions & 0 deletions FinanceAgent/docker_compose/amd/gpu/rocm/compose_vllm.yaml
@@ -0,0 +1,39 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

services:
  vllm-service:
    image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
    container_name: vllm-service
    ports:
      - "${FINANCEAGENT_VLLM_SERVICE_PORT:-8081}:8011"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
      VLLM_USE_TRITON_FLASH_ATTENTION: 0
      PYTORCH_JIT: 0
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://${HOST_IP}:${FINANCEAGENT_VLLM_SERVICE_PORT:-8081}/health || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 100
    volumes:
      - "${MODEL_CACHE:-./data}:/data"
    shm_size: 20G
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/:/dev/dri/
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
      - apparmor=unconfined
    command: "--model ${LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 4 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\""
    ipc: host
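The `--tensor-parallel-size 4` flag in the `command:` above should match the number of GPUs visible inside the container. A hedged sketch (an assumption for illustration, not repository code) for deriving that count from the DRM render nodes exposed under `/dev/dri`:

```shell
# On ROCm systems the render nodes under /dev/dri usually correspond one-to-one
# with GPUs; count them to pick a tensor-parallel size, defaulting to 1.
TP_SIZE=$(ls /dev/dri/renderD* 2>/dev/null | wc -l)
[ "$TP_SIZE" -gt 0 ] || TP_SIZE=1  # fall back to a single GPU
echo "tensor-parallel-size: $TP_SIZE"
```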