Medical devices, such as surgical robots, are complex machines that require extensive training prior to safe use by clinicians and other Operating Room (OR) staff. Timely and accurate answers to medical device questions during and after training can improve the trust and satisfaction of device operators and improve efficiency and effectiveness of training. The medical device's "Instructions For Use" (IFU) is a manual that contains information about the usage of the device, but it is often difficult to find answers to questions efficiently due to its verbosity and technical language.
In this application, we show how Retrieval Augmented Generation (RAG) powered by NVIDIA NIM microservices can enhance that training experience by enabling clinicians and other OR staff to put their questions to the IFU directly, at any hour of any day.
This application is based on the multi-turn RAG example at https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RetrievalAugmentedGeneration/examples/multi_turn_rag.
With this application, you use the rag-playground UI to interact with a chain server composed of a file loader, a text splitter, an embedding model, a vector database, and an LLM. First, upload your IFU in the Knowledge Base:
Then use voice commands to ask the RAG questions about the contents of your IFU and hear the answers through the speakers, with no interruption to your routine:
The components involved in ingesting documents into the knowledge base:
The components involved in creating an answer from a query:
This example showcases conversational AI in a RAG pipeline. The chain server stores the conversation history and knowledge base in a vector database and retrieves them at runtime to understand contextual queries.
The example supports ingestion of PDF and text files. The documents are ingested into a dedicated document vector store, and the prompt is tuned so the model acts as a document chat bot. To maintain conversation history, the chain server stores each previously asked query and the model's generated answer as a text entry in a separate, dedicated conversation-history vector store, conv_store. Both vector stores are exposed to a LangChain LCEL chain as LangChain Retrievers. When the chain is invoked with a query, the query passes through both retrievers: one retrieves context from the document vector store, and the other retrieves the closest-matching entries from the conversation-history vector store. The retrieved chunks are then inserted into the LLM prompt as part of the chain.
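The following is a minimal sketch, not the application's actual chain-server code, of how those two retrievers can be composed into an LCEL chain with the models listed below. The collection names (other than conv_store), the Milvus address, and the prompt wording are illustrative assumptions.

```python
# Hedged sketch: two Milvus-backed retrievers (documents + conversation
# history) feeding a single LCEL chain. Collection names, the Milvus address,
# and the prompt wording are illustrative assumptions.
from langchain_community.vectorstores import Milvus
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

embeddings = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5")
connection = {"host": "localhost", "port": "19530"}  # assumed Milvus endpoint

doc_store = Milvus(embeddings, collection_name="ifu_docs", connection_args=connection)
conv_store = Milvus(embeddings, collection_name="conv_store", connection_args=connection)

prompt = ChatPromptTemplate.from_template(
    "You are a document chat bot for a medical device IFU.\n"
    "Conversation history:\n{history}\n\nContext:\n{context}\n\nQuestion: {question}"
)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

# The query fans out to both retrievers; the retrieved chunks are formatted
# into the prompt before the LLM generates the answer.
chain = (
    {
        "context": doc_store.as_retriever(),
        "history": conv_store.as_retriever(search_kwargs={"k": 1}),
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke("How do I calibrate the arm before a procedure?")
# The chain server would then write this query/answer pair back into conv_store.
```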
| Model | Embedding | Ranking (Optional) | Framework | Vector Database | File Types |
|---|---|---|---|---|---|
| meta/llama3-70b-instruct | nvidia/nv-embedqa-e5-v5 | nvidia/nv-rerankqa-mistral-4b-v3 | LangChain | Milvus | TXT, PDF, MD |
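When the optional reranker is enabled, it can sit between document retrieval and the prompt so that only the highest-scoring chunks are passed to the LLM. A hedged sketch, again with an assumed collection name and Milvus address:

```python
# Hedged sketch of the optional reranking step: retrieve a broad candidate set
# from Milvus, then keep only the chunks the reranker scores highest.
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.vectorstores import Milvus
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank

embeddings = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5")
doc_store = Milvus(
    embeddings,
    collection_name="ifu_docs",  # assumed collection name
    connection_args={"host": "localhost", "port": "19530"},
)

reranker = NVIDIARerank(model="nvidia/nv-rerankqa-mistral-4b-v3", top_n=4)
reranking_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=doc_store.as_retriever(search_kwargs={"k": 20}),
)

# Returns only the top-ranked chunks for the query.
chunks = reranking_retriever.invoke("How do I clean the endoscope between cases?")
```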
Complete the common prerequisites. This will give you two API keys for the next section, one for `NVIDIA_API_KEY` and one for `NGC_API_KEY`.
If you would like to enable RIVA ASR (automatic speech recognition) and TTS (text-to-speech) so you can interact with the RAG by voice, refer to the Local RIVA Server section for running RIVA containers on the local machine's compute resources, or refer to the Riva API Endpoint on NVIDIA API Catalog to access RIVA speech models hosted on the NVIDIA API Catalog instead of hosting them locally. Refer to the rest of the RIVA documentation to configure the docker-compose.yaml file.
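Before wiring RIVA into the playground, it can help to confirm that the speech endpoint is reachable. Below is a minimal connectivity sketch assuming the nvidia-riva-client Python package; the API Catalog endpoint and ASR function ID mirror the values used later in this README, and the locally hosted variant is shown as a comment.

```python
# Hedged sketch: construct a Riva client against the chosen endpoint to verify
# connectivity. Requires the nvidia-riva-client package and NVIDIA_API_KEY.
import os

import riva.client

# Option A: Riva API Endpoint on NVIDIA API Catalog (the default in this example).
auth = riva.client.Auth(
    use_ssl=True,
    uri="grpc.nvcf.nvidia.com:443",
    metadata_args=[
        ["function-id", "1598d209-5e27-4d3c-8079-4751568b1081"],  # ASR function ID
        ["authorization", f"Bearer {os.environ['NVIDIA_API_KEY']}"],
    ],
)

# Option B: locally hosted Riva server (no metadata required).
# auth = riva.client.Auth(uri="<riva-ip-address>:50051")

asr_service = riva.client.ASRService(auth)
# A TTS client would be built the same way, using the TTS function ID instead.
print("Riva ASR client created:", asr_service is not None)
```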
The Milvus vector database is GPU-enabled by default in this IFU RAG application, as specified in docker-compose-vectordb.yaml. Refer to the section Configuring Milvus with GPU Acceleration for how to turn GPU acceleration on or off.
- Export the API keys from the common prerequisites above as environment variables:

  ```bash
  export NVIDIA_API_KEY="nvapi-<...>"
  export NGC_API_KEY="<...>"
  ```
- Depending on whether you're running RIVA TTS and ASR locally or using the Riva API Endpoint on NVIDIA API Catalog, modify the `environment:` section of `rag-playground` in the docker-compose.yaml file. Only one set of environment variables should be uncommented. By default, it is set to use the Riva API Endpoint on NVIDIA API Catalog.

  ```yaml
  # if locally hosting Riva:
  RIVA_API_URI: <riva-ip-address>:50051
  TTS_SAMPLE_RATE: 48000
  ```

  or

  ```yaml
  # if using Riva API Endpoint on NVIDIA API Catalog
  RIVA_API_URI: grpc.nvcf.nvidia.com:443
  NVIDIA_API_KEY: ${NVIDIA_API_KEY}
  RIVA_ASR_FUNCTION_ID: 1598d209-5e27-4d3c-8079-4751568b1081
  RIVA_TTS_FUNCTION_ID: 0149dedb-2be8-4195-b9a0-e57e0e14f972
  TTS_SAMPLE_RATE: 48000
  ```

  Note that the `RIVA_ASR_FUNCTION_ID` and `RIVA_TTS_FUNCTION_ID` values come from the `--metadata function-id` field shown at https://build.nvidia.com/nvidia/parakeet-ctc-1_1b-asr/api and https://build.nvidia.com/nvidia/fastpitch-hifigan-tts/api. If other models on https://build.nvidia.com/explore/speech are of interest, you can try them as well by changing these two environment variables.
- Start the containers:

  ```bash
  cd industries/healthcare/medical-device-training-assistant
  docker compose up -d --build
  ```

  Example output:

  ```
  ✔ Network nvidia-rag           Created
  ✔ Container milvus-etcd        Running
  ✔ Container milvus-minio      Running
  ✔ Container milvus-standalone  Running
  ✔ Container chain-server       Started
  ✔ Container rag-playground     Started
  ```
- Confirm the containers are running:

  ```bash
  docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
  ```

  Example output:

  ```
  CONTAINER ID   NAMES               STATUS
  dd4fc3da6c9c   rag-playground      Up About a minute
  ac1f039a1db8   chain-server        Up About a minute
  cd0a57ee20e0   milvus-standalone   Up 2 hours
  a36370e7ed75   milvus-minio        Up 2 hours (healthy)
  a796a4e59b68   milvus-etcd         Up 2 hours (healthy)
  ```
- Open a web browser and access http://localhost:8090 to use the RAG Playground. Refer to Using the Sample Web Application for information about uploading documents and using the web interface. If you see any errors while navigating the web UI, use `docker logs chain-server` or `docker logs rag-playground` to inspect the logs, since the errors shown in the web UI may not describe the entire trace. If you would rather exercise the pipeline without the web UI, see the sketch below this list.
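For exercising the pipeline programmatically, the sketch below assumes the chain server follows the upstream multi-turn RAG example's HTTP API, with a document-upload endpoint and a generate endpoint served on port 8081. The endpoint paths, port, and payload fields are assumptions, so check the chain server's OpenAPI page (for example, http://localhost:8081/docs) if your deployment differs.

```python
# Hedged sketch: ingest an IFU and ask a question over HTTP, assuming the
# chain server exposes /documents and /generate endpoints on port 8081.
import requests

CHAIN_SERVER = "http://localhost:8081"  # assumed chain server address

# Upload a document into the knowledge base (multipart file upload).
with open("ifu.pdf", "rb") as f:
    resp = requests.post(f"{CHAIN_SERVER}/documents", files={"file": f})
    resp.raise_for_status()

# Ask a question against the ingested document; the answer streams back.
payload = {
    "messages": [{"role": "user", "content": "How do I dock the patient cart?"}],
    "use_knowledge_base": True,
    "max_tokens": 256,
}
with requests.post(f"{CHAIN_SERVER}/generate", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```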
- Refer to Vector Database Customizations to disable GPU acceleration for the Milvus vector database.
- Stop the containers by running `docker compose down`.
- If you're interested in adapting this application to use on-premises machines for NVIDIA NIM microservices for the LLM, embedding model, and reranking model, refer to the local-deploy README for the requirements and configuration.
The evaluation/ directory describes the process of evaluating the IFU RAG with pairs of questions and answers as ground truth.
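As a rough illustration of that process, the sketch below sends each question to the RAG over HTTP and computes a crude token-overlap score against the reference answer. The CSV layout, the chain-server endpoint, and the scoring method are assumptions for illustration only; the evaluation/ directory documents the actual procedure and metrics.

```python
# Hedged sketch of a ground-truth evaluation loop. Assumes a CSV file with
# "question" and "answer" columns and the chain server's /generate endpoint.
import csv

import requests

CHAIN_SERVER = "http://localhost:8081"  # assumed chain server address


def ask(question: str) -> str:
    """Query the RAG and return the raw response body (not parsed, for brevity)."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "use_knowledge_base": True,
        "max_tokens": 256,
    }
    resp = requests.post(f"{CHAIN_SERVER}/generate", json=payload)
    resp.raise_for_status()
    return resp.text


def token_overlap(generated: str, reference: str) -> float:
    """Crude recall-style overlap between the generated and reference answers."""
    ref_tokens = set(reference.lower().split())
    gen_tokens = set(generated.lower().split())
    return len(ref_tokens & gen_tokens) / max(len(ref_tokens), 1)


with open("qa_ground_truth.csv", newline="") as f:
    for row in csv.DictReader(f):
        score = token_overlap(ask(row["question"]), row["answer"])
        print(f"{row['question'][:60]}... overlap={score:.2f}")
```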