Amazon MemoryDB for Redis

Retrieval Augmented Generation with MemoryDB as VectorStore

Large language models are prone to hallucination, which is just a fancy word for making up a response. To correctly and consistently answer questions, we need to ensure that the model has real information available to support its responses. We use the Retrieval-Augmented Generation (RAG) pattern to make this happen.

With Retrieval-Augmented Generation, we first pass a user's prompt to a data store. This might be in the form of a query to Amazon Kendra. We could also create a numerical representation of the prompt using Amazon Titan Embeddings to pass to a vector database. We then retrieve the most relevant content from the data store to support the large language model's response.

In this lab, we will use Amazon MemoryDB for Redis as a durable, in-memory vector store to demonstrate the RAG pattern.
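As a rough illustration of the embedding step, the snippet below is a minimal sketch (assuming the langchain_community package is installed and the amazon.titan-embed-text-v1 model is enabled in your account) of turning a user's prompt into a vector that can be sent to the vector store:

import boto3
from langchain_community.embeddings import BedrockEmbeddings

# Bedrock runtime client; assumes Bedrock is available in us-east-1
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")
embeddings = BedrockEmbeddings(client=bedrock_client, model_id="amazon.titan-embed-text-v1")

# Create the numerical representation (vector) of a user's prompt
query_vector = embeddings.embed_query("What is MemoryDB for Redis?")
print(len(query_vector))  # Titan Embeddings G1 - Text returns a 1536-dimensional vector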

This tutorial will walk you through the steps to deploy a Python chatbot application using Streamlit on AWS Cloud9. This is the architecture we will be implementing today.

Architecture Diagram

The application is contained in the ragmm_app.py file, and it requires specific packages listed in requirements.txt.

Prerequisites

Before you proceed, make sure you have the following prerequisites in place:

  1. An AWS Cloud9 development environment set up.

  2. Access to Amazon Bedrock, which we will use to invoke foundation models in this workshop.

  3. Foundation models such as Claude enabled in Amazon Bedrock, as shown below:

    Bedrock Model

  4. Python and pip installed in your Cloud9 environment.

  5. Internet connectivity to download packages.

Installation

  1. Clone this repository to your Cloud9 environment:
git clone https://github.com/aws-samples/amazon-memorydb-for-redis-samples
cd amazon-memorydb-for-redis-samples/tutorials/memorydb-rag
  2. Install the required packages using pip:
pip3 install -r requirements.txt -U
  3. Install the LangChain vector store plugin for MemoryDB (it is in the same repository cloned in step 1):
cd ../langchain-memorydb
pip install . 

To use the MemoryDB VectorStore, import the MemoryDB class (a short sketch of wiring it up follows these installation steps):

from langchain_memorydb import MemoryDB as Redis
  4. Configure the environment variables:
export BWB_ENDPOINT_URL=https://bedrock-runtime.us-east-1.amazonaws.com
export MemoryDB_ENDPOINT_URL=redis://CLUSTER_ENDPOINT:PORT
export BWB_PROFILE_NAME=IF_YOU_NEED_TO_USE_AN_AWS_CLI_PROFILE_IT_GOES_HERE
export BWB_REGION_NAME=REGION_NAME_GOES_HERE_IF_YOU_NEED_TO_OVERRIDE_THE_DEFAULT_REGION
  5. You can run the following commands to confirm:
echo $BWB_ENDPOINT_URL
echo $BWB_PROFILE_NAME
echo $BWB_REGION_NAME
  6. Run the application (from the tutorials/memorydb-rag directory):
streamlit run ragmm_app.py --server.port 8080
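The sketch below shows one way the pieces from steps 3-5 might be wired together. It assumes the langchain_memorydb plugin exposes the same interface as LangChain's Redis vector store (the class is aliased to Redis above); the index name memorydb-rag is a hypothetical example, and the exact code in ragmm_app.py may differ.

import os
import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_memorydb import MemoryDB as Redis

# Bedrock runtime client built from the environment variables configured above
bedrock_client = boto3.client(
    "bedrock-runtime",
    region_name=os.environ.get("BWB_REGION_NAME", "us-east-1"),
    endpoint_url=os.environ.get("BWB_ENDPOINT_URL"),
)
embeddings = BedrockEmbeddings(client=bedrock_client, model_id="amazon.titan-embed-text-v1")

# Connect to an existing MemoryDB index (constructor assumed to mirror LangChain's Redis class)
vectorstore = Redis(
    redis_url=os.environ["MemoryDB_ENDPOINT_URL"],
    index_name="memorydb-rag",  # hypothetical index name
    embedding=embeddings,
)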

Features

Vector Store creation

If the index has not yet been created and the data has not been loaded into MemoryDB, you can select this radio button.
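Index creation typically follows the standard LangChain ingestion flow; the sketch below is an illustration of that flow rather than the exact application code. The file name memorydb-guide.pdf and the index name memorydb-rag are hypothetical, and the MemoryDB class is again assumed to mirror LangChain's Redis vector store.

import os
import boto3
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import BedrockEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_memorydb import MemoryDB as Redis

# Load the source PDF and split it into overlapping chunks
loader = PyPDFLoader("memorydb-guide.pdf")  # hypothetical file name
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(loader.load())

# Embed each chunk with Titan Embeddings and store the vectors in MemoryDB
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")
embeddings = BedrockEmbeddings(client=bedrock_client, model_id="amazon.titan-embed-text-v1")
vectorstore = Redis.from_documents(
    chunks,
    embeddings,
    redis_url=os.environ["MemoryDB_ENDPOINT_URL"],
    index_name="memorydb-rag",  # hypothetical index name
)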

Using Vector database for RAG

If the index has already been created, the following appears when we first load the application.

Index created

Testing context-based learning and retriever capabilities

The vector database is loaded with the MemoryDB Developer Guide.

For more detailed information, refer to the MemoryDB Developer Guide.

Here are a few sample questions we can ask:

  1. What is MemoryDB for Redis?
  2. How do you create a MemoryDB cluster?
  3. What are some reasons a highly regulated industry should pick MemoryDB?

Ask question without context

LangChain framework for building a chatbot with Amazon Bedrock

LangChain provides convenient ways to compose modular utilities into chains. It lets us define and interact with different types of abstractions, which makes it straightforward to build powerful chatbots.
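For example, a basic conversational chain backed by a Bedrock model might look like the following minimal sketch (the Claude v2 model ID and parameters are illustrative assumptions, not the application's exact configuration):

import boto3
from langchain_community.llms import Bedrock
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")
llm = Bedrock(client=bedrock_client, model_id="anthropic.claude-v2", model_kwargs={"max_tokens_to_sample": 512})

# A simple chain that keeps the chat history in memory between turns
chain = ConversationChain(llm=llm, memory=ConversationBufferMemory())
print(chain.predict(input="What is MemoryDB for Redis?"))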

Building Chatbot with Context - Key Elements

Chatbot with Context

In this use case, we will ask the chatbot to answer questions from an external corpus. To do this we apply the RAG (Retrieval Augmented Generation) pattern: the idea is to index the corpus in chunks, then use semantic similarity between the chunks and the question to look up which sections of the corpus might be relevant to the answer. Finally, the most relevant chunks are aggregated and passed as context to the ConversationChain, similar to providing chat history.

We will take a PDF file and use the Titan Embeddings model to create vectors. These vectors are then stored in Amazon MemoryDB, an in-memory vector database.

When the chatbot is asked a question, we query MemoryDB with the question and retrieve the text that is semantically closest. This text is then passed to the model as context to generate the answer.
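Putting this together, a retrieval chain over the MemoryDB vector store might look like the sketch below. It reuses the vectorstore and llm objects from the earlier sketches and assumes a standard LangChain RetrievalQA chain; the chain type and parameters are illustrative rather than the exact application code.

from langchain.chains import RetrievalQA

# Use the MemoryDB vector store as a retriever of the most relevant chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Retrieved chunks are passed as context to the Bedrock LLM to ground its answer
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,  # Bedrock LLM from the earlier sketch
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)
result = qa_chain.invoke({"query": "How do you create a MemoryDB cluster?"})
print(result["result"])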

Ask question with context

Similarity search

To see what input is passed to the LLM as context, we can execute this search directly on the document store, which runs a vector similarity search (VSS).

Similarity Search
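A direct similarity search against the vector store might look like this minimal sketch (reusing the vectorstore object from the earlier sketches; the query string is just an example):

# Return the three chunks whose embeddings are closest to the question
docs = vectorstore.similarity_search("What is MemoryDB for Redis?", k=3)
for doc in docs:
    print(doc.page_content[:200])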