What is Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is a technique used in natural language processing that combines information retrieval with text generation, so that an AI model's responses are both contextually relevant and grounded in retrieved content.
At its core, RAG involves two main components:
- Retriever: Think of it like a search engine that finds relevant information in a knowledge base, usually a vector database. In this sample, we're using Azure Cosmos DB for NoSQL as our vector database.
- Generator: Acts like a writer, taking the prompt and the retrieved information to create a response. Here we're using a Large Language Model (LLM) for this task.
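To make the flow concrete, here is a minimal sketch of how the two components fit together. The function names `searchVectorDb` and `callLlm` are hypothetical placeholders, not the sample's actual API:

```typescript
// Hypothetical placeholders: a vector search over the knowledge base and an
// LLM completion call. Neither is the sample's actual code.
declare function searchVectorDb(query: string, options: { topK: number }): Promise<string[]>;
declare function callLlm(prompt: string): Promise<string>;

async function answerQuestion(question: string): Promise<string> {
  // 1. Retriever: find the document chunks most relevant to the question.
  const relevantChunks = await searchVectorDb(question, { topK: 3 });

  // 2. Generator: ask the LLM to answer using the retrieved context.
  const prompt = `Answer the question using only the context below.

Context:
${relevantChunks.join('\n---\n')}

Question: ${question}`;
  return callLlm(prompt);
}
```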
How can we upload additional documents without redeploying everything?
To upload more documents, first put your PDF file in the `data/` folder.

If the API is running locally, make sure it's started by running `npm run start:api` from the root of the project. Then you can use one of the following commands to upload a new PDF document:
```sh
# If you're using a POSIX shell
curl -F "file=@data/<your-document.pdf>" http://localhost:7071/api/documents
```

```powershell
# If you're using PowerShell
Invoke-RestMethod -Uri "http://localhost:7071/api/documents" -Method Post -InFile "./data/<your-document.pdf>"
```
You can also use the following command to re-upload all PDF files in the `/data` folder at once:

```sh
npm run upload:docs
```
If the API is deployed on Azure, you first need to find the URL of the deployed function. You can either look at the `packages/api/.env` file and search for the `API_URI` variable, or run this command to get the URL:

```sh
azd env get-values | grep API_URI
```
Then you can use one of the following commands to upload a new PDF document:
```sh
# If you're using a POSIX shell
curl -F "file=@data/<your-document.pdf>" <your_api_url>/api/documents
```

```powershell
# If you're using PowerShell
Invoke-RestMethod -Uri "<your_api_url>/api/documents" -Method Post -InFile "./data/<your-document.pdf>"
```
You can also use the following command to re-upload all PDF files in the `/data` folder at once:

```sh
node scripts/upload-documents.js <your_api_url>
```
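If you prefer to script the upload yourself, here is a small Node.js (18+) sketch that posts a PDF to the same `/api/documents` endpoint using the built-in `fetch` and `FormData`. The file name is a placeholder, and the base URL works for either the local or the deployed API:

```typescript
import { readFile } from 'node:fs/promises';

// Usage: node upload.mjs <api-url>
// Defaults to the local API if no URL is given.
const apiUrl = process.argv[2] ?? 'http://localhost:7071';

// Placeholder file name; replace with your actual PDF in the data/ folder.
const pdf = await readFile('./data/your-document.pdf');

const form = new FormData();
form.append('file', new Blob([pdf], { type: 'application/pdf' }), 'your-document.pdf');

const response = await fetch(`${apiUrl}/api/documents`, { method: 'POST', body: form });
console.log(response.status, await response.text());
```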
Why do we need to break up the documents into chunks?
Chunking allows us to limit the amount of information we send to the LLM, because of its token limit. By breaking the content into smaller pieces, we can more easily find the most relevant chunks of text to inject into the prompt, improving the relevance of the results. The chunking method we use relies on a sliding window of text, so the sentences that end one chunk also start the next one. This reduces the chance of losing the surrounding context of the text.
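As an illustration, here is a simplified sliding-window chunker along the lines described above. The chunk size and overlap values are arbitrary, and the sample's actual implementation may differ:

```typescript
// Simplified sliding-window chunking: consecutive chunks share `overlap`
// sentences so context isn't lost at chunk boundaries.
// Assumes overlap < sentencesPerChunk.
function chunkText(text: string, sentencesPerChunk = 5, overlap = 1): string[] {
  // Naive sentence splitter; a real implementation would be more robust.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks: string[] = [];
  const step = sentencesPerChunk - overlap;
  for (let i = 0; i < sentences.length; i += step) {
    chunks.push(sentences.slice(i, i + sentencesPerChunk).join(' '));
    // Stop once the last sentence has been included in a chunk.
    if (i + sentencesPerChunk >= sentences.length) break;
  }
  return chunks;
}
```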
How do you change the models used in this sample?
You can use environment variables to change the chat and embeddings models used by this sample when it's deployed. Run these commands:

```sh
azd env set AZURE_OPENAI_API_MODEL gpt-4
azd env set AZURE_OPENAI_API_MODEL_VERSION 0125-preview
azd env set AZURE_OPENAI_API_EMBEDDINGS_MODEL text-embedding-3-large
azd env set AZURE_OPENAI_API_EMBEDDINGS_MODEL_VERSION 1
```
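At runtime these values typically surface as environment variables that the API reads. A hypothetical sketch of what that lookup might look like (the exact wiring and the defaults shown are assumptions, not the sample's actual code):

```typescript
// Hypothetical: reading the model settings set via `azd env set` above.
const chatModel = process.env.AZURE_OPENAI_API_MODEL ?? 'gpt-4';
const chatModelVersion = process.env.AZURE_OPENAI_API_MODEL_VERSION ?? '0125-preview';
const embeddingsModel = process.env.AZURE_OPENAI_API_EMBEDDINGS_MODEL ?? 'text-embedding-3-large';
```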
You may also need to adjust the capacity in the `infra/main.bicep` file, depending on the TPM (tokens per minute) quota allowed for your account.
To change the local models used by Ollama, you can edit the file `packages/api/src/constants.ts`:

```typescript
export const ollamaEmbeddingsModel = 'all-minilm:l6-v2';
export const ollamaChatModel = 'mistral:v0.2';
```
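For reference, constants like these are typically passed to the Ollama clients when they are created. A sketch of what that might look like with the LangChain.js community integrations (an assumption; this may not match the sample's actual wiring):

```typescript
import { ChatOllama } from '@langchain/community/chat_models/ollama';
import { OllamaEmbeddings } from '@langchain/community/embeddings/ollama';
import { ollamaChatModel, ollamaEmbeddingsModel } from './constants.js';

// Assumed wiring: create local Ollama chat and embeddings clients
// from the constants defined above.
const chat = new ChatOllama({ model: ollamaChatModel });
const embeddings = new OllamaEmbeddings({ model: ollamaEmbeddingsModel });
```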
You can see the complete list of available models at https://ollama.ai/models.
After changing the models, you also need to fetch the new models by running the command:

```sh
ollama pull <model-name>
```
What does the azd up command do?

The `azd up` command comes from the Azure Developer CLI and takes care of both provisioning the Azure resources and deploying code to the selected Azure hosts.
The `azd up` command uses the `azure.yaml` file combined with the infrastructure-as-code `.bicep` files in the `infra/` folder. The `azure.yaml` file for this project declares hooks for the `prepackage` and `postprovision` steps. The `up` command first runs the `prepackage` hook, which installs Node dependencies and builds the TypeScript files. It then packages all the code (both frontend and backend services) into a zip file that it will deploy later.
Next, it provisions the resources based on `main.bicep` and `main.parameters.json`. At that point, since there is no default value for the OpenAI resource location, it asks you to pick a location from a short list of available regions. It then sends requests to Azure to provision all the required resources. Once everything is provisioned, it runs the `postprovision` hook to process the local data and add it to an Azure Cosmos DB index.
Finally, it looks at `azure.yaml` to determine the Azure host (Functions and Static Web Apps, in this case) and uploads the zip to Azure. The `azd up` command is now complete, but it may take some time for the app to be fully available and working after the initial deploy.
Related commands are `azd provision`, for provisioning only (when infra files change), and `azd deploy`, for deploying updated app code only.