This repository demonstrates a set of resources that use Azure AI Document Intelligence to extract text, at high throughput, from documents stored in Azure Blob Storage. It then uses Semantic Kernel, Azure OpenAI, and Azure AI Search to index the contents of these documents. The solution can process documents in a variety of formats, including Office documents, PDF, PNG, and JPEG.
IMPORTANT! In addition to running the solution below with multiple Document Intelligence instances, it is beneficial to request a transaction limit increase for your Document Intelligence accounts. Instructions for how to do this can be found in the Azure AI Document Intelligence documentation.
This solution leverages the following Azure services:
- Azure AI Document Intelligence - the Azure AI service API that performs the document intelligence, extraction, and processing.
- Azure OpenAI - the Azure AI service API that computes the semantic embeddings of the extracted text.
- Azure API Management - used to load balance across multiple Azure OpenAI instances.
- Azure AI Search - the Azure AI service that indexes the extracted text for search and analysis.
- Azure Blob Storage with three containers:
  - documents - starting location for your bulk upload of documents to be processed
  - processresults - the extracted text output from the Document Intelligence service
  - completed - location where the original documents are moved once they have been successfully processed by Document Intelligence
- Azure Service Bus with three queues:
  - docqueue - contains the messages for the files that need to be processed by the Document Intelligence service
  - toindexqueue - contains the messages for the files that have been processed by the Document Intelligence service and whose results are ready to be indexed by Azure AI Search
  - processedqueue - contains the messages for the files that have been processed by the Document Intelligence service and are ready to be moved to the completed blob container
- Azure Functions:
  - DocumentQueueing - identifies the files in the documents blob container and sends a claim check message (containing the file name) to the docqueue queue. This function is triggered by an HTTP call, but could also be modified to use a Blob Trigger.
  - DocumentIntelligence - sends each message in docqueue to Document Intelligence for processing, then marks the blob metadata as "processed" and creates new messages in toindexqueue and processedqueue. This function employs scale limiting and Polly retries with backoff when Document Intelligence replies with "too many requests" (HTTP 429), to balance maximum throughput against overloading the API endpoint.
  - AiSearchIndexing - processes messages in toindexqueue to get embeddings of the extracted text from Azure OpenAI and saves those embeddings to Azure AI Search.
  - FileMover - processes messages in processedqueue to move files from the documents container to the completed container.
  - AskQuestions - simple HTTP function that demonstrates RAG retrieval by allowing you to ask questions about the indexed documents.
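The retry-with-backoff behavior described for the DocumentIntelligence function can be illustrated with a small, self-contained sketch. This is not the repository's code (which uses Polly in .NET); it is a Python approximation of the same idea. The `TooManyRequests` exception, `call_with_backoff`, and `make_flaky_operation` are hypothetical names introduced only for illustration.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 reply from the service."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying throttled calls with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the throttling error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * (2 ** attempt))


def make_flaky_operation(failures):
    """Build a fake service call that is throttled `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TooManyRequests()
        return "extracted text"

    return operation
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A production policy would typically also add jitter and cap the maximum delay.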
To further allow for high throughput, the DocumentIntelligence function can distribute processing across 1 to 10 separate Document Intelligence accounts. This is managed by the DocumentQueueing function, which automatically adds a RecognizerIndex value of 0-9 to each message when queueing the files for processing.
The DocumentIntelligence function then distributes the files to the appropriate account (regardless of the number of Document Intelligence accounts actually provisioned).
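As a rough illustration of the distribution described above, the sketch below builds claim check messages that carry a RecognizerIndex of 0-9 and maps each index onto however many accounts exist. The `FileName` field name, the cycling assignment of indexes, and the modulo mapping are all assumptions made for this example; the source does not state how the repository assigns or resolves the index.

```python
import json
from itertools import cycle

INDEX_SLOTS = 10  # RecognizerIndex values run from 0 to 9


def build_messages(file_names):
    """Create one claim check message per file, cycling RecognizerIndex over 0-9."""
    indexes = cycle(range(INDEX_SLOTS))
    return [
        # "FileName" is a hypothetical field name; only the file name travels
        # in the message, the document itself stays in Blob Storage.
        json.dumps({"FileName": name, "RecognizerIndex": next(indexes)})
        for name in file_names
    ]


def account_for(recognizer_index, account_count):
    """Map a 0-9 index onto the provisioned accounts (assumed: modulo)."""
    if account_count < 1:
        raise ValueError("at least one account is required")
    return recognizer_index % account_count
```

With three accounts, for example, `account_for(7, 3)` selects account 1, and with a single account every index resolves to account 0, so the queued messages stay valid whatever the account count is.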
To configure multiple Document Intelligence accounts with the script below, set -docIntelligenceInstanceCount to a value between 1 and 10 (default is 1). To configure them manually, add all of the Document Intelligence account keys to the Azure Key Vault's DOCUMENT-INTELLIGENCE-KEY secret, separated by pipe (|) characters.
Assumption: all instances of Document Intelligence share the same URL (such as: https://eastus.api.cognitive.microsoft.com/)
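To make the pipe-separated secret format concrete, here is a minimal sketch of how such a value could be split and paired with the single shared endpoint. The function names and the dictionary shape are hypothetical; the repository's own parsing code may differ.

```python
def parse_keys(secret_value):
    """Split a pipe-separated secret into a list of non-empty account keys."""
    keys = [part.strip() for part in secret_value.split("|")]
    return [key for key in keys if key]


def build_accounts(secret_value, endpoint):
    """Pair every key with the one endpoint URL that all accounts share."""
    return [{"endpoint": endpoint, "key": key} for key in parse_keys(secret_value)]
```

For example, `build_accounts("key-a|key-b", "https://eastus.api.cognitive.microsoft.com/")` yields two account entries that differ only in their key.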
As with Document Intelligence, you can deploy multiple Azure OpenAI accounts to ensure high throughput. To assist with load balancing, the accounts are fronted by Azure API Management, which handles the load balancing and provides a circuit breaker should an instance become overloaded.
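The following sketch illustrates, conceptually, what priority and round-robin load balancing with a circuit breaker mean. In the actual solution this behavior is provided by Azure API Management configuration, not by application code; the `Backend` and `LoadBalancer` classes, the 30-second cooldown, and the treatment of lower numbers as higher priority are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    priority: int = 1
    tripped_until: float = 0.0  # time until which the circuit stays open

    def available(self, now):
        return now >= self.tripped_until

    def trip(self, now, cooldown=30.0):
        """Open the circuit: take this backend out of rotation for `cooldown` seconds."""
        self.tripped_until = now + cooldown


class LoadBalancer:
    def __init__(self, backends, mode="round-robin"):
        if mode not in ("round-robin", "priority"):
            raise ValueError(f"unknown mode: {mode}")
        self.backends = list(backends)
        self.mode = mode
        self._next = 0  # rotation cursor

    def pick(self, now):
        """Return the next healthy backend, or None if every circuit is open."""
        healthy = [b for b in self.backends if b.available(now)]
        if not healthy:
            return None
        if self.mode == "priority":
            # Assumption: a lower number means a higher priority, and only
            # the best healthy tier receives traffic.
            best = min(b.priority for b in healthy)
            healthy = [b for b in healthy if b.priority == best]
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice
```

In priority mode, traffic moves to the next tier only while the preferred backend's circuit is open, and returns once the cooldown expires. In round-robin mode, the priority value is ignored and requests rotate across all healthy backends.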
To try out the sample end-to-end process, you will need:
- An Azure subscription in which you have privileges to create resources.
- The Azure CLI installed.
- IMPORTANT: Open and edit the main.bicepparam file found in the infra folder. This file contains the information needed to properly deploy the API Management and Azure OpenAI accounts:
  - APIM settings
    - apiManagementPublisherEmail - set this to your email or the email address of the APIM owner
    - apiManagementPublisherName - your name or the name of the APIM owner
  - Azure OpenAI model settings
    - azureOpenAIEmbeddingModel - the embedding model you will use to generate the embeddings
    - embeddingModelVersion - the version of the embedding model to use
    - embeddingMaxTokens - the maximum "chunk" size used to split up large documents for embedding and indexing. Be sure it does not exceed the limit of the model you have chosen.
    - azureOpenAIChatModel - the chat/completions model to use
    - chatModelVersion - the version of the chat model
  - Azure OpenAI deployment settings

    For each deployment you want to create, add an object as per the example below (note that name is optional). The value of priority is only used if $loadBalancingType is set to priority (vs. round-robin).

        var eastUs = {
          name: ''
          location: 'eastus'
          suffix: 'eastus'
          priority: 1
        }

    Then add that object variable to the openAIInstances parameter value, such as:

        param openAIInstances = [
          eastUs
          eastus2
          canadaEast
        ]
- Log in to the Azure CLI:

      az login

- Run the deployment command:

      .\deploy.ps1 -appName "<less than 6 characters>" -location "<azure region>" -docIntelligenceInstanceCount "<number needed>" -loadBalancingType "<priority or round-robin>"
These scripts will create all of the Azure resources and RBAC role assignments needed for the demonstration.
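The embeddingMaxTokens setting above caps the size of the chunks that large documents are split into before embedding. The sketch below shows the general idea only: it approximates tokens with whitespace-separated words, whereas a real tokenizer is model-specific and the repository's own splitting logic is not shown in this document. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, max_tokens):
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    words = text.split()  # crude stand-in for a model-specific tokenizer
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]
```

Because real token counts usually exceed word counts, a word-based limit like this one would need headroom below the model's actual token limit.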
To exercise the code and run the demo, follow these steps:
- Upload sample files to the storage account's documents container. To help with this, you can try the supplied PowerShell script BulkUploadAndDuplicate.ps1. This script takes a directory of local files and uploads them to the storage container. Then, based on your settings, it duplicates them to help you easily create a large library of files to process.

      .\BulkUploadAndDuplicate.ps1 -path "<path to dir with sample file>" -storageAccountName "<storage account name>" -containerName "incoming" -counterStart 0 -duplicateCount 10

  The sample script above would upload all of the files found in the -path directory, then create copies of them prefixed with 000000 through 000010. You can of course upload the files any way you see fit.
- In the Azure portal, navigate to the resource group that was created and locate the function app with Queueing in its name. Then open the Functions list and select the DocumentQueueing function. Under "Code + Test", select Test/Run and click "Run" (no query parameters are needed). This kicks off the queueing process for all of the files in the documents storage container. The output is the number of files that were queued.
- Once messages start getting queued, the DocumentIntelligence function will start picking up the messages and begin processing. You should see the number of messages in the docqueue queue go down as they are successfully processed. You will also see new files being created in the processresults container.
- Simultaneously, as the DocumentIntelligence function completes its processing and queues messages in toindexqueue and processedqueue, the AiSearchIndexing function will start picking up messages from toindexqueue and send the extracted text in the processresults container to Azure OpenAI for embedding calculation and then to Azure AI Search for indexing. The FileMover function will also begin picking up messages from processedqueue and moving the processed files from the documents container into the completed container.
- You can review the execution and timings of the end-to-end process.
- Use the AskQuestions function to demonstrate RAG retrieval over the indexed documents.
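For readers new to RAG, the sketch below shows the retrieval step in miniature: rank stored chunks by vector similarity to the question, then place the best matches into a prompt. In the actual solution the embeddings come from Azure OpenAI and the search is performed by Azure AI Search; here the vectors are hand-written toy values, and `cosine_similarity`, `top_matches`, and `build_prompt` are hypothetical helpers used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vector, index, k=2):
    """Return the k chunk texts whose embeddings are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vector, entry["vector"]),
        reverse=True,
    )
    return [entry["text"] for entry in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into a single prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy index: in practice each vector would be an embedding of the chunk text.
TOY_INDEX = [
    {"text": "invoice total is 42", "vector": [1.0, 0.0, 0.0]},
    {"text": "contract ends in May", "vector": [0.0, 1.0, 0.0]},
    {"text": "invoice due in June", "vector": [0.9, 0.1, 0.0]},
]
```

Calling `top_matches([1.0, 0.0, 0.0], TOY_INDEX)` returns the two invoice chunks, which `build_prompt` then hands to the chat model together with the question.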
- Smart load balancing for OpenAI endpoints and Azure API Management (priority load balancing)
- Using Azure API Management Circuit Breaker and Load balancing with Azure OpenAI Service (round robin load balancing)
- Azure OpenAI Service Load Balancing with Azure API Management (round robin load balancing)