This repository now includes an example of integrating GPT-4 Turbo with Vision and Azure AI Search. The feature enables indexing and searching images and graphs, such as those found in financial documents, in addition to text-based content.
- Document Handling: Source documents are split into pages and saved as PNG files in blob storage. Each file's name and page number are embedded for reference.
- Data Extraction: Text data is extracted using OCR.
- Data Indexing: Text and image embeddings, generated using Azure AI Vision Embeddings, are indexed in Azure AI Search along with the raw text.
- Search and Response: Searches can be conducted using vectors or hybrid methods. Responses are generated by GPT-4 Turbo with Vision based on the retrieved content.
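
The per-page flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the repository's actual code: the names `PageRecord`, `page_blob_name`, and `build_record`, the field names, and the `<file name>-<page>.png` naming scheme are all hypothetical, and the two embedding callables are stand-ins for whatever service calls produce the text and image vectors.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, List


@dataclass
class PageRecord:
    """One search-index entry per document page (hypothetical field names)."""
    id: str
    source_page: str              # blob name of the page's PNG image
    content: str                  # raw text extracted by OCR
    text_embedding: List[float]
    image_embedding: List[float]


def page_blob_name(source_file: str, page_number: int) -> str:
    """Build a PNG blob name that carries the file name and page number (assumed scheme)."""
    stem = PurePosixPath(source_file).stem
    return f"{stem}-{page_number}.png"


def build_record(
    source_file: str,
    page_number: int,
    ocr_text: str,
    image_bytes: bytes,
    embed_text: Callable[[str], List[float]],
    embed_image: Callable[[bytes], List[float]],
) -> PageRecord:
    """Assemble one record; the embedding callables stand in for real service calls."""
    blob_name = page_blob_name(source_file, page_number)
    return PageRecord(
        id=blob_name.replace(" ", "_"),
        source_page=blob_name,
        content=ocr_text,
        text_embedding=embed_text(ocr_text),
        image_embedding=embed_image(image_bytes),
    )


if __name__ == "__main__":
    # Stub embedders so the sketch runs without any Azure resources.
    record = build_record(
        "reports/Financial Report.pdf", 3, "Revenue grew in Q3.", b"\x89PNG",
        embed_text=lambda text: [float(len(text))],
        embed_image=lambda data: [float(len(data))],
    )
    print(record.source_page)
```

Passing the embedders in as callables keeps the sketch independent of any particular SDK; in a real pipeline they would wrap the embedding service, and the resulting records would be uploaded to the search index.
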
- Create a Computer Vision account in the Azure Portal first, so that you can agree to the Responsible AI terms for that resource. You can delete that account after agreeing.
- Confirm that you can deploy a GPT-4 Turbo with Vision model in one of the supported regions. If you're not sure, try to create a deployment from your Azure OpenAI deployments page. You should be able to select:

  | Model | Version |
  | ----- | ------- |
  | gpt-4 | vision-preview |
- Ensure that you can deploy the Azure OpenAI resource group in a region where all required components are available:
  - Azure OpenAI models
    - gpt-35-turbo
    - text-embedding-ada-002
    - gpt-4v
  - Azure AI Vision
- Update repository: Pull the latest changes.
- Enable GPT-4 Turbo with Vision: Set the environment variable with `azd env set USE_GPT4V true`. This flag is used to deploy the components needed for vision functionality and to toggle UI components.
- Clean old deployments (optional): Run `azd down --purge` for a fresh setup.
- Start the application: Execute `azd up` to build, provision, deploy, and initiate document preparation.
- Access the developer options in the web app and select "Use GPT-4 Turbo with Vision".
- Sample questions will be updated for testing.
- Interact with the questions to view responses.
- The 'Thought Process' tab shows the retrieved data and its processing by GPT-4 Turbo with Vision.
Feel free to explore and contribute to enhancing this feature. For questions or feedback, use the repository's issue tracker.