Update from v3.2 to v4.0 of the Azure AI Vision API #829

Open: wants to merge 10 commits into develop


@dkotter (Collaborator) commented Nov 26, 2024

Description of the Change

In #559, we switched to the Azure AI Vision v3.2 API for all Features that rely on it. At the time, we decided not to move to v4.0 of that API because it was still in public preview and had some breaking changes.

The v4.0 API now seems more stable, so this PR switches the following Features over to it:

  • Descriptive Text Generator
  • Image Tags Generator
  • Image Text Extraction
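To illustrate how these three Features map onto v4.0, here is a minimal Python sketch (ClassifAI itself is a PHP plugin, so this is not its code). It assumes the public Image Analysis 4.0 REST shape: a `POST` to `/computervision/imageanalysis:analyze` with a comma-separated `features` query parameter (`caption`, `tags`, `read`), and OCR results under `readResult.blocks[].lines[].text`. The `api-version` value `2023-10-01`, the `FEATURE_TO_V4` mapping, the helper names, and the example endpoint are assumptions for illustration; sending the request (with the resource key in the `Ocp-Apim-Subscription-Key` header) is left out.

```python
from typing import List
from urllib.parse import urlencode

# Assumption: GA Image Analysis 4.0 api-version; the PR does not say which version it targets.
API_VERSION = "2023-10-01"

# Hypothetical mapping from the Features in this PR to v4.0 `features` values.
FEATURE_TO_V4 = {
    "descriptive_text_generator": "caption",
    "image_tags_generator": "tags",
    "image_text_extraction": "read",
}


def build_analyze_url(endpoint: str, features: List[str]) -> str:
    """Build one analyze URL that asks for every requested Feature at once."""
    v4_features = ",".join(FEATURE_TO_V4[f] for f in features)
    # safe="," keeps the comma-separated feature list unescaped in the URL.
    query = urlencode({"api-version": API_VERSION, "features": v4_features}, safe=",")
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


def extract_text(response: dict) -> str:
    """Join every OCR line from readResult, which arrives in the same response."""
    blocks = response.get("readResult", {}).get("blocks", [])
    return "\n".join(
        line["text"] for block in blocks for line in block.get("lines", [])
    )


# Usage with a hypothetical endpoint and a hand-written sample response.
url = build_analyze_url(
    "https://example.cognitiveservices.azure.com/",
    ["descriptive_text_generator", "image_tags_generator", "image_text_extraction"],
)
print(url)
# https://example.cognitiveservices.azure.com/computervision/imageanalysis:analyze?api-version=2023-10-01&features=caption,tags,read

sample = {"readResult": {"blocks": [{"lines": [{"text": "STOP"}, {"text": "Main St"}]}]}}
print(extract_text(sample))
```

Because OCR text comes back in the same response as any other requested feature, there is no second "fetch the result" call to make, which is the simplification that let the separate OCR class go away.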

It does not change the following Features:

  • Image Cropping: the v4.0 API works quite differently (it doesn't return a cropped image; it returns the coordinates of the region to crop), so this will require additional work
  • PDF Text Extraction: this doesn't exist in the v4.0 API; it has moved to an entirely new API, Azure AI Document Intelligence, so we'll tackle that in a separate PR

Things to note:

  • The v4.0 API supports images up to 20 MB (up from 4 MB previously) and larger dimensions, up to 16000 x 16000 px
  • Image Text Extraction (OCR) used to require two separate API requests. The v4.0 API handles this in a single request, so the code has been simplified and the OCR class has been removed entirely
  • The v4.0 API is available in fewer regions, particularly for the captions feature, which the Descriptive Text Generator Feature relies on. See https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/overview-image-analysis?tabs=4-0#region-availability for the list of supported regions
  • The v4.0 API also seems to have fixed the confidence scores. In v3.0, we recommended a threshold of 70-75%. In v3.2, scores dropped to around 50-55%, so we lowered our recommendation to match. Testing with multiple images suggests 70% is again a good default, so the default threshold has been updated to 70%
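With the larger limits, one way to decide what to send is to check candidate image sizes against them and pick the largest one that fits. This is a hypothetical Python sketch, not the plugin's implementation: the candidate dict shape, the helper names, and treating "20 MB" as 20 * 1024 * 1024 bytes are assumptions, and Azure's documentation should be checked for the exact boundary.

```python
from typing import List, Optional

# Limits taken from the PR text ("up to 20 MB" and "up to 16000 x 16000 px").
# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_FILESIZE_BYTES = 20 * 1024 * 1024
MAX_DIMENSION_PX = 16000


def fits_v4_limits(filesize: int, width: int, height: int) -> bool:
    """Return True if an image is within the v4.0 file size and dimension limits."""
    return (
        filesize <= MAX_FILESIZE_BYTES
        and width <= MAX_DIMENSION_PX
        and height <= MAX_DIMENSION_PX
    )


def pick_largest_allowed(candidates: List[dict]) -> Optional[dict]:
    """Return the largest candidate (by pixel area) that fits, or None if none fit.

    Each candidate is a hypothetical dict with "name", "filesize", "width", "height".
    """
    allowed = [
        c for c in candidates if fits_v4_limits(c["filesize"], c["width"], c["height"])
    ]
    return max(allowed, key=lambda c: c["width"] * c["height"], default=None)


# Usage: the full-size original exceeds 20 MB, so the "large" variant is chosen.
sizes = [
    {"name": "full", "filesize": 25 * 1024 * 1024, "width": 8000, "height": 6000},
    {"name": "large", "filesize": 3 * 1024 * 1024, "width": 2560, "height": 1920},
    {"name": "medium", "filesize": 200 * 1024, "width": 300, "height": 225},
]
print(pick_largest_allowed(sizes)["name"])  # large
```

Under the old 4 MB limit the same selection would still have picked "large" here, but any variant between 4 MB and 20 MB would previously have been rejected.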

In addition, we now output an error message if a valid caption is returned but its confidence score is below our threshold. Previously we silently discarded the caption, which could lead people to think things weren't working. We still don't save that caption, but we now show an error explaining what happened.
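The caption handling described above can be sketched in Python as follows. It assumes the v4.0 response carries `captionResult.text` and `captionResult.confidence` (a 0-1 float). The function name `caption_or_error`, the error wording, and the `(caption, error)` return shape are hypothetical; the existence checks and first-letter uppercasing mirror the commit notes in this PR, but this is not the plugin's PHP code.

```python
from typing import Optional, Tuple

# New default threshold from this PR (70%), expressed as a 0-1 fraction.
# Assumption: v4.0 reports captionResult.confidence as a float between 0 and 1.
DEFAULT_THRESHOLD = 0.70


def caption_or_error(
    response: dict, threshold: float = DEFAULT_THRESHOLD
) -> Tuple[Optional[str], Optional[str]]:
    """Return (caption, None) to save, or (None, error_message) to show the user."""
    caption_result = response.get("captionResult") or {}
    text = caption_result.get("text")
    confidence = caption_result.get("confidence")

    # Make sure the values we need exist before using them.
    if not text or confidence is None:
        return None, "No caption was returned."

    # Below the threshold: don't save the caption, but explain why
    # instead of discarding it silently.
    if confidence < threshold:
        return None, (
            f"Caption confidence {confidence:.2%} is below the "
            f"{threshold:.0%} threshold, so it was not saved."
        )

    # Save the caption with its first letter uppercased.
    return text[0].upper() + text[1:], None


print(caption_or_error({"captionResult": {"text": "a dog with its mouth open", "confidence": 0.834}}))
# ('A dog with its mouth open', None)
```

Returning the error alongside a `None` caption keeps the "don't save" and "tell the user" decisions in one place, so a low score can never be dropped without a message.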

Partially closes #827

Here are the results I got from a few test images:

| Image | v3.2 Caption | v3.2 Confidence Score | v4.0 Caption | v4.0 Confidence Score |
|---|---|---|---|---|
| A scientist with a microscope | a woman wearing a white coat and white lab coat sitting at a desk | 36.85% | a woman in a lab coat and gloves holding a pen and looking at a microscope | 70.76% |
| Stop sign | a stop sign with a cloudy sky | 49.51% | a stop sign with clouds in the background | 82.06% |
| A dog | a dog with its mouth open | 57.06% | a dog with its mouth open | 83.40% |

It's debatable whether these captions are better, but they are definitely not worse, and the confidence scores are back to more realistic levels. That's great, since low confidence scores trip up a lot of people.
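Checking the scores from the results table against the recommended thresholds makes the difference concrete. This small Python sketch uses the lower end of the v3.2 guidance (50%, a choice made here for illustration) and the new 70% default; `captions_kept` is a hypothetical helper.

```python
# Confidence scores from the results table above, as fractions.
RESULTS = {
    "A scientist with a microscope": {"v3.2": 0.3685, "v4.0": 0.7076},
    "Stop sign": {"v3.2": 0.4951, "v4.0": 0.8206},
    "A dog": {"v3.2": 0.5706, "v4.0": 0.8340},
}


def captions_kept(version: str, threshold: float) -> list:
    """List the test images whose caption would be kept at the given threshold."""
    return [image for image, scores in RESULTS.items() if scores[version] >= threshold]


print(captions_kept("v3.2", 0.50))  # ['A dog']
print(captions_kept("v3.2", 0.70))  # []
print(captions_kept("v4.0", 0.70))  # ['A scientist with a microscope', 'Stop sign', 'A dog']
```

Even at the lenient 50% threshold, v3.2 keeps only one of the three captions, while v4.0 keeps all three at 70%. The scientist image clears 70% by less than one percentage point, so the new default is not overly generous.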

How to test the Change

  1. Set up the Descriptive Text Generator Feature with Azure AI. Ensure it works as expected
  2. Set up the Image Tags Generator Feature with Azure AI. Ensure it works as expected
  3. Set up the Image Text Extraction Feature with Azure AI. Ensure it works as expected

Changelog Entry

Changed - Migrate from the Azure AI Vision v3.2 API to the v4.0 API

Credits

Props @dkotter, @jeffpaul

Checklist:

…rgest image based on filesize and dimensions. Remove the OCR class as it is no longer needed
…hreshold, output an error message instead of just discarding silently. Ensure the caption we save has the first letter uppercased. Ensure the values we want exist before using them
@dkotter added this to the 3.2.0 milestone Nov 26, 2024
@dkotter self-assigned this Nov 26, 2024
@dkotter requested review from jeffpaul and a team as code owners November 26, 2024 20:22
@github-actions bot added the needs:code-review (This requires code review.) label Nov 26, 2024
@dkotter changed the title from "Feature/827" to "Update from v3.2 to v4.0 of the Azure AI Vision API" Nov 26, 2024
Labels
needs:code-review This requires code review.
Development

Successfully merging this pull request may close these issues.

Update Azure AI Vision from 3.2 to 4.0