Update from v3.2 to v4.0 of the Azure AI Vision API #829

dkotter · 2024-11-26T20:22:44Z

Description of the Change

In #559, we switched to the Azure AI Vision v3.2 API for all Features that rely on it. We decided not to move to v4.0 of that API at the time, as it was still in public preview and had some breaking changes.

The v4.0 API seems to be more stable now, so this PR switches the following Features over to it:

Descriptive Text Generator
Image Tags Generator
Image Text Extraction

It does not change the following Features:

Image Cropping: the v4.0 API is fairly different (it doesn't return a cropped image, only the coordinates of the region to crop) and will require additional work
PDF Text Extraction: this doesn't exist in the v4.0 API and has been moved to an entirely new API, Azure AI Document Intelligence, so we'll look to tackle that in a separate PR
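
Since the v4.0 crop result is a set of coordinates rather than image bytes, any cropping would have to happen locally. A minimal sketch of that step, assuming the box arrives as `x`, `y`, `w` and `h` pixel values; those field names and the `to_crop_box` helper are assumptions for illustration, not code from this PR:

```python
def to_crop_box(bounding_box, image_width, image_height):
    """Convert an x/y/w/h box into a (left, top, right, bottom) tuple.

    The result is clamped to the image bounds. The x/y/w/h key names are
    an assumption about the response shape.
    """
    left = max(0, bounding_box["x"])
    top = max(0, bounding_box["y"])
    right = min(image_width, bounding_box["x"] + bounding_box["w"])
    bottom = min(image_height, bounding_box["y"] + bounding_box["h"])
    if right <= left or bottom <= top:
        raise ValueError("bounding box lies outside the image")
    return (left, top, right, bottom)


print(to_crop_box({"x": 10, "y": 20, "w": 100, "h": 50}, 200, 200))
```

The (left, top, right, bottom) ordering is the one most image libraries expect for a crop call, so the tuple could be handed to whichever library ends up doing the actual crop.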

Things to note:

The v4.0 API supports images up to 20MB (up from 4MB previously) and larger dimensions, up to 16000x16000px
Image Text Extraction (OCR) used to require two separate API requests. The v4.0 API handles this in a single request, so the code has been simplified (we've removed the OCR class entirely)
The v4.0 API supports fewer regions, in particular for the captions feature, which the Descriptive Text Generator Feature uses. See https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/overview-image-analysis?tabs=4-0#region-availability for that list
The v4.0 API also seems to have fixed confidence scores. In v3.0, we recommended a threshold of 70-75%. In v3.2, we saw scores drop to 50-55% and used that as our recommendation. In testing multiple images, 70% again seems to be a good default, so the recommendation has been updated
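
The new size limits matter when choosing which version of an image to send. A rough sketch of that selection, assuming each candidate size is described by `width`, `height` and `filesize`; the `pick_largest_eligible` helper, the dict keys, the mime type set and the reading of 20MB as 20 * 1024 * 1024 bytes are all assumptions for illustration:

```python
MAX_BYTES = 20 * 1024 * 1024  # 20MB limit noted above (binary MB assumed)
MAX_DIMENSION = 16000         # 16000x16000px limit noted above
# Illustrative subset only; check the Azure documentation for the real list.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/gif", "image/bmp"}


def pick_largest_eligible(sizes, mime_type):
    """Return the largest candidate within the limits, or None.

    `sizes` is a list of dicts with `width`, `height` and `filesize` (bytes).
    """
    if mime_type not in SUPPORTED_MIME_TYPES:
        return None
    eligible = [
        s for s in sizes
        if s["filesize"] <= MAX_BYTES
        and s["width"] <= MAX_DIMENSION
        and s["height"] <= MAX_DIMENSION
    ]
    # Rank by pixel area so the most detailed eligible image wins.
    return max(eligible, key=lambda s: s["width"] * s["height"], default=None)


candidates = [
    {"width": 1024, "height": 768, "filesize": 300_000},
    {"width": 2048, "height": 1536, "filesize": 900_000},
    {"width": 20000, "height": 15000, "filesize": 5_000_000},  # too wide
]
print(pick_largest_eligible(candidates, "image/jpeg"))
```

Returning `None` for an unsupported mime type or an empty candidate list lets the caller skip the request entirely instead of sending something the API would reject.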

In addition, we now output an error message if a valid caption is returned but the confidence score is lower than our threshold. Previously we would just silently discard that, which can lead to people thinking things aren't working. We still don't save that caption but we show an error letting the user know what happened.
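
The behavior described above can be sketched as follows. This is an illustration rather than the plugin's code: the `evaluate_caption` name, the 0-1 confidence scale and the error wording are all assumptions.

```python
def evaluate_caption(caption, confidence, threshold=70.0):
    """Return (caption_to_save, error_message); exactly one is None.

    `confidence` is assumed to be a 0-1 float; `threshold` is a percentage.
    """
    if not caption or confidence is None:
        return None, "No caption was returned."
    score = confidence * 100
    if score < threshold:
        # Surface the reason instead of silently discarding the caption.
        return None, (
            f"Caption returned with a confidence score of {score:.2f}%, "
            f"which is below the {threshold:.0f}% threshold, so it was not saved."
        )
    # Uppercase only the first letter, leaving the rest untouched.
    return caption[0].upper() + caption[1:], None


print(evaluate_caption("a dog with its mouth open", 0.834))
print(evaluate_caption("a stop sign with a cloudy sky", 0.4951))
```

Returning the error alongside the caption keeps the decision in one place: the caller either saves the caption or shows the message, never both.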

Partially closes #827

Here are some results from my testing:

Image	v3.2 Caption	v3.2 Confidence Score	v4.0 Caption	v4.0 Confidence Score
A scientist with a microscope	a woman wearing a white coat and white lab coat sitting at a desk	36.85%	a woman in a lab coat and gloves holding a pen and looking at a microscope	70.76%
Stop sign	a stop sign with a cloudy sky	49.51%	a stop sign with clouds in the background	82.06%
A dog	a dog with its mouth open	57.06%	a dog with its mouth open	83.40%

Whether these captions are better is arguable, but they are definitely not worse, and the confidence scores are back to being more realistic, which is great as that's an issue that trips up a lot of people.
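
To make the threshold change concrete, the scores from the table can be checked against each version's recommendation. This sketch takes 55% as the v3.2 recommendation (the upper end of the 50-55% range) and 70% for v4.0; the variable names are illustrative only.

```python
# Confidence scores (percent) copied from the table above: (v3.2, v4.0).
results = {
    "A scientist with a microscope": (36.85, 70.76),
    "Stop sign": (49.51, 82.06),
    "A dog": (57.06, 83.40),
}
V32_THRESHOLD = 55.0
V40_THRESHOLD = 70.0

passed_v32 = [name for name, (v32, _) in results.items() if v32 >= V32_THRESHOLD]
passed_v40 = [name for name, (_, v40) in results.items() if v40 >= V40_THRESHOLD]

print(passed_v32)  # ['A dog']
print(passed_v40)  # all three images
```

Even with the lower v3.2 threshold, only one of the three captions would have been saved, while all three clear the higher v4.0 threshold.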

How to test the Change

Set up the Descriptive Text Generator Feature with Azure AI. Ensure it works as expected
Set up the Image Tags Generator Feature with Azure AI. Ensure it works as expected
Set up the Image Text Extraction Feature with Azure AI. Ensure it works as expected

Changelog Entry

Changed - Migrate from the Azure AI Vision v3.2 API to the v4.0 API

Credits

Props @dkotter, @jeffpaul

Checklist:

I agree to follow this project's Code of Conduct.
I have updated the documentation accordingly.
I have added Critical Flows, Test Cases, and/or End-to-End Tests to cover my change.
All new and existing tests pass.

dkotter added 6 commits November 22, 2024 11:58

Migrate the Descriptive Text feature to Azure v4.0

a44c9f6

Migrate the Image Tagging feature to Azure v4.0

85c3325

Migrate the Image Text Extraction feature to Azure v4.0

5f65b93

Only process if an image matches the supported mime types. Get the largest image based on filesize and dimensions. Remove the OCR class as it is no longer needed

5d87ad6

Change our recommended threshold from 55 back to 70

52087fe

If a caption is returned but the confidence score is lower than our threshold, output an error message instead of just discarding silently. Ensure the caption we save has the first letter uppercased. Ensure the values we want exist before using them

e2348de

dkotter added this to the 3.2.0 milestone Nov 26, 2024

dkotter self-assigned this Nov 26, 2024

dkotter requested review from jeffpaul and a team as code owners November 26, 2024 20:22

github-actions bot added the needs:code-review This requires code review. label Nov 26, 2024

dkotter changed the title ~~Feature/827~~ Update from v3.2 to v4.0 of the Azure AI Vision API Nov 26, 2024

dkotter added 4 commits November 26, 2024 13:50

Lock johnbillion/wp-compat to the v0.x branch to avoid conflict for now. Add E2E test fixes from 815

ebc9855

Fix unit tests with new threshold

e3970f6

Update the test data

0b2c3d9

Fix extra spaces

fdd80b5
