Update Azure AI Vision from 3.2 to 4.0 #827

jeffpaul · 2024-11-21T17:32:43Z

Is your enhancement related to a problem? Please describe.

Related to #826 and alongside adding OpenAI to alt text generation, let's look to update the Azure API version in ClassifAI.

There are some differences in which features are and are not available in 4.0, so we'll want to validate which ClassifAI features v4 covers. Where 4.0 does cover a specific ClassifAI image processing feature that uses Azure, let's update that feature to 4.0.

Version 4.0 features available: Read text, Captions, Dense captions, Tags, Object detection, Custom image classification / object detection, People, Smart crop

Better models; use version 4.0 if it supports your use case.

Version 3.2 features available: Tags, Objects, Descriptions, Brands, Faces, Image type, Color scheme, Landmarks, Celebrities, Adult content, Smart crop

Wider range of features; use version 3.2 if your use case is not yet supported in version 4.0

Additional context from Azure:

We recommend you use the Image Analysis 4.0 API if it supports your use case. Use version 3.2 if your use case is not yet supported by 4.0.

You'll also need to use version 3.2 if you want to do image captioning and your Vision resource is outside the supported Azure regions. The image captioning feature in Image Analysis 4.0 is only supported in certain Azure regions. Image captioning in version 3.2 is available in all Azure AI Vision regions. See Region availability.

Designs

No response

Describe alternatives you've considered

No response

Code of Conduct

I agree to follow this project's Code of Conduct

dkotter · 2024-11-21T18:29:14Z

Note that a lot of research was done into this in #553, though some things have changed since then. From what I recall, the v4 API works quite differently and will require changes to how we currently do things (there seemed to be asynchronous vs. synchronous differences).

dkotter · 2024-11-22T22:15:58Z

Status update (mostly to remind myself where I left off for next week):

Started work on this and have successfully migrated the Descriptive Text Generator, Image Tags Generator and Image Text Extraction Features over to the v4.0 API with no real challenges. Image Text Extraction required the most changes as we used to make two separate API requests and now that can be done in one.
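As a rough illustration of the single-request approach described above, the following Python sketch builds one v4.0 Image Analysis URL that asks for text, a caption and tags together, then reads each result out of the one response. The `api-version` value, the `features` names and the response keys (`readResult`, `captionResult`, `tagsResult`) are assumptions based on the public Image Analysis 4.0 REST API and should be checked against the current reference. The helper names and the endpoint are hypothetical, and this is not ClassifAI's actual code (which is PHP).

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumption: GA version string of the Image Analysis 4.0 REST API.
API_VERSION = "2024-02-01"


def build_analyze_url(endpoint: str, features: List[str]) -> str:
    """Build one analyze URL that requests several features at once."""
    query = urlencode(
        {"api-version": API_VERSION, "features": ",".join(features)},
        safe=",",  # keep the comma-separated feature list readable
    )
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


def extract_text_lines(response: dict) -> List[str]:
    """Collect recognized text lines (assumed shape: readResult.blocks[].lines[].text)."""
    blocks = response.get("readResult", {}).get("blocks", [])
    return [line["text"] for block in blocks for line in block.get("lines", [])]


def extract_caption(response: dict) -> Optional[str]:
    """Return the caption text, or None if no caption was returned."""
    caption = response.get("captionResult")
    return caption["text"] if caption else None


def extract_tags(response: dict, min_confidence: float = 0.7) -> List[str]:
    """Return tag names at or above a confidence threshold (threshold is illustrative)."""
    values = response.get("tagsResult", {}).get("values", [])
    return [tag["name"] for tag in values if tag["confidence"] >= min_confidence]
```

The point of the sketch is only that text extraction no longer needs a submit-then-fetch pair of requests: one response body can feed all three features.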

Still left to look into fully are Image Cropping (which at a glance looks to be a fairly straightforward change) and PDF Text Extraction (which is a new API, Azure AI Document Intelligence, so will probably require more work).

dkotter · 2024-11-26T20:28:58Z

In doing more research, I found that Image Cropping isn't quite as cut and dried to move over. In v3.2, we send an image URL plus the dimensions we want the final cropped image to be, and Azure sends back the cropped image, which we then store.

In v4.0, you send an image URL plus the aspect ratio you want to maintain, and Azure sends back a bounding box within the image representing the region it recommends cropping to. You then have to crop the image yourself. The main concern I have is how to crop images to a smaller size using that bounding box. As an example, if the full-size image is 1024x768 and I want a cropped 300x300, it seems the bounding box returned is for the full image size, so I'm not sure how to translate that down into the 300x300 size. Because of the extra effort here, I recommend we look into that in a separate PR.
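One possible way to handle the 300x300 case, sketched in Python under stated assumptions: request the aspect ratio of the target size (300/300 = 1.0), crop the full-size image to the returned bounding box, then scale that crop down to the target dimensions. The bounding box shape (`x`, `y`, `w`, `h`) is an assumption based on the v4.0 smart crops response, the sample box values are invented for illustration, and the helper names are hypothetical.

```python
from typing import Dict, Tuple


def target_aspect_ratio(target_w: int, target_h: int) -> float:
    """Aspect ratio to request from Azure: target width divided by target height."""
    return round(target_w / target_h, 2)


def plan_crop(
    box: Dict[str, int], target_w: int, target_h: int
) -> Tuple[Tuple[int, int, int, int], float]:
    """Turn a full-size bounding box into a crop rectangle plus a scale factor.

    Returns ((left, top, right, bottom), scale). Crop the full-size image to
    the rectangle first, then resize the result by `scale` to reach the target.
    """
    left, top = box["x"], box["y"]
    right, bottom = left + box["w"], top + box["h"]
    # The box should already match the requested ratio; min() guards against
    # small rounding differences so the result never exceeds the target size.
    scale = min(target_w / box["w"], target_h / box["h"])
    return (left, top, right, bottom), scale


# Invented example: a square box returned for a 1024x768 image.
box = {"x": 128, "y": 0, "w": 768, "h": 768}
print(target_aspect_ratio(300, 300))  # 1.0
print(plan_crop(box, 300, 300))       # ((128, 0, 896, 768), 0.390625)
```

In WordPress terms, `WP_Image_Editor::crop()` takes a source rectangle plus destination dimensions, so the crop and the downscale could likely be done in one call. The supported aspect ratio range should also be checked in the Azure reference, since very wide or very tall image sizes may fall outside what smart crops accepts.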

PDF Text Extraction doesn't exist in the v4.0 API; it now lives in a new API, Azure AI Document Intelligence. I don't think it will be too hard to integrate, but because this issue is focused on migrating from v3.2 to v4.0, I'd suggest we look into that in a different issue so we don't block the other updates.

jeffpaul added the help wanted and type:enhancement labels Nov 21, 2024

jeffpaul added this to the 3.2.0 milestone Nov 21, 2024

jeffpaul added this to Open Source Practice Nov 21, 2024

github-project-automation bot moved this to Incoming in Open Source Practice Nov 21, 2024

jeffpaul moved this from Incoming to To Do in Open Source Practice Nov 21, 2024

dkotter self-assigned this Nov 22, 2024

dkotter linked a pull request Nov 26, 2024 that will close this issue

Update from v3.2 to v4.0 of the Azure AI Vision API #829

Open

4 tasks
