Update Azure AI Vision from 3.2 to 4.0 #827
Comments
Note that a lot of research was done into this in #553, though some things have changed since then. From what I recall, the v4 API works quite differently and will require changes to how we currently do things (there seemed to be asynchronous vs. synchronous differences).
Status update (mostly to remind myself where I left off for next week): Started work on this and have successfully migrated the Descriptive Text Generator, Image Tags Generator and Image Text Extraction Features over to the v4.0 API with no real challenges. Image Text Extraction required the most changes, as we used to make two separate API requests and now that can be done in one. Still left to look into fully are Image Cropping (which at a glance looks to be a fairly straightforward change) and PDF Text Extraction (which now lives in a new API, Azure AI Document Intelligence, so will probably require more work).
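For reference, a minimal sketch of what "one request" could look like in v4.0: several visual features are asked for through a single `features` query parameter on the analyze endpoint. The function name `build_analyze_url`, the example endpoint and the default `api-version` value are assumptions for illustration, and the path and parameter names reflect my understanding of the v4.0 REST API, so they should be checked against the Azure docs before use. This is Python rather than the plugin's PHP, purely to show the shape of the request.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, features, api_version="2024-02-01", smartcrop_ratios=None):
    """Build a single Image Analysis v4.0 request URL covering several features.

    Assumptions: the `computervision/imageanalysis:analyze` path, the
    `features` and `smartcrops-aspect-ratios` parameter names, and the
    api-version default should all be verified against the Azure docs.
    """
    params = {
        "api-version": api_version,
        "features": ",".join(features),
    }
    if smartcrop_ratios:
        params["smartcrops-aspect-ratios"] = ",".join(str(r) for r in smartcrop_ratios)
    # Keep commas unescaped so the feature list stays readable in the URL.
    query = urlencode(params, safe=",")
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


# Hypothetical endpoint; caption, tags and text extraction in one call.
url = build_analyze_url(
    "https://example.cognitiveservices.azure.com/",
    ["caption", "tags", "read"],
)
```

The image itself would then be sent in the request body (for example as a JSON object containing the image URL), with the subscription key in a request header.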
In doing more research, found that Image Cropping isn't quite as cut and dried to move over. In v3.2, we send an image URL plus the dimensions we want the final cropped image to be, and Azure sends back the cropped image, which we then store. In v4.0, you send an image URL plus the aspect ratio you want to maintain, and Azure sends back a bounding box within the image representing the region they recommend cropping to. You then have to crop the image yourself.

The main concern I have is how to crop images to a smaller size using that bounding box. As an example, if the full size image is 1024x768 and I want a cropped 300x300, it seems the bounding box returned is in full-image coordinates, so I'm not sure how to translate that down to the 300x300 size. Because of the extra effort here, I'm recommending we look into that in a separate PR.

PDF Text Extraction doesn't exist in the v4.0 API; it now lives in a new API, Azure AI Document Intelligence. I don't think it will be too hard to integrate, but because this issue is focused on migrating from v3.2 to v4.0, I'd suggest we look into that in a different Issue so we don't block the other updates.
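On the Image Cropping question above, one possible approach (a sketch, not something tested against the plugin): since the requested aspect ratio matches the target size, the bounding box can be treated as the source region, and the crop then scaled down by `target_width / box_width`. The `Box` type, the function names and the example box values below are all hypothetical, and the small "fit" step is only there to absorb any rounding in the returned box. Again in Python rather than PHP, just to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A bounding box in full-size image pixel coordinates (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def fit_to_ratio(box, target_w, target_h):
    """Shrink the box around its center so it matches the target aspect ratio."""
    target_ratio = target_w / target_h
    if box.w / box.h > target_ratio:
        # Box is too wide: keep the height, trim the width.
        new_w, new_h = round(box.h * target_ratio), box.h
    else:
        # Box is too tall (or already matches): keep the width, trim the height.
        new_w, new_h = box.w, round(box.w / target_ratio)
    return Box(box.x + (box.w - new_w) // 2, box.y + (box.h - new_h) // 2, new_w, new_h)


def plan_crop(box, target_w, target_h):
    """Return the source region to cut out and the scale factor to resize it by."""
    src = fit_to_ratio(box, target_w, target_h)
    scale = target_w / src.w
    return src, scale


# Hypothetical box for a 1024x768 image with a requested aspect ratio of 1.0.
src, scale = plan_crop(Box(128, 0, 768, 768), 300, 300)
# src is the 768x768 region to cut from the original; scaling it by
# 300 / 768 yields the 300x300 image.
```

In other words, the crop and the resize are two separate steps: cut the recommended region out of the full-size image, then resize that region to the registered image size. Most image libraries can do both in one call by taking a source rectangle plus destination dimensions.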
Is your enhancement related to a problem? Please describe.
Related to #826 and alongside adding OpenAI to alt text generation, let's look to update the Azure API version in ClassifAI.
While the features available in 4.0 differ somewhat from those in 3.2 (so we'll want to validate which ClassifAI features are available in v4), let's update to 4.0 wherever that version covers the specific ClassifAI image processing features that use Azure.
Version 4.0 features available: Read text, Captions, Dense captions, Tags, Object detection, Custom image classification / object detection, People, Smart crop
Version 3.2 features available: Tags, Objects, Descriptions, Brands, Faces, Image type, Color scheme, Landmarks, Celebrities, Adult content, Smart crop
Additional context from Azure: