Add OpenAI as a Provider for Descriptive Text Generation #828

dkotter · 2024-11-21T22:46:27Z

Description of the Change

In #785, we updated to GPT-4o mini in our OpenAI ChatGPT Provider. This model is multi-modal, which means you can do things with images, video, or audio, not just text.

So far we haven't take advantage of that but this PR brings OpenAI as a Provider for the Descriptive Text Generator Feature. Currently this Feature only runs on the Azure AI Vision Provider, so this brings a second option for that Feature.

Making requests to this model is the same as all of our text generation requests, other than we send the image URL in that request. We have a default prompt that is used and that can be modified from the settings screen, as needed. I tried to keep this prompt fairly generic but open to suggestions on improvements there. It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:

You are an assistant that generates descriptions of images that are used on a website. You will be provided with an image and will describe the main item you see in the image, giving details but staying concise. There is no need to say "the image contains" or similar, just describe what is actually in the image. This text will be important for screen readers, so make sure it is descriptive and accurate but not overly verbose

OpenAI requires images to be at least 512x512, so we return an error message if any image below that threshold is used. It also supports passing in the full image URL or a base64 encoded version of the image. For now I've used the image URL but we could look to go the encoded route, which would make things work in environments where images are publicly accessible (like locally). The downside here is it's slower and more expensive, as it uses more tokens.

Closes #826

Descriptive Text Generator settings screen

How to test the Change

Go to Tools > ClassifAI > Image Processing > Descriptive Text Generator
Select OpenAI as your Provider and add proper credentials
Ensure at least one Descriptive text fields is turned on
Go to your Media Library and choose an image without alt text and run the descriptive text scan. Ensure the text is saved properly
Try this from the single attachment page, using the metabox and ensure this works
Upload a new image and ensure alt text is added during that process
Test other methods as desired, like bulk processing on the Media Library list view or the WP-CLI command
Can also test adding custom prompts to ensure they work

Changelog Entry

Added - Add OpenAI ChatGPT as a Provider for the Descriptive Text Generator Feature.

Credits

Props @dkotter, @jeffpaul

Checklist:

I agree to follow this project's Code of Conduct.
I have updated the documentation accordingly.
I have added Critical Flows, Test Cases, and/or End-to-End Tests to cover my change.
All new and existing tests pass.

iamdharmesh

Thanks for adding this @dkotter. code looks good and it tests well.

It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:

Yes, I noticed the same. Since users can directly modify the prompt from the settings and each site caters to a different niche, the best results can be achieved by customizing the prompt as per their requirements, so this is fine.

However, if we want to provide some initial help, we could add separate sample prompts for each (alt text, caption, and description) somewhere in our documentation and include a link in the settings. This way, users who need only one of these can directly copy it into the custom prompt and start using it. What do you think?

…y the default prompt a bit

dkotter · 2024-12-12T18:03:56Z

However, if we want to provide some initial help, we could add separate sample prompts for each (alt text, caption, and description) somewhere in our documentation and include a link in the settings. This way, users who need only one of these can directly copy it into the custom prompt and start using it. What do you think?

@iamdharmesh Good idea. I've added a new page to our documentation where we can add prompt examples and I've added three prompts for this Feature:

Generate just alt text
Generate just image captions
Generate just image descriptions

I then link to this documentation beneath the custom prompt settings.

Not needed here but probably worth a followup to do this same thing anywhere we add prompts, adding new examples to our docs and linking to those.

dkotter added 6 commits November 21, 2024 13:51

Allow ChatGPT to be used as a Provider for Descriptive Text Generation

f180a0a

Add a route to handle the descriptive text generation request

71f6993

Allow customizing the prompt in the settings. Fix typo

eac68e8

Modify the default prompt a bit

41b1c61

Add E2E tests

51ce65a

Set detail to auto

02fc829

dkotter added this to the 3.2.0 milestone Nov 21, 2024

dkotter self-assigned this Nov 21, 2024

dkotter requested review from jeffpaul and a team as code owners November 21, 2024 22:46

github-actions bot added the needs:code-review This requires code review. label Nov 21, 2024

dkotter added 2 commits November 21, 2024 15:50

Remove test that isn't needed

e76d989

Bring over test fixes from 815

1860aba

jeffpaul requested review from a team and faisal-alvi and removed request for a team and jeffpaul November 26, 2024 15:28

dkotter and others added 2 commits December 10, 2024 12:42

Merge branch 'develop' into feature/826

94e6dab

Merge branch 'develop' of github.com:10up/classifai into feature/826

8d5a707

iamdharmesh previously approved these changes Dec 12, 2024

View reviewed changes

github-actions bot added the needs:refresh This requires a refreshed PR to resolve. label Dec 12, 2024

Merge branch 'develop' into feature/826

ed3c6bd

dkotter dismissed iamdharmesh’s stale review via ed3c6bd December 12, 2024 15:57

github-actions bot removed the needs:refresh This requires a refreshed PR to resolve. label Dec 12, 2024

dkotter added 3 commits December 12, 2024 09:09

Merge branch 'develop' into feature/826

39b3db8

Add a new hookdoc page to hold example prompts

5bb00bc

Add a link to our example prompts doc from the settings screen. Modif…

4972e65

…y the default prompt a bit

dkotter requested a review from iamdharmesh December 12, 2024 18:04

iamdharmesh approved these changes Dec 13, 2024

View reviewed changes

iamdharmesh merged commit 8ebd7d6 into develop Dec 13, 2024
18 checks passed

iamdharmesh deleted the feature/826 branch December 13, 2024 05:22

dkotter mentioned this pull request Jan 31, 2025

Add Ollama as a Provider, allowing some Features to work with locally hosted LLMs #845

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add OpenAI as a Provider for Descriptive Text Generation #828

Add OpenAI as a Provider for Descriptive Text Generation #828

dkotter commented Nov 21, 2024 •

edited

Loading

iamdharmesh left a comment

dkotter commented Dec 12, 2024

Add OpenAI as a Provider for Descriptive Text Generation #828

Add OpenAI as a Provider for Descriptive Text Generation #828

Conversation

dkotter commented Nov 21, 2024 • edited Loading

Description of the Change

How to test the Change

Changelog Entry

Credits

Checklist:

iamdharmesh left a comment

Choose a reason for hiding this comment

dkotter commented Dec 12, 2024

dkotter commented Nov 21, 2024 •

edited

Loading