Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenAI as a Provider for Descriptive Text Generation #828

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

dkotter
Copy link
Collaborator

@dkotter dkotter commented Nov 21, 2024

Description of the Change

In #785, we updated to GPT-4o mini in our OpenAI ChatGPT Provider. This model is multi-modal, which means you can do things with images, video, or audio, not just text.

So far we haven't take advantage of that but this PR brings OpenAI as a Provider for the Descriptive Text Generator Feature. Currently this Feature only runs on the Azure AI Vision Provider, so this brings a second option for that Feature.

Making requests to this model is the same as all of our text generation requests, other than we send the image URL in that request. We have a default prompt that is used and that can be modified from the settings screen, as needed. I tried to keep this prompt fairly generic but open to suggestions on improvements there. It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:

You are an assistant that generates descriptions of images that are used on a website. You will be provided with an image and will describe the main item you see in the image, giving details but staying concise. There is no need to say "the image contains" or similar, just describe what is actually in the image. This text will be important for screen readers, so make sure it is descriptive and accurate but not overly verbose

OpenAI requires images to be at least 512x512, so we return an error message if any image below that threshold is used. It also supports passing in the full image URL or a base64 encoded version of the image. For now I've used the image URL but we could look to go the encoded route, which would make things work in environments where images are publicly accessible (like locally). The downside here is it's slower and more expensive, as it uses more tokens.

Closes #826

Descriptive Text Generator settings screen

How to test the Change

  1. Go to Tools > ClassifAI > Image Processing > Descriptive Text Generator
  2. Select OpenAI as your Provider and add proper credentials
  3. Ensure at least one Descriptive text fields is turned on
  4. Go to your Media Library and choose an image without alt text and run the descriptive text scan. Ensure the text is saved properly
  5. Try this from the single attachment page, using the metabox and ensure this works
  6. Upload a new image and ensure alt text is added during that process
  7. Test other methods as desired, like bulk processing on the Media Library list view or the WP-CLI command
  8. Can also test adding custom prompts to ensure they work

Changelog Entry

Added - Add OpenAI ChatGPT as a Provider for the Descriptive Text Generator Feature.

Credits

Props @dkotter, @jeffpaul

Checklist:

@dkotter dkotter added this to the 3.2.0 milestone Nov 21, 2024
@dkotter dkotter self-assigned this Nov 21, 2024
@dkotter dkotter requested review from jeffpaul and a team as code owners November 21, 2024 22:46
@github-actions github-actions bot added the needs:code-review This requires code review. label Nov 21, 2024
@jeffpaul jeffpaul requested review from a team and faisal-alvi and removed request for a team and jeffpaul November 26, 2024 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs:code-review This requires code review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OpenAI provider for Image Processing > Descriptive Text Generator (aka image alt text)
1 participant