Passing image data from custom function as UserMessage media #1321
-
Hello! I hope this helps. When working with image data and OpenAI's models, there are a few strategies to handle the process effectively. Here's a structured approach:

1. Directly Passing Image Data via Prompting
If you want to pass image data to an OpenAI model to generate metadata or perform image analysis, you'll need to handle it in a way the model can interpret. However, as of my last update, OpenAI's GPT-3.5 and GPT-4 models primarily handle text input and don't directly process binary image data.

2. Custom Function to Handle Image Data
Since OpenAI models generally don't process raw image data directly, you need a few intermediate steps to handle it:
3. Creating a Custom Function for Integration
If you're integrating this process into a broader application:
Here's a simple workflow example:

    from PIL import Image
    import openai

    def extract_image_description(image_path):
        # Placeholder for actual image processing logic,
        # e.g. OCR or an image analysis API
        return "Description of the image."

    def generate_metadata(description):
        openai.api_key = 'YOUR_OPENAI_API_KEY'
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=f"Generate metadata for the following image description: {description}",
            max_tokens=150
        )
        return response.choices[0].text.strip()

    def main(image_path):
        description = extract_image_description(image_path)
        metadata = generate_metadata(description)
        print(metadata)

    if __name__ == "__main__":
        main("path_to_your_image.jpg")

4. Alternative Approaches
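One alternative worth noting: some image-capable APIs accept inline image input as base64-encoded bytes rather than a URL or a file upload. A minimal, stdlib-only sketch of that encoding step (the helper name and the surrounding request format are my assumptions, not something from this thread):

```python
import base64
from pathlib import Path

def encode_image_base64(image_path):
    """Read raw image bytes and return them as a base64 ASCII string,
    the form commonly embedded in JSON request bodies."""
    data = Path(image_path).read_bytes()
    return base64.b64encode(data).decode("ascii")
```

The resulting string can then be placed wherever the target API expects inline image content.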
In summary, since direct image processing is not supported, the recommended approach is to extract descriptive text or metadata from the image using other tools or services, then pass that information to OpenAI's text-based models. I really hope this helps!
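Since the goal here is metadata, note also that some basic technical metadata never needs a model at all: image dimensions, for instance, sit right in the file header. A stdlib-only sketch for PNG files (assumes a well-formed file; the helper is mine, not from this thread):

```python
import struct

def png_dimensions(png_bytes):
    """Return (width, height) from a PNG byte stream.

    The 8-byte PNG signature is followed by the IHDR chunk:
    a 4-byte length, the 4-byte type "IHDR", then width and
    height as big-endian unsigned 32-bit integers (bytes 16..24).
    """
    if png_bytes[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height
```

Metadata like this can be merged with whatever the model generates from the textual description.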
-
So I ended up with a single "GenerateImageMetadataTool" function that does the following in its generateMetadata method:
    private Map<String, Object> generateMetadata(ImageData imageData, GenerateImageMetadataTool.Request request) {
        ChatClient chatClient = ChatClient.builder(chatModel).build();
        ChatClient.ChatClientRequest chatClientRequest = chatClient.prompt();
        Resource image = new InputStreamResource(imageData.getInputStream());
        Message userMessage = new UserMessage(request.userPrompt,
                List.of(new Media(MimeTypeUtils.parseMimeType(imageData.getMimeType().toString()), image)));
        Message systemMessage = new SystemMessage(request.systemPrompt);
        chatClientRequest.messages(List.of(systemMessage, userMessage));
        Map<String, Object> result = chatClientRequest.call().entity(new ParameterizedTypeReference<Map<String, Object>>() {
        });
        LOG.info("Successfully generated image metadata for content item with id {} and property {}: {}",
                request.id, request.property, result);
        return result;
    }

The prompt for the tool specifies the userPrompt and systemPrompt to be used for the metadata generation request, e.g.:
Not sure if this is good practice, but it works...
-
Hi,
I have a custom function that is retrieving image data from a 3rd party system.
The next step in the prompt would be to have the LLM/OpenAI generate metadata for it.
I understand the binary data could be passed as Media with the UserMessage.
I am wondering how this next step could be implemented:
I hope this makes some sense. Any feedback is much appreciated!
--Henning