images content support in chat_ollama #221

Closed
frankiethull opened this issue Dec 16, 2024 · 3 comments
Labels
reprex needs a minimal reproducible example

Comments

@frankiethull

It would be useful to add an images argument to chat_ollama. I tried 1) appending the raw image path, 2) using elmer::content_image_file(), 3) elmer::content_image_url(), and 4) elmer::content_image_file()@data, but it seems that Ollama's llava, moondream, etc. do not accept the image in the prompt itself.

After some testing, what does work is placing the image path directly into the images arg.

For example, this works as expected: ollamar::generate("llava-phi3", my_prompt, images = my_image_local_file, output = 'text')
For reference, here's the Ollama API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
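
For context, the Ollama API docs linked above describe `images` on the generate endpoint as a list of base64-encoded images. Here is a minimal sketch of what such a request body could look like when built by hand. It assumes the jsonlite package is installed, uses a few arbitrary bytes as a stand-in for a real image, and does not send anything to a server. It is only an illustration of the payload shape, not how ellmer or ollamar build requests internally.

```r
library(jsonlite)

# Stand-in for a real image: a few arbitrary bytes written to a temp file.
# (Hypothetical placeholder; substitute the path to an actual image.)
img_path  <- tempfile(fileext = ".jpg")
img_bytes <- as.raw(c(0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10))
writeBin(img_bytes, img_path)

# Read the file back as raw bytes and base64-encode it.
raw_img <- readBin(img_path, what = "raw", n = file.size(img_path))
img_b64 <- base64_enc(raw_img)

# Body for a generate request; `images` is a list of base64 strings.
body <- list(
  model  = "llava-phi3",
  prompt = "What's in this image?",
  images = list(img_b64),
  stream = FALSE
)

# Serialize to JSON; auto_unbox keeps scalars as scalars, not 1-element arrays.
body_json <- toJSON(body, auto_unbox = TRUE)
```

The point of the sketch is that the image travels as encoded data in its own field, separate from the prompt text, which is why pasting a file path into the prompt string does not work.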

@frankiethull
Author

Here's a reprex with further testing. I added images to api_args but I'm still getting unexpected results.

library(ellmer)
llm_image_classification <- \(ollama_model = "llava-phi3", 
                              path_to_image = system.file("img/test_img.jpg", package = "kuzco")){
  
  
  chat <- chat_ollama(
    model = ollama_model,
    system_prompt = "
          You are a terse assistant specializing in computer vision image classification. 
          You are short and to the point. You only respond if the user supplies an image. 
          You will observe the image and answer specific questions related to the image.
          ",
    api_args = list(
      # images = ollamar::image_encode_base64(path_to_image)
      images = ellmer::content_image_file(path_to_image)@data
    )
  )
  
  type_image_class <- type_object(
    image_classification = type_string(),
    primary_object       = type_string(),
    secondary_object     = type_string()
    # image_description  = type_string(),
    # image_colors       = type_string(),
    # image_proba        = type_string()
  )

  # image_summary <- type_object(
  #   img_class = type_array(items = type_image_class)
  # )
     
  prompt <- "
  Given an image, you are tasked with image_classification, give one or two words to classify the image.
  Provide the primary_object in the image, the secondary_object
  "
  
  data_list <- chat$extract_data(
    prompt,
    type = type_image_class
  )
  
  return(data_list)
}

llm_image_classification()
#> $image_classification
#> [1] "fruit"
#> 
#> $primary_object
#> [1] "wrappor"
#> 
#> $secondary_object
#> [1] ""

Created on 2024-12-31 with reprex v2.1.1

I'm not overly confident in "llava-phi3", but the image is of a puppy and the model does well via ollamar.

Same model and a similar prompt via ollamar:

llm_results |> str()
#> 'data.frame':    1 obs. of  7 variables:
#>  $ image_classification: chr "puppy"
#>  $ primary_object      : chr "dog"
#>  $ secondary_object    : chr "ear"
#>  $ image_description   : chr "A puppy with black and white fur."
#>  $ image_colors        : chr "#909091, #ffffff, #763c2f, #8fbc8b, #e6c774, #a3ca8d, #354a88, #8faec8"
#>  $ image_proba_names   :List of 1
#>   ..$ : chr "puppy, black, white, fur, ear, snout, eye, nose"
#>  $ image_proba_values  :List of 1
#>   ..$ : chr "0.62, 0.21, 0.75, 0.43, 0.89, 0.67, 0.38, 0.45"

The image does show a puppy.

hadley added this to the 0.1.1 milestone Jan 10, 2025
@hadley
Member

hadley commented Jan 10, 2025

This works for me:

library(ellmer)

chat <- chat_ollama(model = "llava-phi3")
chat$chat(
  "What's in this image?",
  content_image_file(system.file("httr2.png", package = "ellmer"))
)
#> The image showcases a dynamic scene featuring Htr2, a digital font style, used 
#> to spell out "htr2". The typography is colored pink, contrasting against the 
#> blue background. A large red baseball bat logo takes center stage against this 
#> backdrop, rendered in hues of red and black. The logo depicts an athletic male 
#> figure poised to swing his bat, ready for action. All these visual elements are
#> set upon a black square base, adding to the overall digital appeal of the 
#> image. The font "htr2" is positioned on the left side of the poster while a red
#> circle labelled with "www.mikrofoniki.com/a6703859-0" floats in the top left 
#> corner. This combination creates an eye-catching and visually appealing design 
#> that blends typography and graphics seamlessly.

Created on 2025-01-10 with reprex v2.1.0

So if you're having problems you'll need to provide a reprex.

hadley added the "reprex needs a minimal reproducible example" label Jan 10, 2025
hadley removed this from the 0.1.1 milestone Jan 10, 2025
@frankiethull
Author

I've truly lost my mind, thanks for finding it and handing it back to me.

library(ellmer)

# @@@@@@@@@@@@ Quick Test @@@@@@@@@@@@

chat <- chat_ollama(model = "llava-phi3")
chat$chat(
  "What's in this image?",
  content_image_file(system.file("img/test_img.jpg", package = "kuzco"))
)
#> In the image, there is a young dog with black and white fur. The dog is sitting
#> on a surface that appears to be made of fabric or blanket. The dog is looking 
#> directly at the camera with its eyes closed, giving an impression of 
#> contentment and relaxation. In one corner of the photo, you can see a glimpse 
#> of a red and blue plaid shirt, perhaps belonging to the owner of the dog. The 
#> background forms a blurred image, putting the focus entirely on the resting dog
#> in the center of the frame. This composition creates a peaceful and warm 
#> atmosphere around the dog.

# @@@@@@@@@@@ Structured Test @@@@@@@@

chat <- chat_ollama(
  model = "llava-phi3",
  system_prompt = "
          You are a terse assistant specializing in computer vision image classification. 
          You are short and to the point. You only respond if the user supplies an image. 
          You will observe the image and answer specific questions related to the image.
          Respond in JSON
          ")

type_image_class <- type_object(
  image_classification = type_string(),
  primary_object       = type_string(),
  primary_color        = type_string()
)

image_summary <- type_object(
  img_class = type_array(items = type_image_class)
)

prompt <- "
  Given an image, you are tasked with image_classification, give one or two words to classify the image.
  Provide the primary_object in the image and the primary_color of the primary_object. 
  "

chat$extract_data(
  prompt, ellmer::content_image_file(system.file("img/test_img.jpg", package = "kuzco")),
  type = image_summary
)
#> $img_class
#>   image_classification primary_object   primary_color
#> 1                  dog            dog black and white
#> 2               fabric          cloth   reddish-brown

Created on 2025-01-10 with reprex v2.1.1
