`images` content support in chat_ollama #221

frankiethull · 2024-12-16T21:43:57Z

It would be useful to add the images arg to chat_ollama. I tried (1) appending the raw image path, (2) using elmer::content_image_file(), (3) elmer::content_image_url(), and (4) elmer::content_image_file()@data, but Ollama's vision models (llava, moondream, etc.) do not seem to accept the image as part of the prompt itself.

After some testing, what does work is placing the image path directly into the images arg.

For example, this works as expected: ollamar::generate("llava-phi3", my_prompt, images = my_image_local_file, output = 'text')
For reference, here's the Ollama API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
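
For readers who want to see what the images arg amounts to at the HTTP level, here is a minimal sketch, assuming a local Ollama server on its default port and the jsonlite and httr2 packages. `build_generate_body()` and `ollama_generate()` are hypothetical helper names made up for illustration; they are not part of ollamar or ellmer. The sketch base64-encodes the image file and sends it in the `images` array of a request to `/api/generate`, which is how the linked API document describes image input.

```r
# Sketch only: both helpers are hypothetical, not part of ollamar or ellmer.

# Build the JSON body for Ollama's /api/generate endpoint.
build_generate_body <- function(model, prompt, image_path) {
  stopifnot(file.exists(image_path))
  # Read the image as raw bytes, then base64-encode it
  bytes <- readBin(image_path, what = "raw", n = file.size(image_path))
  list(
    model  = model,
    prompt = prompt,
    # list() keeps `images` a JSON array even with a single image
    images = list(jsonlite::base64_enc(bytes)),
    stream = FALSE
  )
}

# Send the body to a running Ollama server (assumed at localhost:11434)
# and return the generated text.
ollama_generate <- function(body, base_url = "http://localhost:11434") {
  resp <- httr2::request(base_url) |>
    httr2::req_url_path_append("api", "generate") |>
    httr2::req_body_json(body) |>
    httr2::req_perform()
  httr2::resp_body_json(resp)$response
}

# Usage (requires a running server and a pulled vision model):
# body <- build_generate_body("llava-phi3", "What's in this image?", "puppy.jpg")
# ollama_generate(body)
```

The request body is built separately from the HTTP call so the encoding step can be inspected without a server running.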


frankiethull · 2024-12-31T18:06:45Z

Here's a reprex with further testing. Adding images to api_args but still getting unexpected results.

library(ellmer)
llm_image_classification <- \(ollama_model = "llava-phi3",
                              path_to_image = system.file("img/test_img.jpg", package = "kuzco")) {

  chat <- chat_ollama(
    model = ollama_model,
    system_prompt = "
          You are a terse assistant specializing in computer vision image classification. 
          You are short and to the point. You only respond if the user supplies an image. 
          You will observe the image and answer specific questions related to the image.
          ",
    api_args = list(
      # images = ollamar::image_encode_base64(path_to_image)
      images = ellmer::content_image_file(path_to_image)@data
    )
  )

  type_image_class <- type_object(
    image_classification = type_string(),
    primary_object       = type_string(),
    secondary_object     = type_string()
    # image_description  = type_string(),
    # image_colors       = type_string(),
    # image_proba        = type_string()
  )

  # image_summary <- type_object(
  #   img_class = type_array(items = type_image_class)
  # )

  prompt <- "
  Given an image, you are tasked with image_classification, give one or two words to classify the image.
  Provide the primary_object in the image, the secondary_object
  "

  data_list <- chat$extract_data(
    prompt,
    type = type_image_class
  )

  return(data_list)
}

llm_image_classification()
#> $image_classification
#> [1] "fruit"
#> 
#> $primary_object
#> [1] "wrappor"
#> 
#> $secondary_object
#> [1] ""

Created on 2024-12-31 with reprex v2.1.1

I'm not overly confident in "llava-phi3", but the image is of a puppy, and the model classifies it well via ollamar.

Same model and a similar prompt via ollamar:

llm_results |> str()
#> 'data.frame':    1 obs. of  7 variables:
#>  $ image_classification: chr "puppy"
#>  $ primary_object      : chr "dog"
#>  $ secondary_object    : chr "ear"
#>  $ image_description   : chr "A puppy with black and white fur."
#>  $ image_colors        : chr "#909091, #ffffff, #763c2f, #8fbc8b, #e6c774, #a3ca8d, #354a88, #8faec8"
#>  $ image_proba_names   :List of 1
#>   ..$ : chr "puppy, black, white, fur, ear, snout, eye, nose"
#>  $ image_proba_values  :List of 1
#>   ..$ : chr "0.62, 0.21, 0.75, 0.43, 0.89, 0.67, 0.38, 0.45"

The image does show a puppy.

hadley · 2025-01-10T16:21:25Z

This works for me:

library(ellmer)

chat <- chat_ollama(model = "llava-phi3")
chat$chat(
  "What's in this image?",
  content_image_file(system.file("httr2.png", package = "ellmer"))
)
#> The image showcases a dynamic scene featuring Htr2, a digital font style, used 
#> to spell out "htr2". The typography is colored pink, contrasting against the 
#> blue background. A large red baseball bat logo takes center stage against this 
#> backdrop, rendered in hues of red and black. The logo depicts an athletic male 
#> figure poised to swing his bat, ready for action. All these visual elements are
#> set upon a black square base, adding to the overall digital appeal of the 
#> image. The font "htr2" is positioned on the left side of the poster while a red
#> circle labelled with "www.mikrofoniki.com/a6703859-0" floats in the top left 
#> corner. This combination creates an eye-catching and visually appealing design 
#> that blends typography and graphics seamlessly.

Created on 2025-01-10 with reprex v2.1.0

So if you're having problems you'll need to provide a reprex.

frankiethull · 2025-01-10T17:32:58Z

I've truly lost my mind; thanks for finding it and handing it back to me.

library(ellmer)

# @@@@@@@@@@@@ Quick Test @@@@@@@@@@@@

chat <- chat_ollama(model = "llava-phi3")
chat$chat(
  "What's in this image?",
  content_image_file(system.file("img/test_img.jpg", package = "kuzco"))
)
#> In the image, there is a young dog with black and white fur. The dog is sitting
#> on a surface that appears to be made of fabric or blanket. The dog is looking 
#> directly at the camera with its eyes closed, giving an impression of 
#> contentment and relaxation. In one corner of the photo, you can see a glimpse 
#> of a red and blue plaid shirt, perhaps belonging to the owner of the dog. The 
#> background forms a blurred image, putting the focus entirely on the resting dog
#> in the center of the frame. This composition creates a peaceful and warm 
#> atmosphere around the dog.

# @@@@@@@@@@@ Structured Test @@@@@@@@

chat <- chat_ollama(
  model = "llava-phi3",
  system_prompt = "
          You are a terse assistant specializing in computer vision image classification. 
          You are short and to the point. You only respond if the user supplies an image. 
          You will observe the image and answer specific questions related to the image.
          Respond in JSON
          ")

type_image_class <- type_object(
  image_classification = type_string(),
  primary_object       = type_string(),
  primary_color        = type_string()
)

image_summary <- type_object(
  img_class = type_array(items = type_image_class)
)

prompt <- "
  Given an image, you are tasked with image_classification, give one or two words to classify the image.
  Provide the primary_object in the image and the primary_color of the primary_object. 
  "

chat$extract_data(
  prompt, ellmer::content_image_file(system.file("img/test_img.jpg", package = "kuzco")),
  type = image_summary
)
#> $img_class
#>   image_classification primary_object   primary_color
#> 1                  dog            dog black and white
#> 2               fabric          cloth   reddish-brown

Created on 2025-01-10 with reprex v2.1.1
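
The working pattern in this thread is to pass the image as a content object alongside the prompt, rather than through api_args. A minimal sketch of that pattern folded back into a reusable function might look like the following. `classify_image()` is a hypothetical name, the prompts are shortened paraphrases of the ones above, and the sketch assumes the `extract_data()` method as it is used in this thread (other ellmer versions may name it differently).

```r
# Sketch only: `classify_image()` is a hypothetical wrapper, not part of ellmer.
classify_image <- function(image_path, model = "llava-phi3") {
  # Fail early, before contacting Ollama, if the image is missing
  if (!file.exists(image_path)) {
    stop("Image file not found: ", image_path)
  }

  chat <- ellmer::chat_ollama(
    model = model,
    system_prompt = "You are a terse image classification assistant. Respond in JSON."
  )

  # Schema for the structured result, mirroring the fields used above
  image_class <- ellmer::type_object(
    image_classification = ellmer::type_string(),
    primary_object       = ellmer::type_string(),
    primary_color        = ellmer::type_string()
  )

  # The image goes in as a content object next to the prompt,
  # not as an entry in api_args
  chat$extract_data(
    "Classify the image in one or two words, then give the primary_object and its primary_color.",
    ellmer::content_image_file(image_path),
    type = image_class
  )
}

# Usage (requires Ollama running locally with the model pulled):
# classify_image(system.file("img/test_img.jpg", package = "kuzco"))
```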

hadley added this to the 0.1.1 milestone Jan 10, 2025

hadley added the "reprex" (needs a minimal reproducible example) label Jan 10, 2025

hadley removed this from the 0.1.1 milestone Jan 10, 2025

frankiethull closed this as completed Jan 10, 2025

frankiethull mentioned this issue Jan 10, 2025

add ellmer support frankiethull/kuzco#6

Closed
