images content support in chat_ollama #221
Comments
Here's a reprex with further testing.
library(ellmer)
llm_image_classification <- \(ollama_model = "llava-phi3",
                              path_to_image = system.file("img/test_img.jpg", package = "kuzco")) {
  chat <- chat_ollama(
    model = ollama_model,
    system_prompt = "
      You are a terse assistant specializing in computer vision image classification.
      You are short and to the point. You only respond if the user supplies an image.
      You will observe the image and answer specific questions related to the image.
    ",
    api_args = list(
      # images = ollamar::image_encode_base64(path_to_image)
      images = ellmer::content_image_file(path_to_image)@data
    )
  )

  type_image_class <- type_object(
    image_classification = type_string(),
    primary_object = type_string(),
    secondary_object = type_string() #,
    # image_description = type_string(),
    # image_colors = type_string(),
    # image_proba = type_string()
  )

  # image_summary <- type_object(
  #   img_class = type_array(items = type_image_class)
  # )

  prompt <- "
    Given an image, you are tasked with image_classification; give one or two words to classify the image.
    Provide the primary_object in the image and the secondary_object.
  "

  data_list <- chat$extract_data(
    prompt,
    type = type_image_class
  )
  return(data_list)
}
llm_image_classification()
#> $image_classification
#> [1] "fruit"
#>
#> $primary_object
#> [1] "wrappor"
#>
#> $secondary_object
#> [1] ""
Created on 2024-12-31 with reprex v2.1.1
I'm not overly confident in "llava-phi3", but the image is of a puppy, and the same model does well with a similar prompt via ollamar:
llm_results |> str()
#> 'data.frame': 1 obs. of 7 variables:
#> $ image_classification: chr "puppy"
#> $ primary_object : chr "dog"
#> $ secondary_object : chr "ear"
#> $ image_description : chr "A puppy with black and white fur."
#> $ image_colors : chr "#909091, #ffffff, #763c2f, #8fbc8b, #e6c774, #a3ca8d, #354a88, #8faec8"
#> $ image_proba_names :List of 1
#> ..$ : chr "puppy, black, white, fur, ear, snout, eye, nose"
#> $ image_proba_values :List of 1
#> ..$ : chr "0.62, 0.21, 0.75, 0.43, 0.89, 0.67, 0.38, 0.45"
The image does show a puppy.
This works for me:
library(ellmer)
chat <- chat_ollama(model = "llava-phi3")
chat$chat(
"What's in this image?",
content_image_file(system.file("httr2.png", package = "ellmer"))
)
#> The image showcases a dynamic scene featuring Htr2, a digital font style, used
#> to spell out "htr2". The typography is colored pink, contrasting against the
#> blue background. A large red baseball bat logo takes center stage against this
#> backdrop, rendered in hues of red and black. The logo depicts an athletic male
#> figure poised to swing his bat, ready for action. All these visual elements are
#> set upon a black square base, adding to the overall digital appeal of the
#> image. The font "htr2" is positioned on the left side of the poster while a red
#> circle labelled with "www.mikrofoniki.com/a6703859-0" floats in the top left
#> corner. This combination creates an eye-catching and visually appealing design
that blends typography and graphics seamlessly.
Created on 2025-01-10 with reprex v2.1.0
So if you're having problems, you'll need to provide a reprex.
I've truly lost my mind; thanks for finding it and handing it back to me.
library(ellmer)
# @@@@@@@@@@@@ Quick Test @@@@@@@@@@@@
chat <- chat_ollama(model = "llava-phi3")
chat$chat(
  "What's in this image?",
  content_image_file(system.file("img/test_img.jpg", package = "kuzco"))
)
#> In the image, there is a young dog with black and white fur. The dog is sitting
#> on a surface that appears to be made of fabric or blanket. The dog is looking
#> directly at the camera with its eyes closed, giving an impression of
#> contentment and relaxation. In one corner of the photo, you can see a glimpse
#> of a red and blue plaid shirt, perhaps belonging to the owner of the dog. The
#> background forms a blurred image, putting the focus entirely on the resting dog
#> in the center of the frame. This composition creates a peaceful and warm
#> atmosphere around the dog.
# @@@@@@@@@@@ Structured Test @@@@@@@@
chat <- chat_ollama(
  model = "llava-phi3",
  system_prompt = "
    You are a terse assistant specializing in computer vision image classification.
    You are short and to the point. You only respond if the user supplies an image.
    You will observe the image and answer specific questions related to the image.
    Respond in JSON.
  "
)

type_image_class <- type_object(
  image_classification = type_string(),
  primary_object = type_string(),
  primary_color = type_string()
)

image_summary <- type_object(
  img_class = type_array(items = type_image_class)
)

prompt <- "
  Given an image, you are tasked with image_classification; give one or two words to classify the image.
  Provide the primary_object in the image and the primary_color of the primary_object.
"

chat$extract_data(
  prompt,
  ellmer::content_image_file(system.file("img/test_img.jpg", package = "kuzco")),
  type = image_summary
)
#> $img_class
#> image_classification primary_object primary_color
#> 1 dog dog black and white
#> 2 fabric cloth reddish-brown
Created on 2025-01-10 with reprex v2.1.1
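Putting the thread's resolution together: the original classifier can be rewritten to pass the image as chat content alongside the prompt, rather than smuggling it in through api_args. This is only a sketch under the thread's assumptions (a local Ollama server with llava-phi3 pulled, and the kuzco test image installed); it is not the kuzco package's actual implementation.

```r
library(ellmer)

# Hypothetical rewrite of llm_image_classification(): the image travels as a
# content_image_file() argument to extract_data(), not via api_args.
llm_image_classification <- function(ollama_model = "llava-phi3",
                                     path_to_image = system.file("img/test_img.jpg",
                                                                 package = "kuzco")) {
  chat <- chat_ollama(
    model = ollama_model,
    system_prompt = "
      You are a terse assistant specializing in computer vision image classification.
      You are short and to the point. You only respond if the user supplies an image.
    "
  )

  # Structured-output schema for the fields we want back
  type_image_class <- type_object(
    image_classification = type_string(),
    primary_object = type_string(),
    secondary_object = type_string()
  )

  chat$extract_data(
    "Classify the image in one or two words, then name the primary_object and secondary_object.",
    content_image_file(path_to_image),
    type = type_image_class
  )
}
```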
It would be useful to add the images arg to chat_ollama. I tried 1) appending the raw image path, 2) using ellmer::content_image_file(), 3) ellmer::content_image_url(), and 4) ellmer::content_image_file()@data, but it seems that ollama's llava, moondream, etc. do not like the image in the prompt itself. After some testing, what does work is placing the image path directly into the images arg. For example, this works as expected:
ollamar::generate("llava-phi3", my_prompt, images = my_image_local_file, output = "text")
For reference, here's the ollama API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
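For context on what ollamar is doing under the hood, the linked API docs describe a /api/generate endpoint that takes an images field of base64-encoded strings. A minimal sketch of calling it directly with httr2, assuming a local server on the default port (the path "my_image.jpg" is a placeholder):

```r
library(httr2)

# Hit Ollama's /api/generate endpoint directly, per the API docs linked above.
resp <- request("http://localhost:11434/api/generate") |>
  req_body_json(list(
    model = "llava-phi3",
    prompt = "What's in this image?",
    # images must be an array of base64-encoded image payloads
    images = list(base64enc::base64encode("my_image.jpg")),
    stream = FALSE
  )) |>
  req_perform()

resp_body_json(resp)$response
```

This mirrors the ollamar::generate() call above and suggests why the path has to go into images rather than into the prompt text: the model never sees the image unless it arrives base64-encoded in that field.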