
[Request] Images don't seem to be supported with Ollama's llama3.2-vision #4642

Open
samurai00 opened this issue Nov 8, 2024 · 6 comments
Labels
🌠 Feature Request New feature or request | 特性与建议 ollama Relative to Ollama Provider and ollama models vision

Comments

@samurai00

🥰 Feature Description

Ollama 0.4.0 added support for the llama3.2-vision model, which can recognize images: https://ollama.com/blog/llama3.2-vision

I tried calling the llama3.2-vision model from LobeChat v1.28.4 and found that it does not handle images correctly.

The relevant request body can be seen in the logs:

{
  "messages": [
    {
      "content": "图片上有什么文字\n\n<files_info>\n<images>\n<images_docstring>here are user upload images you can refer to</images_docstring>\n<image name=\"deval.png\" url=\"https://s3-lobechat.tabun.pro/files/480844/8b97987b-0e33-4a02-9fbc-f03dc34e0567.png\"></image>\n</images>\n\n</files_info>",
      "role": "user"
    }
  ],
  "model": "llama3.2-vision",
  "options": {
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.35,
    "top_p": 1
  },
  "stream": true
}

It looks like the image is embedded in the content field, while Ollama's llama3.2-vision model probably expects a different format.

I hope this can be supported, thanks 🙏!

🧐 Proposed Solution

According to the Ollama documentation, the expected format is something like:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'

That is, each image needs to be base64-encoded and placed in the images field, which is an array.
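To make the requested transformation concrete, here is a minimal sketch of building that request body in Python. The function name and structure are my own illustration (not LobeChat's actual code); it only assumes what the curl example above shows: raw base64 strings, one per image, in a messages[].images array.

```python
import base64
import json


def build_ollama_chat_payload(prompt: str, image_bytes: bytes,
                              model: str = "llama3.2-vision") -> str:
    """Build an Ollama /api/chat request body with the image
    base64-encoded into the `images` array, per the Ollama docs."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # Raw base64, no data: URI prefix; one entry per image.
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
    }
    return json.dumps(payload)
```

The resulting string could then be POSTed to http://localhost:11434/api/chat exactly as in the curl example above.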

📝 Additional Information

No response

@samurai00 samurai00 added the 🌠 Feature Request New feature or request | 特性与建议 label Nov 8, 2024
@lobehubbot
Member

👀 @samurai00

Thank you for raising an issue. We will investigate into the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.

@dosubot dosubot bot added ollama Relative to Ollama Provider and ollama models vision labels Nov 8, 2024
@SpeedupMaster
Contributor

Is LLM_VISION_IMAGE_USE_BASE64=1 set?


@samurai00
Author

Is LLM_VISION_IMAGE_USE_BASE64=1 set?

» docker exec -it lobe-chat-database sh
/ $ echo $LLM_VISION_IMAGE_USE_BASE64
1
/ $
{
  "messages": [
    {
      "content": "图片中有什么?\n\n\n<files_info>\n<images>\n<images_docstring>here are user upload images you can refer to</images_docstring>\n<image name=\"截屏2024-05-22 17.52.16.png\" url=\"https://s3-lobechat.tabun.pro/files/480846/875b55ec-923a-48ce-b1e3-0730c4a92794.png\"></image>\n</images>\n\n</files_info>",
      "role": "user"
    }
  ],
  "model": "llama3.2-vision",
  "options": {
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.35,
    "top_p": 1
  },
  "stream": true
}

Even with the environment variable LLM_VISION_IMAGE_USE_BASE64=1 set, the result is the same.


@SpeedupMaster
Contributor

Judging from #3888, it seems that converting Ollama image URLs to base64 has not been implemented yet. Some Ollama models will still respond, but they just read the XML as text:

This appears to be an XML (Extensible Markup Language) file that contains information about a single image. Here's a breakdown of the contents:

  • <files_info>: The root element, which indicates that this is a container for file-related information.
  • <images>: A child element within <files_info>, suggesting that it holds information about images specifically.
  • <image>: A child element within <images>, representing an individual image.
    • name: An attribute of the <image> element, specifying the filename of the image (648557.jpg).
    • url: Another attribute of the <image> element, providing a URL where the image can be accessed (http://localhost:9000/lobe/files/480591/bb8bb4b9-f001-4ba5-8162-acd7c47b688b.jpg).

In summary, this XML snippet describes a single image file with its filename and URL. The context appears to be a web application or API that handles file uploads, as hinted by the localhost:9000 URL.
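The missing piece described above (fetching the uploaded image's URL and base64-encoding the bytes before they reach Ollama) could be sketched roughly as follows. This is a hypothetical stdlib-only helper for illustration, not LobeChat's actual implementation, and a real version would need S3 auth, content-type checks, and error handling.

```python
import base64
import urllib.request


def image_url_to_base64(url: str) -> str:
    # Hypothetical helper: download the image and return its raw
    # base64 encoding, suitable for Ollama's `images` array.
    with urllib.request.urlopen(url) as resp:
        return base64.b64encode(resp.read()).decode("ascii")
```

The returned string would then go into messages[].images instead of being interpolated into the content text as a files_info XML block.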
