-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Paligemma support for PNG files #6427
Comments
Thanks for reporting this! Can you check whether #6430 fixes this issue? |
Not related to this PR in particular, but since you're serving this from the OpenAI API server, I don't think PaliGemma is supposed to work out-of-box with it because it was never instruction fine-tuned. In the PaliGemma paper, it says
|
@DarkLight1337 Thank you for taking on this issue! Sorry, but this still doesn't work for me. I pulled your branch using @ywang96 You bring up a good point! I'll have to familiarize myself with the paper, thanks for sharing. |
Oops, I forgot to update the async version of |
thank you! works now :) |
Hi @BabyChouSr
I am getting the following output, not the one you mentioned
What might be the issue here, can you please help me! |
You should use a custom chat template so that the input has the same format as the one shown on HuggingFace. |
@DarkLight1337 I hope the request body for the paligemma api is same for all when hosted through vLLM. Why we should be using custom chat template. Can you please elaborate much on this? |
From my understanding, PaliGemma isn't designed as a chat model so it doesn't have a built in chat template. In this case you are required to define your own template since there isn't a default chat template that works for all models. |
@DarkLight1337 To give more context, I tried the above curl command on the paligemma model that we have hosted through vLLM framework as same as what @BabyChouSr used for his query. But our output was completely different from he has told. So, I had asked a help for that. |
How are you hosting the model? Please show the command that you used. |
I don't think by default the temperature is set to 0 (i.e, we're not greedily sampling) and that's probably why you're seeing the difference. I would also encourage you to take a look at our example script |
@DarkLight1337 It is through a cloud platform called Jarvislabs.ai, they have a vLLM option to host open source models through hugging face. When I tried with paligemma, it gave us two apis, one is /v1/chat/completions and /v1/completions. I thought /v1/chat/completions would work for us and tried it, but didn't proper response. The simple goal here is to given an image and a prompt. It should be able to give the output. |
Do you have the ability to pass through command-line arguments? As mentioned above:
|
No, I have control only on the request body given for the API call. |
How about selecting the HuggingFace model to use? Maybe you can fork the model repo and add the chat template to it. |
Not sure. But, my doubt is why I am not able to get a proper output as like @BabyChouSr got for his jpg image query using /v1/chat/completions api call with paligemma model? |
@JanuRam I don't think that the model should be used for chat responses. You will not receive content that is very meaningful. Try by using the llava template. However, I would say that chat is probably not the use case that you would want to use this model for. If you are looking for chat, you should try https://huggingface.co/openbmb/MiniCPM-V-2_6
|
It is not for chat (conversational purpose), mainly for visual question answering to be precise. |
Your current environment
🐛 Describe the bug
PNG files don't seem to work for
paligemma-3b-mix-448
.To test, try the following command:
python -m vllm.entrypoints.openai.api_server --model google/paligemma-3b-mix-448 --chat-template examples/template_llava.jinja"
on the server.
Then, test this command using:
Error Traceback Output:
However, if we test using a jpg image:
Output:
I believe that the reason why this is the case is because SigLip has a default
num_channels
parameter that is set to 3. When we take in PNG images, PNG images can have 4 channels (RGBA), which can lead to this mismatch. I discovered this mismatch when I was trying to load images usingImage.open(image_url).convert('RGBA')
and then realized that passing these images into vllm would not work due to the above error.The text was updated successfully, but these errors were encountered: