Qwen2.5 VL sglang's output much worse than transformers #3746
I think most of this is due to not using the right chat template, and you clearly used the wrong one. But could @mickqian take a look?
I assumed the engine would apply the default chat template correctly, like vLLM or TGI do. Below is the client code I used, with no template-related parameter. What did I miss?

```python
import os
from typing import List, Optional

from openai import OpenAI


class LLMClient:
    def __init__(
        self,
        url: str = "http://10.196.164.32:23333/v1",
        max_tokens: int = 2000,
        frequency_penalty: float = 0.0,
        model_name: Optional[str] = None,
        stop: Optional[List[str]] = None,
    ):
        openai_api_key = os.getenv("OPENAI_SK", "xxx")
        self.client = OpenAI(api_key=openai_api_key, base_url=url, max_retries=4)
        self.max_tokens = max_tokens
        if model_name is None:
            # Default to the first model served by the endpoint.
            self.model_name = self.client.models.list().data[0].id
        else:
            self.model_name = model_name
        self.frequency_penalty = frequency_penalty
        self.stop = stop

    def generate(self, image, prompt):
        image_base64 = encode_image_base64(image)
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{image_base64}",
                            },
                        },
                    ],
                }
            ],
            temperature=0.0,
            frequency_penalty=self.frequency_penalty,
            max_tokens=self.max_tokens,
            stop=self.stop,
        )
        return response.choices[0].message.content
```
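The `encode_image_base64` helper is not shown in the snippet above. A minimal sketch, assuming it takes a path to a JPEG file and returns the bare base64 string for the data URL, might look like:

```python
import base64


def encode_image_base64(path):
    # Read the raw image bytes and base64-encode them so they can be
    # embedded in a data:image/jpeg;base64,... URL.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```

If `image` is instead an in-memory object (e.g. a PIL image), the helper would need to serialize it to JPEG bytes first, but the encoding step is the same.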
https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template — please go through the whole docs, @heibaidaolx123
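In practice this means passing an explicit chat template when launching the server rather than relying on the default. A sketch, assuming the built-in `qwen2-vl` template name also applies to Qwen2.5-VL (check the docs linked above for the exact name to use):

```shell
python3 -m sglang.launch_server \
    --model-path /model/Qwen2.5-VL-72B-Instruct \
    --chat-template qwen2-vl \
    --tp 4 \
    --host 0.0.0.0 \
    --port 23333
```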
@zhaochenyang20 |
Let me ask for help from our multi-modal people.
Hi @heibaidaolx123, this PR may be related: #3605 — could you give it a try? We are also trying to integrate a benchmark to set a baseline in #3562.
The problems with Qwen2.5-VL might be related to:
@yizhang2077 I tried the PR. The output changed a little, but the accuracy remains the same.
I tried serving Qwen2.5-VL-72B using SGLang on a node with 4×A40 GPUs.
The image I used is the official sglang:v0.4.3.post2-cu125.
The command:

```shell
python3 -m sglang.launch_server \
    --tp $NUM_SHARD \
    --mem-fraction-static 0.99 \
    --disable-cuda-graph \
    --model-path /model/Qwen2.5-VL-72B-Instruct \
    --host 0.0.0.0 \
    --port 23333
```

I tested on an internal image classification dataset, and the results were much worse than with transformers: accuracy dropped from 87% to 80%.
I also tried an image-to-code task, and the rendered images were much worse, too.