
Qwen2.5 VL sglang's output much worse than transformers #3746

Open
heibaidaolx123 opened this issue Feb 21, 2025 · 8 comments

@heibaidaolx123
I tried serving Qwen2.5 VL 72B using sglang on a node with 4×A40 GPUs.
The Docker image I used is the official sglang:v0.4.3.post2-cu125.
The command:

python3 -m sglang.launch_server \
  --tp $NUM_SHARD \
  --mem-fraction-static 0.99 \
  --disable-cuda-graph \
  --model-path /model/Qwen2.5-VL-72B-Instruct \
  --host 0.0.0.0 \
  --port 23333

I tested on an internal image classification dataset, and the results were much worse than with transformers: accuracy dropped from 87% to 80%.
I also tried an image2code task, and the rendered images were much worse as well.
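
For reference, the transformers numbers above come from the standard HF generation path. A minimal sketch of that baseline (the model path is from this issue; the prompt, image path, and the qwen_vl_utils helper follow the official Qwen2.5-VL quickstart, not this report):

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "/model/Qwen2.5-VL-72B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("/model/Qwen2.5-VL-72B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/sample.jpg"},  # placeholder path
            {"type": "text", "text": "Classify this image."},  # placeholder prompt
        ],
    }
]

# apply_chat_template wraps the prompt in the Qwen2.5-VL chat markup
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# greedy decoding to match temperature=0.0 on the server side
output_ids = model.generate(**inputs, max_new_tokens=2000, do_sample=False)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]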

@zhaochenyang20
Collaborator

I think this is most likely due to not using the right chat template, and it looks like you used the wrong one here. Could @mickqian take a look?

@heibaidaolx123
Author

@zhaochenyang20

I assumed the engine would apply the default chat template correctly, as vllm or tgi do.

Below is the client code I used; there is no template-related param. What did I miss?

import os
from typing import List, Optional

from openai import OpenAI

# NOTE: encode_image_base64 is a local helper not shown in the issue;
# a minimal sketch of it follows the class below.


class LLMClient:
    def __init__(
        self,
        url: str = "http://10.196.164.32:23333/v1",
        max_tokens: int = 2000,
        frequency_penalty: float = 0.0,
        model_name: Optional[str] = None,
        stop: Optional[List[str]] = None,
    ):
        openai_api_key = os.getenv("OPENAI_SK", "xxx")
        self.client = OpenAI(api_key=openai_api_key, base_url=url, max_retries=4)
        self.max_tokens = max_tokens
        if model_name is None:
            # Default to the first model the server advertises
            self.model_name = self.client.models.list().data[0].id
        else:
            self.model_name = model_name
        self.frequency_penalty = frequency_penalty
        self.stop = stop

    def generate(self, image, prompt):
        # Send the image inline as a base64 data URL alongside the text prompt
        image_base64 = encode_image_base64(image)
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": prompt,
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{image_base64}",
                            },
                        },
                    ],
                }
            ],
            temperature=0.0,
            frequency_penalty=self.frequency_penalty,
            max_tokens=self.max_tokens,
            stop=self.stop,
        )
        return response.choices[0].message.content
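
(encode_image_base64 is a small local helper; a minimal sketch of it, assuming the input is a PIL image:)

import base64
import io

def encode_image_base64(image) -> str:
    # Serialize the PIL image to JPEG in memory, then base64-encode the bytes.
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")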

@zhaochenyang20
Collaborator

@heibaidaolx123
Author

@zhaochenyang20
Oh, I missed the chat template. Thanks.
After adding --chat-template qwen2-vl, the results get better, but they still lag behind transformers (acc 83% vs 87%).
Any clue?
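
(For reference, the corrected launch command, i.e. the original command with only the chat-template flag added:)

python3 -m sglang.launch_server \
  --tp $NUM_SHARD \
  --mem-fraction-static 0.99 \
  --disable-cuda-graph \
  --chat-template qwen2-vl \
  --model-path /model/Qwen2.5-VL-72B-Instruct \
  --host 0.0.0.0 \
  --port 23333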

@zhaochenyang20
Collaborator

Let me ask for help from our multi-modal people.

@yizhang2077
Collaborator

Hi @heibaidaolx123, this PR may be related: #3605. Could you give it a try? We are also trying to integrate a benchmark to set a baseline in #3562.
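
(One typical way to try the PR locally, sketched here assuming the sgl-project/sglang repository and its documented editable install:)

git clone https://github.com/sgl-project/sglang.git
cd sglang
# fetch the PR head into a local branch and check it out
git fetch origin pull/3605/head:pr-3605
git checkout pr-3605
# editable install of the Python package, per the repo's install instructions
pip install -e "python[all]"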

@mickqian
Contributor

The problems with Qwen2.5 VL might be related to:

  1. the image preprocessing procedure, which is not included in the HF image_processor (see the sketch below)
  2. the rotary position embedding of the ViT
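
(A quick way to inspect the HF reference preprocessing when comparing against sglang's, as a sketch; the model path is from this issue and the image path is a placeholder:)

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("/model/Qwen2.5-VL-72B-Instruct")
image = Image.open("/path/to/sample.jpg")  # placeholder path

# Run only the HF image_processor to see the reference pixel values and the
# (t, h, w) patch grid Qwen2.5-VL derives from the image size.
outputs = processor.image_processor(images=image, return_tensors="pt")
print(outputs["pixel_values"].shape)
print(outputs["image_grid_thw"])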

@heibaidaolx123
Author

@yizhang2077 I tried the PR. The output changed a little, but the acc remains the same.
