
[fix] added support for vlm in offline inference #3548

Merged: 5 commits into sgl-project:main from hotfix/offline-vlm, Feb 14, 2025

Conversation

@FrankLeeeee (Collaborator) commented on Feb 13, 2025

Motivation

This PR aims to fix issue #3545 by adding support for vision-language models (VLMs), such as Qwen2-VL, to the engine for offline inference.

Modifications

First of all, it should be noted that the current code design has some issues that make this PR an imperfect solution. As a result, I have not added any documentation or unit tests yet, and I would like to discuss how to improve the code as a whole.

The root causes of #3545 are:

  1. VLMs require the prompt to be pre-processed by the chat template, because the processor inserts image-related tokens into the prompt. For example, Qwen2-VL adds the following tokens: <|vision_start|><|image_pad|><|vision_end|> (see the sketch after this list). VLMs therefore differ from LLMs in that the chat template is mandatory for a VLM, whereas LLMs can often generate sensible outputs even without it.
  2. sgl.Engine is not responsible for applying the chat template to the prompts.
  3. The current API for applying the chat template, v1_chat_generate_request, is intended only for online serving.
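To make point 1 concrete, here is a minimal sketch of how applying the chat template injects the vision tokens, using the Hugging Face processor directly; the model name and message layout are illustrative, and this is not necessarily the code path used in this PR:

```python
# Minimal illustration: the chat template inserts the vision placeholder tokens
# that Qwen2-VL expects; a raw text prompt would be missing them.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# The rendered prompt contains <|vision_start|><|image_pad|><|vision_end|>
# around the image slot, plus the usual chat-role markers.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```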

As a result, the workflow currently proposed for running VLMs offline is shown below:
[Workflow diagram: WechatIMG469]

This is not very elegant because it counter-intuitively relies on an API designed for online serving in an offline inference scenario. I would suggest extracting the preprocessing logic into independent APIs, or creating a new API for preprocessing in offline cases.
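For reference, here is a rough sketch of what the offline workflow looks like from the user's side. It assumes the chat template is applied via the Hugging Face processor (rather than v1_chat_generate_request) and that Engine.generate accepts an image_data argument as described in this PR; the model path, image URL, and sampling parameters are placeholders:

```python
# Sketch of offline VLM inference after this PR. Assumptions: Engine.generate
# takes an image_data argument, and a single-prompt call returns a dict with a
# "text" field. The chat template is applied here with the HF processor purely
# for illustration.
import sglang as sgl
from transformers import AutoProcessor

MODEL = "Qwen/Qwen2-VL-7B-Instruct"  # example model

processor = AutoProcessor.from_pretrained(MODEL)
llm = sgl.Engine(model_path=MODEL)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }
]
# Pre-process the prompt so it contains the vision tokens the model expects.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

output = llm.generate(
    prompt=prompt,
    sampling_params={"temperature": 0, "max_new_tokens": 64},
    image_data="https://example.com/cat.jpg",  # hypothetical image URL
)
print(output["text"])

llm.shutdown()
```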

Open for discussion.


@yizhang2077 (Collaborator):

LGTM, cc @zhaochenyang20 @merrymercy

@zhyncs merged commit fb4c9c3 into sgl-project:main on Feb 14, 2025
@FrankLeeeee deleted the hotfix/offline-vlm branch on February 15, 2025 at 03:40
chongli-uw pushed a commit to chongli-uw/sglang that referenced this pull request Feb 15, 2025