[fix] added support for vlm in offline inference #3548

FrankLeeeee · 2025-02-13T14:38:48Z

Motivation

This PR aims to fix the issue #3545 by enhancing the engine with the support of vision language models such as Qwen2-VL for offline inference.

Modifications

First of all, it should be noted that the design of current code has some issues which make this PR an imperfect solution, as a result, I have not added any documentation or unit test yet and wish to look for discussion on how to improve the code as a whole.

The root causes of #3545 are that:

VLMs require the prompt to be pre-processed by the chat template. This is because that the processor will add in some image-related tokens in the prompt. For exmaple, Qwen2-VL adds the following tokens to the prompt: <|vision_start|><|image_pad|><|vision_end|>. Thus, VLMs are different from LLMs in the sense that the chat template is a must for VLM but not always for LLMs (LLMs can still generate sensible outputs even without the template).
sgl.Engine is not responsible for applying the chat template to the prompts
the current API for applying the chat template is only for online serving, i.e. v1_chat_generate_request

As a result, the current proposed workflow for running VLMs offline is shown below:

It is not so elegant because it counter-intuitively uses a API used for online serving in the offline inference scenario. I would suggest we extract the preprocessing logic to independent APIs or create a new API for preprocessing in offline cases.

Open for discussion.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

yizhang2077 · 2025-02-14T02:19:10Z

LGTM, cc @zhaochenyang20 @merrymercy

added support for vlm in offline inference

fe8b351

FrankLeeeee requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners February 13, 2025 14:38

FrankLeeeee added 2 commits February 13, 2025 14:40

fixed usage doc

0cdbed3

fixed typo

fdc82a1

yizhang2077 approved these changes Feb 14, 2025

View reviewed changes

zhaochenyang20 and others added 2 commits February 14, 2025 08:42

Merge branch 'main' into hotfix/offline-vlm

e021ae3

Merge branch 'main' into hotfix/offline-vlm

b1c8913

zhyncs merged commit fb4c9c3 into sgl-project:main Feb 14, 2025

FrankLeeeee deleted the hotfix/offline-vlm branch February 15, 2025 03:40

yizhang2077 mentioned this pull request Feb 15, 2025

bench: Add a benchmark for vLM: MMMU #3562

Merged

6 tasks

chongli-uw pushed a commit to chongli-uw/sglang that referenced this pull request Feb 15, 2025

[fix] added support for vlm in offline inference (sgl-project#3548)

d540ec8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix] added support for vlm in offline inference #3548

[fix] added support for vlm in offline inference #3548

FrankLeeeee commented Feb 13, 2025 •

edited

Loading

yizhang2077 commented Feb 14, 2025

[fix] added support for vlm in offline inference #3548

[fix] added support for vlm in offline inference #3548

Conversation

FrankLeeeee commented Feb 13, 2025 • edited Loading

Motivation

Modifications

Checklist

yizhang2077 commented Feb 14, 2025

FrankLeeeee commented Feb 13, 2025 •

edited

Loading