
Conversation

fyabc
Contributor

@fyabc fyabc commented Apr 9, 2025

This draft PR adds support for the Qwen2.5-Omni model (end-to-end full support).

This PR is a later version of #15130; it adds support for the talker, code2wav, and an OmniLLMEngine class that manages the end-to-end audio generation process.
See #15130 for more details about the Qwen2.5-Omni model architecture.

NOTE: Since this PR makes significant changes to vLLM, it is a draft and will not be merged in the short term.

Requirements

This PR requires huggingface/transformers#36752.

pip install git+https://github.com/huggingface/transformers@f742a644ca32e65758c3adb36225aef1731bd2a8

Note: you need to install transformers from source from that branch, as pinned in the command above.

Example Usage

python examples/offline_inference/qwen2_5_omni/end2end.py --model Qwen/Qwen2.5-Omni-7B --prompt audio-in-video-v2 --enforce-eager --do-wave --voice-type m02 --warmup-voice-type m02

This command will print the text output and generate .wav output files under the current folder.
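For reference, one way to inspect a generated file is with soundfile (a minimal sketch; the filename below is a placeholder, since the actual output names depend on the script):

```python
# Minimal sketch: inspect one of the generated .wav files.
# "output_0.wav" is a placeholder name, not the script's actual output name.
import soundfile as sf

waveform, sample_rate = sf.read("output_0.wav")
print(f"{len(waveform) / sample_rate:.2f} s of audio at {sample_rate} Hz")
```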


github-actions bot commented Apr 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation, frontend, multi-modality (#4194), and tpu labels Apr 9, 2025

mergify bot commented Apr 9, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fyabc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 9, 2025
@DarkLight1337
Member

DarkLight1337 commented Apr 9, 2025

I think we can further split this PR, with the first one (after Qwen2.5-Omni thinker only) adding prompt_embeds support to vLLM. For reference, here are some previous/ongoing efforts to add this feature:

@ywang96
Member

ywang96 commented Apr 9, 2025

Thanks for this contribution! As we discussed offline, we'll be carefully reviewing this PR/design and think about how to enable end-to-end support for models like this with vLLM!

@mergify mergify bot added the ci/build label Apr 10, 2025
fyabc and others added 3 commits April 10, 2025 22:37
Signed-off-by: Tao He <[email protected]>
(cherry picked from commit 005879f2b22e40b7d03be7063e80686862a72e2d)
Signed-off-by: fyabc <[email protected]>
@majunze2001

Is this fork still usable? After cloning and building, I got the following errors:

root@ubuntu:/workspace# python examples/offline_inference/qwen2_5_omni/end2end.py --model Qwen/Qwen2.5-Omni-7B --prompt audio-in-video-v2 --enforce-eager --do-wave --voice-type m02 --warmup-voice-type m02
INFO 06-01 00:40:02 [__init__.py:239] Automatically detected platform cuda.
You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
/workspace/examples/offline_inference/qwen2_5_omni/end2end.py:258: UserWarning: PySoundFile failed. Trying audioread instead.
  librosa.load(temp_video_file_path, sr=16000)[0])
/opt/venv/lib/python3.11/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
        Deprecated as of librosa version 0.10.0.
        It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/librosa/core/audio.py", line 176, in load
    y, sr_native = __soundfile_load(path, offset, duration, dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/librosa/core/audio.py", line 209, in __soundfile_load
    context = sf.SoundFile(path)
              ^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/soundfile.py", line 690, in __init__
    self._file = self._open(file, mode_int, closefd)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/soundfile.py", line 1265, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '/tmp/tmp3_ttt320': Format not recognised.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/examples/offline_inference/qwen2_5_omni/end2end.py", line 677, in <module>
    main()
  File "/workspace/examples/offline_inference/qwen2_5_omni/end2end.py", line 651, in main
    prompt = make_omni_prompt()
             ^^^^^^^^^^^^^^^^^^
  File "/workspace/examples/offline_inference/qwen2_5_omni/end2end.py", line 480, in make_omni_prompt
    prompt = make_audio_in_video_v2_prompt()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/examples/offline_inference/qwen2_5_omni/end2end.py", line 400, in make_audio_in_video_v2_prompt
    prompt = make_inputs_qwen2_omni(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/examples/offline_inference/qwen2_5_omni/end2end.py", line 258, in make_inputs_qwen2_omni
    librosa.load(temp_video_file_path, sr=16000)[0])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/librosa/core/audio.py", line 184, in load
    y, sr_native = __audioread_load(path, offset, duration, dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/decorator.py", line 235, in fun
    return caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/librosa/util/decorators.py", line 63, in __wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/librosa/core/audio.py", line 240, in __audioread_load
    reader = audioread.audio_open(path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/audioread/__init__.py", line 132, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError

@mergify mergify bot added the qwen Related to Qwen models label Jun 19, 2025
@liaoweiguo

watching ...

@BakerBunker

@majunze2001 librosa needs a filename suffix to determine the file format in some cases; add a suffix to your temp file and try again.
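For example, a minimal sketch of that workaround (the input path is a placeholder, and an audio backend such as ffmpeg may still be needed for non-wav containers):

```python
# Sketch: give the temporary video file an explicit suffix so
# librosa/soundfile can recognise the container format.
# "input_video.mp4" is a placeholder path; sr=16000 mirrors end2end.py.
import tempfile

import librosa

with open("input_video.mp4", "rb") as src:
    video_bytes = src.read()

with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp.write(video_bytes)
    temp_video_file_path = tmp.name

# Note: decoding audio from an .mp4 still requires an audioread backend
# (e.g. ffmpeg) to be installed on the system.
audio = librosa.load(temp_video_file_path, sr=16000)[0]
```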

@mergify mergify bot added the new-model Requests to new models label Jul 11, 2025
@SamitHuang
Contributor

> Thanks for this contribution! As we discussed offline, we'll be carefully reviewing this PR/design and think about how to enable end-to-end support for models like this with vLLM!

Looking forward to this feature!
