[Model][VLM] Add Qwen2.5-Omni model support (end-to-end full support) #16347
Signed-off-by: Tao He <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
This pull request has merge conflicts that must be resolved before it can be |
I think we can further split this PR, with the first one (after Qwen2.5-Omni thinker only) adding |
Thanks for this contribution! As we discussed offline, we'll be carefully reviewing this PR/design and thinking about how to enable end-to-end support for models like this with vLLM!
Signed-off-by: Tao He <[email protected]> (cherry picked from commit 005879f2b22e40b7d03be7063e80686862a72e2d)
Signed-off-by: fyabc <[email protected]>
Is this fork still usable? After cloning and building I got the following errors:
watching ...
@majunze2001 librosa needs a filename suffix to determine the file format in some cases. Add a suffix to your tmpfile and try again.
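A minimal sketch of the suggestion above, assuming the temporary file is created with Python's `tempfile` module (the thread does not show the original code). The helper name `write_temp_wav` and the silent test signal are hypothetical; the point is only that `suffix=".wav"` gives the temporary file a name from which the format can be inferred.

```python
import os
import tempfile
import wave


def write_temp_wav(pcm_bytes: bytes, sample_rate: int = 16000) -> str:
    """Write 16-bit mono PCM to a temp file whose name ends in .wav."""
    # delete=False keeps the file on disk so it can be reopened by path.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    return path


# Hypothetical input: one second of silence at 16 kHz.
path = write_temp_wav(b"\x00\x00" * 16000)
# With librosa installed, the file could then be loaded by path:
#   audio, sr = librosa.load(path, sr=None)
```

Because the file is created with `delete=False`, the caller is responsible for removing it (for example with `os.remove(path)`) once it is no longer needed.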
looking forward to this feature! |
This draft PR adds support for the Qwen2.5-Omni model (end-to-end full support).
This PR is a later version of #15130. It adds support for talker, code2wav, and an `OmniLLMEngine` class to manage the end-to-end audio generation process.
See #15130 for more details about the `Qwen2.5-Omni` model architecture.
NOTE: Since this PR makes significant changes to vLLM, it is a draft and will not be merged in the short term.
Requirements
This PR requires huggingface/transformers#36752.
Note: You need to install `transformers` from source, from that PR's branch.
Example Usage
This command will print text output and generate `.wav` output files under the current folder.