[Model] Refactor Ultravox to use merged input processor #11198

Merged: 27 commits merged into vllm-project:main from ultravox-refactor on Dec 16, 2024

Conversation

Isotr0py
Collaborator

@Isotr0py Isotr0py commented Dec 14, 2024

  • Refactor Ultravox to use merged input processor
  • The Ultravox placeholder will be changed to <|audio|> to stay aligned with HF.


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@Isotr0py Isotr0py marked this pull request as ready for review December 15, 2024 10:51
@DarkLight1337
Member

cc @petersalas regarding the change of placeholder token.

@DarkLight1337
Member

WDYT of making sampling_rate part of mm_processor_kwargs to make the input format consistent with HF? Even so, we should maintain backwards compatibility for a while.

@Isotr0py
Collaborator Author

> WDYT of making sampling_rate part of mm_processor_kwargs to make the input format consistent with HF?

But the Whisper feature extractor uses a fixed sampling rate, so exposing the sampling rate as a dynamic parameter may cause unnecessary exceptions.

For example, if we specify sampling_rate=32000, the Ultravox processor raises an error because the sampling rate is incorrect.

from transformers import AutoProcessor
import librosa

processor = AutoProcessor.from_pretrained("fixie-ai/ultravox-v0_3", trust_remote_code=True)
# librosa.load resamples to its own default rate unless sr= is given
audio, sr = librosa.load("translate_to_chinese.wav")
# Passing a rate other than 16000 trips the feature extractor's check
processor(text="<|audio|>", audio=audio, sampling_rate=32000)

ValueError: The model corresponding to this feature extractor: WhisperFeatureExtractor was trained using a sampling rate of 16000. Please make sure that the provided `raw_speech` input was sampled with 16000 and not 32000.

@DarkLight1337
Member

I see, so the sampling_rate parameter actually refers to the input, not the HF processor. Let's keep this as is then.
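To illustrate the point that sampling_rate describes the input audio rather than a processor setting, here is a minimal sketch of bringing an (audio, sampling_rate) pair to the 16000 Hz rate named in the error above before it reaches the feature extractor. The names `resample_linear` and `prepare_audio` are hypothetical and are not part of vLLM or the Ultravox processor; the naive linear interpolation (no anti-aliasing) is only a stand-in for a proper resampler such as `librosa.resample`.

```python
from typing import List, Sequence, Tuple

# 16000 is the rate named in the WhisperFeatureExtractor error quoted above.
TARGET_SR = 16_000


def resample_linear(
    samples: Sequence[float], orig_sr: int, target_sr: int = TARGET_SR
) -> List[float]:
    """Naive linear-interpolation resampler (illustrative only)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_sr == target_sr or len(samples) == 0:
        return [float(s) for s in samples]

    last = len(samples) - 1
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample

    out = []
    for i in range(n_out):
        pos = min(i * step, last)  # clamp so we never read past the end
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def prepare_audio(audio: Tuple[Sequence[float], int]) -> Tuple[List[float], int]:
    """Hypothetical helper: take (samples, input_rate), return audio at TARGET_SR."""
    samples, sr = audio
    return resample_linear(samples, sr, TARGET_SR), TARGET_SR
```

With this shape, the caller reports whatever rate the audio was actually recorded at, and the rate handed to the feature extractor is always the fixed one, so the ValueError shown earlier cannot be triggered by user input.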

Isotr0py and others added 4 commits December 15, 2024 22:38
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@DarkLight1337
Member

There seems to be some problem with online inference of this model, please fix it.

Member

@DarkLight1337 DarkLight1337 left a comment

The tests pass so LGTM!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 16, 2024 08:16
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 16, 2024
@DarkLight1337 DarkLight1337 merged commit d927dbc into vllm-project:main Dec 16, 2024
69 checks passed
@Isotr0py Isotr0py deleted the ultravox-refactor branch December 16, 2024 11:26
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Labels
frontend, ready (ONLY add when PR is ready to merge/full CI is needed)
2 participants