[Model] Whisper model implementation #11280
Open
aurickq wants to merge 39 commits into vllm-project:main from Snowflake-Labs:whisper
+1,021 −55
39 commits
cfbd164  add model def (sfc-gh-aqiao)
ced0141  WIP (sfc-gh-aqiao)
248bafb  WIP, passes profile run (sfc-gh-aqiao)
6c9ee61  Merge remote-tracking branch 'upstream/main' into whisper (aurickq)
7329b2d  WIP (sfc-gh-aqiao)
77ad7ed  kinda working with encoder decoder (sfc-gh-aqiao)
755086b  add whisper example (sfc-gh-aqiao)
b38f5b7  update (sfc-gh-aqiao)
ff70bce  cleanup a bit (sfc-gh-aqiao)
3fbd067  batching (sfc-gh-aqiao)
9032aa1  flash_attn (sfc-gh-aqiao)
ce3a87c  WIP (broken) (sfc-gh-aqiao)
04a0ef4  WIP (sfc-gh-aqiao)
fd4ed14  13rps (sfc-gh-aqiao)
26cfede  fuse qkv (sfc-gh-aqiao)
34c5830  clean (sfc-gh-aqiao)
bf111b2  20 RPS (sfc-gh-aqiao)
a21470b  26rps (sfc-gh-aqiao)
b457c01  41 rps (sfc-gh-aqiao)
d81d217  fix tokenizer (sfc-gh-aqiao)
17712a4  fix tp (sfc-gh-aqiao)
b573fa9  clean (sfc-gh-aqiao)
6d6cbd9  clean (sfc-gh-aqiao)
94a867b  udpate (sfc-gh-aqiao)
787708a  add test (sfc-gh-aqiao)
e943905  some cleanup (sfc-gh-aqiao)
606642e  formatting (sfc-gh-aqiao)
fe8e245  format (sfc-gh-aqiao)
b59fddb  mypy (sfc-gh-aqiao)
d66cd42  mypy (sfc-gh-aqiao)
6ba1afc  format (sfc-gh-aqiao)
26fd92a  fix tests (sfc-gh-aqiao)
4566b10  librosa (sfc-gh-aqiao)
a21334c  Merge remote-tracking branch 'vllm-project/main' into whisper (sfc-gh-aqiao)
1fe41fc  small (sfc-gh-aqiao)
1c16ad2  updates (sfc-gh-aqiao)
7282280  lint (sfc-gh-aqiao)
3442852  add todos (sfc-gh-aqiao)
e0cc63e  bugfix (sfc-gh-aqiao)
New file (61 lines): offline Whisper inference example

```python
import time

from vllm import LLM, SamplingParams
from vllm.assets.audio import AudioAsset

dtype = "float"

# Create a Whisper encoder/decoder model instance
llm = LLM(
    model="openai/whisper-large-v3",
    max_model_len=448,
    max_num_seqs=400,
    limit_mm_per_prompt={"audio": 1},
    kv_cache_dtype="fp8",
)

prompts = [
    {
        "prompt": "<|startoftranscript|>",
        "multi_modal_data": {
            "audio": AudioAsset("mary_had_lamb").audio_and_sample_rate,
        },
    },
    {  # Test explicit encoder/decoder prompt
        "encoder_prompt": {
            "prompt": "",
            "multi_modal_data": {
                "audio": AudioAsset("winning_call").audio_and_sample_rate,
            },
        },
        "decoder_prompt": "<|startoftranscript|>",
    }
] * 1024

# Create a sampling params object.
sampling_params = SamplingParams(
    temperature=0,
    top_p=1.0,
    max_tokens=200,
)

start = time.time()

# Generate output tokens from the prompts. The output is a list of
# RequestOutput objects that contain the prompt, generated
# text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    encoder_prompt = output.encoder_prompt
    generated_text = output.outputs[0].text
    print(f"Encoder prompt: {encoder_prompt!r}, "
          f"Decoder prompt: {prompt!r}, "
          f"Generated text: {generated_text!r}")

duration = time.time() - start

print("Duration:", duration)
print("RPS:", len(prompts) / duration)
```
New file (empty).

New file (107 lines): tests/models/encoder_decoder/audio/test_whisper.py
"""Compare the outputs of HF and vLLM for Whisper models using greedy sampling. | ||
|
||
Run `pytest tests/models/encoder_decoder/audio/test_whisper.py`. | ||
""" | ||
from typing import Optional | ||
|
||
import pytest | ||
|
||
from vllm import LLM, SamplingParams | ||
from vllm.assets.audio import AudioAsset | ||
|
||
from ....utils import fork_new_process_for_each_test, multi_gpu_test | ||
|
||
PROMPTS = [ | ||
{ | ||
"prompt": | ||
"<|startoftranscript|><|en|><|transcribe|><|notimestamps|>", | ||
"multi_modal_data": { | ||
"audio": AudioAsset("mary_had_lamb").audio_and_sample_rate, | ||
}, | ||
}, | ||
{ # Test explicit encoder/decoder prompt | ||
"encoder_prompt": { | ||
"prompt": "", | ||
"multi_modal_data": { | ||
"audio": AudioAsset("winning_call").audio_and_sample_rate, | ||
}, | ||
}, | ||
"decoder_prompt": | ||
"<|startoftranscript|><|en|><|transcribe|><|notimestamps|>", | ||
} | ||
] | ||
|
||
EXPECTED = { | ||
"openai/whisper-medium": [ | ||
" The first words I spoke in the original phonograph, a little piece" | ||
" of practical poetry. Mary had a little lamb, its fleece was white as" | ||
" snow, and everywhere that Mary went the lamb would shun it all.", | ||
" And the old one pitch on the way to Edgar Martinez swung on the line" | ||
" down the left field line for Obeysmith. Here comes Joy. Here is" | ||
" Jorgen at third base. They're gonna wave him in. The throw to the" | ||
" plate will be late. The Mariners are going to play for the American" | ||
" League Championship. I don't believe it. It just continues. My, oh" | ||
" my." | ||
], | ||
"openai/whisper-large-v3": [ | ||
" The first words I spoke in the original phonograph. A little piece" | ||
" of practical poetry. Mary had a little lamb, its fleece was white as" | ||
" snow, and everywhere that Mary went, the lamb was sure to go.", | ||
" And the 0-1 pitch on the way to Edgar Martinez. Swung on the line," | ||
" down the left field line for a base hit. Here comes Joy. Here is" | ||
" Junior to third base. They're going to wave him in. The throw to the" | ||
" plate will be late. The Mariners are going to play for the American" | ||
" League Championship. I don't believe it. It just continues. My, oh," | ||
" my." | ||
] | ||
} | ||
|
||
|
||
def run_test( | ||
model: str, | ||
*, | ||
enforce_eager: bool, | ||
tensor_parallel_size: int, | ||
distributed_executor_backend: Optional[str] = None, | ||
) -> None: | ||
prompt_list = PROMPTS * 10 | ||
expected_list = EXPECTED[model] * 10 | ||
|
||
llm = LLM( | ||
model=model, | ||
tensor_parallel_size=tensor_parallel_size, | ||
distributed_executor_backend=distributed_executor_backend, | ||
enforce_eager=enforce_eager, | ||
) | ||
|
||
sampling_params = SamplingParams( | ||
temperature=0, | ||
top_p=1.0, | ||
max_tokens=200, | ||
) | ||
|
||
outputs = llm.generate(prompt_list, sampling_params) | ||
|
||
for output, expected in zip(outputs, expected_list): | ||
print(output.outputs[0].text) | ||
assert output.outputs[0].text == expected | ||
|
||
|
||
@fork_new_process_for_each_test | ||
@pytest.mark.parametrize("model", | ||
["openai/whisper-medium", "openai/whisper-large-v3"]) | ||
@pytest.mark.parametrize("enforce_eager", [True, False]) | ||
def test_models(model, enforce_eager) -> None: | ||
run_test(model, enforce_eager=enforce_eager, tensor_parallel_size=1) | ||
|
||
|
||
@multi_gpu_test(num_gpus=2) | ||
@pytest.mark.parametrize("model", ["openai/whisper-large-v3"]) | ||
@pytest.mark.parametrize("enforce_eager", [True, False]) | ||
@pytest.mark.parametrize("distributed_executor_backend", ["ray", "mp"]) | ||
def test_models_distributed(model, enforce_eager, | ||
distributed_executor_backend) -> None: | ||
run_test(model, | ||
enforce_eager=enforce_eager, | ||
tensor_parallel_size=2, | ||
distributed_executor_backend=distributed_executor_backend) |
Review comment: Is there a way to determine this without model type information?
Reply: I am not sure about generalizing this from a single example. In the long term, it may be better to allow the model definition to specify exactly the mapping between input fields and where they go (e.g. encoder/decoder).
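To illustrate the direction the reply suggests, here is a hypothetical sketch of a model definition declaring its own input-field routing. Everything in it is invented for illustration: the `input_routing` attribute, the `WhisperStub` class, and the `stage_for` helper are not part of this PR or of vLLM's API.

```python
# Hypothetical sketch, not part of this PR or vLLM's API: the model class
# declares where each input field is routed, so the engine needs no
# model-type-specific branching. All names here are invented.
from typing import Dict


class WhisperStub:
    # Field name -> stage that consumes it ("encoder" or "decoder").
    input_routing: Dict[str, str] = {
        "multi_modal_data.audio": "encoder",
        "prompt": "decoder",
    }


def stage_for(model_cls, input_field: str) -> str:
    """Engine-side lookup that works for any model declaring input_routing."""
    return model_cls.input_routing.get(input_field, "decoder")


assert stage_for(WhisperStub, "multi_modal_data.audio") == "encoder"
assert stage_for(WhisperStub, "prompt") == "decoder"
```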