[Core] Modulize prepare input and attention metadata builder #6596

Merged: 10 commits into vllm-project:main from refactor-prepare-input on Jul 23, 2024

Conversation

comaniac (Collaborator):

This PR further refactors the model input builder and attention metadata builder to be more modular and maintainable (a sketch of the resulting structure is included below). Specifically:

  • Introduce an inner data class to encapsulate intermediate data.
  • Move the logic for processing each feature (e.g., prefix caching, sliding window, LoRA, multi-modal) into separate functions.
  • Use pre-defined lists to apply these functions in order.
  • Make attention_metadata_builder._add_seq_group private, and let each attention metadata builder handle it by itself. This removes the ugly argument list and gives each attention backend more flexibility to customize add_seq_group.
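
A minimal sketch of the pattern described above; the class, field, and function names here are illustrative only, not the exact identifiers used in vllm/worker/model_runner.py:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


class ModelInputBuilder:
    """Illustrative builder; not the exact class in vllm/worker/model_runner.py."""

    @dataclass
    class InterDataForSeqGroup:
        """Inner data class encapsulating per-sequence-group intermediate data."""
        seq_lens: List[int] = field(default_factory=list)
        query_lens: List[int] = field(default_factory=list)
        prefix_cache_hit: bool = False
        lora_requests: List[Any] = field(default_factory=list)
        multi_modal_inputs: List[Any] = field(default_factory=list)

    def __init__(self) -> None:
        # Pre-defined list of per-feature compute functions, applied in order.
        self.per_seq_group_compute_fns: List[Callable[..., None]] = [
            self._compute_prefix_cache,
            self._compute_sliding_window,
            self._compute_lora_input,
            self._compute_multi_modal_input,
        ]
        self.inter_data_list: List["ModelInputBuilder.InterDataForSeqGroup"] = []

    def add_seq_group(self, seq_group_metadata: Any) -> None:
        """Run every per-feature function, in order, on one sequence group."""
        inter_data = self.InterDataForSeqGroup()
        for compute_fn in self.per_seq_group_compute_fns:
            compute_fn(inter_data, seq_group_metadata)
        self.inter_data_list.append(inter_data)
        # The attention metadata builder consumes inter_data through its own
        # private _add_seq_group, rather than receiving a long positional
        # argument list, so each backend can customize what it collects.

    # One small function per feature, so each can evolve independently.
    def _compute_prefix_cache(self, inter_data, seq_group_metadata) -> None:
        ...

    def _compute_sliding_window(self, inter_data, seq_group_metadata) -> None:
        ...

    def _compute_lora_input(self, inter_data, seq_group_metadata) -> None:
        ...

    def _compute_multi_modal_input(self, inter_data, seq_group_metadata) -> None:
        ...
```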

cc @Yard1 @rkooo567


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 19, 2024
@comaniac requested a review from rkooo567 on July 19, 2024 22:34
@comaniac assigned Yard1 and unassigned Yard1 on Jul 19, 2024
@comaniac requested a review from Yard1 on July 19, 2024 22:34
@comaniac force-pushed the refactor-prepare-input branch from fb00532 to 8a2c134 on July 22, 2024 06:36
@comaniac force-pushed the refactor-prepare-input branch from 8a2c134 to 0d9d62e on July 22, 2024 16:31
@rkooo567 (Collaborator) left a comment:

LGTM, assuming most of the code is refactoring. I will also wait for @Yard1's review since he will be the one who needs to use the API.

Review threads on vllm/worker/model_runner.py: all resolved (three outdated).
@Yard1 (Collaborator) left a comment:

looks good, some nits

Review threads on vllm/attention/backends/flash_attn.py and vllm/worker/model_runner.py: all resolved (two outdated).
@comaniac (Collaborator, Author):

Thanks for the review; all comments should be addressed. @rkooo567 @Yard1 PTAL.

@comaniac force-pushed the refactor-prepare-input branch from a7719d9 to f386f69 on July 22, 2024 22:01
@Yard1 (Collaborator) left a comment:

lgtm

Review thread on vllm/worker/model_runner.py: resolved (outdated).
@comaniac enabled auto-merge (squash) on July 22, 2024 23:16
@comaniac merged commit e0c1575 into vllm-project:main on Jul 23, 2024
73 checks passed
@comaniac deleted the refactor-prepare-input branch on July 23, 2024 16:25
cduk pushed a commit to cduk/vllm-pascal that referenced this pull request on Aug 6, 2024
A later review comment quoted this code from the diff:

tokens = [seq_data.get_last_token_id()]

inter_data.seq_lens[seq_idx] = seq_len
inter_data.orig_seq_lens[seq_idx] = seq_len

Is this a bug? seq_len might be truncated, according to line 300: seq_len = min(seq_len, context_len + token_chunk_size)
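
A minimal sketch of the scenario this question describes, using invented numbers (the variable values below are hypothetical, not vLLM's actual code path): if seq_len has already been capped at context_len + token_chunk_size for a chunked-prefill step, then writing that capped value into orig_seq_lens would record the truncated per-step length rather than the sequence's original length.

```python
# Hypothetical values to illustrate the question; this is not vLLM's actual code.
full_seq_len = 2048        # the sequence's true (original) length
context_len = 0            # tokens already computed
token_chunk_size = 512     # chunked-prefill step size

# The "line 300" referenced above: the per-step length is capped.
seq_len = min(full_seq_len, context_len + token_chunk_size)   # -> 512

seq_idx = 0
seq_lens = [0]
orig_seq_lens = [0]

seq_lens[seq_idx] = seq_len       # 512: the length handled in this step
orig_seq_lens[seq_idx] = seq_len  # also 512, even though the original length is 2048
```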

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request on Oct 26, 2024
Labels: ready
4 participants