[Core] Modulize prepare input and attention metadata builder #6596

Merged: 10 commits into vllm-project:main from refactor-prepare-input on Jul 23, 2024

Conversation

comaniac (Collaborator):

This PR further refactors the model input builder and attention metadata builder to be more modular and maintainable (a sketch of the resulting structure is included below). Specifically:

  • Introduce an inner data class to encapsulate intermediate data.
  • Move the logic for processing each feature (e.g., prefix caching, sliding window, LoRA, multi-modal) into separate functions.
  • Use pre-defined lists to apply these functions in order.
  • Make attention_metadata_builder._add_seq_group private, and let each attention metadata builder handle it by itself. This removes the ugly argument list and gives each attention backend more flexibility to customize add_seq_group.
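
A minimal sketch of the pattern described above; the class, field, and function names here are illustrative only, not the exact identifiers used in vllm/worker/model_runner.py:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


class ModelInputBuilder:
    """Illustrative builder; not the exact class in vllm/worker/model_runner.py."""

    @dataclass
    class InterDataForSeqGroup:
        """Inner data class encapsulating per-sequence-group intermediate data."""
        seq_lens: List[int] = field(default_factory=list)
        query_lens: List[int] = field(default_factory=list)
        prefix_cache_hit: bool = False
        lora_requests: List[Any] = field(default_factory=list)
        multi_modal_inputs: List[Any] = field(default_factory=list)

    def __init__(self) -> None:
        # Pre-defined list of per-feature compute functions, applied in order.
        self.per_seq_group_compute_fns: List[Callable[..., None]] = [
            self._compute_prefix_cache,
            self._compute_sliding_window,
            self._compute_lora_input,
            self._compute_multi_modal_input,
        ]
        self.inter_data_list: List["ModelInputBuilder.InterDataForSeqGroup"] = []

    def add_seq_group(self, seq_group_metadata: Any) -> None:
        """Run every per-feature function, in order, on one sequence group."""
        inter_data = self.InterDataForSeqGroup()
        for compute_fn in self.per_seq_group_compute_fns:
            compute_fn(inter_data, seq_group_metadata)
        self.inter_data_list.append(inter_data)
        # The attention metadata builder consumes inter_data through its own
        # private _add_seq_group, rather than receiving a long positional
        # argument list, so each backend can customize what it collects.

    # One small function per feature, so each can evolve independently.
    def _compute_prefix_cache(self, inter_data, seq_group_metadata) -> None:
        ...

    def _compute_sliding_window(self, inter_data, seq_group_metadata) -> None:
        ...

    def _compute_lora_input(self, inter_data, seq_group_metadata) -> None:
        ...

    def _compute_multi_modal_input(self, inter_data, seq_group_metadata) -> None:
        ...
```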

cc @Yard1 @rkooo567


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 19, 2024
@comaniac requested a review from rkooo567 on July 19, 2024 22:34
@comaniac assigned Yard1 and unassigned Yard1 on Jul 19, 2024
@comaniac requested a review from Yard1 on July 19, 2024 22:34
@comaniac force-pushed the refactor-prepare-input branch from fb00532 to 8a2c134 on July 22, 2024 06:36
@comaniac force-pushed the refactor-prepare-input branch from 8a2c134 to 0d9d62e on July 22, 2024 16:31
@rkooo567 (Collaborator) left a comment:

LGTM, assuming most of the code is refactoring. I will also wait for @Yard1's review since he will be the one who needs to use the API.

Review threads on vllm/worker/model_runner.py: all resolved (three outdated).
@Yard1 (Collaborator) left a comment:

looks good, some nits

Review threads on vllm/attention/backends/flash_attn.py and vllm/worker/model_runner.py: all resolved (two outdated).
@comaniac (Collaborator, Author):

Thanks for the review; all comments should be addressed. @rkooo567 @Yard1 PTAL.

@comaniac force-pushed the refactor-prepare-input branch from a7719d9 to f386f69 on July 22, 2024 22:01
@Yard1 (Collaborator) left a comment:

lgtm

Review thread on vllm/worker/model_runner.py: resolved (outdated).
@comaniac enabled auto-merge (squash) on July 22, 2024 23:16
@comaniac merged commit e0c1575 into vllm-project:main on Jul 23, 2024
73 checks passed
@comaniac deleted the refactor-prepare-input branch on July 23, 2024 16:25
cduk pushed a commit to cduk/vllm-pascal that referenced this pull request on Aug 6, 2024
A later review comment quoted this code from the diff:

tokens = [seq_data.get_last_token_id()]

inter_data.seq_lens[seq_idx] = seq_len
inter_data.orig_seq_lens[seq_idx] = seq_len

Is this a bug? seq_len might be truncated, according to line 300: seq_len = min(seq_len, context_len + token_chunk_size)
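
A minimal sketch of the scenario this question describes, using invented numbers (the variable values below are hypothetical, not vLLM's actual code path): if seq_len has already been capped at context_len + token_chunk_size for a chunked-prefill step, then writing that capped value into orig_seq_lens would record the truncated per-step length rather than the sequence's original length.

```python
# Hypothetical values to illustrate the question; this is not vLLM's actual code.
full_seq_len = 2048        # the sequence's true (original) length
context_len = 0            # tokens already computed
token_chunk_size = 512     # chunked-prefill step size

# The "line 300" referenced above: the per-step length is capped.
seq_len = min(full_seq_len, context_len + token_chunk_size)   # -> 512

seq_idx = 0
seq_lens = [0]
orig_seq_lens = [0]

seq_lens[seq_idx] = seq_len       # 512: the length handled in this step
orig_seq_lens[seq_idx] = seq_len  # also 512, even though the original length is 2048
```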

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request on Oct 26, 2024
Labels: ready
4 participants