[Core] Efficient CPU prefix caching for the prefill step #10888

lixiaobai09 · 2024-12-04T07:28:15Z

This PR implements efficient CPU prefix caching for the prefill step. It includes:

A naive implementation of a CPU KV block cache manager.
A data-transfer optimization that overlaps layer-wise block swapping with forward computation.
A data-transfer optimization that overlaps request-level block swapping with computation by delaying transfer requests by one step.
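
To make the first item concrete, here is a minimal sketch of what a naive CPU KV block cache manager could look like. It is not the code from this PR: the names `block_hashes`, `CpuBlockCache`, `match_prefix`, and `insert` are hypothetical, and the sketch assumes blocks are identified by a chained hash over the token prefix and evicted in plain LRU order. It only tracks which CPU slot holds which block; the actual tensor copies and the overlap with forward computation are left out.

```python
from collections import OrderedDict
from typing import List, Optional, Sequence


def block_hashes(token_ids: Sequence[int], block_size: int) -> List[int]:
    """Return one hash per full block; each hash also covers all earlier blocks."""
    hashes: List[int] = []
    prev: Optional[int] = None
    for i in range(len(token_ids) // block_size):
        block = tuple(token_ids[i * block_size:(i + 1) * block_size])
        prev = hash((prev, block))  # chain in the parent hash
        hashes.append(prev)
    return hashes


class CpuBlockCache:
    """Hypothetical LRU map from prefix hash to a CPU block slot index."""

    def __init__(self, num_blocks: int) -> None:
        self.free_slots: List[int] = list(range(num_blocks))
        self.slots: "OrderedDict[int, int]" = OrderedDict()  # oldest entry first

    def match_prefix(self, hashes: Sequence[int]) -> List[int]:
        """Return the CPU slots of the longest cached prefix of `hashes`."""
        hit = []
        for h in hashes:
            if h not in self.slots:
                break
            hit.append(h)
        # Touch in reverse so that parent blocks end up more recent than
        # their children and are therefore evicted last.
        for h in reversed(hit):
            self.slots.move_to_end(h)
        return [self.slots[h] for h in hit]

    def insert(self, h: int) -> int:
        """Return the slot for hash `h`, evicting the LRU entry if the cache is full."""
        if h in self.slots:
            self.slots.move_to_end(h)
            return self.slots[h]
        if self.free_slots:
            slot = self.free_slots.pop()
        else:
            _, slot = self.slots.popitem(last=False)  # reuse the oldest slot
        self.slots[h] = slot
        return slot
```

On a prefill request, a scheduler built on this sketch would call `match_prefix` on the prompt's block hashes, swap the returned CPU slots into GPU blocks, and compute only the remaining tokens. Plain LRU can evict a parent block while its children stay cached, which leaves those children unreachable until they age out; a production manager would likely need a smarter policy.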

github-actions · 2024-12-04T07:28:26Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Conflicts:
vllm/attention/backends/abstract.py
vllm/attention/backends/flash_attn.py
vllm/core/scheduler.py
vllm/model_executor/models/qwen2.py
vllm/sequence.py
vllm/worker/worker.py
vllm/worker/worker_base.py

Signed-off-by: Dahai Tang <[email protected]>

lixiaobai09 requested review from tlrmchlsmth, zhuohan123, youkaichao, alexm-neuralmagic, comaniac and njhill as code owners December 4, 2024 07:28

mergify bot added ci/build frontend labels Dec 4, 2024

lixiaobai09 changed the title ~~[Misc] Efficient CPU prefix caching for the prefill step~~ [Core] Efficient CPU prefix caching for the prefill step Dec 4, 2024

lixiaobai09 force-pushed the main branch from 17b0aa5 to 2c7d1a8 Compare December 4, 2024 09:14

Dahai Tang added 8 commits December 4, 2024 09:50

Fix(kv store): fix some building and running bugs

8613037

Signed-off-by: Dahai Tang <[email protected]>

Refactor: pass lint check

f47bbce

Signed-off-by: Dahai Tang <[email protected]>

Refactor: pass lint check

3e9fd6a

Signed-off-by: Dahai Tang <[email protected]>

Feat(lint): pass the clang format checker

f272bd7

Signed-off-by: Dahai Tang <[email protected]>

Refactor: pass more detail lint checker

243e4f9

Signed-off-by: Dahai Tang <[email protected]>

isort the import order

2a9deb6

Signed-off-by: Dahai Tang <[email protected]>

Refactor: lint checker

d9b6509

Signed-off-by: Dahai Tang <[email protected]>

lixiaobai09 force-pushed the main branch from 2c7d1a8 to d9b6509 Compare December 4, 2024 09:51

Dahai Tang added 2 commits December 5, 2024 02:17

Feat: move functions about cpu kv_store in worker.py to worker_base

11d934f

Signed-off-by: Dahai Tang <[email protected]>

Feat: add kv_store_meta for common attention meta

068342c

Signed-off-by: Dahai Tang <[email protected]>

lixiaobai09 requested a review from WoosukKwon as a code owner December 5, 2024 06:52

Fix: attn make_metadata with kv_store_meta

8d2816b

Signed-off-by: Dahai Tang <[email protected]>

lixiaobai09 closed this Dec 11, 2024
