[core] clean up cudagraph batchsize padding logic #10996

youkaichao · 2024-12-08T23:53:32Z

Unify the cudagraph batch-size padding logic and allow user-specified cudagraph capture sizes.

Users might want to customize cudagraph capture sizes to speed up startup by capturing cudagraphs only for the batch sizes they care about.
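As a rough illustration of the padding idea described above (not the code in this PR), the sketch below rounds a runtime batch size up to the nearest captured size. The helper name `pad_to_capture_size` and the behavior for batch sizes above the largest capture size are assumptions made for the example.

```python
from bisect import bisect_left
from typing import Sequence


def pad_to_capture_size(batch_size: int, capture_sizes: Sequence[int]) -> int:
    """Return the smallest capture size that is >= batch_size.

    Hypothetical helper for illustration only. A batch size larger than
    every capture size is returned unchanged (assumed here to mean that
    no captured graph applies).
    """
    sizes = sorted(set(capture_sizes))
    idx = bisect_left(sizes, batch_size)
    if idx == len(sizes):
        return batch_size
    return sizes[idx]


# A user captures only the sizes they care about, which shortens startup.
user_sizes = [1, 2, 4, 8, 16]
print(pad_to_capture_size(3, user_sizes))   # 4
print(pad_to_capture_size(8, user_sizes))   # 8
print(pad_to_capture_size(20, user_sizes))  # 20
```

With fewer capture sizes, fewer graphs need to be captured at startup, at the cost of more padding for batch sizes that fall between captured sizes.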

Signed-off-by: youkaichao <[email protected]>

github-actions · 2024-12-08T23:53:45Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

vllm/config.py

mergify · 2024-12-12T21:01:15Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @youkaichao.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

youkaichao · 2024-12-12T23:22:34Z

vllm/model_executor/models/jamba.py

@@ -420,6 +420,9 @@ def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):

        self.make_empty_intermediate_tensors = (
            self.model.make_empty_intermediate_tensors)
+        self.max_batch_size = (vllm_config.pad_for_cudagraph(
+            self.scheduler_config.max_num_seqs)
+                               if self.scheduler_config else 8192 + 2)


youkaichao (Dec 12, 2024): @tlrmchlsmth do you know when self.scheduler_config will be None here?

tlrmchlsmth (Dec 12, 2024): No, I don't. @mzusman do you know?

mzusman (Dec 16, 2024): It shouldn't be None; I just added it as a safety guard. It was added before the whole vllm_config was available inside the modeling file.

youkaichao (Dec 16, 2024): Thanks! Can you try to remove it then?

mzusman (Dec 16, 2024): Yeah, will open a PR shortly.
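To make the point of this thread concrete, here is a small sketch of the guarded expression from the diff next to the version without the `None` check. `FakeSchedulerConfig`, `FakeVllmConfig`, and the stubbed `pad_for_cudagraph` body are hypothetical stand-ins written for this example; they are not vLLM's classes or its implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSchedulerConfig:  # hypothetical stand-in, not vLLM's class
    max_num_seqs: int


@dataclass
class FakeVllmConfig:  # hypothetical stand-in, not vLLM's class
    scheduler_config: Optional[FakeSchedulerConfig]
    capture_sizes: tuple = (1, 2, 4, 8, 16, 32)

    def pad_for_cudagraph(self, batch_size: int) -> int:
        # Stub: round up to the next capture size, else leave unchanged.
        for size in sorted(self.capture_sizes):
            if size >= batch_size:
                return size
        return batch_size


def max_batch_size_guarded(cfg: FakeVllmConfig) -> int:
    # Same shape as the diff: fall back to 8192 + 2 if scheduler_config is None.
    sc = cfg.scheduler_config
    return cfg.pad_for_cudagraph(sc.max_num_seqs) if sc else 8192 + 2


def max_batch_size_unguarded(cfg: FakeVllmConfig) -> int:
    # Without the guard: assumes scheduler_config is always set.
    return cfg.pad_for_cudagraph(cfg.scheduler_config.max_num_seqs)


cfg = FakeVllmConfig(FakeSchedulerConfig(max_num_seqs=20))
print(max_batch_size_guarded(cfg), max_batch_size_unguarded(cfg))  # 32 32
```

If `scheduler_config` is always populated, as the reply above suggests, the two functions agree and the fallback branch is never taken.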

vllm/config.py

WoosukKwon

Thanks for the fix!

youkaichao added 2 commits December 8, 2024 15:31 (each Signed-off-by: youkaichao <[email protected]>)

81d93c4 draft
a69adbc fix cudagraph logic

youkaichao requested review from DarkLight1337, ywang96, WoosukKwon, robertgshaw2-neuralmagic, njhill, comaniac and alexm-neuralmagic as code owners December 8, 2024 23:53

youkaichao changed the title ~~[core] allow user-specified cudagraph capture sizes~~ [core] clean up cudagraph batchsize padding logic Dec 8, 2024

youkaichao added 13 commits December 8, 2024 15:58 (each Signed-off-by: youkaichao <[email protected]>)

f2db1d0 add max_capture_size
08b6dd4 add _MAX_BATCH_SIZE_TO_CAPTURE
3a1501a remove dead code
6907008 fix
42c9300 fix
4c8adcb fix
69f8ff2 fix
830e34d hide some details in string form
a8f3ef1 hide some details in string form
8edddfd hide some details in string form
2f7a17d hide some details in string form
7379c67 fix pydantic
1c8067c fix enforce eager

WoosukKwon reviewed Dec 12, 2024

vllm/config.py (outdated, resolved)

mergify bot added the needs-rebase label Dec 12, 2024

youkaichao added 2 commits December 12, 2024 14:20

59ea38f Merge branch 'main' into cudagraph_sizes
ec2d484 fix merge (Signed-off-by: youkaichao <[email protected]>)

mergify bot removed the needs-rebase label Dec 12, 2024

youkaichao added 13 commits December 12, 2024 14:25 (each Signed-off-by: youkaichao <[email protected]>)

224c6f2 rename to pad_for_cudagraph
957a72e remove mamba import
d3c3bdc unify one function
f952d18 fix
e92559c fix
a02b8f1 fix mamba
24b548a use list
4ba82c0 comment
859e3ee remove comments
ce0cff9 comments
3f750a2 comments
56512ba comments
5d6928a fix

youkaichao commented Dec 12, 2024

youkaichao requested a review from WoosukKwon December 12, 2024 23:23

WoosukKwon reviewed Dec 13, 2024

vllm/config.py (outdated, resolved)

f4a7a77 remove if (Signed-off-by: youkaichao <[email protected]>)

youkaichao requested a review from WoosukKwon December 13, 2024 01:05

WoosukKwon approved these changes Dec 13, 2024

youkaichao enabled auto-merge (squash) December 13, 2024 01:15

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 13, 2024

youkaichao added 3 commits December 12, 2024 20:37 (each Signed-off-by: youkaichao <[email protected]>)

07d77a1 fix jamba
c454391 fix mamba
74f69b6 fix both

youkaichao merged commit be39e3c into vllm-project:main Dec 13, 2024
54 checks passed

youkaichao deleted the cudagraph_sizes branch December 13, 2024 06:58

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[core] clean up cudagraph batchsize padding logic (vllm-project#10996)

4e18318

Signed-off-by: youkaichao <[email protected]>
