[Fix] Fix update_aclgraph_sizes when running MoE models #913


Open · @yiz-liu wants to merge 1 commit into main from fix-graph

Conversation

@yiz-liu (Contributor) commented May 21, 2025

What this PR does / why we need it?

Fix update_aclgraph_sizes when running MoE models.

Does this PR introduce any user-facing change?

How was this patch tested?

@yiz-liu yiz-liu force-pushed the fix-graph branch 2 times, most recently from c9a9b21 to 45d1cc8 Compare May 21, 2025 08:21
if (additional_config
        and "expert_tensor_parallel_size" in additional_config
        and not parallel_config.enable_expert_parallel):
    parallel_config.expert_tensor_parallel_size = int(
        additional_config["expert_tensor_parallel_size"])
Collaborator commented:

It seems there is no expert_tensor_parallel_size in ParallelConfig? The same goes for parallel_config.expert_parallel_size.

@yiz-liu (author) replied:

vLLM does not support tensor parallelism for experts’ weights, so I added this attribute. This change primarily addresses scenarios where the number of devices significantly exceeds the number of experts.
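
For illustration, here is a minimal numeric sketch of the scenario described above; the values are hypothetical and only show the intended arithmetic, not code from this PR:

# Hypothetical setup: far more devices than routed experts.
world_size = 16                  # total NPUs (assumed value)
num_experts = 8                  # routed experts in the MoE layer (assumed value)

# Pure expert parallelism would give ep_size = 16 > 8, leaving some ranks
# without any expert. Splitting each expert's weights across an
# expert-tensor-parallel (ETP) group avoids that:
expert_tensor_parallel_size = 2
expert_parallel_size = world_size // expert_tensor_parallel_size  # 16 // 2 = 8

assert expert_parallel_size <= num_experts  # every EP rank now owns an expert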

# Calculate expert parallel size based on world size
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
Collaborator commented:

QQ: Does this mean the ETP size is not equal to 1 only when EP is disabled and ETP is enabled? Then why do we set the EP size here if EP is disabled?

@yiz-liu (author) replied:

As I mentioned above, this is more of an enhancement.

parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
Collaborator commented:

Sorry, I'm still confused. Could I understand it like this: when EP is disabled and ETP is enabled, an expected ETP size is set, but an unexpected EP size is also set?

Or should the EP size be set only when EP is enabled?

Suggested change:
-parallel_config.expert_parallel_size = (
-    parallel_config.world_size //
-    parallel_config.expert_tensor_parallel_size)
+if parallel_config.enable_expert_parallel:
+    parallel_config.expert_parallel_size = (
+        parallel_config.world_size //
+        parallel_config.expert_tensor_parallel_size)

@yiz-liu (author) replied:

We can discuss this in detail later. You are right; we probably need to give careful consideration to the configuration logic here to avoid any unwanted confusion.
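
For reference, a self-contained sketch of the configuration logic being discussed, combining the ETP handling quoted earlier with the gating suggested above; SimpleNamespace merely stands in for vLLM's ParallelConfig here and is an assumption, not the real class:

from types import SimpleNamespace

def derive_expert_sizes(parallel_config, additional_config):
    # Take an explicit ETP size from additional_config only when expert
    # parallelism is disabled (mirrors the hunk quoted above).
    if (additional_config
            and "expert_tensor_parallel_size" in additional_config
            and not parallel_config.enable_expert_parallel):
        parallel_config.expert_tensor_parallel_size = int(
            additional_config["expert_tensor_parallel_size"])

    # Reviewer's suggestion: derive the EP size only when EP is enabled.
    if parallel_config.enable_expert_parallel:
        parallel_config.expert_parallel_size = (
            parallel_config.world_size //
            parallel_config.expert_tensor_parallel_size)
    return parallel_config

# Hypothetical usage: EP disabled, ETP requested -> only the ETP size changes.
cfg = SimpleNamespace(world_size=16, enable_expert_parallel=False,
                      expert_tensor_parallel_size=1, expert_parallel_size=1)
derive_expert_sizes(cfg, {"expert_tensor_parallel_size": 2})
print(cfg.expert_tensor_parallel_size, cfg.expert_parallel_size)  # 2 1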

@yiz-liu yiz-liu force-pushed the fix-graph branch 3 times, most recently from 6c98888 to 64ea69c Compare May 22, 2025 07:29