[Fix] Fix update_aclgraph_sizes when running MoE models #913


Open · @yiz-liu wants to merge 1 commit into main from fix-graph

Conversation

@yiz-liu (Contributor) commented May 21, 2025

What this PR does / why we need it?

Fix update_aclgraph_sizes when running MoE models.

Does this PR introduce any user-facing change?

How was this patch tested?

@yiz-liu yiz-liu force-pushed the fix-graph branch 2 times, most recently from c9a9b21 to 45d1cc8 Compare May 21, 2025 08:21
if (additional_config
        and "expert_tensor_parallel_size" in additional_config
        and not parallel_config.enable_expert_parallel):
    parallel_config.expert_tensor_parallel_size = int(
        additional_config["expert_tensor_parallel_size"])
Collaborator commented:

It seems there is no expert_tensor_parallel_size in ParallelConfig? The same goes for parallel_config.expert_parallel_size.

@yiz-liu (author) replied:

vLLM does not support tensor parallelism for experts’ weights, so I added this attribute. This change primarily addresses scenarios where the number of devices significantly exceeds the number of experts.
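
For illustration, here is a minimal numeric sketch of the scenario described above; the values are hypothetical and only show the intended arithmetic, not code from this PR:

# Hypothetical setup: far more devices than routed experts.
world_size = 16                  # total NPUs (assumed value)
num_experts = 8                  # routed experts in the MoE layer (assumed value)

# Pure expert parallelism would give ep_size = 16 > 8, leaving some ranks
# without any expert. Splitting each expert's weights across an
# expert-tensor-parallel (ETP) group avoids that:
expert_tensor_parallel_size = 2
expert_parallel_size = world_size // expert_tensor_parallel_size  # 16 // 2 = 8

assert expert_parallel_size <= num_experts  # every EP rank now owns an expert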

# Calculate expert parallel size based on world size
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
Collaborator commented:

QQ: Does this mean the ETP size is not equal to 1 only when EP is disabled and ETP is enabled? Then why do we set the EP size here if EP is disabled?

@yiz-liu (author) replied:

As I mentioned above, this is more of an enhancement.

parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
Collaborator commented:

Sorry, I'm still confused. Could I understand it like this: when EP is disabled and ETP is enabled, an expected ETP size is set, but an unexpected EP size is also set?

Or should the EP size be set only when EP is enabled?

Suggested change:
-parallel_config.expert_parallel_size = (
-    parallel_config.world_size //
-    parallel_config.expert_tensor_parallel_size)
+if parallel_config.enable_expert_parallel:
+    parallel_config.expert_parallel_size = (
+        parallel_config.world_size //
+        parallel_config.expert_tensor_parallel_size)

@yiz-liu (author) replied:

We can discuss this in detail later. You are right; we probably need to give careful consideration to the configuration logic here to avoid any unwanted confusion.
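
For reference, a self-contained sketch of the configuration logic being discussed, combining the ETP handling quoted earlier with the gating suggested above; SimpleNamespace merely stands in for vLLM's ParallelConfig here and is an assumption, not the real class:

from types import SimpleNamespace

def derive_expert_sizes(parallel_config, additional_config):
    # Take an explicit ETP size from additional_config only when expert
    # parallelism is disabled (mirrors the hunk quoted above).
    if (additional_config
            and "expert_tensor_parallel_size" in additional_config
            and not parallel_config.enable_expert_parallel):
        parallel_config.expert_tensor_parallel_size = int(
            additional_config["expert_tensor_parallel_size"])

    # Reviewer's suggestion: derive the EP size only when EP is enabled.
    if parallel_config.enable_expert_parallel:
        parallel_config.expert_parallel_size = (
            parallel_config.world_size //
            parallel_config.expert_tensor_parallel_size)
    return parallel_config

# Hypothetical usage: EP disabled, ETP requested -> only the ETP size changes.
cfg = SimpleNamespace(world_size=16, enable_expert_parallel=False,
                      expert_tensor_parallel_size=1, expert_parallel_size=1)
derive_expert_sizes(cfg, {"expert_tensor_parallel_size": 2})
print(cfg.expert_tensor_parallel_size, cfg.expert_parallel_size)  # 2 1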

@yiz-liu yiz-liu force-pushed the fix-graph branch 3 times, most recently from 6c98888 to 64ea69c Compare May 22, 2025 07:29