[Fix] Fix update_aclgraph_sizes when running MoE models #913
Conversation
Force-pushed from c9a9b21 to 45d1cc8
if (additional_config
        and "expert_tensor_parallel_size" in additional_config
        and not parallel_config.enable_expert_parallel):
    parallel_config.expert_tensor_parallel_size = int(
Seems there is no expert_tensor_parallel_size in ParallelConfig? The same applies to parallel_config.expert_parallel_size.
vLLM does not support tensor parallelism for experts’ weights, so I added this attribute. This change primarily addresses scenarios where the number of devices significantly exceeds the number of experts.
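To make the motivating scenario concrete, here is a small numeric sketch; all numbers are made up for illustration and are not taken from this PR:

```python
# Illustrative numbers only: the motivation is the case where the number of
# devices greatly exceeds the number of experts.
world_size = 32               # NPUs in the cluster
num_experts = 8               # routed experts in the MoE layer

# With pure expert parallelism (etp = 1), ep would be 32, but there are only
# 8 experts, so most ranks would hold no expert weights at all.
expert_tensor_parallel_size = 4
expert_parallel_size = world_size // expert_tensor_parallel_size

assert expert_parallel_size == num_experts          # one expert group per expert
assert expert_parallel_size * expert_tensor_parallel_size == world_size
```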
# Calculate expert parallel size based on world size
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
QQ: Does this mean the etp size is not equal to 1 only when ep is disabled and etp is enabled? Then why do we set the ep size here if ep is disabled?
Like I mentioned above, this is more like an enhancement.
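To make the two configuration paths explicit, here is a rough sketch of the logic being discussed; the wrapper function name is hypothetical, while the attribute and key names follow the diff hunks above:

```python
def resolve_expert_parallel_sizes(parallel_config, additional_config):
    """Hypothetical helper that mirrors the two paths in the diff above."""
    if (additional_config
            and "expert_tensor_parallel_size" in additional_config
            and not parallel_config.enable_expert_parallel):
        # Explicit etp requested while ep is disabled: split each expert's
        # weights across etp ranks.
        parallel_config.expert_tensor_parallel_size = int(
            additional_config["expert_tensor_parallel_size"])
    else:
        # Default: each expert lives entirely on one rank.
        parallel_config.expert_tensor_parallel_size = 1

    # The ep size is derived from the world size in both paths, which is
    # exactly what the question below is about.
    parallel_config.expert_parallel_size = (
        parallel_config.world_size //
        parallel_config.expert_tensor_parallel_size)
```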
parallel_config.expert_parallel_size = (
    parallel_config.world_size //
    parallel_config.expert_tensor_parallel_size)
Sorry, I'm still confused. Could I understand it like this: when ep is disabled and etp is enabled, an expected etp size is set, but an unexpected ep size is also set? Or should the ep size be set only when ep is enabled?
Suggested change:
    if parallel_config.enable_expert_parallel:
        parallel_config.expert_parallel_size = (
            parallel_config.world_size //
            parallel_config.expert_tensor_parallel_size)
You are right; we can discuss this in detail later. Perhaps we need to give careful consideration to the configuration logic here to avoid any unwanted confusion.
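For context, a small made-up example of the situation the question describes, where ep is disabled but an ep size still gets written, together with the guard from the suggestion above:

```python
# Made-up numbers to show the confusion: ep is disabled, yet the unguarded
# formula still writes an ep size.
world_size = 16
enable_expert_parallel = False
expert_tensor_parallel_size = 2

expert_parallel_size = world_size // expert_tensor_parallel_size   # 8, even with ep off

# With the suggested guard, the ep size is only derived when ep is enabled.
if enable_expert_parallel:
    expert_parallel_size = world_size // expert_tensor_parallel_size
```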
Force-pushed from 6c98888 to 64ea69c
Signed-off-by: Yizhou Liu <[email protected]>
What this PR does / why we need it?
Fix update_aclgraph_sizes when running MoE models.
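For readers unfamiliar with the function, a very rough, hypothetical sketch of what an update_aclgraph_sizes-style adjustment could look like; the budget constant, the formula, and the use of cudagraph_capture_sizes here are illustrative assumptions, not this PR's actual code:

```python
def update_aclgraph_sizes(vllm_config):
    """Illustrative sketch only: trim the captured graph sizes when extra
    parallel groups (e.g. for MoE) shrink the available capture budget."""
    compilation_config = vllm_config.compilation_config
    parallel_config = vllm_config.parallel_config

    # Assume every active parallel communication group (tp, dp, ep, ...)
    # consumes additional streams, leaving fewer for ACL graph capture.
    num_comm_groups = sum(size > 1 for size in (
        parallel_config.tensor_parallel_size,
        parallel_config.data_parallel_size,
        getattr(parallel_config, "expert_parallel_size", 1),
    ))

    max_num_graphs = max(1, 120 // (num_comm_groups + 1))   # made-up budget
    compilation_config.cudagraph_capture_sizes = (
        compilation_config.cudagraph_capture_sizes[:max_num_graphs])
```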
Does this PR introduce any user-facing change?
How was this patch tested?