
[Feature]: Optional topk_weights arg in Fused MoE B2B GEMM #193

Closed

sijiac opened this issue Mar 11, 2025 · 1 comment

Comments

sijiac (Contributor) commented Mar 11, 2025

Suggestion Description

We may not need topk_weights for the B2B GEMM fused MoE. For now, callers have to pass a dummy torch.ones() tensor, which the kernel still loads and multiplies by even though it has no effect. Ideally, topk_weights should be an optional argument, and the kernel should skip the associated load and scaling logic when it is set to None.
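
For illustration, a minimal PyTorch-level sketch of the requested semantics (the function and argument names below are placeholders, not the actual kernel API): when topk_weights is None, the load and multiply are skipped entirely instead of being fed a dummy torch.ones() tensor.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def moe_b2b_gemm_reference(
    x: torch.Tensor,                     # [tokens, hidden] token activations
    w1: torch.Tensor,                    # [experts, hidden, inter] first GEMM weights
    w2: torch.Tensor,                    # [experts, inter, hidden] second GEMM weights
    topk_ids: torch.Tensor,              # [tokens, topk] selected expert ids
    topk_weights: Optional[torch.Tensor] = None,  # [tokens, topk] or None (proposed)
) -> torch.Tensor:
    tokens, topk = topk_ids.shape
    out = torch.zeros_like(x)
    for k in range(topk):
        expert = topk_ids[:, k]                                       # expert id per token
        h = F.silu(torch.bmm(x.unsqueeze(1), w1[expert]).squeeze(1))  # GEMM 1 + activation
        y = torch.bmm(h.unsqueeze(1), w2[expert]).squeeze(1)          # GEMM 2
        if topk_weights is not None:
            y = y * topk_weights[:, k:k + 1]  # current path: scale after the GEMMs
        # When topk_weights is None, the load and multiply are skipped entirely
        # instead of multiplying by a dummy torch.ones() tensor.
        out += y
    return out
```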

cc @carlushuang

Operating System

No response

GPU

No response

ROCm Component

No response

@coconutruben commented

@carlushuang, to address this and #195 at the same time, it would be ideal if we could control when the topk_weights scaling happens: before the GEMMs or after (the current implementation). An interface that lets us choose between the following would cover both cases (see the sketch after this list):

  • topk_weights applied before the GEMMs
  • topk_weights applied after the GEMMs (current)
  • no topk_weights
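
A minimal sketch of what such an interface could look like, assuming a hypothetical mode flag and a PyTorch reference loop (none of these names exist in the library; the kernel itself is out of scope here):

```python
from enum import Enum
from typing import Optional

import torch
import torch.nn.functional as F


class TopkWeightMode(Enum):
    PRE_GEMM = "pre_gemm"    # scale activations before the first GEMM (#195)
    POST_GEMM = "post_gemm"  # scale outputs after the second GEMM (current behavior)
    SKIP = "skip"            # no topk_weights at all (this issue)


def moe_b2b_gemm_reference_with_mode(
    x: torch.Tensor,                     # [tokens, hidden]
    w1: torch.Tensor,                    # [experts, hidden, inter]
    w2: torch.Tensor,                    # [experts, inter, hidden]
    topk_ids: torch.Tensor,              # [tokens, topk]
    topk_weights: Optional[torch.Tensor] = None,
    mode: TopkWeightMode = TopkWeightMode.POST_GEMM,
) -> torch.Tensor:
    tokens, topk = topk_ids.shape
    out = torch.zeros_like(x)
    for k in range(topk):
        expert = topk_ids[:, k]
        # PRE_GEMM / POST_GEMM require topk_weights; SKIP never touches it.
        w = None if mode is TopkWeightMode.SKIP else topk_weights[:, k:k + 1]
        x_k = x * w if mode is TopkWeightMode.PRE_GEMM else x
        h = F.silu(torch.bmm(x_k.unsqueeze(1), w1[expert]).squeeze(1))  # GEMM 1 + activation
        y = torch.bmm(h.unsqueeze(1), w2[expert]).squeeze(1)            # GEMM 2
        if mode is TopkWeightMode.POST_GEMM:
            y = y * w
        out += y
    return out
```

Under this mapping, the current implementation corresponds to POST_GEMM, #195 to PRE_GEMM, and this issue to SKIP.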

sijiac closed this as completed Mar 30, 2025