Suggestion Description
We may not need topk_weights for the B2B GEMM fused MoE. For now, we can pass a dummy torch.ones() tensor to the kernel, which loads the argument and performs the calculation. Ideally, we should make this an optional argument and skip the associated logic when it is set to None.

cc @carlushuang
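As a rough illustration of the proposed interface, here is a minimal sketch in plain PyTorch. The function name, argument names, and the SiLU activation are hypothetical placeholders rather than the actual kernel API; the point is only the Optional handling, where None skips the multiply instead of spending bandwidth on an all-ones tensor.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def fused_moe_ref(
    x: torch.Tensor,                              # [num_tokens, hidden]
    w1: torch.Tensor,                             # [num_experts, inter, hidden]
    w2: torch.Tensor,                             # [num_experts, hidden, inter]
    topk_ids: torch.Tensor,                       # [num_tokens, topk], int64
    topk_weights: Optional[torch.Tensor] = None,  # [num_tokens, topk] or None
) -> torch.Tensor:
    """Unfused reference showing where an optional topk_weights would plug in."""
    num_tokens, topk = topk_ids.shape
    out = torch.zeros_like(x)
    for t in range(num_tokens):
        for k in range(topk):
            e = int(topk_ids[t, k])
            h = F.silu(w1[e] @ x[t])        # first GEMM + activation (placeholder activation)
            y = w2[e] @ h                   # second GEMM
            if topk_weights is not None:    # None -> skip the scaling entirely,
                y = y * topk_weights[t, k]  # instead of multiplying by torch.ones()
            out[t] += y
    return out
```

The kernel side could branch in the same way (e.g. on a null pointer) rather than loading a ones tensor from global memory and performing a no-op multiply.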
Operating System
No response
GPU
No response
ROCm Component
No response
@carlushuang, to address this and #195 at the same time, it would be ideal if we could control when the topk_weights scaling happens: before the GEMMs or after them (the current implementation). An interface that allows us to say where the scaling should be applied would cover both cases.
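To make the two placements concrete, here is another illustrative sketch in plain PyTorch with hypothetical names (scale_before_gemm is an assumed flag, not an existing option). With a nonlinear activation between the two GEMMs, scaling the input and scaling the output are in general not equivalent, so the placement has to be an explicit choice rather than an internal detail.

```python
import torch
import torch.nn.functional as F


def fused_moe_ref_scaled(
    x: torch.Tensor,                  # [num_tokens, hidden]
    w1: torch.Tensor,                 # [num_experts, inter, hidden]
    w2: torch.Tensor,                 # [num_experts, hidden, inter]
    topk_ids: torch.Tensor,           # [num_tokens, topk], int64
    topk_weights: torch.Tensor,       # [num_tokens, topk]
    scale_before_gemm: bool = False,  # hypothetical switch; False = current behaviour
) -> torch.Tensor:
    """Reference showing the two candidate placements of the topk_weights scaling."""
    num_tokens, topk = topk_ids.shape
    out = torch.zeros_like(x)
    for t in range(num_tokens):
        for k in range(topk):
            e = int(topk_ids[t, k])
            w = topk_weights[t, k]
            xin = x[t] * w if scale_before_gemm else x[t]  # scale the GEMM input...
            h = F.silu(w1[e] @ xin)                        # first GEMM + activation
            y = w2[e] @ h                                  # second GEMM
            if not scale_before_gemm:
                y = y * w                                  # ...or scale the output (current)
            out[t] += y
    return out
```

Something along these lines, a boolean or a small enum on the entry point, would let callers pick the placement without duplicating kernel paths.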