Suggestion Description
We may not need topk_weights for the B2B GEMM fused MoE. For now, we can pass a dummy torch.ones() tensor to the kernel, which loads the argument and performs the calculation. Ideally, we should make this an optional argument and skip the associated logic when it is set to None.

cc @carlushuang
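As a rough illustration of the proposed interface, here is a minimal sketch in plain PyTorch. The function name, argument names, and the SiLU activation are hypothetical placeholders rather than the actual kernel API; the point is only the Optional handling, where None skips the multiply instead of spending bandwidth on an all-ones tensor.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def fused_moe_ref(
    x: torch.Tensor,                              # [num_tokens, hidden]
    w1: torch.Tensor,                             # [num_experts, inter, hidden]
    w2: torch.Tensor,                             # [num_experts, hidden, inter]
    topk_ids: torch.Tensor,                       # [num_tokens, topk], int64
    topk_weights: Optional[torch.Tensor] = None,  # [num_tokens, topk] or None
) -> torch.Tensor:
    """Unfused reference showing where an optional topk_weights would plug in."""
    num_tokens, topk = topk_ids.shape
    out = torch.zeros_like(x)
    for t in range(num_tokens):
        for k in range(topk):
            e = int(topk_ids[t, k])
            h = F.silu(w1[e] @ x[t])        # first GEMM + activation (placeholder activation)
            y = w2[e] @ h                   # second GEMM
            if topk_weights is not None:    # None -> skip the scaling entirely,
                y = y * topk_weights[t, k]  # instead of multiplying by torch.ones()
            out[t] += y
    return out
```

The kernel side could branch in the same way (e.g. on a null pointer) rather than loading a ones tensor from global memory and performing a no-op multiply.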
Operating System
No response
GPU
No response
ROCm Component
No response
@carlushuang, to address this and #195 at the same time, it would be ideal if we could control when the topk_weights scaling happens: before the GEMMs or after them (the current implementation). An interface that allows us to say where the scaling should be applied would cover both cases.
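To make the two placements concrete, here is another illustrative sketch in plain PyTorch with hypothetical names (scale_before_gemm is an assumed flag, not an existing option). With a nonlinear activation between the two GEMMs, scaling the input and scaling the output are in general not equivalent, so the placement has to be an explicit choice rather than an internal detail.

```python
import torch
import torch.nn.functional as F


def fused_moe_ref_scaled(
    x: torch.Tensor,                  # [num_tokens, hidden]
    w1: torch.Tensor,                 # [num_experts, inter, hidden]
    w2: torch.Tensor,                 # [num_experts, hidden, inter]
    topk_ids: torch.Tensor,           # [num_tokens, topk], int64
    topk_weights: torch.Tensor,       # [num_tokens, topk]
    scale_before_gemm: bool = False,  # hypothetical switch; False = current behaviour
) -> torch.Tensor:
    """Reference showing the two candidate placements of the topk_weights scaling."""
    num_tokens, topk = topk_ids.shape
    out = torch.zeros_like(x)
    for t in range(num_tokens):
        for k in range(topk):
            e = int(topk_ids[t, k])
            w = topk_weights[t, k]
            xin = x[t] * w if scale_before_gemm else x[t]  # scale the GEMM input...
            h = F.silu(w1[e] @ xin)                        # first GEMM + activation
            y = w2[e] @ h                                  # second GEMM
            if not scale_before_gemm:
                y = y * w                                  # ...or scale the output (current)
            out[t] += y
    return out
```

Something along these lines, a boolean or a small enum on the entry point, would let callers pick the placement without duplicating kernel paths.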