vulkan: Update topk_moe fusion to handle gpt's late softmax #16656
base: master
Conversation
CC @am17an I've included the ggml_check_edges change in this PR.
I understand what this change is doing, but how do I test it? The topk_moe tests pass before and after this change. Which model architectures correspond to the three modes?
GGML_OP_GET_ROWS, GGML_OP_RESHAPE,
GGML_OP_SOFT_MAX, GGML_OP_RESHAPE };
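For context, here is a minimal sketch of how an expected-op list like the one quoted above could be matched against a run of graph nodes. The helper name `graph_ops_match` is hypothetical, and the sketch only compares op types; the real fusion checks in this PR (including ggml_check_edges) also validate how the candidate nodes are wired together.

```cpp
#include "ggml.h"

// Hypothetical helper (not the code in this PR): return true if the ops of
// the nodes starting at `start` match the expected sequence exactly.
static bool graph_ops_match(struct ggml_cgraph * graph, int start,
                            const enum ggml_op * ops, int n_ops) {
    if (start + n_ops > ggml_graph_n_nodes(graph)) {
        return false;
    }
    for (int i = 0; i < n_ops; ++i) {
        if (ggml_graph_node(graph, start + i)->op != ops[i]) {
            return false;
        }
    }
    return true;
}

// Illustrative use, with the fragment quoted above as the expected tail:
// static const enum ggml_op expected_ops[] = {
//     GGML_OP_GET_ROWS, GGML_OP_RESHAPE,
//     GGML_OP_SOFT_MAX, GGML_OP_RESHAPE,
// };
// bool fuse = graph_ops_match(graph, i, expected_ops, 4);
```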
//node #963 ( SOFT_MAX): ffn_moe_probs-15 ( 64K) [Vulka ] use=2: ffn_moe_logits-15 ( 64K) [Vulka ]
Vulka?
This logging is from ggml_backend_sched_print_assignments (set GGML_SCHED_DEBUG=2 and run llama-bench with -v; I also edit the code not to skip views). It truncates the name to keep the formatting aligned.
Usually I put a debug statement printing the number of nodes fused. We'll need to come up with a better way to assert that the nodes were actually fused.
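As an aside, a throwaway counter of the kind described above might look something like this sketch (the names and the reporting point are assumptions, and this is not the logging that was actually added to the PR):

```cpp
#include <cstdio>

// Illustrative only: accumulate the number of nodes consumed by fusion while
// walking the graph, then print one summary line per graph evaluation.
static int g_fused_nodes = 0;

static void debug_count_fused(int n_fused) {
    g_fused_nodes += n_fused;  // call at each fusion site with the run length
}

static void debug_report_fused() {
    fprintf(stderr, "debug: fused %d nodes in this graph\n", g_fused_nodes);
    g_fused_nodes = 0;
}
```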
I've added some logging in the latest commit that I use to verify fusion and the effects of graph_optimize. You can see the whole sequence of ops without a sync in between, which implies the fusion is working. Early softmax w/norm: qwen3
Based on #16649.