
Conversation

@jeffbolznv (Collaborator)

Based on #16649.

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner October 18, 2025 21:34
@github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Oct 18, 2025
@jeffbolznv (Collaborator, Author)

CC @am17an I've included the ggml_check_edges change in this PR.

@0cc4m (Collaborator) left a comment


I understand what this change is doing, but how do I test it? The topk_moe tests pass before and after this change. Which model architectures correspond to the three modes?

GGML_OP_GET_ROWS, GGML_OP_RESHAPE,
GGML_OP_SOFT_MAX, GGML_OP_RESHAPE };

//node #963 ( SOFT_MAX): ffn_moe_probs-15 ( 64K) [Vulka ] use=2: ffn_moe_logits-15 ( 64K) [Vulka ]
Collaborator


Vulka?

@jeffbolznv (Collaborator, Author)


This logging comes from ggml_backend_sched_print_assignments (set GGML_SCHED_DEBUG=2 and run llama-bench with -v; I also edited the code not to skip views). It truncates the backend name to keep the columns aligned.
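The truncation behind "Vulka" can be mimicked with a fixed-width format. A minimal Python sketch (the 5-character field width is an assumption inferred from the log line above, not taken from the scheduler source):

```python
def fmt_backend(name: str, width: int = 5) -> str:
    # Truncate or pad the backend name to a fixed column width so the
    # per-node assignment lines stay aligned; "Vulkan" becomes "Vulka".
    return f"[{name[:width]:<{width}} ]"

print(fmt_backend("Vulkan"))  # the "[Vulka ]" seen in the log line
print(fmt_backend("CPU"))
```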

@am17an (Collaborator) commented Oct 25, 2025

> I understand what this change is doing, but how do I test it? The topk_moe tests pass before and after this change. Which model architectures correspond to the three modes?

Usually I put in a debug statement printing the number of nodes fused. We'll need to come up with a better way to assert that the nodes were actually fused.

@jeffbolznv (Collaborator, Author)

> I understand what this change is doing, but how do I test it? The topk_moe tests pass before and after this change. Which model architectures correspond to the three modes?

I've added some logging in the latest commit that I use to verify fusion and the effects of graph_optimize. You can see the whole sequence of ops without a sync in between, which implies the fusion is working.

Early softmax w/norm: qwen3
Early softmax w/o norm: deepseek2
Late softmax: gpt-oss
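For readers unfamiliar with the terms, here is a rough Python sketch of the three routing variants as plain math (not the fused ggml ops; the logits and k are made up for illustration):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def topk_idx(xs, k):
    # Indices of the k largest values, in descending order.
    return sorted(range(len(xs)), key=lambda i: -xs[i])[:k]

logits = [0.5, 2.0, -1.0, 1.0]  # per-expert router logits (made up)
k = 2

# Early softmax: softmax over all experts, then pick top-k probabilities.
probs = softmax(logits)
idx = topk_idx(probs, k)
weights = [probs[i] for i in idx]

# ...with norm (qwen3): renormalize the selected weights to sum to 1.
total = sum(weights)
weights_norm = [w / total for w in weights]

# ...without norm (deepseek2): use the selected softmax weights as-is.

# Late softmax (gpt-oss): pick top-k of the raw logits, then softmax
# over only the selected experts.
idx_late = topk_idx(logits, k)
weights_late = softmax([logits[i] for i in idx_late])
```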

