make `fused_moe_kernel`'s `EM` and `num_valid_tokens` arguments `do_not_specialize` #11057
See #11056 for more details.
In the latest Triton release, 3.1.0, `do_not_specialize` marks arguments that should not trigger kernel specialization. The Triton main branch, however, adds a new `do_not_specialize_on_alignment` option. It marks arguments that should not be specialized on memory alignment, which matches what we need more precisely. See: https://github.com/triton-lang/triton/blob/main/python/test/unit/runtime/test_cache.py#L56-L60
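Only newer Triton builds accept `do_not_specialize_on_alignment`, so the choice between the two options has to be made when the kernel is decorated. Below is a minimal sketch of one way to do this. It assumes that the installed `triton.jit` exposes both options as keyword parameters and that it accepts argument names. The helper `no_specialize_kwargs` is hypothetical and is not necessarily how this PR implements the check.

```python
import inspect
from typing import Any, Callable, Dict, List, Sequence


def no_specialize_kwargs(jit: Callable[..., Any], arg_names: Sequence[str]) -> Dict[str, List[str]]:
    """Build keyword arguments for a Triton-style ``jit`` decorator (hypothetical helper).

    Prefers ``do_not_specialize_on_alignment`` (Triton main) when the decorator's
    signature accepts it; otherwise falls back to ``do_not_specialize`` (Triton 3.1.0).
    Inspecting the signature avoids parsing version strings.
    """
    params = inspect.signature(jit).parameters
    if "do_not_specialize_on_alignment" in params:
        return {"do_not_specialize_on_alignment": list(arg_names)}
    return {"do_not_specialize": list(arg_names)}


# Hypothetical usage with Triton installed (kernel signature and body omitted):
#
#   import triton
#
#   @triton.jit(**no_specialize_kwargs(triton.jit, ["EM", "num_valid_tokens"]))
#   def fused_moe_kernel(...):
#       ...
```

Checking the signature rather than catching a `TypeError` keeps the decision explicit and lets the same helper work with any decorator that takes keyword options.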
Therefore, I have added a compatibility check. If `do_not_specialize_on_alignment` is available, it is used; otherwise, the code falls back to `do_not_specialize`.

Results: