Enable autotuning and bf16 accumulation for SYCL CUTLASS #4

Open
sommerlukas wants to merge 3 commits into sycl-develop from cutlass-sycl-autotune

Conversation

@sommerlukas (Collaborator) commented Apr 30, 2025

Enable autotuning for SYCL CUTLASS by completing the SYCL benchmark request class.

Also removes a temporary workaround that forced float32 accumulation, so that GEMM can now accumulate in bfloat16.
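As a rough illustration of why the accumulator type matters (this is not code from this PR or from CUTLASS), the sketch below emulates a dot product with bfloat16 inputs and compares a bfloat16 accumulator with a float32 one. The helpers `to_bf16`, `to_f32`, and `dot` are hypothetical names introduced here, and the sequential summation order is an assumption; real GEMM kernels may reduce in a different order.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite float to bfloat16 with round-to-nearest-even.

    bfloat16 is the upper 16 bits of a float32, so the lower 16 bits
    are rounded away. NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias of 0x7FFF plus the lowest kept bit implements ties-to-even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, round_acc):
    """Sequential dot product of bf16 inputs; round_acc models the accumulator type."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + to_bf16(x) * to_bf16(y))
    return acc


ones = [1.0] * 1000
print(dot(ones, ones, to_bf16))  # 256.0: 256 + 1 rounds back to 256 in bf16
print(dot(ones, ones, to_f32))   # 1000.0
```

With an 8-bit significand, the bfloat16 accumulator stops growing once it reaches 256 in this example, while the float32 accumulator gives the exact sum. Accumulating in bfloat16 therefore trades accuracy for the benefits of the narrower type.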

This addresses one of the items left open in #2.

@sommerlukas self-assigned this Apr 30, 2025
Enable autotuning for SYCL CUTLASS by completing
the SYCL benchmark request class.

Also adds a temporary workaround to allow bf16 GEMM
to accumulate in FP32 in code paths used when
auto-tuning is active.

Signed-off-by: Lukas Sommer <[email protected]>
Signed-off-by: Lukas Sommer <[email protected]>
@sommerlukas force-pushed the cutlass-sycl-autotune branch from e21c49d to d76676d May 5, 2025 14:46
@sommerlukas changed the title from "Enable autotuning for SYCL CUTLASS" to "Enable autotuning and bf16 accumulation for SYCL CUTLASS" May 5, 2025
@sommerlukas (Collaborator, Author) commented

This PR depends on codeplaysoftware/cutlass-sycl#356 for GEMM accumulation in bf16. It can only be merged once that PR has been merged and the third_party/cutlass submodule has been updated to include its changes.
