Only enable example 14 when the backend is CUDA #170

FMarno · 2024-12-06T15:57:27Z

Maybe it was already broken, but the recent rebase for CUTLASS 3.6 introduced this line:

https://github.com/codeplaysoftware/cutlass-fork/blame/d49319f72cd9eeeb49830a94e23b3fdf8635fcc8/include/cutlass/gemm/device/gemm_universal_adapter.h#L461

That line makes use of DispatchPolicy::SubgroupSize. The example examples/14_ampere_tf32_tensorop_gemm/ampere_tf32_tensorop_gemm_cute.cu sets DispatchPolicy to

using DispatchPolicy = cutlass::gemm::MainloopSm80CpAsync<PipelineStages>;

which has no SubgroupSize member, leading to this compiler error:

/tmp/finlay/cutlass-fork/include/cutlass/gemm/device/gemm_universal_adapter.h:461:70: error: no member named 'SubgroupSize' in 'cutlass::gemm::MainloopSm80CpAsync<4>'
  461 |           kernel_properties{sycl_exp::sub_group_size<DispatchPolicy::SubgroupSize>}
      |                                                      ~~~~~~~~~~~~~~~~^

I have modified the CMake to enable that example only for NVIDIA devices, i.e. when the backend is CUDA.
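For readers who want to see the failure mode in isolation, here is a minimal sketch, assuming C++17. The two policy structs are hypothetical stand-ins, not the real CUTLASS types, and the detection trait is only an illustration of why one policy compiles and the other does not. It is not what this PR does, which is to gate the example in CMake.

```cpp
#include <type_traits>

// Hypothetical stand-in for a CUDA-style dispatch policy: it has no
// SubgroupSize member, like the policy named in the compiler error above.
template <int Stages>
struct CudaLikePolicy {
  static constexpr int kStages = Stages;
};

// Hypothetical stand-in for a SYCL-style dispatch policy that does expose
// a SubgroupSize member. The value 16 is an arbitrary illustration.
template <int Stages>
struct SyclLikePolicy {
  static constexpr int kStages = Stages;
  static constexpr int SubgroupSize = 16;
};

// Detection idiom: true only when T::SubgroupSize is a valid expression.
template <class T, class = void>
struct has_subgroup_size : std::false_type {};

template <class T>
struct has_subgroup_size<T, std::void_t<decltype(T::SubgroupSize)>>
    : std::true_type {};

// Returns Policy::SubgroupSize when it exists, otherwise the fallback.
// Reading Policy::SubgroupSize unconditionally is what fails to compile
// for a policy without that member.
template <class Policy>
constexpr int subgroup_size_or(int fallback) {
  if constexpr (has_subgroup_size<Policy>::value) {
    return Policy::SubgroupSize;
  } else {
    return fallback;
  }
}

static_assert(!has_subgroup_size<CudaLikePolicy<4>>::value,
              "CUDA-like policy has no SubgroupSize");
static_assert(has_subgroup_size<SyclLikePolicy<4>>::value,
              "SYCL-like policy exposes SubgroupSize");
```

A guard like this would keep such code compiling for both kinds of policy, but it would not make a CUDA-only example meaningful on another backend, which is presumably why restricting the example in the build is the simpler fix here.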

examples/14_ampere_tf32_tensorop_gemm/CMakeLists.txt

aacostadiaz

LGTM

Only enable example 14 when the backend is CUDA

9b89154

t4c1 reviewed Dec 9, 2024

examples/14_ampere_tf32_tensorop_gemm/CMakeLists.txt (review comment, outdated and resolved)

include cute example 14 for cuda backend

1a7ed99

aacostadiaz approved these changes Dec 11, 2024

t4c1 approved these changes Dec 12, 2024

aacostadiaz merged commit 0f258ae into codeplaysoftware:sycl-develop Dec 12, 2024
5 checks passed
