[FEA] Complete the cutlass::library::GemmDescription class to cover Hopper GEMM kernels #2073

Open
manishucsd opened this issue Jan 30, 2025 · 3 comments
@manishucsd
Contributor

manishucsd commented Jan 30, 2025

Issue

The CUTLASS GemmDevice operator carries compile-time attributes (functional and performance attributes). The GemmDevice operator is consumed by GemmOperation[3xBase], which uses those attributes to fill in the GemmDescription data structure. In the past, I have found some of the values in GemmDescription incorrect and sometimes completely missing.

For example, take the kernel with the full procedural name cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_bf16_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_cooperative_epi_tma.

This kernel has the following functional and performance attributes:

Functional Attributes

  • dtypeA_dtypeB_dtypeAccumulation_dtypeC_dtypeD : e4m3_e4m3_f32_bf16_bf16
  • RowMajor_ColumnMajor_ColumnMajor : tnn

Performance Attributes

  • Instruction Shape : 64x128x32
  • Threadblock Shape : 128x128x128
  • Cluster Shape : 1x2x1
  • Mainloop Kind : warpspecialized_cooperative // This is missing from GemmDescription
  • Epilogue Kind : epi_tma // This is missing from GemmDescription
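To make the gap concrete, here is a minimal sketch of how the last two attributes could be recovered from the procedural name alone. The enums `MainloopKind` and `EpilogueKind`, the struct `ScheduleKinds`, and the function `schedule_from_name` are hypothetical names invented for this illustration and are not part of the CUTLASS library; only the two name tokens quoted above are recognized.

```cpp
#include <cassert>
#include <string>

namespace name_sketch {

// Hypothetical runtime enums for illustration only. These are NOT the
// types defined in tools/library/include/cutlass/library/types.h.
enum class MainloopKind { kUnknown, kWarpSpecializedCooperative };
enum class EpilogueKind { kUnknown, kTma };

struct ScheduleKinds {
  MainloopKind mainloop = MainloopKind::kUnknown;
  EpilogueKind epilogue = EpilogueKind::kUnknown;
};

// True when `name` contains `token` anywhere.
inline bool has_token(const std::string& name, const std::string& token) {
  return name.find(token) != std::string::npos;
}

// Recover the two schedule attributes from a procedural kernel name.
// Any name without the two known tokens stays at kUnknown.
inline ScheduleKinds schedule_from_name(const std::string& name) {
  ScheduleKinds kinds;
  if (has_token(name, "_warpspecialized_cooperative")) {
    kinds.mainloop = MainloopKind::kWarpSpecializedCooperative;
  }
  if (has_token(name, "_epi_tma")) {
    kinds.epilogue = EpilogueKind::kTma;
  }
  return kinds;
}

// Usage: the kernel named in this issue yields both kinds.
inline void check_issue_kernel() {
  const ScheduleKinds kinds = schedule_from_name(
      "cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_bf16_bf16_"
      "128x128x128_1x2x1_0_tnn_align16_warpspecialized_cooperative_epi_tma");
  assert(kinds.mainloop == MainloopKind::kWarpSpecializedCooperative);
  assert(kinds.epilogue == EpilogueKind::kTma);
}

}  // namespace name_sketch
```

String matching like this is only a stopgap: the point of the issue is that the same information should be available as proper fields on GemmDescription, so that nobody has to parse names.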

I have added a test so someone at NVIDIA can start on this. Could you please uncomment the two lines and add whatever is needed to fix this?

As a model, you can follow any other enum that is lifted from the internal templates up to the GemmDevice operator and used to set the data members of the GemmDescription class.

We can then commit this test, add more tests for Hopper, and make sure this class is also covered for Blackwell. The tests are CPU-only and should not take much time in CI, so they will allow us to catch bugs like this one.
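As a rough illustration of what such a CPU-only check could look like, the sketch below compares a description against the expected performance attributes of the kernel above and reports which fields differ. `PerfAttributes`, `mismatched_fields`, and the other names are hypothetical stand-ins written for this example; they are not the actual GemmDescription class or the test referenced in this issue.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace test_sketch {

// Hypothetical enums and struct; a stand-in for the performance
// attributes a description would carry, not the real CUTLASS types.
enum class MainloopKind { kUnknown, kWarpSpecializedCooperative };
enum class EpilogueKind { kUnknown, kTma };

struct PerfAttributes {
  int instruction_shape[3];
  int threadblock_shape[3];
  int cluster_shape[3];
  MainloopKind mainloop_kind;
  EpilogueKind epilogue_kind;
};

inline bool same_shape(const int (&a)[3], const int (&b)[3]) {
  return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}

// Names of the fields in `actual` that differ from `expected`.
inline std::vector<std::string> mismatched_fields(const PerfAttributes& actual,
                                                  const PerfAttributes& expected) {
  std::vector<std::string> fields;
  if (!same_shape(actual.instruction_shape, expected.instruction_shape)) {
    fields.push_back("instruction_shape");
  }
  if (!same_shape(actual.threadblock_shape, expected.threadblock_shape)) {
    fields.push_back("threadblock_shape");
  }
  if (!same_shape(actual.cluster_shape, expected.cluster_shape)) {
    fields.push_back("cluster_shape");
  }
  if (actual.mainloop_kind != expected.mainloop_kind) {
    fields.push_back("mainloop_kind");
  }
  if (actual.epilogue_kind != expected.epilogue_kind) {
    fields.push_back("epilogue_kind");
  }
  return fields;
}

// Expected values for the kernel named in this issue.
inline PerfAttributes expected_for_issue_kernel() {
  return PerfAttributes{{64, 128, 32},
                        {128, 128, 128},
                        {1, 2, 1},
                        MainloopKind::kWarpSpecializedCooperative,
                        EpilogueKind::kTma};
}

// Test-style helper: fails if any field of `actual` is wrong or unset.
inline void expect_complete(const PerfAttributes& actual) {
  assert(mismatched_fields(actual, expected_for_issue_kernel()).empty());
}

}  // namespace test_sketch
```

Nothing here touches the GPU: a description whose mainloop and epilogue kinds are still `kUnknown` would be reported as mismatching on exactly those two fields.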

@manishucsd
Contributor Author

@hwu36, can we assign this to be fixed sometime in the next release? Thank you!

@hwu36
Collaborator

hwu36 commented Jan 31, 2025

@itramble

@manishucsd
Contributor Author

Here is the commit that shows what I am thinking of adding to solve this. Things to do:

  • Make sure that the KernelSchedule tag is available on the CUTLASS GemmDevice operator.
  • All the kernel compile-time info encoded in this KernelSchedule tag for Hopper here needs to be unpacked into runtime enums like this.
  • Also do the same for Blackwell kernel schedules.
  • At a minimum, anything that is in the kernel name should be reflected in the library enum types so that we can select the right kernel at runtime. So check the kernel names of existing and new kernels in the CUTLASS library and check whether anything is missing from tools/library/include/cutlass/library/types.h.
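The first two items above could be sketched with a small trait that maps a compile-time schedule tag to a runtime enum. Everything in this block is an assumption made for illustration: the tag structs are local stand-ins with illustrative names rather than the real CUTLASS schedule tags, and `MainloopKindOf`, `GemmOperatorSketch`, `DescriptionSketch`, and `make_description` are hypothetical names, not existing library APIs or the code in the linked commit.

```cpp
#include <cassert>

namespace tag_sketch {

// Local stand-in tag types so the sketch compiles on its own.
// They only mimic compile-time kernel schedule tags.
struct KernelTmaWarpSpecializedCooperative {};
struct KernelTmaWarpSpecializedPingpong {};
struct SomeFutureSchedule {};  // hypothetical tag with no mapping yet

// Hypothetical runtime enum that a description could store.
enum class MainloopKind {
  kUnknown,
  kWarpSpecializedCooperative,
  kWarpSpecializedPingpong
};

// Primary template: a tag without a specialization maps to kUnknown.
template <typename ScheduleTag>
struct MainloopKindOf {
  static constexpr MainloopKind value = MainloopKind::kUnknown;
};

template <>
struct MainloopKindOf<KernelTmaWarpSpecializedCooperative> {
  static constexpr MainloopKind value = MainloopKind::kWarpSpecializedCooperative;
};

template <>
struct MainloopKindOf<KernelTmaWarpSpecializedPingpong> {
  static constexpr MainloopKind value = MainloopKind::kWarpSpecializedPingpong;
};

// Hypothetical operator that exposes its schedule tag as a member alias,
// which is what the first to-do item asks for.
template <typename Schedule>
struct GemmOperatorSketch {
  using KernelSchedule = Schedule;
};

// Hypothetical runtime description holding the unpacked enum.
struct DescriptionSketch {
  MainloopKind mainloop_kind;
};

// Fill the runtime description from the operator's compile-time tag.
template <typename Operator>
DescriptionSketch make_description() {
  return DescriptionSketch{MainloopKindOf<typename Operator::KernelSchedule>::value};
}

// Usage: a cooperative operator produces the cooperative runtime kind.
inline void check_cooperative_operator() {
  using Op = GemmOperatorSketch<KernelTmaWarpSpecializedCooperative>;
  assert(make_description<Op>().mainloop_kind ==
         MainloopKind::kWarpSpecializedCooperative);
}

}  // namespace tag_sketch
```

Falling back to `kUnknown` in the primary template is a deliberate choice in this sketch: a new schedule tag still compiles, and a CPU-only test can flag the missing mapping instead of the build breaking. The same pattern would apply to an epilogue schedule tag and to Blackwell schedules.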

This is just the starting code I had in mind. If you have a better way to solve this, please choose that. I won't be working on it after posting this, but I look forward to seeing it solved soon :).

cc: @itramble, @hwu36
