Merge Cutlass 3.6 #169

Merged: 56 commits, merged Dec 5, 2024

aacostadiaz (Collaborator)

This PR merges the changes from CUTLASS version 3.6.

mhoemmen and others added 30 commits August 5, 2024 14:28
* Fix unrelated MSVC build warnings

* Fix use of isnan in functional.h

Correct namespace qualification of isnan in functional.h
so that it invokes cutlass::isnan for half_t, instead of
converting half_t to float and invoking std::isnan (on host,
or ::isnan on device).
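The `functional.h` fix above is about which overload gets called. The sketch below shows the difference using a hypothetical `demo` namespace and a stand-in `demo::half_t`; neither is CUTLASS's real type or code. The qualified `demo::isnan` call selects the `half_t` overload, while routing through `std::isnan` first widens the value to `float`. The call counter exists only to make the chosen overload observable.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

namespace demo {

// Hypothetical stand-in for a half-precision type: IEEE fp16 stored as raw bits.
struct half_t {
  std::uint16_t bits;

  // Implicit widening to float (handles zero, subnormal, normal, Inf and NaN).
  operator float() const {
    std::uint32_t sign = static_cast<std::uint32_t>(bits & 0x8000u) << 16;
    int exp = (bits >> 10) & 0x1F;
    std::uint32_t mant = bits & 0x03FFu;
    std::uint32_t f;
    if (exp == 0x1F) {                      // Inf or NaN
      f = sign | 0x7F800000u | (mant << 13);
    } else if (exp == 0 && mant == 0) {     // signed zero
      f = sign;
    } else {
      if (exp == 0) {                       // subnormal: renormalize the mantissa
        exp = 1;
        while ((mant & 0x0400u) == 0) { mant <<= 1; --exp; }
        mant &= 0x03FFu;
      }
      // Rebias the exponent from fp16 (bias 15) to fp32 (bias 127).
      f = sign | (static_cast<std::uint32_t>(exp + 112) << 23) | (mant << 13);
    }
    float out;
    std::memcpy(&out, &f, sizeof out);
    return out;
  }
};

// Counts calls to the half_t overload (illustration only).
static int half_isnan_calls = 0;

// Overloads that live in the library's own (hypothetical) namespace.
inline bool isnan(float x) { return std::isnan(x); }
inline bool isnan(half_t h) {
  ++half_isnan_calls;
  // fp16 NaN: all exponent bits set and a nonzero mantissa.
  return (h.bits & 0x7C00u) == 0x7C00u && (h.bits & 0x03FFu) != 0u;
}

// Functor in the style of a functional.h helper: the qualified call
// demo::isnan(x) picks the half_t overload for half_t arguments.
template <typename T>
struct is_nan_op {
  bool operator()(T const& x) const { return demo::isnan(x); }
};

// The pattern the fix avoids: std::isnan has no half_t overload, so the
// value is converted to float before the check.
template <typename T>
bool is_nan_via_std(T const& x) { return std::isnan(static_cast<float>(x)); }

}  // namespace demo
```

Both paths give the same answer here; the fix matters because the qualified call keeps the type-specific overload (and avoids the float round trip) on both host and device.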

Without this, I get a compilation error when the extended shapes are enabled.
* Add a couple of configs to generator.py for mixed-input MM

* Change one unit test name; re-enable 128x32 in the profiler

* Added U8/BF16 tests.

---------

Co-authored-by: Haicheng Wu <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
…IA#1700)

* Query pfn to driver api

* use default for older toolkits

---------

Co-authored-by: shunfans <[email protected]>
* Add GEMM support for mixed 4-bit/8-bit data types

* fix ( and )

---------

Co-authored-by: Aleksandar Samardžić <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
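A mixed 4-bit/8-bit GEMM multiplies operands of different widths, so the narrow operand has to be unpacked and widened before the multiply-accumulate. The naive host reference below sketches that arithmetic, assuming signed int4 A packed two values per byte (low nibble first), int8 B, and int32 accumulation. The packing order and the names `decode_s4`, `load_s4` and `gemm_s4_s8` are hypothetical, not CUTLASS's kernel or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sign-extends the 4-bit two's-complement value in the low nibble of v.
inline int decode_s4(std::uint8_t v) {
  int x = v & 0x0F;
  return x >= 8 ? x - 16 : x;
}

// Reads element k of a row of packed int4 values.
// Assumed packing: two values per byte, even k in the low nibble.
inline int load_s4(const std::uint8_t* row, std::size_t k) {
  std::uint8_t byte = row[k / 2];
  return decode_s4(k % 2 == 0 ? byte : static_cast<std::uint8_t>(byte >> 4));
}

// Naive reference: C[M x N] = A[M x K] (packed int4, row-major)
//                            * B[K x N] (int8, row-major), int32 accumulation.
std::vector<std::int32_t> gemm_s4_s8(const std::vector<std::uint8_t>& a_packed,
                                     const std::vector<std::int8_t>& b,
                                     std::size_t M, std::size_t N, std::size_t K) {
  assert(K % 2 == 0);  // each row of A occupies K/2 whole bytes
  assert(a_packed.size() == M * K / 2 && b.size() == K * N);
  std::vector<std::int32_t> c(M * N, 0);
  for (std::size_t m = 0; m < M; ++m) {
    const std::uint8_t* a_row = a_packed.data() + m * (K / 2);
    for (std::size_t n = 0; n < N; ++n) {
      std::int32_t acc = 0;
      for (std::size_t k = 0; k < K; ++k) {
        // Widen both operands to int32 before multiplying.
        acc += load_s4(a_row, k) * static_cast<std::int32_t>(b[k * N + n]);
      }
      c[m * N + n] = acc;
    }
  }
  return c;
}
```

A reference like this is mainly useful for checking a fast kernel's output on small problem sizes.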
)

This is useful, e.g., for a function that takes two float inputs and turns them into a complex number.
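The title of this commit is truncated on this page, so the exact change is not visible. The sketch below only illustrates the use case the message names: a binary operation whose output type (`std::complex<float>`) differs from its input type (`float`). `make_complex_op` and `apply_binary` are hypothetical names; the point is that the output element type is deduced from the operation rather than assumed to equal the input type.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical binary functor: two real inputs -> one complex output.
template <typename T>
struct make_complex_op {
  std::complex<T> operator()(T const& real, T const& imag) const {
    return std::complex<T>(real, imag);
  }
};

// Applies a binary op element-wise over two equal-length arrays.
// The output element type comes from the op's return type, not from In.
template <typename Op, typename In, std::size_t N>
auto apply_binary(Op op, std::array<In, N> const& a, std::array<In, N> const& b) {
  std::array<decltype(op(a[0], b[0])), N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    out[i] = op(a[i], b[i]);
  }
  return out;
}
```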
* add print_svg for mma

* correct the code indentation
yzhaiustc and others added 25 commits October 9, 2024 15:33
* v3.6

* update changelog

* update readme

* fix typo

* fixing typos

* hopper gemm with weight prefetch

---------

Co-authored-by: yuzhai <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
* Fix README

* Improve README

---------

Co-authored-by: Haicheng Wu <[email protected]>
* Include of regular_tile_iterator.h fixed for NVRTC

* More includes fixed for NVRTC
…s/gemm/device/gemm_universal.h" (NVIDIA#1569)

Fix compilation with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`
…_Traits support (NVIDIA#1856)

* Fix wrong A/B layouts in `MMA_Traits<SM80_16x8x256_S32U1U1S32_TN_XORPOPC>` and add support for the m8n8k128 and m16n8k128 `mma.and.popc` shapes in the `MMA_Traits` instantiations

* Add a `print` template for `subbyte_reference<T>`
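The `and.popc` MMA variants named above compute a binary dot product: each 1-bit "multiply" is a bitwise AND, and the accumulation is a population count added to C. The host-side sketch below reproduces only that arithmetic, for one output element and for a small TN tile (A row-major, B column-major, so each row of A and column of B is contiguous along K). The function names are hypothetical, and the sketch does not model the per-thread register fragment layouts that the commit corrects in `MMA_Traits`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable population count (std::popcount requires C++20).
inline int popcount32(std::uint32_t x) {
  int n = 0;
  while (x) { x &= x - 1; ++n; }  // clear the lowest set bit each iteration
  return n;
}

// One output element: d = c + sum over k of (a[k] AND b[k]), with the K-bit row
// of A and K-bit column of B packed 32 bits per word (K = 128 -> 4 words).
inline std::int32_t and_popc_dot(const std::uint32_t* a_row,
                                 const std::uint32_t* b_col,
                                 std::size_t k_bits, std::int32_t c) {
  assert(k_bits % 32 == 0);
  std::int32_t d = c;
  for (std::size_t w = 0; w < k_bits / 32; ++w) {
    d += popcount32(a_row[w] & b_col[w]);
  }
  return d;
}

// D[M x N] = C + popc(A & B) for a TN tile: row m of A starts at word m*words,
// column n of B starts at word n*words; C and D are row-major.
std::vector<std::int32_t> and_popc_tile(const std::vector<std::uint32_t>& a,
                                        const std::vector<std::uint32_t>& b,
                                        const std::vector<std::int32_t>& c,
                                        std::size_t M, std::size_t N,
                                        std::size_t k_bits) {
  const std::size_t words = k_bits / 32;
  std::vector<std::int32_t> d(M * N);
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t n = 0; n < N; ++n)
      d[m * N + n] = and_popc_dot(&a[m * words], &b[n * words], k_bits, c[m * N + n]);
  return d;
}
```

The XOR+POPC variant in the same file follows the same pattern with `^` in place of `&`.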
…rs (NVIDIA#1931)

* move two warpgroup_wait

* merge main

---------

Co-authored-by: Siyuan Fu <[email protected]>
* Fix `cutlass` python library with cuda `12.6.2.post1`

Previously we had this error:
```
  File "/storage/home/cutlass/python/cutlass/backend/operation.py", line 39, in <listcomp>
    _version_splits = [int(x) for x in __version__.split("rc")[0].split(".")]
                       ^^^^^^
ValueError: invalid literal for int() with base 10: 'post1'
```

* Update sm90_utils.py

* Update generator.py

* Update python/cutlass_library/generator.py

Co-authored-by: Jack Kosaian <[email protected]>

* Update python/cutlass_library/sm90_utils.py

Co-authored-by: Jack Kosaian <[email protected]>

---------

Co-authored-by: Jack Kosaian <[email protected]>
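The `ValueError` above arises because every dot-separated field of the version string is passed to `int()`, and `post1` is not an integer. The actual fix lives in the Python sources listed above and is not reproduced here. As an illustration of the parsing rule only, the C++ sketch below keeps the leading numeric fields and stops at the first non-numeric one; `parse_version_prefix` is a hypothetical name, not part of the `cutlass` library.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Parses the leading numeric fields of a version string,
// e.g. "12.6.2.post1" -> {12, 6, 2}.
// Like the original Python expression, everything from "rc" onward is dropped
// first; unlike it, a non-numeric field ends parsing instead of raising an error.
std::vector<int> parse_version_prefix(const std::string& version) {
  const std::string base = version.substr(0, version.find("rc"));
  std::vector<int> parts;
  std::stringstream ss(base);
  std::string field;
  while (std::getline(ss, field, '.')) {
    if (field.empty()) break;
    for (char ch : field) {
      if (!std::isdigit(static_cast<unsigned char>(ch))) return parts;
    }
    parts.push_back(std::stoi(field));
  }
  return parts;
}
```

Stopping at the first non-numeric field (rather than skipping it) keeps the result a clean major/minor/patch prefix, which is all a version comparison needs.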
---------

Co-authored-by: Joe Todd <[email protected]>
# Conflicts:
#	examples/CMakeLists.txt
#	examples/cute/tutorial/tiled_copy_sycl.cpp
#	include/cute/arch/copy_sm90_desc.hpp
#	include/cute/arch/util.hpp
#	include/cute/atom/mma_traits.hpp
#	include/cute/numeric/numeric_types.hpp
#	include/cutlass/arch/barrier.h
#	include/cutlass/epilogue/collective/collective_epilogue.hpp
#	include/cutlass/epilogue/fusion/xe_callbacks.hpp
#	include/cutlass/gemm/collective/collective_builder.hpp
#	include/cutlass/gemm/device/gemm.h
#	include/cutlass/gemm/device/gemm_universal_adapter.h
#	include/cutlass/gemm/dispatch_policy.hpp
#	include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp
#	include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp
#	include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp
#	include/cutlass/gpu_generics.h
#	include/cutlass/platform/platform.h
#	test/unit/gemm/device/gemm_testbed_3x.hpp
#	tools/library/CMakeLists.txt
#	tools/util/include/cutlass/util/device_memory.h

@aacostadiaz (Collaborator, Author)

Changes originally approved in #162

@aacostadiaz aacostadiaz merged commit d49319f into codeplaysoftware:sycl-develop Dec 5, 2024
5 checks passed