[CK_BUILDER]ckb add remining fwd conv device ops #3155

JH-Leon-KIM-AMD · 2025-11-04T15:01:09Z

Proposed changes

This PR adds factory support for the remaining forward convolution device operations in CK Builder.

Jira Ticket: https://amd-hub.atlassian.net/jira/software/c/projects/ALMIOPEN/boards/319/backlog?selectedIssue=ALMIOPEN-350

Task 350 - Add remaining forward convolution device operations:

DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
- Added DL (Direct Load) factory specialization for NHWC layout-specific convolution
- Created new DL-specific algorithm descriptor with 30 template parameters
- Added test helper function and 3 test cases covering DEFAULT and FILTER_1X1_PAD0 specializations
- All tests passing (15/15 builder tests)
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Added Large_Tensor factory specialization for N-dimension splitting (larger-than-memory tensors)
- Implemented macro collision workaround using pragma push/pop for GridwiseGemmTemplateParameters
- Reuses existing XDL algorithm descriptor (42 identical template parameters)
- Added test helper function and 2 test cases covering DEFAULT and FILTER_1X1_PAD0 specializations
- All tests passing (15/15 builder tests)

This completes Task 350 - all 4 forward convolution device operations are now supported in CK Builder:

✅ DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle (by Ville)
✅ DeviceGroupedConvFwdMultipleD_Wmma_CShuffle (by Ville)
✅ DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK (this PR)
✅ DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor (this PR)

Checklist

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run
I have added inline documentation which helps the maintainers understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

Design Decisions

1. DL Algorithm Descriptor (30 parameters)

DL uses VALU instructions instead of XDL matrix cores, so it requires a different parameter structure
Fixed NHWC_KYXC_NHWK layout only (not flexible like XDL)
Created separate descriptor type: ConvAlgorithm_DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
New DL-specific concepts: DlThreadConfigDescriptor, DlThreadClusterDescriptor, DlBlockTransferK0M0M1K1Descriptor, etc.

2. Large_Tensor Descriptor Reuse (42 parameters)

Large_Tensor has identical template parameters to regular XDL CShuffle
Only difference: internal SplitN=true flag in device operation (not exposed in factory interface)
Reuses existing descriptor: ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
No new descriptor or concepts needed
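
The descriptor reuse described here can be sketched as two factory specializations that consume the same algorithm descriptor and differ only in which device operation they instantiate. All names below (`DeviceOp`, `ExampleXdlAlgorithm`, `RegularXdlOp`, `LargeTensorXdlOp`, `ConvFactory`) are hypothetical, and the three parameters stand in for the real 42; this is an assumption-laden sketch, not the code in conv_factory.hpp.

```cpp
// Hypothetical tag selecting which device operation the factory builds.
enum class DeviceOp { XdlCShuffle, XdlCShuffleLargeTensor };

// Hypothetical shared algorithm descriptor (stand-in for the 42-parameter one).
struct ExampleXdlAlgorithm {
    static constexpr int block_size  = 256;
    static constexpr int m_per_block = 128;
    static constexpr int n_per_block = 128;
};

// Stand-ins for the two device operations. They take identical template
// parameters; the split-N behaviour is fixed inside each operation and is
// never part of the descriptor.
template <int BlockSize, int MPerBlock, int NPerBlock>
struct RegularXdlOp {
    static constexpr bool split_n   = false;
    static constexpr int block_size = BlockSize;
};

template <int BlockSize, int MPerBlock, int NPerBlock>
struct LargeTensorXdlOp {
    static constexpr bool split_n   = true;
    static constexpr int block_size = BlockSize;
};

// Primary template is left undefined; each supported op gets a specialization.
template <DeviceOp Op, typename Alg>
struct ConvFactory;

template <typename Alg>
struct ConvFactory<DeviceOp::XdlCShuffle, Alg> {
    using Instance = RegularXdlOp<Alg::block_size, Alg::m_per_block, Alg::n_per_block>;
};

template <typename Alg>
struct ConvFactory<DeviceOp::XdlCShuffleLargeTensor, Alg> {
    using Instance = LargeTensorXdlOp<Alg::block_size, Alg::m_per_block, Alg::n_per_block>;
};

// The same descriptor feeds both specializations.
using Regular     = ConvFactory<DeviceOp::XdlCShuffle, ExampleXdlAlgorithm>::Instance;
using LargeTensor = ConvFactory<DeviceOp::XdlCShuffleLargeTensor, ExampleXdlAlgorithm>::Instance;

static_assert(!Regular::split_n, "regular op does not split N");
static_assert(LargeTensor::split_n, "large-tensor op splits N internally");
static_assert(Regular::block_size == LargeTensor::block_size, "same parameters");
```

Because the only difference lives inside the device operation, the factory interface stays unchanged and no additional descriptor or concept is required.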

3. Macro Collision Workaround

Both device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp (line 41) and device_grouped_conv_fwd_multiple_d_xdl_large_tensor_cshuffle.hpp (line 51) define the GridwiseGemmTemplateParameters macro without a matching #undef
Used #pragma push_macro/#pragma pop_macro to isolate the macro scope of the Large_Tensor header
This may need to be fixed in the CK headers themselves
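
A self-contained sketch of the push/pop technique follows. The two `#define` lines simulate the two headers; the replacement lists (`1, 2, 3` and `4, 5`) and the helper `param_count` are made up for illustration, and the explicit `#undef` after the push is an assumption about how to avoid a redefinition diagnostic, not necessarily what conv_factory.hpp does.

```cpp
#include <cstddef>

// Helper for the example only: number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t param_count(const T (&)[N]) {
    return N;
}

// Simulates the first header: defines the macro and never #undefs it.
#define GridwiseGemmTemplateParameters 1, 2, 3

// Save the current definition, then clear it so the second definition
// does not collide with the first.
#pragma push_macro("GridwiseGemmTemplateParameters")
#undef GridwiseGemmTemplateParameters

// Simulates including the second header, which defines the same name.
#define GridwiseGemmTemplateParameters 4, 5
constexpr int large_tensor_params[] = {GridwiseGemmTemplateParameters};

// Restore the definition that was active before the push.
#pragma pop_macro("GridwiseGemmTemplateParameters")

// Code after the pop sees the first header's definition again.
constexpr int regular_params[] = {GridwiseGemmTemplateParameters};

static_assert(param_count(large_tensor_params) == 2, "second definition used inside");
static_assert(param_count(regular_params) == 3, "first definition restored after pop");
```

`#pragma push_macro` and `#pragma pop_macro` are compiler extensions supported by GCC, Clang, and MSVC rather than standard C++, which is one reason a fix in the headers themselves would be the more durable solution.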

- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)

…huffle_Large_Tensor
- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests

Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.

vpietila-amd

Looks good to me. There's a PR #3154 that will reduce the code duplication in the test code. If that PR goes in first, we need to do the same changes for this PR too. But they are relatively straightforward changes. Once this PR is merged in the develop branch, I'll do another refactoring to remove the explicit device op flag from the signature.

vpietila-amd · 2025-11-05T09:31:41Z

experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp

@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: MIT
+// Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.


Check that the Copyright follows the convention introduced in PR #3150. Same for the other new test file.

- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
  * experimental/builder/include/ck_tile/builder/device_op_types.hpp

vpietila-amd and others added 13 commits October 29, 2025 13:33

Add device operation to conv signature. Use unions to hold conv layou…

e5eb4ed

…ts and device operations.

Add predicates for all device op instances.

74ba32e

Use the device op signature for validation.

fbdded6

Merge branch 'develop' into vpietila/ckb-generalize-conv-factory

7b14dde

Fix ckb CMakeLists.txt file for tests.

ee13982

Fix building CK Builder instance traits after the introduction of dir…

28e0d5f

…ect load template parameter in CK.

Merge branch 'vpietila/fix-building-ckb-tests' into vpietila/ckb-gene…

e129843

…ralize-conv-factory

Fix clang-formatting.

c8eac6f

Merge branch 'develop' into jeonghyun/ckb-add-remining-fwd-conv-devic…

a40985c

…e-ops

add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk

0b83a57

Merge branch 'develop' into jeonghyun/ckb-add-remining-fwd-conv-devic…

6cb30a9

…e-ops

JH-Leon-KIM-AMD marked this pull request as ready for review November 5, 2025 08:36

JH-Leon-KIM-AMD requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners November 5, 2025 08:36

vpietila-amd previously approved these changes Nov 5, 2025

JH-Leon-KIM-AMD changed the title ~~Jeonghyun/ckb add remining fwd conv device ops~~ [CK_BUILDER]ckb add remining fwd conv device ops Nov 6, 2025

JH-Leon-KIM-AMD dismissed vpietila-amd’s stale review via 43f104a November 6, 2025 08:28

JH-Leon-KIM-AMD added 3 commits November 6, 2025 14:30

Merge branch 'develop' into jeonghyun/ckb-add-remining-fwd-conv-devic…

14f9e1a

…e-ops

fix c++ 18 format

9e334ba

Fix clang-format-18 error in device_op_types.hpp

6a230ae

shumway approved these changes Nov 7, 2025

shumway merged commit 5f3cae3 into develop Nov 7, 2025
49 checks passed

shumway deleted the jeonghyun/ckb-add-remining-fwd-conv-device-ops branch November 7, 2025 00:29
