[CK_BUILDER]ckb add remaining fwd conv device ops #3155
…ts and device operations.
…ect load template parameter in CK.
…ralize-conv-factory
- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)
…huffle_Large_Tensor
- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests
Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.
vpietila-amd
left a comment
Looks good to me. PR #3154 will reduce the code duplication in the test code. If that PR goes in first, we need to make the same changes in this PR too, but they are relatively straightforward. Once this PR is merged into the develop branch, I'll do another refactoring to remove the explicit device op flag from the signature.
@@ -0,0 +1,69 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.
Check that the Copyright follows the convention introduced in PR #3150. Same for the other new test file.
Thank you!
- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
  * experimental/builder/include/ck_tile/builder/device_op_types.hpp
Proposed changes
This PR adds factory support for the remaining forward convolution device operations in CK Builder.
Jira Ticket: https://amd-hub.atlassian.net/jira/software/c/projects/ALMIOPEN/boards/319/backlog?selectedIssue=ALMIOPEN-350
Task 350 - Add remaining forward convolution device operations:
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
This completes Task 350 - all 4 forward convolution device operations are now supported in CK Builder:
Checklist
clang-format on all changed files
Discussion
Design Decisions
1. DL Algorithm Descriptor (30 parameters)
- ConvAlgorithm_DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
- DlThreadConfigDescriptor, DlThreadClusterDescriptor, DlBlockTransferK0M0M1K1Descriptor, etc.
2. Large_Tensor Descriptor Reuse (42 parameters)
- SplitN=true flag in device operation (not exposed in factory interface)
- ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
3. Macro Collision Workaround
- device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp (line 41) and device_grouped_conv_fwd_multiple_d_xdl_large_tensor_cshuffle.hpp (line 51) define GridwiseGemmTemplateParameters macro without #undef
- #pragma push_macro / #pragma pop_macro to isolate Large_Tensor header's macro scope
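
The push/pop workaround can be sketched in a single translation unit. This is a minimal illustration, not the code in conv_factory.hpp: the two `#define` lines stand in for the two headers, the integer macro bodies replace the real template-parameter lists, and the names `large_tensor_value`, `regular_value`, and `demo_macro_isolation` are hypothetical. It relies on `#pragma push_macro` / `#pragma pop_macro`, which GCC, Clang, and MSVC support.

```cpp
#include <cassert>

// Stand-in for the first header: defines the macro and never #undefs it.
#define GridwiseGemmTemplateParameters 1

// Save the current definition, then clear it so the next
// definition is not a conflicting redefinition.
#pragma push_macro("GridwiseGemmTemplateParameters")
#undef GridwiseGemmTemplateParameters

// Stand-in for the Large_Tensor header: defines the same macro name
// with a different body (in real code this would be an #include).
#define GridwiseGemmTemplateParameters 2
constexpr int large_tensor_value = GridwiseGemmTemplateParameters;

// Restore the definition that was saved before the second header.
#pragma pop_macro("GridwiseGemmTemplateParameters")
constexpr int regular_value = GridwiseGemmTemplateParameters;

// Each region saw its own definition of the macro.
static_assert(large_tensor_value == 2);
static_assert(regular_value == 1);

// Runtime restatement of the same checks (hypothetical helper).
inline void demo_macro_isolation() {
    assert(large_tensor_value == 2);
    assert(regular_value == 1);
}
```

The save-and-restore keeps the fix local to the including file. Adding `#undef` at the end of each device-op header would remove the collision at its source, but it requires changing those headers.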