Ck tile batched contraction kernel generalizing #3126

msaffari-amd · 2025-10-30T09:32:04Z

Proposed changes

Extends the ck-tile batched contraction kernel to support arbitrary multi-dimensional, non-contiguous tensor layouts using tensor descriptors.
Extends the example to test this new feature: the user can pass custom strides for the input and output tensors.
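
To make "non-contiguous layout" concrete, here is a minimal host-side sketch, not code from this PR. The helpers `packed_strides` and `required_elements` are hypothetical names; they assume non-negative strides and a layout where the last dimension varies fastest. The first computes the default packed strides, the second the number of elements a buffer must hold once custom (for example padded) strides are supplied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packed row-major strides (last dimension fastest).
std::vector<std::int64_t> packed_strides(const std::vector<std::int64_t>& lens)
{
    std::vector<std::int64_t> strides(lens.size(), 1);
    // Walk from the last dimension towards the first, accumulating lengths.
    for(std::size_t i = lens.size(); i-- > 1;)
        strides[i - 1] = strides[i] * lens[i];
    return strides;
}

// Hypothetical helper: elements the backing buffer must hold so that the
// largest reachable offset is in bounds. Assumes non-negative strides.
std::int64_t required_elements(const std::vector<std::int64_t>& lens,
                               const std::vector<std::int64_t>& strides)
{
    assert(lens.size() == strides.size());
    std::int64_t last_offset = 0;
    for(std::size_t i = 0; i < lens.size(); ++i)
    {
        if(lens[i] == 0)
            return 0; // empty tensor, nothing to store
        last_offset += (lens[i] - 1) * strides[i];
    }
    return last_offset + 1;
}
```

For lengths {2, 3, 4}, the packed strides are {12, 4, 1} and the buffer needs 24 elements; with padded strides {16, 5, 1} the same logical tensor needs 30.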

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which helps the maintainers understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

Copilot

Pull Request Overview

This PR adds comprehensive support for arbitrary multi-dimensional stride patterns and non-contiguous tensor layouts to the batched contraction kernel. Previously, the kernel only supported contiguous row-major layouts with hardcoded strides.

Introduces TensorDescriptorUtils with vectorization support to create stride-aware tensor descriptors
Implements custom RunGemm method using descriptor-based tensor views instead of relying solely on UniversalGemmKernel
Updates reference implementation to use stride-aware indexing for validation
Adds command-line support for custom stride specifications in examples
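
The stride-aware indexing mentioned above can be sketched as follows. This is an illustrative host reference under stated assumptions, not the PR's implementation: it uses a single G, M, N and K dimension each, no D tensors, and assumes the dimension order A[G, M, K], B[G, N, K], E[G, M, N]. The names `offset_of` and `reference_contraction` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Index   = std::int64_t;
using Strides = std::array<Index, 3>;

// Linear element offset of a 3-D index under the given per-dimension strides.
inline std::size_t offset_of(Index i0, Index i1, Index i2, const Strides& s)
{
    return static_cast<std::size_t>(i0 * s[0] + i1 * s[1] + i2 * s[2]);
}

// Host reference: E[g, m, n] = sum_k A[g, m, k] * B[g, n, k].
// Every access goes through explicit strides, so padded or otherwise
// non-contiguous buffers are handled the same way as packed ones.
void reference_contraction(const std::vector<float>& a, const Strides& a_strides,
                           const std::vector<float>& b, const Strides& b_strides,
                           std::vector<float>& e, const Strides& e_strides,
                           Index G, Index M, Index N, Index K)
{
    for(Index g = 0; g < G; ++g)
        for(Index m = 0; m < M; ++m)
            for(Index n = 0; n < N; ++n)
            {
                float acc = 0.0f;
                for(Index k = 0; k < K; ++k)
                    acc += a[offset_of(g, m, k, a_strides)] *
                           b[offset_of(g, n, k, b_strides)];
                e[offset_of(g, m, n, e_strides)] = acc;
            }
}
```

Because no loop assumes a packed layout, the same function validates a kernel run with padded rows by passing the padded stride for the affected dimension.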

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tensor_descriptor_utils.hpp | Adds vector size template parameters and updates descriptor creation to support vectorized memory access |
| batched_contraction_kernel.hpp | Implements descriptor-based architecture with custom RunGemm, adds tensor descriptor storage to kernel args |
| reference_batched_contraction.hpp | Refactors reference computation to use stride-aware offset calculation, changes from std::vector to std::array for D tensors |
| run_batched_contraction_example.inc | Adds custom stride parsing, implements runtime dispatch for NumDTensor, creates tensors with non-contiguous layouts |
| contraction_utils.hpp | Updates argument parsing to support stride specifications and adds help documentation |
| batched_contraction.cpp | Updates dimension case handling (appears to have duplicate case) |

Copilot · 2025-11-04T09:53:35Z

example/ck_tile/41_batched_contraction/batched_contraction.cpp

    HANDLE_CASE(2, 2, 2, 1);
    HANDLE_CASE(1, 2, 1, 1);
-    HANDLE_CASE(1, 1, 1, 2);
+    HANDLE_CASE(2, 1, 1, 1);


Duplicate case: HANDLE_CASE(2, 1, 1, 1) appears at lines 219 and 222. The second occurrence at line 222 replaces the removed case (1, 1, 1, 2). That removal may be intentional, but the duplicate is incorrect. Remove line 222 or replace it with the intended dimension combination.

Suggested change

HANDLE_CASE(2, 1, 1, 1);

Copilot · 2025-11-04T09:53:36Z

include/ck_tile/host/reference/reference_batched_contraction.hpp

+
+        // Decode G dimensions
+        ck_tile::index_t temp = g_flat;
+        for(ck_tile::index_t i = num_g_dims - 1; i >= 0; --i)


Loop condition i >= 0 with unsigned type ck_tile::index_t will always be true, causing infinite loop when i underflows. This pattern appears in multiple offset computation lambdas (lines 109, 117, 125, 141, 149, 157, 173, 181, 189, 208, 216, 224). Change loop to use int i or rewrite to avoid decrementing below zero.
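
One way to sidestep the concern regardless of the counter's signedness is to test before decrementing. This thread does not establish whether ck_tile::index_t is actually unsigned; if it is a signed integer, the original `i >= 0` loop terminates normally. The sketch below is illustrative only: `decode_flat_index` is a hypothetical helper, and it assumes the last dimension varies fastest and all lengths are positive.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split a flattened index into per-dimension indices,
// with the last dimension varying fastest. Assumes every length is > 0.
std::vector<std::int64_t> decode_flat_index(std::int64_t flat,
                                            const std::vector<std::int64_t>& lens)
{
    std::vector<std::int64_t> idx(lens.size(), 0);
    // "i-- > 0" compares first and decrements afterwards, so the body sees
    // i = size-1 ... 0 and the loop ends cleanly even with an unsigned counter.
    for(std::size_t i = lens.size(); i-- > 0;)
    {
        idx[i] = flat % lens[i];
        flat /= lens[i];
    }
    return idx;
}
```

For lengths {2, 3, 4}, flat index 23 decodes to {1, 2, 3}, the last element of the tensor.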

msaffari-amd added 14 commits October 15, 2025 14:09

Add help for example (356b50f)
Refactore the compute reference batched contraction to manage stride-aware calculation and some code cleanings (b161cd9)
Merge branch 'develop' into ck_tile_batched_contraction_kernel_generelizing (553c05e)
Add stride-aware reference for batched contraction with independent D tensor layouts (4027a92)
Add -num_d argument for runtime D tensor count selection in batched contraction (9fc1a8c)
Add stride vector arguments in example code for testing non-contiguous batched contraction inputs (fec8332)
Add descriptor-based architecture for batched contraction multi-dimensional stride support (2ecb0bf)
Add multi-dimensional non-contiguous stride support to batched contraction, num_d = 0 (b8b56d5)
Add complete multi-dimensional stride support via descriptors (bbfe450)
Enable vectorization in descriptor-based batched contraction. Add pad_tensor_view to local RunGemm (6144f5c)
Clean up batched contraction: remove old UniversalGemmKernel path (4883883)
merge develop (670409c)
Clean up batched contraction: remove legacy paths and finalize docs (e7f5f0b)
Optimize batched contraction example: pass dimension sizes not vectors (0eb1b55)

msaffari-amd requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners October 30, 2025 09:32

msaffari-amd added 2 commits October 30, 2025 10:38

Merge branch 'develop' into ck_tile_batched_contraction_kernel_generelizing (11514c7)
Merge branch 'develop' into ck_tile_batched_contraction_kernel_generelizing (c3857ee)

aosewski requested a review from Copilot November 4, 2025 09:51

Copilot AI reviewed Nov 4, 2025
