[Feature Request] LayoutInference pass should be enhanced to analyze the vectorization factor across indices #266

LeiWang1999 · 2024-12-12T16:58:09Z

Currently, when we write a set of nested loops to ensure 16-byte vectorized access, the code might look like this:

for i in range(1):
    for v_3 in T.vectorized(16):
        B_shared[tx // 16, tx % 16 // 8, tx % 8 * 2 + v_3 // 8, v_3 % 8] = B[bx * 8 + tx // 16, ko * 2 + tx % 16 // 8, tx % 8 * 2 + v_3 // 8, v_3 % 8]

However, our current legalization pass transforms this into the following form:

for i, v_3 in T.grid(1, 2):
    for vec in T.vectorized(8):
        B_shared[tx // 16, tx % 16 // 8, tx % 8 * 2 + (v_3 * 8 + vec) // 8, (v_3 * 8 + vec) % 8] = B[bx * 8 + tx // 16, ko * 2 + tx % 16 // 8, tx % 8 * 2 + (v_3 * 8 + vec) // 8, (v_3 * 8 + vec) % 8]

While this transformation achieves functional correctness, it introduces additional complexity in the indexing expressions and splits the vectorized loop into smaller chunks (e.g., breaking the 16-element vectorized access into two 8-element accesses). This reduces the efficiency of vectorized memory operations and complicates the generated code.
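To make the comparison concrete, here is a minimal plain-Python model of the two loop nests (no TileLang APIs are used). It checks that both forms visit the same index pairs in the same order, and that the 16 elements are contiguous in flat memory. The function names are hypothetical, and the row-major layout with an innermost extent of 8 is an assumption, since the issue does not state the buffer shapes.

```python
# Plain-Python model of the index expressions from the two loop nests.
# Only the last two buffer dimensions are modeled; the leading ones do not
# depend on the vectorized loop variable.

def original_indices():
    # Models: for v_3 in T.vectorized(16)
    return [(v_3 // 8, v_3 % 8) for v_3 in range(16)]

def legalized_indices():
    # Models: for v_3 in range(2): for vec in T.vectorized(8)
    return [((v_3 * 8 + vec) // 8, (v_3 * 8 + vec) % 8)
            for v_3 in range(2) for vec in range(8)]

def flat_offsets(tx, inner_extent=8):
    # Assumption: row-major storage whose innermost extent is 8.
    # Under that assumption the offset simplifies to tx % 8 * 16 + v_3.
    return [(tx % 8 * 2 + v_3 // 8) * inner_extent + v_3 % 8
            for v_3 in range(16)]

# Both loop nests touch the same elements in the same order.
assert original_indices() == legalized_indices()

# The 16 accesses form one contiguous run, so a single 16-wide access is legal
# as far as contiguity is concerned.
offsets = flat_offsets(tx=3)
assert offsets == list(range(offsets[0], offsets[0] + 16))
```

The contiguity only becomes visible when the last two indices are analyzed together; looking at `v_3 % 8` alone suggests a vector width of 8.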

Proposed Enhancement:
To address this, the legalization pass should be enhanced to maintain the original vectorized structure and ensure that the indexing expressions remain as simple as possible. Specifically:
1. Preserve Single-Level Vectorization: Instead of breaking the 16-element vectorized loop into smaller subloops (e.g., two 8-element loops), the pass should retain the original T.vectorized(16) loop where possible.
2. Simplify Index Calculations: The pass should avoid introducing complex expressions like (v_3 * 8 + vec) for computing indices. Instead, it should aim to directly map the v_3 indices to the original structure (e.g., v_3 // 8 and v_3 % 8).
3. Optimize Performance: By preserving the larger vectorized loop and avoiding unnecessary transformations, the pass can generate more efficient, hardware-friendly code that takes better advantage of vectorized memory access.
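One way such a cross-index analysis could work is sketched below in plain Python: flatten the multi-dimensional index to a linear offset, then pick the largest chunk size for which every chunk of the loop is contiguous. This is an illustration only, not the implementation in TileLang. The names `flatten`, `infer_vector_size`, and `make_index_fn`, as well as the buffer shapes, are hypothetical. The sketch assumes a row-major layout and a power-of-two loop extent, and it does not check base-address alignment.

```python
def flatten(indices, shape):
    """Row-major linear offset of a multi-dimensional index."""
    offset = 0
    for idx, extent in zip(indices, shape):
        offset = offset * extent + idx
    return offset

def infer_vector_size(index_fn, shape, loop_extent):
    """Largest chunk size, found by halving from loop_extent, for which
    every chunk of the loop maps to consecutive flat offsets."""
    size = loop_extent
    while size > 1:
        if loop_extent % size == 0 and all(
            flatten(index_fn(base + lane), shape)
            == flatten(index_fn(base), shape) + lane
            for base in range(0, loop_extent, size)
            for lane in range(size)
        ):
            return size
        size //= 2
    return 1

def make_index_fn(tx):
    # Index expression of B_shared from the issue, for a fixed thread index tx.
    return lambda v: (tx // 16, tx % 16 // 8, tx % 8 * 2 + v // 8, v % 8)

# Hypothetical shape with innermost extent 8: the last two indices combine
# into one contiguous run, so the full factor of 16 is preserved.
assert infer_vector_size(make_index_fn(tx=37), (8, 2, 16, 8), 16) == 16

# Hypothetical padded shape with innermost extent 16: rows are no longer
# adjacent in memory, so the factor drops to 8.
assert infer_vector_size(make_index_fn(tx=37), (8, 2, 16, 16), 16) == 8
```

A real pass would reason symbolically over the index expressions rather than enumerate lanes, but the criterion is the same: the vector width is determined by the flattened offset, not by any single index in isolation.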

LeiWang1999 · 2024-12-16T07:38:39Z

Closed, as this has been implemented by PR #268.

LeiWang1999 mentioned this issue Dec 16, 2024

[TileLang][Dev] Enhance Layout Inference Pass to infer with complex parallel primitives #268

Merged

LeiWang1999 self-assigned this Dec 16, 2024

LeiWang1999 added the enhancement label Dec 16, 2024

LeiWang1999 closed this as completed Dec 16, 2024
