Enable TileAndFuse pipeline for non-intrinsic sized GEMM shapes #18858

Open · 4 of 6 tasks
nirvedhmeshram (Contributor) opened this issue on Oct 21, 2024 · 0 comments

This is a tracking issue for the pieces needed to switch to the TileAndFuse pipeline for non-intrinsic sized GEMM shapes. A prototype branch is available at https://github.com/nirvedhmeshram/iree/tree/bmm_tileandfuse_2

  • Bail-out logic in the GPUReduceBankConflict pass for collapse-shape users:
    This pass can currently crash because of an upstream issue, [mlir][memref] Collapse on strided memref is conservative (llvm/llvm-project#112994). We will bail out for collapse users to avoid this; a sketch of the pattern follows this item.

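    A minimal sketch of the pattern the bailout guards against. The shapes and the workgroup address space are hypothetical, not taken from the actual reproducer:

    ```mlir
    func.func @collapse_user() -> memref<256xf32, #gpu.address_space<workgroup>> {
      // Shared-memory alloc that the pass would pad (e.g. 4x64 -> 4x68 elements)
      // to avoid bank conflicts, turning views of it into strided memrefs.
      %alloc = memref.alloc() : memref<4x64xf32, #gpu.address_space<workgroup>>
      // collapse_shape user: once the alloc is padded, the source would be a
      // strided memref that collapse_shape cannot handle
      // (llvm/llvm-project#112994), so the pass bails out instead of padding.
      %flat = memref.collapse_shape %alloc [[0, 1]]
          : memref<4x64xf32, #gpu.address_space<workgroup>>
          into memref<256xf32, #gpu.address_space<workgroup>>
      return %flat : memref<256xf32, #gpu.address_space<workgroup>>
    }
    ```
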
  • Land pad support when promoting.
    We have a prototype commit for padding (nirvedhmeshram@3fc1628), but it needs to be refactored/improved in the following ways:
    Make the padding part of the TileAndFuse config rather than generating it on the fly. We also need to handle accumulate-type matmuls, as the padding currently generated for them does not get tiled (see dump here).
    Simplify the promotion logic for the C matrix and support both accumulate and non-accumulate GEMMs. A sketch of the padding itself follows this item.

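    For reference, a minimal sketch of the kind of operand padding involved, with hypothetical shapes and an assumed 16x16 intrinsic tile (the prototype generates this on the fly; the goal above is to drive it from the TileAndFuse config instead):

    ```mlir
    func.func @pad_lhs(%lhs: tensor<33x63xf32>) -> tensor<48x64xf32> {
      %cst = arith.constant 0.0 : f32
      // Pad 33x63 up to 48x64 so both dims are multiples of the 16x16 intrinsic.
      %padded = tensor.pad %lhs low[0, 0] high[15, 1] {
      ^bb0(%i: index, %j: index):
        tensor.yield %cst : f32
      } : tensor<33x63xf32> to tensor<48x64xf32>
      return %padded : tensor<48x64xf32>
    }
    ```
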
  • Go back to the pass order used before [LLVMGPU] Use forall workgroup distribution in TileAndFuse pipeline #18565, where we would distribute to workgroups and then do promotion. With the current order we end up with copies that don't get distributed. The reordering was done as a workaround for another bug related to DPS conversion; undoing it will require additional logic in the convert-to-DPS pass, which should solve both issues. cc @Max191

  • Fix barrier placement when there is a result writeback with a different thread distribution.
    For cases with padding, we currently generate IR like this after GPUGreedilyDistributeToThreadsPass:
    https://gist.github.com/nirvedhmeshram/e3b8260fe3d81e2ae6fd928fd4297b28
    The problem is that the iree_gpu.multi_mma is not in a barrier region, and the thread-distributed writeback loop that follows it must only run after all MMA ops in the workgroup have finished. The current thought is that we can insert a barrier here; see the sketch after this item.
    Edit: We realized that inserting a barrier after the mfma was not necessary; the race we were seeing was most likely the backend compiler not satisfying the latency constraints of the mfma. The backend has since fixed this, so no new logic was needed on our side.

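    For context, a structural sketch of the barrier placement that was under consideration before the Edit above made it unnecessary. The thread count, mapping, and writeback structure are assumptions, and the multi_mma is elided to a comment:

    ```mlir
    func.func @writeback_after_mma() {
      // ... %acc = iree_gpu.multi_mma ...  (elided; not inside a barrier region)

      gpu.barrier  // proposed: wait for every MMA in the workgroup to finish

      // Writeback loop, distributed to threads differently than the MMA above.
      scf.forall (%tid) in (64) {
        // copy this thread's slice of the (padded) result to global memory
      } {mapping = [#gpu.thread<linear_dim_0>]}
      return
    }
    ```
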
  • Make sure we have functional parity with the SIMT pipeline and performance parity/improvements relative to the VectorDistribute/PadVectorDistribute pipelines.
    We have some nice-to-have feature requests that would make testing such changes easier:
    Add a batch matmul suite (nod-ai/iree-kernel-benchmark#25)
    Adapt scripts to save lowering configs as part of the result CSV (nod-ai/iree-kernel-benchmark#26)

  • Turn on the pipeline by default in IREE.
