[Draft] [BACKEND] Enhance the remove layout implementation to reduce the duplicated values with different layout in scf.for. #4527

Draft
wants to merge 1 commit into
base: main
from

chengjunlu
Contributor

Layout propagation across the scf.for op in the RemoveLayout pass falls short in the following respects:

  1. There is no cost-model analysis of using different layouts for the operations (that is, choosing different tiling patterns for Triton ops). The pass relies only on anchors, in an ad hoc way.
  2. Ops with multiple results are not handled well.
  3. Ops with nested basic blocks are not handled well.
  4. The pass does not support propagating a layout through scf.for ops.

With these limitations, the scf.for operation becomes the efficiency bottleneck after the remove layout pass.
This is not an issue on NV GPUs because the NV GPU backend converts the layout conversion operations to async.cp in the software pipeline.

But it is an issue for Intel GPUs. We rely on the remove layout pass to get a simpler program with fewer convert layout operations.
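To make the cost of the missing propagation concrete, here is a toy count of layout conversions around a loop. It is only a sketch under stated assumptions, not Triton or MLIR code: the function names are hypothetical, and it assumes the loop body needs the loop-carried value in a different layout than the one it is carried in, with one conversion on entry to the body and one before the yield.

```python
# Toy model only: it counts layout conversions and does not use Triton or MLIR.
# Assumption: the loop-carried value is held in layout A while the body
# computes in layout B.


def conversions_without_propagation(trip_count: int) -> int:
    """Conversions executed when the layout is NOT propagated through the loop.

    Each iteration converts A -> B at the top of the body and B -> A before
    the yield, so the count grows with the trip count.
    """
    per_iteration = 2
    return per_iteration * trip_count


def conversions_with_propagation(trip_count: int) -> int:
    """Conversions executed when the loop-carried value is rewritten to layout B.

    One conversion on the init argument before the loop and one on the loop
    result after it, independent of the trip count.
    """
    return 2


if __name__ == "__main__":
    for n in (1, 16, 128):
        print(n, conversions_without_propagation(n), conversions_with_propagation(n))
```

In this model the conversions inside the loop scale with the trip count, while the propagated version pays a constant cost at the loop boundary.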

The plan is to enhance the remove layout pass to address these limitations:

  1. Refactor the implementation of remove layout so that it properly supports ops with multiple results and nested basic blocks.
  2. Support propagating the layout through scf.for ops on demand.
  3. Add a cost-model analysis pass to estimate the costs of the different tiling patterns across the kernel program.
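One way to picture the cost-model step in the plan above is the following minimal sketch. It is not the analysis proposed in this PR: the `LoopOp` type, the cost numbers, and the layout labels `"blocked"` and `"dpas"` are illustrative assumptions. It picks the layout for a loop body by comparing the per-iteration op costs against a one-time conversion cost at the loop boundary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoopOp:
    """Hypothetical op inside a loop body with a per-iteration cost per layout."""
    name: str
    cost_by_layout: Dict[str, int]


def loop_cost(ops: List[LoopOp], layout: str, trip_count: int,
              convert_cost: int, outside_layout: str) -> int:
    """Total cost of running the loop with every op in `layout`."""
    body = sum(op.cost_by_layout[layout] for op in ops) * trip_count
    # Assumption: changing layout costs one conversion entering the loop
    # and one leaving it.
    boundary = 0 if layout == outside_layout else 2 * convert_cost
    return body + boundary


def choose_loop_layout(ops: List[LoopOp], layouts: List[str], trip_count: int,
                       convert_cost: int, outside_layout: str) -> str:
    """Return the candidate layout with the lowest modelled cost."""
    return min(
        layouts,
        key=lambda l: loop_cost(ops, l, trip_count, convert_cost, outside_layout),
    )


# Made-up costs: the dot is cheaper in the "dpas" layout, the add is neutral.
OPS = [
    LoopOp("dot", {"blocked": 10, "dpas": 2}),
    LoopOp("add", {"blocked": 1, "dpas": 1}),
]

if __name__ == "__main__":
    for trips in (1, 64):
        print(trips, choose_loop_layout(OPS, ["blocked", "dpas"], trips, 50, "blocked"))
```

With these made-up numbers, a long loop amortizes the boundary conversions and the cheaper in-loop layout wins, while a single-iteration loop is better left in the surrounding layout. That trade-off is what "on demand" propagation would have to weigh.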

This is a PR for CI.

@chengjunlu chengjunlu linked an issue Jun 18, 2025 that may be closed by this pull request
@chengjunlu chengjunlu force-pushed the chengjun/enhance_remove_layout branch from 486ed4a to f42bd66 Compare June 18, 2025 07:16
…d values with different layout in scf.for.

Signed-off-by: Lu,Chengjun <[email protected]>
@etiotto etiotto marked this pull request as draft June 18, 2025 18:06
Development

Successfully merging this pull request may close these issues.

[BACKEND] Enhance the remove layout for Intel GPU
1 participant