
Support blocked dot operand layout conversion to linear layout #5423

Closed · binarman opened this issue Dec 13, 2024 · 8 comments · Fixed by #5469

Comments

@binarman (Contributor) commented:
Goal

Support FMA/Blocked dot operand layout in linear layout converter.

DoD

The LinearLayout converter is implemented, the related tests are implemented and passing, and switching to the Linear Layout converter does not break the Python tests.

Existing Linear Layout converters for dot operands

Nvidia MMA dot operand
AMD MFMA dot operand

Both of these converters are fully functional, but I want to refactor the MFMA and WMMA converters to follow the Nvidia MMA style soon. Please try to follow the MMA style.

In-progress PRs

WMMA dot operand

Legacy SharedLayout->dotOperand converter examples

Implementation details

  1. Implement a linear layout converter similar to the WMMA/MFMA ones (I am going to refactor those converters a little soon) in LinearLayoutConversions.cpp; add the appropriate call in the DotOperandEncodingAttr::toLinearLayout function (a rough sketch follows this list).
  2. Add the blocked layout to MemoryOpToLLVM.cpp:isSupportedDotOpLayout to enable the shared->blocked dotOperand conversion with LL; check that all test_core.py tests still pass.
  3. Enable the LL converter in ConvertLayoutOpToLLVM.cpp:transferWithinBlock to enable conversion of the ttg.convert_layout operation with LL. You may need to rework some other places as well; we will clarify this during implementation.
  4. Implement ctest tests in LinearLayoutConversionsTest.cpp and a few lit tests to specifically verify conversion of the ttg.convert_layout operation.
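
For step 1, here is a minimal, hypothetical sketch of what such a converter could look like. The function name fmaDotOperandAToLinearLayout, the restriction to operand A of a 2-D dot, and the way the per-dimension layouts are combined are assumptions made for illustration; the actual converter is the one landing in #5469 and also handles broadcasting, the CTA/CGA layout, operand B, and output-dimension ordering.

```cpp
// Hypothetical sketch only; relies on LinearLayout::identity1D / zeros1D and
// operator* from triton/Tools/LinearLayout.h as used by the existing
// MMA/MFMA converters.
#include "mlir/IR/BuiltinAttributes.h"
#include "triton/Dialect/TritonGPU/IR/Dialect.h"
#include "triton/Tools/LinearLayout.h"

using namespace mlir;
using namespace mlir::triton;
using namespace mlir::triton::gpu;

// Assumes power-of-two sizes and a shape that exactly matches one CTA tile.
LinearLayout fmaDotOperandAToLinearLayout(DotOperandEncodingAttr dotOp,
                                          ArrayRef<int64_t> shape) {
  MLIRContext *ctx = dotOp.getContext();
  auto blocked = mlir::cast<BlockedEncodingAttr>(dotOp.getParent());
  auto sizePerThread = blocked.getSizePerThread();
  auto threadsPerWarp = blocked.getThreadsPerWarp();
  auto warpsPerCTA = blocked.getWarpsPerCTA();

  StringAttr kRegister = StringAttr::get(ctx, "register");
  StringAttr kLane = StringAttr::get(ctx, "lane");
  StringAttr kWarp = StringAttr::get(ctx, "warp");
  StringAttr dim0 = StringAttr::get(ctx, "dim0"); // M
  StringAttr dim1 = StringAttr::get(ctx, "dim1"); // K for operand A

  // K dimension: every thread holds the whole K extent in registers, while
  // lanes and warps are broadcast along K (zero bases), which is what the FMA
  // dot expects from its operands.
  LinearLayout kLayout =
      LinearLayout::identity1D(shape[1], kRegister, dim1) *
      LinearLayout::zeros1D(threadsPerWarp[1], kLane, dim1) *
      LinearLayout::zeros1D(warpsPerCTA[1], kWarp, dim1);

  // Non-K dimension: reuse the parent blocked layout's distribution of
  // registers, lanes, and warps along dim 0.
  LinearLayout mLayout =
      LinearLayout::identity1D(sizePerThread[0], kRegister, dim0) *
      LinearLayout::identity1D(threadsPerWarp[0], kLane, dim0) *
      LinearLayout::identity1D(warpsPerCTA[0], kWarp, dim0);

  // The real implementation must also pick the product order from
  // blocked.getOrder(), replicate registers when the shape is larger than one
  // CTA tile, and transpose the output dims into the canonical order.
  return kLayout * mLayout;
}
```

The essential property this encodes is that an FMA/blocked dot operand keeps the full K extent in each thread's registers; in linear-layout terms, that means identity bases on the register input dimension along K and zero (broadcast) bases for lanes and warps.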
@binarman (Contributor, Author) commented:

+cc @simonidaa
Please consider this task next.

@Jokeren (Contributor) commented Dec 13, 2024:

+@lezcano Maybe you can offload this to AMD? Or is there something you've already done but not pushed yet?

@Jokeren (Contributor) commented Dec 13, 2024:

@binarman btw, on a related note: we plan to completely abandon the getCSwizzleOffset method and instead use allocShape in the memory descriptor to determine whether prefetch has occurred.
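
A minimal sketch of what that check could look like, assuming the shared-memory descriptor type exposes getShape() and getAllocShape() accessors; the accessor names, namespace, and header path are assumptions and may differ between Triton versions.

```cpp
#include "triton/Dialect/TritonGPU/IR/Dialect.h"

// Hypothetical helper: a prefetched operand is a slice of a larger allocation,
// so its view shape is smaller than the allocation shape recorded in the
// descriptor. When the two differ, the lowering must offset into the bigger
// buffer instead of relying on getCSwizzleOffset.
static bool wasSlicedForPrefetch(mlir::triton::gpu::MemDescType memDesc) {
  return memDesc.getShape() != memDesc.getAllocShape();
}
```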

@binarman (Contributor, Author) commented:

Thank you @Jokeren! This is good to know; I will consider removing getCSwizzleOffset from our converters.

@minjang (Contributor) commented Dec 13, 2024:

Very good to hear about this work!

(For my experimental project that uses LL for non-GPUs, yeah, this was a missing feature. I hacked around it badly just to make the unit tests pass, but I'm looking forward to seeing this support.)

@lezcano (Contributor) commented Dec 13, 2024:

I have not started on this, so feel free to take over.

@binarman (Contributor, Author) commented:

@lezcano told me this task is blocking him, so I am going to implement it asap.
@simonidaa I will look for something else for you that is still related to LL.

@binarman (Contributor, Author) commented:

I've made a draft PR: #5469
Along the way I realized that the current FMA implementation does not align with the rest of the layouts, so I decided to rework it. Some tests are currently failing; I will probably finish this work tomorrow.
