[CPU]Mac build for triton-cpu #18

Kuigesi · 2024-06-10T18:05:17Z

Make triton-cpu buildable and runnable on Mac M1. Use llvm-config to locate the LLVM headers and libraries.

minjang · 2024-06-10T18:20:33Z

third_party/cpu/backend/driver.py

-    os.path.join(llvm_root, "include"),
+# Note: need to use custom llvm build for mac arm, 
+#       the default llvm build downloaded by triton does not work
+#       need to set LLVM_CONFIG environment variable before running


Can you add a quick example of an actual value for LLVM_CONFIG?

Okay, Ruiqi has a nice README for Mac installation. You should put it in this file and link to it from README.md.
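For readers following the review question above, here is a minimal sketch of how a driver could turn `LLVM_CONFIG` into include and library directories. It is not the code from this PR: the helper names `query_llvm_config` and `llvm_include_and_lib_dirs` are hypothetical, and the only facts assumed are that `LLVM_CONFIG` points at an `llvm-config` binary and that `llvm-config` supports the `--includedir` and `--libdir` flags.

```python
import os
import subprocess


def query_llvm_config(llvm_config, flag, run=subprocess.check_output):
    """Run `<llvm_config> <flag>` and return its output without the trailing newline."""
    return run([llvm_config, flag], text=True).strip()


def llvm_include_and_lib_dirs(env=os.environ, run=subprocess.check_output):
    """Hypothetical helper: resolve LLVM header and library directories.

    `env` and `run` are injectable so the logic can be exercised without
    a real llvm-config binary on the machine.
    """
    # LLVM_CONFIG is the variable named in the diff comment; fail clearly if unset.
    llvm_config = env.get("LLVM_CONFIG")
    if not llvm_config:
        raise RuntimeError("set LLVM_CONFIG to the path of your llvm-config binary")
    include_dir = query_llvm_config(llvm_config, "--includedir", run)
    lib_dir = query_llvm_config(llvm_config, "--libdir", run)
    return include_dir, lib_dir
```

With a custom LLVM build, the variable would be set to something of the form `/path/to/your/llvm-build/bin/llvm-config` (a placeholder, not a value taken from this PR) before running Triton.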

When running [convert_blocked1d_to_slice0](https://github.com/triton-lang/triton/blob/0ba5f0c3cd029d5c3d1f01b9bf29dac32c27345e/test/Conversion/tritongpu_to_llvm.mlir#L924), Triton ends up computing the rank of a matrix with 0 columns during linear layout lowering, which trips up f2reduce and causes undefined behavior, detectable through [UBSAN](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html). Fix this by returning the rank (0) early in these cases, without calling f2reduce.

<details><summary>Stack trace</summary>
<p>

```
third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30: runtime error: shift exponent 18446744073709551615 is too large for 64-bit type 'unsigned long long'
#0 0x556ee2fea3be in inplace_rref_small third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30
#1 0x556ee2fea3be in f2reduce::inplace_rref_strided(unsigned long*, unsigned long, unsigned long, unsigned long) third_party/triton/third_party/f2reduce/f2reduce.cpp:470:9
#2 0x556ee2ea70da in getMatrixRank third_party/triton/lib/Tools/LinearLayout.cpp:125:3
#3 0x556ee2ea70da in mlir::triton::LinearLayout::checkInvariants(bool) third_party/triton/lib/Tools/LinearLayout.cpp:299:7
#4 0x556ee2ea656d in mlir::triton::LinearLayout::tryCreate(llvm::MapVector<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>, llvm::DenseMap<mlir::StringAttr, unsigned int, llvm::DenseMapInfo<mlir::StringAttr, void>, llvm::detail::DenseMapPair<mlir::StringAttr, unsigned int>>, llvm::SmallVector<std::__u::pair<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>>, 0u>>, llvm::ArrayRef<std::__u::pair<mlir::StringAttr, int>>, bool) third_party/triton/lib/Tools/LinearLayout.cpp:190:41
#5 0x556ee2eb2150 in mlir::triton::LinearLayout::divideRight(mlir::triton::LinearLayout const&) third_party/triton/lib/Tools/LinearLayout.cpp:654:51
#6 0x556ee2ee1c39 in mlir::cvtNeedsSharedMemory(mlir::RankedTensorType, mlir::RankedTensorType) third_party/triton/lib/Analysis/Utility.cpp:652:14
#7 0x556ee2cf38fd in mlir::triton::getRepShapeForCvtLayout(mlir::triton::gpu::ConvertLayoutOp) third_party/triton/lib/Analysis/Allocation.cpp:66:8
#8 0x556ee2cf3efa in mlir::triton::getScratchConfigForCvtLayout(mlir::triton::gpu::ConvertLayoutOp, unsigned int&, unsigned int&) third_party/triton/lib/Analysis/Allocation.cpp:95:19
#9 0x556ee2cf6057 in mlir::triton::AllocationAnalysis::getScratchValueSize(mlir::Operation*) third_party/triton/lib/Analysis/Allocation.cpp:272:24
#10 0x556ee2cf5499 in operator() third_party/triton/lib/Analysis/Allocation.cpp:343:7
#11 0x556ee2cf5499 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<mlir::triton::AllocationAnalysis::getValuesAndSizes()::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#12 0x556edeeee7a9 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#13 0x556edeeee7a9 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:174:5
#14 0x556edeeee87c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:182:9
#15 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), mlir::Operation *, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:313:10
#16 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:794:12
#17 0x556ee2cf49e7 in mlir::triton::AllocationAnalysis::getValuesAndSizes() third_party/triton/lib/Analysis/Allocation.cpp:341:16
#18 0x556ee2cf4852 in run third_party/triton/lib/Analysis/Allocation.cpp:182:5
#19 0x556ee2cf4852 in AllocationAnalysis third_party/triton/lib/Analysis/Allocation.cpp:169:5
#20 0x556ee2cf4852 in mlir::Allocation::run(llvm::DenseMap<mlir::FunctionOpInterface, mlir::Allocation, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>, llvm::detail::DenseMapPair<mlir::FunctionOpInterface, mlir::Allocation>>&) third_party/triton/lib/Analysis/Allocation.cpp:627:3
#21 0x556ee1677402 in operator() third_party/triton/include/triton/Analysis/Allocation.h:227:26
#22 0x556ee1677402 in void mlir::CallGraph<mlir::Allocation>::doWalk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)>(mlir::FunctionOpInterface, llvm::DenseSet<mlir::FunctionOpInterface, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>>&, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)) third_party/triton/include/triton/Analysis/Utility.h:350:7
#23 0x556ee16756b3 in walk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, (lambda at third_party/triton/include/triton/Analysis/Allocation.h:222:9), (lambda at third_party/triton/include/triton/Analysis/Allocation.h:224:9)> third_party/triton/include/triton/Analysis/Utility.h:242:7
#24 0x556ee16756b3 in mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp) third_party/triton/include/triton/Analysis/Allocation.h:220:5
#25 0x556ee2c2bf18 in (anonymous namespace)::AllocateSharedMemory::runOnOperation() third_party/triton/lib/Conversion/TritonGPUToLLVM/AllocateSharedMemory.cpp:26:22
...
UndefinedBehaviorSanitizer: invalid-shift-exponent third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30
```

</p>
</details>
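To illustrate the early-return fix described above, here is a small sketch of a rank computation over GF(2) with a guard for the zero-column case. It is an assumption-laden stand-in written in Python: it is plain Gaussian elimination on integer bitmasks, not f2reduce's algorithm and not the actual `getMatrixRank` code, and the name `gf2_rank` is hypothetical.

```python
def gf2_rank(rows, num_cols):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks.

    Bit `c` of `rows[r]` holds the entry at row `r`, column `c`.
    """
    # Guard in the spirit of the described fix: a matrix with no columns
    # (or no rows) has rank 0, so return before doing any elimination.
    if num_cols == 0 or not rows:
        return 0

    rows = list(rows)  # work on a copy; the caller's data is left untouched
    rank = 0
    for col in range(num_cols):
        bit = 1 << col
        # Find a pivot row at or below the current rank with this column set.
        pivot = next((i for i in range(rank, len(rows)) if rows[i] & bit), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column from every other row (XOR is addition in GF(2)).
        for i in range(len(rows)):
            if i != rank and rows[i] & bit:
                rows[i] ^= rows[rank]
        rank += 1
    return rank
```

In a fixed-width C++ implementation, the same guard keeps the code from forming a shift amount derived from a column count of 0, which is the kind of out-of-range exponent UBSAN reports in the trace.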

Add header for unique_ptr in CPU launcher.

minjang

LGTM!

Adds an experimental rewrite that collapses a reduction loop over GEMM into a BRGEMM ukernel. The pattern matches the hand-written kernel using block pointers and is not compatible with IR generated by triton pointer raising. Direct lowering to XSMM makes it possible to bypass the triton load restriction when the K dimension is not a power of two. The pattern is quite brittle but functional for the matmul tutorial example. The rewrite is disabled by default and can be enabled with the environment variable TRITON_CPU_LOOP_BRGEMM_XSMM=1.

Kuigesi requested a review from ptillet as a code owner June 10, 2024 18:05

Kuigesi marked this pull request as draft June 10, 2024 18:08

minjang self-requested a review June 10, 2024 18:09

minjang marked this pull request as ready for review June 10, 2024 18:19

minjang reviewed Jun 10, 2024

ienkovich mentioned this pull request Jun 20, 2024

[CPU] Use static compilation for kernels. #29

Merged

minjang force-pushed the main branch from e915162 to 3cfb1e5 Compare June 24, 2024 08:07

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1.

b5bdf50

Add header for unique_ptr in CPU launcher.

Kuigesi force-pushed the mac-build branch from 1595abd to b5bdf50 Compare June 24, 2024 21:17

minjang approved these changes Jun 25, 2024

minjang merged commit 4a1171a into triton-lang:main Jun 25, 2024
2 of 5 checks passed

vivekvpandya mentioned this pull request Aug 9, 2024

OptimizeMask pass fails on following input #105

Closed

Devjiu pushed a commit to Devjiu/triton-cpu that referenced this pull request Aug 13, 2024

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1. (…

d7e8675

…triton-lang#18) Add header for unique_ptr in CPU launcher.

int3 pushed a commit that referenced this pull request Aug 29, 2024

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1. (…

5ba8381

…#18) Add header for unique_ptr in CPU launcher.

minjang pushed a commit that referenced this pull request Sep 22, 2024

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1. (…

13fd2b6

…#18) Add header for unique_ptr in CPU launcher.

minjang pushed a commit that referenced this pull request Oct 22, 2024

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1. (…

1fcfd14

…#18) Add header for unique_ptr in CPU launcher.

minjang pushed a commit that referenced this pull request Oct 24, 2024

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1. (…

6a68734

…#18) Add header for unique_ptr in CPU launcher.

int3 pushed a commit that referenced this pull request Dec 6, 2024

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1. (…

44ceed5

…#18) Add header for unique_ptr in CPU launcher.

ienkovich pushed a commit that referenced this pull request Dec 6, 2024

[BACKEND][CPU] Make the CPU backend buildable and runnable in Mac M1. (…

a41adf4

…#18) Add header for unique_ptr in CPU launcher.
