Changes for Project MLIR-TensorRT:
26523df3e94cc4de47a744e4e48621b74743dd00 by Sagar Shelke <[email protected]>:

[compiler/lib/Conversion] Update `stablehlo.reduce_window` conversion pattern

Previously, if the body of a `stablehlo.reduce_window` op contained
`stablehlo.add` (i.e. reduce_window<add>), we always converted such a
reduce_window op to a `tensorrt.pool` op with average pooling. We also
inserted a multiplication operation to compensate for the fact that
`reduce_window<add>` was being replaced with average pooling.

This pattern did not consider the case where a `stablehlo.divide` op
follows `reduce_window<add>` and is its only user: that combination is
truly average pooling. In this case, instead of inserting a
multiplication, we can fold the subsequent `stablehlo.divide` op into
the TensorRT average pooling.

This MR updates the conversion pattern to accommodate this and adds
positive and negative MLIR tests.
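The premise of the updated pattern can be sketched numerically in Python (an illustration only; `reduce_window_add` and `avg_pool` are hypothetical stand-ins for the StableHLO op and the TensorRT pooling layer):

```python
# 1-D, stride 1, no padding, for brevity.
def reduce_window_add(x, window):
    """Sliding-window sum, i.e. reduce_window<add>."""
    return [sum(x[i:i + window]) for i in range(len(x) - window + 1)]

def avg_pool(x, window):
    """Average pooling over the same window."""
    return [s / window for s in reduce_window_add(x, window)]

x = [1.0, 2.0, 3.0, 4.0]
w = 2
# reduce_window<add> whose only user divides by the window volume...
divided = [s / w for s in reduce_window_add(x, w)]
# ...is exactly average pooling, so the divide can be consumed into the
# TensorRT average-pooling op instead of emitting a compensating multiply.
assert divided == avg_pool(x, w)
```

Absent such a sole-user `stablehlo.divide`, the conversion still needs the compensating multiplication by the window volume.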

--
d6631d8be74994ae26c58f4afe9d087329a20985 by Christopher Bate <[email protected]>:

Revert "NFC: fix some unused value warnings in Release build"

This reverts commit 0a9df7a05ace412a423d72076ccde1380782852f.

--
0a9df7a05ace412a423d72076ccde1380782852f by Christopher Bate <[email protected]>:

NFC: fix some unused value warnings in Release build

--
842b0171738aaf14327b6db74ace88733f3f62fe by Chris Bate <[email protected]>:

[compiler] Add ability to export large binary data to external file resource

This change updates the `cuda-to-llvm` and `tensorrt-runtime-to-llvm`
passes to add an `artifacts-dir` option. When non-empty, this
directory is used to offload binary blob data that would otherwise be
encoded into the compiled program.

The large encoded constants are typically passed directly to an
externally defined function (e.g. `mtrt_cuda_module_load_from_ptx`)
using `llvm.addressof`, so the relevant conversions just add a
`*_file` variant of the runtime call
(e.g. `mtrt_cuda_module_load_from_ptx_file`). This lets the runtime
figure out how to load the file data.

The one other op where large data is often encoded is `memref.global`,
which will be handled in a follow-on change.
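The offload decision can be sketched as follows (Python, purely illustrative; `lower_blob` and the file naming are assumptions, though the `mtrt_cuda_module_load_from_ptx*` names come from the text above):

```python
import os
import tempfile

def lower_blob(blob, name, artifacts_dir=None):
    """Sketch: with a non-empty artifacts-dir, write the blob to a file
    and select the `*_file` runtime-call variant; otherwise keep
    embedding the data in the compiled program."""
    if artifacts_dir:
        path = os.path.join(artifacts_dir, name + ".bin")
        with open(path, "wb") as f:
            f.write(blob)
        return "mtrt_cuda_module_load_from_ptx_file", path
    return "mtrt_cuda_module_load_from_ptx", blob

with tempfile.TemporaryDirectory() as d:
    call, arg = lower_blob(b"\x7fPTX", "kernel0", d)
    assert call == "mtrt_cuda_module_load_from_ptx_file"
    assert open(arg, "rb").read() == b"\x7fPTX"
```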

--
c270c740d1032a40e7b34aeaac36c42b8b4298c0 by Chris Bate <[email protected]>:

[compiler] Update cuda-to-llvm to create separate ctor/dtor for each global

Create a separate ctor/dtor for each global. This avoids issues if the pass
is run multiple times and prevents empty functions from being created when
no globals are present.

--
7d3c310f64b7a5c513c0435a75675003e32f0c58 by Chris Bate <[email protected]>:

[compiler] Add promise of ConvertToLLVMPatternInterface to CUDADialect

The CUDA dialect was not promising `ConvertToLLVMPatternInterface`, which
meant that `cuda-to-llvm` patterns were not being populated by
`host-to-llvm`. Currently we explicitly run `cuda-to-llvm` prior to
`host-to-llvm`, but in the future that will change.

--
7e2e7aaaace73842efdeb6d5e8002e3f7c76b4af by Chris Bate <[email protected]>:

[cmake] Improve CMake package/install organization

- Creates dedicated CMake install components for easier packaging
- Disable upstream MLIR CMake install logic when invoking upstream
  MLIR cmake commands.

--
d786bcdc4bc1c683eabc6a37a902ac7a75568bfe by Chris Bate <[email protected]>:

[compiler] Enable end-to-end host LLVM JIT and C++ gen compile & execute test

This change enables the first integration test that can be compiled and
executed in three different ways:

1. compile to Executor IR -> Lua-based interpreter
2. compile to LLVM IR -> LLVM JIT runner
3. compile to C++ (via EmitC) -> compile & execute

- We will need to expand the C support libraries for more CUDA, TensorRT,
  cuBLAS, and NCCL module coverage, equivalent to what we have for Lua.

- In the Lua-based backend, we had the convenience of some auto-magical
  error reporting mechanisms that get added to the runtime support
  library functions (e.g. using `lua::this_state` argument). We need to
  craft something that is compatible with LLVM and EmitC pipelines because
  right now the C runtime support functions just abort on error, which
  isn't OK outside of simple integration tests.
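The contrast can be sketched as follows (Python standing in for the C support library; both function names and the status convention are hypothetical):

```python
def cuda_module_load_abort(path):
    """Current behaviour (sketch): any failure kills the process,
    the moral equivalent of abort() in the C support library."""
    raise SystemExit(f"fatal: cannot load {path}")

def cuda_module_load_status(path, out):
    """One possible replacement: return a status code and hand the
    result back through an out-parameter, so LLVM- and EmitC-compiled
    callers can observe and propagate errors instead of dying."""
    if not path.endswith(".ptx"):
        return 1  # recoverable error: nothing aborted
    out.append("module-handle")  # stand-in for the loaded module
    return 0

handles = []
assert cuda_module_load_status("kernel.ptx", handles) == 0 and handles
assert cuda_module_load_status("missing.bin", []) == 1
```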

--
8936f5b68e4d3efa646884c8ffee2ce6940c76b4 by Christopher Bate <[email protected]>:

Cherry-pick upstream EmitC fixes

--
fb672f3da68dbdcec4617bddbbaadc0d40b7f48d by Chris Bate <[email protected]>:

NFC: [compiler] Untangle conversion passes header inclusion of NvInfer.h

--
2528dd25f80cb838d82181f6b9bc386d0e1bda6d by Chris Bate <[email protected]>:

[python] Improve performance of compiler API tests

- Migrate some compiler python API tests to the newer APIs.
- Reuse existing compiler clients where possible
- Don't use "tensorrt-builder-opt-level=3" unnecessarily.
- Compiler python test performance is now dramatically improved.

--
e1ebdced50ac27c808c4bd4488a3f2d7ee645983 by Chris Bate <[email protected]>:

[compiler] Fix two issues in LLVM conversion utilities

Fixes an issue where a utility may create a `SymbolTable` in the middle
of a dialect conversion. This could cause an assertion to trigger since
symbols may not be unique in the middle of a conversion.

Additionally, `llvm.global` ops representing string literals now use
StringAttr to hold their value.

--
fd65b941673b49b6ff9a8b053b845dd1e051eb8c by Chris Bate <[email protected]>:

[compiler] Fix some issues with CUDA lowerings

- Adds fields to memory alloc/free to communicate whether a buffer is
  device/host_pinned/managed.

--
9a88722b7ca54fe59c280b70068c98a5a07784ce by Chris Bate <[email protected]>:

[compiler] Add additional support for `cuda-to-llvm`

Adds some of the major missing ops dealing with stream, device, and
memory management to the `convert-cuda-to-llvm` pass.
After this change, we have enough coverage to support runtime testing.

--
0df274addcd00aab6501ecd3c72d1e72008cbe3b by Christopher Bate <[email protected]>:

[compiler] Add 'convert-tensorrt-runtime-to-llvm' pass

Adds a pass that converts TensorRTRuntime dialect operations and types to
LLVM dialect operations and types.

--
ca94e04dfc5f0134a25b218ef6de15503a0636cf by Christopher Bate <[email protected]>:

[compiler] Simplify workflow for lowering TensorRT engines

Adds a global symbol op and corresponding load op to the TensorRTRuntime
dialect. These operations represent the TensorRT engine binary and loading
of the TensorRT engine into a runtime execution context. The conversions
`tensorrt-to-tensorrt-runtime` and `tensorrt-runtime-to-executor` are
updated/simplified, and this change helps to reduce complexity for the
`tensorrt-runtime-to-llvm` change to be added. Two ops and a type can
be dropped from the TensorRTRuntime dialect.

--
5db84d6a239a275b7c29f8c40635622deff29bb8 by Christopher Bate <[email protected]>:

[compiler] Fix some issues in cuda-to-llvm conversion

Fixes a couple of issues in the CUDA-to-LLVM conversion. To make it easier
to write similar patterns for other dialects, this change also introduces
some utilities used to clean up the code. They make it easier to create
LLVM globals for string literals and other objects, and help ensure there
are no symbol name conflicts when rewriting globals from one dialect to
another.
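A minimal sketch of the symbol-uniquing idea (Python; `unique_symbol` is an illustrative name, not the actual utility's API):

```python
def unique_symbol(module_symbols, hint):
    """Pick a name for a new global that cannot collide with symbols
    already present in the module: take the hint if it is free,
    otherwise append the first unused numeric suffix."""
    if hint not in module_symbols:
        module_symbols.add(hint)
        return hint
    i = 0
    while f"{hint}_{i}" in module_symbols:
        i += 1
    name = f"{hint}_{i}"
    module_symbols.add(name)
    return name

syms = {"str_literal"}
assert unique_symbol(syms, "str_literal") == "str_literal_0"
assert unique_symbol(syms, "str_literal") == "str_literal_1"
assert unique_symbol(syms, "other") == "other"
```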

--
28ad3e14eed4eb7112876fdf8f7a892532566dae by Chris Bate <[email protected]>:

[compiler] Add aggregate "convert-host-to-llvm" conversion pass

Adds an aggregate pass "convert-host-to-llvm" which converts the host
program IR to LLVM. The purpose is to aggregate patterns and type
conversions for CUDA, TensorRTRuntime, and various other upstream
dialects in order to enable lowering to LLVM IR in a single pass.

Additionally, this change provides ConvertToLLVMPatternInterface
for the CUDA and Plan dialects so that they may hook into
"convert-host-to-llvm". The CUDA dialect to LLVM conversion is further
updated to correct the type of the stream (to pointer, not i8).

--
e98df022081e3e648a67c160d329f668c99bdcac by Christopher Bate <[email protected]>:

nfc: remove whitespace from cuda-to-llvm test

--
bbefdbf03fcbd8a60c1e1c942faef3b67817d57a by Chris Bate <[email protected]>:

[cmake] NFC: Remove 'MLIRTensorRTRegistration' library

Previously we used a catch-all "registration" library to collect all
dependencies of the "registerAllDialects|Passes" functions. However,
this actually caused a subtle circular dependency with the
StablehloToExecutable library which could manifest as a random build-time
error due to headers not being generated in the correct order. This
change removes that library and instead declares dependencies in a more
fine-grained manner where required.

--
cf3a76338ebe3e2c6770947bc8944dd4b6bb4e59 by Sagar Shelke <[email protected]>:

Add integer support to `RemoveProdDivPair` canonicalize pattern

This MR adds integer support to the `RemoveProdDivPair`
canonicalization pattern, which removes a pair of `kPROD` and
`kDIV` ops if the constant RHS of both the multiplication and the
division is 1.

Positive and negative MLIR tests are added.
This fixes OSS issue #457.
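The pattern's effect can be sketched in Python (the op representation below is purely illustrative, not the TensorRT dialect's):

```python
def remove_prod_div_pair(ops):
    """Sketch of the canonicalization: drop an adjacent kPROD/kDIV pair
    whenever both have a constant RHS of 1, since x * 1 / 1 == x. With
    this change the rewrite applies to integer as well as float types."""
    out, i = [], 0
    while i < len(ops):
        if (i + 1 < len(ops)
                and ops[i] == ("kPROD", 1)
                and ops[i + 1] == ("kDIV", 1)):
            i += 2  # elide the pair
            continue
        out.append(ops[i])
        i += 1
    return out

# Pair with RHS 1 on both sides is removed...
assert remove_prod_div_pair([("kPROD", 1), ("kDIV", 1), ("kADD", 3)]) == [("kADD", 3)]
# ...but any other constant keeps the ops (the negative test case).
assert remove_prod_div_pair([("kPROD", 2), ("kDIV", 1)]) == [("kPROD", 2), ("kDIV", 1)]
```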

--
df381b5e747bb79842afdd62aed3cd2dff7fe564 by Christopher Bate <[email protected]>:

NFC: fix test file location and pass dependencies

Fixes a couple minor issues from f4959954b4daccd270323fa47867cbd12a62f97d.

--
f4959954b4daccd270323fa47867cbd12a62f97d by Zixin Huang <[email protected]>:

[compiler] New cuda-to-llvm pass

This MR converts CUDA dialect ops into LLVM dialect ops. This will allow
us to generate LLVM IR for host code.

--
4343a6c60331de176eafab5fe4c91374e7d62a2e by Chris Bate <[email protected]>:

[executor] Fix conversion of function signature metadata

When a function lacks an 'executor.function_metadata' attribute, we should
create a function signature using just the function's MLIR type. It will
lack certain information (e.g. bounds information), but that is better
than not serializing a signature at all. Certain methods, such as the
associated runtime API's 'print' method for function flatbuffer objects,
were not handling the case where the signature could be null.
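A sketch of the fallback logic (Python; the signature layout is an assumption for illustration, not the actual flatbuffer schema):

```python
def build_signature(func_type, metadata=None):
    """Prefer the rich `executor.function_metadata` attribute when
    present; otherwise derive a minimal signature from the function's
    MLIR type alone. The fallback lacks bounds information, but it is
    never null, so consumers like 'print' always have something valid."""
    if metadata is not None:
        return metadata
    args, results = func_type
    return {"args": list(args), "results": list(results), "bounds": None}

fallback = build_signature((["memref<4xf32>"], ["f32"]))
assert fallback["bounds"] is None and fallback["results"] == ["f32"]
```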

--
2d4cc749800887f7cf5580ab664a9a18a29f4d2f by Chris Bate <[email protected]>:

[compiler] NFC: change 'enable-non-dps-returns' to 'force-entrypoints-return-allocs' in 'plan-alloc-tensors'

In the 'plan-alloc-tensors' pass and related pipelines, we had an option
previously named 'enable-non-dps-returns'. However, this name doesn't
accurately reflect the desired effect: even if the option is off, some
tensor results of entrypoint functions may be lowered into returned
allocations. If the shape is not computable from the input parameters,
then the user cannot pre-allocate a result buffer, and therefore the
tensor must be lowered into a returned allocation.
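The rule can be sketched as follows (Python; the shape representation is illustrative, with ints for static dims and strings for symbolic ones):

```python
def must_return_alloc(result_shape, input_dims):
    """A result can use a caller-provided (DPS) buffer only if every
    dimension of its shape is static or derivable from the input
    parameters; otherwise it must become a returned allocation,
    regardless of the force-entrypoints-return-allocs option."""
    return not all(isinstance(d, int) or d in input_dims
                   for d in result_shape)

# Static or input-derived shape: the caller can pre-allocate.
assert not must_return_alloc([4, "d0"], input_dims={"d0"})
# Data-dependent dimension: must be a returned allocation.
assert must_return_alloc(["num_nonzero"], input_dims={"d0"})
```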

--
54319fa2fd1093bab1f4a16d85af5a19d6d9a6d3 by Chris Bate <[email protected]>:

NFC: fix typo in RuntimeSession member function name

--
d7a6e722c4e23e239ea32861ac542d4e943d40d3 by Chris Bate <[email protected]>:

NFC: [executor] Fix typo in CAPI type name

--
497647f15587fa927748a66721c77d1df6f6089c by Chris Bate <[email protected]>:

[compiler] Add support for non-DPS TensorRT call variants in `plan-outline-clusters`

Adds support for outlining `plan.closed_alloc_group` regions targeting
TensorRT in the `plan-outline-clusters` pass.

Co-authored-by: Jhalak Patel <[email protected]>

--
247de07ad07a017c8ff9408083439488d4f0220d by Chris Bate <[email protected]>:

[compiler] Add support for `tensorrt.call_alloc` in `plan-eliminate-shape-ops`

Add support for shape-op and argument cleanup in `plan-eliminate-shape-ops`
for the non-DPS TensorRT call variant `tensorrt.call_alloc`.

Co-authored-by: Jhalak Patel <[email protected]>

--
11621ff176f2503d04a7eba5e59e79ada3560310 by Chris Bate <[email protected]>:

[compiler] Fix incorrect conversion in `tensorrt-runtime-to-executor`

Fix miscellaneous issues in the conversion of `trtrt.enqueue_alloc`
to Executor IR. Previously, the offsets into the output descriptor were
not being correctly calculated.

Co-authored-by: Jhalak Patel <[email protected]>
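The kind of running-offset computation being fixed can be sketched in Python (the descriptor layout below is an assumption for illustration, not the actual Executor ABI):

```python
def descriptor_offsets(ranks, word=8):
    """Assume each result in the output descriptor is packed as
    [ptr, rank, shape[0..rank), stride[0..rank)], each field one word.
    The byte offset of result i then depends on the ranks of all
    earlier results, so it must be accumulated, not computed per-slot."""
    offsets, off = [], 0
    for r in ranks:
        offsets.append(off)
        off += (2 + 2 * r) * word  # ptr + rank + shape + strides
    return offsets

# Three results of ranks 2, 0, 3 under the assumed layout:
assert descriptor_offsets([2, 0, 3]) == [0, 48, 64]
```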

GitOrigin-RevId: 1151c1999d0aa77991637455673a8f4ba5dd8cf3
christopherbate committed Feb 4, 2025
1 parent ecb8238 commit 86f11bd
Showing 114 changed files with 7,925 additions and 1,214 deletions.
19 changes: 17 additions & 2 deletions mlir-tensorrt/CMakeLists.txt
@@ -123,15 +123,30 @@ if(PROJECT_IS_TOP_LEVEL)
set(LLVM_RUNTIME_OUTPUT_INTDIR ${CMAKE_BINARY_DIR}/bin)
set(LLVM_LIBRARY_OUTPUT_INTDIR ${CMAKE_BINARY_DIR}/lib)
elseif(MLIR_TRT_LLVM_COMMIT)
set(patch_dir "${CMAKE_CURRENT_LIST_DIR}/build_tools/patches/mlir")
mtrt_llvm_project(
NAME llvm_project
VERSION 0.0.20241126
URL "https://github.com/llvm/llvm-project/archive/${MLIR_TRT_LLVM_COMMIT}.zip"
EXCLUDE_FROM_ALL TRUE
SOURCE_SUBDIR "llvm"
PATCHES
"${CMAKE_CURRENT_LIST_DIR}/build_tools/patches/mlir/000_fix_bufferization_tensor_encoding_memory_spaces.patch"
"${CMAKE_CURRENT_LIST_DIR}/build_tools/patches/mlir/001-mlir-Add-a-null-pointer-check-in-symbol-lookup-11516.patch"
"${patch_dir}/000_fix_bufferization_tensor_encoding_memory_spaces.patch"
"${patch_dir}/001-mlir-Add-a-null-pointer-check-in-symbol-lookup-11516.patch"
"${patch_dir}/0003-mlir-EmitC-memref-to-emitc-insert-conversion_casts-1.patch"
"${patch_dir}/0004-NFC-mlir-emitc-fix-misspelling-in-description-of-emi.patch"
"${patch_dir}/0005-emitc-func-Set-default-dialect-to-emitc-116297.patch"
"${patch_dir}/0006-MLIR-EmitC-arith-to-emitc-Fix-lowering-of-fptoui-118.patch"
"${patch_dir}/0007-mlir-emitc-Add-support-for-C-API-python-binding-to-E.patch"
"${patch_dir}/0008-mlir-emitc-DCE-unimplemented-decls-121253.patch"
"${patch_dir}/0009-Re-introduce-Type-Conversion-on-EmitC-121476.patch"
"${patch_dir}/0010-mlir-emitc-Fix-invalid-syntax-in-example-of-emitc.re.patch"
"${patch_dir}/0011-mlir-emitc-Don-t-emit-extra-semicolon-after-bracket-.patch"
"${patch_dir}/0012-mlir-emitc-Expose-emitc-dialect-types-119645.patch"
"${patch_dir}/0013-mlir-emitc-Support-convert-arith.extf-and-arith.trun.patch"
"${patch_dir}/0014-EmitC-Allow-arrays-of-size-zero-123292.patch"
"${patch_dir}/0015-mlir-EmitC-Add-MathToEmitC-pass-for-math-function-lo.patch"
"${patch_dir}/0016-mlir-emitc-Set-default-dialect-to-emitc-in-ops-with-.patch"
OPTIONS
"LLVM_ENABLE_PROJECTS mlir"
"MLIR_ENABLE_BINDINGS_PYTHON ${MLIR_TRT_ENABLE_PYTHON}"
3 changes: 2 additions & 1 deletion mlir-tensorrt/CMakePresets.json
@@ -10,7 +10,8 @@
"CMAKE_BUILD_TYPE": "RelWithDebInfo",
"LLVM_ENABLE_ASSERTIONS": "ON",
"CPM_SOURCE_CACHE": "${sourceDir}/.cache.cpm",
"CPM_USE_NAMED_CACHE_DIRECTORIES": "ON"
"CPM_USE_NAMED_CACHE_DIRECTORIES": "ON",
"LLVM_INSTALL_TOOLCHAIN_ONLY": "ON"
}
},
{
@@ -0,0 +1,94 @@
From 32a1faae7c6bd8b3df1aefcce707c9aef1557eea Mon Sep 17 00:00:00 2001
From: Simon Camphausen <[email protected]>
Date: Wed, 30 Oct 2024 15:27:23 +0100
Subject: [PATCH 03/16] [mlir][EmitC] memref-to-emitc: insert conversion_casts
(#114204)

Add materializations to the conversion pass, such that types of
non-converted operands are legalized.
---
.../MemRefToEmitC/MemRefToEmitCPass.cpp | 13 +++++++
.../MemRefToEmitC/memref-to-emitc.mlir | 35 +++++++++++--------
2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitCPass.cpp b/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitCPass.cpp
index 11bfde890bce..7f433254e95a 100644
--- a/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitCPass.cpp
+++ b/mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitCPass.cpp
@@ -40,6 +40,19 @@ struct ConvertMemRefToEmitCPass

populateMemRefToEmitCTypeConversion(converter);

+ auto materializeAsUnrealizedCast = [](OpBuilder &builder, Type resultType,
+ ValueRange inputs,
+ Location loc) -> Value {
+ if (inputs.size() != 1)
+ return Value();
+
+ return builder.create<UnrealizedConversionCastOp>(loc, resultType, inputs)
+ .getResult(0);
+ };
+
+ converter.addSourceMaterialization(materializeAsUnrealizedCast);
+ converter.addTargetMaterialization(materializeAsUnrealizedCast);
+
RewritePatternSet patterns(&getContext());
populateMemRefToEmitCConversionPatterns(patterns, converter);

diff --git a/mlir/test/Conversion/MemRefToEmitC/memref-to-emitc.mlir b/mlir/test/Conversion/MemRefToEmitC/memref-to-emitc.mlir
index f4722da08cc4..f5ef821cc9c0 100644
--- a/mlir/test/Conversion/MemRefToEmitC/memref-to-emitc.mlir
+++ b/mlir/test/Conversion/MemRefToEmitC/memref-to-emitc.mlir
@@ -1,28 +1,35 @@
// RUN: mlir-opt -convert-memref-to-emitc %s -split-input-file | FileCheck %s

-// CHECK-LABEL: memref_store
-// CHECK-SAME: %[[v:.*]]: f32, %[[i:.*]]: index, %[[j:.*]]: index
-func.func @memref_store(%v : f32, %i: index, %j: index) {
- // CHECK-NEXT: %[[ALLOCA:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> !emitc.array<4x8xf32>
- %0 = memref.alloca() : memref<4x8xf32>
+// CHECK-LABEL: alloca()
+func.func @alloca() {
+ // CHECK-NEXT: %[[ALLOCA:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> !emitc.array<2xf32>
+ %0 = memref.alloca() : memref<2xf32>
+ return
+}

- // CHECK-NEXT: %[[SUBSCRIPT:.*]] = emitc.subscript %[[ALLOCA]][%[[i]], %[[j]]] : (!emitc.array<4x8xf32>, index, index) -> !emitc.lvalue<f32>
+// -----
+
+// CHECK-LABEL: memref_store
+// CHECK-SAME: %[[buff:.*]]: memref<4x8xf32>, %[[v:.*]]: f32, %[[i:.*]]: index, %[[j:.*]]: index
+func.func @memref_store(%buff : memref<4x8xf32>, %v : f32, %i: index, %j: index) {
+ // CHECK-NEXT: %[[BUFFER:.*]] = builtin.unrealized_conversion_cast %[[buff]] : memref<4x8xf32> to !emitc.array<4x8xf32>
+
+ // CHECK-NEXT: %[[SUBSCRIPT:.*]] = emitc.subscript %[[BUFFER]][%[[i]], %[[j]]] : (!emitc.array<4x8xf32>, index, index) -> !emitc.lvalue<f32>
// CHECK-NEXT: emitc.assign %[[v]] : f32 to %[[SUBSCRIPT]] : <f32>
- memref.store %v, %0[%i, %j] : memref<4x8xf32>
+ memref.store %v, %buff[%i, %j] : memref<4x8xf32>
return
}

// -----

// CHECK-LABEL: memref_load
-// CHECK-SAME: %[[i:.*]]: index, %[[j:.*]]: index
-func.func @memref_load(%i: index, %j: index) -> f32 {
- // CHECK-NEXT: %[[ALLOCA:.*]] = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> !emitc.array<4x8xf32>
- %0 = memref.alloca() : memref<4x8xf32>
-
- // CHECK-NEXT: %[[SUBSCRIPT:.*]] = emitc.subscript %[[ALLOCA]][%[[i]], %[[j]]] : (!emitc.array<4x8xf32>, index, index) -> !emitc.lvalue<f32>
+// CHECK-SAME: %[[buff:.*]]: memref<4x8xf32>, %[[i:.*]]: index, %[[j:.*]]: index
+func.func @memref_load(%buff : memref<4x8xf32>, %i: index, %j: index) -> f32 {
+ // CHECK-NEXT: %[[BUFFER:.*]] = builtin.unrealized_conversion_cast %[[buff]] : memref<4x8xf32> to !emitc.array<4x8xf32>
+
+ // CHECK-NEXT: %[[SUBSCRIPT:.*]] = emitc.subscript %[[BUFFER]][%[[i]], %[[j]]] : (!emitc.array<4x8xf32>, index, index) -> !emitc.lvalue<f32>
// CHECK-NEXT: %[[LOAD:.*]] = emitc.load %[[SUBSCRIPT]] : <f32>
- %1 = memref.load %0[%i, %j] : memref<4x8xf32>
+ %1 = memref.load %buff[%i, %j] : memref<4x8xf32>
// CHECK-NEXT: return %[[LOAD]] : f32
return %1 : f32
}
--
2.46.0

@@ -0,0 +1,32 @@
From 8ffa6d49dabf4eaef97cfaa1f8438c1cc4396b0b Mon Sep 17 00:00:00 2001
From: Andrey Timonin <[email protected]>
Date: Wed, 13 Nov 2024 14:17:00 +0300
Subject: [PATCH 04/16] [NFC][mlir][emitc] fix misspelling in description of
emitc.global (#115548)

Missing `!` before `emitc.global` was added in the `EmitC.td`.
---
mlir/include/mlir/Dialect/EmitC/IR/EmitC.td | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
index 7c84ab4dd39e..071541fa9895 100644
--- a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
+++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
@@ -1110,9 +1110,11 @@ def EmitC_GlobalOp : EmitC_Op<"global", [Symbol]> {

```mlir
// Global variable with an initial value.
- emitc.global @x : emitc.array<2xf32> = dense<0.0, 2.0>
+ emitc.global @x : !emitc.array<2xf32> = dense<0.0>
+ // Global variable with an initial values.
+ emitc.global @x : !emitc.array<3xi32> = dense<[0, 1, 2]>
// External global variable
- emitc.global extern @x : emitc.array<2xf32>
+ emitc.global extern @x : !emitc.array<2xf32>
// Constant global variable with internal linkage
emitc.global static const @x : i32 = 0
```
--
2.46.0

@@ -0,0 +1,120 @@
From 45846991cdda3c87d12fb8ad9578b7d73f086ca2 Mon Sep 17 00:00:00 2001
From: Matthias Gehre <[email protected]>
Date: Mon, 18 Nov 2024 17:26:21 +0100
Subject: [PATCH 05/16] emitc: func: Set default dialect to 'emitc' (#116297)

Makes `emitc.func` implement the `OpAsmOpInterface` and overwrite the
`getDefaultDialect`. This allows ops inside `emitc.func`'s body to omit
the 'emitc.' prefix in the assembly.
---
mlir/include/mlir/Dialect/EmitC/IR/EmitC.td | 12 +++++++++++-
.../Conversion/FuncToEmitC/func-to-emitc.mlir | 18 +++++++++---------
2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
index 071541fa9895..fc5a33541533 100644
--- a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
+++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
@@ -21,6 +21,7 @@ include "mlir/Interfaces/CastInterfaces.td"
include "mlir/Interfaces/ControlFlowInterfaces.td"
include "mlir/Interfaces/FunctionInterfaces.td"
include "mlir/Interfaces/SideEffectInterfaces.td"
+include "mlir/IR/OpAsmInterface.td"
include "mlir/IR/RegionKindInterface.td"

//===----------------------------------------------------------------------===//
@@ -632,7 +633,7 @@ def EmitC_DeclareFuncOp : EmitC_Op<"declare_func", [

def EmitC_FuncOp : EmitC_Op<"func", [
AutomaticAllocationScope,
- FunctionOpInterface, IsolatedFromAbove
+ FunctionOpInterface, IsolatedFromAbove, OpAsmOpInterface
]> {
let summary = "An operation with a name containing a single `SSACFG` region";
let description = [{
@@ -700,6 +701,15 @@ def EmitC_FuncOp : EmitC_Op<"func", [

/// Returns the result types of this function.
ArrayRef<Type> getResultTypes() { return getFunctionType().getResults(); }
+
+ //===------------------------------------------------------------------===//
+ // OpAsmOpInterface Methods
+ //===------------------------------------------------------------------===//
+
+ /// EmitC ops in the body can omit their 'emitc.' prefix in the assembly.
+ static ::llvm::StringRef getDefaultDialect() {
+ return "emitc";
+ }
}];
let hasCustomAssemblyFormat = 1;
let hasVerifier = 1;
diff --git a/mlir/test/Conversion/FuncToEmitC/func-to-emitc.mlir b/mlir/test/Conversion/FuncToEmitC/func-to-emitc.mlir
index 5730f7a4814f..bd48886ed739 100644
--- a/mlir/test/Conversion/FuncToEmitC/func-to-emitc.mlir
+++ b/mlir/test/Conversion/FuncToEmitC/func-to-emitc.mlir
@@ -1,7 +1,7 @@
// RUN: mlir-opt -split-input-file -convert-func-to-emitc %s | FileCheck %s

// CHECK-LABEL: emitc.func @foo()
-// CHECK-NEXT: emitc.return
+// CHECK-NEXT: return
func.func @foo() {
return
}
@@ -9,7 +9,7 @@ func.func @foo() {
// -----

// CHECK-LABEL: emitc.func private @foo() attributes {specifiers = ["static"]}
-// CHECK-NEXT: emitc.return
+// CHECK-NEXT: return
func.func private @foo() {
return
}
@@ -25,7 +25,7 @@ func.func @foo(%arg0: i32) {
// -----

// CHECK-LABEL: emitc.func @foo(%arg0: i32) -> i32
-// CHECK-NEXT: emitc.return %arg0 : i32
+// CHECK-NEXT: return %arg0 : i32
func.func @foo(%arg0: i32) -> i32 {
return %arg0 : i32
}
@@ -41,14 +41,14 @@ func.func @foo(%arg0: i32, %arg1: i32) -> i32 {
// -----

// CHECK-LABEL: emitc.func private @return_i32(%arg0: i32) -> i32 attributes {specifiers = ["static"]}
-// CHECK-NEXT: emitc.return %arg0 : i32
+// CHECK-NEXT: return %arg0 : i32
func.func private @return_i32(%arg0: i32) -> i32 {
return %arg0 : i32
}

// CHECK-LABEL: emitc.func @call(%arg0: i32) -> i32
-// CHECK-NEXT: %0 = emitc.call @return_i32(%arg0) : (i32) -> i32
-// CHECK-NEXT: emitc.return %0 : i32
+// CHECK-NEXT: %0 = call @return_i32(%arg0) : (i32) -> i32
+// CHECK-NEXT: return %0 : i32
func.func @call(%arg0: i32) -> i32 {
%0 = call @return_i32(%arg0) : (i32) -> (i32)
return %0 : i32
@@ -62,14 +62,14 @@ func.func private @return_i32(%arg0: i32) -> i32
// -----

// CHECK-LABEL: emitc.func private @return_void() attributes {specifiers = ["static"]}
-// CHECK-NEXT: emitc.return
+// CHECK-NEXT: return
func.func private @return_void() {
return
}

// CHECK-LABEL: emitc.func @call()
-// CHECK-NEXT: emitc.call @return_void() : () -> ()
-// CHECK-NEXT: emitc.return
+// CHECK-NEXT: call @return_void() : () -> ()
+// CHECK-NEXT: return
func.func @call() {
call @return_void() : () -> ()
return
--
2.46.0

@@ -0,0 +1,52 @@
From ae59bead48221d52aec82b8059885f02451949df Mon Sep 17 00:00:00 2001
From: Matthias Gehre <[email protected]>
Date: Thu, 5 Dec 2024 14:50:35 +0100
Subject: [PATCH 06/16] [MLIR][EmitC] arith-to-emitc: Fix lowering of fptoui
(#118504)

`arith.fptoui %arg0 : f32 to i16` was lowered to
```
%0 = emitc.cast %arg0 : f32 to ui32
emitc.cast %0 : ui32 to i16
```
and is now lowered to
```
%0 = emitc.cast %arg0 : f32 to ui16
emitc.cast %0 : ui16 to i16
```
---
mlir/lib/Conversion/ArithToEmitC/ArithToEmitC.cpp | 2 +-
mlir/test/Conversion/ArithToEmitC/arith-to-emitc.mlir | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/mlir/lib/Conversion/ArithToEmitC/ArithToEmitC.cpp b/mlir/lib/Conversion/ArithToEmitC/ArithToEmitC.cpp
index 50384d9a08e5..ccbc1669b7a9 100644
--- a/mlir/lib/Conversion/ArithToEmitC/ArithToEmitC.cpp
+++ b/mlir/lib/Conversion/ArithToEmitC/ArithToEmitC.cpp
@@ -674,7 +674,7 @@ public:
Type actualResultType = dstType;
if (isa<arith::FPToUIOp>(castOp)) {
actualResultType =
- rewriter.getIntegerType(operandType.getIntOrFloatBitWidth(),
+ rewriter.getIntegerType(dstType.getIntOrFloatBitWidth(),
/*isSigned=*/false);
}

diff --git a/mlir/test/Conversion/ArithToEmitC/arith-to-emitc.mlir b/mlir/test/Conversion/ArithToEmitC/arith-to-emitc.mlir
index afd1198ede0f..1728c3a2557e 100644
--- a/mlir/test/Conversion/ArithToEmitC/arith-to-emitc.mlir
+++ b/mlir/test/Conversion/ArithToEmitC/arith-to-emitc.mlir
@@ -587,6 +587,10 @@ func.func @arith_float_to_int_cast_ops(%arg0: f32, %arg1: f64) {
// CHECK: emitc.cast %[[CAST0]] : ui32 to i32
%4 = arith.fptoui %arg0 : f32 to i32

+ // CHECK: %[[CAST0:.*]] = emitc.cast %arg0 : f32 to ui16
+ // CHECK: emitc.cast %[[CAST0]] : ui16 to i16
+ %5 = arith.fptoui %arg0 : f32 to i16
+
return
}

--
2.46.0

