Add scalar reduction codegen schedule #1284
base: main
Conversation
force-pushed from 5009a3f to 6fc8883
LGTM
force-pushed from aee2026 to 81aa949
* shm[tid] += inputs[j] + inputs[j + block_size];
* }
* __syncthreads();
* for (int stride = block_size / 2; stride > 0; stride /= 2) {
The warpReduce logic is missing here.
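For reference, a sketch of the warpReduce step from the cited reduction.pdf, assuming a warp size of 32, a float accumulator, and a block size of at least 64 (so `shm` has at least 64 entries); the `shm`/`tid` names follow the snippet above:

```cuda
// Warp-level tail of the tree reduction: no __syncthreads() between
// steps; `volatile` stops the compiler from caching shared-memory
// values in registers across the unrolled updates.
__device__ void warpReduce(volatile float* shm, int tid) {
  shm[tid] += shm[tid + 32];
  shm[tid] += shm[tid + 16];
  shm[tid] += shm[tid + 8];
  shm[tid] += shm[tid + 4];
  shm[tid] += shm[tid + 2];
  shm[tid] += shm[tid + 1];
}
```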
}
{
SmallVector<Value, 4> init_values = {};
for (int stride = 128; stride > 16; stride /= 2) {
Since warp_size = 32, it is better to set the stop condition to stride > 32.
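A sketch of the generated-code shape this suggestion implies (names are illustrative; `warpReduce` as sketched earlier in the thread):

```cuda
// Tree-reduce across the block until two warps' worth of partial
// sums remain, then let the first warp finish without barriers.
for (int stride = block_size / 2; stride > 32; stride /= 2) {
  if (tid < stride) shm[tid] += shm[tid + stride];
  __syncthreads();
}
if (tid < 32) warpReduce(shm, tid);
```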
b.create<memref::LoadOp>(loc, shared_mem_map[root_op], strid_tid);
Value sum = accum_factory[idx](shm_val_1, shm_val_2);
b.create<memref::StoreOp>(loc, sum, shared_mem_map[root_op], tid);
b.create<gpu::BarrierOp>(loc);
The BarrierOp is not necessary here; threads in a warp are synchronized all the time.
I have rewritten the warp-reduction section with shuffle instructions and will update this PR later.
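A minimal sketch of a shuffle-based warp reduction, assuming a full 32-lane mask and a float value (this is not necessarily how the updated PR implements it):

```cuda
// Each step halves the number of live partial sums inside the warp;
// __shfl_down_sync reads `val` from the lane `offset` positions higher,
// so no shared memory and no barriers are needed for the warp tail.
__device__ float warpReduceSum(float val) {
  for (int offset = 16; offset > 0; offset /= 2)
    val += __shfl_down_sync(0xffffffff, val, offset);
  return val;  // lane 0 ends up holding the warp-wide sum
}
```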
/*hasElseRegion*/ false);
b.setInsertionPointToStart(&if_tid_valid_op.getThenRegion().front());
SmallVector<Value, 4> yield_values;
for (int stride = 16; stride > 0; stride /= 2) {
Start with stride = 32
Add a scalar-reduction codegen template; the algorithm comes from https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
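For context, the overall schedule from the cited PDF boils down to the pattern below. This is a hedged sketch, not the exact generated code; it assumes a power-of-two block size of at least 64 and uses the `warpReduce` helper sketched earlier in the thread:

```cuda
__global__ void reduce_sum(const float* inputs, float* out, int n) {
  extern __shared__ float shm[];
  const int tid = threadIdx.x;
  const int block_size = blockDim.x;

  // Each thread first accumulates a grid-stride slice of the input.
  float acc = 0.0f;
  for (int j = blockIdx.x * block_size + tid; j < n;
       j += gridDim.x * block_size)
    acc += inputs[j];
  shm[tid] = acc;
  __syncthreads();

  // Tree reduction in shared memory down to 64 partial sums.
  for (int stride = block_size / 2; stride > 32; stride /= 2) {
    if (tid < stride) shm[tid] += shm[tid + stride];
    __syncthreads();
  }
  // Barrier-free warp tail; thread 0 writes the per-block result.
  if (tid < 32) warpReduce(shm, tid);
  if (tid == 0) out[blockIdx.x] = shm[0];
}
```

A launch would pass the dynamic shared-memory size explicitly, e.g. `reduce_sum<<<grid, block, block * sizeof(float)>>>(inputs, out, n)`, followed by a second pass (or a second kernel) over the per-block results.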