[SYCL] Add barrier optimization pass #19353

MrSidims · 2025-07-09T00:09:42Z

This pass removes redundant barriers (both back-to-back barriers and, more generally, redundant barriers across the CFG) and downgrades a global barrier to a local one if there are no global memory accesses 'between' the barriers. See the description in SYCLOptimizeBackToBackBarrier.cpp for more details.

llvm/lib/SYCLLowerIR/SYCLOptimizeBarriers.cpp

wenju-he · 2025-07-10T05:39:03Z

llvm/lib/SYCLLowerIR/SYCLOptimizeBarriers.cpp

+  Changed |= eliminateBoundaryBarriers(BarrierPtrs);
+  // Then remove redundant barriers within a single basic block.
+  for (auto &BarrierBBPair : BarriersByBB)
+    Changed = eliminateBackToBackInBB(BarrierBBPair.first, BarrierBBPair.second,


Can eliminateBackToBackInBB be merged into eliminateDominatedBarriers? eliminateBackToBackInBB is just a special case of the latter in which all barriers are in a single BB, right?

Yes, it can be merged. I've left them split because the function that eliminates back-to-back barriers is algorithmically simpler than CFG elimination. I also thought it would be a good idea to first optimize back-to-back barriers, then (not yet implemented) hoist two or more barriers into one when their blocks share the same predecessor and their semantics match, and only then do CFG-aware removal/downgrade on the remaining barriers.
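For readers following the thread, here is a minimal sketch of the single-block case discussed above. It is not the code from SYCLOptimizeBarriers.cpp: the `Scope` and `Inst` types and the `eliminateBackToBack` function are hypothetical, and the rule is deliberately conservative (a barrier is dropped only when it repeats the scope of the previous barrier and no memory access at all sits between the two).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; not the types used by the real pass.
enum class Scope { None = 0, Subgroup, Workgroup, Device, CrossDevice };

struct Inst {
  bool IsBarrier; // true for a barrier, false for any other instruction
  Scope S;        // barrier: scope it fences; other: scope of the memory it
                  // touches (Scope::None if it touches no memory)
};

// Walk one basic block and drop a barrier when the previous surviving barrier
// has the same scope and nothing in between touched memory.
std::vector<Inst> eliminateBackToBack(const std::vector<Inst> &BB) {
  std::vector<Inst> Out;
  bool HaveLast = false;
  Scope LastScope = Scope::None;
  Scope Touched = Scope::None; // widest scope accessed since the last barrier
  for (const Inst &I : BB) {
    if (!I.IsBarrier) {
      if (I.S > Touched)
        Touched = I.S;
      Out.push_back(I);
      continue;
    }
    assert(I.S != Scope::None && "a barrier must fence some scope");
    // Assumption: identical scope plus an access-free region means redundant.
    if (HaveLast && I.S == LastScope && Touched == Scope::None)
      continue;
    Out.push_back(I);
    HaveLast = true;
    LastScope = I.S;
    Touched = Scope::None;
  }
  return Out;
}
```

The CFG-aware variant has to answer the same question across blocks, which is why it needs dominance information and is the more involved of the two steps.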

wenju-he · 2025-07-10T06:32:36Z

llvm/lib/SYCLLowerIR/SYCLOptimizeBarriers.cpp

+
+      // If identical then drop Cur.
+      if (CmpExec == CompareRes::EQUAL && CmpMem == CompareRes::EQUAL) {
+        if (noFencedMemAccessesBetween(Last.CI, Cur.CI, FenceLast, BBMemInfo)) {


Just a note: classifyMemScope could be computed repeatedly for some instructions in noFencedMemAccessesBetween, e.g. in the following case:

barrier(CrossDevice)
Instruction Set 1 (RegionMemScope == None)
barrier(Device)
Instruction Set 2 (RegionMemScope == None)
barrier(Workgroup)
Instruction Set 3 (RegionMemScope == None)
barrier(Subgroup)

I guess the case is rare, so probably no need to optimize.

True. But by default this would require extra memoization, which might cost more than the extra computation in this rare case.

In general, there are other ways to define fenced regions between barriers, but I hadn't thought about them until last Monday, when I found a similar work :) Redoing the scanning and redefining the fenced regions is a possible enhancement for the pass.


Sounds interesting, is there a link to the work?

@wenju-he I meant a CPU middle-end pass which, while it does not do the same thing as this pass, has quite an interesting idea for function preparation :)

I see. Right, it is not the same. I agree that a region-based algorithm would be better. Basically, the pass here is merging equivalent regions.
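The memoization trade-off raised earlier in this thread can be sketched as follows. This is an illustration only: `ScopeCache`, `Inst`, and the three-value `Scope` enum are hypothetical names, and `classify` merely stands in for whatever classifyMemScope computes in the actual pass. The counter makes the saving visible, while the map is the extra bookkeeping that has to be paid for on every run.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model; not the types used by the real pass.
enum class Scope { None = 0, Local, Global };

struct Inst {
  int Id;
  Scope Touches; // scope of the memory this instruction accesses
};

struct ScopeCache {
  std::unordered_map<const Inst *, Scope> Memo;
  int Classified = 0; // number of classifications that missed the cache

  Scope classify(const Inst &I) {
    auto It = Memo.find(&I);
    if (It != Memo.end())
      return It->second;
    ++Classified;
    Scope S = I.Touches; // stand-in for the real, more expensive analysis
    Memo.emplace(&I, S);
    return S;
  }

  // Widest scope touched by the instructions in [Begin, End).
  Scope regionScope(const std::vector<Inst> &Insts, std::size_t Begin,
                    std::size_t End) {
    assert(Begin <= End && End <= Insts.size());
    Scope Widest = Scope::None;
    for (std::size_t I = Begin; I < End; ++I) {
      Scope S = classify(Insts[I]);
      if (S > Widest)
        Widest = S;
    }
    return Widest;
  }
};
```

Two overlapping region queries then classify each instruction once. Whether that beats recomputation depends on how often regions overlap, which matches the point made above that the overlapping case is expected to be rare.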

wenju-he · 2025-07-10T06:57:55Z

llvm/lib/SYCLLowerIR/SYCLOptimizeBarriers.cpp

+      if (Fence == RegionMemScope::Unknown)
+        continue;
+
+      if (DT.dominates(B1->CI, B2->CI)) {


Is there repeated calculation in the following case?

B1 = A0, B2 = A1: A0 dominates A1, and noFencedAccessesCFG returns false.

B1 = A1, B2 = A0: A1 post-dominates A0, and noFencedAccessesCFG is called again on the same instructions.

Another potential repeated-calculation case:
A0 dominates A1 and A1 dominates A2. noFencedAccessesCFG is called twice on the instructions between A0 and A1.

Yeah, this is something I'm refactoring, and I will continue to refactor it by merging the CFG elimination and downgrade functions.

Should be partially resolved.
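One possible way to avoid the first kind of repetition described above is to cache the scan result per unordered barrier pair. The sketch below is an assumption-laden illustration, not what the PR does: `PairCache` is a hypothetical helper, barriers are identified by plain integers, and it assumes the scanned region is the same whichever barrier of the pair is visited first.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical helper; barriers are identified by integer ids.
struct PairCache {
  std::map<std::pair<int, int>, bool> Memo;
  int Scans = 0; // number of times the expensive scan actually ran

  // Scan stands in for a CFG walk such as noFencedAccessesCFG.
  bool noFencedAccesses(int A, int B,
                        const std::function<bool(int, int)> &Scan) {
    assert(A != B && "a barrier is not compared with itself");
    // Normalize the key so (A, B) and (B, A) share one cache entry.
    std::pair<int, int> Key =
        A < B ? std::make_pair(A, B) : std::make_pair(B, A);
    auto It = Memo.find(Key);
    if (It != Memo.end())
      return It->second;
    ++Scans;
    bool Result = Scan(Key.first, Key.second);
    Memo[Key] = Result;
    return Result;
  }
};
```

With this in place, the dominates query and the post-dominates query for the same two barriers trigger a single scan. The second case in the comment (A0, A1, A2 in a dominance chain) would need region-level caching instead, since the pairs differ.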

This pass removes redundant barriers (both back-to-back barriers and, more generally, redundant barriers across the CFG) and downgrades a global barrier to a local one if there are no global memory accesses 'between' the barriers. See the description in SYCLOptimizeBackToBackBarrier.cpp for more details. Signed-off-by: Sidorov, Dmitry <[email protected]>

llvm/lib/SYCLLowerIR/SYCLOptimizeBarriers.cpp

TODO: merge CFG elimination and barrier downgrade Signed-off-by: Sidorov, Dmitry <[email protected]>

llvm/lib/SYCLLowerIR/SYCLOptimizeBarriers.cpp

clang/lib/CodeGen/BackendUtil.cpp

maarquitos14

LGTM, just one nit about a redundant comment.

Signed-off-by: Sidorov, Dmitry <[email protected]>

wenju-he

LGTM

Signed-off-by: Sidorov, Dmitry <[email protected]>

MrSidims · 2025-07-22T11:42:45Z

@intel/llvm-gatekeepers not sure if @intel/dpcpp-cfe-reviewers approval is mandatory here. If you agree, please help with the merge.


MrSidims force-pushed the optimize-barrier-2 branch from 1a48836 to e5b70b9 Compare July 9, 2025 08:36


MrSidims commented Jul 9, 2025


wenju-he reviewed Jul 10, 2025


wenju-he reviewed Jul 10, 2025


MrSidims force-pushed the optimize-barrier-2 branch from e5b70b9 to 6d51dad Compare July 13, 2025 11:21


MrSidims force-pushed the optimize-barrier-2 branch from 6d51dad to 7710333 Compare July 13, 2025 11:22

MrSidims marked this pull request as ready for review July 13, 2025 11:22

MrSidims requested review from a team as code owners July 13, 2025 11:22

MrSidims requested a review from aelovikov-intel July 13, 2025 11:22


MrSidims changed the title ~~[testing for now][SYCL] Add barrier optimization pass~~ [SYCL] Add barrier optimization pass Jul 13, 2025


MrSidims marked this pull request as draft July 13, 2025 12:05

MrSidims commented Jul 13, 2025


Rewrite logic for fenced memory detection and many more

1397c5b

TODO: merge CFG elimination and barrier downgrade Signed-off-by: Sidorov, Dmitry <[email protected]>

MrSidims force-pushed the optimize-barrier-2 branch from 7710333 to 527e8e3 Compare July 14, 2025 10:00


MrSidims commented Jul 14, 2025


MrSidims requested a review from wenju-he July 14, 2025 10:06


wenju-he reviewed Jul 21, 2025


maarquitos14 approved these changes Jul 21, 2025


Address few changes, simplify CFG scan

f1143e3


Add a test

f42c3c1

Signed-off-by: Sidorov, Dmitry <[email protected]>


MrSidims requested a review from wenju-he July 22, 2025 00:52


wenju-he approved these changes Jul 22, 2025


restore part of at exit/entry checks (no scanning though)

3e2e76a


format

266c9f1


fix test

f306ab7

Signed-off-by: Sidorov, Dmitry <[email protected]>


DenseMap -> MapVector to solve non deterministic behaviour

ed7d638


MrSidims requested a review from a team July 22, 2025 11:41

steffenlarsen merged commit 0231525 into intel:sycl Jul 22, 2025
26 checks passed
