[SYCL] Add barrier optimization pass #19353

Merged
merged 19 commits into from
Jul 22, 2025

@MrSidims MrSidims commented Jul 9, 2025

This pass removes redundant barriers (both back-to-back barriers and redundant barriers elsewhere in the CFG) and downgrades a global barrier to a local one if there are no global memory accesses 'between' the barriers. See the description in SYCLOptimizeBackToBackBarrier.cpp for more details.
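
To make the downgrade rule concrete, here is a minimal sketch on a toy straight-line instruction list. It is not the code from the pass: `Scope`, `Kind`, `Inst` and `downgradeGlobalBarriers` are hypothetical names, and the sketch assumes one reading of 'between', namely that a global barrier may become local when an earlier global barrier precedes it and no global access occurs in between.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model, not the LLVM IR types the pass operates on.
enum class Scope { None, Local, Global };
enum class Kind { Barrier, Access };

struct Inst {
  Kind K;
  Scope S; // memory fenced (for a barrier) or memory touched (for an access)
};

// Downgrades every global barrier that is preceded by another global barrier
// with no global access in between. Returns the number of downgraded barriers.
unsigned downgradeGlobalBarriers(std::vector<Inst> &Insts) {
  unsigned Downgraded = 0;
  // True while a global barrier has been seen and no global access has
  // occurred since.
  bool Covered = false;
  for (Inst &I : Insts) {
    if (I.K == Kind::Access) {
      if (I.S == Scope::Global)
        Covered = false;
      continue;
    }
    assert(I.K == Kind::Barrier);
    if (I.S != Scope::Global)
      continue; // Non-global barriers are left alone in this sketch.
    if (Covered) {
      I.S = Scope::Local;
      ++Downgraded;
    }
    Covered = true;
  }
  return Downgraded;
}
```

In this model, a sequence of global barrier, local access, global barrier leaves the first barrier untouched and turns the second into a local one, while a global access between the two keeps both global.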

Changed |= eliminateBoundaryBarriers(BarrierPtrs);
// Then remove redundant barriers within a single basic block.
for (auto &BarrierBBPair : BarriersByBB)
  Changed = eliminateBackToBackInBB(BarrierBBPair.first, BarrierBBPair.second,
Contributor

Can eliminateBackToBackInBB be merged into eliminateDominatedBarriers? eliminateBackToBackInBB is just a special case of the latter, in which all barriers are in a single BB, right?

Contributor Author

@MrSidims MrSidims Jul 13, 2025

Yes, they can be merged. I've left them split because the back-to-back elimination function is algorithmically simpler than the CFG elimination. I also thought it was a good idea to first optimize back-to-back barriers, then (not yet implemented) hoist two or more barriers into one if their respective blocks share the same predecessor and their semantics match, and only then do CFG-aware removal/downgrade on the remaining barriers.
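
As an illustration of why the back-to-back step is the simpler one, it can be written as a single linear scan over one block. The sketch below is a toy model and not the pass itself: `Scope`, `Inst` and `dropBackToBack` are hypothetical, and it assumes that a barrier fences an access when the access scope is not `None` and does not exceed the barrier scope.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of one basic block.
enum class Scope { None = 0, Local = 1, Global = 2 };

struct Inst {
  bool IsBarrier; // true: barrier fencing S; false: instruction touching S
  Scope S;        // Scope::None on a non-barrier means "no memory access"
};

// Drops a barrier when the previously kept barrier has the same scope and no
// access fenced by that barrier lies between the two.
std::vector<Inst> dropBackToBack(const std::vector<Inst> &BB) {
  std::vector<Inst> Out;
  bool HaveLast = false;
  Scope LastScope = Scope::None;
  bool FencedAccessSinceLast = false;
  for (const Inst &I : BB) {
    if (!I.IsBarrier) {
      if (HaveLast && I.S != Scope::None && I.S <= LastScope)
        FencedAccessSinceLast = true;
      Out.push_back(I);
      continue;
    }
    if (HaveLast && I.S == LastScope && !FencedAccessSinceLast)
      continue; // Identical barrier with nothing to fence: redundant.
    Out.push_back(I);
    HaveLast = true;
    LastScope = I.S;
    FencedAccessSinceLast = false;
  }
  assert(Out.size() <= BB.size());
  return Out;
}
```

The scan keeps only the last surviving barrier and one flag as state, which is what makes it cheaper than a CFG-aware variant that has to reason about dominance and about paths between blocks.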


// If identical then drop Cur.
if (CmpExec == CompareRes::EQUAL && CmpMem == CompareRes::EQUAL) {
  if (noFencedMemAccessesBetween(Last.CI, Cur.CI, FenceLast, BBMemInfo)) {
Contributor

Just a note: classifyMemScope could be computed repeatedly for some instructions in noFencedMemAccessesBetween, e.g. in the following case:

barrier(CrossDevice)
Instruction Set 1 (RegionMemScope == None)
barrier(Device)
Instruction Set 2 (RegionMemScope == None)
barrier(Workgroup)
Instruction Set 3 (RegionMemScope == None)
barrier(Subgroup)

I guess the case is rare, so probably no need to optimize.

Contributor Author

True. But this would require extra memoization (by default), which might be worse than the extra computation in the rare case.
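
For reference, the trade-off being discussed looks roughly like this. The sketch is a toy model with hypothetical names (`ScopeCache`, `classify`, `noAccessesBetween`, `ClassifyCalls`); it assumes a per-instruction cache, which is only one of several ways the classification could be memoized, and it does not use the real classifyMemScope.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

enum class RegionScope { None, Local, Global }; // hypothetical, simplified

struct ScopeCache {
  std::vector<RegionScope> Raw;                 // stand-in for the instructions
  std::vector<std::optional<RegionScope>> Memo; // one cache slot per instruction
  unsigned ClassifyCalls = 0;                   // counts uncached classifications

  explicit ScopeCache(std::vector<RegionScope> R)
      : Raw(std::move(R)), Memo(Raw.size()) {}

  // Stand-in for a potentially expensive per-instruction classification.
  RegionScope classify(std::size_t Idx) {
    assert(Idx < Raw.size());
    if (!Memo[Idx]) {
      ++ClassifyCalls;
      Memo[Idx] = Raw[Idx];
    }
    return *Memo[Idx];
  }

  // True when no instruction in [Begin, End) touches memory.
  bool noAccessesBetween(std::size_t Begin, std::size_t End) {
    assert(Begin <= End && End <= Raw.size());
    for (std::size_t I = Begin; I < End; ++I)
      if (classify(I) != RegionScope::None)
        return false;
    return true;
  }
};
```

With the cache, three overlapping scans that all start at the first barrier classify each instruction once. The price is one extra slot per instruction on every run, including the common case where no instruction is visited twice.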

Contributor Author

In general there are other ways to define fenced regions between barriers, but I hadn't thought about them until last Monday, when I found similar work :) Redoing the scan and redefining the fenced regions is a possible enhancement for the pass.

Contributor

> In general there are other ways to define fenced regions between barriers, but I hadn't thought about them until last Monday, when I found similar work :) Redoing the scan and redefining the fenced regions is a possible enhancement for the pass.

Sounds interesting, is there a link to the work?

Contributor Author

@MrSidims MrSidims Jul 15, 2025

@wenju-he I meant a CPU middle-end pass which, while not doing the same thing as this pass, has quite an interesting idea for function preparation :)

Contributor

I see. Right, it is not the same. I agree that a region-based algorithm would be better. Basically, the pass here is merging equivalent regions.

if (Fence == RegionMemScope::Unknown)
  continue;

if (DT.dominates(B1->CI, B2->CI)) {
Contributor

Is there repeated calculation in the following case?

  • B1 = A0, B2 = A1, A0 dominates A1, noFencedAccessesCFG returns false
  • B1 = A1, B2 = A0, A1 post-dominates A0, noFencedAccessesCFG is called again on the same instructions

Contributor

Another potential repeated-calculation case: A0 dominates A1 and A1 dominates A2. noFencedAccessesCFG is called twice on the instructions between A0 and A1.
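
The A0/A1 case that is visited once through dominance and once through post-dominance could, in principle, be avoided by caching the region check per unordered barrier pair. A minimal sketch, assuming the check depends only on which two barriers bound the region and not on their order; `PairCheckCache` and the integer barrier ids are hypothetical and do not come from the PR:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Caches the result of a symmetric check over pairs of barrier ids.
class PairCheckCache {
  std::function<bool(int, int)> Check; // stand-in for the region scan
  std::map<std::pair<int, int>, bool> Results;

public:
  explicit PairCheckCache(std::function<bool(int, int)> C)
      : Check(std::move(C)) {}

  bool query(int A, int B) {
    assert(A != B && "a barrier is not paired with itself");
    // Normalize the key so that (A, B) and (B, A) share one entry.
    std::pair<int, int> Key{std::min(A, B), std::max(A, B)};
    auto It = Results.find(Key);
    if (It != Results.end())
      return It->second;
    bool Result = Check(Key.first, Key.second);
    Results.emplace(Key, Result);
    return Result;
  }
};
```

This only covers the repeated pair. The chained case, where the A0..A1 instructions are rescanned as part of a longer A0..A2 region, would need per-region or per-instruction caching instead.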

Contributor Author

Yeah, this is something I'm refactoring and will continue to refactor by merging the CFG elimination and downgrade functions.

Contributor Author

Should be partially resolved.

It removes redundant barriers (both back-to-back and in general in CFG)
and downgrades global barrier to local if there are no global memory
accesses 'between' them. See description in
SYCLOptimizeBackToBackBarrier.cpp for more details.

Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims MrSidims force-pushed the optimize-barrier-2 branch from e5b70b9 to 6d51dad Compare July 13, 2025 11:21
@MrSidims MrSidims force-pushed the optimize-barrier-2 branch from 6d51dad to 7710333 Compare July 13, 2025 11:22
@MrSidims MrSidims marked this pull request as ready for review July 13, 2025 11:22
@MrSidims MrSidims requested review from a team as code owners July 13, 2025 11:22
@MrSidims MrSidims requested a review from aelovikov-intel July 13, 2025 11:22
@MrSidims MrSidims changed the title from "[testing for now][SYCL] Add barrier optimization pass" to "[SYCL] Add barrier optimization pass" Jul 13, 2025
@MrSidims MrSidims marked this pull request as draft July 13, 2025 12:05
TODO: merge CFG elimination and barrier downgrade

Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims MrSidims force-pushed the optimize-barrier-2 branch from 7710333 to 527e8e3 Compare July 14, 2025 10:00
@MrSidims MrSidims requested a review from wenju-he July 14, 2025 10:06
Contributor

@maarquitos14 maarquitos14 left a comment

LGTM, just one nit about a redundant comment.

Signed-off-by: Sidorov, Dmitry <[email protected]>
Contributor

@wenju-he wenju-he left a comment

LGTM

Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims MrSidims requested a review from a team July 22, 2025 11:41
@MrSidims
Contributor Author

@intel/llvm-gatekeepers not sure if @intel/dpcpp-cfe-reviewers approval is mandatory here. If you agree, please help with the merge.

@steffenlarsen steffenlarsen merged commit 0231525 into intel:sycl Jul 22, 2025
26 checks passed
6 participants