[SYCL] Add barrier optimization pass #19353
1a48836
to
e5b70b9
Compare
Changed |= eliminateBoundaryBarriers(BarrierPtrs);
// Then remove redundant barriers within a single basic block.
for (auto &BarrierBBPair : BarriersByBB)
  Changed = eliminateBackToBackInBB(BarrierBBPair.first, BarrierBBPair.second,
Can eliminateBackToBackInBB be merged into eliminateDominatedBarriers? eliminateBackToBackInBB is just a special case of the latter in that all barriers are in a single BB, right?
Yes, it can be merged. I've left them split because the back-to-back elimination function is algorithmically simpler than CFG elimination. I also thought it would be a good idea to first optimize back-to-back barriers, then (not yet implemented) hoist two or more barriers into one when their blocks share the same predecessor and their semantics match, and only then do CFG-aware removal/downgrade on the remaining barriers.
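As a rough illustration of why the back-to-back phase is the simpler one, here is a self-contained toy model of it. The `Scope` enum, the `Barrier` struct, and this `eliminateBackToBack` function are hypothetical stand-ins, not the pass's actual types: the real pass works on LLVM IR and also handles barriers whose scopes differ. The sketch only covers the "identical, nothing in between" case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scopes, ordered from narrowest to widest.
enum class Scope { None, Subgroup, Workgroup, Device, CrossDevice };

// Toy barrier inside one basic block: its scopes plus the widest memory
// scope touched by the instructions between the previous barrier and it.
struct Barrier {
  Scope Exec;
  Scope Mem;
  Scope AccessedBefore; // Scope::None means no memory access in the region.
};

// Keep a barrier unless it is identical to the last kept one and the region
// between them contains no memory accesses (an assumption of this sketch).
std::vector<Barrier> eliminateBackToBack(const std::vector<Barrier> &In) {
  std::vector<Barrier> Out;
  for (const Barrier &Cur : In) {
    if (!Out.empty()) {
      const Barrier &Last = Out.back();
      bool Identical = Last.Exec == Cur.Exec && Last.Mem == Cur.Mem;
      if (Identical && Cur.AccessedBefore == Scope::None)
        continue; // Cur is redundant; drop it.
    }
    Out.push_back(Cur);
  }
  return Out;
}
```

A single linear walk is enough here because everything lives in one block; the CFG-aware phase needs dominance information on top of this.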
// If identical then drop Cur.
if (CmpExec == CompareRes::EQUAL && CmpMem == CompareRes::EQUAL) {
  if (noFencedMemAccessesBetween(Last.CI, Cur.CI, FenceLast, BBMemInfo)) {
Just a note: classifyMemScope may be computed repeatedly for some instructions in noFencedMemAccessesBetween, e.g. in the following case:
barrier(CrossDevice)
Instruction Set 1 (RegionMemScope == None)
barrier(Device)
Instruction Set 2 (RegionMemScope == None)
barrier(Workgroup)
Instruction Set 3 (RegionMemScope == None)
barrier(Subgroup)
I guess the case is rare, so probably no need to optimize.
True. But avoiding it would require extra memoization by default, which might cost more than the extra computation in the rare case.
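To make the trade-off concrete, here is a minimal sketch of what per-instruction memoization could look like. Everything in it is hypothetical: the `Classifier` struct, integer instruction ids, and the assumption that each comparison rescans all instructions between the first barrier and the current one. It is not how the pass is implemented.

```cpp
#include <cassert>
#include <unordered_map>

enum class MemScope { None, Workgroup, Device };

struct Classifier {
  bool UseCache;
  int UncachedCalls = 0; // Number of real (non-cached) classifications.
  std::unordered_map<int, MemScope> Cache;

  explicit Classifier(bool UseCache) : UseCache(UseCache) {}

  // Stand-in for the per-instruction analysis; in this toy model every
  // instruction touches no memory.
  MemScope classifyUncached(int /*InstId*/) {
    ++UncachedCalls;
    return MemScope::None;
  }

  MemScope classify(int InstId) {
    if (!UseCache)
      return classifyUncached(InstId);
    auto It = Cache.find(InstId);
    if (It != Cache.end())
      return It->second;
    MemScope S = classifyUncached(InstId);
    Cache.emplace(InstId, S);
    return S;
  }

  // True if no instruction with an id in [Begin, End) accesses memory.
  bool noAccessesBetween(int Begin, int End) {
    for (int I = Begin; I < End; ++I)
      if (classify(I) != MemScope::None)
        return false;
    return true;
  }
};

// Models comparing the first barrier against each later one when three
// single-instruction regions (ids 0, 1, 2) separate four barriers.
int countClassifications(bool UseCache) {
  Classifier C(UseCache);
  for (int End = 1; End <= 3; ++End)
    C.noAccessesBetween(0, End);
  return C.UncachedCalls;
}
```

Under these assumptions the cache halves the classification work, but it also adds a hash lookup and an insertion to every query, which is the cost being weighed against the rare case.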
In general there are other ways to define fenced regions between barriers, but I hadn't thought about them until last Monday, when I found a similar work :) Reworking the scanning and redefining fenced regions is a possible enhancement for the pass.
In general there are other ways to define fenced regions between barriers, but I hadn't thought about them until last Monday, when I found a similar work :) Reworking the scanning and redefining fenced regions is a possible enhancement for the pass.
Sounds interesting, is there a link to the work?
@wenju-he I meant a CPU middle-end pass which, while not doing the same thing as this pass, has quite an interesting idea for function preparation :)
I see. Right, it is not the same. I agree that a region-based algorithm would be better. Basically, the pass here is merging equivalent regions.
if (Fence == RegionMemScope::Unknown)
  continue;

if (DT.dominates(B1->CI, B2->CI)) {
Is there repeated calculation in the following case?
- B1 = A0, B2 = A1, A0 dominates A1, noFencedAccessesCFG returns false
- B1 = A1, B2 = A0, A1 post-dominates A0, noFencedAccessesCFG is called again on the same instructions
another potential repeated calculation case is:
A0 dominates A1, A1 dominates A2. noFencedAccessesCFG is called twice on the instructions between A0 and A1.
Yeah, this is something I'm refactoring and will continue to refactor by merging elimination in CFG and downgrade functions.
Should be partially resolved.
It removes redundant barriers (both back-to-back and in general in CFG) and downgrades global barrier to local if there are no global memory accesses 'between' them. See description in SYCLOptimizeBackToBackBarrier.cpp for more details. Signed-off-by: Sidorov, Dmitry <[email protected]>
e5b70b9
to
6d51dad
Compare
6d51dad
to
7710333
Compare
TODO: merge CFG elimination and barrier downgrade Signed-off-by: Sidorov, Dmitry <[email protected]>
7710333
to
527e8e3
Compare
LGTM, just one nit about a redundant comment.
Signed-off-by: Sidorov, Dmitry <[email protected]>
LGTM
Signed-off-by: Sidorov, Dmitry <[email protected]>
@intel/llvm-gatekeepers not sure if @intel/dpcpp-cfe-reviewers approval is mandatory here. If you agree, please help with the merge.
It removes redundant barriers (both back-to-back and in general in CFG) and downgrades global barrier to local if there are no global memory accesses 'between' them. See description in SYCLOptimizeBackToBackBarrier.cpp for more details.