-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Shenandoah support #10904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Shenandoah support #10904
Conversation
If you have any questions about how to structure your changes feel free to ping me over slack as I was the primary author of the new LIR support for barriers. |
Thanks, Tom! I will do that whenever I get stuck or have questions. So far I'm making progress. Structurally, Shenandoah will be look like a mix of ZGC (for the load-barrier, even though I am modeling it as a Node that consumes the loaded value, instead of replacing the ReadNode altogether) and G1 (for the SATB parts), and likely Serial/Parallel for the card-table parts. |
Sounds good. There's still some more work to finish out the switch to LIR only barriers but I think supporting G1 and ZGC covers the required strategies in a fairly pragmatic way. |
@tkrodriguez can you please take another look at this. Once done, we can ask @davleopo and @gergo- to look at it. |
I'll take a look. |
Are the gate failures actual problems? It would be good to see a clean gate. Also, we should squash the history before committing. |
I don't know what those problems are. I fixed everything that looked related to my changes. Those failures look like infra problems, some volumes seem to have run out of memory or something. I doubt that it is related.
|
compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/phases/LowTier.java
Outdated
Show resolved
Hide resolved
.../src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/gc/shenandoah/ShenandoahBarrierSet.java
Outdated
Show resolved
Hide resolved
@tkrodriguez There seem to be GHA failures that report SerialWriteBarriers not being Lowerable, coming from SubstrateVM. Could this be related to moving the barrier addition phase from mid- to low-tier? I don't think I have changed anything in SerialWriteBarrier or related code. |
f8dac0f
to
9f2c78d
Compare
Yes this is something I mentioned in our slack discussions. Moving The options are to conditionalize the placement of I'm going to try putting |
As far as I can see, it's only the SerialWriteBarrierNode which depends on snippets (is that right?). That should be relatively straightforward to implement without snippets and would look almost exactly like ShenandoahCardBarrierNode implementation - even a little simpler. I could work on implementing that, if you think that'd help. |
It would be easy to convert the HotSpot serial barrier to LIR but native image uses snippets for its barriers and its serial barrier is non-trivial. So we'll need to live with this mixed model for a little while I think. I'm beginning to think we might just need an early and late phase. I think the way barriers for vector writes work, we might have to do barrier addition for them before LowTierLowering, or at least before VectorLoweringPhase, which is before FixReadsPhase. It might be too much to try to resolve all these issues in this PR. Since Shenandoah is currently HotSpot only maybe it would be best to special case its barrier insertion. I'll will play some more with this to see what would be best. |
I implemented barrier addition to trigger in both mid- and low-tier, and the BarrierSet implementation gets to choose which one (or both, if it wishes) is appropriate. This choice defaults to mid-tier, and can be overridden in implementations, like I did in ShenandoahBarrierSet. This way we get a clean gate :-) |
Thanks. I'll see whether that works ok in our full gate. I'd tried something slightly different but the |
Your fix works though I don't love some of the details. I'll just put comments on those places. I was able to get a clean internal gate with just minor changes in enterprise. I wasn't actually able to test Shenandoah because we don't have a labsjdk that includes it at the moment. |
...jdk.graal.compiler.test/src/jdk/graal/compiler/hotspot/test/DeferredBarrierAdditionTest.java
Outdated
Show resolved
Hide resolved
...graal.compiler/src/jdk/graal/compiler/nodes/gc/shenandoah/ShenandoahPreWriteBarrierNode.java
Outdated
Show resolved
Hide resolved
...graal.compiler/src/jdk/graal/compiler/nodes/gc/shenandoah/ShenandoahPreWriteBarrierNode.java
Outdated
Show resolved
Hide resolved
compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/GraphState.java
Outdated
Show resolved
Hide resolved
...r/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/WriteBarrierAdditionPhase.java
Outdated
Show resolved
Hide resolved
5c691ed
to
e9de7db
Compare
Thanks for updates. There are various style problems that need to be fixed. Unused imports, guarantee should use
The CodeFormatCheck step requires a downloaded eclipse 4.26 binary to work but you can probably skip that. The BuildWithEcj runs separately because we don't want to overwrite the javac compiled files. |
@tkrodriguez The finagle-http benchmark failed because of missing stuff in the CAS-ref barriers. I implemented those parts now. I also took a stab at the truffle thing you mentioned, at least on arm64, but I am not even sure that it is correct. Is there any way to test it? The x86 part of that is still missing, and I'll be off to vacation for the next two weeks. I'll try to squeeze it in later tonight, or maybe tomorrow, but if I can't, it'll have to wait. Same for the gate failures (at least part of them are waiting for openjdk/jdk#25552 from OpenJDK upstream anyway). |
No rush on any of this, but I just wanted keep you posted on what I'm seeing with our testing. As far as the truffle part, you can at least minimally exercise it with the truffle unit tests in the compiler suite. More extensive testing requires running the language tests. I've got tasks in our CI to exercise shenandoah that will test that out. So you can make an initial implementation that passes the basic unit tests and we can stress it in our gate. The implementation should just be a directly invokeable entry point into the normal load barrier LIR op. So it should be mostly just be refactoring the op and maybe some manual tmp selection that's compatible with the live registers on entry. I can help you work out any kinks. It's an annoying part of the implementation. |
I made a blind/best-effort implementation for the Truffle frame-setup parts, based on what I could understand from the context and the ZGC implementation. It's pretty straightforward. An open question in the x86 parts is: I need two temporary registers, and I am not sure which ones would be free to use. I used r9 and r11, they are caller-saved in normal calling conventions, but I am not sure if this is the correct assumption in that code. Also, I haven't tested it, yet. Let me know if you want anything else changed. I might not get to it before June 23, though... |
fffd7b1
to
dcaafee
Compare
@tkrodriguez @mur47x111 @dougxc @davleopo I merged latest master, the missing symbol problem disappeared, as expected. Then I fixed a problem with a unit-test complaining about guarantee() calls with string concats, which made the gate tests clean. I also squashed all changesets into a single commit (again). I think I addressed all review comments so far, it should be ready for re-review now. |
Thanks. I'll launch our internal gate and see where we stand on testing. |
if (setConditionFlags) { | ||
masm.bind(resultNullFailure); | ||
// Clear zero flag to indicate failure. | ||
masm.subs(32, zr, zr, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to be a valid instruction. GCBarrierEmissionTest and UnsafeSubstitutionsTest fail because of this on aarch64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. I pushed a fix - please re-test. Thank you!
@tkrodriguez what is the status of this testing? |
Mostly the gate looks good. There are problems with the amd64 truffle barrier that I'm fixing. I don't know if I can push here but I can provide the patch once it's working if not. Then I can launch the full benchmark suites too. |
Ok, thank you! What is the status of it? If you have a patch, I will happily intergrate it into this PR. |
@tkrodriguez @thomaswue ping - what is the status of this PR? I'm waiting for a fix for aarch64/truffle from Tom, and it also is waiting for an approval and hopefully can be integrated soon (in time for jdk25?) |
@rkennke Sorry for not being enough responsive here. I will check where this stands. |
|
||
@Override | ||
public Value emitAtomicReadAndWrite(LIRGeneratorTool tool, LIRKind readKind, Value address, Value newValue, BarrierType barrierType) { | ||
// We insert the necessary barriers in the node graph, at that level it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this comment is a bit mislead IMHO - I guess it means the necessary barriers have been added to GraalIR in the frontend of the compiler already?
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not joining earlier review rounds. I left some style/documentation comments, looks good overall.
// @formatter:on | ||
protected void emitCode(CompilationResultBuilder crb, AArch64MacroAssembler masm) { | ||
try (AArch64MacroAssembler.ScratchRegister tmp2 = masm.getScratchRegister(); | ||
AArch64MacroAssembler.ScratchRegister tmp3 = masm.getScratchRegister()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using the same naming as the HotSpot stub, i.e., rename tmp2
to rScratch1
and tmp3
to rScratch2
.
// @formatter:on | ||
public void emitCode(CompilationResultBuilder crb, AArch64MacroAssembler masm) { | ||
Register address = asRegister(addressValue); | ||
Register result = asRegister(resultValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this more meaningful name than the original's tmp2
, but please add a comment indicating the renaming.
|
||
// If expected equals null but result does not equal null, the | ||
// step2 branches to done to report failure of CAS. If both | ||
// expected and tmp2 equal null, the following branches to done to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tmp2
-> result
Label done = new Label(); | ||
GraalError.guarantee(accessKind == AArch64Kind.QWORD || accessKind == AArch64Kind.DWORD, "must be 64 or 32 bit access"); | ||
int size = (accessKind == AArch64Kind.QWORD) ? 64 : 32; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add a brief comment that the control flow is different from the HotSpot stub, namely that step 4 is inlined in the slow path below.
// Two tests because HAS_FORWARDED | WEAK_ROOTS currently is not representable | ||
// as a single immediate. | ||
masm.tst(64, rscratch1, config.shenandoahGCStateHasForwarded); | ||
masm.branchConditionally(AArch64Assembler.ConditionFlag.NE, slowPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AArch64MacroAssembler
has tbz
and tbnz
macro-instructions, could those be used here?
// | ||
// Try to CAS with given arguments. If successful, then we are done. | ||
|
||
// There are two ways to reach this label. Initial entry into the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misplaced comment from the AArch64 version?
GraalError.guarantee(node != null, "input value must not be null"); | ||
StructuredGraph graph = node.graph(); | ||
boolean narrow = node.stamp(NodeView.DEFAULT) instanceof NarrowOopStamp; | ||
ValueNode uncompressed = maybeUncompressReference(node, narrow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems really important here that the maybeUncompressReference
is done without unique because we don't want to reuse an existing uncompress node that is a usage of node
. The implementation does this correctly, but you could make the requirement more explicit like this:
Graph.Mark beforeUncompression = graph.getMark();
ValueNode uncompressed = maybeUncompressReference(node, narrow);
if (uncompressed != node) {
GraalError.guarantee(graph.isNew(beforeUncompression, uncompressed), "we must not reuse an existing uncompress node");
}
NotApplicable.ifApplied(this, StageFlag.BARRIER_ADDITION, graphState), | ||
NotApplicable.unlessRunAfter(this, StageFlag.MID_TIER_LOWERING, graphState), | ||
NotApplicable.ifApplied(this, stage, graphState), | ||
NotApplicable.unlessRunAfter(this, stage == StageFlag.BARRIER_ADDITION ? StageFlag.MID_TIER_LOWERING : StageFlag.LOW_TIER_LOWERING, graphState), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would look nicer if we renamed BARRIER_ADDITION
to MID_TIER_BARRIER_ADDITION
for symmetry. The constructor should check that the stage flag passed in is one of MID_TIER_BARRIER_ADDITION
or LOW_TIER_BARRIER_ADDITION
.
So I have working truffle entry point read barrier changes but that exposed a deeper problem. The current changes rely on running after FixReadsPhase or not having any floating reads but the economy configuration doesn't run FixReadsPhase. We're also not strict about never emitting FloatingReadNodes in the mode so some reads end up without a barrier because they are floating. We're moving towards enforcing no floating reads in economy but it's not how it works yet. So I'm looking at moving the read barrier into the LIR so it matches the ZGC implemenation. This would also remove all the changes to where WriteBarrierAdditionPhase is run. The way I'm approaching this is changing the oop reading node to always return uncompressed oops when a read barrier is used, which avoids any dancing around with compression in the barrier itself. It is largely straightforward and is mostly passing the unit tests. I'm chasing a crash or two but I should have something later today that we can evaluate. |
That sounds great, thank you! Let me know if I can help with anything! |
That idea didn't really pan out as it would have required changes to FloatingReadNode that I didn't really want to make. So I'm looking into enforcing the rule that there should be no floating reads in economy mode. That's likely to be true soon as part of another PR but it's not being enforced there. It's mostly straightforward to enforce and we can resolve any conflicts in how it's done once we're ready to merge. I kind of wanted to push this kind of enforcement anyway, and this gives me a good reason. |
I have a set of changes that disallow floating reads from reaching the backend which makes the read barrier strategy for shenandoah work in economy. It's not completely clear whether that will be pushed as part of this PR or if we might want to separate it. Could you rebase to the latest master? I need to update the CI tasks which recently changed. Once you've rebased I can update my internal PR and mirror that to github with my fixes on top. Then we can finish any required work there. Sound good? |
2832afe
to
cf0d37d
Compare
Sounds good! I rebased my branch to latest master. |
So I've taken your changes and applied some fixes including the changes to have only fixed reads in backend, and the combined PR is at #11941. The fix reads change will be addressed separately for clarity under #11942 and I will rebase once that merges. I can address Gergos final comments there but it seems like it's all working. I had a full clean gate including all benchmarks with shenanadoah last week though I had seem a crash or two during early testing so there might still be some problems lurking. Once it's all clean and ready I'll do a final squash since the history is getting very ugly. Does that sound all good? |
Yep, perfect! Thank you for your help! |
This implements the barriers that are needed to run with Shenandoah GC in the Graal compiler. (Issue: #3472)
There are 3 basic kinds of barriers needed for Shenandoah:
Notice that none of the barriers are implemented as snippets (like Serial/Parallel's card-barriers) or in the backend-only (as ZGC's read-barriers). We needed a way to efficiently deal with compressed-oops, which is not (easily) possible to do in the backend. In the node-graph this is pretty easy: insert the LRB with preceding and succeeding uncompress/compress after any load and before the (potential) uncompress (i.e. turn load->uncompress into load -> (uncompress -> lrb -> compress) -> uncompress) and then let the optimizer optimize away the trailing compress -> uncompress pairs.
In order to support this, we needed a few additions:
X86 port contributed by @JohnTortugo.
Testing:
(We have run those workloads for correctness testing only, we have not (yet) conducted a performance study.)