[WIP] CRaC POC #8743
Workaround for: Error (criu/cr-dump.c:203): 18 has rseq but kernel lacks get_rseq_conf feature Signed-off-by: Daniel Kec <[email protected]>
Hi @danielkec, I've played with this a bit to let CRaC checkpoint the webserver after it starts: https://github.com/rvansa/helidon/tree/crac-poc
Referring to my changes ^: Actually, the case where a
Signed-off-by: Daniel Kec <[email protected]>
curl --retry 10 --retry-all-errors --retry-delay 1 http://localhost:7001
printf "\n==== Warming up ...\n"
wrk -c 16 -t 16 -d 10s http://localhost:7001
Hi @rvansa, thanks for the cool fix and sorry for the delay. It seems to be working for me, but when I do a little warmup before the snapshot, the snapshot fails with:
An exception during a checkpoint operation:
jdk.internal.crac.mirror.CheckpointException
Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=165 type=unknown path=anon_inode:[eventpoll]
at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:299)
Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=183 type=unknown path=anon_inode:[eventpoll]
at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:299)
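When the list of suppressed exceptions gets long, it can help to pull just the descriptor numbers and paths out of the log. A minimal sketch, assuming the exception lines keep the `FD fd=N type=T path=P` shape shown above; the function name `extract_open_fds` is hypothetical and not part of CRaC or Helidon:

```shell
# Hypothetical helper (not part of CRaC or Helidon).
# Reads a checkpoint log on stdin and prints one "fd type path" line for
# every CheckpointOpenResourceException it finds. Assumes the
# "FD fd=N type=T path=P" message format seen in the trace above.
extract_open_fds() {
  sed -n 's/.*CheckpointOpenResourceException: FD fd=\([0-9][0-9]*\) type=\([^ ]*\) path=\(.*\)$/\1 \2 \3/p'
}
```

Piping the saved output of the failed checkpoint through `extract_open_fds` leaves only the offending descriptors, with the stack frames dropped.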
Looks like there's a native component opening those epoll FDs; I wonder why this didn't pop up with a single request for warmup. Native FDs call for investigation through strace: https://github.com/CRaC/docs/blob/master/debugging.md#file-descriptors-in-native-code
I'll try to reproduce locally.
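Before reaching for strace, a quick look at the process's descriptor table can confirm which FDs are epoll instances. A rough sketch, assuming a Linux host where `/proc/<pid>/fd` entries are symlinks whose targets read like `anon_inode:[eventpoll]`; the helper name `list_fds_matching` is hypothetical:

```shell
# Hypothetical helper: list_fds_matching DIR PATTERN
# Prints "<fd> <target>" for every symlink in DIR whose target contains
# PATTERN as a literal substring. DIR is expected to be /proc/<pid>/fd.
list_fds_matching() {
  fd_dir=$1
  pattern=$2
  for link in "$fd_dir"/*; do
    # Skip anything that is not a symlink (including an unmatched glob).
    [ -L "$link" ] || continue
    target=$(readlink "$link") || continue
    case $target in
      *"$pattern"*) printf '%s %s\n' "${link##*/}" "$target" ;;
    esac
  done
}
```

For example, `list_fds_matching /proc/<pid>/fd eventpoll` run against the warmed-up JVM should list the same descriptors the checkpoint complains about. This only shows that the descriptors exist, not which code opened them; that part still needs the strace approach from the linked document.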
Hi @danielkec, I can confirm this is an issue on the JDK (CRaC) side. There are some codepaths in sun.nio triggered when the socket is created from a virtual thread, and we did not have test coverage for that case.
Hi @danielkec, sorry for the delay. The fix should be in the latest release on https://www.azul.com/downloads/#downloads-table-zulu - can you try with https://cdn.azul.com/zulu/bin/zulu22.32.17-ca-crac-jdk22.0.2-linux_x64.tar.gz ?
Any luck with the latest release?
I can confirm that 22.32.17 fixes the issue
Helidon MP on CRaC
Coordinated Restore at Checkpoint
Helidon MP Implicit example on CRaC
examples/crac/README.md