Feature: Checkpointing for T8codeMesh #1980
base: main
Review checklist
This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.
Purpose and scope
Code quality
Documentation
Testing
Performance
Verification
Created with ❤️ by the Trixi.jl community.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1980 +/- ##
==========================================
- Coverage 96.23% 87.12% -9.10%
==========================================
Files 462 462
Lines 37075 37233 +158
==========================================
- Hits 35676 32439 -3237
- Misses 1399 4794 +3395
Great feature. Thanks a lot!
examples/t8code_2d_dgsem/elixir_advection_amr_unstructured_flag.jl
I get the feeling that the MPI tests are too big now and take too long. We probably have to split them up similar to the serial tests.
Co-authored-by: Joshua Lampert <[email protected]>
Yes, could be related to OOM issues, cf. #1471.
I could narrow it down. It has something to do with Julia 1.10.4. With Julia 1.10.2 it does not stall. Investigating ...
Are you able to reproduce the problem locally?
Yes! With Julia 1.10.2 the t8code MPI tests run successfully. However, with Julia 1.10.4 the MPI test for
Are you sure it's related to the patch version bump? Are you using an identical Manifest.toml for both tests?
Yes! Working from the exact same project folder. Just pointing the Julia binary to either 1.10.2 or 1.10.4.
So it consistently stalls with Julia 1.10.4, but consistently works with Julia 1.10.2 in multiple runs? Did you monitor RAM usage during the simulation?
Yes! RAM usage is not out of the ordinary.
I think I found the bug causing the stalls in the MPI runs. It was a silent memory leak/segfault. I added the fixes in the last commit. Furthermore, I changed the t8code C interface a tiny bit to simplify the code on the Trixi side. This PR has to wait for the next breaking t8code release and specifically for the merge of this PR: DLR-AMR/t8code#1115. I'll try to push for a major t8code release by the end of next week.
This PR adds checkpointing for T8codeMesh. With this, routines like save_mesh and load_mesh are supported.
Closes #2044