cleanup redirected stdout crash before re-redirecting #2311
This PR fixes #2269 by merging any per-rank log files left over from a previous crash before setting up a new per-rank stdout/stderr log redirect. Previously these log files would get overwritten, losing the record of the crash.
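To illustrate the idea, here is a minimal sketch of merging leftover per-rank files into the main log before a new redirect starts. It is not the code in this PR: the helper name `merge_leftover_rank_logs`, the `-rank<N>` suffix pattern, and the separator line it writes are assumptions made for the example.

```python
import glob
import os
import re


def merge_leftover_rank_logs(logfile):
    """Append leftover per-rank files (logfile-rankN) to logfile, then delete them.

    Hypothetical helper for illustration only; returns the merged filenames.
    """
    # Assumed naming convention: <logfile>-rank<N>
    pattern = re.compile(re.escape(logfile) + r"-rank(\d+)$")
    leftovers = []
    for path in glob.glob(glob.escape(logfile) + "-rank*"):
        match = pattern.match(path)
        if match:
            leftovers.append((int(match.group(1)), path))
    leftovers.sort()  # numeric rank order, so rank10 comes after rank2

    if not leftovers:
        return []

    # Append rather than overwrite, so the record of the crash is preserved
    with open(logfile, "a") as out:
        for rank, path in leftovers:
            out.write(f"--- leftover output from rank {rank} ---\n")
            with open(path) as src:
                text = src.read()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")
            os.remove(path)
    return [path for _, path in leftovers]
```

Calling something like this before opening new per-rank files means a rerun after a hard crash keeps the earlier output instead of clobbering it.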
I tested this with `$CFS/desi/users/sjbailey/dev/stdout_redirect/mpi_test_redirect`, which purposefully crashes a requested rank.
Detail: previously the per-rank files had names like `output.log_0`, which was confusingly similar to the backup names like `output.log.0`. I changed this to `output.log-rank0` instead to make it clearer which files are leftover per-rank files and which are the backups.

Note: if a crash happens cleanly with an exception, `stdouterr_redirected` recovers and is able to merge the per-rank files. The leftover per-rank logfiles case comes from hard crashes like OOM or, in this test case, from a purposeful segfault.
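As a rough illustration of why the new suffix helps, the sketch below tells per-rank files apart from backups by name alone. The `classify` function and both regular expressions are hypothetical and only mirror the naming described above; they are not taken from the PR.

```python
import re

# Assumed patterns, based on the names quoted in this description
RANK_RE = re.compile(r"^.+-rank\d+$")   # e.g. output.log-rank0
BACKUP_RE = re.compile(r"^.+\.\d+$")    # e.g. output.log.0


def classify(filename):
    """Return 'per-rank', 'backup', or 'other' for a log filename."""
    if RANK_RE.match(filename):
        return "per-rank"
    if BACKUP_RE.match(filename):
        return "backup"
    return "other"
```

With the old `output.log_0` style, the two kinds of file differed only by `_` versus `.`, which is easy to misread; the explicit `-rank` marker makes them unambiguous to both people and glob patterns.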