gather_dict on local error is big bottleneck for large datasets #527
By default, the errors for each view are saved into a dictionary at the end of each block of iterations. Those dictionaries are then gathered across all MPI ranks into a global dictionary, which may be saved into the .ptyr file if `record_local_error` is true in the engine params.

For the high-performance engines, this MPI gathering of error dictionaries can be a major bottleneck. In this PR I have made the collection of per-view error metrics optional, controlled by the existing `record_local_error` parameter. If per-view errors are not needed, the errors are first reduced on each rank and then combined with an MPI allreduce across all ranks. This completely removes the bottleneck but still allows collecting the per-view errors if required. By default, `record_local_error` is false.
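The difference between the two collection paths can be illustrated with a small sketch. It is not the code in this PR: the ranks are simulated as a plain list of dictionaries so that it runs without MPI, the function names and view keys are hypothetical, and it assumes that "reduced" means a sum over three error metrics per view. In a real MPI program the first function would correspond to a dictionary gather (for example `comm.gather` in mpi4py) and the last step of the second function to `comm.allreduce`.

```python
# Hypothetical illustration only: each "rank" is a dict mapping a view
# name to a tuple of error metrics. No MPI installation is required.

def gather_view_errors(per_rank_errors):
    """Per-view path (record_local_error true): merge every rank's
    dictionary into one global dictionary. The amount of data moved
    grows with the total number of views."""
    merged = {}
    for rank_errors in per_rank_errors:
        merged.update(rank_errors)
    return merged


def reduce_view_errors(per_rank_errors, n_metrics=3):
    """Reduced path (record_local_error false): each rank sums its own
    views first, so only n_metrics numbers per rank are combined."""
    local_sums = []
    for rank_errors in per_rank_errors:
        total = [0.0] * n_metrics
        for err in rank_errors.values():
            total = [t + e for t, e in zip(total, err)]
        local_sums.append(total)
    # Stand-in for an allreduce with a SUM operation across ranks.
    return [sum(column) for column in zip(*local_sums)]


# Two simulated ranks holding three views in total.
rank0 = {"V0000": (1.0, 2.0, 3.0), "V0001": (0.5, 0.5, 0.5)}
rank1 = {"V0002": (2.0, 1.0, 0.0)}

all_errors = gather_view_errors([rank0, rank1])
total_error = reduce_view_errors([rank0, rank1])
```

In the reduced path the size of the exchanged data is independent of the number of views, which is why it avoids the bottleneck for large datasets.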