Alternate implementation of MaxConcurrentIO parameter #41

Draft: wants to merge 8 commits into master
jchelly (Collaborator) commented May 31, 2024

On the FLAMINGO 10k run I've been finding that the code is very slow unless all ranks are allowed to read at the same time. I think this might be because, when the system is busy, a few ranks suffer long delays and the others are forced to wait for them. The current implementation divides the MPI ranks into groups and only one group at a time may read, so none of the ranks in the next group can start until ALL ranks in the previous group have finished.
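To make the cost of the group scheme concrete, here is a small timing model. It is a sketch under simplifying assumptions: each rank's read time is known up front, ranks are assigned to groups in rank order in blocks of MaxConcurrentIO, a group starts only when every rank in the previous group has finished, and read speed does not depend on how many ranks are reading. The function name `group_makespan` is hypothetical and not part of the code base.

```python
from typing import Sequence


def group_makespan(read_times: Sequence[float], max_concurrent_io: int) -> float:
    """Total wall-clock time when ranks read in fixed groups.

    Ranks are split into consecutive groups of size max_concurrent_io.
    A group cannot start until every rank in the previous group has
    finished, so each group costs as much as its slowest member.
    """
    if max_concurrent_io < 1:
        raise ValueError("max_concurrent_io must be at least 1")
    total = 0.0
    for start in range(0, len(read_times), max_concurrent_io):
        group = read_times[start:start + max_concurrent_io]
        total += max(group)  # the whole group waits for its slowest rank
    return total


if __name__ == "__main__":
    # Eight ranks, four allowed to read at once; one slow rank in each group.
    times = [5.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
    print(group_makespan(times, 4))  # prints 10.0
```

Even though six of the eight ranks finish in one time unit, every group is held back by its slowest member, so the total is 5 + 5.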

This pull request modifies the code so that as soon as any one rank finishes reading, another is immediately allowed to start. This is implemented by having the first rank which finishes reading become responsible for signalling the others to start.
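The effect of this change can be illustrated with a minimal timing model: at most MaxConcurrentIO ranks read at once, and whenever one finishes, the next waiting rank (in rank order) takes its slot immediately. This is a sketch under stated assumptions (read times known in advance, read speed unaffected by concurrency); it models only the scheduling, not the MPI signalling used in the pull request, and `sliding_window_makespan` is a hypothetical name.

```python
import heapq
from typing import List, Sequence


def sliding_window_makespan(read_times: Sequence[float], max_concurrent_io: int) -> float:
    """Total wall-clock time when a new rank starts as soon as any reader finishes.

    At most max_concurrent_io ranks read at once. Ranks start in rank order,
    and each waiting rank takes the earliest slot to become free.
    """
    if max_concurrent_io < 1:
        raise ValueError("max_concurrent_io must be at least 1")
    # Min-heap of finish times for the ranks currently holding a read slot.
    active: List[float] = []
    makespan = 0.0
    for t in read_times:
        if len(active) < max_concurrent_io:
            start = 0.0  # a slot is free from the beginning
        else:
            start = heapq.heappop(active)  # wait for the earliest reader to finish
        finish = start + t
        heapq.heappush(active, finish)
        makespan = max(makespan, finish)
    return makespan


if __name__ == "__main__":
    # Eight ranks, four allowed to read at once; ranks 0 and 4 are slow.
    times = [5.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
    print(sliding_window_makespan(times, 4))  # prints 6.0
```

With the same input, fixed groups of four would take 5 + 5 = 10 time units, because the second group cannot start until rank 0 finishes. Here rank 4 starts at t = 1, as soon as one of the fast ranks frees a slot, so the two slow reads overlap.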

jchelly added 8 commits May 29, 2024 16:25
The previous implementation split the MPI ranks into groups and
prevented any ranks in the next group from proceeding until all
ranks in the current group had finished. This will waste a lot
of time if one (or a few) ranks are very slow.

This new implementation tries to make sure that we always have
MaxConcurrentIO ranks reading. When any one rank finishes, the
next is allowed to start immediately.
jchelly (Collaborator, Author) commented Dec 16, 2024

I also found that on COLIBRE L400M7 DMO the code suddenly started taking a long time to write the output at one particular snapshot. This pull request might help somewhat if the slowdown is due to certain OSTs (Lustre object storage targets) being slow.

I think it might also be worth looking into increasing the metadata block size (H5Pset_meta_block_size) to avoid small writes, and using paged file space management (H5Pset_file_space_strategy) to align data blocks with Lustre stripes.
