Fix Subfile Read Issue #62
Open: yzanhua wants to merge 22 commits into master from subfile-read
We need subfiling test programs to test this PR.
When opening a subfile, fp->nldset and fp->nmdset were not read from the subfile. Instead, they were read from the master file. This commit fixes the issue so that both are read from the subfile.
A file may be created with, for example, 8 subfiles but later opened and read with only 4 processes. Before this fix, the information about all 8 subfiles was not saved: only the first fp->ngroup subfiles were opened for read, where fp->ngroup is bounded by the number of processes (i.e. <= 4 in this case). With this fix, fp->nsubfiles stores the number of subfiles of an opened file, and all fp->nsubfiles subfiles are opened for read.
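The coverage gap described above can be sketched in a few lines. This is only an illustration in Python, not Log VOL code: the function names are hypothetical, `ngroup = min(nsubfiles, nproc)` is an assumption standing in for "bounded by the number of processes", and the round-robin assignment of subfiles to ranks is an assumed strategy that the commit message does not specify.

```python
def subfiles_opened_before_fix(nsubfiles, nproc):
    """Subfiles reachable when only the first ngroup subfiles are opened.

    Assumption: ngroup is modelled as min(nsubfiles, nproc).
    """
    ngroup = min(nsubfiles, nproc)
    return list(range(ngroup))


def subfiles_for_rank(rank, nproc, nsubfiles):
    """Hypothetical round-robin assignment: rank r takes r, r + nproc, ..."""
    return list(range(rank, nsubfiles, nproc))


def subfiles_opened_after_fix(nsubfiles, nproc):
    """Union of the subfiles opened by all ranks under the assumed assignment."""
    opened = set()
    for rank in range(nproc):
        opened.update(subfiles_for_rank(rank, nproc, nsubfiles))
    return sorted(opened)
```

With 8 subfiles and 4 processes, the first function reaches only subfiles 0 to 3, while the second covers all 8.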
MPI_File_set_view is a collective call. Before this fix, not all processes called this function during a dataset read, which could cause hangs. A subfile read test case may trigger this issue more easily. This commit fixes the issue.
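Why a skipped collective call hangs can be shown with a toy simulation. The sketch below is not MPI code: it uses Python threads and `threading.Barrier` as a stand-in for a collective operation, and `run_collective` and its parameters are hypothetical names. A timeout is used so that the "hang" is reported instead of blocking forever.

```python
import threading


def run_collective(nproc, participates, timeout=1.0):
    """Simulate nproc ranks entering a collective call.

    participates(rank) decides whether a rank makes the call.
    Returns one status string per rank.
    """
    barrier = threading.Barrier(nproc)  # stand-in for the collective call
    results = [None] * nproc

    def worker(rank):
        if not participates(rank):
            results[rank] = "skipped"
            return
        try:
            barrier.wait(timeout=timeout)
            results[rank] = "completed"
        except threading.BrokenBarrierError:
            # In real MPI there is no timeout; the rank would block here.
            results[rank] = "hung"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nproc)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

When every rank participates, all of them complete. If a single rank skips the call, every other rank is left waiting, which is the failure mode the commit message describes.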
Test the following subfile configurations:
1. nsubfile > nproc
2. nsubfile == nproc
3. nsubfile < nproc

For each of the above, test these read patterns:
1. The read pattern is the same as the write pattern (row-wise).
2. The read pattern is row-wise, but each process reads a different row than it wrote (it reads from one subfile that the process is not responsible for).
3. The read pattern is column-wise (it reads from several subfiles).
4. Read the whole dataset.

This gives a total of 12 scenarios. Each scenario is also run with 1 to 12 processes, which makes sure Log VOL also works with an odd number of processes.
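The test matrix above is the cross product of the three subfile configurations and the four read patterns. A minimal sketch of that enumeration follows; the labels and the `scenarios` helper are illustrative names, not identifiers from the actual test programs.

```python
from itertools import product

# Labels taken from the commit message; the strings are illustrative only.
SUBFILE_CASES = ["nsubfile > nproc", "nsubfile == nproc", "nsubfile < nproc"]
READ_PATTERNS = [
    "row-wise, same rows as written",
    "row-wise, different row than written",
    "column-wise",
    "whole dataset",
]
# Each scenario is run with 1 to 12 processes.
PROCESS_COUNTS = list(range(1, 13))


def scenarios():
    """Return every (subfile case, read pattern) pair."""
    return list(product(SUBFILE_CASES, READ_PATTERNS))
```

Calling `scenarios()` yields the 12 combinations listed in the commit message, each of which is then repeated for every entry in `PROCESS_COUNTS`.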
yzanhua force-pushed the subfile-read branch 3 times, most recently from f2020f9 to c15c360 on June 5, 2023 11:52
Several issues currently exist in the subfile read feature.
Running the following E3SM-IO benchmark with a commit newer than e3aef7c results in several errors:

    # e3sm-io commands, this enables subfile feature
    ./e3sm_io -g 1 -k -a hdf5_log -x log -o ./test_output/hdf5_log_log_map_i_case_16p.h5 path-to-datasets/map_i_case_16p.h5

- An `H5CX_xx` call complains about invalid IDs. This issue is related to `H5VL_logi_reset_lib_stat` calls. Adding only one (pair of) `H5VL_logi_reset_lib_stat` around `H5VL_log_filei_flush` is not enough. We need to add two (pairs) inside `H5VL_log_filei_flush`, one for `H5VL_log_nb_flush_write_reqs` and one for `H5VL_log_nb_flush_read_reqs`.
- [...] `H5VL_log_filei_open_subfile` will occur. This error complains about an invalid VOL id, which should also be related to lib_stat calls.
- [...] `H5VL_log_filei_open_subfile` is not correct. This issue is fixed.
- [...] `H5VL_log_filei_create_subfile`. The HDF5 issue "VOL: Cannot Create and Write an Attribute at File Create Time" (hdf5#2220) should be a similar issue. We already followed their advice to move everything to the post-open callback but still have the issue.

Currently the subfile read feature is disabled, and using HDF5 1.14.0 in production mode should not give errors.