-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rest_vol_dataset: (feature) Fixed memory-side hyperslab #58
rest_vol_dataset: (feature) Fixed memory-side hyperslab #58
Conversation
08a12d8
to
a3974cb
Compare
|
||
/* For contiguous, all faster running dimensions than the current dimension should be selected | ||
* completely */ | ||
whole = (start[ndims - 1] == 0) && (count[ndims - 1] * block[ndims - 1] == dims[ndims - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to require that the fastest running/last dimension be selected completely, which I don't think is necessary for the selection to be contiguous.
Should the comment say that all slower running dimensions than the last dimension must be contiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought yesterday until I did a small test which shows it was a wrong thought. Now I read contiguous selection as a selection which is contiguous in memory. If the fastest running/last dimension is not completely selected you need additional jumps to get the right location in memory. A way to visual this, at least for me, is by considering a 2D example and make various selections. For each selection translate it to which memory locations you need to get the selected data.
The current tests don't cover any cases where a selection is non-contiguous, or where the value returned from |
6b09cf9
to
cb4ac45
Compare
In my latest commit, I several improvements of the |
src/rest_vol_dataset.c
Outdated
else { | ||
if ((offset = RV_convert_start_to_offset(mem_space_id[0])) < 0) | ||
FUNC_GOTO_ERROR(H5E_DATASET, H5E_BADVALUE, FAIL, "Unable to determine memory offset value"); | ||
buf[0] = buf[0] + offset * dtype_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for this? I think library functions should already take this into account, since they'll have access to the memory dataspace, and this could potentially lead to issues if the library expects the empty space at the start of the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of binary transfer, the variable buf[0]
is used to set uinfo.buffer
and it assumes that is the starting location in memory of the data that needs to be transferred. The amount of data that needs to be transferred is specified by uinfo.buffer_size
which is set to write_body_len
. In the case of contiguous dataspace selection this starting point does not have to be at the input memory location of buf[0]
but has some offset to it. Therefore, one needs to compute this offset and add it to buf[0]
. This is done all to avoid unnecessary overhead as requested by @jhendersonHDF in PR #50. A way to visual this case is to consider a 1D example and make a selection which is contiguous but does not start at the first memory position.
If desired I can modify one of the hyperslab tests such that the offset is non-zero. Be aware, coming week I have no time to make this modification but week after that is no problem for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. It would be best to include a test with a nonzero offset, since this is new functionality.
4853e85
to
769f118
Compare
I have implemented an additional test in which the memory hyperslab is continuous and the memory offset is non-zero. Furthermore, I have re-implemented my changes based on the latest master branch. The main reason for this, is that due to the recent added support of multi-datasets it was easier to reimplement it than rebasing my old branch to the latest master. |
src/rest_vol_dataset.c
Outdated
FUNC_GOTO_DONE(TRUE); | ||
|
||
if (H5S_SEL_HYPERSLABS == H5Sget_select_type(space_id)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function will report that any point selection is contiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is right. That was the original implementation.
Reading at the documentation of H5Sselect_elements()
it is not complete clearly to me how it works. Looking at the given 1D example, should I read it like one selects 1st, 14th, 17th, 23rd and 8th element of 1D array which holds the data at these points. When calling H5Dwrite()
this data array should have a length of at least 23 elements. If this is the correct way of looking at it then any point selection should be non-contiguous. In this case the correct data will be written to the right location. Of course it is sub-optimal for the case when the points are continuous in memory. I can implement this like this or is it desired to also check if the selected points are contiguous in memory to possible overhead caused by additional the memory copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you're right. When using a point selection, the memory buffer and memory dataspace should be contiguous (see Example 7 here) even if the file selection isn't, so this behavior makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we do the following to resolve the point selection case:
- Implement any point selection is non-contiguous such this PR can be merged.
- Next, I create a separate PR in which I improve the point selection case by taking into account if it is continuous and if so compute the corresponding offset in memory. Similar as the hyperslab case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I spoke with Jordan about this, and most point selections are not contiguous in memory, so it's safest to assume they are non-contiguous and perform the memory management in H5Dgather
One thing about your previous example - when calling H5Dwrite()
, the data array would only need a length of 5 elements, to hold the 1st, 14th, 17th, 23rd, and 8th elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information. It makes me realize that I have to take a closer at the point selection to get a better understanding how it actually works.
In my latest commit, I have implemented the assumption that any point selection is non-contiguous in memory.
Non-contiguous and nonzero offset cases look good. Should be good to merge once the point selection |
- Improved the implementation of memory-side hyperslab when writing data by taking into account if the selected data is contiguous and starting memory offset. - Fixed various hyperslab and point tests. - Implemented a test in which the memory-side hyperslab is non-contiguous in memory. - Implemented a test in which the memory-side hyperslab is contiguous in memory but with a non-zero offset in memory.
769f118
to
0f147d5
Compare
- Implement the assumption that any point selection is non-contiguous in memory in both RV_dataspace_selection_is_contiguous() and RV_convert_start_to_offset() function.
0f147d5
to
3267575
Compare
block[0] = 1; | ||
if ((mspace_id = H5Screate_simple(1, mdims, NULL)) < 0) | ||
TEST_ERROR | ||
if (H5Sselect_hyperslab(mspace_id, H5S_SELECT_SET, start, stride, count, block) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that selecting a hyperslab in the memory space here isn't really necessary since it's just a contiguous block across all dims[1] * 2
elements in the memory space that is created, but it's still ok to do so here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good to know this. In our code, we decided to always explicitly select a hyperslab to avoid the confusion which points have been selected.
for (i = 0; i < dims[1]; i++) | ||
((int *)write_buf)[i] = 68; | ||
for (i = dims[1]; i < data_size / DATASET_DATA_VERIFY_WRITE_TEST_DSET_DTYPESIZE; i++) | ||
((int *)write_buf)[i] = 67; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, shouldn't the condition in the second for loop here also be i < dims[1]
? Otherwise it looks like the first loop sets half the elements in write_buf
and then the second loop starts at an offset of dims[1]
in write_buf
and sets the full number of elements, which would write dims[1]
elements past the end of write_buf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my mistake, ignore. I see what you're doing here now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to ignore this comment.
src/rest_vol_dataset.c
Outdated
"can't allocate space for the 'write_body' values"); | ||
if (H5Dgather(transfer_info[i].mem_space_id, buf[i], transfer_info[i].mem_type_id, | ||
write_body_len, transfer_info[i].u.write_info.write_body, NULL, | ||
transfer_info[i].u.write_info.write_body) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an issue really, but you can probably pass NULL as the last parameter here since NULL was provided for H5Dgather's callback function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that is much better. I have implemented the change.
src/rest_vol_dataset.c
Outdated
FUNC_GOTO_ERROR(H5E_DATASPACE, H5E_CANTGET, FAIL, "can't get number of hyperslab blocks"); | ||
|
||
if (H5Sget_regular_hyperslab(space_id, start, stride, count, block) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be a good idea to first check that the hyperslab is regular by calling H5Sis_regular_hyperslab
and then only call H5Sget_regular_hyperslab
if it is regular. Otherwise, we could either reject non-regular hyperslabs for now or you could refer to the code in https://github.com/HDFGroup/hdf5/blob/develop/src/H5Shyper.c#L5123 to implement checking if a non-regular hyperslab is contiguous. It's tricky though because we could only really support contiguous, non-regular hyperslabs by converting them to a series of points before sending to HSDS since there's currently no way to send non-regular hyperslab selections directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, as suggested I implemented the check H5Sis_regular_hyperslab()
and if it is a non-regular hyperslab the function will return it is non-contiguous in memory. Furthermore, it sounds like to properly support non-regular hyperslabs require much more work. I think it is better to create a separate PR for it.
src/rest_vol_dataset.c
Outdated
else if (H5S_SEL_POINTS == H5Sget_select_type(space_id)) { | ||
/* Assumption: any point selection is non-contiguous in memory */ | ||
FUNC_GOTO_DONE(FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this function mirrors what the library is currently doing (https://github.com/HDFGroup/hdf5/blob/develop/src/H5Spoint.c#L1817), so I'm content with this. I created HDFGroup/hdf5#3536 because I think this whole function is something that HDF5 can very easily expose and would save people from needing to spend effort on implementing this functionality when it already exists in the library and just isn't exposed publicly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that would be great improvement. In that case everyone benefit from it.
src/rest_vol_dataset.c
Outdated
"can't allocate space for hyperslab selection 'start' values"); | ||
|
||
if (H5Sget_regular_hyperslab(space_id, start, NULL, NULL, NULL) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about checking for a regular hyperslab with H5Sis_regular_hyperslab
. Alternatively, I think that H5Sget_select_bounds
may still fit this use case and should work with both regular and non-regular hyperslabs so that you don't have to worry about how to handle non-regular hyperslabs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been implemented.
src/rest_vol_dataset.c
Outdated
FUNC_GOTO_ERROR(H5E_DATASPACE, H5E_BADVALUE, -1, | ||
"for point selection, computing the offset is not supported"); | ||
} /* end else if */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to handle the other selection types (all, none, error, N) just for completeness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. These cases have been added.
src/rest_vol_dataset.c
Outdated
/* Assumption: any point selection is non-contiguous in memory */ | ||
FUNC_GOTO_DONE(FALSE); | ||
} /* end else if */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to handle the other selection types (all, none, error, N) just for completeness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other cases have been added.
@jwsblokland Thanks for all your work on this PR! I think it's very close with just a few minor comments on improvements that could be made. |
- The function RV_dataspace_is_contiguous() will return a non-regular hyperslab as non-contiguous in memory. - Generalized the function RV_convert_start_to_offset() such it works for both regular and non-regular hyperslab. - For completeness handle the other selection types explicitly in the function RV_dataspace_is_contiguous() and RV_convert_start_to_offset().
7e8274d
to
f22aa9c
Compare
@jhendersonHDF Similar as the improvement I made for ROS3, it was fun implementing this improvement. Again, I got a better understanding of HDF5 but now about dataspaces. I addressed your comments. At least that is my intent. Regarding the support of non-regular hyperslabs, I think it is better to create a separate PR for it. Reading your comment about it, it sounds like this is not trivial at all. |
case H5S_SEL_ERROR: | ||
case H5S_SEL_N: | ||
case H5S_SEL_NONE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to get this PR merged, but if you could make the following changes in a future PR I'd appreciate it:
- H5S_SEL_ALL should return true for is_contiguous
- H5S_SEL_NONE should return false for is_contiguous
- H5S_SEL_ERROR and H5S_SEL_N should make this function return FAIL since it shouldn't get called with those
case H5S_SEL_NONE: | ||
default: | ||
ret_value = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about H5S_SEL_ERROR and H5S_SEL_N making this function return a failure value
- Improved the function RV_dataspace_selection_is_contiguous() and RV_convert_start_to_offset at suggested in pull request HDFGroup#58 on Github. In short, now the switch statements handle the various dataspace types correctly.
- Improved the function RV_dataspace_selection_is_contiguous() and RV_convert_start_to_offset at suggested in pull request HDFGroup#58 on Github. In short, now the switch statements handle the various dataspace types correctly.
…us() and convert_start_to_offset() (#78) * rest_vol_dataset: (fix) is_contiguous() and offset - Improved the function RV_dataspace_selection_is_contiguous() and RV_convert_start_to_offset at suggested in pull request #58 on Github. In short, now the switch statements handle the various dataspace types correctly. * General: (fix) Corrected typos - Corrected a typo in the file README.md and CMakeInstallation.cmake such that codespell does not report any errors.
This is improved version of PR #50. Related PR #57. Most likely a combination of PR #57 and #58 will result in the best solution.