Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rest_vol_dataset: (fix) Write hyperslab (memory) #50

Closed

Conversation

jwsblokland
Copy link
Contributor

  • Improved the function RV_dataset_write() such it makes use of the function H5Dgather() to gather the data needed for writing a hyperslab (memory) to a file. This approach is similar to the one found in the function RV_dataset_read().
  • Adapted the hyperslab and point selection test.

@mattjala
Copy link
Collaborator

The CI failures are due to an edge case in random datatype testing trying to create a zero-length string, not an issue with this PR.

Otherwise, LGTM.

mattjala
mattjala previously approved these changes Aug 24, 2023
@@ -829,6 +829,13 @@ RV_dataset_write(size_t count, void *dset[], hid_t mem_type_id[], hid_t mem_spac
FUNC_GOTO_ERROR(H5E_DATATYPE, H5E_BADVALUE, FAIL, "memory datatype is invalid");

write_body_len = (size_t)file_select_npoints * dtype_size;
if (NULL == (write_body = (char *)RV_malloc(write_body_len)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth only allocating this buffer and calling H5Dgather if the memory space wasn't set to H5S_ALL and has a non-contiguous selection in it. Otherwise, the H5Dgather call could potentially add overhead for no reason. Checking for H5S_ALL should be straightforward, but detecting a non-contiguous selection may take a little effort in the way of H5S calls.

I also noticed that the code below for point selections allocates the same write_body buffer. If I'm not mistaken, it looks like writing with a point selection could cause the memory allocated here to be leaked.

Copy link
Collaborator

@mattjala mattjala Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like point selection does cause a memory leak. From test_rest_vol:

Testing small write to dataset w/ point selection                         
==309263== Invalid read of size 2
==309263==    at 0x48529E0: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==309263==    by 0x497F43D: H5D__gather_mem (H5Dscatgath.c:417)
==309263==    by 0x496BB55: H5Dgather (H5D.c:1756)
==309263==    by 0x48F44E2: RV_dataset_write (rest_vol_dataset.c:835)
==309263==    by 0x4B1E977: H5VL__dataset_write (H5VLcallback.c:2236)
==309263==    by 0x4B24335: H5VL_dataset_write_direct (H5VLcallback.c:2280)
==309263==    by 0x496767B: H5D__write_api_common (H5D.c:1331)
==309263==    by 0x496A71F: H5Dwrite (H5D.c:1388)
==309263==    by 0x132BFE: test_write_dataset_small_point_selection (test_rest_vol.c:7535)
==309263==    by 0x10F869: main (test_rest_vol.c:18250)
==309263==  Address 0x777fcf0 is 32 bytes before a block of size 48 in arena "client"
==309263== 
==309263== Invalid read of size 1
==309263==    at 0x48528F1: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==309263==    by 0x497F43D: H5D__gather_mem (H5Dscatgath.c:417)
==309263==    by 0x496BB55: H5Dgather (H5D.c:1756)
==309263==    by 0x48F44E2: RV_dataset_write (rest_vol_dataset.c:835)
==309263==    by 0x4B1E977: H5VL__dataset_write (H5VLcallback.c:2236)
==309263==    by 0x4B24335: H5VL_dataset_write_direct (H5VLcallback.c:2280)
==309263==    by 0x496767B: H5D__write_api_common (H5D.c:1331)
==309263==    by 0x496A71F: H5Dwrite (H5D.c:1388)
==309263==    by 0x132BFE: test_write_dataset_small_point_selection (test_rest_vol.c:7535)
==309263==    by 0x10F869: main (test_rest_vol.c:18250)
==309263==  Address 0x777feac is 28 bytes after a block of size 48 in arena "client"
==309263== 
==309263== Use of uninitialised value of size 8
==309263==    at 0x48C8218: RV_base64_encode (rest_vol.c:3100)
==309263==    by 0x48F4A30: RV_dataset_write (rest_vol_dataset.c:900)
==309263==    by 0x4B1E977: H5VL__dataset_write (H5VLcallback.c:2236)
==309263==    by 0x4B24335: H5VL_dataset_write_direct (H5VLcallback.c:2280)
==309263==    by 0x496767B: H5D__write_api_common (H5D.c:1331)
==309263==    by 0x496A71F: H5Dwrite (H5D.c:1388)
==309263==    by 0x132BFE: test_write_dataset_small_point_selection (test_rest_vol.c:7535)
==309263==    by 0x10F869: main (test_rest_vol.c:18250)
==309263== 
==309263== Use of uninitialised value of size 8
==309263==    at 0x48C8298: RV_base64_encode (rest_vol.c:3105)
==309263==    by 0x48F4A30: RV_dataset_write (rest_vol_dataset.c:900)
==309263==    by 0x4B1E977: H5VL__dataset_write (H5VLcallback.c:2236)
==309263==    by 0x4B24335: H5VL_dataset_write_direct (H5VLcallback.c:2280)
==309263==    by 0x496767B: H5D__write_api_common (H5D.c:1331)
==309263==    by 0x496A71F: H5Dwrite (H5D.c:1388)
==309263==    by 0x132BFE: test_write_dataset_small_point_selection (test_rest_vol.c:7535)
==309263==    by 0x10F869: main (test_rest_vol.c:18250)
==309263== 
==309263== Use of uninitialised value of size 8
==309263==    at 0x48C82DB: RV_base64_encode (rest_vol.c:3111)
==309263==    by 0x48F4A30: RV_dataset_write (rest_vol_dataset.c:900)
==309263==    by 0x4B1E977: H5VL__dataset_write (H5VLcallback.c:2236)
==309263==    by 0x4B24335: H5VL_dataset_write_direct (H5VLcallback.c:2280)
==309263==    by 0x496767B: H5D__write_api_common (H5D.c:1331)
==309263==    by 0x496A71F: H5Dwrite (H5D.c:1388)
==309263==    by 0x132BFE: test_write_dataset_small_point_selection (test_rest_vol.c:7535)
==309263==    by 0x10F869: main (test_rest_vol.c:18250)
==309263== 
==309263== Use of uninitialised value of size 8
==309263==    at 0x48C8204: RV_base64_encode (rest_vol.c:3099)
==309263==    by 0x48F4A30: RV_dataset_write (rest_vol_dataset.c:900)
==309263==    by 0x4B1E977: H5VL__dataset_write (H5VLcallback.c:2236)
==309263==    by 0x4B24335: H5VL_dataset_write_direct (H5VLcallback.c:2280)
==309263==    by 0x496767B: H5D__write_api_common (H5D.c:1331)
==309263==    by 0x496A71F: H5Dwrite (H5D.c:1388)
==309263==    by 0x132BFE: test_write_dataset_small_point_selection (test_rest_vol.c:7535)
==309263==    by 0x10F869: main (test_rest_vol.c:18250)
==309263== 

And the vol-tests:

==309052== 40 bytes in 1 blocks are definitely lost in loss record 2 of 7
==309052==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==309052==    by 0x541B4B2: ???
==309052==    by 0x4B2EBA7: H5VL__dataset_write (H5VLcallback.c:2236)
==309052==    by 0x4B34565: H5VL_dataset_write_direct (H5VLcallback.c:2280)
==309052==    by 0x4906D3B: H5D__write_api_common (H5D.c:1331)
==309052==    by 0x4909DDF: H5Dwrite (H5D.c:1388)
==309052==    by 0x149AF1: test_write_dataset_small_point_selection (vol_dataset_test.c:5758)
==309052==    by 0x161B98: vol_dataset_test (vol_dataset_test.c:11257)
==309052==    by 0x10FC1E: vol_test_run (vol_test.c:118)
==309052==    by 0x10FC1E: main (vol_test.c:279)

Copy link
Contributor Author

@jwsblokland jwsblokland Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhendersonHDF and @mattjala, thanks for the review and reminding me that I can use valgrind.

The mentioned memory leak of write_body has been address and also the strange behavior of test_write_dataset_small_point_selection test. For the test, it turns out I forgot the modify the H5Dwrite() call by including a memory dataspace.

The only thing left to implement is checking for H5S_ALL and non-contiguous selection. Currently, if H5S_ALL == mem_space_id and H5S_ALL != mem_space_id the selection of the file_space_id will be copied to mem_space_id. Is this the intended behavior. I assume it but I ask to make sure. This is not what I would naively expect. With this behavior, I think, you implicitly assume that the memory dataspace is exactly the same as the file dataspace.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you made a typo - did you mean to ask about the case where H5S_ALL == mem_space_id and H5S_ALL != file_space_id? If so, I believe that copying the selection from file space is the intended behavior of the library - see the second row of the mem_space_id/file_space_id table in the documentation for H5Dwrite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you are right. Thanks for pointing me to the documentation of H5Dwrite().

@mattjala mattjala self-requested a review August 24, 2023 20:40
@mattjala mattjala dismissed their stale review August 24, 2023 20:43

Memory leak on point selection

- Improved the function RV_dataset_write() such it makes use
  of the function H5Dgather() to gather the data needed
  for writing a hyperslab (memory) to a file. This approach
  is similar to the one found in the function RV_dataset_read().
- Adapted the hyperslab and point selection test.
@mattjala
Copy link
Collaborator

Moved to #57

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants