fix memory leak by calling H5close() in NDFileHDF5::closeFile() #390

Merged: 4 commits, Mar 28, 2019

hinxx
Contributor

@hinxx hinxx commented Mar 28, 2019

This is a change to fix a memory leak in the HDF5 plugin; see #385.

Warning!

It appears that calling H5close() after H5Fclose() plugs the hole, but the root cause of the leak might still be at large!

@hinxx
Contributor Author

hinxx commented Mar 28, 2019

The last change avoids printing the error message if the object count is 1 at the time and place of the check. The assumption is that the remaining object is the file object, which is closed in the lines that follow.
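
The check described above can be sketched roughly as follows. This is only an illustration, not the plugin's code: shouldWarnOnClose and closeWarning are hypothetical helpers, and the count passed in stands for whatever the object-count query returned just before the file handle itself is closed.

```cpp
#include <string>

// Hypothetical helper: decides whether a "not totally clean" warning is
// warranted for an open-object count taken just before the file handle is
// closed. The file handle itself accounts for one open object, so only a
// count above 1 points at objects that were left open.
bool shouldWarnOnClose(int openObjectCount) {
    return openObjectCount > 1;
}

// Hypothetical helper: builds the warning text, or returns an empty string
// when nothing needs to be reported.
std::string closeWarning(int openObjectCount) {
    if (!shouldWarnOnClose(openObjectCount)) {
        return "";
    }
    // Report the count without the file handle (an illustrative choice).
    return "Closing file not totally clean. Objects remaining=" +
           std::to_string(openObjectCount - 1);
}
```

With a count of 1 the helper stays silent; any higher count produces a message, so a real leftover object would still be reported.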

There is no leak after this change, as expected.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
hinxx    30325 15.1  0.3 4459868 57460 pts/6   Sl+  11:58   0:11 /opt/bde/R3.15.5/artifacts/simapp-master/bin/linux-x86_64/iocApp st.cmd
hinxx    30325 15.6  0.3 4459868 60628 pts/6   Sl+  11:58   0:22 /opt/bde/R3.15.5/artifacts/simapp-master/bin/linux-x86_64/iocApp st.cmd
hinxx    30325 15.6  0.3 4459868 63796 pts/6   Sl+  11:58   0:33 /opt/bde/R3.15.5/artifacts/simapp-master/bin/linux-x86_64/iocApp st.cmd
hinxx    30325 15.6  0.4 4459868 66964 pts/6   Sl+  11:58   0:44 /opt/bde/R3.15.5/artifacts/simapp-master/bin/linux-x86_64/iocApp st.cmd
hinxx    30325 15.7  0.4 4459868 70136 pts/6   Sl+  11:58   0:55 /opt/bde/R3.15.5/artifacts/simapp-master/bin/linux-x86_64/iocApp st.cmd

@coveralls

coveralls commented Mar 28, 2019

Coverage increased (+0.02%) to 40.544% when pulling 4cfb14b on hinxx:master into 0ad060b on areaDetector:master.

Hinko Kocevar added 2 commits March 28, 2019 17:17
==21315== 4,000 bytes in 1,000 blocks are definitely lost in loss record 6,503 of 6,903
==21315==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21315==    by 0x510CE7E: NDFileHDF5::createAttributeDataset(NDArray*) (NDFileHDF5.cpp:2652)
==21315==    by 0x510E7C9: NDFileHDF5::openFile(char const*, int, NDArray*) (NDFileHDF5.cpp:282)
==21315==    by 0x50FDEF5: NDPluginFile::openFileBase(int, NDArray*) (NDPluginFile.cpp:73)
..
@MarkRivers MarkRivers merged commit 79909f5 into areaDetector:master Mar 28, 2019
@MarkRivers
Member

Resolves #385. Calling H5close eliminates the memory leak. It is not entirely clear whether this is a leak internal to the HDF5 library or something we are not closing correctly.

@ulrikpedersen
Member

Calling H5close eliminates the memory leak. It is not entirely clear whether this is a leak internal to the HDF5 library or something we are not closing correctly.

Exactly. It could be that we're not closing some HDF5 objects, but the object count at the end is just 1 (the file), so it is not clear where that memory is leaked...

@hinxx
Contributor Author

hinxx commented Mar 29, 2019

Any thoughts on the performance impact of this call to H5close()?

@xiaoqiangwang
Contributor

I was worried that calling H5close might have a huge impact, but I see no change when looking at the RunTime field.

@hinxx
Contributor Author

hinxx commented Mar 30, 2019

the object count at the end is just 1 (the file) so not clear where that memory is leaked...

I've added the following lines to NDFileHDF5.cpp, before the call to H5Fget_obj_count(this->file, H5F_OBJ_ALL), with the #390 fix applied:

  // Count the open file identifiers for this file.
  obj_count = (int)H5Fget_obj_count(this->file, H5F_OBJ_FILE);
  if (obj_count > 0){
    fprintf(stderr,
            "%s::%s Closing file not totally clean.  Files remaining=%d\n",
            driverName, functionName, obj_count);
  }
  // Same count, restricted to identifiers opened through this->file (H5F_OBJ_LOCAL).
  obj_count = (int)H5Fget_obj_count(this->file, H5F_OBJ_FILE|H5F_OBJ_LOCAL);
  if (obj_count > 0){
    fprintf(stderr,
            "%s::%s Closing file not totally clean.  Local files remaining=%d\n",
            driverName, functionName, obj_count);
  }

I get these lines for each iteration of the 1000 images generated/saved:

NDFileHDF5::closeFile Closing file not totally clean.  Files remaining=1
NDFileHDF5::closeFile Closing file not totally clean.  Local files remaining=1
NDFileHDF5::closeFile Closing file not totally clean.  All remaining=1

NDFileHDF5::closeFile Closing file not totally clean.  Files remaining=1
NDFileHDF5::closeFile Closing file not totally clean.  Local files remaining=1
NDFileHDF5::closeFile Closing file not totally clean.  All remaining=1
...

As @ulrikpedersen already suspected:

and that's not an error condition because it is the file object itself. Maybe.

I think this may have been answered.

@hinxx
Contributor Author

hinxx commented Mar 30, 2019

Why is H5Fclose() not closing the file and cleaning up as it should?

After all, H5close() is meant to 'close the complete HDF5 library and free the resources'; according to the docs it is supposed to be used at application exit (or for explicit library termination).

Looking at the docs for H5Fclose():

Delayed close:
Note the following deviation from the above-described behavior. If H5Fclose is called for a file but one or more objects within the file remain open, those objects will remain accessible until they are individually closed. Thus, if the dataset data_sample is open when H5Fclose is called for the file containing it, data_sample will remain open and accessible (including writable) until it is explicitly closed. The file will be automatically closed once all objects in the file have been closed.

We must be hitting this case and this might make H5Fclose() not fully effective at the time of the call.

@ulrikpedersen
Member

Delayed close:
Note the following deviation from the above-described behavior. If H5Fclose is called for a file but one or more objects within the file remain open, those objects will remain accessible until they are individually closed. Thus, if the dataset data_sample is open when H5Fclose is called for the file containing it, data_sample will remain open and accessible (including writable) until it is explicitly closed. The file will be automatically closed once all objects in the file have been closed.

We must be hitting this case and this might make H5Fclose() not fully effective at the time of the call.

I don't think that's the case: we actually do set the "close degree" to "strong", which means all objects are forcibly closed when the file is closed. See #385 (comment) for links to relevant code & docs.

It still leaves a bit of a mystery as to why we are leaking memory in HDF5 (and need to H5close() to release that) while opening/closing files...
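
The difference between the delayed close quoted above and a "strong" close degree can be pictured with a small toy model. This is a sketch for illustration only: ToyFile and CloseDegree are hypothetical names, and the class does not reproduce how the HDF5 library tracks objects internally.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Toy model of a file that owns child objects. NOT the HDF5 implementation;
// every name in this block is hypothetical.
enum class CloseDegree { Weak, Strong };

class ToyFile {
public:
    explicit ToyFile(CloseDegree degree) : degree_(degree) {}

    // Opens a named child object (for example a dataset) inside the file.
    void openObject(const std::string& name) { openObjects_.insert(name); }

    // Closes one child object; a pending file close completes once no
    // child objects remain open.
    void closeObject(const std::string& name) {
        openObjects_.erase(name);
        if (closeRequested_ && openObjects_.empty()) {
            fileOpen_ = false;
        }
    }

    // Models closing the file handle.
    void close() {
        closeRequested_ = true;
        if (degree_ == CloseDegree::Strong) {
            openObjects_.clear();  // strong: force-close every child object
        }
        if (openObjects_.empty()) {
            fileOpen_ = false;
        }
        // Weak with child objects still open: the actual close is delayed.
    }

    bool isOpen() const { return fileOpen_; }
    std::size_t openObjectCount() const { return openObjects_.size(); }

private:
    CloseDegree degree_;
    std::set<std::string> openObjects_;
    bool fileOpen_ = true;
    bool closeRequested_ = false;
};
```

In this model a Weak file with data_sample still open reports isOpen() as true after close() and only closes once closeObject("data_sample") is called, while a Strong file is closed immediately with no child objects left.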

@xiaoqiangwang
Contributor

Just a reminder that this should be considered a temporary workaround, even if it causes no noticeable performance penalty.

It assumes that there is only one NDFileHDF5 plugin in the IOC. With two plugins, or any other use of the HDF5 library in the same IOC, one of them will fail and eventually crash the IOC.

 HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 263 in H5Acreate2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 222 in H5G_loc(): invalid data ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 632 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 1653 in H5Aclose(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5S.c line 431 in H5Sclose(): not a dataspace
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 263 in H5Acreate2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 222 in H5G_loc(): invalid data ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 632 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 1653 in H5Aclose(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5G.c line 462 in H5Gopen2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 171 in H5G_loc(): invalid file ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5D.c line 119 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 251 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5G.c line 723 in H5Gclose(): not a group
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 263 in H5Acreate2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 251 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 632 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 1653 in H5Aclose(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 263 in H5Acreate2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 251 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 632 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 1653 in H5Aclose(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 263 in H5Acreate2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 251 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 632 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 1653 in H5Aclose(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 263 in H5Acreate2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 251 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 632 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5A.c line 1653 in H5Aclose(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5G.c line 723 in H5Gclose(): not a group
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5G.c line 462 in H5Gopen2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 171 in H5G_loc(): invalid file ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5D.c line 119 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 251 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5G.c line 723 in H5Gclose(): not a group
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5L.c line 532 in H5Lcreate_hard(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../H5Gloc.c line 171 in H5G_loc(): invalid file ID
    major: Invalid arguments to routine
    minor: Bad value
2019/04/03 20:23:10.376 NDFileHDF5::createHardLinks error creating hard link from: /entry/instrument/detector/data to /entry/data/data
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5D.c line 906 in H5Dset_extent(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
2019/04/03 20:23:10.376 NDFileHDF5Dataset::writeFile ERROR Increasing the size of the dataset [data] failed
2019/04/03 20:23:10.376 NDFileHDF5::writeFile ERROR: could not write to dataset. Aborting
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5P.c line 1517 in H5Pclose(): can't close
    major: Property lists
    minor: Unable to free object
  #001: ../H5I.c line 1352 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object atom
    minor: Unable to decrement reference count
  #002: ../H5I.c line 1282 in H5I_dec_ref(): can't locate ID
    major: Object atom
    minor: Unable to find atom information (already closed?)
2019/04/03 20:23:10.376 NDFileHDF5::writeFile ERROR: Cparms did not close cleanly.
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 123145616744448:
  #000: ../H5F.c line 769 in H5Fclose(): invalid file identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
2019/04/03 20:23:10.376 NDFileHDF5::writeFile ERROR: File did not close cleanly.
2019/04/03 20:23:10.376 NDPluginFile::writeFileBase Error writing file, status=3
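
The failure mode behind a log like the one above can be illustrated with a toy model. It assumes the library keeps a single identifier table shared by every user in the process, which is consistent with the "invalid file ID" and "can't locate ID" messages but is an assumption here. ToyLibrary and its methods are hypothetical and are not HDF5 calls.

```cpp
#include <set>

// Toy stand-in for a library with one shared identifier table.
// NOT the HDF5 API; every name in this block is hypothetical.
class ToyLibrary {
public:
    // Hands out a new identifier and records it as valid.
    int open() {
        int id = nextId_++;
        validIds_.insert(id);
        return id;
    }

    // Closes one identifier; returns false if it was not valid.
    bool close(int id) { return validIds_.erase(id) == 1; }

    // Returns true if an operation on this identifier would succeed.
    bool isValid(int id) const { return validIds_.count(id) == 1; }

    // Library-wide shutdown: every identifier becomes invalid, no matter
    // which user opened it.
    void closeAll() { validIds_.clear(); }

private:
    std::set<int> validIds_;
    int nextId_ = 1;
};
```

If plugin A closes its own identifier and then calls closeAll(), an identifier that plugin B opened earlier stops being valid, and B's later operations on it (including its own close) fail.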

@MarkRivers
Member

I think the right way to track this problem down is to create a stand-alone program that duplicates what the NDFileHDF5 plugin is doing. We can see if it has a memory leak. If it does then we can contact the HDF5 group to see if there is an issue with their library.

@ulrikpedersen
Member

I think the right way to track this problem down is to create a stand-alone program that duplicates what the NDFileHDF5 plugin is doing. We can see if it has a memory leak. If it does then we can contact the HDF5 group to see if there is an issue with their library.

I agree with the idea, but in practice it is quite difficult to recreate this in a simple application, as the plugin does a lot... I can start by sending them an email to explain the issue and ask whether they have any clues as to what could leak, or what we could do to debug further.

@hinxx
Contributor Author

hinxx commented Apr 7, 2019

It still leaves a bit of a mystery as to why we are leaking memory in HDF5 (and need to H5close() to release that) while opening/closing files...

Here is a new pull request, #397, that removes the H5close() call introduced here and adds H5xclose() calls on the specific objects that were identified as leaking memory.
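
One general C++ way to make sure each opened identifier gets its matching close call is a small scope guard. The sketch below is not what #397 does; ScopedId is a hypothetical class, and the closer passed to it stands in for whichever close function matches the identifier.

```cpp
#include <functional>
#include <utility>

// Hypothetical scope guard: invokes the supplied closer on the identifier
// when the guard goes out of scope, so the close cannot be forgotten on an
// early return.
class ScopedId {
public:
    ScopedId(long id, std::function<void(long)> closer)
        : id_(id), closer_(std::move(closer)) {}

    ~ScopedId() {
        if (closer_) {
            closer_(id_);  // release the identifier exactly once
        }
    }

    // Non-copyable, so the identifier is never closed twice.
    ScopedId(const ScopedId&) = delete;
    ScopedId& operator=(const ScopedId&) = delete;

    long get() const { return id_; }

private:
    long id_;
    std::function<void(long)> closer_;
};
```

A guard constructed with an identifier and a closer runs the closer once, when the enclosing scope ends.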

bsobhani pushed a commit to bsobhani/ADCore that referenced this pull request Apr 1, 2024
fix memory leak by calling H5close() in NDFileHDF5::closeFile()