Reading weights from the HDF5 is slow #123
Comments
This is related to #37, but not entirely.
Can you post a specific snippet, just so we can separate out the different pieces that might be slow? I suspect there are multiple bottlenecks, one being #37. The other problem, which this issue addresses, is that the weights (or any field really) are appended to the data structures as the simulation proceeds, and we usually don't know how much to pre-allocate ahead of time. Under the hood, every dataset/array in HDF5 is chunked into smaller arrays, and storage gets allocated a chunk at a time even if you don't use the whole chunk. Having a single weights group

Solutions: first, we add support for specifying the chunk sizes for each field when creating a new HDF5 file. Then, with that, we can have the following strategies (not mutually exclusive):
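To make the chunking point concrete, here is a minimal h5py sketch of what specifying a per-field chunk size at file-creation time could look like. The file name, dataset path, and sizes are hypothetical illustrations, not the project's actual API.

```python
import h5py
import numpy as np

n_walkers = 48            # hypothetical number of walkers per cycle
cycles_per_chunk = 1024   # pack many cycles into each HDF5 chunk

with h5py.File("example_results.h5", "w") as f:
    # Resizable weights dataset: one row per cycle, one column per walker.
    # Chunking along the cycle axis means a full read touches roughly
    # n_cycles / cycles_per_chunk chunks, rather than one tiny chunk per append.
    f.create_dataset(
        "runs/0/weights",
        shape=(0, n_walkers),
        maxshape=(None, n_walkers),
        dtype=np.float64,
        chunks=(cycles_per_chunk, n_walkers),
    )
```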
I was thinking in general that we need a CLI tool for merging HDF5 files, extracting data, listing info, etc., and this kind of tool would fit nicely into that.
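For reference, a rough sketch of the kind of CLI this could be, built on argparse and h5py. The script name, subcommands, and file layout are assumptions for illustration only, not an existing tool.

```python
"""h5tool: hypothetical sketch of an HDF5 inspection/extraction CLI."""
import argparse

import h5py
import numpy as np


def list_info(path):
    """Print every dataset with its shape, dtype, and chunk layout."""
    def visitor(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape} dtype={obj.dtype} chunks={obj.chunks}")
    with h5py.File(path, "r") as f:
        f.visititems(visitor)


def extract(path, dataset, out):
    """Copy a single dataset out to a .npy file for fast repeated access."""
    with h5py.File(path, "r") as f:
        np.save(out, f[dataset][...])


def main():
    parser = argparse.ArgumentParser(prog="h5tool")
    sub = parser.add_subparsers(dest="cmd", required=True)
    p_info = sub.add_parser("info", help="list datasets and their storage layout")
    p_info.add_argument("file")
    p_extract = sub.add_parser("extract", help="dump one dataset to a .npy file")
    p_extract.add_argument("file")
    p_extract.add_argument("dataset")
    p_extract.add_argument("out")
    args = parser.parse_args()
    if args.cmd == "info":
        list_info(args.file)
    else:
        extract(args.file, args.dataset, args.out)


if __name__ == "__main__":
    main()
```

Usage would then look something like `python h5tool.py info results.h5` or `python h5tool.py extract results.h5 runs/0/weights weights.npy`.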
This is pretty high on my list of priorities as well, and the pre-allocation solution probably wouldn't be too hard to implement.
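A sketch of what the pre-allocation strategy could look like when the number of cycles is known, or can be bounded, up front. The file name, paths, and sizes are again made up for illustration.

```python
import h5py
import numpy as np

n_cycles = 50_000   # known or estimated upper bound on cycles for this run
n_walkers = 48

with h5py.File("example_prealloc.h5", "a") as f:
    # Allocate the whole weights array once; recording a cycle then becomes an
    # in-place write instead of a resize that allocates another small chunk.
    weights = f.require_dataset(
        "runs/0/weights",
        shape=(n_cycles, n_walkers),
        dtype=np.float64,
        chunks=(1024, n_walkers),
    )
    cycle_idx = 0
    weights[cycle_idx, :] = np.full(n_walkers, 1.0 / n_walkers)
```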
For weighted ensemble analyses it is common to need access to all of the weights at once. Currently, reading the weights from a reasonably sized HDF5 file takes tens of minutes; in contrast, a newly computed observable can be read in seconds.

Could this potentially be helped by arranging them all in their own group, e.g. ['runs/0/weights']? Or perhaps there is another way of ensuring that the weights are written to a contiguous region of the disk?
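Something along these lines might work as a one-off consolidation pass, assuming a per-trajectory layout like runs/&lt;run&gt;/trajectories/&lt;traj&gt;/weights with numerically named, equal-length trajectories (all assumptions about the file layout, not the project's actual schema):

```python
import h5py
import numpy as np

with h5py.File("example_results.h5", "a") as f:
    for run in f["runs"].values():
        trajs = run["trajectories"]
        # The slow pattern today: one small read per trajectory dataset
        # (and, underneath, per HDF5 chunk).
        per_traj = [trajs[name]["weights"][...] for name in sorted(trajs, key=int)]
        # Write them back once as a single dataset so that later analyses
        # can load every weight for the run with a single read.
        run.create_dataset("weights", data=np.stack(per_traj))
```

Reading would then reduce to a single call per run, e.g. `f['runs/0/weights'][...]`.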