Replies: 2 comments 4 replies
-
Thanks for starting this discussion. I believe you've captured all the options that currently exist. These range from "safe and slow" (
Thanks for bringing up the point about compression complicating these options. For your example:
    with asdf.open('myfile.asdf', copy_arrays=True) as af:
        a = af['arr1'][:]
The array would be decompressed and copied to memory when any element of
    with asdf.open('myfile.asdf') as af:
        a = af['arr1'].copy()
has the same result.
    with asdf.open('myfile.asdf', lazy_load=False, copy_arrays=True) as af:
        a = af['arr1']
avoids the extra copy. However as you noted this loads every array in the file into memory during
Keeping the file open can avoid the extra copy.
    with asdf.open('myfile.asdf') as af:
        a = af['arr1']
        # do stuff with a, no extra copy is needed
However scope of
    def get_arr():
        with asdf.open('myfile.asdf') as af:
            return af['arr1']  # note that af falls out of scope
    a = get_arr()
    print(a[0])  # error!
the above example will fail with a
-
Do all options of explicit loading give (Dealing with ketozhang/asdf-pydantic#13)
-
In a project I work on, I often want to read one value or array out of an ASDF file and then close the file. My first instinct is to write something like:
But then I remember that the arrays are lazy-loaded (and mem-mapped), so I use [:] to force a non-lazy load and copy_arrays=True to ensure this isn't just creating a mem-mapped view that will become invalid when af is closed:
I could simplify this by using .copy() instead of [:] and removing copy_arrays=True:
Except I sometimes work with compressed data (which can't be mem-mapped), so I think this would first decompress the array, then create a copy of it (correct me if I'm wrong here!), and I'd rather not eat the performance/peak memory penalty.
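The concern above about a mem-mapped view going invalid once the file is closed can be seen in isolation with the standard library's `mmap` module. This is only a stand-in to illustrate the mechanism, not asdf's implementation; asdf's own behavior and error type may differ. The file contents and variable names are made up for the illustration.

```python
import mmap
import os
import tempfile

# Write a small file to map (contents are arbitrary).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    copied = mm[:4]  # slicing an mmap returns bytes, i.e. an in-memory copy
    mm.close()

try:
    mm[0]  # the map itself is unusable once it has been closed
    stale_access_failed = False
except ValueError:
    stale_access_failed = True

os.remove(path)
print(copied, stale_access_failed)
```

The copied bytes stay usable after the map is closed, while going back to the map does not, which is the same trade-off as copying an array out before the file closes.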
Furthermore, when reading my collaborator's code, I'm not always sure what's an array and what's not. So if I see a pattern like:
I'm not always sure if it's a latent bug, or valid code (if it's referring to, e.g., an int or other non-array data).
So my question is: is there a pattern I can use that will ensure array data is immediately loaded (and not memory-mapped) on first reference? I can't always do lazy_load=False, copy_arrays=True, because I don't always want all arrays to be loaded, just referenced ones. How do others approach this problem? Thanks!