
Memory usage #17

Open
fjorka opened this issue Oct 27, 2021 · 6 comments
Labels: bug (Something isn't working)

fjorka commented Oct 27, 2021

  • nd2 version: 0.1.4
  • Python version: 3.7.10
  • Operating System: Windows 10

Description

I am trying to load selected parts of nd2 files, but much more memory is allocated than the computed objects require. As a consequence, loading fails for objects larger than about a quarter of the available memory (roughly 4× the object size is allocated).

What I Did

Test on a time-lapse experiment:

[screenshot: nd2_ram]

Test on a big single-time-point image:

[screenshot: nd2_single_frame]

In the second example, the memory allocation is correct when the whole file has to be computed.

It may be related to object sizes being calculated incorrectly, as shown here:

[screenshot: nd2_measure_size]
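
(For reference, a minimal sketch of how to compare a computed object's reported size against the process's actual memory growth using psutil; the file name is a placeholder:)

import os

import psutil
import nd2

proc = psutil.Process(os.getpid())

with nd2.ND2File("example.nd2") as f:   # placeholder file name
    x = f.to_xarray()
    rss_before = proc.memory_info().rss
    a = x.isel(C=0).compute()
    rss_after = proc.memory_info().rss

print(f"object size: {a.nbytes / 1e9:.3f} GB")
print(f"RSS growth:  {(rss_after - rss_before) / 1e9:.3f} GB")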

tlambert03 added the bug label on Nov 10, 2021
@tlambert03 (Owner)

Hi @fjorka

Let's first explore our memory-profiling options. I just created a script and ran it using memory_profiler.

# script.py
import nd2
from memory_profiler import profile
import numpy as np

@profile
def main():
    f = nd2.ND2File("big.nd2")
    x = f.to_xarray()

    # instead of for loop... easier to see effect of each line in report
    a = x.isel(C=0, Z=0, T=np.arange(0, 10)).compute()
    b = x.isel(C=0, Z=0, T=np.arange(10, 20)).compute()
    c = x.isel(C=0, Z=0, T=np.arange(20, 30)).compute()
    d = x.isel(C=0, Z=0, T=np.arange(30, 40)).compute()
    e = x.isel(C=0, Z=0, T=np.arange(40, 50)).compute()
    f = x.isel(C=0, Z=0, T=np.arange(50, 60)).compute()
    g = x.isel(C=0, Z=0, T=np.arange(60, 70)).compute()
    h = x.isel(C=0, Z=0, T=np.arange(70, 80)).compute()
    i = x.isel(C=0, Z=0, T=np.arange(80, 90)).compute()
    j = x.isel(C=0, Z=0, T=np.arange(90, 100)).compute()


if __name__ == "__main__":
    main()

Then run with python script.py.

I get the following output:

Filename: /Users/talley/Dropbox (HMS)/Python/nd2/script.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     6     51.2 MiB     51.2 MiB           1   @profile
     7                                         def main():
     8     54.0 MiB      2.8 MiB           2       f = nd2.ND2File("big.nd2")
     9
    11    116.8 MiB     62.8 MiB           1       x = f.to_xarray()
    12    150.1 MiB     33.3 MiB           1       a = x.isel(C=0, Z=0, T=np.arange(0, 10)).compute()
    13    177.6 MiB     27.5 MiB           1       b = x.isel(C=0, Z=0, T=np.arange(10, 20)).compute()
    14    204.5 MiB     26.8 MiB           1       c = x.isel(C=0, Z=0, T=np.arange(20, 30)).compute()
    15    230.5 MiB     26.1 MiB           1       d = x.isel(C=0, Z=0, T=np.arange(30, 40)).compute()
    16    257.6 MiB     27.1 MiB           1       e = x.isel(C=0, Z=0, T=np.arange(40, 50)).compute()
    17    283.7 MiB     26.1 MiB           1       f = x.isel(C=0, Z=0, T=np.arange(50, 60)).compute()
    18    309.2 MiB     25.5 MiB           1       g = x.isel(C=0, Z=0, T=np.arange(60, 70)).compute()
    19    334.3 MiB     25.1 MiB           1       h = x.isel(C=0, Z=0, T=np.arange(70, 80)).compute()
    20    359.3 MiB     25.0 MiB           1       i = x.isel(C=0, Z=0, T=np.arange(80, 90)).compute()
    21    384.9 MiB     25.6 MiB           1       j = x.isel(C=0, Z=0, T=np.arange(90, 100)).compute()

Though the file is 15GB, it looks to be allocating about what I'd expect for each chunk.
Can you try this with your file? (just want to rule out that psutil is giving something funny).

If you get something dramatically different with your file, I might want to play with it? 😬... I know it's a lot to ask, but let me know if you can share it somehow (Dropbox, etc...)
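
(As an independent cross-check that doesn't go through psutil or memory_profiler, tracemalloc can report peak Python-side allocations; a minimal sketch along the lines of the script above. Note it only sees allocations made through Python's allocator, so treat it as a lower bound; numpy allocations are traced on recent numpy versions:)

import tracemalloc

import numpy as np
import nd2

tracemalloc.start()

with nd2.ND2File("big.nd2") as f:
    x = f.to_xarray()
    a = x.isel(C=0, Z=0, T=np.arange(0, 10)).compute()

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()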


fjorka commented Nov 15, 2021

Hi @tlambert03
Unfortunately, the result looks the same when I profile with memory_profiler. For example:

import os

import nd2
from memory_profiler import profile

nd2_file = r'DBSP12D20#1_20X.nd2'
file_path = os.path.join(nd2_dir, nd2_file)  # nd2_dir: the directory holding the file, defined elsewhere

@profile
def main():
    f = nd2.ND2File(file_path)
    x = f.to_xarray()

    a = x.isel(C=0).compute()
    print(f'object size: {a.nbytes/1e9} GB')

if __name__ == "__main__":
    main()

Gives the profile of:

object size: 0.807561216 GB
Filename: D:\BARC\nd2_memory\memory_test_slide.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    10     34.4 MiB     34.4 MiB           1   @profile
    11                                         def main():
    12     35.6 MiB      1.2 MiB           1       f = nd2.ND2File(file_path)
    13    101.9 MiB     66.3 MiB           1       x = f.to_xarray()
    14
    15                                             # instead of for loop... easier to see effect of each line in report
    16   2412.7 MiB   2310.8 MiB           1       a = x.isel(C=0).compute()
    17   2412.7 MiB      0.0 MiB           1       print(f'object size: {a.nbytes/1e9} GB')

I shared the file from the above example with you. The one from the previous example is ~0.5 TB (multi-position time-lapse), but I can figure out a way to share it too if you would like to work with it.

@tlambert03 (Owner)

Thanks! I downloaded it.

You know... one thing that is probably important to mention here, which I should have thought of earlier: nd2 files are not (natively) chunked along the channel axis. So when you load one channel for a given timepoint, you load them all.

You should be able to save memory by loading only a Z or T subset... but chunking in channels will require some additional functionality that isn't natively supported by the nd2 format (still possible).

One additional observation: try leaving xarray out of the loop. Use just f.asarray() or f.to_dask(). With dask, you can then sub-chunk using indexing:

print(f.sizes)  # see axes in order
d = f.to_dask()
d[0, 0].compute()  # just get the first index along the first two dimensions

...and remember that if any of those dimensions are XY or C, it won't save memory (until that's added)
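
For instance, a minimal sketch of processing one timepoint at a time this way (assuming T is the first axis, with a hypothetical process() step; the file name is a placeholder):

import nd2

f = nd2.ND2File("big.nd2")   # placeholder path
print(f.sizes)               # axes in order, e.g. {'T': ..., 'C': ..., 'Y': ..., 'X': ...}
d = f.to_dask()

for t in range(f.sizes["T"]):
    frame = d[t].compute()   # one timepoint; all channels come along, since C isn't chunked
    process(frame)           # hypothetical per-frame work; frame is freed on the next iteration
f.close()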


fjorka commented Nov 23, 2021

Thanks for the explanation @tlambert03!

I rewrote the code to load only single time points and assemble them later.
The loop looks as follows:

import numpy as np
import nd2

C_list = [1, 2, 3]
P_list = [0, 1, 2]
T = np.arange(288)

im_nd2_reader = nd2.ND2File(file_path)  # expected shape (T, P, C, Y, X) - (577, 15, 4, 2765, 2765)
im_nd2_dask = im_nd2_reader.to_dask()

for P in P_list:
    for C in C_list:

        # create an empty container for one (P, C) stack
        im = np.empty(shape=[len(T), im_nd2_reader.shape[3], im_nd2_reader.shape[4]], dtype='uint16')

        for ind in T:
            frame = im_nd2_dask[ind, P, C, :, :].compute()
            im[ind, :, :] = frame

        # save im
A single im is around 4 GB, but this loop takes ~18-24 GB of RAM to execute (never less than 18 GB after the initial loading). In my mind, it should never open more than a single time point and should require around 4 GB of RAM in total. Do you have any insights into what I can do better here?
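
(Given the note above that C isn't chunked, each per-channel .compute() re-reads every channel, so one possible restructuring is to read each (T, P) frame once and write all channels of interest straight to disk; a sketch, using the same names as above and hypothetical .npy output files:)

import numpy as np
import nd2

C_list = [1, 2, 3]
P_list = [0, 1, 2]
T = np.arange(288)

im_nd2_reader = nd2.ND2File(file_path)        # file_path as above
im_nd2_dask = im_nd2_reader.to_dask()         # (T, P, C, Y, X)
Y, X = im_nd2_reader.shape[3], im_nd2_reader.shape[4]

for P in P_list:
    # one on-disk array per channel, so only a single frame lives in RAM at a time
    outs = {
        C: np.lib.format.open_memmap(f"P{P}_C{C}.npy", mode="w+",
                                     dtype="uint16", shape=(len(T), Y, X))
        for C in C_list
    }
    for ind in T:
        frame = im_nd2_dask[ind, P].compute()  # all channels are read together anyway
        for C in C_list:
            outs[C][ind] = frame[C]
    for m in outs.values():
        m.flush()

im_nd2_reader.close()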


elgw commented Nov 28, 2022

> You should be able to save memory by loading only a Z or T subset... but chunking in channels will require some additional functionality that isn't natively supported by the nd2 format (still possible).

Something along those lines would be a nice addition to this library.

It is possible to read just one xy-plane at a time from the nd2 file and discard the data for the irrelevant channels; the downside is that all the data has to be re-read for each channel (I'm not sure how this works with time series). To reduce RAM usage even more, the data could be streamed to disk as it is read, but that might be out of scope here.
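
For example, a minimal sketch of that streaming idea using dask's built-in zarr writer (assuming zarr is installed; file names are placeholders):

import nd2

with nd2.ND2File("example.nd2") as f:   # placeholder path
    d = f.to_dask()
    # dask writes one chunk at a time, so peak RAM stays near a single chunk's size
    d.to_zarr("example.zarr")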

@tlambert03 (Owner)

> Something along those lines would be a nice addition to this library.

Thanks @elgw, this feature is being tracked at #85
