reading nd arrays takes long time #32
Comments
Hi @pr4deepr, I also "played" around with the delayed reading to improve my experimental tools. I will modify your code a bit and then create a new branch where you can review how I incorporated your nice code into czitools.
Can you test your CZI using this branch: work_in_progress?
It is basically this code:

```python
# excerpt from the czitools read_tools module (package-level imports omitted)
def read_6darray(
    filepath: Union[str, os.PathLike[str]], output_order: str = "STCZYX", **kwargs: int
) -> Tuple[Optional[Union[np.ndarray, da.Array]], czimd.CziMetadata, str]:
    """Read a CZI image file as a 6D numpy or dask array.

    Important: Currently only scenes with equal size and CZIs with consistent pixel types are supported.

    Args:
        filepath (str | Path): filepath of the CZI image
        output_order (str, optional): Order of dimensions for the output array. Defaults to "STCZYX".
        kwargs (int, optional): Allowed kwargs are S, T, Z, C and values must be >= 0 (zero-based).
            They will be used to read only a substack from the array.

    Returns:
        Tuple[array6d, mdata, dim_string]: output as 6D dask array, metadata and dimension string
    """

    if isinstance(filepath, Path):
        # convert to string
        filepath = str(filepath)

    # get the complete metadata at once as one big class
    mdata = czimd.CziMetadata(filepath)

    if not mdata.consistent_pixeltypes:
        print("Detected PixelTypes are not consistent. Cannot create array6d")
        return None, mdata, ""

    if mdata.consistent_pixeltypes:
        # use the pixel type of the first channel
        use_pixeltype = mdata.npdtype[0]

    valid_order = ["STCZYX", "STZCYX"]

    if output_order not in valid_order:
        print("Invalid dimension order 6D:", output_order)
        return None, mdata, ""

    if not mdata.scene_shape_is_consistent:
        print("Scenes have inconsistent shape. Cannot read 6D array")
        return None, mdata, ""

    # open the CZI document to read the planes
    with pyczi.open_czi(filepath) as czidoc:

        if mdata.image.SizeS is not None:
            # get the size for a single scene using the 1st one
            # this works only if the scene shape is consistent
            # use the size of the 1st scenes_bounding_rectangle
            size_x = czidoc.scenes_bounding_rectangle[0].w
            size_y = czidoc.scenes_bounding_rectangle[0].h

        if mdata.image.SizeS is None:
            # use the size of the total_bounding_rectangle
            size_x = czidoc.total_bounding_rectangle.w
            size_y = czidoc.total_bounding_rectangle.h

        # check if dimensions are None (because they do not exist for that image)
        size_c = misc.check_dimsize(mdata.image.SizeC, set2value=1)
        size_z = misc.check_dimsize(mdata.image.SizeZ, set2value=1)
        size_t = misc.check_dimsize(mdata.image.SizeT, set2value=1)
        size_s = misc.check_dimsize(mdata.image.SizeS, set2value=1)

        # check for additional **kwargs to create substacks
        if kwargs is not None and mdata.image.SizeS is not None and "S" in kwargs:
            size_s = kwargs["S"] + 1
            mdata.image.SizeS = 1

        if kwargs is not None and "T" in kwargs:
            size_t = kwargs["T"] + 1
            mdata.image.SizeT = 1

        if kwargs is not None and "Z" in kwargs:
            size_z = kwargs["Z"] + 1
            mdata.image.SizeZ = 1

        if kwargs is not None and "C" in kwargs:
            size_c = kwargs["C"] + 1
            mdata.image.SizeC = 1

        if mdata.isRGB:
            shape2d = (size_y, size_x, 3)
        if not mdata.isRGB:
            shape2d = (size_y, size_x)

        remove_adim = False if mdata.isRGB else True

        # initialise an empty list to hold the dask arrays
        img = []

        for s in trange(size_s):
            time_stack = []

            for time in range(size_t):
                ch_stack = []

                for ch in trange(size_c):
                    z_stack = []

                    for z in range(size_z):

                        if mdata.image.SizeS is not None:
                            # z_slice = da.from_delayed(
                            #     da_delayed(czidoc.read)(plane={'C': ch, 'T': time, 'Z': z}, scene=s),
                            #     shape=shape2d, dtype=mdata.npdtype[0]
                            # )
                            z_slice = da.from_delayed(
                                read_plane(
                                    czidoc,
                                    s=s,
                                    t=time,
                                    c=ch,
                                    z=z,
                                    has_scenes=True,
                                    remove_adim=remove_adim,
                                ),
                                shape=shape2d,
                                dtype=mdata.npdtype[0],
                            )

                        if mdata.image.SizeS is None:
                            # z_slice = da.from_delayed(
                            #     da_delayed(czidoc.read)(plane={'C': ch, 'T': time, 'Z': z}),
                            #     shape=shape2d, dtype=mdata.npdtype[0]
                            # )
                            z_slice = da.from_delayed(
                                read_plane(
                                    czidoc,
                                    s=s,
                                    t=time,
                                    c=ch,
                                    z=z,
                                    has_scenes=False,
                                    remove_adim=remove_adim,
                                ),
                                shape=shape2d,
                                dtype=mdata.npdtype[0],
                            )

                        # create a 2d array
                        z_slice = da.squeeze(z_slice)

                        # append to the z-stack
                        z_stack.append(z_slice)

                    # stack the list and create a new dask array
                    z_stack = da.stack(z_stack, axis=0)

                    # create CZYX list of dask arrays
                    ch_stack.append(z_stack)

                # create TCZYX list of dask arrays
                time_stack.append(ch_stack)

            # create STCZYX list of dask arrays
            img.append(time_stack)

        # create the final STCZYX dask array
        array6d = da.stack(img, axis=0)

    # change the dimension order if needed
    if output_order == "STZCYX":
        dim_string = "STZCYXA"
        array6d = np.swapaxes(array6d, 2, 3)

    if output_order == "STCZYX":
        dim_string = "STCZYXA"

    if remove_adim:
        dim_string = dim_string.replace("A", "")

    return array6d, mdata, dim_string


@dask.delayed
def read_plane(
    czidoc: pyczi.CziReader,
    s: int = 0,
    t: int = 0,
    c: int = 0,
    z: int = 0,
    has_scenes: bool = True,
    remove_adim: bool = True,
):
    """Dask delayed function to read a 2d plane from a CZI image, which has the shape
    (Y, X, 1) or (Y, X, 3).

    If the image is a grayscale image, the last array dimension will be removed when
    the option is set to True.

    Args:
        czidoc (pyczi.CziReader): CziReader object
        s (int, optional): Scene index. Defaults to 0.
        t (int, optional): Time index. Defaults to 0.
        c (int, optional): Channel index. Defaults to 0.
        z (int, optional): Z-plane index. Defaults to 0.
        has_scenes (bool, optional): Defines if the CZI actually contains scenes. Defaults to True.
        remove_adim (bool, optional): Option to remove the last dimension of the 2D array. Defaults to True.

    Returns:
        np.ndarray: 2D plane read from the CZI image (delayed)
    """

    # initialize the 2d array with some placeholder values
    image2d = np.zeros([10, 10], dtype=np.int16)

    if has_scenes:
        # read a 2d plane using the scene index
        image2d = czidoc.read(plane={"T": t, "C": c, "Z": z}, scene=s)

    if not has_scenes:
        # read a 2d plane in case the CZI has no scenes
        image2d = czidoc.read(plane={"T": t, "C": c, "Z": z})

    # remove the last "A" dimension when desired
    if remove_adim:
        return image2d[..., 0]
    if not remove_adim:
        return image2d
```
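For illustration (not part of the original comment), calling the function above with the zero-based substack kwargs described in its docstring could look roughly like this; the file path is a placeholder, the import mirrors the test script shown later in this thread, and later releases add more parameters (e.g. use_dask, chunk_zyx):

```python
from czitools import read_tools  # read_6darray lives here in the released package

# full lazy 6D array with dimension order STCZYX(A)
array6d, mdata, dim_string = read_tools.read_6darray("myimage.czi", output_order="STCZYX")

# only scene 0 and channel 1 as a substack (S/T/Z/C kwargs are zero-based)
sub6d, mdata, dim_string = read_tools.read_6darray("myimage.czi", S=0, C=1)

print(dim_string, array6d.shape, sub6d.shape)
```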
For some reason the time is read as -1 and the channel as 0 for a multi-timepoint, two-channel CZI image. This is a file split by scene after LLS imaging. I was trying to troubleshoot by running it, but I get the mdata value as 0 when testing this in a notebook. However, the mdata value in your function above seems to work when I debug and run the read_6darray function.
Hi @pr4deepr, regarding the CZI from the LLS7 - how did you split the scenes? Did you use "Split Scenes" already during acquisition? If yes, this might be the reason, since this function creates individual CZIs plus a "satellite file", where the individual CZIs have strange metadata. I have not tested my code with those, since I do not really like or recommend this function. Rather use "Split Scenes Write Files" after the acquisition is finished. This takes longer but should at least produce correct metadata. Can you somehow share this file? And is it OK for you if I add the code above to this package, since a major part is based on your post?
Yeah, feel free to add the code to the package. An acknowledgment in the release would be enough... With LLS7, I was told we use the "Split Scenes" during acquisition option for now. We've had issues with metadata being incorrect pertaining to the scenes. I'll see if we can share any of these files. Do you have a Zeiss dropbox or storage link where we can drop some data?
We do not have Dropbox at ZEISS. But if you share the link anyway, I will find a way to download the file.
Hi @pr4deepr, I incorporated your changes and released czitools 0.4.1 with the delayed reading. I also created a 100GB test CZI stack and tried to open it in napari. But what I observed was that my RAM (128GB) was filling up completely over time, until I ran into strange error messages. What I do not understand is what is actually filling up the memory, since this should be delayed reading. Napari never even reached the state where it could display any data. I used this code to test it:

```python
from czitools import read_tools
from czitools import napari_tools
from czitools import misc_tools
import napari
from pathlib import Path

# adapt to your needs
defaultdir = Path(Path(__file__).resolve().parents[2]) / "data"

# open a simple dialog to select a CZI file
filepath = misc_tools.openfile(directory=defaultdir,
                               title="Open CZI Image File",
                               ftypename="CZI Files",
                               extension="*.czi")

# return an array with dimension order STCZYX(A)
array6d, mdata, dim_string6d = read_tools.read_6darray(filepath,
                                                       output_order="STCZYX",
                                                       use_dask=True,
                                                       chunk_zyx=False,
                                                       # T=0,
                                                       # Z=0
                                                       )

# show the array inside the napari viewer
viewer = napari.Viewer()
layers = napari_tools.show(viewer, array6d, mdata,
                           dim_string=dim_string6d,
                           blending="additive",
                           contrast='from_czi',
                           gamma=0.85,
                           add_mdtable=True,
                           name_sliders=True)

napari.run()
```

Do you have any idea what this issue could be?
It's possible it's something to do with how napari handles the dask cache. What happens if you turn it on and off, or reconfigure the cache size?
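As a minimal sketch (not from the original comment) of what resizing or disabling napari's dask cache could look like: resize_dask_cache is available from napari.utils in recent napari releases, but check the version you have installed, since the cache behaviour has changed between releases.

```python
import napari
from napari.utils import resize_dask_cache

viewer = napari.Viewer()

# cap the opportunistic dask cache at roughly 2 GB ...
resize_dask_cache(nbytes=2_000_000_000)

# ... or set it to zero to disable caching and rule it out as the source of the RAM growth
resize_dask_cache(nbytes=0)
```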
It works much faster than before after the changes. But feel free to reopen if needed.
Sorry @sebi06, I've been quite busy.
No, it seems to work "good enough" now. Thanks for asking. I think it was a 100GB dataset.
Your RAM usage above is ~100 GB. It looks like the entire dataset has been loaded into memory?
I did not check yet, but do you have an idea why the code would load the entire dataset into memory?
Hi @pr4deepr, I checked with a smaller file and it seems that the memory "increase" occurs inside the final stacking step. So creating the "final" stack actually leads to everything being loaded into memory? Does that make sense (I am not a dask expert)? What I do not understand is why the memory only seems to increase in line 210 but not in other lines using the same command.
I'll have a look into this. I need to see if specifying chunk sizes explicitly earlier may help. BTW, how do you profile memory for each line like that?
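On the explicit chunking idea, a rough sketch (not tested against czitools; the array shape is made up) of rechunking the assembled STCZYX dask array so that each scene/timepoint/channel ZYX sub-stack becomes one chunk:

```python
import dask.array as da

# stand-in for the lazy STCZYX array returned by read_6darray
array6d = da.zeros((2, 5, 2, 30, 512, 512), dtype="uint16")

# one chunk per (scene, time, channel) ZYX sub-stack; -1 keeps an axis unchunked
array6d = array6d.rechunk(chunks=(1, 1, 1, -1, -1, -1))
print(array6d.chunks)
```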
Hi @pr4deepr, first you need to do:

```python
from memory_profiler import profile

# decorate the function whose per-line memory usage should be monitored
@profile
def read_6darray(
    filepath: Union[str, os.PathLike[str]],
    output_order: str = "STCZYX",
    use_dask: bool = False,
    chunk_zyx=False,
    **kwargs: int
) -> Tuple[Optional[Union[np.ndarray, da.Array]], czimd.CziMetadata, str]:
    ...
```
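Running a script that calls the decorated function with python -m memory_profiler your_script.py then prints a per-line table of memory usage and increments, which is where line numbers like "line 210" quoted above come from; the companion mprof run / mprof plot commands show memory usage over time instead.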
I searched the web and found this: dask/dask#4448, but I am not sure if that has something to do with what I see. And this one might also help:
Hi Sebastian,
Thanks for making this resource available and maintaining it.
I am trying to read nd arrays from large datasets with pylibczirw_tools.read_5darray. I set the flag output_dask=True so it doesn't read in the entire dataset. However, it takes a very long time, more than 1 hour... Is this the right way to do it?
However, if I use the following code to initialize a dask array, it takes ~5 minutes for a 255GB dataset:
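The snippet referred to here did not survive in this copy of the thread. As an illustration only (not the reporter's exact code), the general pattern of wrapping per-plane czidoc.read calls in dask.delayed and assembling them lazily, which is essentially what was later incorporated into read_6darray above, looks roughly like this; the path, dtype, and plane count are placeholders:

```python
import dask
import dask.array as da
from pylibCZIrw import czi as pyczi

filepath = "large_dataset.czi"  # placeholder path
size_z = 100                    # placeholder: number of Z planes from the metadata

with pyczi.open_czi(filepath) as czidoc:
    size_x = czidoc.total_bounding_rectangle.w
    size_y = czidoc.total_bounding_rectangle.h

    # wrap the reader so a plane is only read when the dask graph is computed
    lazy_read = dask.delayed(czidoc.read)

    planes = [
        da.from_delayed(
            lazy_read(plane={"T": 0, "C": 0, "Z": z}),
            shape=(size_y, size_x, 1),
            dtype="uint16",  # placeholder dtype
        )
        for z in range(size_z)
    ]

    # lazy ZYXA stack; nothing is read from disk until .compute() or slicing
    zstack = da.stack(planes, axis=0)
```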