Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ImageArray Initialization #261

Closed
wants to merge 3 commits into from
Closed

ImageArray Initialization #261

wants to merge 3 commits into from

Conversation

ieivanov
Copy link
Contributor

@ieivanov ieivanov commented Nov 26, 2024

ImageArray is designed to be initialized with a zarr array. This causes problems when using ImageArrays in parallel compute using dask, where arrays are initialized using key-value arguments, as in https://zarr.readthedocs.io/en/stable/_modules/zarr/core.html#Array. Here is a minimal example

import dask.array as da
from iohub import open_ome_zarr
from multiprocessing import Pool

data_path = path_to_data  # path to any zarr store

with open_ome_zarr(data_path) as ds:
    data = da.from_zarr(ds.data)

func = sum  # you favorite funciton

with Pool(2) as p:
    output = p.map(func, data)

This example will fail, likely because of the way dask initializes the zarr arrays in the workers. In this PR, I make the zarray argument optional and allow for additional keyword arguments.

Fixed #226

)
def __init__(self, zarray: zarr.Array = None, **kwargs):
if zarray is not None:
kwargs.update(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kwargs.update(
# Dynamically prepare kwargs for the base class to support Dask
kwargs.update(

Copy link
Contributor

@edyoshikun edyoshikun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I would only add a sentence in the dosctring or comment why we added this (i.e supporting/compatibility with dask).

meta_array=zarray._meta_array,
)
def __init__(self, zarray: zarr.Array = None, **kwargs):
if zarray is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain when this method should be called without a argument? It is an internal wrapper, and if you don't need the wrapper, just doing Position.zgroup["0"] would give you the plain zarr.Array object.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dask may expect exactly zarr.Array and not a wrapped subclass for some of their internal logic. I think it's safer to match their advertised API than hacking our own wrapper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an example use case at the top and a short docstring to the __init__ function:

    def __init__(self, zarray: zarr.Array = None, **kwargs):
        """Keyword arguments are passed to the zarr.Array constructor.
        If a zarr.Array is provided, the constructor will use its attributes
        to initialize the ImageArray.
        """

I think this is a common enough use case, such that we should make iohub arrays work with the dask arrays in multiprocessing applications

@ziw-liu
Copy link
Collaborator

ziw-liu commented Nov 27, 2024

Another thought on API design: if the goal is to get a convenient handle of a dask array out of an OME-Zarr array, would it be better to instead add an explicit constructor like we did for ImageArray.numpy() and #242?

@ieivanov ieivanov closed this Dec 3, 2024
@ieivanov ieivanov deleted the ImageArray_init branch December 3, 2024 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError: ImageArray.__init__() got an unexpected keyword argument 'store'
3 participants