Dask arrays are not recognized as array in is_ndarray_like #624

Open
whatnick opened this issue Oct 31, 2024 · 1 comment

@whatnick

Minimal, reproducible code sample, a copy-pastable example if possible

from numcodecs.ndarray_like import is_ndarray_like
import numpy as np
import dask.array as da

np_array = np.zeros((10, 10))
da_array = da.zeros((10, 10))
cast_da_array = np.array(da_array)  # converts the dask array to an in-memory NumPy array

print(is_ndarray_like(da_array))       # False
print(is_ndarray_like(np_array))       # True
print(is_ndarray_like(cast_da_array))  # True

Problem description

is_ndarray_like does not treat dask arrays as array-like. This matters especially for Zarr persistence, because it prevents chaining large computations and saving them gradually to Zarr.

Version and installation information

Please provide the following:

  • Value of numcodecs.__version__ : 0.13.1
  • Version of Python interpreter : 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
  • Operating system (Linux/Windows/Mac) : Linux
  • How NumCodecs was installed (e.g., "using pip into virtual environment", or "using conda") : using uv into virtualenv

Also, if you think it might be relevant, please provide the output from pip list or
conda list depending on which was used to install NumCodecs.

@whatnick changed the title from "Dask arrays are not recognized as is_ndarray_like" to "Dask arrays are not recognized as array in is_ndarray_like" on Oct 31, 2024
@martindurant
Member

dask.array is not numpy-like: it generally doesn't have data in memory. However, when you act on the array, the operation is broken up to act on its partitions, so that within a task, you see only a real numpy array. This is exactly how to_zarr works, including partition encoding via numcodecs.
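
A rough sketch of the point above, not taken from dask or numcodecs internals: the dask collection itself is not a NumPy array, but each partition handed to a task is. The helper name block_is_numpy and the variables lazy and flags are made up for illustration. The example uses a plain isinstance check rather than is_ndarray_like so that it depends only on numpy and dask.

```python
import numpy as np
import dask.array as da

# A lazy 10x10 array split into four 5x5 chunks; no data is held in memory yet.
lazy = da.zeros((10, 10), chunks=(5, 5))


def block_is_numpy(block):
    # Hypothetical helper: inside a task, `block` is one materialized chunk.
    # Return a 1x1 boolean array recording whether that chunk is a NumPy array.
    return np.array([[isinstance(block, np.ndarray)]])


# Apply the check per chunk; each 5x5 input chunk maps to a 1x1 output chunk.
flags = lazy.map_blocks(block_is_numpy, dtype=bool, chunks=(1, 1)).compute()

print(isinstance(lazy, np.ndarray))  # the dask collection itself
print(flags)                         # one flag per chunk
```

Under these assumptions, the collection-level check fails while every per-chunk check succeeds, which is consistent with the description of how to_zarr hands partitions to numcodecs.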

Do you have an actual workflow where you have an issue?
