Why is path required for opening Zarr v3 groups? #1039

Closed
shoyer opened this issue May 25, 2022 · 5 comments · Fixed by #1085

Comments

@shoyer
Contributor

shoyer commented May 25, 2022

With Zarr v2, I can open a group by passing either a valid Zarr store or a path specified as a string, i.e., zarr.open_group(store_or_path). As I understand it, paths get normalized into store objects, e.g., to a local filesystem store or to one backed by fsspec.
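
To make the "paths get normalized into store objects" idea concrete, here is a minimal standard-library sketch. It is not zarr's actual code: `LocalDirStoreSketch` and `normalize_store_sketch` are hypothetical names invented for illustration, and the stand-in store only keeps keys in memory.

```python
from collections.abc import MutableMapping
from pathlib import Path


class LocalDirStoreSketch(dict):
    """Hypothetical stand-in for a directory-backed store (keys held in memory)."""

    def __init__(self, root):
        super().__init__()
        self.root = Path(root)  # remember which directory the store refers to


def normalize_store_sketch(store_or_path):
    """Return a store object for either an existing store or a path."""
    if isinstance(store_or_path, MutableMapping):
        return store_or_path  # already a store: hand it back unchanged
    if isinstance(store_or_path, (str, Path)):
        return LocalDirStoreSketch(store_or_path)  # wrap the path in a store
    raise TypeError(f"unsupported store argument: {type(store_or_path)!r}")
```

Under this assumption, a caller can pass a string or a mapping and always gets a mapping back, which is the convenience the v2 behavior provides.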

With Zarr v3, as currently implemented, the path argument is apparently now required, per pydata/xarray#6475. This feels like a small step backwards in terms of usability. I'm wondering if I'm missing some broader context here? Maybe some examples of how users would canonically create a group, add an array and then access the data in the new v3 API would be helpful.

@joshmoore
Member

The short version is that previously all (meta)data needed to open a group was at one location but now there is root metadata and the metadata tree is separated from the data tree. The result is that it's much more like opening a Zip file: "open foo.zip and then load /some/file"

cc: @grlee77 for a more complete backstory.

@jbms has proposed a notation of some form to simplify the usage. Taking the zip example, that could look like foo.zip#/some/file or foo.zip//some/file.
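
As a rough illustration of how a single string in either of those forms could be split into a store location and a node path, here is a small sketch. The function name `split_store_and_path` is hypothetical, the separators are taken only from the two examples above, and nothing here reflects an agreed-upon syntax.

```python
def split_store_and_path(spec):
    """Split 'store#/node/path' or 'store//node/path' into (store, node_path).

    Sketch only: a spec containing a URL scheme such as 's3://bucket' would
    be split at the scheme's '//' and needs extra handling not shown here.
    """
    for sep in ("#/", "//"):
        store, found, node = spec.partition(sep)
        if found:
            return store, "/" + node
    # No separator present: assume the root node is meant.
    return spec, "/"
```

With this sketch, both `foo.zip#/some/file` and `foo.zip//some/file` split into the store `foo.zip` and the node path `/some/file`, and a bare `foo.zip` falls back to `/`.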

@shoyer
Contributor Author

shoyer commented Jun 22, 2022

It seems like the v3 spec has support for a "root" group or array:
https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html#storage

Could we simply make that the default for zarr.open_group? I.e., group='/'?

I would be much happier with this sort of default in Zarr rather than in Xarray. We already have one too many domain specific extensions to Zarr in Xarray!

@jbms

jbms commented Jun 22, 2022

Making the root group the default seems like a reasonable choice, but I think it would be nice to more generally be able to specify a zarr group or array to open with just a single string.

On that broader point we could continue the discussion here:
zarr-developers/zarr-specs#132

@grlee77
Contributor

grlee77 commented Jun 29, 2022

> Could we simply make that the default for zarr.open_group? I.e., group='/'?

I did overlook the following statement in the spec about root.group.json and root.array.json, and I agree that we should add that to the v3 support here. I will try to add it in the coming week or so.

> If the root node is a group, the metadata key is “meta/root.group.json”. If the root node is an array, the metadata key is “meta/root.array.json”, and the data keys are formed by concatenating “data/root/” and the chunk identifier.
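
The key layout in that statement can be written down as a short sketch. The function names are hypothetical, and the chunk identifier `c0/0` used in the usage note is only an assumed example value; the key strings themselves are the ones quoted above.

```python
def root_metadata_key(node_type):
    """Metadata key for the root node, per the quoted spec text."""
    if node_type == "group":
        return "meta/root.group.json"
    if node_type == "array":
        return "meta/root.array.json"
    raise ValueError(f"node_type must be 'group' or 'array', got {node_type!r}")


def root_chunk_key(chunk_id):
    """Data key for one chunk of a root array: 'data/root/' + chunk identifier."""
    return "data/root/" + chunk_id
```

For example, `root_chunk_key("c0/0")` yields `data/root/c0/0`, assuming `c0/0` is a valid chunk identifier for the array in question.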

The zarrita and xtensor-zarr implementations implement a Hierarchy class that represents the root and can be opened by just giving the directory name. This hierarchy is just a collection of nodes, where each node can be an array or group. So, the hierarchy object can be opened without specifying any particular path as it represents the root of the zarr store.

The concept of a hierarchy is discussed in the spec, although specific definitions of the methods present on a hierarchy are not given. I am aware of the following two Hierarchy class implementations: the zarrita Hierarchy definition and the xtensor-zarr Hierarchy definition. There is a lot of similarity to the Group class itself, in that both have methods to create arrays or groups, and it was not clear to me whether the standard requires a Hierarchy class independent of Group.
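
To illustrate the idea of an object that represents the root of a store and needs no path argument, here is a minimal sketch. `HierarchySketch` is a hypothetical class, not the zarrita or xtensor-zarr API, and the JSON document it writes is a placeholder rather than spec-compliant group metadata. Only the two root metadata keys come from the spec text quoted above.

```python
import json


class HierarchySketch:
    """Hypothetical wrapper representing the root of a key-value store."""

    ROOT_KEYS = {
        "group": "meta/root.group.json",
        "array": "meta/root.array.json",
    }

    def __init__(self, store):
        self.store = store  # any mutable mapping of key -> bytes

    def root_node_type(self):
        """Return 'group', 'array', or None if no root node exists yet."""
        for node_type, key in self.ROOT_KEYS.items():
            if key in self.store:
                return node_type
        return None

    def create_root_group(self, attributes=None):
        """Write placeholder metadata for a root group (not spec-complete)."""
        doc = {"attributes": attributes or {}}
        self.store[self.ROOT_KEYS["group"]] = json.dumps(doc).encode()
```

Opening then takes only the store, e.g. `HierarchySketch({})`, and the root node type is discovered from which metadata key is present.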

@grlee77
Contributor

grlee77 commented Jul 18, 2022

Quick update: I have a PR nearly done for root array/group support in v3. I just need to take another look later today at one failing test case.
