Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nested chunk storage #88

Closed
wants to merge 2 commits into from

Conversation

will-moore
Copy link

This adds support for nested zarr stores, where the chunks are in nested directories.

Tested locally in the browser with data generated from latest bioformats2raw 0.3.0rc1 release with and without the --nested flag.

@manzt
Copy link
Collaborator

manzt commented Mar 26, 2021

Hi will, thanks for opening a PR! Last time I checked in zarr-python, key_separator was added to deal with a nested directory at the store level (via FSStore). Do you know if there is a plan to add nested formally to the array metadata spec? I don't see it listed here: https://zarr.readthedocs.io/en/stable/spec/v2.html#metadata

@joshmoore
Copy link

It's not yet. I opened zarr-developers/zarr-python#707 to discuss.

@manzt
Copy link
Collaborator

manzt commented Mar 26, 2021

Ah, thanks for clarifying Josh. Hmm. I see two options. Ideally as shown in this PR, ZarrArray could use a new array metadata to determine what key_separator to use automatically. However, it seems that this metadata field isn't fully decided yet, and I worry about running into the case where we need to check for both nested and "<future_key>" in the future.

Another option, similar to the current solution in zarr-python, is to make the end user explicitly read the store using a different key separator. This could be accomplished most simply using a es6 Proxy:

function nested(store: Store): Store {
    const get = (target: Store, key: any) => {
      if (key === 'getItem' || key === 'setItem' || key === 'containsItem') {
        return (path, ...args) {
          if (path.endsWith('.zarray') || path.endsWith('.zattrs') || path.endsWith('.zgroup')) {
            return target[key](path, ...args);
          }
          const [...prefix, chunkKey] = path.split('/');
          const newPath = [...prefix, chunkKey.replaceAll('.', '/')].join('/')
          return target[key](newPath, ...args);
        }
      }
      return Reflect.get(target, key);
    };
    return new Proxy(store, { get });
}
let store = new HTTPStore('http://localhost:8080/data.zarr');
store = nested(store);
store.getItem('path/to/chunk/0.0'); // --> store.getItem('path/to/chunk/0/0');

The proxy acts as a key-transformation layer between ZarrArray and the Store. The pro here is that we don't need to touch the internals of zarr.js until something is decided, but the con is that it's a very round-about way of getting the same outcome and a library user needs to determine when the nested store is needed.

@joshmoore
Copy link

I definitely understand the hesitation of going adhoc on the metadata name. For what it's worth, we'll be implementing the "automate the user-choice based on the OME-Zarr metadata" (i.e. the version), likely next week.

@manzt
Copy link
Collaborator

manzt commented Mar 26, 2021

That's what I was curious about. I'd like to avoid zarr.js reading/writing adhoc .zarray metadata if possible, and rather 1.) expose an API for a user to explicitly provide a key separator, or 2.) not touch any of zarr.js internally and use a store Proxy (above). In both cases, the user has to explicitly check for some user-defined metadata and determine whether a different separator is needed (in our case, this would be checking the OME-Zarr version).

For option 1.), I could see an API like: ZarrArray.unstable_setKeySeparator('/'). I'm slightly more inclined towards Option 2 until something is decided on formally, since it's a solution that works without changing zarr.js currently and doesn't require exposing a new API or implicitly relying on adhoc metadata.

Curious what you think, @gzuidhof

@joshmoore
Copy link

Certainly happy to have this, e.g., at the vizarr level. Just a heads up that we'll be starting to open PRs across the various libraries to assume 0.2 == / as key (or rather, dimension) separator per ome/ngff#40

@will-moore
Copy link
Author

@manzt Thanks for that. I'll close this for now and have a look at using your code sample in vizarr. Cheers.

@will-moore will-moore closed this Mar 29, 2021
will-moore added a commit to will-moore/vizarr that referenced this pull request Mar 30, 2021
Code adapted from gzuidhof/zarr.js#88 (comment)
No Typescript types yet
manzt pushed a commit to hms-dbmi/vizarr that referenced this pull request Apr 20, 2021
* Use Proxy store for nested chunks

Code adapted from gzuidhof/zarr.js#88 (comment)
No Typescript types yet

* Only use nested(store) for correct version of OME-Zarr

* format fixes

* Move nested handling to io.ts

* Revert changes to ome.ts, removing isNested duplication
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants