Store type of store in top level metadata ? #65

Carreau · 2020-05-06T22:55:40Z

I'm not 100% sure whether this is a spec question or an implementation one, and it is mostly driven by this morning's questions on the community call.

Right now it looks to me like users need to specify which store they want to use, and that some guesses can be made depending on extensions (normalize_store_arg ?).

Currently this makes it hard to deeply change or experiment with stores of similar structure without being aware of the kind of store one is working on.

Would it be interesting for the (top level?) metadata to include a description of the kind of store that should be expected?

Obviously for some of the stores it's hard, but for URL-based or directory-based stores it should be pretty easy, and it would give some flexibility with respect to changes of internal data structure and/or bug fixes.


alimanfoo · 2020-05-07T12:29:23Z

Hi Matthias, I may be missing something in your suggestion, but the way I've thought of it is that the type of storage is entirely hidden behind the store interface. So what you record in the metadata for a hierarchy or group or array should be completely independent of what type of store is being used, where "type of store" means something like directory store or zip store or google cloud store.

To put it another way, in order to begin reading some zarr data, you first need to know what type of store is being used. I.e., you can't read any of the metadata until you know how to retrieve it.

I think there is still plenty of freedom to experiment: a store implementation can do whatever it wants to internally, as long as it exposes the store interface.

To provide convenience, some zarr implementations might support a URL-style protocol where the type of store is somehow encoded in a URL-style string (e.g., fsspec will recognise a string starting with "gs://..." as pointing to google cloud storage). And packages like intake might support this within data catalogs, so a user doesn't need to "know" what type of store is being used in order to read some data. But those are all beyond the scope of the zarr protocol I think; they are implementation considerations.

Not sure that directly answers but happy to follow up.

Cheers, Alistair


Carreau · 2020-05-07T14:57:30Z

Thanks,

Let me withdraw the "store in top level metadata" and rephrase:

For really similar stores (for example, bug fixes between store versions), how can a project like intake detect what kind/version of store it is talking to?

While for low level usage it is reasonable to specify a store/protocol, having a robust way to detect might be nice. It might be that a store has (had) a bug, and you might need to know which version of a store created a hierarchy/dataset and follow a different codepath.

Even if we avoid this use case, I feel like having a given way of saying "I am a ..." would be really useful for non-technical users, who just want to call "open()", or drag and drop folders/files onto a GUI.

Now it might not be .zgroup, and it might not be available for all stores, but what about a convention like having a .store that holds store-specific information?

alimanfoo · 2020-05-11T16:10:00Z

Hi @Carreau, I see where you're coming from I think, and these are valid considerations. I think we just need to figure out where they should live within the zarr architecture.

In the zarr architecture, a "store" is something that implements key/value operations, where keys are strings and values are arbitrary byte sequences. That is it. A "store" is completely agnostic to what is stored there. I.e., you could use a "store" to store any kind of data, not necessarily zarr data. It is just a common abstraction over a set of storage technologies, which includes file systems, cloud object stores, and key/value databases.

> Now it might not be .zgroup, and it might not be available for all store, but a would a convention like having a .store that have store-specific informations.

This probably just needs some clarification regarding exactly what we mean by "store".

E.g., I use "store" to simply mean something that implements key/value operations. So you need to know what type of store it is before you can start retrieving any data from it.
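To make the "key/value operations, and that is it" point concrete, here is a minimal sketch of a store in that sense. The class name `InMemoryStore` is hypothetical and only for illustration; the only assumption is that a store behaves like a Python mutable mapping from string keys to bytes.

```python
from collections.abc import MutableMapping


class InMemoryStore(MutableMapping):
    """Hypothetical store: string keys mapped to raw bytes, nothing more."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        # Values are opaque byte sequences; the store never interprets them.
        self._data[key] = bytes(value)

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = InMemoryStore()
store[".zgroup"] = b'{"zarr_format": 2}'      # zarr metadata
store["notes.txt"] = b"not zarr data at all"  # anything else works too
```

Nothing in the class knows about zarr, which is why the store cannot describe itself through zarr metadata: the metadata is just another value behind a key.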

Carreau · 2020-05-11T22:00:31Z

Thanks, yes that makes sense, and I think we need to have better separation between the store as an API and the internals of the storage system.

I think it is perfectly fine to know which store we are dealing with before opening a Zarr "connection" with this store. Now, can we come up with a better mechanism for discovering the type of store when it is behind a URL/filesystem?

I feel like a high level zarr.open() should be able to have some extra logic so that the user does not need to be aware of the type of store, but that zarr.core.open() must be given a type of store explicitly.

alimanfoo · 2020-05-11T22:13:12Z

Inferring the type of the store from a URL-like string would seem like a reasonable approach to me, and should work for at least some store types. It could get a bit tricky in some cases. Also relevant is fsspec on URL chaining.
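A rough sketch of what such inference could look like, using only the standard library. The function `infer_store_kind`, the `SCHEME_TO_STORE` table, and the store labels are all hypothetical; this is not how zarr or fsspec actually resolve URLs.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to a store label; not a zarr or fsspec API.
SCHEME_TO_STORE = {
    "gs": "google-cloud",
    "s3": "s3",
    "http": "http",
    "https": "http",
    "file": "directory",
}


def infer_store_kind(location):
    """Guess a store label from a URL-like string or a local path."""
    scheme = urlsplit(location).scheme.lower()
    if len(scheme) <= 1:
        # No scheme, or a single letter that is most likely a Windows drive:
        # treat as a local path and use the extension as a hint.
        return "zip" if location.lower().endswith(".zip") else "directory"
    if scheme in SCHEME_TO_STORE:
        return SCHEME_TO_STORE[scheme]
    raise ValueError(f"cannot infer store type from {location!r}")
```

The single-letter check is one example of the tricky cases: a Windows path such as `C:\data\example.zarr` parses as having scheme `c`, so a naive scheme lookup would reject it.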

joshmoore · 2020-05-12T08:11:36Z

For what it's worth, I've been pondering recently whether Zarr v2 couldn't be made to (optionally) detect consolidated and nesting, or at least to try one location and then fall back to the other.
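The "try one location, then fall back" idea could be sketched roughly as below. It assumes the Zarr v2 key names `.zmetadata` (consolidated metadata), `.zgroup` and `.zarray`, and the two chunk key layouts `0.0` (flat) and `0/0` (nested). The helper names are hypothetical and the store is any mapping-like object.

```python
def find_metadata_key(store):
    """Return the first known zarr v2 metadata key present in a mapping-like store."""
    # Try consolidated metadata first, then fall back to per-node metadata.
    for key in (".zmetadata", ".zgroup", ".zarray"):
        if key in store:
            return key
    raise KeyError("no zarr v2 metadata found")


def find_chunk_key(store, index):
    """Try the flat chunk key ('0.0') and fall back to the nested one ('0/0')."""
    for sep in (".", "/"):
        key = sep.join(str(i) for i in index)
        if key in store:
            return key
    raise KeyError(f"no chunk found for index {index}")


flat = {".zarray": b"{}", "0.0": b"chunk"}
nested = {".zmetadata": b"{}", ".zgroup": b"{}", "0/0": b"chunk"}
```

Each fallback costs an extra lookup, which is cheap on a local directory but can be a noticeable round trip on a remote object store.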

jstriebel · 2022-11-16T17:05:27Z

For v3 the goal is to have a clearly addressable URI (see #132), and all relevant information on how to open the hierarchy/group/array should be stored in clearly defined metadata. This should help to avoid specifying storage details when opening an array, without needing to specify the type of store in the metadata itself. The type of store would rather be encoded in the URI, and further store-specific settings would be part of the metadata, e.g. as storage transformers.

joshmoore changed the title ~~Store type of store in top level medata ?~~ Store type of store in top level metadata ? Nov 17, 2022

joshmoore mentioned this issue Nov 17, 2022

Native metadata storage #90

Closed

jstriebel mentioned this issue Nov 24, 2022

Issue overview: From URI to open array #178

Closed
