Store type of store in top level metadata ? #65
Comments
Hi Matthias,
I may be missing something in your suggestion, but the way I've thought of
it is that the type of storage is entirely hidden behind the store
interface.
So what you record in the metadata for a hierarchy or group or array should
be completely independent of what type of store is being used, where "type
of store" means something like directory store or zip store or google cloud
store.
To put it another way, in order to begin reading some zarr data, you first
need to know what type of store is being used. I.e., you can't read any of
the metadata until you know how to retrieve it.
I think there is still plenty of freedom to experiment: a store
implementation can do whatever it wants internally, as long as it
exposes the store interface.
To provide convenience, some zarr implementations might support a URL-style
protocol where the type of store is somehow encoded in a URL-style string
(e.g., fsspec will recognise a string starting with "gs://..." as pointing
to google cloud storage). And packages like intake might support this
within data catalogs, so a user doesn't need to "know" what type of store
is being used in order to read some data. But those are all beyond the
scope of the zarr protocol I think, they are implementation considerations.
Not sure that directly answers your question, but happy to follow up.
Cheers,
Alistair
On Wed, 6 May 2020 at 23:55, Matthias Bussonnier wrote:
I'm not 100% sure whether this is a spec question or an implementation one;
it is mostly driven by this morning's questions on the community call.
Right now it looks to me that users need to specify which store they want
to use, and that some guesses can be made depending on extensions
(normalize_store_arg?).
Currently this prevents deeply changing or experimenting with stores of
similar structure without being aware of the kind of store one is working
on.
Would it be interesting for the (top-level?) metadata to include a
description of the kind of store that should be expected?
Obviously for some stores it's hard, but for URL-based or directory-based
stores it should be pretty easy, and it would give some flexibility with
respect to changes of internal data structure and/or bug fixes.
Thanks. Let me withdraw the "store in top level metadata" suggestion and rephrase:
While for low-level usage it is reasonable to specify a store/protocol, having a robust way to detect it might be nice. It might be that a store has (or had) a bug, and you might need to know which version of a store created a hierarchy/dataset in order to follow a different code path. Even if we set this use case aside, I feel that having a defined way of saying "I am a ..." would be really useful for non-technical users, who just want to call "open()" or drag and drop folders/files onto a GUI. Now it might not be
Hi @Carreau, I see where you're coming from I think, and these are valid considerations. I think we just need to figure out where they should live within the zarr architecture. In the zarr architecture, a "store" is something that implements key/value operations, where keys are strings and values are arbitrary byte sequences. That is it. A "store" is completely agnostic to what is stored there. I.e., you could use a "store" to store any kind of data, not necessarily zarr data. It is just a common abstraction over a set of storage technologies, which includes file systems, cloud object stores, and key/value databases.
This probably just needs some clarification regarding exactly what we mean by "store". E.g., I use "store" to simply mean something that implements key/value operations. So you need to know what type of store it is before you can start retrieving any data from it.
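To illustrate what "something that implements key/value operations" could look like, here is a minimal sketch in Python. The class name `MemoryStore` is hypothetical and is not part of zarr; the only assumption taken from the discussion is that keys are strings and values are arbitrary byte sequences.

```python
from collections.abc import MutableMapping


class MemoryStore(MutableMapping):
    """Hypothetical in-memory store: string keys, arbitrary byte values."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        # Values are stored as opaque bytes; the store never interprets them.
        self._data[key] = bytes(value)

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = MemoryStore()
store[".zgroup"] = b'{"zarr_format": 2}'      # zarr metadata
store["notes.txt"] = b"not zarr data at all"  # the store does not care
```

Nothing in the store knows whether a value is zarr metadata or something else, which is why the type of store has to be known before any metadata can be read from it.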
Thanks, yes, that makes sense, and I think we need a better separation between the store as an API and the internals of the storage system. I think it is perfectly fine to know which store we are dealing with before opening a Zarr "connection" with this store. Now, can we come up with a better mechanism for discovering the type of store when those are behind a URL/filesystem? I feel like a high level
Inferring the type of the store from a URL-like string would seem like a reasonable approach to me, and should work for at least some store types. It could get a bit tricky in some cases. Also relevant is fsspec's support for URL chaining.
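A rough sketch of what such inference might look like, using only the standard library. The function `guess_store_type`, the mapping `SCHEME_TO_STORE`, and the store labels are all hypothetical names chosen for illustration; this is neither zarr's nor fsspec's actual API.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to a store label (illustration only).
SCHEME_TO_STORE = {
    "": "directory",       # plain local path, no scheme
    "file": "directory",
    "gs": "google-cloud",
}


def guess_store_type(url):
    """Guess a store label from a URL-like string."""
    parts = urlsplit(url)
    # Local paths ending in .zip are treated as zip stores.
    if parts.scheme in ("", "file") and parts.path.endswith(".zip"):
        return "zip"
    try:
        return SCHEME_TO_STORE[parts.scheme]
    except KeyError:
        raise ValueError(f"cannot infer store type from {url!r}") from None


print(guess_store_type("gs://bucket/data.zarr"))
print(guess_store_type("data/example.zarr"))
print(guess_store_type("/tmp/example.zip"))
```

One of the tricky cases: a Windows path with a drive letter may be parsed as if the drive letter were a URL scheme, so a real implementation would need extra checks beyond this sketch.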
For what it's worth, I've been pondering recently whether Zarr v2 couldn't be made to (optionally) detect consolidated metadata and nested storage, or at least to try one location and then fall back to the other.
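A minimal sketch of the "try one location, then fall back" idea, assuming a store that behaves like a mapping and raises `KeyError` for missing keys. It assumes the Zarr v2 conventions of `.zmetadata` for consolidated metadata, `.zgroup` for group metadata, and `.` versus `/` as chunk key separators; the function names `load_root_metadata` and `load_chunk` are hypothetical.

```python
import json


def load_root_metadata(store):
    """Try the consolidated metadata key first, then fall back to the group key."""
    for key in (".zmetadata", ".zgroup"):
        try:
            return key, json.loads(store[key])
        except KeyError:
            continue
    raise KeyError("no root metadata found under .zmetadata or .zgroup")


def load_chunk(store, array_path, index):
    """Try the flat chunk key (e.g. "0.1"), then the nested one (e.g. "0/1")."""
    parts = [str(i) for i in index]
    for sep in (".", "/"):
        key = f"{array_path}/{sep.join(parts)}"
        try:
            return store[key]
        except KeyError:
            continue
    raise KeyError(f"chunk {tuple(index)} not found under {array_path!r}")
```

The cost of this approach is an extra failed lookup whenever the first guess is wrong, which may matter on high-latency stores.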
For v3 the goal is to have a clearly addressable URI (see #132), and all information needed to open the hierarchy/group/array should be stored in clearly defined metadata. This should help to avoid specifying storage details when opening an array, without needing to record the type of store in the metadata itself. The store type would instead be encoded in the URI, and further store-specific settings would be part of the metadata, e.g. as storage transformers.