Apply more ZEP 1 feedback #171

Merged: 23 commits, Dec 1, 2022

Commits
40da911
update questions to be resolved
jstriebel Nov 21, 2022
f889dad
chunk-grid: add note about chunks at the array border
jstriebel Nov 21, 2022
75c44ce
remove metadata_encoding key and associate metadata_key_suffix with e…
jstriebel Nov 21, 2022
5fd7e8a
link to issue #62 in varlen comment, minor formatting fix
jstriebel Nov 21, 2022
a92fe99
note about reserved names in metadata
jstriebel Nov 21, 2022
74e1667
apply feedback about paths
jstriebel Nov 21, 2022
0a60d82
remove array_ may not have been fully initialized.
jstriebel Nov 21, 2022
e16963e
Apply suggestions from code review
jstriebel Nov 22, 2022
91063be
Update docs/core/v3.0.rst
jstriebel Nov 22, 2022
2aa02f9
add note to sharding reference about ZEP2
jstriebel Nov 22, 2022
2b18d0d
clarify that paths never end with a slash
jstriebel Nov 22, 2022
d0c3594
apply ryans wording corrections
jstriebel Nov 22, 2022
1d793b1
re-add metadata_encoding as an explicit extension point, containing t…
jstriebel Nov 24, 2022
4389e12
recommend to use fill-value for outside elements in border chunks
jstriebel Nov 24, 2022
124dc0e
apply more of ryan's review
jstriebel Nov 24, 2022
e5297ae
add todo note for operations
jstriebel Nov 24, 2022
1593c2f
clarify that fallback is optional for data_type
jstriebel Nov 24, 2022
7805c62
Merge remote-tracking branch 'origin/main' into apply-more-zep1-feedback
jstriebel Nov 25, 2022
dfc501b
add changelog entry
jstriebel Nov 25, 2022
53ae507
fix changelog formatting
jstriebel Nov 25, 2022
b7423eb
Update docs/core/v3.0.rst
jstriebel Nov 28, 2022
2040c59
clarifications about race conditions with implicit groups
jstriebel Nov 28, 2022
b3e40e7
Merge branch 'main' into apply-more-zep1-feedback
jstriebel Nov 29, 2022
65 changes: 30 additions & 35 deletions docs/core/v3.0.rst
@@ -147,7 +147,7 @@ Stability Policy
----------------

This core specification adheres to a ``MAJOR.MINOR`` version
number format. A zarr implementation provides the read and write API by
number format. A zarr implementation that provides the read and write API by
implementing this specification can be considered compatible with all
datasets following the specification with the same major version number.
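
As an illustration of the compatibility rule above, the following sketch compares only the major component of two ``MAJOR.MINOR`` version strings. The function names are hypothetical and not part of the specification; it also assumes plain ``MAJOR.MINOR`` strings rather than the URI form used in ``zarr_format``.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a 'MAJOR.MINOR' string into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(implemented: str, dataset: str) -> bool:
    """Hypothetical check: same major version means compatible."""
    return parse_version(implemented)[0] == parse_version(dataset)[0]


same_major = is_compatible("3.0", "3.1")
different_major = is_compatible("3.0", "4.0")
```

Under this reading, a minor version difference alone never makes a dataset incompatible.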

@@ -169,20 +169,15 @@ For details, please see the `zarr_format`_ metadata entry.
Questions that still need to be resolved
----------------------------------------

We solicit feedback on the following areas during the RFC period of this first
draft.
We solicit feedback on the following areas during the review period:

- Should core metadata and user attributes be stored together or in separate documents?
(See https://github.com/zarr-developers/zarr-specs/issues/72)
- extensions and ``must_understand = True`` might be too restrictive.
We propose to develop a draft implementation with extensions and
see how far we can go. A possible list of extensions to include:

- Datetime
- Named dimensions
- Awkward arrays

See https://github.com/zarr-developers/zarr-specs/issues/89 for discussion on
- We want to verify if the extension mechanisms fit different use cases or if
they are too restrictive. See
https://github.com/zarr-developers/zarr-specs/issues/89 and
https://github.com/zarr-developers/zarr-specs/issues/169 for discussion on
the topic.

- Node name case sensitivity: The node name is now case sensitive. This may
@@ -197,6 +192,7 @@ draft.

- Should named dimensions be part of the core metadata spec?
https://github.com/zarr-developers/zarr-specs/issues/73
https://github.com/zarr-developers/zarr-specs/pull/162


Document conventions
@@ -263,7 +259,7 @@ The following figure illustrates the first part of the terminology:
Each node in a hierarchy_ has a name, which is a string of
characters with some additional constraints defined in the section
on `node names`_ below. Two sibling nodes cannot have the same
name. The root node does not have a name.
name. The root node does not have a name and is the empty string ``""``.

.. _path:
.. _paths:
@@ -272,7 +268,7 @@ The following figure illustrates the first part of the terminology:

Each node in a hierarchy_ has a path which uniquely identifies
that node and defines its location within the hierarchy_. The path
is formed by joining together the "/" character, followed by the
is a string, formed by joining together the "/" character, followed by the
name_ of each ancestor node separated by the "/" character,
followed by the name_ of the node itself. For example, the path
"/foo/bar" identifies a node named "bar", whose parent is named
@@ -314,8 +310,7 @@ The following figure illustrates the first part of the terminology:
identified by a tuple of integer coordinates, one for each
dimension_ of the array_. If all dimensions_ of an array_ have
finite length, then the number of elements in the array_ is given
by the product of the dimension_ lengths. An array_ may not have
been fully initialized.
by the product of the dimension_ lengths.
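
The element count described above can be sketched in a few lines. The helper name ``num_elements`` is hypothetical, and the sketch assumes that every dimension has finite length.

```python
import math


def num_elements(shape: tuple[int, ...]) -> int:
    """Number of elements as the product of the dimension lengths."""
    # math.prod returns 1 for an empty tuple, i.e. a zero-dimensional array.
    return math.prod(shape)


count_2d = num_elements((30, 30))
count_0d = num_elements(())
```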

.. _data type:

@@ -589,6 +584,7 @@ other type sizes in later versions of this specification.
arrays, one with the actual variable length data and one with fixed size
(pointer + length) to the variable size data, we do not want to commit to such
a structure.
See https://github.com/zarr-developers/zarr-specs/issues/62.


Chunk grids
@@ -691,6 +687,12 @@ arbitrary length in a "negative" direction along any dimension.
in which case the coordinate of a chunk is the empty tuple, and the chunk key
will consist of the string ``c``.

.. note:: Chunks at the border of an array always have the full chunk size, even when
the array only covers part of them. For example, for an array with ``shape=(30, 30)`` and
``chunks=(16, 16)``, the chunk ``0,1`` would also contain unused values for the indices
``(0-16, 30-31)``. Since reading anything beyond the array shape is undefined behavior,
those values may be passed on, replaced with the fill value, or handled in some other way.
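
The border-chunk arithmetic from the note can be checked with a short sketch. The helper names are hypothetical; the sketch assumes a regular chunk grid that starts at the origin and reports, per dimension, where the chunk starts, where the full chunk ends, and where the part covered by the array ends.

```python
def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension."""
    # Ceiling division: a partially covered border chunk counts as a whole chunk.
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


def chunk_extent(chunk_coords, shape, chunks):
    """Per dimension: (start, stop of the full chunk, stop of the valid part)."""
    extent = []
    for i, s, c in zip(chunk_coords, shape, chunks):
        start = i * c
        stop = start + c           # full chunk size, possibly past the array edge
        valid_stop = min(stop, s)  # indices at or beyond this are outside the array
        extent.append((start, stop, valid_stop))
    return extent


grid = chunk_grid_shape((30, 30), (16, 16))
extent = chunk_extent((0, 1), (30, 30), (16, 16))
```

For chunk ``0,1`` the second dimension spans indices 16 up to 32 (exclusive), while the array ends at 30, so the positions 30 and 31 hold the unused values the note refers to.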

Chunk memory layouts
====================

@@ -934,25 +936,21 @@ containing the following names:
version number string to help with discovery of this
specification.

``metadata_encoding``
^^^^^^^^^^^^^^^^^^^^^

A string containing the URI pointing to a document describing the method
used for encoding group and array metadata documents.

For documents using the default JSON encoding and format described in this document,
the value must be ``"https://purl.org/zarr/spec/core/3.0"``.

``metadata_key_suffix``
^^^^^^^^^^^^^^^^^^^^^^^

A string containing a suffix to add to the metadata keys when saving into
the store. By default ``".json"``.
A string containing a suffix to add to the array and group metadata keys
when saving into the store, associated with a single encoding. By default
only ``".json"`` is allowed and used with JSON encoding.

.. note::
This suffix is used to allow non hierarchy browsing and editing by
non-zarr-aware tools.

This suffix is used to allow non hierarchy
browsing and editing by non-zarr-aware tools.
.. note::
This is a possible extension point, where an extension which is
listed in ``extensions`` (see below) may add new valid suffixes with their
associated encodings.

``extensions``
^^^^^^^^^^^^^^
@@ -980,7 +978,6 @@ JSON is being used for encoding of group and array metadata::

{
"zarr_format": "https://purl.org/zarr/spec/core/3.0",
"metadata_encoding": "https://purl.org/zarr/spec/core/3.0",
"metadata_key_suffix" : ".json",
"extensions": []
}
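
A reader of the entry point document might handle ``metadata_key_suffix`` roughly as sketched below. This is only an illustration: ``read_entry_point`` is a hypothetical helper, and treating a missing suffix as ``".json"`` and rejecting other suffixes when no extensions are listed are assumptions drawn from the wording of the ``metadata_key_suffix`` section, not a normative algorithm.

```python
import json

# Entry point document as it looks after this change (no metadata_encoding key).
ENTRY_POINT_JSON = """
{
    "zarr_format": "https://purl.org/zarr/spec/core/3.0",
    "metadata_key_suffix": ".json",
    "extensions": []
}
"""


def read_entry_point(text):
    """Return (zarr_format, metadata_key_suffix) from an entry point document."""
    doc = json.loads(text)  # the entry point itself is always JSON
    suffix = doc.get("metadata_key_suffix", ".json")  # assumed default
    # Assumption: without extensions, only ".json" is a valid suffix.
    if suffix != ".json" and not doc.get("extensions"):
        raise ValueError(f"unsupported metadata_key_suffix: {suffix!r}")
    return doc["zarr_format"], suffix


zarr_format, suffix = read_entry_point(ENTRY_POINT_JSON)
```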
@@ -991,7 +988,6 @@ ignored if not understood::

{
"zarr_format": "https://purl.org/zarr/spec/core/3.0",
"metadata_encoding": "https://purl.org/zarr/spec/core/3.0",
"metadata_key_suffix" : ".json",
"extensions": [
{
@@ -1162,8 +1158,8 @@ The following members are optional:
transformer is used, same for an empty list.


All other names within the array metadata object are reserved for
future versions of this specification.
The array metadata object must not contain any other names.
Those are reserved for future versions of this specification.

For example, the array metadata JSON document below defines a
two-dimensional array of 64-bit little-endian floating point numbers,
@@ -1269,9 +1265,8 @@ Metadata encoding
-----------------

The entry point metadata document must be encoded as JSON. The array (``*.array`` s) and
group metadata documents (``*.group`` s) must be encoded as per the type given in
the ``metadata_encoding`` field in the entry point metadata document
(described below).
group metadata documents (``*.group`` s) must be encoded as per the type associated with
the ``metadata_key_suffix`` field in the entry point metadata document (described below).

Stores
======