move codecs into separate (versioned documents), update urls #187

Merged 3 commits on Dec 2, 2022
247 changes: 7 additions & 240 deletions docs/codecs.rst
@@ -2,245 +2,12 @@
Codecs
======

**Editor's Draft 21 October 2020**
Under construction.

Specification URI:
https://purl.org/zarr/specs/codec
Issue tracking:
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_
Suggest an edit for this spec:
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/HEAD/docs/codecs.rst>`_
.. toctree::
:glob:
:maxdepth: 1
:titlesonly:
:caption: Contents:

Copyright 2020 `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work
is licensed under a `Creative Commons Attribution 3.0 Unported License
<https://creativecommons.org/licenses/by/3.0/>`_.

----


Abstract
========

This document defines codecs for Zarr implementations.


Status of this document
=======================

.. warning::
This document is a **Work in Progress**. It may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to
cite this document as other than work in progress.

Comments, questions or contributions to this document are very
welcome. Comments and questions should be raised via `GitHub issues
<https://github.com/zarr-developers/zarr-specs/labels/codec>`_.

This document is maintained by the `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.


Document conventions
====================

This document lists a collection of codecs. For each codec, the
following information is provided:

* A URI which can be used to uniquely identify the codec in Zarr array
metadata.
* Any configuration parameters which can be set in Zarr array
metadata.
* A definition of encoding/decoding algorithm and the encoded format,
or a citation to an existing specification where this is defined.
* Any additional headers added to the encoded data.

Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".


Codecs
======

Gzip
----

Codec URI:
https://purl.org/zarr/spec/codec/gzip


Configuration parameters
~~~~~~~~~~~~~~~~~~~~~~~~

level:
An integer from 0 to 9 which controls the speed and level of
compression. A level of 1 is the fastest compression method and
produces the least compression, while 9 is the slowest and produces
the most compression. Compression is turned off completely when
level is 0.

For example, the array metadata below specifies that the compressor is
the Gzip codec configured with a compression level of 1::

{
"codecs": [{
"type": "https://purl.org/zarr/spec/codec/gzip",
"configuration": {
"level": 1
}
}]
}


Format and algorithm
~~~~~~~~~~~~~~~~~~~~

Encoding and decoding are performed using the algorithm defined in
[RFC1951]_.

Encoded data should conform to the Gzip file format [RFC1952]_.
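
As a non-normative illustration, the round trip described above can be sketched with the ``gzip`` module from the Python standard library. The helper names ``encode_chunk`` and ``decode_chunk`` are hypothetical and are not defined by this specification; the sketch only assumes that ``level`` maps directly onto the library's ``compresslevel`` argument.

```python
import gzip


def encode_chunk(raw: bytes, level: int = 1) -> bytes:
    """Compress raw chunk bytes into gzip format at the given level (0 to 9)."""
    if not 0 <= level <= 9:
        raise ValueError("level must be an integer from 0 to 9")
    # gzip.compress applies DEFLATE and wraps the result in a gzip header/trailer.
    return gzip.compress(raw, compresslevel=level)


def decode_chunk(encoded: bytes) -> bytes:
    """Recover the raw chunk bytes from gzip-encoded data."""
    return gzip.decompress(encoded)


raw = bytes(range(256)) * 16
encoded = encode_chunk(raw, level=1)

# Gzip-format data starts with the two magic bytes 0x1f 0x8b.
assert encoded[:2] == b"\x1f\x8b"
assert decode_chunk(encoded) == raw
```

A level of 0 still produces a valid gzip stream; the data is stored without compression.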


Blosc
-----

Codec URI:
https://purl.org/zarr/spec/codec/blosc


Configuration parameters
~~~~~~~~~~~~~~~~~~~~~~~~

cname:
A string identifying the internal compression algorithm to be
used. At the time of writing, the following values are supported
by the c-blosc library: "lz4", "lz4hc", "blosclz", "zstd",
"snappy", "zlib".

clevel:
An integer from 0 to 9 which controls the speed and level of
compression. A level of 1 is the fastest compression method and
produces the least compression, while 9 is the slowest and produces
the most compression. Compression is turned off completely when
clevel is 0.

shuffle:
An integer value in the set {0, 1, 2, -1} indicating the way
bytes or bits are rearranged, which can lead to faster
and/or greater compression. A value of 1
indicates that byte-wise shuffling is performed prior to
compression. A value of 2 indicates that bit-wise shuffling is
performed prior to compression. If a value of -1 is given,
then default shuffling is used: bit-wise shuffling for buffers
with item size of 1 byte, byte-wise shuffling otherwise.
Shuffling is turned off completely when the value is 0.

blocksize:
An integer giving the size in bytes of blocks into which a
buffer is divided before compression. A value of 0
indicates that an automatic size will be used.

For example, the array metadata document below specifies that the
compressor is the Blosc codec configured with a compression level of
1, byte-wise shuffling, the ``lz4`` compression algorithm and the
default block size::

{
"codecs": [{
"type": "https://purl.org/zarr/spec/codec/blosc",
"configuration": {
"cname": "lz4",
"clevel": 1,
"shuffle": 1,
"blocksize": 0
}
}]
}


Format and algorithm
~~~~~~~~~~~~~~~~~~~~

Blosc is a meta-compressor, which divides an input buffer into blocks,
then applies an internal compression algorithm to each block, then
packs the encoded blocks together into a single output buffer with a
header. The format of the encoded buffer is defined in [BLOSC]_. The
reference implementation is provided by the `c-blosc library
<https://github.com/Blosc/c-blosc>`_.
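
To illustrate what byte-wise shuffling (``shuffle`` equal to 1) does, the following non-normative sketch rearranges a buffer so that the first bytes of all items come first, then all second bytes, and so on. The function names are hypothetical, and the sketch is a simplification: it treats the whole buffer as one unit and requires its length to be a multiple of the item size, and it does not model how c-blosc applies the filter within blocks.

```python
def byte_shuffle(buf: bytes, typesize: int) -> bytes:
    """Group the k-th byte of every item together, for k = 0 .. typesize - 1."""
    if typesize < 1 or len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    # buf[k::typesize] picks byte k of each item.
    return b"".join(buf[k::typesize] for k in range(typesize))


def byte_unshuffle(buf: bytes, typesize: int) -> bytes:
    """Invert byte_shuffle."""
    if typesize < 1 or len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    n_items = len(buf) // typesize
    out = bytearray(len(buf))
    for k in range(typesize):
        # Plane k holds byte k of every item; scatter it back into place.
        out[k::typesize] = buf[k * n_items:(k + 1) * n_items]
    return bytes(out)


# Three little-endian 2-byte integers: 1, 2, 3.
items = b"\x01\x00\x02\x00\x03\x00"
shuffled = byte_shuffle(items, 2)
assert shuffled == b"\x01\x02\x03\x00\x00\x00"
assert byte_unshuffle(shuffled, 2) == items
```

After shuffling, the zero-valued high bytes sit next to each other, which is the kind of regularity that tends to help the internal compressor.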

.. _endian-codec:

Endian
------

Codec URI:
https://purl.org/zarr/spec/codec/endian

Encodes array elements using the specified endianness.

Configuration parameters
~~~~~~~~~~~~~~~~~~~~~~~~

endian:
Required. A string equal to either ``"big"`` or ``"little"``.

Format and algorithm
~~~~~~~~~~~~~~~~~~~~

Each element of the array is encoded using the specified endian variant of its
default binary representation. Array elements are encoded in lexicographical
order. For example, with ``endian`` specified as ``big``, the ``int32`` data
type is encoded as a 4-byte big endian two's complement integer, and the
``complex128`` data type is encoded as two consecutive 8-byte big endian IEEE
754 binary64 values.
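
The two cases above can be sketched, non-normatively, with the ``struct`` module from the Python standard library. The function names are hypothetical. Storing the real part before the imaginary part of a ``complex128`` value is an assumption of this sketch; the text above does not state the order.

```python
import struct

# Map the codec's "endian" parameter to a struct byte-order prefix.
_PREFIX = {"big": ">", "little": "<"}


def encode_int32(value: int, endian: str) -> bytes:
    """Encode one int32 element as a 4-byte two's complement integer."""
    return struct.pack(_PREFIX[endian] + "i", value)


def encode_complex128(value: complex, endian: str) -> bytes:
    """Encode one complex128 element as two consecutive binary64 values.

    Assumption: the real part is written first, then the imaginary part.
    """
    return struct.pack(_PREFIX[endian] + "dd", value.real, value.imag)


assert encode_int32(1, "big") == b"\x00\x00\x00\x01"
assert encode_int32(1, "little") == b"\x01\x00\x00\x00"
assert encode_int32(-2, "big") == b"\xff\xff\xff\xfe"
assert len(encode_complex128(1 + 2j, "big")) == 16
```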

.. note::

The default binary representation of every data type is little endian,
so specifying this codec with ``endian`` equal to ``"little"`` is
equivalent to omitting the codec.

Deprecated codecs
=================

There are no deprecated codecs at this time.


References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
Requirement Levels. March 1997. Best Current Practice. URL:
https://tools.ietf.org/html/rfc2119

.. [RFC1951] P. Deutsch. DEFLATE Compressed Data Format Specification version
1.3. May 1996. Informational. URL:
https://tools.ietf.org/html/rfc1951

.. [RFC1952] P. Deutsch. GZIP file format specification version 4.3.
May 1996. Informational. URL:
https://tools.ietf.org/html/rfc1952

.. [BLOSC] F. Alted. Blosc Chunk Format. URL:
https://github.com/Blosc/c-blosc/blob/HEAD/README_CHUNK_FORMAT.rst


Change log
==========

Editor's Draft 21 October 2020
------------------------------

* Added Gzip codec.
* Added Blosc codec.
codecs/*/*
131 changes: 131 additions & 0 deletions docs/codecs/blosc/v1.0.rst
@@ -0,0 +1,131 @@
===========================
Blosc codec (version 1.0)
===========================

**Editor's draft 26 July 2019**

Specification URI:
https://purl.org/zarr/spec/codecs/blosc/1.0
Corresponding ZEP:
`ZEP 1 — Zarr specification version 3 <https://zarr.dev/zeps/draft/ZEP0001.html>`_
Issue tracking:
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_
Suggest an edit for this spec:
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/main/docs/codecs/blosc/v1.0.rst>`_

Copyright 2020 `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work
is licensed under a `Creative Commons Attribution 3.0 Unported License
<https://creativecommons.org/licenses/by/3.0/>`_.

----


Abstract
========

This specification defines a codec that encodes and decodes data using
the Blosc meta-compressor.


Status of this document
=======================

.. warning::
This document is a draft for review and subject to changes.
It will become final when the `Zarr Enhancement Proposal (ZEP) 1 <https://zarr.dev/zeps/draft/ZEP0001.html>`_
is approved via the `ZEP process <https://zarr.dev/zeps/active/ZEP0000.html>`_.


Document conventions
====================

Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".


Configuration parameters
========================

cname:
A string identifying the internal compression algorithm to be
used. At the time of writing, the following values are supported
by the c-blosc library: "lz4", "lz4hc", "blosclz", "zstd",
"snappy", "zlib".

clevel:
An integer from 0 to 9 which controls the speed and level of
compression. A level of 1 is the fastest compression method and
produces the least compression, while 9 is the slowest and produces
the most compression. Compression is turned off completely when
clevel is 0.

shuffle:
An integer value in the set {0, 1, 2, -1} indicating the way
bytes or bits are rearranged, which can lead to faster
and/or greater compression. A value of 1
indicates that byte-wise shuffling is performed prior to
compression. A value of 2 indicates that bit-wise shuffling is
performed prior to compression. If a value of -1 is given,
then default shuffling is used: bit-wise shuffling for buffers
with item size of 1 byte, byte-wise shuffling otherwise.
Shuffling is turned off completely when the value is 0.

blocksize:
An integer giving the size in bytes of blocks into which a
buffer is divided before compression. A value of 0
indicates that an automatic size will be used.

For example, the array metadata document below specifies that the
compressor is the Blosc codec configured with a compression level of
1, byte-wise shuffling, the ``lz4`` compression algorithm and the
default block size::

{
"codecs": [{
"type": "https://purl.org/zarr/spec/codecs/blosc/1.0",
"configuration": {
"cname": "lz4",
"clevel": 1,
"shuffle": 1,
"blocksize": 0
}
}]
}
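
As a non-normative illustration, an implementation might check a configuration like the one above before using it. The sketch below uses only the Python standard library. The function name ``validate_blosc_configuration`` is hypothetical, the set of ``cname`` values is the list given for c-blosc at the time of writing, and rejecting a negative ``blocksize`` is an assumption of this sketch rather than a stated requirement.

```python
import json

BLOSC_URI = "https://purl.org/zarr/spec/codecs/blosc/1.0"
KNOWN_CNAMES = {"lz4", "lz4hc", "blosclz", "zstd", "snappy", "zlib"}
SHUFFLE_VALUES = {0, 1, 2, -1}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_blosc_configuration(config: dict) -> dict:
    """Return config unchanged if every parameter is in range, else raise."""
    if config.get("cname") not in KNOWN_CNAMES:
        raise ValueError("cname must be one of " + ", ".join(sorted(KNOWN_CNAMES)))
    clevel = config.get("clevel")
    if not _is_int(clevel) or not 0 <= clevel <= 9:
        raise ValueError("clevel must be an integer from 0 to 9")
    shuffle = config.get("shuffle")
    if not _is_int(shuffle) or shuffle not in SHUFFLE_VALUES:
        raise ValueError("shuffle must be one of 0, 1, 2, -1")
    blocksize = config.get("blocksize")
    # Assumption: 0 means automatic and negative sizes are rejected.
    if not _is_int(blocksize) or blocksize < 0:
        raise ValueError("blocksize must be a non-negative integer")
    return config


# Written without a trailing comma so that a strict JSON parser accepts it.
metadata = json.loads("""
{
    "codecs": [{
        "type": "https://purl.org/zarr/spec/codecs/blosc/1.0",
        "configuration": {"cname": "lz4", "clevel": 1, "shuffle": 1, "blocksize": 0}
    }]
}
""")

for codec in metadata["codecs"]:
    if codec["type"] == BLOSC_URI:
        validate_blosc_configuration(codec["configuration"])
```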


Format and algorithm
====================

Blosc is a meta-compressor, which divides an input buffer into blocks,
then applies an internal compression algorithm to each block, then
packs the encoded blocks together into a single output buffer with a
header. The format of the encoded buffer is defined in [BLOSC]_. The
reference implementation is provided by the `c-blosc library
<https://github.com/Blosc/c-blosc>`_.


References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
Requirement Levels. March 1997. Best Current Practice. URL:
https://tools.ietf.org/html/rfc2119

.. [BLOSC] F. Alted. Blosc Chunk Format. URL:
https://github.com/Blosc/c-blosc/blob/HEAD/README_CHUNK_FORMAT.rst


Change log
==========

No changes yet.