zappend is a tool written in Python for robustly creating and updating Zarr data cubes from smaller dataset slices. It is built on top of the awesome Python packages xarray and zarr.
The objective of zappend is to enable geodata scientists and developers to robustly create large data cubes. The tool performs transaction-based dataset appends to existing data cubes in the Zarr format. If an error occurs during an append step (typically due to I/O problems or out-of-memory conditions), zappend automatically rolls back the operation, ensuring that the existing data cube maintains its structural integrity. The design drivers behind zappend are, first, ease of use and, second, high configurability regarding filesystems, data source types, data cube outline, and encoding.
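The rollback behaviour described above can be illustrated with a minimal sketch. This is not zappend's implementation. It assumes the target cube is a local directory (as an on-disk Zarr store is) and takes a full backup copy before each append. The helpers transactional_update and append_slice are hypothetical names used only for illustration. A real tool would avoid copying the whole cube; the full copy just keeps the example short.

```python
import shutil
import tempfile
from collections.abc import Callable
from pathlib import Path


def transactional_update(target: Path, update: Callable[[Path], None]) -> None:
    """Apply `update` to the directory `target`; restore `target` if it fails.

    Hypothetical helper: snapshots the whole directory as its rollback strategy.
    """
    backup_root = Path(tempfile.mkdtemp(prefix="rollback-"))
    backup = backup_root / "backup"
    shutil.copytree(target, backup)  # snapshot of the last valid state
    try:
        update(target)
    except BaseException:
        # Roll back: discard the partially modified target and restore the snapshot,
        # then re-raise so the caller still sees the original error.
        shutil.rmtree(target, ignore_errors=True)
        shutil.copytree(backup, target)
        raise
    finally:
        # The snapshot is no longer needed, whether the update succeeded or not.
        shutil.rmtree(backup_root, ignore_errors=True)


def append_slice(target: Path, name: str, fail: bool = False) -> None:
    """Hypothetical append step: writes a file, optionally failing afterwards."""
    (target / name).write_text("chunk data")
    if fail:
        raise MemoryError("simulated out-of-memory during append")


# Usage: one successful append, then one that fails halfway and is rolled back.
root = Path(tempfile.mkdtemp())
cube = root / "cube.zarr"
cube.mkdir()
transactional_update(cube, lambda t: append_slice(t, "slice-0"))
try:
    transactional_update(cube, lambda t: append_slice(t, "slice-1", fail=True))
except MemoryError:
    pass
print(sorted(p.name for p in cube.iterdir()))  # → ['slice-0']
```

The failed append leaves no trace of slice-1. The cube is back in the state it had after the last successful append.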
The tool comprises a command-line interface, a Python API for programmatic control, and comprehensive documentation to guide users effectively.
You can easily install zappend as a plain Python package using either pip install zappend or conda install -c conda-forge zappend.
The zappend tool provides the following features:
- Locking: While the target dataset is being modified, a file lock is created, effectively preventing concurrent dataset modifications.
- Transaction-based dataset appends: On failure during an append step, the transaction is rolled back, so that the target dataset remains valid and preserves its integrity.
- Filesystem transparency: The target dataset may be generated and updated in any writable filesystem supported by the fsspec package. The same holds for the slice datasets to be appended.
- Dataset polling: The tool can be configured to wait for slice datasets to become available.
- Dynamic attributes: Use the syntax {{ expression }} to update the target dataset with dynamically computed attribute values.
- CLI and Python API: The tool can be used in a shell via the zappend command or from Python. When used from Python through the zappend() function, slice datasets can be passed as local file paths, as URIs, as datasets of type xarray.Dataset, or as custom slice sources.
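The {{ expression }} idea behind dynamic attributes can be sketched as a simple template renderer. This is not zappend's implementation. The tool's documentation defines which expressions and variables are actually available. The sketch only assumes that each expression is evaluated against some namespace of values. The render_attrs helper and the example variable names (time_max, n_slices) are hypothetical.

```python
import re
from typing import Any, Mapping

# Matches "{{ expression }}"; braces are excluded inside the expression so that
# two placeholders in one string are never merged into one match.
_PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def render_attrs(attrs: Mapping[str, Any], namespace: Mapping[str, Any]) -> dict[str, Any]:
    """Replace {{ expression }} placeholders in string attribute values.

    Hypothetical helper: expressions are evaluated with eval() against
    `namespace` only, with Python builtins disabled.
    """

    def evaluate(expr: str) -> Any:
        return eval(expr, {"__builtins__": {}}, dict(namespace))

    rendered: dict[str, Any] = {}
    for key, value in attrs.items():
        if not isinstance(value, str):
            rendered[key] = value  # non-string values are copied unchanged
            continue
        match = _PLACEHOLDER.fullmatch(value.strip())
        if match:
            # A value consisting of a single placeholder keeps the result's type.
            rendered[key] = evaluate(match.group(1))
        else:
            # Otherwise every placeholder is substituted as text.
            rendered[key] = _PLACEHOLDER.sub(lambda m: str(evaluate(m.group(1))), value)
    return rendered


attrs = {
    "title": "Demo cube",
    "time_coverage_end": "{{ time_max }}",
    "history": "Appended {{ n_slices }} slices",
    "slice_count": "{{ n_slices }}",
}
result = render_attrs(attrs, {"time_max": "2024-01-31", "n_slices": 3})
print(result)
```

The sketch uses eval() for brevity. That approach should not be applied to untrusted configuration.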
More about zappend can be found in its documentation.