Merge pull request #81 from bcdev/forman-prep_v06
Preparing 0.6 release
forman authored Mar 12, 2024
2 parents f67c6eb + 8e55ed5 commit 70e739c
Showing 58 changed files with 378 additions and 251 deletions.
82 changes: 43 additions & 39 deletions CHANGES.md
@@ -1,38 +1,42 @@
## Version 0.6.0 (from 2024-03-12)

### Enhancements

* Added configuration setting `force_new`, which forces creation of a new
target dataset. An existing target dataset (and its lock) will be
permanently deleted before appending of slice datasets begins. (#72)
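For illustration, a configuration fragment enabling this behaviour might look as follows (a sketch; the `target_dir` value is just a placeholder):

```json
{
  "target_dir": "target.zarr",
  "force_new": true
}
```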

* Chunk sizes can now be `null` for a given dimension. In this case the actual
chunk size used is the size of the array's shape in that dimension. (#77)
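The resolution of a `null` chunk size can be sketched as follows (a stand-alone illustration, not zappend's actual implementation; `resolve_chunk_sizes` is a hypothetical helper):

```python
def resolve_chunk_sizes(chunk_sizes, shape):
    """Replace None entries (null in JSON/YAML) by the full dimension size."""
    return tuple(
        dim_size if chunk_size is None else chunk_size
        for chunk_size, dim_size in zip(chunk_sizes, shape)
    )

# A null chunk size in the first dimension becomes that dimension's full size:
print(resolve_chunk_sizes([None, 512], (1024, 2048)))  # (1024, 512)
```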

### API Changes

* Simplified writing of custom slice sources for users. The configuration setting
`slice_source` can now be a `SliceSource` class or any function that returns a
_slice item_: a local file path or URI, an `xarray.Dataset`,
a `SliceSource` object.
Dropped concept of _slice factories_ entirely, including functions
`to_slice_factory()` and `to_slice_factories()`. (#78)

* Extracted `Config` class out of `Context` and made available via new
`Context.config: Config` property. The change concerns any usages of the
`ctx: Context` argument passed to user slice factories. (#74)

## Version 0.5.1 (from 2024-02-23)

* Fixed rollback for situations where writing to Zarr fails shortly after the
Zarr directory has been created. (#69)

In this case the error message was
```TypeError: Transaction._delete_dir() missing 1 required positional argument: 'target_path'```.


## Version 0.5.0 (from 2024-02-19)

### Enhancements

* The configuration setting `attrs` can now be used to define dynamically
computed dataset attributes using the syntax `{{ expression }}`. (#60)

Example:
```yaml
@@ -44,16 +48,16 @@
```
* Introduced new configuration setting `attrs_update_mode` that controls
how dataset attributes are updated. (#59)

* Simplified logging to console. You can now set configuration setting
`logging` to a log level which will implicitly enable console logging with
given log level. (#64)
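Such a configuration might look like the following fragment (illustrative; assuming the standard Python log level names):

```json
{
  "logging": "INFO"
}
```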

* Added a section in the notebook `examples/zappend-demo.ipynb`
that demonstrates transaction rollbacks.

* Added CLI option `--traceback`. (#57)

@@ -62,17 +66,17 @@

* Fixed issue where a NetCDF package was missing to run the
demo Notebook `examples/zappend-demo.ipynb` in
[Binder](https://mybinder.readthedocs.io/). (#47)

## Version 0.4.1 (from 2024-02-13)

### Fixes

* Global metadata attributes of the target dataset are no longer empty. (#56)

* If the target _parent_ directory did not exist, an exception was raised
reporting that the lock file to be written does not exist. Changed this to
report that the target parent directory does not exist. (#55)

### Enhancements

@@ -87,27 +91,27 @@
the step sizes between the labels of a coordinate variable associated with
the append dimension. Its value can be a number for numerical labels
or a time delta value of the form `8h` (8 hours) or `2D` (two days) for
date/time labels. The value can also be negative. (#21)

* The configuration setting `append_step` can take the special values
`"+"` and `"-"` which are used to verify that the labels are monotonically
increasing and decreasing, respectively. (#20)
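The monotonicity check for the special values can be sketched as follows (a stand-alone illustration, not zappend's actual implementation; whether the check must be strictly monotonic is an assumption here):

```python
def labels_ok(labels, append_step):
    """Check labels against the special append_step values "+" and "-"."""
    if append_step == "+":
        # Labels must be monotonically increasing.
        return all(a < b for a, b in zip(labels, labels[1:]))
    if append_step == "-":
        # Labels must be monotonically decreasing.
        return all(a > b for a, b in zip(labels, labels[1:]))
    raise ValueError(f"unsupported append_step: {append_step!r}")

print(labels_ok([1, 2, 5], "+"))  # True
print(labels_ok([3, 3, 2], "-"))  # False
```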

* It is now possible to reference environment variables
in configuration files using the syntax `${ENV_VAR}`. (#36)
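The substitution can be sketched as follows (an illustrative stand-in, not zappend's actual implementation; the behaviour for undefined variables, an empty string here, is an assumption):

```python
import os
import re


def interpolate_env_vars(text):
    """Replace ${ENV_VAR} references by environment variable values."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: os.environ.get(m.group(1), ""),
                  text)


os.environ["DATA_ROOT"] = "/data"
print(interpolate_env_vars("${DATA_ROOT}/target.zarr"))  # /data/target.zarr
```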

* Added a demo Notebook `examples/zappend-demo.ipynb` and linked
it by a binder badge in README.md. (#47)

### Fixes

* When `slice_source` was given as class or function and passed
to the `zappend()` function either as configuration entry or as keyword
argument, a `ValidationError` was accidentally raised. (#49)

* Fixed an issue where an absolute lock file path was computed if the target
Zarr path was relative in the local filesystem, and had no parent directory.
(#45)

## Version 0.3.0 (from 2024-01-26)

@@ -119,22 +123,22 @@
of `zappend.api.SliceSource`. If `slice_source` is given, slices passed to
the zappend function or CLI command will be interpreted as parameter(s)
passed to the constructor of the specified class or the factory function.
(#27)

* It is now possible to configure runtime profiling of the `zappend`
processing using the new configuration setting `profiling`. (#39)

* Added `--version` option to CLI. (#42)

* Using `sizes` instead of `dims` attribute of `xarray.Dataset` in
implementation code. (#25)

* Enhanced documentation including docstrings of several Python API objects.

### Fixes

* Fixed a problem where the underlying i/o stream of a persistent slice dataset
was closed immediately after opening the dataset. (#31)

* Now logging ignored encodings on level DEBUG instead of WARNING because they
occur very likely when processing NetCDF files.
@@ -146,17 +150,17 @@
* Introduced _slice factories_
- Allow passing slice object factories to the `zappend()` function.
Main use case is to return instances of a custom `zappend.api.SliceSource`
implemented by users. (#13)

- The utility functions `to_slice_factories` and `to_slice_factory`
exported by `zappend.api` ease passing inputs specific for a custom
`SliceSource` or other callables that can produce a slice object. (#22)

* Introduced new configuration flag `persist_mem_slices`.
If set, in-memory `xr.Dataset` instances will be first persisted to a
temporary Zarr, then reopened, and then appended to the target dataset. (#11)

* Added initial documentation. (#17)

* Improved readability of generated configuration documentation.

@@ -166,9 +170,9 @@

* Fixed problem when passing slices opened from NetCDF files. The error was
`TypeError: VariableEncoding.__init__() got an unexpected keyword argument 'chunksizes'`.
(#14)

* Fixed problem where info about closing slice was logged twice. (#9)


## Version 0.1.1 (from 2024-01-10)
23 changes: 13 additions & 10 deletions docs/guide.md
@@ -676,7 +676,7 @@
for computed slice datasets, especially if the specified target dataset chunking
different from the slice dataset chunking. This may cause Dask graphs to be
computed multiple times if the source chunking overlaps multiple target chunks,
potentially causing large resource overheads while recomputing and/or reloading the same
source chunks multiple times. In such cases it can help to "terminate"
computations for each slice by persisting the computed dataset first and then to
reopen it. This can be specified using the `persist_mem_slices` setting:

@@ -693,7 +693,7 @@
at the cost of additional i/o. It therefore defaults to `false`.
#### Slice Sources

If you need some custom cleanup after a slice has been processed and appended to the
target dataset, you can use instances of `zappend.api.SliceSource` as slice items.
A `SliceSource` class requires you to implement two methods:

* `get_dataset()` to return the slice dataset of type `xarray.Dataset`, and
@@ -725,10 +725,11 @@
class MySliceSource(SliceSource):
self.ds = None
```

Instead of providing instances of `SliceSource` as slice items, it is often
easier to pass your `SliceSource` class and let `zappend` pass the slice item as
argument(s) to your `SliceSource`'s constructor. This can be achieved using
the `slice_source` configuration setting. If you need to access configuration
settings, it is even required to use the `slice_source` setting.

```json
{
@@ -737,7 +738,8 @@
```

The `slice_source` setting can actually be **any Python function** that returns a
valid slice item as described above, such as a file path or URI, or
an `xarray.Dataset`.

If a slice source is configured, each slice item passed to `zappend` is passed as
an argument to your slice source.
@@ -754,13 +756,14 @@
- `dict`: keyword arguments only;
- Any other type is interpreted as a single positional argument.
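The two rules shown above can be sketched as follows (an illustrative stand-in, not zappend's actual implementation; rules for the elided item types are omitted):

```python
def to_call_args(slice_item):
    """Map a slice item to (positional args, keyword args) for a slice source."""
    if isinstance(slice_item, dict):
        return (), slice_item      # keyword arguments only
    return (slice_item,), {}       # any other type: single positional argument


print(to_call_args({"path": "slice-1.nc"}))  # ((), {'path': 'slice-1.nc'})
print(to_call_args("slice-1.nc"))            # (('slice-1.nc',), {})
```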

In addition, your slice source function or class constructor specified by
`slice_source` may define a 1st positional argument or keyword argument
named `ctx`, which will receive the current processing context of type
`zappend.api.Context`. This can be useful if you need to read configuration
settings.
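The calling convention can be sketched with a stand-in context object (`ctx` below is a simplified stand-in, not zappend's actual `Context` class):

```python
from types import SimpleNamespace

# Simplified stand-in for zappend.api.Context and its config property:
ctx = SimpleNamespace(config=SimpleNamespace(target_dir="target.zarr"))


def my_slice_source(ctx, path):
    """A slice source whose 1st argument receives the processing context."""
    # Real code would open `path` and return an xarray.Dataset here.
    return f"append {path} to {ctx.config.target_dir}"


print(my_slice_source(ctx, "slice-1.nc"))  # append slice-1.nc to target.zarr
```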

Here is a more advanced example of a slice source that opens datasets from a given
file path and averages the values along the time dimension:

```python
import numpy as np
23 changes: 21 additions & 2 deletions docs/start.md
@@ -74,5 +74,24 @@
zappend((f"s3://mybucket/data/{name}"
config=config)
```

Slice items can also be arguments passed to your custom _slice source_,
a function or class that provides the actual slice to be appended:

```python
import xarray as xr
from zappend.api import zappend


def get_dataset(path: str):
ds = xr.open_dataset(path)
return ds.drop_vars(["ndvi_min", "ndvi_max"])

zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
slice_source=get_dataset,
target_dir="target.zarr")
```

For details, please refer to the section [_Slice Sources_](guide.md#slice-sources) in the
[User Guide](guide.md).

