Skip to content

Add reset_encoding to Dataset and DataArray objects #7686

Closed
@jhamman

Description

@jhamman

Is your feature request related to a problem?

Xarray maintains the encoding of datasets read from most of its supported backend formats (e.g. NetCDF, Zarr, etc.). This is very useful when you want to perfectly roundtrip but it often gets in the way, causing conflicts when writing a modified dataset or when appending to another dataset. Most of the time, the solution is to just remove the encoding from the dataset and continue on. The following code sample is found in a number of issues that reference this problem.

    for v in list(ds.coords.keys()):
        if ds.coords[v].dtype == object:
            ds[v].encoding.clear()

    for v in list(ds.variables.keys()):
        if ds[v].dtype == object:
            ds[v].encoding.clear()

A sample of issues that show variants of this problem.

Describe the solution you'd like

In many cases, the solution to these problems is to leave the original dataset encoding behind and either use Xarray's default encoding (or the backends default) or to specify one's own encoding options. Both cases would benefit from a convenience method to reset the original encoding. Something like would serve this process:

ds = xr.open_dataset(...).reset_encoding()

Describe alternatives you've considered

Variations on the API above could also be considered:

xr.open_dataset(..., keep_encoding=False)

or even:

with xr.set_options(keep_encoding=False):
    ds = xr.open_dataset(...)

We can/should also do a better job of surfacing inconsistent encoding in our backends (e.g. to_netcdf).

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions