ValueError: conflicting sizes for dimension 'spec': length 1 on the data but length 2 on coordinate 'time' #137

Open
alexgleith opened this issue Jun 19, 2024 · 2 comments

Comments

@alexgleith
Contributor

alexgleith commented Jun 19, 2024

I've been getting the error below, and I'm finding it hard to reproduce in other environments.

If I run with group_by = None, stats finishes happily.

But when grouping by solar day, it fails for some regions.

Has anyone seen a similar error, and does anyone know how to fix it?

Key software versions include:

  • numpy==2.0.0
  • odc-algo==0.2.3
  • odc-cloud==0.2.5
  • odc-dscache==0.2.3
  • odc-geo==0.4.6
  • odc-io==0.2.2
  • odc-stac==0.3.9
  • odc-stats==1.0.46
  • xarray==2023.12.0

[2024-06-19 00:58:43,966] {proc.py:217} INFO - Starting processing of x038/y009/2023--P1Y
  xx = xx.groupby(groupby).map(fuser)
Traceback (most recent call last):
  File "/usr/local/bin/odc-stats", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/odc/stats/_cli_run.py", line 233, in run
    for result in result_stream:
  File "/usr/local/lib/python3.10/dist-packages/odc/stats/proc.py", line 237, in _run
    proc.input_data(
  File "/usr/local/lib/python3.10/dist-packages/odc/stats/plugins/_base.py", line 54, in input_data
    xx = load_with_native_transform(
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/io.py", line 222, in load_with_native_transform
    _load_with_native_transform_1(
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/io.py", line 137, in _load_with_native_transform_1
    xx = xx.groupby(groupby).map(fuser)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/groupby.py", line 1563, in map
    return self._combine(applied)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/groupby.py", line 1583, in _combine
    applied_example, applied = peek_at(applied)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/utils.py", line 193, in peek_at
    peek = next(gen)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/groupby.py", line 1562, in <genexpr>
    applied = (func(ds, *args, **kwargs) for ds in self._iter_grouped())
  File "/usr/local/lib/python3.10/dist-packages/s1_geomad/plugin.py", line 57, in fuser
    return _xr_fuse(xx, partial(_first_valid_np, nodata=np_nan), "")
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/_masking.py", line 707, in _xr_fuse
    return xx.map(partial(_xr_fuse, op=op, name=name))
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py", line 6931, in map
    variables = {
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py", line 6932, in <dictcomp>
    k: maybe_wrap_array(v, func(v, *args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/_masking.py", line 711, in _xr_fuse
    return _fuse_with_custom_op(xx, op, name=name)
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/_masking.py", line 702, in _fuse_with_custom_op
    return xr.DataArray(data, attrs=x.attrs, dims=x.dims, coords=coords, name=x.name)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataarray.py", line 450, in __init__
    coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataarray.py", line 197, in _infer_coords_and_dims
    _check_coords_dims(shape, new_coords, dims)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataarray.py", line 135, in _check_coords_dims
    raise ValueError(
ValueError: conflicting sizes for dimension 'spec': length 1 on the data but length 2 on coordinate 'time'
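
For anyone reading the last frames of the traceback: the check that raises here can be triggered in isolation. This is only a minimal sketch of the xarray consistency check, with made-up values. It is not a reproduction of the odc-algo code path, which reaches the same check through the grouped `spec` coordinate.

```python
import numpy as np
import xarray as xr

# Illustrative values only: one data element along 'spec',
# but a 'time' coordinate with two entries along the same dimension.
data = np.zeros(1)
time = np.array(["2023-01-01", "2023-01-02"], dtype="datetime64[ns]")

message = None
try:
    xr.DataArray(data, dims=("spec",), coords={"time": ("spec", time)})
except ValueError as err:
    # xarray refuses coordinates whose length differs from the data
    message = str(err)
    print(message)
```

So by the time `_fuse_with_custom_op` builds the output array, the fused data has one entry along `spec` while the `time` coordinate it was handed still has two.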

@alexgleith
Contributor Author

Possibly found a fix: pinning an older version of xarray, xarray==2023.1.0.

@Kirill888
Member

I reckon the issue is in odc.algo's use of a MultiIndex for the spec dim/coord. I think this was the wrong solution that happened to work for a while, and then xarray changed something.

I don't think there is any need for a multi-index. One can represent all of that with a single spec dimension (one entry per dataset) and then separate coords along the spec dimension for time, uuid, grid. I think it all stemmed from not realising that the dim <-> coord relationship can be any-to-any and not only 1:1.

https://github.com/opendatacube/odc-algo/blob/f67879b1df951f4e1a3e3d52c13b244d1cb516a7/odc/algo/_grouper.py#L84-L94

    coords = [np.asarray(time, dtype="datetime64[ms]"), idx, uuids, grid]
    names = ["time", "idx", "uuid", "grid"]
    if solar_day is not None:
        coords.append(solar_day)
        names.append("solar_day")

    coord = pd.MultiIndex.from_arrays(coords, names=names)

    return xr.DataArray(
        data=data, coords=dict(spec=coord), attrs={"grid2crs": grid2crs}, dims=("spec",)
    )
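
A rough sketch of the alternative described above, assuming nothing about how odc-algo would actually adopt it: one `spec` dimension, with `time`, `idx`, `uuid`, `grid` and optionally `solar_day` attached as ordinary coordinates along it instead of being packed into a `pd.MultiIndex`. The function name `mk_spec_array` and the sample values are hypothetical, and the `grid2crs` attribute from the snippet above is left out for brevity.

```python
import numpy as np
import xarray as xr


def mk_spec_array(data, time, uuids, grid, solar_day=None):
    """Hypothetical helper: plain coords along a single 'spec' dimension."""
    n = len(data)
    coords = {
        "time": ("spec", np.asarray(time, dtype="datetime64[ms]")),
        "idx": ("spec", np.arange(n)),
        "uuid": ("spec", np.asarray(uuids)),
        "grid": ("spec", np.asarray(grid)),
    }
    if solar_day is not None:
        coords["solar_day"] = ("spec", np.asarray(solar_day, dtype="datetime64[D]"))
    # No MultiIndex: every coordinate is just a 1-D variable on 'spec'
    return xr.DataArray(data=data, coords=coords, dims=("spec",))


# Made-up inputs: three datasets, two of them on the same solar day
xx = mk_spec_array(
    data=np.array(["ds-a", "ds-b", "ds-c"], dtype=object),
    time=["2023-01-01T00:10", "2023-01-01T00:11", "2023-01-02T00:10"],
    uuids=["u1", "u2", "u3"],
    grid=["g0", "g0", "g0"],
    solar_day=["2023-01-01", "2023-01-01", "2023-01-02"],
)

# Grouping works directly on the non-index coordinate
sizes = [g.sizes["spec"] for _, g in xx.groupby("solar_day")]
print(sizes)
```

Each group keeps its own slice of every coordinate, so the data and coordinates stay the same length along `spec`. Whether the fuser in the traceback would then behave is untested here.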
