Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load_stac: optimize resolution #1043

Closed
bossie opened this issue Feb 6, 2025 · 5 comments · Fixed by #1044
Closed

load_stac: optimize resolution #1043

bossie opened this issue Feb 6, 2025 · 5 comments · Fixed by #1044
Assignees

Comments

@bossie
Copy link
Collaborator

bossie commented Feb 6, 2025

j-25020610185143a6a705ec2701413e4e by @filipsch only requests 60m bands but result has a 10m resolution.

Process graph:

{
  "process_graph": {
    "loadstac1": {
      "arguments": {
        "bands": [
          "B01_60m",
          "B02_60m",
          "B03_60m",
          "B04_60m",
          "B8A_60m",
          "B09_60m",
          "B11_60m",
          "B12_60m",
          "SCL_60m"
        ],
        "featureflags": {
          "indexreduction": 3,
          "tilesize": 256,
          "use-filter-extension": "cql2-json"
        },
        "properties": {
          "eo:cloud_cover": {
            "process_graph": {
              "lte1": {
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": 90
                },
                "process_id": "lte",
                "result": true
              }
            }
          },
          "grid:code": {
            "process_graph": {
              "eq2": {
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": "MGRS-36NYH"
                },
                "process_id": "eq",
                "result": true
              }
            }
          },
          "processing:version": {
            "process_graph": {
              "eq1": {
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": "05.00"
                },
                "process_id": "eq",
                "result": true
              }
            }
          }
        },
        "spatial_extent": {
          "crs": "EPSG:32636",
          "east": 809760,
          "north": 300000,
          "south": 190200,
          "west": 699960
        },
        "temporal_extent": [
          "2020-03-22T00:00:00Z",
          "2020-03-23T00:00:00Z"
        ],
        "url": "https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a"
      },
      "process_id": "load_stac"
    },
    "saveresult1": {
      "arguments": {
        "data": {
          "from_node": "loadstac1"
        },
        "format": "GTiff",
        "options": {
          "file_metadata": {
            "product_grid": "MGRS tiling grid",
            "product_tile": "36NYH"
          },
          "filename_prefix": "LCFM_CLOUD_DETECTION_TEST_CDSE_STAC_2020_36NYH",
          "separate_asset_per_band": false
        }
      },
      "process_id": "save_result",
      "result": true
    }
  }
}
@bossie bossie self-assigned this Feb 6, 2025
@bossie
Copy link
Collaborator Author

bossie commented Feb 6, 2025

STAC API https://stac.dataspace.copernicus.eu/v1 supports Projection Extension v2.0.0 (see its stac_extensions). In this version, proj:epsg has been replaced with proj:code so we need to support that.

@bossie bossie linked a pull request Feb 6, 2025 that will close this issue
bossie added a commit that referenced this issue Feb 7, 2025
openeo_driver.testing.ApiException: Expected response with status code 200 but got 500. Error: {'code': 'Internal', 'id': 'r-25020620100349b0bc0ac2cb63f8a079', 'message': 'Server error: AttributeError("\'NoneType\' object has no attribute \'upper\'")'}

#1043
@bossie
Copy link
Collaborator Author

bossie commented Feb 10, 2025

Observations:

  1. handling proj:code in addition to proj:epsg in load_stac is not enough: the resolution and CRS of the data cube are effectively those of the last asset;
  2. the resolution of a load_collection data cube can depend on band resolution (be coarser) so long as bands define an openeo:gsd; this is the case for SENTINEL2_L2A_SENTINELHUB but not for TERRASCOPE_S2_TOC_V2, for example.

bossie added a commit that referenced this issue Feb 10, 2025
bossie added a commit that referenced this issue Feb 11, 2025
bossie added a commit that referenced this issue Feb 12, 2025
bossie added a commit that referenced this issue Feb 12, 2025
openeo_driver.testing.ApiException: Expected response with status code 200 but got 500. Error: {'code': 'Internal', 'id': 'r-2502121147414e02b4203a843e0e2db8', 'message': 'Server error: KeyError("Your key: load_stac_apply_lcfm_improvements was not in the whitelist [\'vault_token\', \'sentinel_hub_client_alias\', \'max_soft_errors_ratio\', \'dependencies\', \'pyramid_levels\', \'require_bounds\', \'correlation_id\', \'user\', \'allow_empty_cubes\', \'do_extent_check\', \'parameters\']. This error needs to be fixed in the backend.")'}
bossie added a commit that referenced this issue Feb 13, 2025
bossie added a commit that referenced this issue Feb 13, 2025
bossie added a commit that referenced this issue Feb 13, 2025
bossie added a commit that referenced this issue Feb 13, 2025
bossie added a commit that referenced this issue Feb 13, 2025
* support "proj:code" in addition to "proj:epsg"

#1043

* handle optional "proj:code"

openeo_driver.testing.ApiException: Expected response with status code 200 but got 500. Error: {'code': 'Internal', 'id': 'r-25020620100349b0bc0ac2cb63f8a079', 'message': 'Server error: AttributeError("\'NoneType\' object has no attribute \'upper\'")'}

#1043

* test resolution

#1043

* adapt test to resilient StacIO #1043

* use resolution and CRS from requested bands (opt-in) #1043

* test CRS #1043

* enable in catalog or from job_option #1043

* whitelist feature flag #1043

openeo_driver.testing.ApiException: Expected response with status code 200 but got 500. Error: {'code': 'Internal', 'id': 'r-2502121147414e02b4203a843e0e2db8', 'message': 'Server error: KeyError("Your key: load_stac_apply_lcfm_improvements was not in the whitelist [\'vault_token\', \'sentinel_hub_client_alias\', \'max_soft_errors_ratio\', \'dependencies\', \'pyramid_levels\', \'require_bounds\', \'correlation_id\', \'user\', \'allow_empty_cubes\', \'do_extent_check\', \'parameters\']. This error needs to be fixed in the backend.")'}

* log usage of resolution improvement #1043

* fix combination of resolutions #1043

* address non-square pixels #1043

* improve/fix identifiers #1043

* adapt CHANGELOG #1043
@bossie
Copy link
Collaborator Author

bossie commented Feb 13, 2025

Integration test tests.test_collections.test_SENTINEL2_GLOBAL_MOSAICS_STAC[SENTINEL2_GLOBAL_MOSAICS_STAC_CLOUDFERRO] fails; the job actually succeeds but comparing against the reference image fails:

ValueError: zero-size array to reduction operation minimum which has no identity
connection = <Connection to 'https://openeo.dev.warsaw.openeo.dataspace.copernicus.eu/openeo/1.2/' with OidcBearerAuth>
tmp_path = PosixPath('/var/lib/jenkins/workspace/openEO/openeo-deploy-cdse/kube_resources/applications/integrationtests/pytest-tmp/test_SENTINEL2_GLOBAL_MOSAICS_0')
collection_id = 'SENTINEL2_GLOBAL_MOSAICS_STAC_CLOUDFERRO'

    @pytest.mark.parametrize("collection_id", [
        "SENTINEL2_GLOBAL_MOSAICS_STAC_CLOUDFERRO",
        # "SENTINEL2_GLOBAL_MOSAICS_STAC_COPERNICUS",  # covered as unit test now.
    ])
    def test_SENTINEL2_GLOBAL_MOSAICS_STAC(connection, tmp_path, collection_id):
        start_date = "2023-01-01"
        end_date = "2023-01-02"
    
        folder_name = f"result_{collection_id}_{start_date}/"
        expected_tif = get_expected_result(folder_name, "img.tif")
    
        # Only a few products are available at the moment
        cube = connection.load_collection(
            collection_id,
            temporal_extent=[start_date, end_date],
            spatial_extent={'west': 1.88, 'south': 35.31, 'east': 1.93, 'north': 35.32},
        )
    
        output_file = tmp_path / "actual.tif"
        with_retries(lambda: cube.download(output_file))
    
>       compare_tiff_files(expected_tif, output_file)

tests/test_collections.py:463: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_collections.py:66: in compare_tiff_files
    assert_xarray_equals(result.loc[1], expected.loc[1], max_nonmatch_ratio, tolerance)
tests/test_collections.py:44: in assert_xarray_equals
    significantly_different = numpy.abs(xa1 - 1.0 * xa2) > tolerance
/usr/local/lib/python3.8/site-packages/xarray/core/_typed_ops.py:209: in __sub__
    return self._binary_op(other, operator.sub)
/usr/local/lib/python3.8/site-packages/xarray/core/dataarray.py:4367: in _binary_op
    f(self.variable, other_variable)
/usr/local/lib/python3.8/site-packages/xarray/core/_typed_ops.py:399: in __sub__
    return self._binary_op(other, operator.sub)
/usr/local/lib/python3.8/site-packages/xarray/core/variable.py:2642: in _binary_op
    self_data, other_data, dims = _broadcast_compat_data(self, other)
/usr/local/lib/python3.8/site-packages/xarray/core/variable.py:3131: in _broadcast_compat_data
    self_data = new_self.data
/usr/local/lib/python3.8/site-packages/xarray/core/variable.py:434: in data
    return self.values
/usr/local/lib/python3.8/site-packages/xarray/core/variable.py:607: in values
    return _as_array_or_item(self._data)
/usr/local/lib/python3.8/site-packages/xarray/core/variable.py:313: in _as_array_or_item
    data = np.asarray(data)
/usr/local/lib/python3.8/site-packages/xarray/core/indexing.py:658: in __array__
    self._ensure_cached()
/usr/local/lib/python3.8/site-packages/xarray/core/indexing.py:655: in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
/usr/local/lib/python3.8/site-packages/xarray/core/indexing.py:628: in __array__
    return np.asarray(self.array, dtype=dtype)
/usr/local/lib/python3.8/site-packages/xarray/core/indexing.py:529: in __array__
    return np.asarray(array[self.key], dtype=None)
/usr/local/lib/python3.8/site-packages/xarray/backends/rasterio_.py:132: in __getitem__
    return indexing.explicit_indexing_adapter(
/usr/local/lib/python3.8/site-packages/xarray/core/indexing.py:820: in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
/usr/local/lib/python3.8/site-packages/xarray/backends/rasterio_.py:114: in _getitem
    band_key, window, squeeze_axis, np_inds = self._get_indexer(key)
/usr/local/lib/python3.8/site-packages/xarray/backends/rasterio_.py:101: in _get_indexer
    start, stop = np.min(k), np.max(k) + 1
<__array_function__ internals>:200: in amin
    ???
/usr/local/lib64/python3.8/site-packages/numpy/core/fromnumeric.py:2946: in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = array([], dtype=int64), ufunc = <ufunc 'minimum'>, method = 'min'
axis = None, dtype = None, out = None
kwargs = {'initial': <no value>, 'keepdims': <no value>, 'where': <no value>}
passkwargs = {}

    def _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs):
        passkwargs = {k: v for k, v in kwargs.items()
                      if v is not np._NoValue}
    
        if type(obj) is not mu.ndarray:
            try:
                reduction = getattr(obj, method)
            except AttributeError:
                pass
            else:
                # This branch is needed for reductions like any which don't
                # support a dtype.
                if dtype is not None:
                    return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
                else:
                    return reduction(axis=axis, out=out, **passkwargs)
    
>       return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
E       ValueError: zero-size array to reduction operation minimum which has no identity

/usr/local/lib64/python3.8/site-packages/numpy/core/fromnumeric.py:86: ValueError

The result GeoTiff has a pixel size of almost but not quite 10m: (10.000000000026239,-9.999999999777119)

@bossie
Copy link
Collaborator Author

bossie commented Feb 13, 2025

Corresponding STAC API request (r-2502131520304167bd793d33581c9156): https://pgstac/search?limit=20&bbox=1.88%2C35.31%2C1.93%2C35.32&datetime=2023-01-01T00%3A00%3A00Z%2F2023-01-01T23%3A59%3A59.999000Z&collections=sentinel-2-global-mosaics

What changed is that enabling the interpretation of proj:code (this happens regardless of the value of the feature flag) in addition to proj:epsg makes load_stac derive the pixel size from proj:bbox and proj:shape:

They are respectively:

[
  399959.9999997376,
  3899939.999915692,
  500040.0000000008,
  4000019.9999134773
]

and

[
  10008,
  10008
]

Calculating the pixel size:

(500040.0000000008 - 399959.9999997376) = 100080.00000026322 / 10008 = 10.0000000000263
(4000019.9999134773 - 3899939.999915692) = 100079.99999778531 / 10008 = 9.999999999778709

A problem with the STAC API rather than load_stac?

@bossie
Copy link
Collaborator Author

bossie commented Feb 13, 2025

The original process graph with what looks like properly aligned proj:bbox and proj:shape runs to completion with a 60m result and the proper CRS.

The quick fix for the integration test is probably to put proj:code behind the feature flag too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant