Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't read all bands from multi band visual asset from "sentinel-2-c1-l2a" collection #148

Open
rbavery opened this issue Apr 4, 2024 · 10 comments

Comments

@rbavery
Copy link

rbavery commented Apr 4, 2024

I'm trying to read a multi band asset following the guidance here: #118 (comment)

def xarr_from_aoi_point(point: Point, time_range: str = "2023-01-01/2024-01-01", max_cloud_cover: int = 10, max_nodata_percent: int = .5) -> xr.DataArray:
    collection_id = "sentinel-2-c1-l2a"
    catalog = pystac_client.Client.open(
    "https://earth-search.aws.element84.com/v1",
    )

    def perform_search(tile: str = None):
        query_params = {"eo:cloud_cover": {"lt": max_cloud_cover}, "s2:nodata_pixel_percentage": {"lt":max_nodata_percent}}
        if tile:
            query_params['grid:code'] = {'eq': tile}
        
        search = catalog.search(collections=[collection_id], 
                                intersects=point, 
                                datetime=time_range, 
                                query=query_params, 
                                max_items=3,
                                sortby="properties.datetime")
        return search.item_collection()
    
    items = perform_search()
    if len(set([item.properties['grid:code'] for item in items])) != 1:
        tile_code = next((item.properties.get('grid:code') for item in items if 'grid:code' in item.properties), None)
        items = perform_search(tile=tile_code)
    xarr = stac_load(
        items,
        bands=("visual.1", "visual.2", "visual.3"),
        chunks={},  # <-- use Dask
        crs = items[0].properties['proj:epsg']
    )
    return items

but this errors, the visual asset's multiple bands are not found. only "visual.1" is found.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[117], line 2
      1 point = sample.iloc[0]
----> 2 items = xarr_from_aoi_point(sample.iloc[0])

Cell In[116], line 36, in xarr_from_aoi_point(point, time_range, max_cloud_cover, max_nodata_percent)
     34     tile_code = next((item.properties.get('grid:code') for item in items if 'grid:code' in item.properties), None)
     35     items = perform_search(tile=tile_code)
---> 36 xarr = stac_load(
     37     items,
     38     bands=("visual.1", "visual.2", "visual.3"),
     39     chunks={},  # <-- use Dask
     40     crs = items[0].properties['proj:epsg']
     41 )
     42 return items

File [/opt/conda/lib/python3.10/site-packages/odc/stac/_load.py:514](http://127.0.0.1:8888/opt/conda/lib/python3.10/site-packages/odc/stac/_load.py#line=513), in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, progress, fail_on_error, stac_cfg, patch_url, preserve_original_order, **kw)
    511 # Check we have all the bands of interest
    512 # will raise ValueError if no such band[/alias](http://127.0.0.1:8888/alias)
    513 collection = _collection(_parsed)
--> 514 bands_to_load = collection.resolve_bands(bands)
    515 bands = list(bands_to_load)
    517 load_cfg = _resolve_load_cfg(
    518     bands_to_load,
    519     resampling,
   (...)
    523     fail_on_error=fail_on_error,
    524 )

File [/opt/conda/lib/python3.10/site-packages/odc/stac/_model.py:157](http://127.0.0.1:8888/opt/conda/lib/python3.10/site-packages/odc/stac/_model.py#line=156), in RasterCollectionMetadata.resolve_bands(self, bands)
    153 """
    154 Query bands taking care of aliases.
    155 """
    156 bands = self.normalize_band_query(bands)
--> 157 return {
    158     band: self.bands[k]
    159     for band, k in ((band, self.band_key(band)) for band in bands)
    160 }

File [/opt/conda/lib/python3.10/site-packages/odc/stac/_model.py:158](http://127.0.0.1:8888/opt/conda/lib/python3.10/site-packages/odc/stac/_model.py#line=157), in <dictcomp>(.0)
    153 """
    154 Query bands taking care of aliases.
    155 """
    156 bands = self.normalize_band_query(bands)
    157 return {
--> 158     band: self.bands[k]
    159     for band, k in ((band, self.band_key(band)) for band in bands)
    160 }

KeyError: ('visual', 2)

If I change the load to

    xarr = stac_load(
        items,
        bands=("visual.1"),
        chunks={},  # <-- use Dask
        crs = items[0].properties['proj:epsg']
    )

the asset is found but I can only get the first band.

My odc stac version is v0.3.9.

I can provide a MRE if it is helpful for diagnosing the issue but first wanted to check if I'm specifying the bands incorrectly.

@rbavery rbavery changed the title Can't read multi band visual asset from "sentinel-2-c1-l2a" collection with odc.stac Can't read all bands from multi band visual asset from "sentinel-2-c1-l2a" collection Apr 4, 2024
@Kirill888
Copy link
Member

@rbavery what version of pystac do you have? This sounds like eo:bands extension not being detected.

@Kirill888
Copy link
Member

Ok, so it's missing raster:bands data for visual band, even though it has eo:bands defined. And it's raster:bands that we use to determine how many bands are in a given asset.

Visual band should have this extra info:

{"assets": {"visual": {
      ...,
      "raster:bands": [{"data_type": "uint8"},{"data_type": "uint8"},{"data_type": "uint8"}]
}}}

Sample item: https://earth-search.aws.element84.com/v1/collections/sentinel-2-c1-l2a/items/S2B_T52JFM_20240407T012410_L2A

ping @gadomski, visual band has no raster level info in sentinel-2-c1-l2a collection, is that a known constraint?

Appendix

{"assets": {
    "visual": {
      "href": "https://e84-earth-search-sentinel-data.s3.us-west-2.amazonaws.com/sentinel-2-c1-l2a/52/J/FM/2024/4/S2B_T52JFM_20240407T012410_L2A/TCI.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "title": "True color image",
      "eo:bands": [
        {
          "name": "B04",
          "common_name": "red",
          "center_wavelength": 0.665,
          "full_width_half_max": 0.038
        },
        {
          "name": "B03",
          "common_name": "green",
          "center_wavelength": 0.56,
          "full_width_half_max": 0.045
        },
        {
          "name": "B02",
          "common_name": "blue",
          "center_wavelength": 0.49,
          "full_width_half_max": 0.098
        }
      ],
      "gsd": 10,
      "proj:shape": [
        10980,
        10980
      ],
      "proj:transform": [
        10,
        0,
        600000,
        0,
        -10,
        6700000
      ],
      "file:checksum": "1220cb7e18f0fb190115e61d9c04adcebf2a98d6d786a9e01360d0fafaac6db6c36e",
      "file:size": 304120956,
      "roles": [
        "visual"
      ]
    }
}}

@gadomski
Copy link
Contributor

gadomski commented Apr 8, 2024

@Kirill888 I checked with @matthewhanson and it sound like raster:bands wasn't included on the visual because it's byte scaled so shouldn't be used for any analysis, just for visualization.

@rbavery
Copy link
Author

rbavery commented Apr 8, 2024

Got it thanks for the investigation and info @Kirill888 and @gadomski.

It's pretty common in the ML community to select less precise data types to optimize for storage and throughput. Not all detection tasks require the full precision of the analysis assets. For example this SATLAS model I'm working with was trained on and expects to test on the TCI visual product: https://github.com/allenai/satlas/blob/main/Normalization.md#sentinel-2-images

It'd definitely be helpful to have the raster:bands metadata. Currently I'm working with the float bands instead to get around this and need to stack these bands myself. I'm also not sure if I'm doing normalization exactly like the TCI product has done.

@matthewhanson
Copy link

@rbavery it's a bit hacky, but you to get this working you can just create the metadata yourself for the visual asset, then loop through your items and insert it into the JSON. Then open it with odc-stac.

@matthewhanson
Copy link

for item in items:
   item['assets']['visual']['raster:bands'] = [
      {'nodata': 0, 'data_type': 'uint8'},
      {'nodata': 0, 'data_type': 'uint8'},
      {'nodata': 0, 'data_type': 'uint8'}
   ]

@Kirill888
Copy link
Member

@Kirill888 I checked with @matthewhanson and it sound like raster:bands wasn't included on the visual because it's byte scaled so shouldn't be used for any analysis, just for visualization.

@gadomski @matthewhanson
But it does include eo:bands section leading to alias clashes with data bands for the primacy of read/green/blue band locations.

I would have thought that role=visual is enough of a marker to indicate "don't use for data processing tasks". I'd argue that raster:bands should be defined regardless, visual bands might not be useful for actual data analysis, but they help a lot in the visualization while doing actual data analysis. So I would argue wanting to load them in Python is reasonable, and for that accurate metadata is needed.

It's always easier to ignore "loadable data" than to make "unloadable data" work from the end-user perspective. It's not like lack of raster:bands prevents libs like odc-stac and stackstac from exposing visual band completely, it just limits it to the first band.

Also odc-stac should make it easier to investigate and work-around issues like this. It should be easy for the user to

  • See how "parsed STAC item" looks to odc-stac
  • Add "overrides" that support multi-band assets, current cfg= interface only supports single-band overrides
  • Discover how to do above points from documentation, github, FAQ or ..

@rbavery
Copy link
Author

rbavery commented Apr 24, 2024

for item in items:
item['assets']['visual']['raster:bands'] = [
{'nodata': 0, 'data_type': 'uint8'},
{'nodata': 0, 'data_type': 'uint8'},
{'nodata': 0, 'data_type': 'uint8'}
]

just a heads up that this doesn't work since assets don't support item assignment

for item in items:
   item.assets['visual']['raster:bands'] = [
      {'nodata': 0, 'data_type': 'uint8'},
      {'nodata': 0, 'data_type': 'uint8'},
      {'nodata': 0, 'data_type': 'uint8'}
   ]
TypeError                                 Traceback (most recent call last)
Cell In[241], line 2
      1 for item in items:
----> 2    item.assets['visual']['raster:bands'] = [
      3       {'nodata': 0, 'data_type': 'uint8'},
      4       {'nodata': 0, 'data_type': 'uint8'},
      5       {'nodata': 0, 'data_type': 'uint8'}
      6    ]

TypeError: 'Asset' object does not support item assignment

And odc.stac requires working with the pystac objects that have stac metadata as attributes, so converting the item collection to an iterable of dicts won't work.

@gadomski
Copy link
Contributor

Yeah @rbavery, pystac's extensions are getting in your way here. The assigment should be:

from pystac.extensions.raster import RasterBand

item.assets["visual"].ext.raster.bands = [
    RasterBand.create(nodata=0, data_type="uint8"),
    RasterBand.create(nodata=0, data_type="uint8"),
    RasterBand.create(nodata=0, data_type="uint8"),
]
Full test script
import json

from pystac_client import Client

from pystac.extensions.raster import RasterBand

client = Client.open("https://earth-search.aws.element84.com/v1/")
item_search = client.search(
    max_items=1,
    collections=["sentinel-2-c1-l2a"],
    intersects={"type": "point", "coordinates": [-115.3479, 51.0884]},
    sortby="-properties.datetime",
)
item = next(item_search.items())
item.assets["visual"].ext.raster.bands = [
    RasterBand.create(nodata=0, data_type="uint8"),
    RasterBand.create(nodata=0, data_type="uint8"),
    RasterBand.create(nodata=0, data_type="uint8"),
]
print(json.dumps(item.assets["visual"].to_dict()))

@rbavery
Copy link
Author

rbavery commented Apr 25, 2024

Thank you! makes sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants