Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enumerating tool chains for geozarr in different ecosystems #5

Open
rabernat opened this issue Feb 1, 2023 · 8 comments
Open

Enumerating tool chains for geozarr in different ecosystems #5

rabernat opened this issue Feb 1, 2023 · 8 comments

Comments

@rabernat
Copy link

rabernat commented Feb 1, 2023

As we discussed on today's call, it will be very useful to enumerate different tool chains used to read Zarr in a geospatial context in different languages / package ecosystems. This will help us understand where GeoZarr actually needs to be implemented.

I'll go first in a reply to this issue.

@rabernat
Copy link
Author

rabernat commented Feb 15, 2023

Toolchain: Python / Pangeo (Zarr + Dask + Xarray + MetPy)

Overview

The Pangeo ecosystem has done a lot to move Zarr forward as a cloud native format. Here's what our stack looks like for the common use case of reading Zarr from cloud storage.

If you want to parse CRS information out of your Xarray data that has been loaded in this way, I believe that your only option is MetPy's assign_crs function, but I could be wrong about that.

Dependency chain

flowchart TD
    s3fs --> zarr-python
    zarr-python --> Dask
    Dask --> Xarray
    zarr-python --> Xarray
    Xarray --> MetPy
Loading

@christophenoel
Copy link

I'm not sure to see what you mean with the CRS. You might parse it with MetPy (and I suppose with Rioxarry) if you first set it, right ?

@rabernat
Copy link
Author

I'm not sure to see what you mean with the CRS. You might parse it with MetPy (and I suppose with Rioxarry) if you first set it, right ?

What I mean is that Xarray by itself has no inherent understanding of CRS. The metadata are there, but not useful. MetPy's parse_cf function attaches an actual cartopy CRS object to the dataset, which can then be used for plotting. Here's an example I saw on twitter today: https://twitter.com/at_dot_Py/status/1627206421950664705

@rabernat
Copy link
Author

rabernat commented Mar 1, 2023

To follow up on this issue, at today's meeting, we played around with trying to parse data from Brianna's example Zarr dataset into rioxarray. Notebook here

https://gist.github.com/rabernat/8c53c380fbb38dcf556e85487960a847

@TomAugspurger
Copy link

Here's an example of rioxarray (building on rasterio & GDAL) understanding CRS information (via rioxarray's support for CF conventions I believe).

import xarray as xr
import rioxarray
import requests

token = requests.get(
    "https://planetarycomputer.microsoft.com/api/sas/v1/token/daymet-daily-hi"
).json()["token"]

ds = xr.open_dataset("abfs://daymet-zarr/daily/hi.zarr", engine="zarr", decode_coords="all", storage_options={"account_name": "daymeteuwest", "credential": token})
print(ds.rio.crs)
print("---")
print(ds.rio.transform())

which prints out

PROJCS["undefined",GEOGCS["undefined",DATUM["undefined",SPHEROID["undefined",6378137,298.257223563]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["standard_parallel_1",25],PARAMETER["standard_parallel_2",60],PARAMETER["latitude_of_origin",42.5],PARAMETER["central_meridian",-100],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
---
| 1000.00, 0.00,-5802750.00|
| 0.00,-1000.00,-38500.00|
| 0.00, 0.00, 1.00|

In that case, the .crs object is a rasterio.crs.CRS. I believe it used pyproj to parse the CF metadata into a CRS object.

It doesn't appear that odc.geo correctly reads the geospatial metadata for this dataset. I'll share an example later of odc.geo correctly interpreting geospatial metadata.

@rabernat
Copy link
Author

rabernat commented Apr 3, 2023

Super useful Tom! You issue reveals the problem that different python libraries use different in-memory representation of CRS / transform. This is separate from, but closely related to, the on-disk representation (e.g. #12).

@christophenoel
Copy link

@TomAugspurger : so the source metadata only includes CF projection (grid mapping) and rioxarray handles that ? I don't remember that this was working last year on my side. Surprising.

@TomAugspurger
Copy link

You issue reveals the problem that different python libraries use different in-memory representation of CRS / transform.

Yes, at some point I want to dig into why that is. As long as they're all using the same "protocol" things aren't so bad, but still it'd be nicer to consolidate if possible.

@TomAugspurger : so the source metadata only includes CF projection (grid mapping) and rioxarray handles that ? I don't remember that this was working last year on my side. Surprising.

Yes. I think it's some combination of https://github.com/corteva/rioxarray/blob/34c64140ba1f38e020e6d4b2cd0f3e4bb4c36f62/rioxarray/rioxarray.py#L262-L277 and https://github.com/corteva/rioxarray/blob/34c64140ba1f38e020e6d4b2cd0f3e4bb4c36f62/rioxarray/rioxarray.py#L307-L310 in rioxarray. It is important to use decode_coords="all" in this specific example, otherwise rioxarray isn't able to infer what to use for the CRS. I haven't looked at why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants