
emmanuelmathot
Contributor

@emmanuelmathot emmanuelmathot commented Sep 8, 2025

This PR introduces attribute extensions as a new extension point and adds the first attribute extension for coordinate reference system (CRS) metadata.

Summary

  • Creates new "Attributes" extension category
  • Adds geo:proj attribute extension for CRS metadata
  • Based on STAC Projection Extension v2.0.0
  • Supports multiple CRS representations (EPSG codes, WKT2, PROJJSON)
  • Handles multi-dimensional arrays with explicit spatial dimension identification

Key Design Decisions

  1. New extension point: Attributes join codecs, data types, etc. as standardizable components
  2. Hybrid spatial dimension identification: Explicit spatial_dimensions with fallback to naming conventions
  3. Multiple CRS formats: Support for EPSG codes, WKT2, and PROJJSON with defined precedence
  4. Inheritance: Group-level CRS applies to child arrays unless overridden
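To make decisions 3 and 4 concrete, here is a minimal Python sketch of resolving an array's effective CRS. It assumes the nested `{"geo": {"proj": {...}}}` attribute layout that this PR later adopts, hypothetical field names `code`, `wkt2` and `projjson` (borrowed from STAC Projection), and an EPSG-first precedence order. The real precedence is whatever the extension README defines.

```python
from typing import Any, Optional

# Assumption: the PR only says the formats have a "defined precedence";
# this sketch assumes EPSG code first, then WKT2, then PROJJSON, and uses
# hypothetical key names borrowed from STAC Projection.
CRS_PRECEDENCE = ("code", "wkt2", "projjson")


def pick_crs(proj: dict[str, Any]) -> Optional[tuple[str, Any]]:
    """Return (format, value) for the highest-precedence CRS present, else None."""
    for key in CRS_PRECEDENCE:
        if proj.get(key) is not None:
            return key, proj[key]
    return None


def resolve_crs(
    array_attrs: dict[str, Any], *ancestor_attrs: dict[str, Any]
) -> Optional[tuple[str, Any]]:
    """Resolve the effective CRS of an array.

    The array's own attributes are checked first, then each ancestor group,
    innermost first, so the nearest definition overrides outer ones.
    """
    for attrs in (array_attrs, *ancestor_attrs):
        proj = (attrs.get("geo") or {}).get("proj") or {}
        crs = pick_crs(proj)
        if crs is not None:
            return crs
    return None


group_attrs = {"geo": {"proj": {"code": "EPSG:32633"}}}
print(resolve_crs({}, group_attrs))  # ('code', 'EPSG:32633'), inherited
print(resolve_crs({"geo": {"proj": {"code": "EPSG:4326"}}}, group_attrs))  # ('code', 'EPSG:4326'), override
```

In this sketch an array whose `proj` entry carries only a transform still inherits the group's CRS; that behavior is an assumption here, not something the PR states.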

Files Changed

  • README.md - Added Attributes to extension points list
  • attributes/README.md - New file defining attribute extensions
  • attributes/projection/README.md - Projection extension specification
  • attributes/projection/schema.json - JSON schema for validation

Testing Checklist

  • Schema validates against example JSON
  • Schema formatted with `npx prettier -w **/schema.json`

Related Discussions

This extension provides a simple, standardized way to add CRS metadata to Zarr arrays without the complexity issues identified in GeoZarr discussions around CF conventions.

cc @d-v-b @vincentsarago @j08lue

@jbms
Contributor

jbms commented Sep 9, 2025

I think the name of the directory should match the attribute name, i.e. `proj:crs` rather than `projection`.

…and update documentation for spatial dimension identification
@benbovy

benbovy commented Sep 9, 2025

This is an interesting proposal and I find the similarities with STAC/projection quite appealing.

Coming from a background as an Xarray developer, where CF conventions and grid mapping variables are more common, I'm approaching it with some reservations (and I might not be the only one :-)). Thinking more about it, and trying to think outside the box, helped me overcome some of my initial concerns. One remaining concern is the translation between this model and the CF conventions, which seems tricky and/or cumbersome, at least for some cases.

> When proj:crs is defined at the group level, it applies to all arrays within that group unless overridden at the array level.

This implies that there’s a unique “main” CRS for the group, with possibly a few exceptions. Does that represent the broad range of use cases (EO data products, models, etc.) well?

This also enforces a single CRS defined (or inherited) per array, unlike CF grid mappings and particularly their extended syntax. @mdsumner would likely agree with that (zarr-developers/geozarr-spec#90 (comment)), although his rant is more about explicit (NetCDF-style) spatial coordinates than about multiple grid mappings. There may still be value in being able to create one or more sets of lazy, auxiliary spatial coordinates from the metadata, but I don’t clearly see how this proposal would allow it. Conversion of the CF grid mapping extended syntax into this model might be lossy, and roundtrip conversion might not be possible.

Getting a list of all CRSs defined in the group requires iterating over all arrays within the group, i.e., we don’t know whether there’s a single CRS until we have checked all the arrays. Getting a list of all unique CRSs (e.g., for conversion to CF grid mapping variables) requires parsing and comparing all geo:proj attributes found within the group. Maybe not a big deal, though?
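The cost described above can be illustrated with a small sketch that walks a hierarchy, applies group-to-array inheritance, and collects the distinct CRSs. The node dictionaries only loosely mirror Zarr v3 `zarr.json` documents (`node_type`, `attributes`); the `children` nesting, the array names, and the field names under `proj` are hypothetical. CRSs are compared by their JSON encoding, so `EPSG:4326` and an equivalent WKT2 string would count as two; recognizing equivalent CRSs would need a CRS library such as pyproj.

```python
import json
from typing import Any, Iterator, Optional

Crs = tuple[str, Any]  # (format, value), e.g. ("code", "EPSG:32633")


def own_crs(attrs: dict[str, Any]) -> Optional[Crs]:
    """CRS defined directly in one node's attributes (assumed precedence)."""
    proj = (attrs.get("geo") or {}).get("proj") or {}
    for key in ("code", "wkt2", "projjson"):
        if proj.get(key) is not None:
            return key, proj[key]
    return None


def iter_array_crs(
    node: dict[str, Any], path: str = "", inherited: Optional[Crs] = None
) -> Iterator[tuple[str, Optional[Crs]]]:
    """Yield (array_path, effective_crs) for every array below `node`."""
    crs = own_crs(node.get("attributes", {})) or inherited
    if node.get("node_type") == "array":
        yield path, crs
        return
    for name, child in node.get("children", {}).items():
        yield from iter_array_crs(child, f"{path}/{name}", crs)


def unique_crs(root: dict[str, Any]) -> list[Crs]:
    """Distinct CRSs in first-seen order, compared by their JSON encoding."""
    seen: dict[str, Crs] = {}
    for _path, crs in iter_array_crs(root):
        if crs is not None:
            seen.setdefault(json.dumps(crs, sort_keys=True), crs)
    return list(seen.values())


root = {
    "node_type": "group",
    "attributes": {"geo": {"proj": {"code": "EPSG:32633"}}},
    "children": {
        "b02": {"node_type": "array", "attributes": {}},
        "b03": {"node_type": "array", "attributes": {}},
        "quicklook": {
            "node_type": "array",
            "attributes": {"geo": {"proj": {"code": "EPSG:4326"}}},
        },
    },
}
print(unique_crs(root))  # [('code', 'EPSG:32633'), ('code', 'EPSG:4326')]
```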

@emmanuelmathot
Contributor Author

Thank you @benbovy for the thoughtful feedback. I understand your concerns about overlap with existing conventions, and I appreciate you taking the time to review this proposal.

You're right that there is overlap with CF conventions, and this is intentional but for different reasons than might first appear:

1. Tooling Independence: While xarray has excellent CF support, this extension aims to serve the broader Zarr ecosystem. Many tools that work with geospatial Zarr data aren't xarray-based (GDAL, Zarr.jl, geospatial databases, web mapping libraries). By creating a Zarr-native extension, we enable these tools to read/write CRS metadata without requiring netCDF/CF dependencies.

2. Exploring Alternative Approaches: The geospatial community has been exploring various ways to encode CRS metadata in Zarr. This extension represents another approach - one that prioritizes simplicity and direct mapping from established standards like STAC. By offering this as an optional extension, we can gather real-world feedback on what works best for different use cases while other standardization efforts continue in parallel.

3. Community Demand for Simplicity: Multiple developers in the GeoZarr discussions have explicitly requested a simpler approach for basic raster mapping. While CF conventions excel at complex use cases (bounds, irregular grids, etc.), many users just need to specify "this raster uses EPSG:3857 with this transform." This extension serves that need without preventing future CF-compatible extensions for more complex cases.

4. Extension as Experimentation: The Zarr extensions mechanism allows us to test ideas in practice. If this extension gains adoption, it provides valuable data about what the community actually needs. If it doesn't, we've learned something important without blocking other approaches.

Rather than seeing this as competition with CF conventions, I view it as complementary - a simple solution for simple cases that can coexist with more comprehensive solutions. The perfect shouldn't be the enemy of the good.

Would it help if we explicitly documented in the README that this extension is designed for simple raster cases and that users requiring more complex coordinate systems should consider CF conventions? This could help clarify the scope and prevent confusion about when to use each approach.

@emmanuelmathot
Contributor Author

emmanuelmathot commented Sep 11, 2025

Proposal ready for review by the @zarr-developers/steering-council
cc @maxrjones @joshmoore @normanrz @alimanfoo @rabernat

@jbms
Contributor

jbms commented Sep 11, 2025

One potential issue with the directory naming is that `:` is not allowed in paths on Windows. If we are going to stick to the `prefix:suffix` syntax, then we could use `prefix/suffix` as the path.
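A sketch of the `prefix/suffix` mapping, written as a hypothetical `extension_dir` helper that also rejects names whose parts would still be invalid on Windows:

```python
from pathlib import PurePosixPath

# Characters Windows forbids in file and directory names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def extension_dir(attribute_name: str) -> PurePosixPath:
    """Map a `prefix:suffix` attribute name to a `prefix/suffix` directory."""
    prefix, sep, suffix = attribute_name.partition(":")
    if not (sep and prefix and suffix):
        raise ValueError(f"expected 'prefix:suffix', got {attribute_name!r}")
    for part in (prefix, suffix):
        # A second ":" (e.g. "a:b:c") ends up in `suffix` and is rejected here.
        if WINDOWS_FORBIDDEN & set(part):
            raise ValueError(f"{part!r} is not a valid Windows path component")
    return PurePosixPath(prefix, suffix)


print(extension_dir("geo:proj"))  # geo/proj
```

Using `PurePosixPath` keeps the repository path separator as `/` regardless of the platform the check runs on.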

@d-v-b
Contributor

d-v-b commented Sep 12, 2025

this looks really good! I left a few comments / suggestions about the prose, and I think we should include a demo dataset. But otherwise I think this looks like a good jumping off point for experimenting on the implementation side.

@benbovy

benbovy commented Sep 12, 2025

> Rather than seeing this as competition with CF conventions, I view it as complementary - a simple solution for simple cases that can coexist with more comprehensive solutions. The perfect shouldn't be the enemy of the good.

> Would it help if we explicitly documented in the README that this extension is designed for simple raster cases and that users requiring more complex coordinate systems should consider CF conventions? This could help clarify the scope and prevent confusion about when to use each approach.

I fully agree that the perfect shouldn't be the enemy of the good. In general I very much agree with the reasons mentioned in #21 (comment).

I'm not sure, though, whether the "good" here should preferably consist of multiple co-existing solutions, each well suited to different cases, or of one good but non-perfect solution that works well enough for the majority of cases. From the perspective of maintainers of generic tools, it is certainly easier to deal with the latter than the former.

It is possible that the proposal here is already a good candidate for the latter, actually!

@emmanuelmathot
Contributor Author

We're optimizing for the common case, not the perfect solution.

This extension solves the 80% problem: simple 2D rasters need CRS metadata in Zarr. STAC Projection already serves this exact use case for millions of datasets.

For those suggesting broader scope (CRS/transform separation, 3D/4D data, CF compliance, OGC alignment): these are valid needs but different problems. Extensions are meant to be focused and composable, not comprehensive solutions.

What do we want to do: ship a working solution for the majority use case now, or spend another year debating edge cases while the community continues without standardized CRS metadata in Zarr?

Let's keep this focused on its intended scope and move toward approval and a merge in a reasonable time frame.

@normanrz
Member

Thanks @emmanuelmathot for your continued work on this proposal.

Much like anyone can create GitHub repositories or upload packages to PyPI, registering extensions in this repository is meant to be a lightweight process. The ZSC only reviews to avoid confusing names and to prevent malicious activities. So, in general, the authors who propose new extensions should feel empowered to decide when they want to request final review to get the PR merged. Similarly, extensions can be changed at any time upon request of the extension maintainer.

Since this is the first "registered attributes" extension, there is a bit more work than we intend for the future. This PR includes procedures for registering attributes, which is a bit stricter than what we outlined for other extension types, e.g. MUST have JSON schema. I would like to discuss that in our next ZSC meeting. Ideally, that part would be disassociated from this PR, but we can also modify it after this PR has been merged.

I would like to reiterate that the folder name has a : which will likely cause issues on Windows machines.

Are there comments from the community about the choice of the name geo:proj? It seems quite broad. I had assumed that GeoZarr might want to register the geo name for its metadata attributes.

@d-v-b
Contributor

d-v-b commented Sep 17, 2025

> I would like to reiterate that the folder name has a : which will likely cause issues on Windows machines.

> Are there comments from the community about the choice of the name geo:proj? It seems quite broad. I had assumed that GeoZarr might want to register the geo name for its metadata attributes.

What's blocking us from using geo/proj here?

@emmanuelmathot
Contributor Author

> What's blocking us from using geo/proj here?

Nothing. I just kept it until now to keep the suggestions and comments aligned. I will move the folder soon.

@pvanlaake

> What do we want to do: ship a working solution for the majority use case now, or spend another year debating edge cases while the community continues without standardized CRS metadata in Zarr?

I sense some frustration here, which is unfortunate. Do you want to ship a working solution now that gets shelved within the year because it is deficient or unworkable, or do you want to take some time to learn from the perspective of others who may have suggestions for improvements or who can point out weak parts of your proposal? I have not seen anyone ready to throw up major roadblocks, quite the opposite.

So shortcomings in the GeoZarr specification and tenuous alignment with OGC standards aside, what about the other points raised? I see a particular failure point in the section on "Spatial Dimension Identification". Such inference approaches are a beast to implement for data consumers and easy to get wrong. Simply requiring that "spatial_dimensions" is specified would already get rid of pattern-based detection, by far the most problematic part of the proposal.

Otherwise, this ability to specify a CRS is long overdue and thus very welcome.
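pvanlaake's suggestion amounts to a validation rule with no name-based guessing. A minimal sketch, assuming `spatial_dimensions` sits inside the `proj` object and always names exactly two entries of the array's `dimension_names`; the `spatial_axes` helper is hypothetical:

```python
from typing import Any, Sequence


def spatial_axes(dimension_names: Sequence[str], proj: dict[str, Any]) -> tuple[int, int]:
    """Return the axis indices named by a mandatory `spatial_dimensions` entry.

    Sketch of the "just require it" rule: no name-pattern detection at all.
    Assumes exactly two spatial dimensions, as for a simple 2D raster.
    """
    spatial = proj.get("spatial_dimensions")
    if spatial is None:
        raise ValueError("spatial_dimensions is required")
    if len(spatial) != 2 or len(set(spatial)) != 2:
        raise ValueError(f"expected two distinct names, got {spatial!r}")
    missing = [name for name in spatial if name not in dimension_names]
    if missing:
        raise ValueError(f"not in dimension_names: {missing!r}")
    return dimension_names.index(spatial[0]), dimension_names.index(spatial[1])


print(spatial_axes(["time", "y", "x"], {"spatial_dimensions": ["y", "x"]}))  # (1, 2)
```

Returning axis indices rather than names lets a reader locate the raster plane directly, wherever any extra time or band axis sits.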

@emmanuelmathot
Copy link
Contributor Author

@normanrz

> Since this is the first "registered attributes" extension, there is a bit more work than we intend for the future. This PR includes procedures for registering attributes, which is a bit stricter than what we outlined for other extension types, e.g. MUST have JSON schema. I would like to discuss that in our next ZSC meeting. Ideally, that part would be disassociated from this PR, but we can also modify it after this PR has been merged.

I will open a PR just for the attributes extension.

> I would like to reiterate that the folder name has a : which will likely cause issues on Windows machines.

Will do

> Are there comments from the community about the choice of the name geo:proj? It seems quite broad. I had assumed that GeoZarr might want to register the geo name for its metadata attributes.

In my mind, this is actually meant to be part of GeoZarr so I believe the scope of this proposal is worth the geo prefix

@emmanuelmathot
Contributor Author

Attribute extension point proposal and related discussions in #23

With data arrays:

- `temperature/`: `dimension_names: ["time", "lat", "lon"]` ✅ Contains ["lat", "lon"]
- `precipitation/`: `dimension_names: ["time", "lat", "lon"]` ✅ Same pattern


Thank you for adding this section! I think it makes things clearer. My questions around ordering essentially boil down to: Would it pass if "lon" and "lat" were flipped?

- `temperature/`: `dimension_names: ["time", "lon", "lat"]`
- `precipitation/`: `dimension_names: ["time", "lon", "lat"]`

Contributor Author


Yes
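The "Yes" follows if the fallback check tests set membership rather than position. A minimal sketch of such an order-insensitive fallback; the helper and the list of recognized name pairs are hypothetical, not taken from the proposal:

```python
from typing import Optional, Sequence

# Hypothetical fallback name pairs; the proposal's actual conventions may differ.
SPATIAL_NAME_PAIRS = (("lat", "lon"), ("latitude", "longitude"), ("y", "x"))


def infer_spatial_dimensions(dimension_names: Sequence[str]) -> Optional[tuple[str, str]]:
    """Find a known pair of spatial names, regardless of axis order.

    Membership is tested with sets, so ["time", "lon", "lat"] matches just like
    ["time", "lat", "lon"]; the result lists the names in the array's own order.
    """
    names = set(dimension_names)
    for pair in SPATIAL_NAME_PAIRS:
        if names.issuperset(pair):
            first, second = sorted(pair, key=list(dimension_names).index)
            return first, second
    return None


print(infer_spatial_dimensions(["time", "lat", "lon"]))  # ('lat', 'lon')
print(infer_spatial_dimensions(["time", "lon", "lat"]))  # ('lon', 'lat')
```

In the hybrid scheme from the PR description, this fallback would only run when no explicit `spatial_dimensions` is present.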

@maxrjones
Member

> What do we want to do: ship a working solution for the majority use case now, or spend another year debating edge cases while the community continues without standardized CRS metadata in Zarr?

> Let's keep this focused on its intended scope and move toward approval and a merge in a reasonable time frame.

I think there are a few small changes, mentioned in my PR review, that would make this composable for 3+ dimensional geospatial datasets. But I understand that you want this merged now, and I support that. Would opening a PR that makes this framework composable after it's merged be an acceptable compromise for you? Then, if it turns into a bike-shedding exercise, you can just ignore the conversation and progress with this version.

@joshmoore
Member

> In my mind, this is actually meant to be part of GeoZarr so I believe the scope of this proposal is worth the geo prefix

If that's a proposal, I could definitely see getting behind it. As a reviewer on this repo, I'd naively look towards the prevailing GeoZarr consensus-mechanisms for verifying that. I might, for example, look for a statement in the GeoZarr-spec repo.

- Deleted the old README.md and schema.json files for the geo:proj extension.
- Created a new README.md that clarifies the usage of the `proj` key under the `geo` dictionary, including inheritance rules and detailed property descriptions.
- Updated the schema.json to reflect the new structure and properties for the geo projection attributes, ensuring compatibility with the latest specifications.
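Since the proposal is based on STAC Projection, where fields are flat `proj:`-prefixed item properties, a sketch of moving them under the nested `geo`/`proj` key may help show the new layout. It assumes the nested names are simply the STAC names without the `proj:` prefix; the helper name and the example values are made up for illustration.

```python
import json
from typing import Any


def stac_to_zarr_attrs(properties: dict[str, Any]) -> dict[str, Any]:
    """Move flat STAC `proj:*` fields under a nested {"geo": {"proj": {...}}} key.

    Assumes the nested field names are the STAC names minus the `proj:` prefix;
    non-projection properties (e.g. datetime) are left out.
    """
    proj = {
        key[len("proj:"):]: value
        for key, value in properties.items()
        if key.startswith("proj:")
    }
    return {"geo": {"proj": proj}} if proj else {}


# Made-up example values in STAC Projection style.
stac_properties = {
    "datetime": "2025-01-01T00:00:00Z",
    "proj:code": "EPSG:32633",
    "proj:transform": [10.0, 0.0, 300000.0, 0.0, -10.0, 5000040.0],
}
print(json.dumps(stac_to_zarr_attrs(stac_properties), indent=2))
```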
Contributor

@d-v-b d-v-b left a comment


looks good. I had a minor request for some textual changes, but we can also refine this further on down the road.

@emmanuelmathot
Contributor Author

According to #23, the geo/proj specification now lives in https://github.com/EOPF-Explorer/data-model. The PR is still under review in EOPF-Explorer/data-model#39 and will be merged soon.

@emmanuelmathot
Contributor Author

EOPF-Explorer/data-model#39 merged.
@normanrz @joshmoore @rabernat I'd be glad if we could merge this one as well. Thanks!

Member

@joshmoore joshmoore left a comment


A few minor link fixes, but in general this looks pretty ideal from my POV as a template for what we (or at least I) would want these pages to look like. Thank you! 🙌🏽

My one caveat would be regarding the ownership of the top-level "geo" namespace. I like what you've done here since it matches my understanding but it might require an additional step outside of this PR of getting sign-off on the parent key before moving forward.


@rabernat rabernat left a comment


Thanks @emmanuelmathot for your perseverance!
