Proposal: Projection Attribute Extension for Zarr v3 #21
I think the name of the directory should match the attribute name --- i.e. |
…bute Extension documentation and schema
…and update documentation for spatial dimension identification
…ma for consistency
…detailed examples and clarifications
…additional examples and explanations
This is an interesting proposal and I find the similarities with STAC/projection quite appealing. Having a background as an Xarray developer, where CF conventions / grid mapping variables are more common, I'm approaching it with some reservations (and I might not be the only one :-)). Thinking more about it -- trying to do so out-of-the-box -- helped to overcome some of my initial concerns. One of my remaining concerns is about the translation between this model and the CF conventions, which seems tricky and/or cumbersome at least for some cases.
This implies that there’s a unique “main” CRS for the group, with the possibility of a few exceptions. Does that represent well the broad range of use cases (EO data products, models, etc.)? This also enforces a single CRS defined (or inherited) for an array, unlike CF grid mappings and particularly its extended syntax. @mdsumner would likely agree with that zarr-developers/geozarr-spec#90 (comment), although his rant is more about explicit (NetCDF-style) spatial coordinates than about multiple grid mappings. There may still be value in being able to create one or more sets of lazy, auxiliary spatial coordinates from the metadata, but I don’t clearly see how this proposal would allow it. Conversion of the CF grid mapping extended syntax into this model might be lossy, and roundtrip conversion might not be possible. Getting a list of all CRSs defined in the group requires iterating over all arrays within the group, i.e., we don’t know whether there’s a single CRS until we have checked all the arrays. Getting a list of all unique CRSs (e.g., for conversion to CF grid mapping variables) requires parsing & comparing all |
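The enumeration cost described in this comment can be sketched as follows. This is a minimal illustration, not the proposal's API: plain dicts stand in for Zarr group and array attributes, and the `geo:proj` key with a `code` field is an assumed layout chosen only for the example.

```python
# Stand-in for Zarr metadata: plain dicts instead of a real store.
# The "geo:proj" key and its "code" field are assumptions for illustration.

def effective_proj(group_attrs, array_attrs):
    """Return the array's own projection entry, else the one inherited from the group."""
    return array_attrs.get("geo:proj", group_attrs.get("geo:proj"))


def unique_crs_codes(group_attrs, arrays):
    """Collect the distinct CRS codes used by the arrays of one group."""
    codes = set()
    # Every array has to be visited: an override can appear on any of them.
    for array_attrs in arrays.values():
        proj = effective_proj(group_attrs, array_attrs)
        if proj is not None:
            codes.add(proj["code"])
    return codes


group_attrs = {"geo:proj": {"code": "EPSG:32633"}}
arrays = {
    "b02": {},  # inherits the group CRS
    "b03": {},  # inherits the group CRS
    "overview": {"geo:proj": {"code": "EPSG:4326"}},  # overrides it
}
print(sorted(unique_crs_codes(group_attrs, arrays)))
```

Comparing by a short code is the easy case. If CRSs were given as WKT2 or PROJJSON, deciding whether two entries are "the same" would need real parsing, which is the concern raised above.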
Thank you @benbovy for the thoughtful feedback. I understand your concerns about overlap with existing conventions, and I appreciate you taking the time to review this proposal. You're right that there is overlap with CF conventions, and this is intentional but for different reasons than might first appear:
1. Tooling Independence: While xarray has excellent CF support, this extension aims to serve the broader Zarr ecosystem. Many tools that work with geospatial Zarr data aren't xarray-based (GDAL, Zarr.jl, geospatial databases, web mapping libraries). By creating a Zarr-native extension, we enable these tools to read/write CRS metadata without requiring netCDF/CF dependencies.
2. Exploring Alternative Approaches: The geospatial community has been exploring various ways to encode CRS metadata in Zarr. This extension represents another approach - one that prioritizes simplicity and direct mapping from established standards like STAC. By offering this as an optional extension, we can gather real-world feedback on what works best for different use cases while other standardization efforts continue in parallel.
3. Community Demand for Simplicity: Multiple developers in the GeoZarr discussions have explicitly requested a simpler approach for basic raster mapping. While CF conventions excel at complex use cases (bounds, irregular grids, etc.), many users just need to specify "this raster uses EPSG:3857 with this transform." This extension serves that need without preventing future CF-compatible extensions for more complex cases.
4. Extension as Experimentation: The Zarr extensions mechanism allows us to test ideas in practice. If this extension gains adoption, it provides valuable data about what the community actually needs. If it doesn't, we've learned something important without blocking other approaches.
Rather than seeing this as competition with CF conventions, I view it as complementary - a simple solution for simple cases that can coexist with more comprehensive solutions. The perfect shouldn't be the enemy of the good. Would it help if we explicitly documented in the README that this extension is designed for simple raster cases and that users requiring more complex coordinate systems should consider CF conventions? This could help clarify the scope and prevent confusion about when to use each approach. |
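To make "this raster uses EPSG:3857 with this transform" concrete, here is a small sketch of what a consumer does with such a transform. It assumes the six-element affine ordering used by the STAC Projection extension, `[a, b, c, d, e, f]` with `x = a*col + b*row + c` and `y = d*col + e*row + f`; whether this extension adopts exactly that ordering is an assumption, and the function name and sample values are made up for illustration.

```python
def pixel_to_world(transform, col, row):
    """Map a pixel (col, row) position to CRS coordinates with a 6-element affine.

    Assumes the ordering [a, b, c, d, e, f], where
    x = a * col + b * row + c and y = d * col + e * row + f.
    """
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Illustrative north-up raster: 10 m pixels, upper-left corner at (500000, 4600000).
transform = [10.0, 0.0, 500000.0, 0.0, -10.0, 4600000.0]

print(pixel_to_world(transform, 0, 0))  # upper-left corner of the first pixel
print(pixel_to_world(transform, 2, 3))  # two columns right, three rows down
```

A CRS code plus these six numbers is all a simple raster needs, which is the scope the comment above argues for.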
Proposal ready for review by the @zarr-developers/steering-council |
One potential issue with the directory naming is that on Windows |
this looks really good! I left a few comments / suggestions about the prose, and I think we should include a demo dataset. But otherwise I think this looks like a good jumping off point for experimenting on the implementation side. |
I fully agree that the perfect shouldn't be the enemy of the good. In general I very much agree with the reasons mentioned in #21 (comment). I'm not sure whether the "good" here should preferably consist in multiple co-existing solutions very suited to different cases or one good but non-perfect solution that works well enough for the majority of cases, though. From the perspective of maintainers of generic tools it is certainly easier to deal with the latter than the former. It is possible that the proposal here is already a good candidate for the latter, actually! |
We're optimizing for the common case, not the perfect solution. This extension solves the 80% problem: simple 2D rasters need CRS metadata in Zarr. STAC Projection already serves this exact use case for millions of datasets. For those suggesting broader scope (CRS/transform separation, 3D/4D data, CF compliance, OGC alignment): these are valid needs but different problems. Extensions are meant to be focused and composable, not comprehensive solutions. What do we want to do? Ship a working solution for the majority use case now, or spend another year debating edge cases while the community continues without standardized CRS metadata in Zarr? Let's keep this focused on its intended scope and move toward approval and a merge within a reasonable timeframe. |
Thanks @emmanuelmathot for your continued work on this proposal. Much like anyone can create GitHub repositories or upload packages to PyPI, registering extensions in this repository is meant to be a lightweight process. The ZSC only reviews to avoid confusing names and to prevent malicious activities. So, in general the authors who propose new extensions should feel empowered to decide when they want to request final review to get the PR merged. Similarly, extensions can be changed at any time upon request of the extension maintainer. Since this is the first "registered attributes" extension, there is a bit more work than we intend for the future. This PR includes procedures for registering attributes, which are a bit stricter than what we outlined for other extension types, e.g. MUST have JSON schema. I would like to discuss that in our next ZSC meeting. Ideally, that part would be disassociated from this PR, but we can also modify it after this PR has been merged. I would like to reiterate that the folder name has a Are there comments from the community about the choice of the name |
What's blocking us from using |
Nothing. I just kept it until now to keep the suggestions and comments aligned. I will move the folder soon. |
I sense some frustration here, which is unfortunate. Do you want to ship a working solution now that gets shelved within the year because it is deficient or unworkable, or do you want to take some time to learn from the perspective of others who may have suggestions for improvements or who can point out weak parts of your proposal? I have not seen anyone ready to throw up major roadblocks, quite the opposite. So shortcomings in the GeoZarr specification and tenuous alignment with OGC standards aside, what about the other points raised? I see a particular failure point in the section on "Spatial Dimension Identification". Such inference approaches are a beast to implement for data consumers and easy to get wrong. Simply requiring that "spatial_dimensions" is specified would already get rid of pattern-based detection, by far the most problematic part of the proposal. Otherwise, this ability to specify a CRS is long overdue and thus very welcome. |
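The stricter rule suggested here, requiring `spatial_dimensions` and dropping name-based inference, could look roughly like the sketch below. The array metadata is a plain dict shaped like Zarr v3 array metadata (`dimension_names`, `attributes`); placing `spatial_dimensions` under a `geo:proj` attribute and the function name are assumptions for illustration.

```python
def resolve_spatial_axes(array_meta):
    """Return the axis indices of the declared spatial dimensions.

    No guessing: a missing or inconsistent declaration is an error.
    """
    # "geo:proj" as the attribute key is an assumed layout.
    proj = array_meta.get("attributes", {}).get("geo:proj", {})
    spatial = proj.get("spatial_dimensions")
    if spatial is None:
        raise ValueError("spatial_dimensions is required")

    dimension_names = array_meta.get("dimension_names") or []
    missing = [name for name in spatial if name not in dimension_names]
    if missing:
        raise ValueError(f"spatial_dimensions not in dimension_names: {missing}")

    return tuple(dimension_names.index(name) for name in spatial)


array_meta = {
    "dimension_names": ["time", "lat", "lon"],
    "attributes": {"geo:proj": {"spatial_dimensions": ["lat", "lon"]}},
}
print(resolve_spatial_axes(array_meta))
```

The consumer side reduces to one lookup and one consistency check, with no list of naming patterns to maintain.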
I will open a PR just for the attributes extension.
Will do
In my mind, this is actually meant to be part of GeoZarr so I believe the scope of this proposal is worth the |
Attribute extension point proposal and related discussions in #23 |
attributes/geo:proj/README.md
With data arrays:

- `temperature/`: `dimension_names: ["time", "lat", "lon"]` ✅ Contains ["lat", "lon"]
- `precipitation/`: `dimension_names: ["time", "lat", "lon"]` ✅ Same pattern
Thank you for adding this section! I think it makes things clearer. My questions around ordering essentially boil down to: Would it pass if "lon" and "lat" were flipped?
- `temperature/`: `dimension_names: ["time", "lon", "lat"]`
- `precipitation/`: `dimension_names: ["time", "lon", "lat"]`
Yes
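The answer above implies that the name-based check only tests whether both names are present, not where they sit in `dimension_names`. A minimal sketch of such an order-insensitive check follows; the list of known pairs is hypothetical and may not match the README.

```python
# Hypothetical name pairs; the proposal's actual list may differ.
KNOWN_PAIRS = [("y", "x"), ("lat", "lon"), ("latitude", "longitude")]


def detect_spatial_pair(dimension_names):
    """Return the first known pair whose two names both appear, in any order."""
    names = set(dimension_names)
    for pair in KNOWN_PAIRS:
        if set(pair) <= names:  # subset test ignores position
            return pair
    return None


print(detect_spatial_pair(["time", "lat", "lon"]))
print(detect_spatial_pair(["time", "lon", "lat"]))  # flipped order still matches
print(detect_spatial_pair(["time", "band"]))
```

Since the match says nothing about which axis is which, a consumer still has to look up each name's index in `dimension_names` before applying a transform.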
I think there are a few small changes, mentioned in my PR review, that will make this composable for 3+ dimensional geospatial datasets. But I understand that you want this merged now and support that. Would opening a PR making this framework composable after it's merged be an acceptable compromise for you? Then, if it turns into a bike-shedding exercise you can just ignore the conversation and progress with this version. |
If that's a proposal, I could definitely see getting behind it. As a reviewer on this repo, I'd naively look towards the prevailing GeoZarr consensus-mechanisms for verifying that. I might, for example, look for a statement in the GeoZarr-spec repo. |
- Deleted the old README.md and schema.json files for the geo:proj extension. - Created a new README.md that clarifies the usage of the `proj` key under the `geo` dictionary, including inheritance rules and detailed property descriptions. - Updated the schema.json to reflect the new structure and properties for the geo projection attributes, ensuring compatibility with the latest specifications.
looks good. I had a minor request for some textual changes, but we can also refine this further on down the road.
Co-authored-by: Davis Bennett <[email protected]>
According to #23 , the |
EOPF-Explorer/data-model#39 merged. |
A few minor link fixes but in general, this looks pretty ideal from my POV as a template for what we (or at least I) would want these pages to look like. Thank you! 🙌🏽
My one caveat would be regarding the ownership of the top-level "geo" namespace. I like what you've done here since it matches my understanding but it might require an additional step outside of this PR of getting sign-off on the parent key before moving forward.
Co-authored-by: Josh Moore <[email protected]>
Co-authored-by: Josh Moore <[email protected]>
Thanks @emmanuelmathot for your perseverance!
This PR introduces attribute extensions as a new extension point and adds the first attribute extension for coordinate reference system (CRS) metadata.
Summary
- `geo:proj` attribute extension for CRS metadata

Key Design Decisions
- `spatial_dimensions` with fallback to naming conventions

Files Changed
- `README.md` - Added Attributes to extension points list
- `attributes/README.md` - New file defining attribute extensions
- `attributes/projection/README.md` - Projection extension specification
- `attributes/projection/schema.json` - JSON schema for validation

Testing Checklist
- `npx prettier -w **/schema.json`

Related Discussions
- `grid_mapping` variables in GeoZarr geozarr-spec#90 (grid_mapping variables)
- `geozarr` attribute key geozarr-spec#88 (attribute organization)

This extension provides a simple, standardized way to add CRS metadata to Zarr arrays without the complexity issues identified in GeoZarr discussions around CF conventions.
cc @d-v-b @vincentsarago @j08lue