Standardize coords as int32 dtype automatically #87

Closed
TomNicholas opened this issue Jan 16, 2024 · 6 comments · Fixed by #117

Comments

@TomNicholas
Contributor

Apparently (I'm told by @katamartin) the carbonplan-maps tools require any coordinates in the pyramid zarr store to be of int32 dtype. See her message to me:

> the zarr-js client doesn’t support dtype: <i8, which a couple of coords are using. sort of annoying, but this is because JS starts requiring BigInt at 64-bit (realizing that maybe this validation should be encoded into ndpyramid?).

This should presumably be implemented as an automatic check / coercion in ndpyramid, just before the xarray-datatree object is returned.
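A minimal sketch of what such a check / coercion could look like. It uses a plain dict of NumPy arrays as a hypothetical stand-in for the coordinates of one pyramid level; the function name `coerce_coords_to_int32` is an assumption for illustration and is not part of ndpyramid.

```python
import numpy as np

INT32_INFO = np.iinfo(np.int32)


def coerce_coords_to_int32(coords):
    """Return a new dict in which 64-bit integer arrays are cast to int32.

    `coords` is a hypothetical stand-in (name -> NumPy array) for the
    coordinates of a pyramid level; it is not ndpyramid's data structure.
    """
    out = {}
    for name, values in coords.items():
        values = np.asarray(values)
        is_64bit_int = values.dtype.kind in "iu" and values.dtype.itemsize == 8
        if is_64bit_int:
            # Refuse to cast if any value would overflow int32.
            if values.size:
                lo, hi = int(values.min()), int(values.max())
                if lo < INT32_INFO.min or hi > INT32_INFO.max:
                    raise ValueError(f"coordinate {name!r} does not fit in int32")
            values = values.astype(np.int32)
        out[name] = values
    return out


coords = {
    "x": np.arange(4, dtype=np.int64),    # would be written as '<i8'
    "lat": np.linspace(-90.0, 90.0, 4),   # float64, left untouched
}
fixed = coerce_coords_to_int32(coords)
print({name: str(arr.dtype) for name, arr in fixed.items()})
```

The range check is a design choice in this sketch: silently wrapping out-of-range values would arguably be worse than the error the coercion is meant to avoid.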

@maxrjones
Contributor

This is the complete list of data types supported by zarr-js for Zarr V2 data, which is used to load the data and afaik shouldn't be different between the non-coordinate variables and coordinates (perhaps @katamartin can correct me if I'm wrong):

  const constructors = {
    '<i1': Int8Array,
    '<u1': Uint8Array,
    '|b1': BoolArray,
    '|u1': Uint8Array,
    '<i2': Int16Array,
    '<u2': Uint16Array,
    '<i4': Int32Array,
    '<u4': Uint32Array,
    '<f4': Float32Array,
    '<f8': Float64Array,
    '<U': StringArray,
    '|S': StringArray,
  }

I agree that we should at least have a validator for carbonplan/maps and zarr-js requirements in ndpyramid (xref carbonplan/maps#14). As people may be interested in using these pyramids outside the @carbonplan/maps stack, I would lean against automatic coercion in ndpyramid but would be interested in any arguments for that case.
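As a rough illustration of the validator idea, the sketch below checks Zarr V2 dtype strings against the table quoted above. The function name `unsupported_dtypes` and the example variable names are hypothetical, and treating `<U` and `|S` as prefixes for fixed-width strings (such as `<U10`) is an assumption about how zarr-js matches them.

```python
# Exact dtype strings taken from the constructors table quoted above.
SUPPORTED_EXACT = {
    "<i1", "<u1", "|b1", "|u1",
    "<i2", "<u2", "<i4", "<u4",
    "<f4", "<f8",
}
# Assumption: '<U' and '|S' act as prefixes, e.g. '<U10' or '|S4'.
SUPPORTED_PREFIXES = ("<U", "|S")


def unsupported_dtypes(dtypes):
    """Return {name: dtype} for every entry not covered by the table."""
    return {
        name: dtype
        for name, dtype in dtypes.items()
        if dtype not in SUPPORTED_EXACT
        and not dtype.startswith(SUPPORTED_PREFIXES)
    }


# Hypothetical dtype strings, as they might appear in .zarray metadata.
example = {"x": "<i8", "y": "<i4", "band": "<U10", "tavg": "<f4"}
problems = unsupported_dtypes(example)
print(problems)
```

A validator like this reports the problem without changing the data, which keeps the pyramids usable outside the @carbonplan/maps stack.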

@TomNicholas
Contributor Author

> As people may be interested in using these pyramids outside the @carbonplan/maps stack, I would lean against automatic coercion in ndpyramid but would be interested in any arguments for that case.

This is in general unclear to me: Does ndpyramid create general, unopinionated pyramids for a range of use cases, or does it specifically create pyramids with the intention that the output immediately works with carbonplan-maps? I had thought it was the latter. If it's not, then there should be a clearer separation in the ndpyramid code/api/docs of which choices are carbonplan-maps-specific.

@maxrjones
Contributor

maxrjones commented Jan 19, 2024

> > As people may be interested in using these pyramids outside the @carbonplan/maps stack, I would lean against automatic coercion in ndpyramid but would be interested in any arguments for that case.
>
> This is in general unclear to me: Does ndpyramid create general, unopinionated pyramids for a range of use cases, or does it specifically create pyramids with the intention that the output immediately works with carbonplan-maps? I had thought it was the latter. If it's not, then there should be a clearer separation in the ndpyramid code/api/docs of which choices are carbonplan-maps-specific.

I wrote up https://ndpyramid.readthedocs.io/en/latest/schema.html to hopefully offer some clarity regarding this question as well as #78 (comment). Please feel welcome to re-open #78 if there's still confusion. The docs are quite new, so it's great that you're pointing out these places for improvement!

@TomNicholas
Contributor Author

Oh that's so much clearer now, thank you for adding that! If I find any specific things that could be clearer I'll post a new issue 🙂

I think one thing we could add is a translation of what the data types supported by zarr-js imply for the user of ndpyramid working in Python. I mean it's fairly obvious about the ints and so on, but as someone who has never used JavaScript there might be some subtlety with the string types or something...
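One way to start such a translation is to ask NumPy what each Zarr V2 dtype string means. The sketch below covers only the numeric and boolean entries of the table; the helper name `numpy_name` is hypothetical, and the string entries (`<U`, `|S`) are left out because how zarr-js handles them is exactly the open question.

```python
import numpy as np

# Numeric and boolean dtype strings from the zarr-js table in this thread.
ZARR_JS_DTYPES = ["<i2", "<u2", "<i4", "<u4", "<f4", "<f8", "|b1", "|u1"]


def numpy_name(zarr_dtype):
    """Return the NumPy dtype name for a Zarr V2 dtype string."""
    return np.dtype(zarr_dtype).name


table = {code: numpy_name(code) for code in ZARR_JS_DTYPES}
for code, name in table.items():
    print(f"{code:>4} -> {name}")

# '<i8' is absent from the table; in NumPy terms it is int64.
print("<i8 ->", numpy_name("<i8"))
```

Since NumPy commonly creates integer arrays as int64 by default, integer coordinates can end up as `<i8` unless they are cast explicitly.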

@TomNicholas
Contributor Author

I'm still a bit confused by something: pyramid_coarsen does not do anything specific to carbonplan-maps, but pyramid_reproject and pyramid_regrid both call set_zarr_encoding_and_metadata at the end, which does several things that are specific to carbonplan-maps. Is this a meaningful distinction? If so I think it should be highlighted in the docs.

@maxrjones
Contributor

maxrjones commented Mar 20, 2024

@TomNicholas after working more with ndpyramid recently, in particular to help with the pangeo-forge integration, I have come to agree that it would be best to standardize the coords automatically to avoid obscure errors in the mapping library. Since it seems you've been using the tool as well, I would welcome any feedback on #117.
