Resolve dimension coordinates from parents #297
Comments
The trouble here is that there are many datasets where sharing coordinates is not desired or not possible, or where it mutates the dataset in a way that makes it invalid. So I don't think we should do this by default, and doing so on variable access just makes the code (and behavior) way too complicated. However, what I could imagine is adding a method that you can explicitly call to (shallow) copy the variables from the parent nodes to the children.
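As a rough illustration of the "explicit method" idea in the comment above, here is a minimal pure-Python sketch. The `Node` class and `copy_parent_variables` function are hypothetical stand-ins invented for this example; they are not part of datatree or xarray, and plain lists stand in for real variables.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not the datatree API)."""

    name: str
    variables: dict[str, list] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def copy_parent_variables(node: Node) -> None:
    """Push variables down the tree, one level at a time.

    The copy is shallow: children reference the same underlying objects.
    Variables already defined on a child are left untouched.
    """
    for child in node.children:
        for var_name, values in node.variables.items():
            child.variables.setdefault(var_name, values)
        copy_parent_variables(child)


# Shared coordinate at the root, data in a child group.
root = Node("root", {"time": [0, 1, 2]})
hourly = Node("hourly", {"temperature": [280.0, 281.5, 279.9]})
root.children.append(hourly)

copy_parent_variables(root)  # nothing is shared until this explicit call
print(sorted(hourly.variables))
```

Because the copy only happens when the function is called, datasets where sharing is undesirable are never modified implicitly.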
If I understand correctly, this is how the CF Conventions scoping is supposed to work. So I'd imagine there is a great number of datasets out there where this is how you'd want to resolve coordinates, and certainly something worth making easy for users. I'd personally love to see CF Conventions by default, and optionally disabled for using non-CF datasets.
I agree with @TomNicholas that this should not be default behavior, but would be a VERY useful option. Here is some code for a tree of xarray Datasets that I'm thinking of abandoning in favor of DataTree, but it does provide some functions for inheriting data or coords from ancestors. See the
So the reason I did not implement this "inheritance" of information from parent groups is that (a) it was considerably simpler not to and (b) I thought it could lead to inconsistent states. However, today I had a long discussion with @shoyer about this (plus @flamingbear, @etienneschalk and @eni-awowale), and now I think this might be possible. My original concern was what happens if I inherit a variable with a dimension with the same name but different length to what's already present in this node? i.e.
where the variables This difficulty is why I did not implement this feature. But looking again today at the CF conventions for groups, it says
In the xarray model without explicit dimension objects, this translates to "the two dimensions must be the same length". Interpreting this "must" to mean "if this is not the case don't try to inherit this variable", I think this caveat resolves the concern I had. We could imagine changing the behaviour of
(This is essentially the "search by proximity" described in the CF conventions, just without the deprecated "lateral search" stuff.) The questions now are:
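The rule discussed in this comment (inherit from ancestors by proximity, but only when same-named dimensions have the same length) could be sketched roughly as follows. This is a pure-Python model, not xarray code: `Var`, `Node`, and `resolve` are hypothetical names, and continuing the search past an incompatible ancestor is an assumption of this sketch rather than something the thread settles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Var:
    """Hypothetical variable: only dimension names and lengths are modelled."""

    dims: tuple[str, ...]
    shape: tuple[int, ...]


@dataclass
class Node:
    """Hypothetical tree node holding variables and a link to its parent."""

    variables: dict[str, Var] = field(default_factory=dict)
    parent: Node | None = None

    def dim_sizes(self) -> dict[str, int]:
        """Map each dimension name used in this node to its length."""
        sizes: dict[str, int] = {}
        for var in self.variables.values():
            sizes.update(zip(var.dims, var.shape))
        return sizes


def resolve(node: Node, name: str) -> Var | None:
    """Find `name` in the node or its nearest ancestor (no lateral search).

    An ancestor's variable is skipped if any of its dimensions also exists
    in `node` with a different length.
    """
    local_sizes = node.dim_sizes()
    current: Node | None = node
    while current is not None:
        var = current.variables.get(name)
        if var is not None:
            if current is node:
                return var  # local definitions always win
            if all(local_sizes.get(d, n) == n for d, n in zip(var.dims, var.shape)):
                return var
        current = current.parent
    return None


root = Node({"time": Var(("time",), (3,))})
hourly = Node({"temperature": Var(("time",), (3,))}, parent=root)
weekly = Node({"rainfall": Var(("time",), (2,))}, parent=root)

print(resolve(hourly, "time"))  # lengths agree, so the root variable is found
print(resolve(weekly, "time"))  # lengths differ, so nothing is inherited
```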
A practical API question here is whether or not inherited variables are considered to be "part of" the child
should
I think We could easily have another interface for surfacing only the "local" dataset (e.g., Incidentally, this would be a reasonable way to organize the contents of a node:
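One way to picture a node that exposes both a "full" view (local plus inherited variables) and a "local-only" view is with the standard library's `collections.ChainMap`. This is only an analogy under simplifying assumptions: the variable names are invented, plain lists stand in for variables, and the dimension-length check discussed earlier in the thread is omitted.

```python
from collections import ChainMap

# Hypothetical contents: coordinates stored at the root, data in a child node.
root_vars = {"lat": [10.0, 20.0], "lon": [30.0, 40.0]}
child_local = {"temperature": [[280.0, 281.0], [282.0, 283.0]]}

# ChainMap looks a key up in the first mapping, then falls back to later ones,
# so local variables shadow inherited ones with the same name.
child_full = ChainMap(child_local, root_vars)

print(sorted(child_full))   # local + inherited names
print(sorted(child_local))  # local names only

# Writes through the full view land in the local mapping, never in the parent.
child_full["pressure"] = [1000.0, 990.0]
print("pressure" in root_vars)
```

The design point is that the inherited variables are never copied: the full view is computed on access, while the local mapping remains the single place where a node's own contents are stored.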
I quite like this idea. Providing a
I think I can test the inheritance idea orthogonally from this suggested refactor.
If a variable at the root is inherited down through the whole tree, it's now effectively present everywhere. If I then map a function/method over the whole tree with
EDIT: Of course inheriting variables downwards + mapping only over the leaves does not add up to give equivalent behaviour to not inheriting + mapping over the whole tree...
Closing this in favor of pydata/xarray#9056: we've been discussing this for a while in the datatree meetings, and the version in
Basically this issue but for datatree pydata/xarray#1982
We have some hourly and weekly data in different groups, and shared coordinates in the root. For example, I have a netCDF with the following structure:
I was hoping datatree would add the shared coordinate variables when I access one of the child groups, but instead, when I access one of the Datasets, I get a whole bunch of dimensions without coordinates. E.g.:
Is this something you think datatree should/will address? Thanks!