Opening a dataset doesn't display groups. #4840

Closed
dklink opened this issue Jan 23, 2021 · 3 comments
Labels
topic-DataTree Related to the implementation of a DataTree class

Comments

@dklink

dklink commented Jan 23, 2021

Problem

I know xarray doesn't support netCDF4 Group functionality. That's fine, I bet it's incredibly thorny. My issue is, when you open the root group of a netCDF4 file which contains groups, xarray doesn't even tell you that there are groups; they are totally invisible. This seems like a big flaw; you've opened a file, shouldn't you at least be told what's in it?

Solution

When you open a dataset with the netcdf4-python library, you get something like this:

>>> netCDF4.Dataset(path)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    some global attribute: some value
    dimensions(sizes): ...
    variables(dimensions): ...
    groups: group1, group2

"groups" shows up sort of like an auto-generated attribute. Surely xarray can do something similar:

>>> xr.open_dataset(path)
<xarray.Dataset>
Dimensions: ...
Coordinates: ...
Data variables: ...
Attributes: ...
Groups: group1, group2
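
For reference, netcdf4-python exposes the names needed for such a "Groups" line through the `groups` mapping found on every `Dataset` and `Group`. A minimal sketch of collecting them, nested groups included, might look like the following. The helper names `walk_group_paths` and `list_group_paths` are made up for illustration and are not part of xarray or netCDF4.

```python
def walk_group_paths(node, prefix=""):
    """Yield the path of every group below `node`.

    `node` is anything with a `groups` mapping of name -> child, which is
    how netCDF4.Dataset and netCDF4.Group expose their sub-groups.
    """
    for name, child in node.groups.items():
        path = f"{prefix}/{name}"
        yield path
        # Recurse so that nested groups are reported as well.
        yield from walk_group_paths(child, path)


def list_group_paths(filename):
    """Open `filename` read-only and return all group paths (hypothetical helper)."""
    import netCDF4  # imported here so walk_group_paths has no hard dependency

    with netCDF4.Dataset(filename, mode="r") as root:
        return list(walk_group_paths(root))
```

For a file like the one in the example above, `list_group_paths(path)` should report `/group1` and `/group2`.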

Workaround

The workaround I am considering is to actually add an attribute to my root group which contains a list of the groups in the file, so people using xarray will see that there are more groups in the file. However, this is redundant considering the information is already in the netCDF file, and also brittle since there's no guarantee the attribute truly reflects the groups in the file.
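
A rough sketch of that workaround with netcdf4-python, assuming the file can be opened in append mode. `record_group_names`, `group_attr_is_stale`, and `annotate_file` are hypothetical helper names, and the attribute name `groups` is only the one suggested above. Because `groups` is also a Python attribute on `netCDF4.Dataset`, the sketch goes through `setncattr`/`getncattr` rather than plain attribute access. The second helper illustrates the brittleness: the stored list has to be re-checked against the real groups.

```python
GROUPS_ATTR = "groups"  # attribute name chosen for this sketch


def record_group_names(root):
    """Store the root group's child names as a comma-separated global attribute."""
    names = ", ".join(root.groups)  # iterating the mapping yields group names
    root.setncattr(GROUPS_ATTR, names)
    return names


def group_attr_is_stale(root):
    """Return True if the attribute is missing or no longer matches the groups."""
    if GROUPS_ATTR not in root.ncattrs():
        return True
    return root.getncattr(GROUPS_ATTR) != ", ".join(root.groups)


def annotate_file(filename):
    """Open `filename` in append mode and write the attribute (needs netCDF4)."""
    import netCDF4

    with netCDF4.Dataset(filename, mode="a") as root:
        return record_group_names(root)
```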

Conclusion

Considering that xr.open_dataset has a group parameter to open groups, it seems unfortunate that when you open a file, you don't see what groups are in there. Instead, you have to use an external tool to get information on the file's groups, then open them with xarray. Since this is only a matter of extracting group data and printing it, surely this is a simple (and imo, valuable) addition. I'd be happy to implement it and submit a PR if people are on board. I might need some direction though: this is my first time digging into the xarray source code, and I don't see a __str__ method on the Dataset class, which is where I expected to make this addition.
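
The two-step process described here (find the group names with another tool, then open each one through the `group` parameter) could be sketched as follows. `open_each_group` is a hypothetical helper; its `opener` argument exists only so the sketch can be exercised without a real file, and it defaults to `xr.open_dataset`.

```python
def open_each_group(filename, group_paths, opener=None):
    """Return {group path: dataset}, opening each group separately."""
    if opener is None:
        import xarray as xr  # only needed for the default opener

        opener = xr.open_dataset
    # One call per group, since open_dataset loads a single group at a time.
    return {path: opener(filename, group=path) for path in group_paths}
```

The group paths still have to come from somewhere else, for example from netcdf4-python, which is exactly the inconvenience described above.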

@dklink
Author

dklink commented Jan 23, 2021

Update: after diving into how the source code works, it seems group information would actually have to be loaded in the backend loaders, which is a pretty deep code change. The minimal diff seems to be to load the group names, then add {"groups": "group1, group2, ..."} to the global attrs dictionary. That way, they would automagically propagate all the way through the codebase to the __repr__ call and show up in the output string. Of course, it's a little kludgy, because the names of the groups aren't really an attribute of the underlying file. And what if there's already an attribute named 'groups'? Tricky; I'm not sure what the optimal resolution is, but probably just don't overwrite it and do nothing.

The alternative is creating a representation for "groups" alongside "dimensions", "coordinates", "data variables", and "attributes", and adding machinery for these throughout the codebase, changing method signatures, etc. That is really moving in the direction of Datasets actually supporting groups, which is a whole different undertaking. This is just supposed to give a bit more visibility into the underlying netCDF file. I'm unsure whether this moderate level of kludge is acceptable, though.
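
The attribute-based approach described here, including the "don't overwrite an existing attribute" rule, might be sketched like this. It is only an illustration: `with_group_names` is a hypothetical helper operating on a plain dict, not code from xarray's backends.

```python
def with_group_names(attrs, group_names, key="groups"):
    """Return a copy of `attrs` with a comma-separated list of group names added.

    An existing attribute stored under `key` is left alone, and nothing is
    added when there are no groups to report.
    """
    merged = dict(attrs)  # copy so the caller's dictionary is not mutated
    if group_names and key not in merged:
        merged[key] = ", ".join(group_names)
    return merged
```

Applied to an opened dataset, this would amount to something like `ds.attrs = with_group_names(ds.attrs, names)`, after which the extra entry shows up under "Attributes:" in the repr.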

@keewis
Collaborator

keewis commented Jan 24, 2021

related to #4118

@kmuehlbauer
Contributor

Is this still an issue, now, with DataTree available? Maybe this can be closed, then?
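
For context, a sketch of how group names can be listed with DataTree, assuming an xarray release that provides `xr.open_datatree`. `group_paths` and `list_groups` are hypothetical helper names; the `groups` property used here belongs to DataTree itself.

```python
def group_paths(tree):
    """Return the path of every group in an opened DataTree, root included."""
    return list(tree.groups)


def list_groups(filename):
    """Open `filename` as a DataTree and list its group paths."""
    import xarray as xr  # requires a release that provides open_datatree

    return group_paths(xr.open_datatree(filename))
```

Printing the object returned by `xr.open_datatree` also shows the child groups in its repr, which is the visibility originally asked for.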

dcherian closed this as completed on Nov 3, 2024