Feature Request: Hierarchical storage and processing in xarray

I am using xarray for processing geospatial data and have encountered two major challenges with existing data structures in xarray:

- Data arrays stored in an xarray Dataset cannot be grouped into hierarchical levels/logical subsets to reflect the internal organisation of the data. This makes it difficult to identify and process a subset of the data variables that pertain to a specific problem. 

- When two data arrays that share a dimension but have different coordinate values along it are merged into a Dataset, the union of the coordinate values from the two arrays becomes the new coordinate for that dimension. Wherever a variable has no value at one of these coordinate values, `nan` is filled in, which wastes memory.
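
The second point can be reproduced with a small sketch. The variable names and values here are made up for illustration, and `join="outer"` is passed explicitly so the union behaviour does not depend on the installed version's default:

```python
import numpy as np
import xarray as xr

# Two arrays share the "time" dimension but cover different coordinate values.
temperature = xr.DataArray(
    np.array([280.0, 281.0, 282.0]),
    dims="time",
    coords={"time": [0, 1, 2]},
    name="temperature",
)
pressure = xr.DataArray(
    np.array([1000.0, 1001.0, 1002.0]),
    dims="time",
    coords={"time": [2, 3, 4]},
    name="pressure",
)

# An outer join makes "time" the union of both coordinate sets.
merged = xr.merge([temperature, pressure], join="outer")

merged_time = merged["time"].values.tolist()
n_missing_temperature = int(merged["temperature"].isnull().sum())
n_missing_pressure = int(merged["pressure"].isnull().sum())

# Each variable now holds 5 values, 2 of which are NaN padding.
print(merged_time, n_missing_temperature, n_missing_pressure)
# prints: [0, 1, 2, 3, 4] 2 2
```

With only one overlapping coordinate value, 4 of the 10 stored values are padding. The fraction grows as the coordinate sets diverge.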

I would like to suggest a tree-based data structure for xarray in which the leaves store individual data arrays and the other nodes store the hierarchical information. Since data arrays are stored independently, each dimension only needs to be associated with coordinate values that are valid for that data array. 

To meet these requirements, I have implemented a data structure that also supports the following capabilities:
- Standard xarray methods can be applied to the tree at all hierarchical levels, i.e., when a function is called at a hierarchical level, it is mapped over all data arrays at the leaves under the corresponding node. For example, say I have a tree object (let's call it `dt`) with child nodes `weather`, `satellite image` and `population`. Each of these nodes has data arrays/subtrees under it.

>  ![Screenshot 2020-06-02 at 2 10 28 AM](https://user-images.githubusercontent.com/39640592/83452402-42152680-a476-11ea-9e88-cfb4ddb80310.png)

The mean over time of all data variables associated with weather can be obtained using `dt.weather.mean('time')` which applies the function to `sea_surface_temperature`, `dew_point_temperature`, `wind_speed` and `pressure`.

- It can be encoded into the netCDF format, like xarray Datasets.
- It supports item assignment at all hierarchical levels.
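
To make the idea concrete, here is a minimal sketch of how such a tree could behave. It is not the implementation referred to above: the `TreeNode` class, its `map` method, the `/`-separated assignment paths and the sample values are all hypothetical, and only `mean` is forwarded to the leaves.

```python
import numpy as np
import xarray as xr


class TreeNode:
    """Hypothetical tree node whose children are TreeNode or xr.DataArray."""

    def __init__(self, **children):
        self.children = dict(children)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        children = self.__dict__.get("children", {})
        if name in children:
            return children[name]
        raise AttributeError(name)

    def __getitem__(self, path):
        head, _, rest = path.partition("/")
        child = self.children[head]
        return child[rest] if rest else child

    def __setitem__(self, path, value):
        # Item assignment at any level, creating intermediate nodes as needed.
        head, _, rest = path.partition("/")
        if rest:
            node = self.children.setdefault(head, TreeNode())
            node[rest] = value
        else:
            self.children[head] = value

    def map(self, func):
        """Apply func to every leaf DataArray and return a new tree."""
        return TreeNode(**{
            name: child.map(func) if isinstance(child, TreeNode) else func(child)
            for name, child in self.children.items()
        })

    def mean(self, dim):
        return self.map(lambda da: da.mean(dim))


def series(values, times):
    """Build a 1-D DataArray along "time" (illustrative helper)."""
    return xr.DataArray(
        np.array(values, dtype=float), dims="time", coords={"time": times}
    )


# Each leaf keeps only its own "time" values, so nothing is padded with NaN.
dt = TreeNode(
    weather=TreeNode(
        wind_speed=series([2.0, 4.0], [0, 1]),
        pressure=series([1000.0, 1002.0, 1004.0], [5, 6, 7]),
    ),
)
dt["population/count"] = series([10.0, 20.0], [0, 1])

weather_mean = dt.weather.mean("time")
print(float(weather_mean.wind_speed), float(weather_mean.pressure))
# prints: 3.0 1002.0
```

Because the leaves are stored independently, `wind_speed` and `pressure` can use disjoint `time` coordinates without either being extended to the union.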

I would like to know whether such a data structure could be introduced in xarray, and what challenges that would involve.

Feature Request: Hierarchical storage and processing in xarray #4118
