Xarray Dataset #1486
Comments
Completely agree with adding support for this, even if it means more deps. @RitwikGupta and @cjrd are our climate experts and may also have thoughts on the best way to do this. I'm not sure if/how we could support the 4th (z) dimension that frequently comes with climate data. But let's first focus on how to best handle xarray, especially when it comes to reprojection and geospatial indexing. Making a new subclass of GeoDataset that works similarly to RasterDataset will already be a big enough endeavor. Can't remember if @isaaccorley ever worked on this before. |
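To make the "geospatial indexing" point concrete, here is a minimal standard-library sketch of turning a bounding-box query into index slices on a regular lat/lon grid. It is only an illustration under stated assumptions: `bbox_to_slices` is a hypothetical helper (not part of TorchGeo or xarray), and the coordinates are assumed to be sorted in ascending order, which is not guaranteed for real NetCDF files.

```python
from bisect import bisect_left, bisect_right


def bbox_to_slices(lats, lons, min_lat, max_lat, min_lon, max_lon):
    """Hypothetical helper: map a bounding box to (row, col) index slices.

    Assumes `lats` and `lons` are 1-D coordinate lists sorted ascending.
    Grid points lying exactly on the box boundary are included.
    """
    rows = slice(bisect_left(lats, min_lat), bisect_right(lats, max_lat))
    cols = slice(bisect_left(lons, min_lon), bisect_right(lons, max_lon))
    return rows, cols


# Toy coordinate axes standing in for a gridded climate variable.
lats = [40.0, 40.5, 41.0, 41.5, 42.0]
lons = [-75.0, -74.5, -74.0, -73.5]

rows, cols = bbox_to_slices(lats, lons, 40.4, 41.6, -74.6, -73.9)
selected_lats = lats[rows]  # grid latitudes inside the query box
selected_lons = lons[cols]  # grid longitudes inside the query box
```

With xarray itself, label-based selection such as `.sel(...)` with coordinate slices plays a similar role, so a real implementation would probably lean on that rather than hand-rolled bisection.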
I've used it to load some netcdf files. It has good support for climate datasets and seems to be widely used by the community. |
@weiji14 is another expert here (driver of #509) and is independently supporting stuff like this in zen3geo https://github.com/weiji14/zen3geo |
I would like to continue discussing how to best implement this @nilsleh! I am new to torchgeo, but one challenge here is the current structure of the base I also think for simplicity, it would be easiest to support a single |
This is only true for |
I will try to summarize the discussion and pain points encountered so far when we first started looking at this. Generally, one could consider a sort of similar distinction for these Grid based datasets as we have for Raster datasets.
|
Thanks for the clarifications and explanations @adamjstewart and @nilsleh! @adamjstewart, here I am referring to the @nilsleh I think it would be helpful to constrain the xarray dataset class to be a I think it's helpful to defer the work to set metadata and merging arrays to either (1) the user loading datasets from disk or (2) a subclass operating on fixed paths, like the other The |
@nilsleh here is a draft, not tested yet, curious to hear your thoughts! https://github.com/microsoft/torchgeo/compare/main...noahgolmant:torchgeo:noah/xarray?expand=1 |
Cool, so of course difficult to say without tests, but from first glance it looks like it could work. Generally, However, ideally everything would be a |
It's also about whether or not two datasets can be combined via intersection/union. So if you have a benchmark dataset where you don't need to combine it, NonGeo is fine. But if it's just a single raster input layer or mask, you'll need it to be GeoDataset so you can combine with other datasets (either another xarray dataset or any other dataset as well). |
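As a rough illustration of what "combined via intersection/union" means at the level of spatial extents, here is a self-contained sketch. The `Box` class is hypothetical and written only for this example; it is not TorchGeo's own bounding box type, and it ignores CRS, resolution, and time, all of which a real dataset combination would need to handle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical 2-D spatial extent, for illustration only."""

    minx: float
    maxx: float
    miny: float
    maxy: float

    def intersects(self, other: Box) -> bool:
        # Two extents overlap if they overlap along both axes.
        return (
            self.minx <= other.maxx
            and self.maxx >= other.minx
            and self.miny <= other.maxy
            and self.maxy >= other.miny
        )

    def __and__(self, other: Box) -> Box:
        # Intersection: the region covered by both extents.
        if not self.intersects(other):
            raise ValueError("extents do not overlap")
        return Box(
            max(self.minx, other.minx),
            min(self.maxx, other.maxx),
            max(self.miny, other.miny),
            min(self.maxy, other.maxy),
        )

    def __or__(self, other: Box) -> Box:
        # Union: the smallest extent that covers both inputs.
        return Box(
            min(self.minx, other.minx),
            max(self.maxx, other.maxx),
            min(self.miny, other.miny),
            max(self.maxy, other.maxy),
        )


raster_extent = Box(0.0, 10.0, 0.0, 10.0)   # e.g. an image layer
grid_extent = Box(5.0, 20.0, -5.0, 8.0)     # e.g. a climate grid

overlap = raster_extent & grid_extent
cover = raster_extent | grid_extent
```

A dataset that exposes this kind of extent can be sampled jointly with another one over `overlap`; a dataset without geospatial metadata cannot, which is the trade-off described above.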
Summary
I am working with different climate data sources that come in the form of .netcdf files and xarrays. Although I am not an expert in that domain, it seems that this is the go-to data format. Since there are lots of features in TorchGeo that I would like to use with this data without having to reformat to tiff files, for example, I think it could be quite powerful to add support for Xarray datasets, even though it would add another couple of dependencies to TorchGeo.
Rationale
This could quite possibly extend the user base to other communities that work with Xarray data and would benefit from all the tools TorchGeo already provides. In the majority of cases climate data also comes in the form of time series, so this would go hand in hand with the planned support for TimeSeries models and data loading in TorchGeo.
Implementation
Both in #412 and #509 there was some discussion about this, but nothing was finalized. I am definitely willing to start on this but don't have a detailed plan yet as I first wanted to gather opinions on this.
Alternatives
No response
Additional information
No response