What makes a dimension a grid? #127
Comments
I made up 'grid'; I didn't find a name for it. My key insight was that netcdf has dimensions upfront but then scatters them about in a long list of variables. To me there are variables that belong together (like columns in a table) because they use the same dimensions. So, a grid in my weird take is instances of dimensions grouped together. (I'm using 'group' divorced from the hdf/netcdf meaning here.) When a variable uses only one dimension, that makes a group of one (a grid with one dimension, which might have many variables associated with it). Anything in a "grid" can be slurped up together because they have the same shape. I found that a natural way to "group" variables, something entirely lost in the [...] Does that make sense? I did find a word eventually that maybe would have been better. |
Maybe "shape" is what it should have been; I've become used to that term in Python. There are 4 shapes in that file above, and it's possible that the "time" shape could be used by multiple variables, but it isn't here (when "time" is used in sst it's part of the D0,D1,D2 shape, an unfortunate label but I couldn't think of anything better). |
Hmmm. You're stretching my brain here like a rubber band - I hope it doesn't snap! So, that makes the data (sst) a shape? |
No it has a shape. Those files could have more variables, in fact the daily ones do, sst, anom, ice, err all exist on the same grid/shape. |
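The grid/shape idea discussed above can be sketched in plain Python. This is only an illustration, not how the package implements it: the dimension names and the `group_by_shape` helper are assumptions, and the variable list is loosely modelled on the daily file mentioned above.

```python
from collections import defaultdict

# Hypothetical file layout: each variable lists the dimensions it is defined on.
variables = {
    "sst": ("lon", "lat", "time"),
    "anom": ("lon", "lat", "time"),
    "ice": ("lon", "lat", "time"),
    "err": ("lon", "lat", "time"),
    "lon": ("lon",),
    "lat": ("lat",),
    "time": ("time",),
}


def group_by_shape(variables):
    """Group variable names by the exact tuple of dimensions they use."""
    grids = defaultdict(list)
    for name, dims in variables.items():
        grids[dims].append(name)
    return dict(grids)


grids = group_by_shape(variables)
for dims, names in grids.items():
    # Each key is one "grid" (shape); each value is the variables that have it.
    print(",".join(dims), "->", names)
```

Each variable has a shape, and several variables can share one, which is why everything on the same grid can be read together.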
Ah, the old "is a" versus "has a" is new again! I think I have caught just enough of this to be able to lurch forward. I'll try to say back what I think you are saying...
Grid section
Dimension section
The dimension section shows the details of each dimension (of which one or more might be "active"). Active dimensions show additional info (
Am I getting closer? |
Yes 🙌 |
and so this one, has more sets of grids
I said that a 1D "grid" might have multiple variables, but actually I think that probably never happens. Above, the "time" dimension occurs in 3 grids: 'D1,D2,D3,D7', 'D1,D2,D7', and 'D7'. If I activate the first, then hyper_array/hyper_tibble will emit values from a 4D variable, but if I activate("D7") I'll just get the values from the "time" var, the only variable that is defined on that grid. It allows the "activate" scheme to work on any set of variables and not treat any of them as special. This was all done before I knew anything about xarray of course, and I'd defer to concepts there going forward. (Be nice if they can transcend the degenerate rectilinear model from netcdf/matlab though). |
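As a rough illustration of that selection behaviour, here is a toy Python sketch. It is not the package's code. The grid labels come from the comment above, while `grids_using`, `activate_grid`, and the variable names other than "time" are hypothetical placeholders.

```python
# Grids keyed by their dimension label, as in the listing discussed above.
# Variable names other than "time" are placeholders.
grids = {
    "D1,D2,D3,D7": ["var_4d"],
    "D1,D2,D7": ["var_3d"],
    "D7": ["time"],
}


def grids_using(dimension, grids):
    """Return the labels of every grid that includes the given dimension."""
    return [label for label in grids if dimension in label.split(",")]


def activate_grid(label, grids):
    """Return the variables defined on exactly the selected grid."""
    return grids[label]


print(grids_using("D7", grids))    # every grid that includes D7
print(activate_grid("D7", grids))  # only the variables on the 1D grid
```

The same dimension can take part in several grids, but selecting a grid only ever yields the variables defined on exactly that set of dimensions, so no variable needs special treatment.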
Here's a more complex one, where the D8 and D9 grids do have multiple 1D vars associated with them. In ncdump -h output you have to scan the variables to see which ones go together, which is why I organized it this way.
|
xarray at least lists them together so it's easier to scan
|
Hmmm. Not to get hung up on semantics, but in xarray what is the difference between dimensions and coordinates? |
The same difference as in netcdf, MATLAB, and R. |
Oh, well now that's embarrassing - after 20 something years of mucking about in these things I have never distinguished between the two. I guess I have some homework. |
I'm not sure what else to say. R arrays are dimensioned but have no coordinates; image(x, y, z) uses the same coordinate model as most netcdf files (with the option to use centre OR edge), and xarray formalizes that in n dimensions. Sadly image(z) puts the space in 0,1, but I think that's a mistake - though not the only unconventional choice, cf. rasterImage() - and see now that terra and stars use 0,ncol/nrow (also note GDAL defaults to +y which flips an image, though there's a difference between most imagery and netcdf in that regard). The index of a dimension is the coordinates by default, but it's not the dimension. (Terminology is tough here: dimension and resolution get conflated, but I think dimension is clear in the R context.) 🙏 xarray calls coordinates labelling as well, which I find weird - but a bit like R's dimnames which were never really leveraged. |
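To make the dimension/coordinate distinction concrete, here is a small plain-Python sketch. It is an illustration only, not how R, netcdf, or xarray store things: a dimension is just a named length, while coordinates are values attached to each position along it. The helper names and the 0 to 1 extent are assumptions chosen for the example.

```python
def default_coords(n):
    """With no coordinates supplied, the index itself is the coordinate."""
    return list(range(n))


def centre_coords(n, lo, hi):
    """Cell-centre coordinates for n equal cells spanning [lo, hi]."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]


def edge_coords(n, lo, hi):
    """Cell-edge coordinates: n cells have n + 1 edges."""
    step = (hi - lo) / n
    return [lo + i * step for i in range(n + 1)]


n = 4  # the dimension: a length, nothing more
index = default_coords(n)
centres = centre_coords(n, 0.0, 1.0)
edges = edge_coords(n, 0.0, 1.0)
print(index, centres, edges)
```

The dimension stays the same (length 4) in all three cases; only the coordinate values attached to it change, and the edge form needs one more value than there are cells.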
Yeah, I think I have just been lazy in my thinking (in fact, I'm an expert in laziness at this point), and I haven't had to think about it until now. |
BTW - I just found your blog post (pre-pandemic!), and this section really gets at the nut of it very well. |
ah indeed, glad that helps |
I think I am getting the hang of it thanks to your many many examples. Of course my first foray had to be curvilinear grids where the lon/lat transforms are stored in a separate NetCDF from the NetCDF with the data. Doh! Trial by fire! But a question surfaces for me (which I can move to a separate issue if preferred). Why is the CF timestamp stored as character? Perhaps that is because you want |
I don't actually know; @pvanlaake contributed the CFTime support. To me it seemed "too hard" (== I never trusted any automatic way to do it) so I didn't do anything with metadata or units before. Curvilinear grids are best dealt with using the GDAL warper API (as long as they are mass properties, not directional ones). But, and I've seen this before - that latitude spacing is really weird, whereas the rectilinear noise in longitude is just noise. Why do they do this ... The curve in the latitude is potentially Mercator stretch (aviso used to do that). I actually think it means you aren't supposed to care about data north of a particular latitude. |
On the type of the timestamp that On the issue of dimensions, axes, grids, shapes, coordinates, variables and the like: |
That makes perfect sense - thanks for the explanation and the heads up. Your point is well taken - the CEFI historical and forecast data are explicit about the calendars used. |
The lead example on the README shows that two dimensions, lon and lat, appear under the listing of grids. They happen to be the active dimensions, but is that a coincidence? It doesn't quite fit my paradigm, which is more like this roadmap, to have the dimensions show up under the grid list. Clearly variables and grids are not exactly the same thing in this new paradigm.
Here is another example in which all of the dimensions are included in grids.
Created on 2024-08-30 with reprex v2.1.1
Thanks!