"mesh variable" instead of "boundary variable" for contiguous grid cells #5
Comments
@rabernat makes very good points and I agree with everything said. The concept of a space-filling and contiguous mesh is fundamental to many finite-volume codes and I find it baffling that a more efficient description has not already replaced the current description. The names of the mesh coordinates do not need to be fixed. They can be named in an attribute, e.g.
Several notes:
At the risk of derailing the thread I'll mention that we can arrive at an even more succinct description if we discard the idea of data coordinates which I think is unnecessary when providing the finite volume mesh information:
I believe the latter example to be a more appropriate description of the data. However, I realize that the latter is not backward compatible since there is an expectation that there are coordinates and will cause trouble if they are missing. So I will be pleased enough if we can at least adopt an attribute that identifies mesh coordinates as in the earlier example. |
I was hoping for some sort of response or discussion from the maintainers. Have we raised this issue in the wrong place? Would it be better off in the main cf-conventions repo issue tracker (rather than the discussion repo)? |
I don't know - I'm somewhat lost on where and what things are. It does seem that https://github.com/cf-convention/cf-conventions/issues is both active and has proposals being discussed. |
Hi, I maintain the standard name table and I am under the impression that https://github.com/cf-convention/cf-conventions/issues is for general cf convention discussions and https://github.com/cf-convention/discuss/issues is for standard name requests and discussions. I shall ask one of the other maintainers to confirm too. Apologies for the confusion. |
The readme for this repo says the following:
https://github.com/cf-convention/discuss/blob/master/README.md |
I must be wrong. I will leave this to one of the other maintainers to respond to. Hopefully someone responds to your discussion soon. |
This is the correct repo for discussion about possible changes to conventions, if you're not at the stage of proposing a change. It takes over from the UCAR CF email list, which is now mothballed. When you have a specific proposal to make, it should be done as an issue in the conventions repo, which has taken over from the old CF trac tickets. |
Thanks @JonathanGregory for the clarification! 😄 I opened this issue specifically for discussion. We are looking for feedback from the CF community about this idea. If no one has any feedback, I suppose we will just move forward with making a proposal over in the cf-conventions repo. |
Let's see. Not everyone will have seen this yet, since they may not be watching this repo, as it has just been introduced. I have to confess I am one of those! I hadn't seen this question before just now and therefore haven't yet thought about it. |
Sure, no rush! We'll leave this discussion open as long as needed. |
@rabernat @adcroft I thought I would drop in some background. The existing CF coordinate conventions were designed to provide maximum flexibility for a wide variety of use cases. Coordinate tuples can describe points or cells. Cells can have arbitrary shapes, can be non-contiguous, and can overlap. If the coordinate variables are not 1D, there's no requirement that the coordinates describe a mesh grid of any sort. There is no prescribed relationship between a given coordinate tuple for a cell — (lat,lon) for example, and its relative location within the cell it is referencing. (There is a loose assumption that a coordinate tuple without any bounds describes the center of a cell, but that is not required or guaranteed.) This flexibility comes with a cost in the form of the complexity (and bulkiness) of the representation. I think CF has done a pretty reasonable job of this overall. |
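Because the existing bounds representation allows non-contiguous or overlapping cells, software that wants to treat the cells as a mesh has to verify contiguity itself. A minimal sketch of such a check follows, in plain Python. The function name `is_contiguous` and the vertex ordering (lower-left, lower-right, upper-right, upper-left in index space) are assumptions made for illustration, not something taken from the thread or the CF text.

```python
def is_contiguous(bounds):
    """Return True if neighbouring 2D cells share their common vertices.

    bounds[j][i] holds the four vertex values of cell (j, i), assumed to be
    ordered (j, i), (j, i+1), (j+1, i+1), (j+1, i) in mesh-index terms.
    """
    nj, ni = len(bounds), len(bounds[0])
    for j in range(nj):
        for i in range(ni):
            b = bounds[j][i]
            # The neighbour at i+1 must reuse this cell's i+1 edge.
            if i + 1 < ni:
                r = bounds[j][i + 1]
                if (b[1], b[2]) != (r[0], r[3]):
                    return False
            # The neighbour at j+1 must reuse this cell's j+1 edge.
            if j + 1 < nj:
                u = bounds[j + 1][i]
                if (b[3], b[2]) != (u[0], u[1]):
                    return False
    return True


# Two longitude cells side by side: the second pair leaves a gap at 1.0-1.5.
touching = [[[0, 1, 1, 0], [1, 2, 2, 1]]]
gapped = [[[0, 1, 1, 0], [1.5, 2, 2, 1.5]]]
print(is_contiguous(touching), is_contiguous(gapped))
```

A check like this has to visit every cell, which is the cost a dedicated mesh representation would avoid by construction.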
@rabernat @adcroft There have been a number of times when people have requested a simplified representation of coordinates when there is a straightforward mesh grid where cells are contiguous and non-overlapping. This amounts to representing the grid cells by specifying the vertices of the cells. This can work whether the coordinate variables are 2D (as in your example), 1D, or N-D. Such a scheme is quite compact, but it does represent a significant departure from previous assumptions. Software (and humans) that don't notice that a different scheme is in use will have serious problems. The mesh coordinate variables you are suggesting would not be recognized as proper auxiliary coordinates by existing CF-compliant software because the dimensions don't match the dimensions of the data variable. This isn't a problem. I'm just stating the facts. I expect that most software would either check the dimensions and throw an error because of the mismatch or attempt to proceed naïvely and have serious problems. I think there needs to be an unambiguous marker designating a coordinate variable as a mesh coordinate variable. That way software is not forced to rely on dimensions being +1 in size as the only marker. The two ways I see to do that is to declare the relationship with a special I think there is a valid use case for such a representation. It represents enough of a departure from existing CF that quite a bit of careful thinking will need to go into it. |
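To make the relationship between the two representations concrete, here is a minimal sketch that expands a vertex mesh of shape (n+1, m+1) into a bounds array of shape (n, m, 4). The function name `mesh_to_bounds` is hypothetical, and the vertex order used is an assumption of this sketch; a real implementation would need to follow the ordering rules of the CF cell boundaries section.

```python
def mesh_to_bounds(mesh):
    """Expand an (n+1) x (m+1) vertex mesh into n x m cells of 4 vertices.

    Vertex order per cell is assumed to be
    (j, i), (j, i+1), (j+1, i+1), (j+1, i).
    """
    nj, ni = len(mesh) - 1, len(mesh[0]) - 1
    return [
        [
            [mesh[j][i], mesh[j][i + 1], mesh[j + 1][i + 1], mesh[j + 1][i]]
            for i in range(ni)
        ]
        for j in range(nj)
    ]


# A 1 x 2 grid of cells is described by a 2 x 3 mesh of longitude vertices.
lonmesh = [[0, 1, 2],
           [0, 1, 2]]
print(mesh_to_bounds(lonmesh))
```

Every interior vertex is copied into up to four cells, which is where the bulk of the bounds representation comes from.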
Some general comments: I agree with @JimBiardCics that the current flexibility comes at the expense of bulkiness, but rather than with the cost of complexity, I think that it comes with the benefit of simplicity - one methodology for any type of cells and trivial lookup of individual cell bounds. We should always be wary of introducing a different way of doing something that can already be done, because it makes the conventions harder to understand and it makes software harder to write. However, we must also consider whether or not the potential benefits outweigh these concerns, and CF does have a history of introducing complexity to deal with ever increasing file sizes (e.g. external variables, DSGs). That said, and given the need, I might prefer an example more like this:
Here, software that doesn't understand meshes will still read the file without crashing. It will be able to parse the data variable and all of its metadata with the only exception of the mesh bounds, which it will just ignore. The key to this is linking the bounds to the coordinates with the Also, the CF checker could easily only check meshes at the appropriate versions. I agree that omitting the coordinate value and just storing the mesh vertices is a distraction at this stage (see #5 (comment)). That would require a change to the CF data model, which is otherwise not needed. I was a bit concerned that the mesh is fragile, in the sense that it might cease to be a mesh under certain processing or analytical operations (e.g. subspacing). But I'm thinking now that this is not a concern of the CF conventions. |
Thanks everyone for the useful feedback! 😃 A couple of key points emerged:
These points support @davidhassell's suggestion that
@JimBiardCics - thanks for this background. I agree that CF |
A technical issue occurs to me - that of mapping the N-d mesh vertex variable dimensions to those of the parent coordinate variable (or data variable). This cannot be done by dimension name, and cannot rely on dimension sizes being different - they might not be. You could state that the dimension order of a mesh variable must match that of the parent, but this is not checkable and so is far from ideal. Here are a couple of examples to demonstrate the difficulty:
In both cases I don't know what the mesh variables dimension orders are, relative to There's probably an elegant way round this, but I can't think right now what it is ... |
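The dimension-association problem raised in this comment can be illustrated with a short sketch that tries to pair mesh dimensions with coordinate dimensions using sizes alone. The function name `candidate_orderings` is hypothetical; the "+1 in size" rule is the one discussed in this thread, not an adopted convention.

```python
from itertools import permutations


def candidate_orderings(coord_shape, mesh_shape):
    """List every way to pair mesh dimensions with coordinate dimensions.

    A permutation p is a candidate when mesh dimension k is exactly one
    element longer than coordinate dimension p[k].
    """
    if len(coord_shape) != len(mesh_shape):
        return []
    return [
        p
        for p in permutations(range(len(coord_shape)))
        if all(mesh_shape[k] == coord_shape[p[k]] + 1
               for k in range(len(mesh_shape)))
    ]


# Different sizes: the pairing is unique, even if the mesh is transposed.
print(candidate_orderings((3, 5), (6, 4)))  # one candidate
# Equal sizes: both orderings fit, so size alone cannot decide.
print(candidate_orderings((4, 4), (5, 5)))  # two candidates
```

When more than one candidate comes back, only an explicit rule (for example, a required dimension order) or an explicit mapping can resolve the ambiguity.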
@davidhassell We could force that the order of dimensions of the mesh variables equals the order of the dimensions of the coordinate variables, and trust the users to comply with it. Additionally, we could create a mapping from
|
I hadn't realized that changing the order of indices was allowed. You're saying it's not in the standard. Is this an example where the dimension variable resolves the ambiguity? e.g. adding the |
Putting the mesh attribute on the coordinate variables means we have to delve through a level of indirection to find the mesh for a variable. For a finite volume model the mesh coordinates are more primary to the variable than the coordinate of a notional cell center, so I prefer for it to be directly associated and at the same level as the coordinate attribute. Attaching the mesh attribute to the data variable is backward compatible. Was there a reason to not add such an attribute to the data variable? This could still live beside a bounds approach (even though it would be redundant) without conflict. I think the following resolves
|
I really like it @adcroft! Doing it that way also allows for the common case where a variable might be a mesh coordinate for one variable (cell centered) and a regular coordinate variable for another (cell vertices). For example
with the rest the same as your example. |
That's a much stronger argument than my complaint about the indirection. |
Thanks for considering my example, but I'm afraid that I don't see how the latest example (#5 (comment)) works for all cases. A few concerns are: What connects In general, it is best to avoid redundancy, as it increases the chances of inconsistency. It seems to me that the discussion might be veering away from how to save space when storing the bounds of contiguous cells, to something more general about how to encode particular types of grid (which reminds me of UGRID). That's fine, but it's hard to talk about both at the same time, I think. I hope you don't perceive me as being too negative - I'm just trying to search for a solution that is as robust as other parts of the CF conventions! |
I'd like to raise another possible issue: I'm pretty sure (but not positive) that CF allows one to have simple "index" dimensions where the index values start at 0 and are consecutive. For example, one would usually store a variable defined on an icosahedral "cube-sphere" grid as a function of three indices. I don't think CF requires the vectors of index values (for each of the 3 dimensions) be stored explicitly in the file. The longitude and latitude coordinate values for each grid cell would be pointed to by the coordinates attribute attached to the variable, and these coordinates would also be functions of the 3 indices. If this is so, then how would this be impacted by the ideas being discussed here? My reading of some of the examples above is that we would now require the index values be explicitly included in the file. |
I'm also very much in favor of avoiding redundancy. At the very least, redundancy permits inconsistency.
If netcdf allowed us to declare data with shape The order of dimensions is currently not restricted in CF (second paragraph on dimensions). @neumannd suggests we require that the order of dimensions be the same for related variables (i.e. coordinates, data, and mesh coordinates). The text on two-dimensional coordinate variables has the following line:
which shows the same ordering of dimensions but doesn't state that they should be in the same order. I would be very surprised if different ordering is ever actually used in files, or even considered in much software. I think restricting the dimension order to be the same for related data does solve the association problem raised by @davidhassell. So I like @neumannd's proposal which happens to also render a more succinct description:
|
@taylor13 I don't think that the index value functionality is (yet) in CF. Please correct me if needs be! Is it part of the Gridspec standard (which I'm not too familiar with, but thought it provides a means for storing cubed-sphere grids)? @adcroft You're right that the order of dimensions has to be the same between an auxiliary coordinate variable and its bounds variable (I missed this earlier - sorry), but this order does not have to correspond to the order of dimensions of the data variable. The order of variable names given by the Given that a bounds variable dimensions must be ordered like its parent coordinate variable. This could be extended to "mesh" bounds variables, and allow, as I suggested earlier,
where it would now be given that |
@davidhassell :
Here the term "coordinate variables" refers to a variable defined according to the NUG definition
The last paragraph implies that not all dimensions have a corresponding coordinate variable, and I'll refer to these as "pure index dimensions". The CF-convention uses the
Note that there is no Similarly for a cubed-sphere grid logically structured as a 3 dimensional array, the 3 dimensions of a variable would all be "pure index dimensions", and those indexes wouldn't have to appear as (NUG) coordinate variables in the file. The reason I brought this up is that in #5 (comment) above, i, j, dim1, and dim2 were all explicitly defined as coordinate variables when I don't think this should be necessary. It appears that the suggested approach requiring definition of coordinate variables i, j, dim1, and dim2 has been dropped, so I don't think we have a problem unless what I've said here is somehow incorrect (in which case some clarification is needed). |
@taylor13 We were dropping using coordinate variables because they are not normally required, as you just confirmed.
If I renamed I see that adding the I am struggling to think of another solution to the dimension association problem while attaching the
One more heretical thought: there is a line in the cell boundaries section that reads "A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable." (it is italicized in the document for emphasis). If we allowed for a bounds variable with no extra dimension then this is exactly what the BTW, that NUG definition of a coordinate variable caught me off guard - it seems I've been using the wrong terminology. What is defined there as a "coordinate variable" is what I was referring to as a "dimension variable" because we have meaningful variables that are coordinates which in normal conversation one would call a "coordinate variable". I've been using these terms incorrectly for so long that I don't know where I got them from, but @JimBiardCics corrects someone's terminology in a post from 2011 so I'm at least 8 years out of date! |
@taylor13 I added the definition of the dimension variables for the indices |
Hey @rabernat, @adcroft, @davidhassell et al., I'd like to add another point: 3D meshes on a 2D curvilinear grid? We use something like this within the COSMO-CLM model. To be more explicit: Consider a 3D variable
here the mesh variables Now, suppose we also want to specify the bounds of the vertical
with
I think both should be accepted, although the first would be preferable for the purpose of visualization. |
Another point: How is it supposed to continue with this thread? I'd very much like to see this implemented in the conventions. According to the README of this repo, this issue should be raised at https://github.com/cf-convention/cf-conventions/issues after the discussion. But who defines what after the discussion exactly means? Are there any guidelines or experiences that state at which point we move this issue over to the cf-conventions repo? |
I'd like to add another point to the discussion, that was partly mentioned by @davidhassell in #5 (comment)
Yes, I think with the current implementation, this is a problem. Consider the following dataset
I use xarray, to illustrate my point, but I am sure the problem exists with others as well. If I would do an
which is definitely not what I want (
is now One possibility might be, to add an attribute to an
But I think this is not a good solution as this makes the dimension |
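The index-selection problem described in this comment comes down to the mesh needing one more element than the data along each selected dimension. A minimal sketch in plain Python (not xarray) follows; the helper name `select_cells` and the (start, stop) tuple arguments are assumptions made for illustration.

```python
def select_cells(data, mesh, j_range, i_range):
    """Subset cell-centred data and its vertex mesh consistently.

    j_range and i_range are (start, stop) pairs in cell-index space.
    The mesh slice runs one element further so that every selected
    cell keeps all four of its vertices.
    """
    j0, j1 = j_range
    i0, i1 = i_range
    sub_data = [row[i0:i1] for row in data[j0:j1]]
    sub_mesh = [row[i0:i1 + 1] for row in mesh[j0:j1 + 1]]
    return sub_data, sub_mesh


data = [[10, 11, 12],
        [20, 21, 22]]          # 2 x 3 cells
lonmesh = [[0, 1, 2, 3],
           [0, 1, 2, 3],
           [0, 1, 2, 3]]       # 3 x 4 vertices

sub_data, sub_mesh = select_cells(data, lonmesh, (0, 1), (1, 3))
print(sub_data)  # 1 x 2 cells
print(sub_mesh)  # 2 x 3 vertices
```

Generic tools that slice every variable along a shared dimension name with the same indices cannot apply this "+1" rule unless the convention tells them which variables are mesh variables.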
Sorry Folks, I wasn't following this at the time, and I haven't carefully read the whole thing, but: My first thought was: WTF? we've been using curvilinear quad meshes in CF forever! But I suppose the issue is that we were making a lot of assumptions about cells, rather than having them well specified. So thought 2: Isn't this EXACTLY what SGRID is about? http://sgrid.github.io/sgrid/ And while we are at it, unstructured grids should be accommodated as well: http://ugrid-conventions.github.io/ugrid-conventions/ (I did see that briefly mentioned in this issue) Both of those have been designed as extensions to CF, and are in use, at least a little bit. Anyway, let's not reinvent the wheel here. NOTE: those aren't so good at dealing with meshes that wrap around the earth -- but they could be extended for that. |
Good points Chris. The issue is sgrid doesn't have nearly the same level of visibility / adoption as CF itself.
Is there a formal mechanism for "extensions to CF"? If so, is there a directory of such extensions? And what defines the scope of extensions vs. the main spec? |
BTW: there have been proposals to make these auxiliary standards "official" |
"Is there a formal mechanism for "extensions to CF" Well, I was using it in an informal way -- which means that they were designed to be able to be used without conflicting with CF. And I think it is "official" that CF can be used in combination with other specs:
And yes, CF does have much more recognition -- so it would be a great idea for CF to "bless" some of these other conventions. I'm just saying that we really don't want to start all over again with the design. |
Hello @rabernat, It'd be very useful if you briefly summarize this issue https://docs.google.com/document/d/1urPWngzDCuHTrfpA8nedGoRDVKXs5OmjqO8M6i3UZJM/edit#, including what might be good outcomes from a discussion at the CF meeting. If this could be done today or tomorrow that would be best, as we will use it to help people decide on which sessions to attend in advance of the meeting. Many thanks, |
Hi @davidhassell - I went to add my summary, but the link you provided doesn't give me edit access. |
Ok done. |
I'm not sure where else to post this, but: As far as discussing the possible inclusion of the SGRID standard in CF at the 2020 CF Workshop, maybe we should discuss UGRID as well. It's not really a new topic -- there are lots of details, but a key idea is the same: a "mesh" or "grid" variable which has attributes describing how the grid is defined. grid:cf_role = grid_topology http://ugrid-conventions.github.io/ugrid-conventions/ So if CF IS going to include either of these, it makes sense to do it in a unified way. |
Dear @ChrisBarker-NOAA |
@JonathanGregory : Thanks, I had completely lost track of the status of that. But there are parallels here, and I think it's best that CF deal with UGRID and SGRID (and maybe other grids in the future?) in similar ways. So we should reference the UGRID status when talking about what to do with SGRID. In particular, it seems that the idea in this thread is to make a "grid variable" or "mesh variable" as a standard place to put the grid mapping info part of CF itself. Then we could say that the contents of that variable are left up to auxiliary standards, such as UGRID and SGRID -CHB |
Hello, The timings and order of the breakout groups for the CF meeting next week has now been set (see http://cfconventions.org/Meetings/2020-Workshop.html), and the discussion of this issue will be on Wednesday 10 June from 17:30-19:00 UTC, in parallel with three other topics. Thanks. |
I was looking forward to the CF meeting discussion today. But recent events in the US have caused my plans to change. In response to the murders of George Floyd, Breonna Taylor, and countless other Black people, today there will be a general strike across STEM and academia (https://www.shutdownstem.com/). The idea is to pause business as usual and take action against racism. I support this issue strongly, and so I will not participate in the CF discussion today. To make up for my absence, I have put my presentation online (https://speakerdeck.com/rabernat/cf-mesh-coordinates-proposal) and recorded a video of myself presenting it (https://vimeo.com/427709117). I know this is not the same as being there “in person,” but I hope it is enough. I will be happy to resume the discussion tomorrow via this github issue. |
A recent email exchange made me remember this issue. Has there been any progress? @rabernat, is there still an appetite to drive this forward? Perhaps it could be discussed at the next CF meeting? |
There is discussion of UGRID right now here: Not sure the status of SGRID, but they are closely related. |
Also checking in after a long time away. Lots of discussion but no decision in ugrid-conventions/ugrid-conventions#52. Looking at the CF convention rules, perhaps it would help the process move along if someone made a formal proposal following those rules which would then reach an up-or-down decision in a finite time. Otherwise I think we could remain in limbo for a very long time. I would prefer to bundle SGRID and UGRID into a single proposal, but others may disagree. |
Dear Ryan @rabernat Thanks for stirring this. It looks like both UGRID issue 52 and CF issue 153 have nearly reached agreement on definite proposals, but they need summarising, separately or together, so we can see where we are. Perhaps someone can do that before the CF workshop (21-23 Sep). Is it planned to discuss UGRID in a breakout group? I do not see it mentioned in this list.
I think we are quite likely to be able to reach agreement soon on UGRID on the basis of the discussions that have already been had, whereas we have not discussed SGRID much. I would therefore suggest that we try to conclude with UGRID before starting on SGRID. Best wishes Jonathan |
UGRID and SGRID have a lot in common, so it may have made sense to do them together. However, there's also something to be said for focusing on UGRID, and then adding SGRID with the lessons learned. I'm hoping that will be an easy lift at that point :-) |
I work every day with ocean models that use orthogonal curvilinear coordinates (MITgcm, MOM, POP, ROMS, NEMO, etc. etc.). This is an example tripolar grid from CESM:
The grid cells in such models are contiguous quads, with four points specifying the lat / lon vertex locations of each cell. CF conventions tell me (Section 7.1: Cell Boundaries) that I should use a boundary_variable.
This convention is general enough to accommodate potentially overlapping or non-contiguous quads, essentially n x m totally unrelated four-sided shapes.
My main point: It's inefficient to store structured grid geometry this way.
I don't want to have to check this, I want the conventions to tell me. In our latest global high-resolution ocean models, I have a mesh that is of size n=12960, m=17280, 223 million cells. I am interested in streamlining my analysis and visualization workflow as much as possible, which means minimizing the required memory and computational steps.
Instead of specifying a boundary variable, I propose to introduce the concept of a mesh variable, with the following conventions:
A mesh variable will have the same number of dimensions as its associated coordinate or auxiliary coordinate variable, but with one extra element in each dimension.
In the case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude lat(n,m) and longitude lon(n,m), and the associated cells are four-sided and contiguous, then the mesh variables are given in the form latmesh(n+1, m+1) and lonmesh(n+1, m+1).
It would not be hard to generate such data, since this is how most GCMs keep track of their own coordinate grids internally (e.g. MITgcm). This convention also aligns well with how most visualization software plots such data, e.g. matplotlib's pcolormesh function. So adding something like this to the CF conventions would streamline the path from model output to plotting, eliminating the potentially error-prone step of encoding, and then decoding, the "boundary variable" type coordinates. For the dataset I described above, the difference is about 3 GB of memory.
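The size difference can be estimated with a quick count of stored vertex values. This is a rough sketch, assuming 4-byte (float32) values and counting a single coordinate variable; the post does not state either assumption, and the function name `vertex_counts` is made up for illustration.

```python
def vertex_counts(n, m):
    """Return (bounds_values, mesh_values) for one coordinate on an n x m grid."""
    bounds_values = n * m * 4          # four vertices stored per cell
    mesh_values = (n + 1) * (m + 1)    # each vertex stored once
    return bounds_values, mesh_values


n, m = 12960, 17280
bounds_values, mesh_values = vertex_counts(n, m)

BYTES_PER_VALUE = 4  # assumption: float32 storage
saved_gb = (bounds_values - mesh_values) * BYTES_PER_VALUE / 1e9

print(bounds_values, mesh_values)
print(round(saved_gb, 2), "GB saved per coordinate variable")
```

Under these assumptions the saving is a little under 3 GB per coordinate variable, and roughly double that if both latitude and longitude are counted.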
I don't feel strongly about what it's called. Maybe "mesh variable" is not the right choice. But I feel something like this is sorely needed.
cc @adcroft & @StephenGriffies, with whom this topic has come up repeatedly.