Updating definition of coordinate variable to account for NUG changes #174
Comments
Thanks for raising this issue, Martin. I agree with option (1), which means this is a defect, as you have labelled it. Proposals to remedy defects are accepted by default if no-one objects within three weeks. |
I support option 1) as well. Thanks for raising this, Martin. |
I concur. Option 1) is the right choice. Do we need to add any clarifying verbiage regarding "label" coordinates? In the old character array approach, a label coordinate variable was, by definition, an auxiliary coordinate since it was (almost) never 1D. A 1D string variable can meet the dimensional requirements for a coordinate variable. You can construct a variable with matching name and dimension name, for example,
A) State that this form is not allowed. Such variables would always need to have non-matching name and dimension name. This implies a cf-checker test that would fail if a 1D string variable had a matching dimension name.
B) State that this form is allowed, but that it will only be considered as an auxiliary coordinate for a variable if it is included in a
C) State that string variables that would otherwise look like they could be coordinate variables are always auxiliary coordinate variables. This implies that a string variable such as
In every case, there are implications for the data model, and software packages such as cdms that attempt to build coordinate domains for data variables will need to deal appropriately with 1D string variables that appear to match the requirements for a coordinate variable. (@taylor13, you might want to chime in.) |
Hi Jim, I hadn't spotted that problem. My preference is for (A). As I understand it, the CF data model has a single namespace, so there can only be one Option (C) looks awkward to me. The size of an auxiliary coordinate is usually determined by its own coordinates. If there are no true coordinates, it is not really "auxiliary" to anything. Perhaps @davidhassell can comment on the data model issues. |
See also #139 .. which is a proposed enhancement to support string variables. |
I don't think that this is a data model issue, whichever option we choose. The data model doesn't care how its constructs are encoded - all it needs to be able to do is unambiguously identify its constructs from a file. For example, if we were to say in the conventions "when you see a string variable like |
I agree with @davidhassell that whatever option we choose, it's not difficult to incorporate into the data model. @martinjuckes, I have to confess that I don't understand your argument against options B) or C). I prefer options B) and C) myself. I don't see a good reason to make the entirely natural choice of creating a 1D string coordinate variable named |
OK, it is good to see that there are no obstacles from the data model side of things. My mistake there. Adopting (B) or (C) would require a change to the definition of an Auxiliary Coordinate, which currently includes the statement that "Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s)". It would be good to know what alternative is being proposed. @JonathanGregory : you took an interest in this topic during the email discussions. Do you have a preference for any of the options Jim has outlined above (on the 16th)? |
I like option one, i.e., sticking with the restriction that coordinate variables are 1-D numeric, monotonic. To me this is more of a correction than a change because the main NUG section on coordinate variables is not actually particularly precise in its definition of coordinate variables. It says
The NUG Best Practices section on coordinate variables is a bit more precise. These should be made more consistent. There's discussion at Unidata on getting the NUG in its own repo so changes like this can be a bit more transparent. |
Hello All, There appears to be consensus on point 1: treating the wording of the definition of a coordinate variable as a defect and modifying it to state clearly that CF requires coordinate variables to be of numeric data type. I don't think we have identified a clear preference regarding Jim's suggestions about auxiliary coordinate variables. In going through the changes needed in the Conventions document I noticed that we have the sentences That is,
is recommended against. This is phrased as a restriction on i.e. we have an option (D): If a string or character variable has a single dimension matching its own name, it will be treated as a data variable with an index dimension. It is recommended that such variables should not be used as auxiliary coordinate variables.
I also noticed that the first sentence of Section 1.2 is also out of date, in that it states that the terms defined come from the NUG. I think this has been wrong for some time, as most of the terms appear to be specific to CF.
I've drafted some proposed updates to the document, in the 4 places that I believe need updating:
1. Update 1st sentence of Section 1.2
Current:
Proposed:
2. Update terms in Section 1.2 to indicate those based on NUG
I think these 4 are the only terms that have a specific meaning in NUG.
3. Update Definition of Coordinate Variable in Section 1.2
Current:
Proposed:
4. Recommendation on Auxiliary Coordinates (Chapter 5)
Fourth paragraph of chapter 5:
Current:
Proposed (
|
@martinjuckes As I read CF, there is no such thing as a "multi-dimensional coordinate variable" that is anything but an auxiliary coordinate variable. There is no provision in CF for connecting an auxiliary coordinate variable with a data variable apart from including the name in a Relevant parts of Section 1.2 declare
These definitions make it quite clear that a non-numeric variable cannot ever be a coordinate variable. I've got no problem with clearing up the wording in Section 1.2, but everything that follows is dependent on these definitions of terms. Section 5 paragraph 4 states
I included the definition of "recommendation" earlier because this paragraph is a recommendation, not a requirement. Notice that the definition of a recommendation states "An application must not depend on a dataset’s adherence to recommendations." I can see a valid argument against human confusion for recommending against allowing a multidimensional auxiliary coordinate variable to have a dimension that has the same name as the variable name, but I think the assertion that such a construction precludes providing a coordinate variable for the dimension is incorrect, and deciding that a variable is a coordinate variable on the basis of a match between one dimension and the variable name is a particularly bad practice. An auxiliary coordinate variable can be fully compliant with CF and not follow this recommendation. I think we should consider this recommendation to be defective. I'm going to break here on account of the length of this comment and continue in another. |
I realized that I left a bit out above. The definitions in Section 1.2 also make it clear that a multidimensional numeric variable cannot ever be a coordinate variable. Both multidimensional numeric variables and string variables can be auxiliary coordinate variables. |
@martinjuckes Now to the question of 1D string auxiliary coordinate variables. A 1D string variable with matching dimension and variable names is, per the Section 1.2 definitions (see my previous comment), a fully-compliant auxiliary coordinate variable. I believe that the construction
is valid according to the current version of CF. It satisfies all the requirements. It is also compliant with the current (non-binding) recommendation from Section 5 paragraph 4, which doesn't mention 1D string auxiliary coordinate variables. I personally think it is fine for a 1D type |
@martinjuckes After all that is said and done, I like your change to the Section 1.2 definition of coordinate variable. I disagree with your change to the Section 5 paragraph 4 recommendation that I believe to be defective. Here's an alternative suggestion. We could change Section 5 paragraph 4 from
to
|
Hi Jim, (1) yes, it is clear that a multidimensional coordinate variable is always an auxiliary coordinate, and that the converse is not true; (2) yes, it is clear that Section 5, para 4 is a recommendation, not a requirement; (3) A construction of the form (4) Removing the sentence from Section 5 para 4 that states |
@martinjuckes Regarding your point (3): What convention prevents the construction Regarding your point (4): The sentence in the recommendation in Section 5 paragraph 4 that I am saying is defective is based on an appeal to a bad programming practice. The definition of recommendation contains the statement, "An application must not depend on a dataset’s adherence to recommendations." Applied to the sentence I am suggesting we remove, this definitional statement reads, |
Hello Jim, Under the existing convention Perhaps it would help to have some other views on these points. @JonathanGregory , @ethanrd : do you have any views on Jim's suggestion that a variable of the form |
Dear Martin and Jim |
Dear all, I also think that if a string aux. coord. var. name and its dimension's name are identical, this could unnecessarily mislead some into thinking it is a coordinate variable (because of the NUG convention), so CF should not allow it. best regards, |
@martinjuckes @JonathanGregory I hear where you are coming from. I may have been somewhat unclear before. What I am trying to point out is that the Conventions don't currently proscribe such a form. There is no prohibition in the text against having a 1D variable of non-numeric type that has matching variable and dimension name. Such a variable cannot be a coordinate variable, by definition, because it is non-numeric. It meets all the requirements for a valid auxiliary coordinate variable. There is also no prohibition in the text against having a multidimensional variable with a dimension name that matches the variable name. Such a variable meets all the requirements for a valid auxiliary coordinate variable. The only basis I have found for any assertion regarding such variables is the defective recommendation in Section 5 paragraph 4, which can't actually be regarded as proscriptive because it is a recommendation. If we wish to proscribe a variable of the form |
Is there a backward compatibility issue here? If we allow |
@davidhassell It's possible. Software that didn't check on the type might end up doing something unexpected. An appeal to potential software problems is problematic, as software that properly implements CF as written should check the type of a variable like There is not a scientific use case. You could say the same for a number of aspects of CF. The assumption appears to have been that auxiliary coordinate variables wouldn't ever look like coordinate variables. That's probably why we have the recommendation in Section 5 paragraph 4. We just didn't write the conventions to expressly prohibit such a case. |
If we don't want to allow 1D non-numeric auxiliary coordinate variables to have the form
to
|
Hello All, thanks for those comments. I realise now that there was an error in my proposed new definition of the "coordinate variable" (item 3 here): it implied an unintended change in the interpretation of a variable As Jim has pointed out, there is a choice about how we deal with string variables here. We could say
3(a) Revised proposal for Coordinate Variable:
3(b) Alternative revised proposal for Coordinate Variable:
I also prefer 3(a), as it reduces the room for confusion which might arise if If we accept 3(a), I'm not sure of the necessity to change the auxiliary coordinate variable definition. In the current convention the form On the other hand, if we are going to be precise there is a problem with the phrase @JimBiardCics : what do you think of the following alternative for auxiliary coordinate variable:
|
@martinjuckes If we go with the majority of responders on this thread regarding the acceptability of Regarding your statement:
NUG does consider My next comment will include suggested new text. I disagree with your comments regarding auxiliary coordinate variables. First and foremost, a recommendation does not, per its own definition, define anything that a person writing software to read a netCDF file should depend on. Regarding your statement:
I'm guessing there's a typo in this sentence, because there's not an 'auxiliary coordinate' as opposed to an 'auxiliary coordinate variable'. Looking at the rest of the paragraph that begins "If we accept 3(a) ...", I believe that if we are going to disallow I agree that the phrase about relationship between variable and dimension name in the definition of an auxiliary coordinate variable is unclear. We should change it. |
@martinjuckes As I mentioned before, I am confused by your sentence
As I read it again this morning (my time), I think I may see what you are getting at. Are you saying that the Conventions define a variable with the form |
How about this approach? Define a coordinate variable to be a 1D numeric monotonic variable with matching variable and dimension name that does not contain any fill or missing values. Define an auxiliary coordinate variable to be an N-D variable with a name that does not match any dimension name that contains data that is intended to be interpreted as coordinate information. Remove the recommendation from Section 5. This would allow someone to make a variable of the form
In Section 1.2 (and in the order below)
Coordinate Variable
A coordinate variable is a one-dimensional variable with a numeric type that has the same name as the name of its dimension (e.g.,
Auxiliary Coordinate Variable
An auxiliary coordinate variable is a variable containing coordinate information which does not meet all the requirements of a coordinate variable. An auxiliary coordinate variable shall not have a name matching any of the names of its dimensions. An auxiliary coordinate variable may have a non-numeric type (allowing it to represent a category or label axis), may be non-monotonic, and may contain fill and missing values.
In Section 5
Delete paragraph 4. |
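As an illustration only, the two definitions proposed above could be turned into predicates along the following lines. This is a minimal sketch, not CF or cf-checker code: the `Variable` class, its fields, and the use of `None` to mark a fill or missing value are all assumptions made for the example, and monotonicity is read as strict, following the existing CF wording. `front_type` is the label variable name used elsewhere in this thread.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Variable:
    """Hypothetical stand-in for a netCDF variable (not a real library type)."""
    name: str
    dimensions: Tuple[str, ...]
    is_numeric: bool
    values: Sequence  # in this sketch, None marks a fill or missing value


def is_strictly_monotonic(values: Sequence) -> bool:
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def is_coordinate_variable(var: Variable) -> bool:
    """1D, numeric, name equals dimension name, no gaps, strictly monotonic."""
    if len(var.dimensions) != 1 or var.dimensions[0] != var.name:
        return False
    if not var.is_numeric:
        return False
    if any(v is None for v in var.values):
        return False
    return is_strictly_monotonic(var.values)


def may_be_auxiliary_coordinate(var: Variable) -> bool:
    """Under the proposal, the name must not match any of its dimensions."""
    return var.name not in var.dimensions


time = Variable("time", ("time",), True, [0.0, 1.0, 2.0])
label = Variable("front_type", ("front_type",), False, ["cold", "warm"])
lat = Variable("lat", ("y", "x"), True, [10.0, 20.0])
```

Here `time` passes `is_coordinate_variable`, `lat` can only be an auxiliary coordinate, and `label` satisfies neither predicate. Whether a variable actually serves as an auxiliary coordinate still depends on how data variables reference it, which this sketch does not model.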
@JimBiardCics I like this approach. I'd like to pepper in a few "strictly"s, and I'd rather shy away from your use of "domain axis" in the text, simply because a "domain axis construct" is a CF data model construct that does not map to a CF-netCDF coordinate variable. How about (new text in italics): A coordinate variable is a one-dimensional variable with a numeric type that has the same name as the name of its dimension (e.g., and in the auxiliary coordinate paragraph: "non-strictly-monotonic", (if that makes grammatical sense!). |
@JimBiardCics : for completeness I'd like to note that in addition to Jonathan and myself, Karl has expressed opposition to accepting string valued dimension coordinate variables (repeatedly), and David has expressed reservations. Do you acknowledge that the current convention, and version 1.0, contain the statement that a coordinate variable " is defined as a numeric data type", and that this would appear to rule out string coordinate variables? You are right in observing that we are talking past each other .. thank you for acknowledging that. I'm puzzled by your confidence that "anyone given that text" could work out which variables in a file were coordinate variables. My objection, I'd like to remind you, was that both applications and users need to be able to identify coordinate variables. Can I interpret the fact that you have not addressed the question about applications as an admission that your definition cannot be converted into a logical algorithm which can be run in applications? A tutorial is not good enough, we need a coherent set of logical rules. |
Hi all - While I agree that CF intends to only recognize numeric-valued coordinate variables, I think the reference to the NUG definition makes the CF statement somewhat ambiguous. (A careful reading of various sections helps make the intent clear. However, not everyone reads the specification text that carefully. I, for one, did not realize this restriction was intended and I’ve read many parts of the CF spec pretty carefully but never before, it turns out, while questioning my NUG coordinate variable assumptions.) A bit of a side-note, or FYI: |
@JonathanGregory If "string front_type(front_type)" could be considered a valid auxiliary coordinate variable, I'd be fine with that. In fact, I think that is a great approach. The message I have gotten repeatedly from this discussion is that there were strong feelings that this should not be allowed. I agree that, whatever flaws there may be in the current language, CF currently disallows non-numeric dimension coordinate variables. As far as it goes, I don't think we should define a collating order if we allowed string dimension coordinate variables. In fact, I think that would be the wrong way to approach it. |
@martinjuckes wrote:
Yes, absolutely. The language in versions of CF developed before the string datatype was available did not envision string dimension coordinate variables and did exclude char dimension coordinate variables.
Not at all. In fact, I claim the exact opposite. I claim that the text as I wrote it is easily used by people to write software to create coordinate variables in netCDF files and software to read netCDF files and find coordinate variables. |
Speaking as someone that has been trying to make sense of very diverse CF files with nothing but the CF-Convention in my hand, I have to say the fact that dimension coordinates can be identified by name and dimension being the same is a good thing. It is very hard to correctly identify, for example, auxiliary coordinates and cell_measures because this status can not be inferred from the variables themselves, but only from analyzing all relevant variables. This is possible in an ad-hoc fashion, but hard to implement in a parser. It becomes harder when "all relevant variables" might be spread over several files or exist only in an object storage or similar. Generally, the convention does a good job of telling people with data how to put this into netcdf files. It is far more difficult to work with in the other direction. In fact, I would like to see CF move in a direction where it becomes easier to identify the character of all variables, but that is a discussion for another day. |
As far as I can tell this issue has no moderator as yet. I would be happy to take this on, if everyone else is OK with that. I will try to collate a summary of the points raised, sometime (hopefully early) next week. Thanks, David |
Thanks David, I believe that would be very helpful. We have agreed to change it from a |
Yes, thanks for moderating, David. |
I agree with @zklaus, meaning I haven't changed my mind. @JimBiardCics wrote "current CF understanding requires me to construct an auxiliary coordinate variable |
@davidhassell : this issue is still in need of a moderator. Is your offer from March 2020 still open? As far as I'm aware, the issue is still unresolved. I've revised the top comment to note an additional problem with a broken link in section 1.3 of the convention. |
Hi @martinjuckes, Yes, of course! I notice that my offer came just before everything changed last year, so I guess it got lost in the noise. I shall remind myself of the discussion thus far and post a summary. Thanks, |
Dear @davidhassell Are you willing to summarise this issue? Best wishes Jonathan |
Thanks Jonathan, I am indeed (finally!). I shall do it tomorrow. |
Hello,
Here is my summary of this issue ...
The arguments presented are quite subtle at times, and I would recommend re-reading yourself if you want to pick up on this, rather than relying on this very compressed representation. However, it will hopefully act as a good reminder or introduction to the topic.
The issue is about clarifying the definitions of CF-netCDF coordinate variables and auxiliary coordinate variables. The CF conventions in section 1.3 terminology say
coordinate variable
We use this term precisely as it is defined in the NUG section on coordinate variables. It is a one-dimensional variable with the same name as its dimension [e.g., time(time)], and it is defined as a numeric data type with values in strict monotonic order (all values are different, and they are arranged in either consistently increasing or consistently decreasing order). Missing values are not allowed in coordinate variables.
but the word "precisely" in the above is problematic, because the NUG definition also allows variables of the form
The discussion, I think, coalesced into two basic questions:
I would say that the majority of support was for:
There were plenty of other points raised in the discussion (such as whether or not to have better names for coordinate and auxiliary coordinate variables), but these, I feel, could be pursued elsewhere. If you think I've missed out something important (quite possible!), please let me know and I'll update the summary. Many thanks, |
Dear @davidhassell Thanks for this very useful and clear summary. I have reread the discussion quickly and I think the summary is correct, as well as consistent with memory (as far as memory goes). If we agree with the majority opinion in the early discussion, I think we need to modify the definition in 1.3. I suggest:
and we need a corresponding prohibition of Best wishes Jonathan |
I tripped over the "and consequently" phrase when I first read it. I think it's because the restriction against variables of the form Perhaps simply moving that phrase, minus the "consequently", to the end of the suggested text and adding a bit of an explanation, something like: "To avoid some complexity and possible confusion, CF does not permit a one-dimensional string-valued variable to have the same name as its dimension." |
Dear @ethanrd Thanks for your suggestion, which would make the text: A coordinate variable is a one-dimensional variable with the same name as its dimension e.g., Is that OK? Jonathan |
Looks good. Thanks @JonathanGregory |
I have created PR 531 with these changes. I have put the new requirement in Sect 2.5 "Variables" in the conformance document, with a reference to Sect 1.3, because it didn't seem logical to have a requirement corresponding to "Terminology". If there are no further objections or comments requiring changes, this can be merged in three weeks from now (31st July). |
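For anyone wanting to test files against the new requirement, a check along these lines might be enough. It is a rough sketch under stated assumptions: `VarInfo` and `find_string_name_clashes` are hypothetical names, the `"string"` type label is a convention of this example, and none of it is taken from PR 531 or from the cf-checker. Only the string-valued case quoted in this thread is covered; character arrays are left out.

```python
from typing import Iterable, List, NamedTuple, Tuple


class VarInfo(NamedTuple):
    """Hypothetical summary of a netCDF variable's header information."""
    name: str
    dimensions: Tuple[str, ...]
    dtype: str  # type label, e.g. "string" or "double" (assumed naming)


def find_string_name_clashes(variables: Iterable[VarInfo]) -> List[str]:
    """Names of 1D string-valued variables that share their dimension's name."""
    return [
        v.name
        for v in variables
        if v.dtype == "string"
        and len(v.dimensions) == 1
        and v.dimensions[0] == v.name
    ]


header = [
    VarInfo("time", ("time",), "double"),              # ordinary coordinate variable
    VarInfo("front_type", ("front_type",), "string"),  # not permitted
    VarInfo("station_name", ("station",), "string"),   # fine: names differ
]
clashes = find_string_name_clashes(header)
```

A non-empty `clashes` list would indicate variables that break the rule; with the header above it holds only `front_type`.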
In NetCDF4, coordinate variables can be string valued or character arrays. This is a change from NetCDF3, and because of this change, the section of the CF Convention which refers to the NetCDF definition of coordinate variables contains a contradiction.
Section 1.2 on terminology states that a Coordinate Variable is defined "precisely as it is defined in the NUG section on coordinate variables": this now implies string and character values are allowed. However, the following sentence in the definition of a Coordinate Variable states that it should be "numeric data type with values that are ordered monotonically".
We could resolve this contradiction by either (1) retaining the restriction to numeric data types and dropping precise equivalence with NUG or (2) retaining precise equivalence with NUG and allowing string and char coordinate variables. Initial discussion on the CF Discussions email list has two votes in favour of option 1. This would require minor changes to the text. In principle there would be no change to the conformance requirements, but the requirement for numeric data types does not appear to be represented in the conformance document and should be added.
If option (2) is taken, there is some ambiguity about the meaning of the monotonicity requirement which we would need to resolve.
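One way to see the ambiguity is that the same list of labels can be monotonic or not depending on the collating order chosen. The following is a small sketch; the labels and the two orderings are illustrative assumptions, not anything defined by CF or the NUG.

```python
from typing import Callable, Sequence


def strictly_increasing(labels: Sequence[str],
                        key: Callable[[str], str] = lambda s: s) -> bool:
    """True if labels are strictly increasing under the given sort key."""
    keys = [key(s) for s in labels]
    return all(a < b for a, b in zip(keys, keys[1:]))


labels = ["Alpha", "beta", "Gamma"]

# Python's default string comparison is by Unicode code point,
# so upper-case letters sort before lower-case ones.
by_code_point = strictly_increasing(labels)

# A case-insensitive comparison gives a different answer for the same data.
case_insensitive = strictly_increasing(labels, key=str.casefold)
```

With these labels, `by_code_point` is `False` and `case_insensitive` is `True`, so a monotonicity requirement on strings would be incomplete without also fixing a collating order.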
PR #531 implements the decisions made by the following discussion.