I'm working on the udunits2 grammar for a situation where I'd like to produce a LaTeX representation of an un-interpreted, valid udunits2 string (ref). To be clear, I do mean un-interpreted here: km km-1 and km/km should both produce something like \frac{km, km}, which I believe rules out using the actual ut_parse parser (happy to hear otherwise!).
I've found a number of cases with the documented grammar that should fail to produce a successful ut_unit. In most cases the behaviour of udunits-2 is the correct thing, and the documented grammar is just wrong.
Cases of incorrect grammar identified:
Shift spec words must have leading spaces. For example m from2 is valid, but mfrom2 is not, yet m@2 is fine.
<shift_op>: one of
"@"
"after"
"from"
"since"
"ref"
should be
<shift_op>: one of
"@"
" after"
" from"
" since"
" ref"
(same is true for per and PER). EDIT: I was wrong about this. I got my identifiers wrong.
The grammar states that "ISO-8859-1 alphabetic characters" may be part of <id> (via <alpha>), but it doesn't make clear that other characters also work (e.g. π). (I think I'm right in saying that π isn't in ISO-8859-1, but Unicode has never been my strong suit.)
CLOCK is documented as <hour> ":" <minute> (":" <second>)? but it looks like it is really <hour> (":" <minute> (":" <second>)?)?. (Does this happen because of the packed_clock format?)
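As a quick check of the ISO-8859-1 question above, here is a minimal Python sketch that tests whether a character can be encoded in ISO-8859-1. The helper name `in_latin1` and the choice of sample characters are mine, not anything from udunits-2. The superscript digits are included only because ¹, ² and ³ sit in ISO-8859-1 while ⁴ does not; whether that has anything to do with how ut_parse treats exponents is a guess on my part.

```python
import unicodedata


def in_latin1(ch: str) -> bool:
    """Return True if the single character `ch` can be encoded in ISO-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Sample characters (hypothetical selection, written as escapes to avoid
# confusing look-alikes such as MICRO SIGN vs GREEK SMALL LETTER MU).
samples = [
    "m",        # plain ASCII letter
    "\u00b5",   # MICRO SIGN
    "\u03c0",   # GREEK SMALL LETTER PI
    "\u00b2",   # SUPERSCRIPT TWO
    "\u00b3",   # SUPERSCRIPT THREE
    "\u2074",   # SUPERSCRIPT FOUR
]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: in ISO-8859-1 = {in_latin1(ch)}")
```

Run as-is, this reports π and ⁴ as outside ISO-8859-1, and the micro sign, ² and ³ as inside it. So π is accepted by udunits-2 even though the documented <alpha> rule would not cover it.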
Cases where udunits might be doing the wrong thing:
It seems that ut_parse can't handle Unicode exponents greater than 3 for non-numeric values. m³ is fine but m⁴ is not. Interestingly, ut_format produces m⁴ for an input of m+4 (as expected). 2⁴ works just fine though (as does 2⁻⁴²).
The grammar states that:
<second>:
(<minute>|60) (\.[0-9]*)?
But I can't see that udunits is actually enforcing this:
$ udunits2 -H 's since 1990-1-1 0:0:61' -W 's since 1990-1-1 0:0:0'
1 s since 1990-1-1 0:0:61 = -3593 (s since 1990-1-1 0:0:0)
x/(s since 1990-1-1 0:0:0) = (x/(s since 1990-1-1 0:0:61)) - 3594
The same appears to be true for all other clamped timestamp components.
UPDATE: It seems that s since 1990-1-1 0:0:62 is actually identified as s since 1990-1-1 0:0:06 +2(hours), which is definitely valid as part of the grammar (but is that the behaviour that was intended?)
ut_parse reads s since 199022T1 as s @ 19911003T010000.00000000 UTC (that's s @ 1991-10-03). Given the definition of <month> ("0"?[1-9]|1[0-2]) I was expecting this to be 1990-02-02, though to be honest I would have preferred it to fail.
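The numbers in the udunits2 output above can be reproduced with a little arithmetic, assuming that 0:0:61 is split the same way as described in the UPDATE: seconds 06, followed by a +1 hour offset. This is a sketch of that reading only, not of what the udunits-2 parser actually does internally; the function name `origin_offset_seconds` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def origin_offset_seconds(second: int, tz_hours: int) -> float:
    """Seconds from 1990-01-01 00:00:00 UTC to 1990-01-01 00:00:<second>
    interpreted in a zone that is `tz_hours` ahead of UTC."""
    reference = datetime(1990, 1, 1, tzinfo=timezone.utc)
    origin = datetime(
        1990, 1, 1, 0, 0, second,
        tzinfo=timezone(timedelta(hours=tz_hours)),
    )
    return (origin - reference).total_seconds()


# Assumed reading of "0:0:61": seconds = 6, zone offset = +1 hour.
offset = origin_offset_seconds(second=6, tz_hours=1)
print(offset)      # -3594.0, the constant in the conversion formula
print(1 + offset)  # -3593.0, the converted value of "1 s since ..."
```

Under that assumption the origin lies 6 - 3600 = -3594 seconds from the reference, which matches both the `- 3594` in the conversion formula and the `-3593` reported for a value of 1.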
I'm raising this issue so that I can keep track of what I found here, and so that I can start the ball rolling on having a machine- and human-readable grammar that can be tested systematically (either here or upstream in a project like cf-units). My intention is to re-create a grammar based on the ANTLR specification. The choice is somewhat arbitrary, but ANTLR does offer a number of useful tools, including multi-language support (pretty useful for testing!) and debugging/visualisation of the grammar (the latter I've not yet gotten working on my machine though 😞). Naturally I'm aware of the Lex-Yacc content of the udunits-2 codebase, but have found very few tools other than bison for working with the format.
I hope you don't find this issue to be pernickety - that is definitely not my intention!
My main question is: Do you support me updating the documented grammar to be a readable AND machine/testable ANTLR grammar (subject to readability, of course)?
Two further cases of incorrect grammar, in addition to those listed above:
There is no mention of the special cases of UTC, Z and GMT for the case DATE CLOCK ID seen in https://github.com/Unidata/UDUNITS-2/blob/v2.2.27.6/lib/parser.y#L447-L451.
TIMSTAMP -> TIMESTAMP (typo).