udunits2 grammar doesn't reflect the implementation #81

Open
pelson opened this issue Jan 23, 2019 · 0 comments

I'm working on the udunits2 grammar for a situation where I'd like to produce a LaTeX representation of an un-interpreted, valid udunits2 string (ref). To be clear, I do mean un-interpreted here - km km-1 and km/km should both produce something like \frac{km}{km}, which I believe rules out using the actual ut_parse parser (happy to hear otherwise!).

I've found a number of cases where the documented grammar does not match what ut_parse actually accepts (i.e. whether it produces a ut_unit). In most cases the behaviour of udunits-2 is correct, and the documented grammar is simply wrong.

Cases of incorrect grammar identified:

  1. Shift spec words must have leading spaces. For example m from2 is valid, but mfrom2 is not, yet m@2 is fine.

     <shift_op>: one of
            "@"
            "after"
            "from"
            "since"
            "ref"
    

    should be

     <shift_op>: one of
             "@"
             " after"
             " from"
             " since"
             " ref"
    

    (same is true for per and PER).
    EDIT: I was wrong about this. I got my identifiers wrong.

  2. The grammar states that "ISO-8859-1 alphabetic characters" may be part of <id> (via <alpha>), but it isn't clear that other characters may also work (e.g. π) (I think I'm right in saying that π isn't in ISO-8859-1, but unicode has never been my strong suit).

  3. CLOCK is documented as <hour> ":" <minute> (":" <second>)? but it looks like it is really <hour> (":" <minute> (":" <second>)?)?. (Does this happen because of the packed_clock format?)

  4. There is no mention of the special cases of UTC, Z and GMT for the case DATE CLOCK ID seen in https://github.com/Unidata/UDUNITS-2/blob/v2.2.27.6/lib/parser.y#L447-L451.

  5. TIMSTAMP -> TIMESTAMP (typo)
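
To make the CLOCK discrepancy in item 3 concrete, here is a minimal sketch that writes both forms as Python regular expressions and compares what they accept. The `<hour>` and `<minute>` patterns are assumptions made for illustration (they are not quoted in this issue); `<second>` and the two CLOCK shapes come from the text above. This only models the grammar on paper; it does not call udunits2.

```python
import re

# Assumed component patterns (illustrative only, not quoted in this issue).
HOUR = r"[+-]?(?:[0-1]?[0-9]|2[0-3])"
MINUTE = r"[0-5]?[0-9]"
# <second> as quoted from the documented grammar: (<minute>|60) (\.[0-9]*)?
SECOND = rf"(?:{MINUTE}|60)(?:\.[0-9]*)?"

# CLOCK as documented: <hour> ":" <minute> (":" <second>)?
DOCUMENTED_CLOCK = re.compile(rf"{HOUR}:{MINUTE}(?::{SECOND})?")
# CLOCK as it appears to behave: <hour> (":" <minute> (":" <second>)?)?
OBSERVED_CLOCK = re.compile(rf"{HOUR}(?::{MINUTE}(?::{SECOND})?)?")


def accepts(pattern, text):
    """True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None


for clock in ("0:0:0", "0:0", "0"):
    print(clock, accepts(DOCUMENTED_CLOCK, clock), accepts(OBSERVED_CLOCK, clock))
```

Under these assumptions the two forms differ only on an hour with no minute field (such as `0`), which the documented form rejects and the observed form accepts.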

Cases that udunits might be doing the wrong thing:

  1. It seems that ut_parse can't handle Unicode exponents greater than 3 for non-numeric values: m³ is fine but m⁴ is not. Interestingly, ut_format produces m⁴ for an input of m+4 (as expected). 2⁴ works just fine though (as does 2⁻⁴²).

  2. The grammar states that:

    <second>:
              (<minute>|60) (\.[0-9]*)?
    

    But I can't see that udunits is actually enforcing this:

    $ udunits2 -H 's since 1990-1-1 0:0:61' -W 's since 1990-1-1 0:0:0'
    1 s since 1990-1-1 0:0:61 = -3593 (s since 1990-1-1 0:0:0)
    x/(s since 1990-1-1 0:0:0) = (x/(s since 1990-1-1 0:0:61)) - 3594
    

    The same appears to be true for all other clamped timestamp components.

    UPDATE: It seems that s since 1990-1-1 0:0:62 is actually identified as s since 1990-1-1 0:0:06 +2(hours), which is definitely valid as part of the grammar (but is that the behaviour that was intended?)

  3. ut_parse reads s since 199022T1 as s @ 19911003T010000.00000000 UTC (that's s @ 1991-10-03). Given the definition of <month> ("0"?[1-9]|1[0-2]) I was expecting this to be 1990-02-02, though to be honest I would have preferred it to fail.
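
One reading that is consistent with the UPDATE in item 2 can be sketched as follows: if `<second>` is matched as a prefix, `61` and `62` are split into a valid second (`6`) plus a leftover digit that a later rule may pick up as an hour offset. The `<minute>` pattern is an assumption for illustration, and the tokenisation shown is a guess at the mechanism, not a description of the actual udunits2 scanner.

```python
import re

MINUTE = r"[0-5]?[0-9]"  # assumed, not quoted in this issue
# <second>: (<minute>|60) (\.[0-9]*)?  with "60" listed first, because Python
# alternation is ordered rather than longest-match as in a lex scanner.
SECOND = re.compile(rf"(?:60|{MINUTE})(?:\.[0-9]*)?")


def split_second(text):
    """Return (matched prefix, leftover) for a prefix match of <second>."""
    m = SECOND.match(text)
    return (m.group(0), text[m.end():]) if m else ("", text)


for field in ("59", "60", "61", "62"):
    whole = SECOND.fullmatch(field) is not None
    print(field, whole, split_second(field))

# Arithmetic check against the quoted udunits2 output for 0:0:61, assuming it
# is read as 0:0:06 with a +1 hour offset: 6 s minus 3600 s.
offset_seconds = 6 - 1 * 3600
print(offset_seconds)
```

The computed offset of -3594 s agrees with the `- 3594` term in the udunits2 output quoted in item 2, which supports (but does not prove) this reading.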

I'm raising this issue so that I can keep track of what I've found here, and so that I can start the ball rolling on having a machine- and human-readable grammar that can be tested systematically (either here or upstream in a project like cf-units). My intention is to re-create the grammar based on the ANTLR specification - the choice is somewhat arbitrary, but ANTLR does provide a number of useful tools, including multi-language support (pretty useful for testing!) and debugging/visualisation of the grammar (the latter I've not yet gotten working on my machine though 😞). Naturally I'm aware of the Lex-Yacc content of the udunits-2 codebase, but have found very few tools other than bison for working with the format.

I hope you don't find this issue to be pernickety - that is definitely not my intention!
My main question is: would you support me updating the documented grammar to be a human-readable AND machine-testable ANTLR grammar (subject to readability, of course)?
