negative character classes produce an ABNF output that kgt does not parse #317

classabbyamp · 2021-02-20T02:47:23Z

Not sure if this should be here or on katef/kgt.

When converting a regex that includes a negative character class to ABNF, re generates the output fine. Passing that output to kgt, however, fails with "1:11: Syntax error: expected production rule separator". Column 11 of the output line is the "-" subtraction operator.

Minimal Example:

$ re -bpl abnf '[^0-9]' | tee /dev/stderr | kgt -l abnf -e svg | isvg
e = OCTET - %x30-39

1:11: Syntax error: expected production rule separator
...  # errors continue from other parts of the pipeline


katef · 2021-03-04T02:39:30Z

Ah thank you!

I think this is valid ABNF output, and re(1) is doing the right thing here (although of course it could be written more compactly without the subtraction).

However, kgt doesn't implement subtraction. It really ought to give a more helpful error message than "syntax error". The reason it's not implemented is that subtraction for CFGs in general isn't well defined: the difference of two context-free languages need not itself be context-free. It's bewildering that it's part of the ABNF spec.

Any suggestions for how to proceed?

classabbyamp · 2021-03-07T01:34:57Z

Not a clue. You know a lot more about all of this than I do, I just stumbled on this as I was messing around.

hvdijk · 2021-03-07T02:26:07Z

Sorry for potentially making this worse, but isn't subtraction only defined in ISO EBNF, rather than ABNF? If so, that would mean there's both an issue in libfsm for using subtraction in ABNF output, and an issue in kgt for not handling subtraction in ISO EBNF input.

For the libfsm side: the libfsm parsers only generate trivial subtractions where both the LHS and RHS can only ever match a single character, right? If so, would it make sense to (re)define the subtraction operator to have that as a hard requirement? The printers could then rely on it: where subtraction is not supported, rewrite [^0-9] as [\x00-/:-\xFF]; where it is supported, write out [^0-9] as-is.
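A minimal sketch of the rewrite described above, assuming an 8-bit alphabet where OCTET is %x00-FF: compute the complement of the excluded byte ranges and print it as an ABNF alternation of %x ranges, so no subtraction operator appears. complement_ranges and to_abnf are hypothetical helpers for illustration, not part of libfsm or kgt.

```python
# Minimal sketch, assuming an 8-bit alphabet (ABNF OCTET = %x00-FF).
# complement_ranges and to_abnf are hypothetical helpers, not libfsm or kgt APIs.

def complement_ranges(excluded, lo=0x00, hi=0xFF):
    """Return the inclusive (start, end) byte ranges within [lo, hi]
    that are not covered by any inclusive range in `excluded`."""
    result = []
    next_start = lo
    for start, end in sorted(excluded):
        # Clamp each excluded range to the alphabet; skip ones outside it.
        start, end = max(start, lo), min(end, hi)
        if start > end:
            continue
        if start > next_start:
            result.append((next_start, start - 1))
        next_start = max(next_start, end + 1)
    if next_start <= hi:
        result.append((next_start, hi))
    return result


def to_abnf(ranges):
    """Render inclusive byte ranges as an ABNF alternation of %x terminals."""
    parts = []
    for start, end in ranges:
        if start == end:
            parts.append(f"%x{start:02X}")
        else:
            parts.append(f"%x{start:02X}-{end:02X}")
    return " / ".join(parts)


# [^0-9] excludes %x30-39 from OCTET; print the positive-only equivalent.
print("e = " + to_abnf(complement_ranges([(0x30, 0x39)])))
# prints: e = %x00-2F / %x3A-FF
```

This is the ABNF counterpart of [\x00-/:-\xFF], since "/" is %x2F and ":" is %x3A. It uses only ranges and "/" alternation; whether kgt's ABNF reader accepts every such form is not checked here.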

katef · 2024-08-24T14:27:23Z

This is tricky, because the AST node allows for subtracting sub-trees in general. The AST also allows Unicode ranges here.

@hvdijk you're right that we do only construct 8-bit values and we could emit this as a positive-only set. But I don't like special-casing this given the more general AST.
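One way to picture the tension between the special case and the general AST is a printer that lowers a subtraction only when both operands provably reduce to single-byte character classes, and otherwise fails with an explicit error rather than emitting syntax a reader may reject. This is a sketch under stated assumptions: the node types (ByteSet, Subtract, Concat) and print_abnf are hypothetical and do not mirror libfsm's actual AST, and it keeps the 8-bit assumption.

```python
# Hypothetical toy AST for illustration; libfsm's real AST differs.
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ByteSet:
    """A character class matching exactly one byte from `values`."""
    values: frozenset


@dataclass(frozen=True)
class Subtract:
    """General subtraction node: lhs and rhs may be arbitrary sub-trees."""
    lhs: "Node"
    rhs: "Node"


@dataclass(frozen=True)
class Concat:
    items: tuple


Node = Union[ByteSet, Subtract, Concat]


def as_byteset(node):
    """Byte values matched if `node` provably matches a single byte, else None."""
    if isinstance(node, ByteSet):
        return node.values
    if isinstance(node, Subtract):
        lhs, rhs = as_byteset(node.lhs), as_byteset(node.rhs)
        if lhs is not None and rhs is not None:
            return lhs - rhs
    return None


def runs(values):
    """Group byte values into sorted inclusive (start, end) runs."""
    out = []
    for v in sorted(values):
        if out and out[-1][1] == v - 1:
            out[-1] = (out[-1][0], v)
        else:
            out.append((v, v))
    return out


def hex_term(start, end):
    return f"%x{start:02X}" if start == end else f"%x{start:02X}-{end:02X}"


def print_abnf(node, nested=False):
    """Render `node` as ABNF without using the subtraction operator."""
    values = as_byteset(node)
    if values is not None:
        if not values:
            raise ValueError("empty character class has no ABNF terminal")
        alts = [hex_term(s, e) for s, e in runs(values)]
        text = " / ".join(alts)
        # Concatenation binds tighter than "/", so parenthesize inside a sequence.
        return f"({text})" if nested and len(alts) > 1 else text
    if isinstance(node, Concat):
        return " ".join(print_abnf(item, nested=True) for item in node.items)
    # Subtraction of general sub-trees: fail with a clear message instead of
    # emitting "-" that a reader without subtraction support would reject.
    raise NotImplementedError("cannot lower subtraction of non-character operands")


OCTET = ByteSet(frozenset(range(0x00, 0x100)))
DIGIT = ByteSet(frozenset(range(0x30, 0x3A)))
print("e = " + print_abnf(Subtract(OCTET, DIGIT)))
```

Anything the check cannot prove to be a single-byte set, such as Subtract(Concat((DIGIT, DIGIT)), DIGIT), raises NotImplementedError instead of silently changing meaning.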

Also I couldn't find a way to remove that AST node (it's there because of SQL99's dialect, which has explicit syntax for subtraction). But if there's a way to not have it in the first place, I'd much prefer that.

Other ideas?


katef added the bug label Aug 24, 2024

katef added a commit that referenced this issue Aug 25, 2024

Stray assertion.

5010a40

This doesn't help for #317, but whatever the solution is there, asserting about it is the wrong thing to do. Spotted by @classabbyamp, thank you
