Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is the disjointness requirement for call and exit symbols really necessary? #44

Open
modulovalue opened this issue Nov 17, 2023 · 4 comments

Comments

@modulovalue
Copy link

According to #43 (comment), call and return symbols need to be disjoint.

Therefore, #38 is not possible because string literals viewed as a VPL would have an identical call and return symbol, and would not be a VPL.

However, I'm wondering, could it be that this requirement by Rajeev Alur & P. Madhusudan for VPLs is too restrictive?

I don't have an intuitive understanding of the inner workings of owl, and this might be obvious, but @ianh can you think of a counterexample where having identical call and return symbols would break any of the invariants that owl depends on?

What would happen (where would owl break) if owl ignored this requirement, and a grammar like [ '[' ... '[' ] would be accepted and compiled by owl?

e.g.?:

Open Ast

@ianh
Copy link
Owner

ianh commented Nov 17, 2023

The basic thing you need is, for each input token, to know whether it will cause the automaton to perform a call transition, a return transition, or an internal transition. It's technically possible, given owl's implementation, to have the same token used to perform different types of transition, as long as you can distinguish the different types of transition deterministically without any lookahead.

In this example, it wouldn't be possible to do that -- [[ could perform a call and a return or two calls.

And in any case, supporting stuff like this makes it much more difficult to explain what a valid grammar looks like. It would be similar to "shift-reduce conflicts" in LALR parser generators, where the error has nothing to do with the grammar itself but with some conflict within the automaton construction process.

@modulovalue
Copy link
Author

In this example, it wouldn't be possible to do that -- [[ could perform a call and a return or two calls.

So, if I understood you correctly, what about the following grammar, which is rejected by owl:

a = [ '"' (b (',' b)*)? '"' ]
b = [ '{' (a (',' a)*)? '}' ]

An '"' after the initial '"' in a should unambiguously mean that it is a return symbol, so owl might be able to support it, or am I missing something?


Or more generally, consider [ X Y Z ], if the firstset of Y doesn't contain X, wouldn't it then be possible to support X == Z because a Z after an X would always be a return symbol?

@ianh
Copy link
Owner

ianh commented Nov 17, 2023

It would technically be possible, but I'm not interested in doing it in owl -- I don't want to write error messages that involve terms like "firstset". One of my main motivations for writing owl was frustration with LALR-type shift-reduce errors that make me have to understand how the parser generator actually works internally to diagnose.

@modulovalue
Copy link
Author

I completely agree that LALR conflicts are hard to debug.

However, the thing about firstsets is that they are actually relatively easy to understand and explain. One only needs to be able to walk subtrees from left to right to be able to reason about firstsets.

Furthermore, most implementations use a worklist-based approach for calculating them, but if one collects constraints and treats the problem of finding firstsets as a dataflow problem, it becomes possible to precisely output why the error exists in the first place.

Error messages like the following become possible by walking the dataflow graph in reverse:

X and Z can't be identical because Y can start with X:

1. Y can start with X because of:

[ X Y Z ]
    |
    v
    A B C
    |
    v
    G? H I
    |
    v
    X

2. Y can start with X because of:
   
[ X Y Z ]
    |
    v
    A B C
    |
    v
    G? H I
       |
       v
       X

3. Y can start with X because of:
   ...
     
The specification does not model a visibly pushdown language.

I think it would be really cool to see owl become more expressive like that and be able to support identical call and return symbols in cases where that is theoretically possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants