Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Error tolerant parsing #102

Open
alecthomas opened this issue Aug 27, 2020 · 3 comments
Open

Proposal: Error tolerant parsing #102

alecthomas opened this issue Aug 27, 2020 · 3 comments
Labels

Comments

@alecthomas
Copy link
Owner

Participle currently supports returning a partial AST when parsing fails, but only up to the point of failure. This issue proposes that Participle reject failed tokens but continue to parse, if possible.

(transcribed from Slack)

For participle to produce a full AST even when errored, my initial thought is that it would need to combinatorially match tokens to all branches of the parse tree, back to the root

eg. (hypothesising here) given this kind of input

a = "foo" bar
c = 10

and this grammar

type Assignments struct {
  Assignment []*Assignment `@@*`
}
type Assignment struct {
  Ident string `@Ident "="`
  Value *Value `@@`
}
type Value struct {
  String *string  `  @String`
  Number *float64 `| @Number`
}

Once it hits the tokens [bar, c, =, 10], bar would correctly parse into Assignment.Ident but Value would fail, so the parser would perhaps reset back to the start of Assignment, but reject the token bar as an error.

There'd have to be some mechanism for capturing the failed branches. Probably out of band somehow, or possibly the partially valid sub-tree could be inserted with a marker indicating it's an error. Similar to how Pos lexer.Position is special cased there could be something like ParseFailure lexer.Position that is populated if that branch has errors.

It would have to be opt-in though, as that kind of backtracking could be quite expensive.

I don't like that ParseFailure. What might be better is RejectedTokens []lexer.Token. It might not be ideal to require that be part of the AST 🤔.

So the AST in this case might end up something like:

Assignments{
  RejectedTokens []lexer.Token{...},
  Assignment: []*Assignment{
    Assignment{ /* a = "foo" */ },
    Assignment{ /* bar */, RejectedTokens: ...},
    Assignment{ /* c = 10 */ },
  },
}

The RejectedTokens could be populated all the way back to the root, potentially.

@alecthomas
Copy link
Owner Author

As per further conversation in Slack, it is probably a better idea not to preserve rejected nodes as multiple rejections could pollute the AST. Unclear until implementation.

@alecthomas
Copy link
Owner Author

A more complex example:

AST

type Decls struct {
  Funcs []*Func `@@*`
}

type Func struct {
  Name string `"func" @Ident "(" ")" "{"`
  Body []*Stmt `@@* "}"`
}

type Stmt struct {
  Asgn *Asgn `  @@`
  If *If  #   `| @@`
}

type Asgn struct {
  Name string `@Ident "="`
  Value int `@Int`
}

type If struct {
  Condition string `@Ident "{"`
  Body []*Stmt `@@* "}"`
}

Source

func foo() {
  a = 10
  if true {
}

func bar() {
}

Parsed

Decls{
  Func{Name: "foo", Body: {
    Stmt{Asgn{Name: "a", Value: 10}},
    Stmt{If{Condition: "true", Body: {}}},
  }}
  Func{Name: "bar", Body: {}}
}

In this case there are no additional tokens. On the contrary, there are missing tokens. Need a way to convey this.

Maybe an Error participle.Error field?

@sowelipililimute
Copy link

The proposed error mechanism here sounds like what's called "synchronisation", where a rule is marked as a synchronisation point. Whenever a token that doesn't fit is encountered, the parser unwinds until it hits a synchronisation point, at which point it starts discarding tokens until it finds one that fits in the current context, and resumes parsing from there.

IMO ideally the synchronisation points would be user specifiable with the ability to implement an interface for even more control over how the parser synchronises back to a sane state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants