Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected backtracking behavior or bad error message #5

Open
dvogel opened this issue Oct 5, 2012 · 1 comment
Open

Unexpected backtracking behavior or bad error message #5

dvogel opened this issue Oct 5, 2012 · 1 comment

Comments

@dvogel
Copy link

dvogel commented Oct 5, 2012

from parcon import SignificantLiteral, Regex

Digits = Regex(ur"[0-9]+")
DecimalLiteral = ((Digits + SignificantLiteral(".") + Digits)
                  | (SignificantLiteral(".") + Digits))
SinglePeriod = SignificantLiteral(".")
BadExpr = (SignificantLiteral("junk that won't match") | SinglePeriod)
WrappingExpr = (BadExpr | DecimalLiteral | SignificantLiteral(".."))

"""
Here's my understanding of what should happen:

Given the parser definitions above, when parsing "..", WrappingExpr
should attempt to parse using BadExpr. The first parser tried by
BadExpr will obviously fail, as designed. The second parser,
SinglePeriod will succeed. BadExpr will then succeed if all=False,
but with all=True it will fail. WrappingExpr will then attempt to
parse using DecimalLiteral. The first form of DecimalLiteral requires
a digit prefix, which fails to match. The second form of DecimalLiteral
matches the period prefix but then fails because it is not followed by
digits. WrappingExpr should then attempt the SignificantLiteral("..")
parser and that parser should succeed but instead I get:

    ParseException: Parse failure: At position 0: expected "junk that won't match"
"""
print "all=False"
print WrappingExpr.parse_string("..", all=False)
print "all=True"
print WrappingExpr.parse_string("..", all=True)
@BrenBarn
Copy link

The problem appears to be due to the way results and expectations are combined with filter_expectations. Specifically, it excludes "unsatisfiable" matches without considering whether they were successes or failures, and excludes them even if they occurred at a later match point than earlier "satisfiable" expectations.

The behavior can be reproduced more simply:

>>> g =  (SignificantLiteral("junko") | SignificantLiteral("."))
>>> g.parse_string('.x', all=False)
'.'
>>> g.parse_string('.x', all=True)
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    g.parse_string('.x', all=True)
  File "C:\Users\BrenBarn\Documents\Python\extensions\parcon\parcon\__init__.py", line 641, in parse_string
    raise ParseException("Parse failure: " + format_failure(result.expected), result.expected)
ParseException: Parse failure: At position 0: expected "junko"

It says it expects "junko" at the beginning, but that's wrong: it can accept either "junko" or "." at that position. I think the right behavior in this situation is to say "expected end of input at position 1" (i.e., after the period). The period did match, so our "best match" has already consumed the period, and our best guess about what should come next should start from after the period.

I can think of two ways to handle this. One is to modify filter_expectations so that it doesn't exclude EUnsatisfiable expectations at the maximum position. I'd have to look at the expectation handling more, but at first glance I don't understand why it's excluded; end-of-input is a perfectly reasonable thing to expect. The other solution is to make end-of-input its own kind of expectation, separate from EUnsatisfiable. It does seem a bit strange that EUnsatisfiable is used both for "we expect no more input" and for "logically impossible" cases like Invalid(). Invalid() cannot match even at the end of input, so it's different from expecting end of input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants