Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser is not stack-safe for deep records #2

Open
travisbrown opened this issue Apr 12, 2020 · 2 comments
Open

Parser is not stack-safe for deep records #2

travisbrown opened this issue Apr 12, 2020 · 2 comments
Labels
bug Something isn't working parser
Milestone

Comments

@travisbrown
Copy link
Owner

travisbrown commented Apr 12, 2020

All operations currently work just fine on arbitrarily long lists:

scala> import org.dhallj.core.Expr
import org.dhallj.core.Expr

scala> import org.dhallj.parser.DhallParser
import org.dhallj.parser.DhallParser

scala> import org.dhallj.core.converters.JsonConverter
import org.dhallj.core.converters.JsonConverter

scala> val longList = DhallParser.parse((0 to 100000).mkString("[", ",", "]"))
longList: org.dhallj.core.Expr.Parsed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

scala> longList.normalize
res0: org.dhallj.core.Expr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...

scala> longList.typeCheck
res1: org.dhallj.core.Expr = List Natural

scala> JsonConverter.toCompactString(longList)
res2: String = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,...

…and most things work just fine on arbitrarily deep record literals (or record types, or union types):

scala> val deepRecord = (0 to 10000).foldLeft(longList: Expr) { 
     |   case (expr, _) => Expr.makeRecordLiteral("foo", expr)
     | }
deepRecord: org.dhallj.core.Expr = {foo = {foo = {foo = {foo = {foo = {foo = ...

scala> deepRecord.normalize.alphaNormalize.hash
res3: String = f41d556f987dd60c59e9b49a367b0bf907dba111c904c88dfda27e2a599a07bc

scala> JsonConverter.toCompactString(deepRecord)
res4: String = {"foo":{"foo":{"foo":{"foo":{"foo":{"foo":{"foo":{"foo":{"foo": ...

Note that dhall produces the same hash for this expression:

$ dhall hash < deep-record.dhall
sha256:f41d556f987dd60c59e9b49a367b0bf907dba111c904c88dfda27e2a599a07bc

Unfortunately the parser can't handle this expression:

scala> DhallParser.parse(deepRecord.toString)
java.lang.StackOverflowError
  at org.dhallj.parser.support.JavaCCParser.jj_ntk_f(JavaCCParser.java:4508)
  at org.dhallj.parser.support.JavaCCParser.BASE_EXPRESSION(JavaCCParser.java:2425)
  at org.dhallj.parser.support.JavaCCParser.RECORD_LITERAL_ENTRY(JavaCCParser.java:390)
  at org.dhallj.parser.support.JavaCCParser.RECORD_LITERAL_OR_TYPE(JavaCCParser.java:666)
  at org.dhallj.parser.support.JavaCCParser.PRIMITIVE_EXPRESSION(JavaCCParser.java:874)
  at org.dhallj.parser.support.JavaCCParser.SELECTOR_EXPRESSION(JavaCCParser.java:1011)
  at org.dhallj.parser.support.JavaCCParser.COMPLETION_EXPRESSION(JavaCCParser.java:1080)
  ...

I think this should just be a matter of doing some more left-factoring, but I'm fairly new to JavaCC and I don't really know how much work this will be, so I've decided not to let this issue block the 0.1.0 release.

@Gabriella439
Copy link

Gabriella439 commented Apr 14, 2020

@travisbrown: The Haskell implementation had the exact same issue. See this thread starting here: dhall-lang/dhall-haskell#108 (comment).

That issue has an even smaller reproduction which is just ((((((((Bool))))))))

Basically, your intuition is right that the grammar needs to be left-factored. Specifically, the problematic grammar rule is the one for function types:

https://github.com/dhall-lang/dhall-lang/blob/371a43a9aa4bcc0adc91cfd6c9c281a837dace45/standard/dhall.abnf#L758

To see the issue, let's imagine that we're parsing the following simple expression:

Natural

Until the parse hits the end of the file, the parser isn't sure whether or not it is in the middle of parsing a function's input type. For example, the parser doesn't know that the Natural won't be followed by -> Bool or something similar.

Now, imagine that you parse the following expression:

(Natural)

... and now suppose that we temporarily halt the parser after it's done parsing up to here:

(Natural
        ^

At this point in the expression the parser has to entertain 4 possible branches:

  • The parser is parsing the input of the input of a function type

    e.g. (Natural -> Bool) -> Bool

  • The parser is parsing a function type where the entire function type is parenthesized

    e.g. (Natural -> Bool)

  • The parser is parsing a function type where the function's input type is parenthesized

    e.g. (Natural) -> Bool

  • The parser is parsing a non-function type that is parenthesized

    e.g. (Natural)

More generally, you essentially double the number of possible parses every time you nest things. This leads to a slowdown that is exponential in the nesting level.

The solution is to left-factor the top-level parser so that the parser branches after parsing an expression that might be a function's input type instead of ebfore. I don't know exactly what the Java analog of this, but if you're familiar with Haskell it would mean changing a parser like this:

(parseOperatorExpression *> arrow *> parseExpression) <|> parseOperatorExpression

... to instead be left-factored like this:

parseOperatorExpression *> optional (arrow *> parseExpression)

I've oversimplified things, but that is the basic idea.

@Nadrieril
Copy link

Note that the recent precedence change of with (dhall-lang/dhall-lang#954) also has that problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working parser
Projects
None yet
Development

No branches or pull requests

3 participants