Parser is not stack-safe for deep records #2

travisbrown · 2020-04-12T10:33:55Z

All operations currently work just fine on arbitrarily long lists:

scala> import org.dhallj.core.Expr
import org.dhallj.core.Expr

scala> import org.dhallj.parser.DhallParser
import org.dhallj.parser.DhallParser

scala> import org.dhallj.core.converters.JsonConverter
import org.dhallj.core.converters.JsonConverter

scala> val longList = DhallParser.parse((0 to 100000).mkString("[", ",", "]"))
longList: org.dhallj.core.Expr.Parsed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

scala> longList.normalize
res0: org.dhallj.core.Expr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...

scala> longList.typeCheck
res1: org.dhallj.core.Expr = List Natural

scala> JsonConverter.toCompactString(longList)
res2: String = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,...

…and most things work just fine on arbitrarily deep record literals (or record types, or union types):

scala> val deepRecord = (0 to 10000).foldLeft(longList: Expr) { 
     |   case (expr, _) => Expr.makeRecordLiteral("foo", expr)
     | }
deepRecord: org.dhallj.core.Expr = {foo = {foo = {foo = {foo = {foo = {foo = ...

scala> deepRecord.normalize.alphaNormalize.hash
res3: String = f41d556f987dd60c59e9b49a367b0bf907dba111c904c88dfda27e2a599a07bc

scala> JsonConverter.toCompactString(deepRecord)
res4: String = {"foo":{"foo":{"foo":{"foo":{"foo":{"foo":{"foo":{"foo":{"foo": ...

Note that dhall produces the same hash for this expression:

$ dhall hash < deep-record.dhall
sha256:f41d556f987dd60c59e9b49a367b0bf907dba111c904c88dfda27e2a599a07bc

Unfortunately the parser can't handle this expression:

scala> DhallParser.parse(deepRecord.toString)
java.lang.StackOverflowError
  at org.dhallj.parser.support.JavaCCParser.jj_ntk_f(JavaCCParser.java:4508)
  at org.dhallj.parser.support.JavaCCParser.BASE_EXPRESSION(JavaCCParser.java:2425)
  at org.dhallj.parser.support.JavaCCParser.RECORD_LITERAL_ENTRY(JavaCCParser.java:390)
  at org.dhallj.parser.support.JavaCCParser.RECORD_LITERAL_OR_TYPE(JavaCCParser.java:666)
  at org.dhallj.parser.support.JavaCCParser.PRIMITIVE_EXPRESSION(JavaCCParser.java:874)
  at org.dhallj.parser.support.JavaCCParser.SELECTOR_EXPRESSION(JavaCCParser.java:1011)
  at org.dhallj.parser.support.JavaCCParser.COMPLETION_EXPRESSION(JavaCCParser.java:1080)
  ...

I think this should just be a matter of doing some more left-factoring, but I'm fairly new to JavaCC and I don't really know how much work this will be, so I've decided not to let this issue block the 0.1.0 release.

The text was updated successfully, but these errors were encountered:

Gabriella439 · 2020-04-14T02:57:27Z

@travisbrown: The Haskell implementation had the exact same issue. See this thread starting here: dhall-lang/dhall-haskell#108 (comment).

That issue has an even smaller reproduction which is just ((((((((Bool))))))))

Basically, your intuition is right that the grammar needs to be left-factored. Specifically, the problematic grammar rule is the one for function types:

https://github.com/dhall-lang/dhall-lang/blob/371a43a9aa4bcc0adc91cfd6c9c281a837dace45/standard/dhall.abnf#L758

To see the issue, let's imagine that we're parsing the following simple expression:

Natural

Until the parse hits the end of the file, the parser isn't sure whether or not it is in the middle of parsing a function's input type. For example, the parser doesn't know that the Natural won't be followed by -> Bool or something similar.

Now, imagine that you parse the following expression:

(Natural)

... and now suppose that we temporarily halt the parser after it's done parsing up to here:

(Natural…
        ^

At this point in the expression the parser has to entertain 4 possible branches:

The parser is parsing the input of the input of a function type

e.g. (Natural -> Bool) -> Bool
The parser is parsing a function type where the entire function type is parenthesized

e.g. (Natural -> Bool)
The parser is parsing a function type where the function's input type is parenthesized

e.g. (Natural) -> Bool
The parser is parsing a non-function type that is parenthesized

e.g. (Natural)

More generally, you essentially double the number of possible parses every time you nest things. This leads to a slowdown that is exponential in the nesting level.

The solution is to left-factor the top-level parser so that the parser branches after parsing an expression that might be a function's input type instead of ebfore. I don't know exactly what the Java analog of this, but if you're familiar with Haskell it would mean changing a parser like this:

(parseOperatorExpression *> arrow *> parseExpression) <|> parseOperatorExpression

... to instead be left-factored like this:

parseOperatorExpression *> optional (arrow *> parseExpression)

I've oversimplified things, but that is the basic idea.

Nadrieril · 2020-04-14T14:52:47Z

Note that the recent precedence change of with (dhall-lang/dhall-lang#954) also has that problem

travisbrown mentioned this issue Apr 12, 2020

Type checker is not stack-safe for (extremely) deep records #3

Open

travisbrown added the parser label Apr 12, 2020

travisbrown added this to the 1.0.0 milestone Apr 12, 2020

travisbrown added the bug Something isn't working label Apr 12, 2020

travisbrown mentioned this issue Apr 16, 2020

Parser clean-up #31

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser is not stack-safe for deep records #2

Parser is not stack-safe for deep records #2

travisbrown commented Apr 12, 2020 •

edited

Loading

Gabriella439 commented Apr 14, 2020 •

edited

Loading

Nadrieril commented Apr 14, 2020

Parser is not stack-safe for deep records #2

Parser is not stack-safe for deep records #2

Comments

travisbrown commented Apr 12, 2020 • edited Loading

Gabriella439 commented Apr 14, 2020 • edited Loading

Nadrieril commented Apr 14, 2020

travisbrown commented Apr 12, 2020 •

edited

Loading

Gabriella439 commented Apr 14, 2020 •

edited

Loading