Expected list of tokens provided by `%errordefaulthandler explist` only lists shift tokens of most recent state #265

sgraf812 · 2024-01-24T14:44:15Z

Here's a reproducer exhibiting two issues:

{
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
-- For ancient GHC 7.0.4
{-# LANGUAGE MultiParamTypeClasses #-}
module Main where

import Control.Monad (when)
import Data.Char
import System.Exit
}

%name parseStmts
%tokentype { Token }
%errorhandlertype explist
%error { handleError }

%monad { ParseM } { (>>=) } { return }

%token
  '1' { TOne }
  '+' { TPlus }
  ';' { TSemi }

%%

Stmts : {- empty -}    { [] }
      | Stmt           { [$1] }
      | Stmts ';' Stmt { $1 ++ [$3] }

Stmt : Exp { ExpStmt $1 }

Exp : '1'                { One }
    | Exp '+' Exp %shift { Plus $1 $3 }

{
data Token = TOne | TPlus | TSemi
  deriving (Eq,Show)

type Stmts = [Stmt]
data Stmt = ExpStmt Exp
  deriving (Eq, Show)
data Exp = One | Plus Exp Exp
  deriving (Eq, Show)

type ParseM = Either ParseError

data ParseError
        = ParseError [String]
    deriving Eq
instance Show ParseError where
  show (ParseError exp) = "Parse error. Expected: " ++ show exp

recordParseError :: [String] -> ParseM a
recordParseError expected = Left (ParseError expected)

handleError :: [Token] -> [String] -> ParseM a
handleError ts expected = recordParseError expected

lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
    | isSpace c = lexer cs
    | c == '1'  = TOne:(lexer cs)
    | c == '+'  = TPlus:(lexer cs)
    | c == ';'  = TSemi:(lexer cs)
    | otherwise = error "lexer error"

main :: IO ()
main = do
  test "11" $ \res -> res == Left (ParseError ["';'","'+'"])
  where
    test inp p = do
      putStrLn $ "testing " ++ inp
      let tokens = lexer inp
      let res = parseStmts tokens
      when (not (p res)) $ do
        print res
        exitWith (ExitFailure 1)
}

Note that it tests an input 11 which leads to a syntax error after the first 1 and checks whether both ; and + were listed as expected tokens.
But if you compile and run it,

$ happy issue265.y && ghc issue265.hs && issue265
testing 11
Left Parse error. Expected: ["';'"]

You can see that it only suggests ;. That is due to two separate issues:

In order to generate smaller code, happy replaces erroring default actions with the most common reduction (see getDefault). After we shift the first 1 and detect the error, we end up default-reducing all the way back up to Stmts -> Stmts . ';' Stmt, which neglects the item Exp -> Exp . '+' Exp.
Even if getDefault was "fixed", we'd be stuck in the reduction state Exp -> 1 . , where there is no expected shift token whatsoever. That points out another flaw: The implementation of %errorhandlertype explist is insufficient, because it only reports the tokens to be shifted for the topmost state on the stack. This strategy isn't so bad, but we'd better walk the whole state stack as if we successfully reduced and collect all shiftable tokens we encounter on the way (so it's rather not simply "walking the stack" I'm afraid). It ought to be possible to simulate this to get quite context-sensitive expected token lists. The next best solution would be to consider the set of tokens of the topmost state with a shift or a reduce action.

Of course, (2) is infeasible for non-array-based parsers, or at least would require quite a bit of extra code.

The text was updated successfully, but these errors were encountered:

sgraf812 · 2024-03-06T13:27:02Z

I implemented a solution in #272, specifically

8e94258, that is,

we'd better walk the whole state stack as if we successfully reduced and collect all shiftable tokens we encounter on the way (so it's rather not simply "walking the stack" I'm afraid). It ought to be possible to simulate this to get quite context-sensitive expected token lists.

sgraf812 · 2024-10-23T11:15:03Z

Fixed in Happy 2.1.

sgraf812 · 2024-10-23T11:33:07Z

Well, the new algorithm is implemented, but it does not really fix the OP because Happy tries very hard to compress tables. Specifically after shifting 1 and reducing once, we end up in the following state

State 4

	Stmt -> Exp .                                       (rule 4)
	Exp -> Exp . '+' Exp                                (rule 6)

	'+'            shift, and enter state 6
	';'            reduce using rule 4
	%eof           reduce using rule 4

Note that the current input token is the second 1. There is no action listed for this input, so the automaton should error.

However, Happy chooses to pick reduce using rule 4 as the default action here in order to get smaller action tables. (The mechanism is documented in LALR.ProduceCode, "Action Tables.", and does not influence error sites.) This causes Happy to reduce on input 1 instead, and then once more, at which point it is forced to either accept %eof or shift ; and hence emits the error here. But at this point, we are no longer able to shift +, so the new algorithm does not list it as an expected token.

This is a bit sad, but not doing the default action compression trick increases the size of the parsing tables significantly (x3 on issue 93).

andreasabel added re: building happy re: error handler Concerning (custom) error handlers labels Jan 24, 2024

sgraf812 added this to the 2.1 milestone Sep 19, 2024

sgraf812 removed this from the 2.1 milestone Oct 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expected list of tokens provided by `%errordefaulthandler explist` only lists shift tokens of most recent state #265

Expected list of tokens provided by `%errordefaulthandler explist` only lists shift tokens of most recent state #265

sgraf812 commented Jan 24, 2024 •

edited

Loading

sgraf812 commented Mar 6, 2024

sgraf812 commented Oct 23, 2024

sgraf812 commented Oct 23, 2024

Expected list of tokens provided by %errordefaulthandler explist only lists shift tokens of most recent state #265

Expected list of tokens provided by %errordefaulthandler explist only lists shift tokens of most recent state #265

Comments

sgraf812 commented Jan 24, 2024 • edited Loading

sgraf812 commented Mar 6, 2024

sgraf812 commented Oct 23, 2024

sgraf812 commented Oct 23, 2024

Expected list of tokens provided by `%errordefaulthandler explist` only lists shift tokens of most recent state #265

Expected list of tokens provided by `%errordefaulthandler explist` only lists shift tokens of most recent state #265

sgraf812 commented Jan 24, 2024 •

edited

Loading