Add grmtools app. #68
grmtools (or, more specifically from this commit's perspective, lrpar) is a Yacc-compatible parser written for Rust. Although it hasn't really been optimised, I thought this benchmark was sufficiently interesting that users might like to know where grmtools/lrpar fits in. I have deliberately written the lexer and parser in not only "normal/proper Lex/Yacc style" but also "full grmtools style", including good support for error recovery, because I think that's the only mode in which grmtools makes sense. This might mean that this lexer/grammar is doing a bit more work than other parsers, but since I don't expect grmtools to be especially fast anyway, I don't suppose another few percent slowdown will hurt!
Generally we've had a "notoriety | uniqueness" policy. I feel like this isn't quite there on the notoriety. For uniqueness, maybe I don't deal with this area enough, but I feel like lalrpop and pest are similar enough.
Notoriety I will leave to you to judge and I won't be offended either way. LALRPOP is an LR parser, but it doesn't accept Yacc syntax and, indeed, converting Yacc grammars to it can be a challenge. Since every language needs a Yacc parser (there are Yacc grammars available for pretty much every language under the sun, and consequently Yacc-compatible implementations for pretty much every language), there's a need for a Rust Yacc system. grmtools is the only viable Yacc-compatible(ish) implementation I know of. Pest is a PEG parser, so it is completely different to LALRPOP and grmtools.
I feel I'm missing something here. The tokens file is universal but doesn't have much value-add. The grammar file, though, is coupled to the Yacc implementation language, and so it seems like "universality of yacc" is fairly limited.
Yes, this is lrlex input (similar, but not identical, to classic Lex). Lexing is boring, which is why I didn't bother mentioning it.
I'm not sure I've understood this part though? Yacc is the most widely used grammar specification language; only ANTLR comes close. For example, you can take a Yacc grammar from 40+ years ago and it'll probably run through a modern Yacc system on any language (modulo the "actions", which are necessarily language specific).
%start Object
%expect-unused Unmatched "UNMATCHED"
%%
Object -> Result<Value, Box<dyn Error>>:
"{" ObjectMembersOpt "}" { Ok(Value::Object(HashMap::from_iter($2?))) }
;
ObjectMembersOpt -> Result<Vec<(String, Value)>, Box<dyn Error>>:
ObjectMembers { $1 }
| { Ok(Vec::new()) }
;
This is a Yacc grammar for parsing JSON and generating Rust. The grammar being parsed and the language being generated are coupled together. Your earlier statement made it sound like this would allow leveraging a lot of existing Yacc files, which would be a value-add. It lets you leverage Yacc knowledge and reduces the porting work, but it isn't 1:1 reuse of a grammar from a C application to a Rust application.
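To make the coupling concrete, here is a minimal sketch of what the two actions quoted above do once they are ordinary Rust. The `Value` enum, the `Members` alias, and the function names are hypothetical stand-ins (the benchmark's real types are not shown in this thread); only the expressions inside the function bodies are taken from the grammar, with `members` playing the role of `$2`.

```rust
use std::collections::HashMap;
use std::error::Error;
use std::iter::FromIterator;

// Hypothetical stand-in for the benchmark's JSON value type.
#[derive(Debug, PartialEq)]
enum Value {
    Number(f64),
    Object(HashMap<String, Value>),
}

// The return type declared for ObjectMembersOpt in the grammar.
type Members = Result<Vec<(String, Value)>, Box<dyn Error>>;

// Mirrors the action of the empty ObjectMembersOpt alternative.
fn empty_members() -> Members {
    Ok(Vec::new())
}

// Mirrors the Object action: `?` propagates an error from the
// members, otherwise the pairs are collected into a map.
fn object_action(members: Members) -> Result<Value, Box<dyn Error>> {
    Ok(Value::Object(HashMap::from_iter(members?)))
}

fn main() {
    let one = object_action(Ok(vec![("a".to_string(), Value::Number(1.0))]));
    println!("{:?}", one);

    let empty = object_action(empty_members());
    println!("{:?}", empty);

    let failed = object_action(Err("bad member".into()));
    println!("is_err = {}", failed.is_err());
}
```

The rule structure (`Object`, `ObjectMembersOpt`, the alternatives) is plain Yacc and would carry over to another Yacc implementation, whereas everything inside the braces is Rust and would need rewriting for a different host language.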
grmtools can, if you want, ignore the action code and produce a parse tree (nimbleparse uses this functionality) -- but that wouldn't be in the spirit of the benchmarks in this repo AFAICS, where a specific In practice, the "difficult" part of a
Thanks for the explanations. I'll go ahead and err on the side of including it. We can always remove it later if there is a reason to.