Miden IR lexer, parser and AST #43
The CI check is failing due to |
@jjcnn The build error is because of the |
It's a remnant from what I copied from AirScript. I'll remove it. |
The CI now fails because of some failing tests in the wasm frontend. I don't believe these have anything to do with my changes. |
The changes to the formatter for the IR broke those tests because they are asserting that Wasm lowered to the IR produces particular output. Specifically, it seems braces were added to blocks, and that seems to be tripping up those tests. |
Yes, added braces seem to be the reason. You can regenerate the expected results in all tests with |
Nice work @jjcnn! I haven't fully completed my review yet, but there are a handful of changes I know we'll need to make already. I will put together a PR against your branch later today/tonight that implements most/all the described changes, so you don't need to tackle them unless you'd like to do so yourself. Here's the list of things I'd like to change:
- The hir-parser crate should be merged into the hir crate, i.e. move hir-parser/src to hir/src/parser, move hir-parser/build.rs to hir/build.rs, and add the lalrpop/lalrpop-util crates to the dependencies in hir/Cargo.toml
- After that, remove symbols.rs, as that is provided by hir-symbol which is already re-exported in the hir crate
- The ast/types.rs file can also go away, since that is provided by hir-types already
- A number of AST nodes can go away since their actual HIR definitions are available after the changes above, e.g. CallConvention, Parameter(Extension|Purpose), Identifier, FunctionIdentifier, etc.
- The Program AST type appears to be a leftover from AirScript, and can be removed, files will only be allowed to contain a single Module. Additionally, GlobalVariable declarations are at the module level
- The FunctionIdentifier grammar rule should be handled by the lexer instead, i.e. the lexer should parse two different types of identifiers based on whether a :: string is found while lexing an identifier (i.e. if it is observed, lex it as a FunctionIdent, if not, then as a regular Ident). The lexer can handle computing the span which covers the module name(space) vs the function name, and intern those strings to produce the Ident/FunctionIdent we want. This will simplify the grammar a bit, while also giving it more contextual information to reduce the chance of ambiguities.
- There are a couple of rules which can be simplified (namely those that handle parenthesized parameter lists, see CommaOpt below). I'll do a separate review covering those in more detail when I have a bit more time.
I also noticed that there are a number of clippy warnings unhandled, you'll want to make sure to run cargo clippy -p <crate> and pick those off.
// Rule to parse comma-delimited values with zero or more elements
CommaOpt<T>: Vec<T> = {
    <vals:(<T> ",")*> <last: T?> => {
        let mut vals = vals;
        vals.extend(last);
        vals
    },
};
It's not, actually. I took the program definition from |
Doesn't this rule allow for a sequence that ends with a comma? |
We don't currently indicate an entry point at all in the output format, and I don't think we ever need to do so. If we do decide to do so at some point, it would probably be by adding support for function attributes, and then decorating the entry function with an
Yes, but that's fine IMO. In most cases where comma-separated items are parsed, allowing trailing commas simplifies things and doesn't really have any downside. |
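To make the trailing-comma behaviour concrete, here is a hand-written Rust sketch of what the CommaOpt rule accepts. It is not generated by LALRPOP and is not part of the PR; the Tok type and the comma_opt function are hypothetical stand-ins that mirror the shape (<T> ",")* <T>? over a pre-lexed token slice.

```rust
/// Hypothetical pre-lexed tokens: an item (here just a number) or a comma.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Tok {
    Item(u32),
    Comma,
}

/// Mirrors `(<T> ",")* <T>?`: returns the items if the whole slice matches,
/// or `None` if it does not.
fn comma_opt(tokens: &[Tok]) -> Option<Vec<u32>> {
    let mut items = Vec::new();
    let mut iter = tokens.iter();
    while let Some(tok) = iter.next() {
        match tok {
            Tok::Item(v) => {
                items.push(*v);
                match iter.next() {
                    // `T ","`: keep going; the next token may be another
                    // item or the end of input (a trailing comma).
                    Some(Tok::Comma) => continue,
                    // A final `T` with no comma after it.
                    None => break,
                    // Two items in a row are not separated by a comma.
                    Some(Tok::Item(_)) => return None,
                }
            }
            // A comma with no item before it (leading or doubled comma).
            Tok::Comma => return None,
        }
    }
    Some(items)
}
```

Under this reading, the empty sequence, "1", "1,", "1,2" and "1,2," are all accepted, while ",", "1,,2" and "1 2" are rejected.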
…Purpose, Identifier and FunctionIdentifier, remove Program, allow globals in module declarations
5acb983 to 23fb33c
I have now addressed all the review comments, added a test of the parser, and rebased the PR against main. |
The lexer, parser and AST are ready for review.
The structure of the files is stolen from the AirScript parser. It is possible that I have kept some things from AirScript that we don't need here, and similarly that I have removed some things we do need, so keep an eye out for that during the review. This is especially true for hir-parser/src/parser/mod.rs and hir-parser/src/lexer/mod.rs.
The grammar is almost precisely the one described here: #14 (comment), but with the following adjustments:
- … i8, u32, etc.) have been disregarded. The parser parses all numbers as u128s, and the transformation to other integer size is left to later stages.
- … false and true have not been added. Booleans are represented using 0 and 1.
- … { and }) rather than double braces ({{ and }}), because I misunderstood how Rust formatting strings deal with braces.
- … ret operation must be contained in parentheses (( and )). This is to avoid an annoying shift/reduce conflict that could otherwise only be handled by adding some sort of end token for instructions (e.g., ';' or '\n') or by adding some really counter-intuitive non-terminals to the grammar. The hir formatter has been updated to reflect this change.
- … // type are allowed.
Apart from the tokenization and grammar structure no validation has been implemented. In particular:
The transformation from AST to hir has not been implemented yet, and is not intended to be part of this PR.
I have not yet added any tests. I intend to add a few tomorrow before this is merged, but I believe the code is ready for review as it is.
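Since the parser keeps every number as a u128 and leaves narrowing to later stages, such a later stage might look like the sketch below. The function names parse_literal, narrow_to_u32 and as_bool are hypothetical and do not come from this PR; the sketch only illustrates checked narrowing and the 0/1 boolean convention using standard-library conversions.

```rust
use std::convert::TryFrom;

/// Parse a decimal literal the way the description says the parser does:
/// every number becomes a `u128`, regardless of its intended type.
fn parse_literal(text: &str) -> Option<u128> {
    text.parse::<u128>().ok()
}

/// A later stage (hypothetical) narrows the value once the expected type
/// is known, rejecting values that do not fit.
fn narrow_to_u32(value: u128) -> Option<u32> {
    u32::try_from(value).ok()
}

/// Booleans are written as 0 and 1; anything else is not a valid boolean.
fn as_bool(value: u128) -> Option<bool> {
    match value {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

Keeping the narrowing step separate means an out-of-range literal can be reported where the expected type is known, rather than in the parser.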