feat: implement modules #297

Merged (30 commits, Jun 23, 2023)
Conversation

bitwalker (Contributor) commented May 16, 2023

NOTE: I've broken up this PR into a sequence of commits that are smaller/more focused, and in the order that will give you the best context as you review each of them. These commits do not compile individually, but all together they do, as well as pass all of the parser crate tests - they are simply broken up this way to make it easier to review. Please take note of the commit messages, they provide a lot of detail.


This PR refactors the parser crate to implement modules, while in the process simplifying the grammar and the syntax tree itself. It adds support for rich diagnostics to all AST nodes, and makes heavy use of the new diagnostics infrastructure during program compilation/validation. It implements a generic visitor for implementing passes over the AST, and uses this both for import resolution and semantic analysis, both of which are also implemented in this PR. As an aside, this PR also removes the few vestiges of match enf currently in the compiler, which we can add back once the discussion around its syntax and semantics has been pinned down.

Now, this PR is a draft PR because after doing some initial review of how this affects upstream crates, namely the IR, I did not want to dive into updating the IR crate (which does not compile on this branch) until these changes have been signed off on, and we've discussed how to proceed. Many of the things which the IR crate does in terms of validation and constructing state for the constraint graph are actually made redundant by the changes in this PR, and I'm unsure how we want to approach that going forward.

To describe a bit what I mean: when you invoke the parser after these changes, you are asking the parser to produce a Program. In the process of producing that, the parser does all of the important work of semantic analysis, and in particular, resolving identifiers to their definitions, propagating types, etc. This means that when you get a Program, it is still a relatively high level representation, however all names have been resolved, most dead code has been eliminated, and it is ready for lowering to a form more amenable for optimization and execution (i.e. the IR). However, the IR (currently) is doing a lot of that work as it translates from the AST to the IR, which simply falls apart when modules are involved, and in general is not the ideal place in a compilation pipeline to do some of that stuff.

What I'd like to do to get the IR crate compiling again is first figure out how close the Program structure is to "ideal" form for lowering to the IR, and what we need to translate directly from it to the IR. It may be the case that some of the information gathered (and subsequently discarded) during semantic analysis, such as the dependency graph, would actually be useful during lowering to the IR, and if so, we might want to defer invoking semantic analysis as part of the parser, and instead put it in the hands of the IR lowering pass. If on the other hand, all of the information needed is already in the Program structure, then I suspect we can vastly simplify parts of the IR crate now.
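The generic AST visitor mentioned above lends itself to a common pattern: a trait with default traversal methods that individual passes override. The following is a minimal, self-contained Rust sketch of that pattern; the `Expr`, `Visitor`, and `NameCollector` names are illustrative stand-ins, not the parser crate's actual API:

```rust
/// A toy expression type standing in for the real AST.
#[derive(Debug, Clone)]
pub enum Expr {
    Const(u64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// A pass over the AST: the default method walks children,
/// and concrete passes override only what they care about.
pub trait Visitor {
    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::Add(lhs, rhs) = expr {
            self.visit_expr(lhs);
            self.visit_expr(rhs);
        }
    }
}

/// A tiny example pass that records every variable referenced, in visit order.
pub struct NameCollector {
    pub names: Vec<String>,
}

impl Visitor for NameCollector {
    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::Var(name) = expr {
            self.names.push(name.clone());
        }
        // Continue the default traversal into children.
        if let Expr::Add(lhs, rhs) = expr {
            self.visit_expr(lhs);
            self.visit_expr(rhs);
        }
    }
}
```

Import resolution and semantic analysis can then each be written as one such visitor, sharing the traversal logic.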

@bitwalker bitwalker requested review from grjte and tohrnii May 16, 2023 00:32
@bitwalker bitwalker self-assigned this May 16, 2023
@bitwalker bitwalker force-pushed the bitwalker/modules branch 7 times, most recently from 00ec1a0 to 517b684 Compare May 19, 2023 07:24
grjte (Contributor) left a comment:

This is a partial review. I'm not quite done going through the AST yet and haven't looked at tests/lexer/grammar changes, etc, so some of these comments/questions might be handled elsewhere, in which case just clarifying the comments a bit would help.

One question - so far it looks to me like we may have lost the functionality for applying selectors to constraints?

Resolved review threads:
  • parser/src/ast/declarations.rs (5 threads)
  • parser/src/ast/statement.rs (1 thread)
  • parser/src/ast/expression.rs (4 threads)
bitwalker (Contributor, PR author):

@grjte Support for constraint selectors is still there, just wrapped up in the ListComprehension type which does double duty for regular comprehensions as well as constraints with selectors, by representing those constraints as comprehensions over a range of 1, with the selector set to some expression. It's represented in the AST this way because the syntax for such constraints and list comprehensions is virtually identical, as are all the analyses/transformations, with a few extra validations performed when such comprehensions are part of a constraint. During lowering to the algebraic graph though, that abstraction is erased and it has the same form as before.

bobbinth (Contributor):

> It's represented in the AST this way because the syntax for such constraints and list comprehensions is virtually identical

I may be missing some context here - but I think the syntax is quite different:

# list comprehension
enf s[i]' = s[i + 1] for i in 2..15

# selectors
match enf:
  noop([stack_top])                     when op_flags[0]
  eqz([stack_top, stack_helpers[2]])    when op_flags[1]

# or selectors with new proposed syntax
match:
  when op_flags[0]: noop([stack_top])
  when op_flags[1]: eqz([stack_top, stack_helpers[2]])

Am I missing something?

grjte (Contributor) commented May 19, 2023

> Support for constraint selectors is still there, just wrapped up in the ListComprehension type which does double duty for regular comprehensions as well as constraints with selectors, by representing those constraints as comprehensions over a range of 1, with the selector set to some expression.

Hm, ok - I did see that it was in there. I'll look at this more to get my head around it as I finish the full AST & tests. I think I have a similar confusion to @bobbinth, in that it doesn't feel natural to me to represent it as a list comprehension, but that may change once I've gotten through everything.

bitwalker (Contributor, PR author) commented May 19, 2023

@bobbinth It's important to note that the match construct is not present in this PR. In my staged changes that bring it back, it has its own AST representation because it is considerably different syntactically than "comprehension" constraints. This trick is only used to represent single constraints with selectors as if they were comprehension constraints with selectors so that they are treated uniformly.

bitwalker (Contributor, PR author) commented May 19, 2023

> I think I have a similar confusion to @bobbinth, that it doesn't feel natural to me to represent it as a list comprehension

Consider a portion of the current grammar (i.e. before this PR):

BoundaryConstraintExpr: BoundaryConstraint = {
    <column: SymbolAccess> "." <boundary: Boundary> "=" <value: BoundaryExpr>
        <comprehension: ConstraintComprehension<BoundaryExpr>?> =>

ConstraintComprehension<T>: ComprehensionContext = {
    <l:@L> "for" <members: Members> "in" <iterables: Iterables> <r:@R> =>? ...

ListComprehension<T>: ListComprehension = {
    <l:@L> <expr: T> "for" <members: Members> "in" <iterables: Iterables> <r:@R> =>? ...


The representation of these two things is virtually identical, but the body of the comprehension is defined in the BoundaryConstraint rule for ComprehensionConstraint.

grjte (Contributor) commented May 19, 2023

> The representation of these two things is virtually identical, but the body of the comprehension is defined in the BoundaryConstraint rule for ComprehensionConstraint.

Yeah, I'm sure it will be clearer once I get to the new grammar. I guess we essentially treat

enf a = a + 1

as

enf a = a + 1 for a in [a]

bitwalker (Contributor, PR author) commented May 19, 2023

@grjte For simple constraints there is no need to wrap them in a comprehension, so enf a = a + 1 remains that way. Specifically we're representing two variants of constraints in comprehension form:

  • Actual constraint comprehensions, e.g. enf a = a + 1 for a in [a] when k0 (this example has a selector, but that's optional of course)
  • Simple constraints with selectors, e.g. enf a = a + 1 when k0

The latter is parsed into comprehension form, as if it had been written as enf a = a + 1 for i in 0..1 when k0.

The result is that we end up with two types of constraints to deal with: simple, and comprehension. When match is added back, we'll have a third type which represents its unique semantics.
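The desugaring described above can be sketched roughly as follows; the `Constraint` type and `desugar` function are hypothetical stand-ins for illustration, not the parser crate's actual AST:

```rust
/// Illustrative constraint representation: a simple constraint, or a
/// comprehension (which also carries an optional selector).
#[derive(Debug, Clone, PartialEq)]
pub enum Constraint {
    /// `enf <expr>` with no selector
    Simple { expr: String },
    /// `enf <expr> for <binding> in <range> [when <selector>]`
    Comprehension {
        expr: String,
        range: std::ops::Range<usize>,
        selector: Option<String>,
    },
}

/// Normalize `enf <expr> when <selector>` into comprehension form, as if
/// it had been written `enf <expr> for i in 0..1 when <selector>`, so that
/// later passes only deal with two constraint shapes.
pub fn desugar(expr: &str, selector: Option<&str>) -> Constraint {
    match selector {
        None => Constraint::Simple {
            expr: expr.to_string(),
        },
        Some(sel) => Constraint::Comprehension {
            expr: expr.to_string(),
            range: 0..1,
            selector: Some(sel.to_string()),
        },
    }
}
```

With this shape, a plain `enf a = a + 1` stays simple, while `enf a = a + 1 when k0` becomes a single-iteration comprehension with the selector attached.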

@bitwalker bitwalker force-pushed the bitwalker/modules branch 2 times, most recently from ac5fc13 to d64f8c7 Compare May 19, 2023 18:32
grjte (Contributor) left a comment:
This is another partial review. It covers almost everything except the parser test files and the semantic analysis, which I'll finish tomorrow. I think the new design makes sense and the changes look great overall. Most of my comments are fairly small ones regarding clarity. At this point I don't anticipate any major change requests for this PR, so small issues aside it should be pretty much ready to merge once I get through the rest of it.

Comment on lines +19 to +21
/// The name of this trace segment, e.g. `$main`
///
/// NOTE: The name of a trace segment is always a special identifier (i.e. has the `$` prefix)
Contributor:

note for the future: I think we should get rid of the $ for the trace segment names, since they're not actually globally accessible. Personally, I also think we should let people name these segments whatever they want, and it will only be the order that determines the trace segment. main and aux are very Winterfell-/Miden-specific names

Contributor (PR author):

Definitely agree that main and aux should not be special in any way, but I think there is probably value in making the top-level trace segment names global/special identifiers, i.e. if you name a segment foo, you should be able to reference a specific column of foo directly as $foo[N]. That said, I suspect in practice such a feature wouldn't be used much if at all, so perhaps it is only useful when performing certain transformations on the AST internally in the compiler.

Contributor:

I guess my thinking was that a named trace segment foo is only accessible within the root module, whereas a named random values vector rand is actually globally accessible. Using foo[i] and $rand[i] distinguishes those things

Contributor (PR author):

To be clear about what I meant: your example is how things work with regard to module-local bindings. Within the root module, references to bindings in the trace_columns declaration use the $-less identifier syntax, e.g. foo[i], as would references to specific random_values bindings. What I am suggesting is that within any module, you could always refer directly to the primary declaration (e.g. a trace segment name) for both random_values and trace_columns using the $-prefixed form. These names are guaranteed to be unique, since we don't allow conflicts in top-level identifiers, and the $ prefix guarantees they are distinct from other identifiers within any given module, so disambiguating them on the compiler side is trivial (this in fact happens automatically when constructing the map of identifiers and their "binding type" at the start of semantic analysis). From the perspective of someone writing AirScript it is perhaps less clear, but it seems unlikely that the names associated with random_values and the trace_columns segments would cause confusion from a readability point of view - though that's naturally debatable.

I guess the bottom line is that this is easy to support with how things currently work, and is useful for certain transformations on the AST. That said, we don't have to support it when actually writing AirScript - i.e. we could disallow it when validating modules, but allow it internally in the compiler. Because of how identifiers are scoped, it is desirable to have an escape hatch in the form of global/fully-qualified identifiers.
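A minimal sketch of the disambiguation being discussed, using a hypothetical `Resolved` type (the real crate uses `ResolvableIdentifier`, whose details may differ):

```rust
/// Illustrative resolution result: a `$`-prefixed name refers to a unique
/// top-level declaration (trace segment or random_values binding), while
/// anything else is resolved within the current module's scope.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    Global(String),
    Local(String),
}

/// Resolve an identifier purely by its `$` prefix. Because top-level
/// names are unique and `$` is reserved, no scope lookup is needed to
/// tell the two cases apart.
pub fn resolve(ident: &str) -> Resolved {
    match ident.strip_prefix('$') {
        Some(name) => Resolved::Global(name.to_string()),
        None => Resolved::Local(ident.to_string()),
    }
}
```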

Resolved review threads:
  • parser/src/ast/trace.rs (3 threads)
  • parser/src/ast/mod.rs (1 thread)
  • parser/src/lexer/mod.rs (2 threads)
  • parser/src/parser/grammar.lalrpop (3 threads)
@bitwalker bitwalker force-pushed the bitwalker/modules branch 2 times, most recently from d0d4416 to eac0f51 Compare May 22, 2023 23:06
grjte (Contributor) left a comment:

This is another partial review which now mostly covers the parser tests. (Semantic analysis is still remaining)

One broad comment regarding the parser test macros - some of these are quite complex and I would feel more comfortable if we had a couple of tests to cover those ones. Adding some docs & comments would help as well for the macros with many cases.

Resolved review threads:
  • parser/src/ast/expression.rs (1 thread)
  • parser/src/ast/trace.rs (1 thread)
  • parser/src/ast/mod.rs (2 threads)
  • parser/src/parser/tests/utils.rs (1 thread)
  • parser/src/parser/tests/mod.rs (2 threads)
name: ResolvableIdentifier::Local(ident!($name)),
access_type: AccessType::Default,
offset: 0,
ty: Some($ty),
Contributor:

We should never be able to have a non-Felt type for a bounded symbol access (although maybe this is for testing invalid cases?)

Contributor (PR author):

Yeah that can probably be hardcoded here. If it ever comes up that we need to specify arbitrary types for certain tests, we can handle that as an additional case.

Contributor (PR author):

Ah, I know why this exists at the moment - for tests which do not run through semantic analysis, the AST nodes are untyped, so we need a case that is untyped, and one that is typed, even if that type is always the same. So I made bounded_access mirror access here.

Contributor:

Makes sense. It might be clearer for the future to add a comment explaining that.

Resolved review threads:
  • parser/src/parser/tests/mod.rs (2 threads)
@bitwalker bitwalker force-pushed the bitwalker/modules branch 2 times, most recently from daa45ad to 9cb3060 Compare May 24, 2023 17:18
grjte (Contributor) left a comment:

This review completes all of the test files. There are a bunch of tests with invalid AirScript that was previously acceptable at the parsing stage and addressed in the IR (or in some cases not addressed yet, but with issues open). I've marked everything that is invalid or nonsensical, since most validation is now moving to the semantic analysis pass.

It probably makes sense to adjust the tests to valid AirScript during this PR, but the checks for each of these things could be part of the next PR(s) if you think that makes more sense.

Comment on lines -13 to +16
integrity_constraints:
mod test

ev test([clk]):
Contributor:
nit: we can remove the comment above this that says "// the operation must be put into a source section, or parsing will fail", since it was explaining why the integrity_constraints section was used, and that's no longer relevant. It can be removed everywhere in these test files.

ident!(test),
vec![trace_segment!(0, "%0", [(clk, 1)])],
vec![enforce!(eq!(
exp!(access!(clk, 1), add!(access!(clk), int!(2))),
Contributor:
I don't think we should allow this anymore. I don't think there's any good reason for allowing non-constant exponents. It was previously allowed in the parser and blocked in the IR, but now that we're moving errors out of the IR and handling them during parsing & semantic analysis we should probably cover this case here.
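The check being requested might look something like the following sketch; the `ScalarExpr` type and `validate_exponents` function are hypothetical, not the crate's actual AST or pass:

```rust
/// Illustrative scalar expression type.
#[derive(Debug)]
pub enum ScalarExpr {
    Const(u64),
    Access(String),
    /// base ^ exponent
    Exp(Box<ScalarExpr>, Box<ScalarExpr>),
}

/// During semantic analysis, reject any exponentiation whose exponent is
/// not a constant, instead of deferring the error to IR construction.
pub fn validate_exponents(expr: &ScalarExpr) -> Result<(), String> {
    match expr {
        ScalarExpr::Exp(base, exponent) => {
            if !matches!(**exponent, ScalarExpr::Const(_)) {
                return Err("exponent must be a constant".to_string());
            }
            validate_exponents(base)
        }
        _ => Ok(()),
    }
}
```

A real pass would recurse through every expression form; this sketch only shows the `Exp` case that the comment is about.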

Comment on lines +19 to +20
integrity_constraints:
enf clk = 0
Contributor:
nit: it doesn't matter for the parser, but this is a useless integrity constraint. If we actually had something like this, we could just delete the entire column (since it would just be a column of 0s). It would be more realistic to do clk' = clk + 1

/// ```
///
/// This is used as a common base for most tests in this module
fn test_module() -> Module {
Contributor:
nit: should we name this base_module so it's more obvious that it's just forming the AST of the BASE_MODULE? either that or maybe add a trivial test for equality between this and parsed BASE_MODULE? I know it's just a test, but I always like to avoid having things that are expected to be equivalent but which don't either depend on each other or assert that equivalence

]);

ParseTest::new().expect_ast(source, expected);
enf x.first = 0 for x in inputs"
Contributor:
This isn't valid for a few reasons:

  1. Public inputs are just vectors of values, e.g. [0, 1, 2, 3], so a notion of inputs[0].first is meaningless
  2. Constraints must be enforced against trace columns, so enf inputs[0] = 2 is semantically like trying to do enf 1 = 2

The previous test was valid because it iterated over trace columns

SourceSpan::UNKNOWN,
vec![
let_!(diff = lc!(((x, expr!(access!(c))), (y, expr!(access!(d)))) => sub!(access!(x), access!(y))).into() =>
enforce!(eq!(bounded_access!(a, Boundary::First), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: x is undefined in this context, which was probably not intentional for this test

Comment on lines +338 to +339
let_!(x = lc!(((c, expr!(slice!(c, 0..3)))) => access!(c)).into() =>
enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: this shouldn't be valid, because the range syntax should not be inclusive, so x only has length 3
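For reference, the exclusive upper bound matches Rust's own range semantics, so `0..3` binds three elements (indices 0, 1, 2), not four:

```rust
/// A slice such as `c[0..3]` covers `range.len()` elements, because the
/// upper bound is exclusive - mirroring Rust's `std::ops::Range`.
pub fn slice_len(range: std::ops::Range<usize>) -> usize {
    range.len()
}
```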

SourceSpan::UNKNOWN,
vec![
let_!(diff = lc!(((x, expr!(access!(c))), (y, expr!(access!(d)))) => sub!(access!(x), access!(y))).into() =>
enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: x is not defined in this context, which was probably not intended for this test. It was probably supposed to be diff

vec![
let_!(diff = lc!(((w, range!(0..3)), (x, expr!(access!(b))), (y, expr!(slice!(c, 0..3))), (z, expr!(slice!(d, 0..3)))) =>
sub!(sub!(add!(access!(w), access!(x)), access!(y)), access!(z))).into() =>
enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: same issue with x being undefined. This was probably supposed to be diff

const B = [[1, 1], [2, 2]]

integrity_constraints:
enf test_constraint(b)
Contributor:
this should be enf test_constraint([b]) here and in the comment below. It shouldn't be valid to call an evaluator without the brackets that indicate the trace segment(s)

This commit introduces the following changes to the AST:

1. Change `Expr::Vector` to contain `Expr` elements rather than just
  `ScalarExpr` elements. This is necessary in order to properly
  represent the semantics of vectors in AirScript
2. Introduce `Statement::EnforceIf` for use during translation to
   represent individual scalar constraints with runtime selectors. We do
   not use this variant until inlining is performed to retain the
   benefits of the previous representation using `EnforceAll`
3. Introduce `Statement::Expr` for use during translation to represent
   statements used in an expression context. Specifically, when
   inlining is performed on a let-bound expression, the inlined
   expression may require the use of let-bound variables, so we
   need a way to represent such statements during the inlining
   transformation.

Now that the `air-parser` crate handles the bulk of the semantic analysis and transformation passes, most of what is in the `air-ir` crate is no longer needed, or has been superseded in some way. This commit removes the unnecessary parts, and reorganizes what remains to simplify the structure of the crate.

NOTE: This commit does not update the tests, or include the AST to IR
lowering pass. That will follow in subsequent commits.
bitwalker (Contributor, PR author):

@hackaugusto I've updated the air-codegen-masm crate as part of rebasing this PR on next before I merge it, and there are 5 broken tests which I think are just due to changes in what IR is produced from a given source program + optimizations (e.g. constant prop), but would like to verify.

The commit which modifies the MASM code generator is b1440c1, and the broken tests are:

  • codegen::masm::bitwise
  • codegen::masm::constants
  • codegen::masm::list_comprehension
  • codegen::masm::pub_inputs
  • codegen::masm::variables

I suspect we can just update the expected output to account for the changes to the IR lowering phase of the compiler, but don't want to do that without checking with you first. Let me know if you see any bugs.

@bobbinth @grjte This branch is now rebased against the latest code in next, so this is ready to merge once @hackaugusto signs off on updating those tests.

hackaugusto (Contributor) commented Jun 22, 2023

Yeah, it is because of the new optimizations; it seems to be constant propagation:

  • subtractions by zero are gone
  • exponentiation of a constant by a constant is pre-computed
  • basic operations like add/sub/mul of constants are also pre-computed

and there seems to be a case of #314 in there.

Thank you for handling the rebase!
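The cases listed above amount to a standard constant-folding pass. A minimal sketch follows, over `u64` for illustration (the real IR operates on field elements, and the `Op` type is hypothetical, not the `air-ir` representation):

```rust
/// Illustrative operation tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Const(u64),
    Col(String),
    Add(Box<Op>, Box<Op>),
    Sub(Box<Op>, Box<Op>),
    Mul(Box<Op>, Box<Op>),
    Exp(Box<Op>, u32),
}

/// Fold constants bottom-up: drop subtraction by zero, and pre-compute
/// add/sub/mul/exp when the operands are constant.
pub fn fold(op: Op) -> Op {
    use Op::*;
    match op {
        Add(a, b) => match (fold(*a), fold(*b)) {
            (Const(x), Const(y)) => Const(x + y),
            (a, b) => Add(Box::new(a), Box::new(b)),
        },
        Sub(a, b) => match (fold(*a), fold(*b)) {
            // Subtraction by zero disappears entirely.
            (a, Const(0)) => a,
            (Const(x), Const(y)) => Const(x - y),
            (a, b) => Sub(Box::new(a), Box::new(b)),
        },
        Mul(a, b) => match (fold(*a), fold(*b)) {
            (Const(x), Const(y)) => Const(x * y),
            (a, b) => Mul(Box::new(a), Box::new(b)),
        },
        Exp(a, n) => match fold(*a) {
            // Exponentiation of a constant by a constant is pre-computed.
            Const(x) => Const(x.pow(n)),
            a => Exp(Box::new(a), n),
        },
        other => other,
    }
}
```

A pass like this is why the expected MASM output changes: expressions that previously emitted subtract-by-zero or constant-power instructions now emit a single constant.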

bitwalker (Contributor, PR author):

Thanks @hackaugusto! You're right that this changeset includes a patch that orders declarations by name in the intermediate representation, which was done for determinism purposes, but seems to also solve #314, so we may be able to close that out when this is merged

@bitwalker bitwalker merged commit 4332348 into next Jun 23, 2023
@bitwalker bitwalker deleted the bitwalker/modules branch June 23, 2023 08:13
4 participants