feat: implement modules #297

Merged (30 commits, Jun 23, 2023)
Conversation

bitwalker (Contributor) commented May 16, 2023

NOTE: I've broken up this PR into a sequence of commits that are smaller/more focused, and in the order that will give you the best context as you review each of them. These commits do not compile individually, but all together they do, as well as pass all of the parser crate tests - they are simply broken up this way to make it easier to review. Please take note of the commit messages, they provide a lot of detail.


This PR refactors the parser crate to implement modules, while in the process simplifying the grammar and the syntax tree itself. It adds support for rich diagnostics to all AST nodes, and makes heavy use of the new diagnostics infrastructure during program compilation/validation. It implements a generic visitor for implementing passes over the AST, and uses this both for import resolution and semantic analysis, both of which are also implemented in this PR. As an aside, this PR also removes the few vestiges of match enf currently in the compiler, which we can add back once the discussion around its syntax and semantics has been pinned down.

Now, this PR is a draft PR because after doing some initial review of how this affects upstream crates, namely the IR, I did not want to dive into updating the IR crate (which does not compile on this branch) until these changes have been signed off on, and we've discussed how to proceed. Many of the things which the IR crate does in terms of validation and constructing state for the constraint graph are actually made redundant by the changes in this PR, and I'm unsure how we want to approach that going forward.

To describe a bit what I mean: when you invoke the parser after these changes, you are asking the parser to produce a Program. In the process of producing that, the parser does all of the important work of semantic analysis, and in particular, resolving identifiers to their definitions, propagating types, etc. This means that when you get a Program, it is still a relatively high level representation, however all names have been resolved, most dead code has been eliminated, and it is ready for lowering to a form more amenable for optimization and execution (i.e. the IR). However, the IR (currently) is doing a lot of that work as it translates from the AST to the IR, which simply falls apart when modules are involved, and in general is not the ideal place in a compilation pipeline to do some of that stuff.

What I'd like to do to get the IR crate compiling again is first figure out how close the Program structure is to "ideal" form for lowering to the IR, and what we need to translate directly from it to the IR. It may be the case that some of the information gathered (and subsequently discarded) during semantic analysis, such as the dependency graph, would actually be useful during lowering to the IR, and if so, we might want to defer invoking semantic analysis as part of the parser, and instead put it in the hands of the IR lowering pass. If on the other hand, all of the information needed is already in the Program structure, then I suspect we can vastly simplify parts of the IR crate now.
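The generic AST visitor mentioned above lends itself to a common pattern: a trait with default traversal methods that individual passes override. The following is a minimal, self-contained Rust sketch of that pattern; the `Expr`, `Visitor`, and `NameCollector` names are illustrative stand-ins, not the parser crate's actual API:

```rust
/// A toy expression type standing in for the real AST.
#[derive(Debug, Clone)]
pub enum Expr {
    Const(u64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// A pass over the AST: the default method walks children,
/// and concrete passes override only what they care about.
pub trait Visitor {
    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::Add(lhs, rhs) = expr {
            self.visit_expr(lhs);
            self.visit_expr(rhs);
        }
    }
}

/// A tiny example pass that records every variable referenced, in visit order.
pub struct NameCollector {
    pub names: Vec<String>,
}

impl Visitor for NameCollector {
    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::Var(name) = expr {
            self.names.push(name.clone());
        }
        // Continue the default traversal into children.
        if let Expr::Add(lhs, rhs) = expr {
            self.visit_expr(lhs);
            self.visit_expr(rhs);
        }
    }
}
```

Import resolution and semantic analysis can then each be written as one such visitor, sharing the traversal logic.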

@bitwalker bitwalker requested review from grjte and tohrnii May 16, 2023 00:32
@bitwalker bitwalker self-assigned this May 16, 2023
@bitwalker bitwalker force-pushed the bitwalker/modules branch 7 times, most recently from 00ec1a0 to 517b684 Compare May 19, 2023 07:24
grjte (Contributor) left a comment:

This is a partial review. I'm not quite done going through the AST yet and haven't looked at tests/lexer/grammar changes, etc, so some of these comments/questions might be handled elsewhere, in which case just clarifying the comments a bit would help.

One question - so far it looks to me like we may have lost the functionality for applying selectors to constraints?

Resolved review threads:
  • parser/src/ast/declarations.rs (5 threads)
  • parser/src/ast/statement.rs (1 thread)
  • parser/src/ast/expression.rs (4 threads)
bitwalker (Contributor, PR author):

@grjte Support for constraint selectors is still there, just wrapped up in the ListComprehension type which does double duty for regular comprehensions as well as constraints with selectors, by representing those constraints as comprehensions over a range of 1, with the selector set to some expression. It's represented in the AST this way because the syntax for such constraints and list comprehensions is virtually identical, as are all the analyses/transformations, with a few extra validations performed when such comprehensions are part of a constraint. During lowering to the algebraic graph though, that abstraction is erased and it has the same form as before.

bobbinth (Contributor):

> It's represented in the AST this way because the syntax for such constraints and list comprehensions is virtually identical

I may be missing some context here - but I think the syntax is quite different:

# list comprehension
enf s[i]' = s[i + 1] for i in 2..15

# selectors
match enf:
  noop([stack_top])                     when op_flags[0]
  eqz([stack_top, stack_helpers[2]])    when op_flags[1]

# or selectors with new proposed syntax
match:
  when op_flags[0]: noop([stack_top])
  when op_flags[1]: eqz([stack_top, stack_helpers[2]])

Am I missing something?

grjte (Contributor) commented May 19, 2023

> Support for constraint selectors is still there, just wrapped up in the ListComprehension type which does double duty for regular comprehensions as well as constraints with selectors, by representing those constraints as comprehensions over a range of 1, with the selector set to some expression.

Hm, ok - I did see that it was in there. I'll look at this more to get my head around it as I finish the full AST & tests. I think I have a similar confusion to @bobbinth, in that it doesn't feel natural to me to represent it as a list comprehension, but that may change once I've gotten through everything.

bitwalker (Contributor, PR author) commented May 19, 2023

@bobbinth It's important to note that the match construct is not present in this PR. In my staged changes that bring it back, it has its own AST representation because it is considerably different syntactically than "comprehension" constraints. This trick is only used to represent single constraints with selectors as if they were comprehension constraints with selectors so that they are treated uniformly.

bitwalker (Contributor, PR author) commented May 19, 2023

> I think I have a similar confusion to @bobbinth, that it doesn't feel natural to me to represent it as a list comprehension

Consider a portion of the current grammar (i.e. before this PR):

BoundaryConstraintExpr: BoundaryConstraint = {
    <column: SymbolAccess> "." <boundary: Boundary> "=" <value: BoundaryExpr>
        <comprehension: ConstraintComprehension<BoundaryExpr>?> =>

ConstraintComprehension<T>: ComprehensionContext = {
    <l:@L> "for" <members: Members> "in" <iterables: Iterables> <r:@R> =>? ...

ListComprehension<T>: ListComprehension = {
    <l:@L> <expr: T> "for" <members: Members> "in" <iterables: Iterables> <r:@R> =>? ...


The representation of these two things is virtually identical, but the body of the comprehension is defined in the BoundaryConstraint rule for ComprehensionConstraint.

grjte (Contributor) commented May 19, 2023

> The representation of these two things is virtually identical, but the body of the comprehension is defined in the BoundaryConstraint rule for ComprehensionConstraint.

Yeah, I'm sure it will be clearer once I get to the new grammar. I guess we essentially treat

enf a = a + 1

as

enf a = a + 1 for a in [a]

bitwalker (Contributor, PR author) commented May 19, 2023

@grjte For simple constraints there is no need to wrap them in a comprehension, so enf a = a + 1 remains that way. Specifically we're representing two variants of constraints in comprehension form:

  • Actual constraint comprehensions, e.g. enf a = a + 1 for a in [a] when k0 (this example has a selector, but that's optional of course)
  • Simple constraints with selectors, e.g. enf a = a + 1 when k0

The latter is parsed into comprehension form, as if it had been written as enf a = a + 1 for i in 0..1 when k0.

The result is that we end up with two types of constraints to deal with: simple, and comprehension. When match is added back, we'll have a third type which represents its unique semantics.
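The desugaring described above can be sketched roughly as follows; the `Constraint` type and `desugar` function are hypothetical stand-ins for illustration, not the parser crate's actual AST:

```rust
/// Illustrative constraint representation: a simple constraint, or a
/// comprehension (which also carries an optional selector).
#[derive(Debug, Clone, PartialEq)]
pub enum Constraint {
    /// `enf <expr>` with no selector
    Simple { expr: String },
    /// `enf <expr> for <binding> in <range> [when <selector>]`
    Comprehension {
        expr: String,
        range: std::ops::Range<usize>,
        selector: Option<String>,
    },
}

/// Normalize `enf <expr> when <selector>` into comprehension form, as if
/// it had been written `enf <expr> for i in 0..1 when <selector>`, so that
/// later passes only deal with two constraint shapes.
pub fn desugar(expr: &str, selector: Option<&str>) -> Constraint {
    match selector {
        None => Constraint::Simple {
            expr: expr.to_string(),
        },
        Some(sel) => Constraint::Comprehension {
            expr: expr.to_string(),
            range: 0..1,
            selector: Some(sel.to_string()),
        },
    }
}
```

With this shape, a plain `enf a = a + 1` stays simple, while `enf a = a + 1 when k0` becomes a single-iteration comprehension with the selector attached.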

@bitwalker bitwalker force-pushed the bitwalker/modules branch 2 times, most recently from ac5fc13 to d64f8c7 Compare May 19, 2023 18:32
grjte (Contributor) left a comment:
This is another partial review. It covers almost everything except the parser test files and the semantic analysis, which I'll finish tomorrow. I think the new design makes sense and the changes look great overall. Most of my comments are fairly small ones regarding clarity. At this point I don't anticipate any major change requests for this PR, so small issues aside it should be pretty much ready to merge once I get through the rest of it.

Comment on lines +19 to +21
/// The name of this trace segment, e.g. `$main`
///
/// NOTE: The name of a trace segment is always a special identifier (i.e. has the `$` prefix)
Contributor:

note for the future: I think we should get rid of the $ for the trace segment names, since they're not actually globally accessible. Personally, I also think we should let people name these segments whatever they want, and it will only be the order that determines the trace segment. main and aux are very Winterfell-/Miden-specific names

Contributor (PR author):

Definitely agree that main and aux should not be special in any way, but I think there is probably value in making the top-level trace segment names global/special identifiers, i.e. if you name a segment foo, you should be able to reference a specific column of foo directly as $foo[N]. That said, I suspect in practice such a feature wouldn't be used much if at all, so perhaps it is only useful when performing certain transformations on the AST internally in the compiler.

Contributor:

I guess my thinking was that a named trace segment foo is only accessible within the root module, whereas a named random values vector rand is actually globally accessible. Using foo[i] and $rand[i] distinguishes those things

Contributor (PR author):

To be clear about what I meant: your example is how things work with regard to module-local bindings. Within the root module, references to bindings in the trace_columns declaration use the $-less identifier syntax, e.g. foo[i], as would references to specific random_values bindings. What I am suggesting is that within any module, you could always refer directly to the primary declaration (e.g. a trace segment name) for both random_values and trace_columns using the $-prefixed form. These names are guaranteed to be unique, since we don't allow conflicts in top-level identifiers, and the $ prefix guarantees they are distinct from other identifiers within any given module, so disambiguating them on the compiler side is trivial (this in fact happens automatically when constructing the map of identifiers and their "binding type" at the start of semantic analysis). From the perspective of someone writing AirScript it is perhaps less clear, but it seems unlikely that the names associated with random_values and the trace_columns segments would cause confusion from a readability point of view - though that's naturally debatable.

I guess the bottom line is that this is easy to support with how things currently work, and is useful for certain transformations on the AST. That said, we don't have to support it when actually writing AirScript - i.e. we could disallow it when validating modules, but allow it internally in the compiler. Because of how identifiers are scoped, it is desirable to have an escape hatch in the form of global/fully-qualified identifiers.
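A minimal sketch of the disambiguation being discussed, using a hypothetical `Resolved` type (the real crate uses `ResolvableIdentifier`, whose details may differ):

```rust
/// Illustrative resolution result: a `$`-prefixed name refers to a unique
/// top-level declaration (trace segment or random_values binding), while
/// anything else is resolved within the current module's scope.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    Global(String),
    Local(String),
}

/// Resolve an identifier purely by its `$` prefix. Because top-level
/// names are unique and `$` is reserved, no scope lookup is needed to
/// tell the two cases apart.
pub fn resolve(ident: &str) -> Resolved {
    match ident.strip_prefix('$') {
        Some(name) => Resolved::Global(name.to_string()),
        None => Resolved::Local(ident.to_string()),
    }
}
```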

Resolved review threads:
  • parser/src/ast/trace.rs (3 threads)
  • parser/src/ast/mod.rs (1 thread)
  • parser/src/lexer/mod.rs (2 threads)
  • parser/src/parser/grammar.lalrpop (3 threads)
@bitwalker bitwalker force-pushed the bitwalker/modules branch 2 times, most recently from d0d4416 to eac0f51 Compare May 22, 2023 23:06
grjte (Contributor) left a comment:

This is another partial review which now mostly covers the parser tests. (Semantic analysis is still remaining)

One broad comment regarding the parser test macros - some of these are quite complex and I would feel more comfortable if we had a couple of tests to cover those ones. Adding some docs & comments would help as well for the macros with many cases.

Resolved review threads:
  • parser/src/ast/expression.rs (1 thread)
  • parser/src/ast/trace.rs (1 thread)
  • parser/src/ast/mod.rs (2 threads)
  • parser/src/parser/tests/utils.rs (1 thread)
  • parser/src/parser/tests/mod.rs (2 threads)
name: ResolvableIdentifier::Local(ident!($name)),
access_type: AccessType::Default,
offset: 0,
ty: Some($ty),
Contributor:

We should never be able to have a non-Felt type for a bounded symbol access (although maybe this is for testing invalid cases?)

Contributor (PR author):

Yeah that can probably be hardcoded here. If it ever comes up that we need to specify arbitrary types for certain tests, we can handle that as an additional case.

Contributor (PR author):

Ah, I know why this exists at the moment - for tests which do not run through semantic analysis, the AST nodes are untyped, so we need a case that is untyped, and one that is typed, even if that type is always the same. So I made bounded_access mirror access here.

Contributor:

Makes sense. It might be clearer for the future to add a comment explaining that.

Resolved review threads:
  • parser/src/parser/tests/mod.rs (2 threads)
@bitwalker bitwalker force-pushed the bitwalker/modules branch 2 times, most recently from daa45ad to 9cb3060 Compare May 24, 2023 17:18
grjte (Contributor) left a comment:

This review completes all of the test files. There are a bunch of tests with invalid AirScript that was previously acceptable at the parsing stage and addressed in the IR (or in some cases not addressed yet, but with issues open). I've marked everything that is invalid or nonsensical, since most validation is now moving to the semantic analysis pass.

It probably makes sense to adjust the tests to valid AirScript during this PR, but the checks for each of these things could be part of the next PR(s) if you think that makes more sense.

Comment on lines -13 to +16
integrity_constraints:
mod test

ev test([clk]):
Contributor:
nit: we can remove the comment above this that says "// the operation must be put into a source section, or parsing will fail", since it was explaining why the integrity_constraints section was used, and that's no longer relevant. It can be removed everywhere in these test files.

ident!(test),
vec![trace_segment!(0, "%0", [(clk, 1)])],
vec![enforce!(eq!(
exp!(access!(clk, 1), add!(access!(clk), int!(2))),
Contributor:
I don't think we should allow this anymore. I don't think there's any good reason for allowing non-constant exponents. It was previously allowed in the parser and blocked in the IR, but now that we're moving errors out of the IR and handling them during parsing & semantic analysis we should probably cover this case here.
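The check being requested might look something like the following sketch; the `ScalarExpr` type and `validate_exponents` function are hypothetical, not the crate's actual AST or pass:

```rust
/// Illustrative scalar expression type.
#[derive(Debug)]
pub enum ScalarExpr {
    Const(u64),
    Access(String),
    /// base ^ exponent
    Exp(Box<ScalarExpr>, Box<ScalarExpr>),
}

/// During semantic analysis, reject any exponentiation whose exponent is
/// not a constant, instead of deferring the error to IR construction.
pub fn validate_exponents(expr: &ScalarExpr) -> Result<(), String> {
    match expr {
        ScalarExpr::Exp(base, exponent) => {
            if !matches!(**exponent, ScalarExpr::Const(_)) {
                return Err("exponent must be a constant".to_string());
            }
            validate_exponents(base)
        }
        _ => Ok(()),
    }
}
```

A real pass would recurse through every expression form; this sketch only shows the `Exp` case that the comment is about.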

Comment on lines +19 to +20
integrity_constraints:
enf clk = 0
Contributor:
nit: it doesn't matter for the parser, but this is a useless integrity constraint. If we actually had something like this, we could just delete the entire column (since it would just be a column of 0s). It would be more realistic to do clk' = clk + 1

/// ```
///
/// This is used as a common base for most tests in this module
fn test_module() -> Module {
Contributor:
nit: should we name this base_module so it's more obvious that it's just forming the AST of the BASE_MODULE? either that or maybe add a trivial test for equality between this and parsed BASE_MODULE? I know it's just a test, but I always like to avoid having things that are expected to be equivalent but which don't either depend on each other or assert that equivalence

]);

ParseTest::new().expect_ast(source, expected);
enf x.first = 0 for x in inputs"
Contributor:
This isn't valid for a few reasons:

  1. Public inputs are just vectors of values, e.g. [0, 1, 2, 3], so a notion of inputs[0].first is meaningless
  2. Constraints must be enforced against trace columns, so enf inputs[0] = 2 is semantically like trying to do enf 1 = 2

The previous test was valid because it iterated over trace columns

SourceSpan::UNKNOWN,
vec![
let_!(diff = lc!(((x, expr!(access!(c))), (y, expr!(access!(d)))) => sub!(access!(x), access!(y))).into() =>
enforce!(eq!(bounded_access!(a, Boundary::First), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: x is undefined in this context, which was probably not intentional for this test

Comment on lines +338 to +339
let_!(x = lc!(((c, expr!(slice!(c, 0..3)))) => access!(c)).into() =>
enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: this shouldn't be valid, because the range syntax should not be inclusive, so x only has length 3
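For reference, the exclusive upper bound matches Rust's own range semantics, so `0..3` binds three elements (indices 0, 1, 2), not four:

```rust
/// A slice such as `c[0..3]` covers `range.len()` elements, because the
/// upper bound is exclusive - mirroring Rust's `std::ops::Range`.
pub fn slice_len(range: std::ops::Range<usize>) -> usize {
    range.len()
}
```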

SourceSpan::UNKNOWN,
vec![
let_!(diff = lc!(((x, expr!(access!(c))), (y, expr!(access!(d)))) => sub!(access!(x), access!(y))).into() =>
enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: x is not defined in this context, which was probably not intended for this test. It was probably supposed to be diff

vec![
let_!(diff = lc!(((w, range!(0..3)), (x, expr!(access!(b))), (y, expr!(slice!(c, 0..3))), (z, expr!(slice!(d, 0..3)))) =>
sub!(sub!(add!(access!(w), access!(x)), access!(y)), access!(z))).into() =>
enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
Contributor:
note: same issue with x being undefined. This was probably supposed to be diff

const B = [[1, 1], [2, 2]]

integrity_constraints:
enf test_constraint(b)
Contributor:
this should be enf test_constraint([b]) here and in the comment below. It shouldn't be valid to call an evaluator without the brackets that indicate the trace segment(s)

This commit introduces the following changes to the AST:

1. Change `Expr::Vector` to contain `Expr` elements rather than just
  `ScalarExpr` elements. This is necessary in order to properly
  represent the semantics of vectors in AirScript
2. Introduce `Statement::EnforceIf` for use during translation to
   represent individual scalar constraints with runtime selectors. We do
   not use this variant until inlining is performed to retain the
   benefits of the previous representation using `EnforceAll`
3. Introduce `Statement::Expr` for use during translation to represent
   statements used in an expression context. Specifically, when
   inlining is performed on a let-bound expression, the inlined
   expression may require the use of let-bound variables, so we
   need a way to represent such statements during the inlining
   transformation.

Now that the `air-parser` crate handles the bulk of the semantic analysis and transformation passes, most of what is in the `air-ir` crate is no longer needed, or has been superseded in some way. This commit removes the unnecessary parts, and reorganizes what remains to simplify the structure of the crate.

NOTE: This commit does not update the tests, or include the AST to IR
lowering pass. That will follow in subsequent commits.
bitwalker (Contributor, PR author):

@hackaugusto I've updated the air-codegen-masm crate as part of rebasing this PR on next before I merge it, and there are 5 broken tests which I think are just due to changes in what IR is produced from a given source program + optimizations (e.g. constant prop), but would like to verify.

The commit which modifies the MASM code generator is b1440c1, and the broken tests are:

  • codegen::masm::bitwise
  • codegen::masm::constants
  • codegen::masm::list_comprehension
  • codegen::masm::pub_inputs
  • codegen::masm::variables

I suspect we can just update the expected output to account for the changes to the IR lowering phase of the compiler, but don't want to do that without checking with you first. Let me know if you see any bugs.

@bobbinth @grjte This branch is now rebased against the latest code in next, so this is ready to merge once @hackaugusto signs off on updating those tests.

hackaugusto (Contributor) commented Jun 22, 2023

Yeah, it is because of the new optimizations; it seems to be constant propagation:

  • subtractions by zero are gone
  • exponentiation of a constant by a constant is pre-computed
  • basic operations like add/sub/mul of constants are also pre-computed

and there seems to be a case of #314 in there.

Thank you for handling the rebase!
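The cases listed above amount to a standard constant-folding pass. A minimal sketch follows, over `u64` for illustration (the real IR operates on field elements, and the `Op` type is hypothetical, not the `air-ir` representation):

```rust
/// Illustrative operation tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Const(u64),
    Col(String),
    Add(Box<Op>, Box<Op>),
    Sub(Box<Op>, Box<Op>),
    Mul(Box<Op>, Box<Op>),
    Exp(Box<Op>, u32),
}

/// Fold constants bottom-up: drop subtraction by zero, and pre-compute
/// add/sub/mul/exp when the operands are constant.
pub fn fold(op: Op) -> Op {
    use Op::*;
    match op {
        Add(a, b) => match (fold(*a), fold(*b)) {
            (Const(x), Const(y)) => Const(x + y),
            (a, b) => Add(Box::new(a), Box::new(b)),
        },
        Sub(a, b) => match (fold(*a), fold(*b)) {
            // Subtraction by zero disappears entirely.
            (a, Const(0)) => a,
            (Const(x), Const(y)) => Const(x - y),
            (a, b) => Sub(Box::new(a), Box::new(b)),
        },
        Mul(a, b) => match (fold(*a), fold(*b)) {
            (Const(x), Const(y)) => Const(x * y),
            (a, b) => Mul(Box::new(a), Box::new(b)),
        },
        Exp(a, n) => match fold(*a) {
            // Exponentiation of a constant by a constant is pre-computed.
            Const(x) => Const(x.pow(n)),
            a => Exp(Box::new(a), n),
        },
        other => other,
    }
}
```

A pass like this is why the expected MASM output changes: expressions that previously emitted subtract-by-zero or constant-power instructions now emit a single constant.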

bitwalker (Contributor, PR author):

Thanks @hackaugusto! You're right that this changeset includes a patch that orders declarations by name in the intermediate representation, which was done for determinism purposes, but seems to also solve #314, so we may be able to close that out when this is merged

@bitwalker bitwalker merged commit 4332348 into next Jun 23, 2023
@bitwalker bitwalker deleted the bitwalker/modules branch June 23, 2023 08:13
4 participants