MassaLabs: Allow named constants range notation #355

Leo-Besancon · 2024-08-13T12:22:41Z

Draft PR for #266

I wanted some input for the design there @bitwalker.
The main issue is that AirScript aggregate types contain the size in their definitions, and so we don't know the type of ranges containing named constants until we resolve their value.
Moreover, the semantic analysis pass expects to know the various sizes.
For example, while checking list comprehension, we check that all iterator types are the same. Some code use placeholder (u32::MAX as usize) when the size is unknown, (I think this is currently unused), but I'm thinking of two options:

Either try and deduce the expected type or use the placeholder if needed for the semantic analysis pass. Then, during the constant propagation pass, we resolve the named constant and update the type (perhaps also make additional checks that
we couldn't do during semantic analysis without knowing the full type). I tried to do this at first but it seemed a bit ugly
Or we resolve the constant value before / during semantic analysis (done in this draft PR)

What are your thoughts? Is there anything else you think I'm missing?

TODO:

Improve the constant value resolution (for ranges) during semantic analysis
Improve robustness (e.g various commented sections / todo!() that should become either become unreachable or better handled)
Parser tests rely on PartialEq, and there is some inconsistency between ranges that have a resolved type and the others. As a workaround I changed the PartialEq for RangeExpr to not check the type.
Clean commit history
Add documentation

bitwalker · 2024-08-15T21:49:15Z

What are your thoughts? Is there anything else you think I'm missing?

@Leo-Besancon I needed to get my head back in the gritty details a bit, so I ended up prototyping a solution for this in my bitwalker/range-constants branch. Feel free to build or riff off of it as you see fit.

My implementation is based on the assumption that named constants can be replaced with the actual values very early during compilation. This is because we have no reason to preserve the use of the name, and everywhere else relies on being able to access the actual value (as you've discovered). So this is exactly what I've implemented in my branch - after parsing, but during the initial visit of a module during semantic analysis, any time an expression is evaluated, we are first resolving all symbols of that expression, so we take that opportunity to ensure that references to constant identifiers are valid. Then, in the case of ranges, the next thing we do is visit the range bounds, and after resolving the symbol, rewrite the bound with the actual constant that was referenced, or raise an error if the reference was invalid (either the symbol doesn't exist, isn't a constant, or the value is not a valid constant scalar value).

Then in all subsequent parts of the pipeline, it is safe to assume that range bounds are always constant, and panic if that assumption is violated.

The solution is a bit hacky, but largely because we are using the AST as an IR as well, so we end up having to encode a lot of information in the same data structure, which lends to it feeling hacky. Once we have a proper IR in place, the IR can assume all references to named constants are reified and things become a lot cleaner.

Let me know if you have further questions, or run into any issues with your implementation!

Leo-Besancon · 2024-08-16T14:36:14Z

What are your thoughts? Is there anything else you think I'm missing?

@Leo-Besancon I needed to get my head back in the gritty details a bit, so I ended up prototyping a solution for this in my bitwalker/range-constants branch. Feel free to build or riff off of it as you see fit.

My implementation is based on the assumption that named constants can be replaced with the actual values very early during compilation. This is because we have no reason to preserve the use of the name, and everywhere else relies on being able to access the actual value (as you've discovered). So this is exactly what I've implemented in my branch - after parsing, but during the initial visit of a module during semantic analysis, any time an expression is evaluated, we are first resolving all symbols of that expression, so we take that opportunity to ensure that references to constant identifiers are valid. Then, in the case of ranges, the next thing we do is visit the range bounds, and after resolving the symbol, rewrite the bound with the actual constant that was referenced, or raise an error if the reference was invalid (either the symbol doesn't exist, isn't a constant, or the value is not a valid constant scalar value).

Then in all subsequent parts of the pipeline, it is safe to assume that range bounds are always constant, and panic if that assumption is violated.

The solution is a bit hacky, but largely because we are using the AST as an IR as well, so we end up having to encode a lot of information in the same data structure, which lends to it feeling hacky. Once we have a proper IR in place, the IR can assume all references to named constants are reified and things become a lot cleaner.

Let me know if you have further questions, or run into any issues with your implementation!

Hello, perfect, thank you for answering my question in such a detailed manner! It helps clear things up and aligns with what I thought was the more realistic approach.
I see some things in your code that's certainly cleaner than what I was trying to do, so I'll build on some of it.

This makes it possible to reference named constants in a range, or slice indexing operation. NOTE: The named constants are rewritten with the values they refer to early during compilation, so in most of the pipeline you will never be able to observe anything but constant range bounds. However, rather than complicate the AST further by making that distinction explicit, we reuse a single type for all ranges, which means we have a lot of apparently-fallible operations in places where they are actually infallible, but this seems like a decent compromise.

Fix doc

Fix example.air after 0xPolygonMiden#353

bitwalker

LGTM! Definitely a handy improvement!

bitwalker · 2024-09-04T03:08:46Z

@bobbinth Let me know if you still plan to review this, otherwise I'll try to get this merged tomorrow

bobbinth

Looks good! Thank you! Though this is a pretty shallow review from me.

Leo-Besancon · 2024-09-18T07:52:57Z

Hello @bitwalker did you want to merge this one? :)

bitwalker · 2024-09-18T18:08:43Z

@Leo-Besancon Sorry about that, forgot to follow up on this, thanks for the ping!

sydhds and others added 7 commits August 22, 2024 09:15

feat: use semicolons as statement terminators (0xPolygonMiden#353)

2cadf30

feat: add error handling to range conversion

eb8686b

chore: refactor, add comments and remove unused code

b8bae19

test: add constant_in_range test

f312232

docs: add constant in range example

e619913

Fix doc

fix: fix example and update documentation following 0xPolygonMiden#353

0a7796c

Fix example.air after 0xPolygonMiden#353

Leo-Besancon force-pushed the allow-named-constants-range-notation branch from 20e2ea4 to 0a7796c Compare August 22, 2024 07:16

Merge branch 'next' into allow-named-constants-range-notation

13f34f6

Leo-Besancon marked this pull request as ready for review August 22, 2024 07:45

bobbinth requested review from bobbinth and bitwalker August 25, 2024 18:50

bitwalker approved these changes Sep 4, 2024

View reviewed changes

bobbinth approved these changes Sep 4, 2024

View reviewed changes

bitwalker merged commit 900a80a into 0xPolygonMiden:next Sep 18, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MassaLabs: Allow named constants range notation #355

MassaLabs: Allow named constants range notation #355

Leo-Besancon commented Aug 13, 2024 •

edited

Loading

bitwalker commented Aug 15, 2024 •

edited

Loading

Leo-Besancon commented Aug 16, 2024

bitwalker left a comment

bitwalker commented Sep 4, 2024

bobbinth left a comment

Leo-Besancon commented Sep 18, 2024

bitwalker commented Sep 18, 2024

MassaLabs: Allow named constants range notation #355

MassaLabs: Allow named constants range notation #355

Conversation

Leo-Besancon commented Aug 13, 2024 • edited Loading

bitwalker commented Aug 15, 2024 • edited Loading

Leo-Besancon commented Aug 16, 2024

bitwalker left a comment

Choose a reason for hiding this comment

bitwalker commented Sep 4, 2024

bobbinth left a comment

Choose a reason for hiding this comment

Leo-Besancon commented Sep 18, 2024

bitwalker commented Sep 18, 2024

Leo-Besancon commented Aug 13, 2024 •

edited

Loading

bitwalker commented Aug 15, 2024 •

edited

Loading