Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

match raw pretty printing of optional_else and indented_block #51

Merged
merged 3 commits into from
Mar 11, 2024

Conversation

mmcloughlin
Copy link

Currently optional_else AST is printed in the format (else ...) which
is inconsistent with the format used for other comparable blocks such as
indented_block.

This PR changes the format to directly invoke pp_raw_indented_block.

@katrinafyi
Copy link
Member

katrinafyi commented Mar 8, 2024

Thanks! This is a notable inconsistency in the pp-raw output.

If it's not too much trouble, do you have an example of the new format, in particular with chained else-if cases? Instead of finding an example opcode, it is probably easiest to manually construct a stmt list to test the method.

In any case, I will take a closer look next week!

@mmcloughlin
Copy link
Author

I just saw you have a grammar for parsing this AST output, and it accounts for this else syntax:

https://github.com/UQ-PAC/bil-to-boogie-translator/blob/b6a2409f60b977985ee0c97959cdcfe04bdb35f0/src/main/antlr4/Semantics.g4#L22

For what it's worth, in my fork I additionally changed the raw dump such that it wraps these blocks in braces { stmt+ } rather than the current [ stmt+ ]. This was slightly simpler to parse (at least the way I was doing it) because it distinguishes between arrays and blocks of statements.

If it's not too much trouble, do you have an example of the new format, in particular with chained else-if cases? Instead of finding an example opcode, it is probably easiest to manually construct a stmt list to test the method.

I can provide an example of the new else output. I'm not sure I've seen an example of else if yet. The ASL grammar handles else if a little strangely. Your AST grammar has a comment about this too:

https://github.com/UQ-PAC/bil-to-boogie-translator/blob/b6a2409f60b977985ee0c97959cdcfe04bdb35f0/src/main/antlr4/Semantics.g4#L14

@katrinafyi
Copy link
Member

I'm not sure I've seen an example of else if yet.

The partial evaluation translates else-if into nested if/else statements, which is why you won't see it in disassembly results.

@mmcloughlin
Copy link
Author

The partial evaluation translates else-if into nested if/else statements, which is why you won't see it in disassembly results.

Ah! Yes that would explain it.

For what it's worth, in my fork I additionally changed the raw dump such that it wraps these blocks in braces { stmt+ } rather than the current [ stmt+ ]. This was slightly simpler to parse (at least the way I was doing it) because it distinguishes between arrays and blocks of statements.

@katrinafyi Would you be okay with me putting up a PR for this change ^ also?

@katrinafyi
Copy link
Member

katrinafyi commented Mar 11, 2024

I have tested it with the following code and output:

    let t = [Stmt_Assert (Expr_LitInt "0", Unknown); Stmt_TCall (FIdent ("asdf", 0), [], [], Unknown)] in
    let els = [S_Elsif_Cond (Expr_LitInt "2", t); S_Elsif_Cond (Expr_LitInt "3", t)] in
    let s = [Stmt_If (Expr_LitInt "1", t, els, t, Unknown)] in
    List.iter (fun x -> print_endline @@ Utils.to_string @@ Asl_parser_pp.pp_raw_stmt x) s;
Stmt_If(Expr_LitInt("1"),[            
Stmt_Assert(Expr_LitInt("0"))
Stmt_TCall(asdf.0,[],[])
],[(S_Elsif_Cond(Expr_LitInt("2"),[
Stmt_Assert(Expr_LitInt("0"))
Stmt_TCall(asdf.0,[],[])
]));(S_Elsif_Cond(Expr_LitInt("3"),[
Stmt_Assert(Expr_LitInt("0"))
Stmt_TCall(asdf.0,[],[])
]))],[
Stmt_Assert(Expr_LitInt("0"))
Stmt_TCall(asdf.0,[],[])
])

This looks to be reasonable, it just replaces (else STMTS) with [ STMTS ]. When the else stmt list is empty, it does leave a hole but we can fix that later.

As we do rely on this format, do you mind retargeting this PR to the newly-created aslt-changes branch? I'll bundle this up with some more changes we've wanted to make, then merge them when the ANTLR grammars have been updated.

it distinguishes between arrays and blocks of statements. [...] Would you be okay with me putting up a PR for this change ^ also?

We don't have syntax for array literals (?) so I'm not sure where this could be confused. In any case, I'd prefer to keep the aslt format minimal and not add another set of brackets.

@mmcloughlin mmcloughlin changed the base branch from partial_eval to aslt-changes March 11, 2024 03:50
@mmcloughlin
Copy link
Author

As we do rely on this format, do you mind retargeting this PR to the newly-created aslt-changes branch? I'll bundle this up with some more changes we've wanted to make, then merge them when the ANTLR grammars have been updated.

Done.

We don't have syntax for array literals (?) so I'm not sure where this could be confused. In any case, I'd prefer to keep the aslt format minimal and not add another set of brackets.

I have changed my parser so I don't need this anymore. My new approach has context to know which type of list to expect at each argument position.

My approach before was a two stage parser. In the first stage I parsed the ASLT into generic nodes, and the second stage converted from nodes into Stmt, Expr, ... types. If you do it this way, it's problematic that the current format has two types of delimited lists. One type is inline lists delimited by semicolons [<item>;<item>;...] (for example expression lists), and the other is statement blocks [\n<stmt>\n<stmt>\n]. If the parser doesn't have context on which of the two to expect (if you are parsing as generic nodes) then this is a problem. It can be resolved by changing statement blocks to {\n<stmt>\n<stmt>\n}. Anyway, it's not important now, just explaining why I suggested this in the first place.

Thanks.

@mmcloughlin
Copy link
Author

If you are working on the ASLT format, another annoyance that might be nice to fix up is redundant wrapping parentheses. You see things like: Expr_TApply(add_bits.0,[(Expr_LitInt("64"))],... where the (Expr_LitInt("64")) is wrapped in parens unnecessarily.

My guess is this has something to do with the wrapper sym type, where each symbolic value is wrapped in Val ... or Exp ...?

Anyway, this really isn't that important because it can be trivially fixed up in the parser. But it is a little odd, that's all.

@katrinafyi
Copy link
Member

Thanks!

(if you are parsing as generic nodes) then this is a problem.

I see, that is fair and we would like to support this kind of generic parser. You could say lists are either whitespace or semicolon separated, but that is certainly less preferred. The changes we plan to make will be aimed at making it easier to parse and reducing ambiguities.

another annoyance that might be nice to fix up is redundant wrapping parentheses.

I can look at this as well. (I don't think it's related to sym since that doesn't appear in the printed syntax tree.)

@katrinafyi katrinafyi merged commit 9a5d887 into UQ-PAC:aslt-changes Mar 11, 2024
1 check passed
@katrinafyi katrinafyi mentioned this pull request Mar 11, 2024
9 tasks
katrinafyi pushed a commit that referenced this pull request Mar 12, 2024
* pp-raw for optional_else uses same format as indented_block

* call pp_raw_indented_block from optional_else
katrinafyi pushed a commit that referenced this pull request Mar 18, 2024
* pp-raw for optional_else uses same format as indented_block

* call pp_raw_indented_block from optional_else
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants