Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create new AST node Group in the expression language, that represents value in parenthesis #214

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Nov 19, 2020

This change fixes https://github.com/kaitai-io/kaitai_struct_tests/blob/master/formats/expr_ops_parens.ksy test
for Java and, I'm sure, for all other targets

@generalmimon
Copy link
Member

generalmimon commented Nov 19, 2020

Unfortunately, I have to disagree with this one, please see kaitai-io/kaitai_struct#69.

Adding "force parentheses" operators seems to be a misunderstanding how AST works. These parentheses add no information to the AST object, it's just a weird trick to suppress and work around the problem instead of solving it.

@generalmimon generalmimon requested a review from GreyCat November 19, 2020 23:10
@Mingun
Copy link
Contributor Author

Mingun commented Nov 20, 2020

I don't think so. First of all, group node can be used to attach positional information if that even will be implemented (and I think it should if we want to produce nice error messages). Positional information is different for

(1+2)

and

(      2+3     )

So you can't ignore that node.

@Mingun
Copy link
Contributor Author

Mingun commented Nov 20, 2020

These parentheses add no information to the AST object, it's just a weird trick to suppress and work around the problem instead of solving it.

Just the opposite -- the group node accurately laided out to the expression language structure. The kaitai-io/kaitai_struct#69 just highlight problem which is gracefully and simply solved by this node.

Adding "force parentheses"

On the contrary, without a group node, you are forced to insert brackets even where they are not needed, simply because you cannot distinguish between places where they are needed and those where they are not needed. Example:

// X * Y
BinOp(X, Mul, Y)

Should you wrap X and Y with parenthesis when generate text? This depends heavily on the X and Y expressions. If it something like IntNum, then parenthesis is not required, but if that is BinOp(..., Add, ...) they are definitely needed. So, BinOp should not wrap their children into parenthesis.

Then, maybe, need just to wrap BinOp itself? But then you'll get unnecessary parenthesis for

// ((a ? b) ? ...)
BinOp(BinOp(...), ...)

which you are trying to avoid. So, you can't add parenthesises neither inside BinOp, nor outside without doing some tricks for determine actual expression type of X and Y. You can't just delegate that work to the children, because they'll face with the same problem.

So you need to decide types right and here. In each place where you process such expression. Not scalable.

Group node is the baked result of such computations. You can infer it automatically from more human-like AST representation, like

// (1 + 2) * (3 + 4)
BinOp(
  BinOp(IntNum(1), Add, IntNum(2))
  Mul,
  BinOp(IntNum(3), Add, IntNum(4))
)

and then always work only with that normalized representation. Another question, is you will so often create AST by hands?... You should create it when you parse text. There is no reason to drop this node.

I'm also not great fan of ladders of parenthesises, so you can see that I try to remove pathological cases by merging Group nodes in things like

(((1) + (((2 - 3)) * 4)))

to something like

((1) + ((2 - 3) * 4))

Of course it is not ideal (but right now I'm even not see if tests are running, and from my local running I see 100 failed tests on slightly tuned master), but this can be improved in the future. Anyway, this is generated code which is hardly ever even read by humans (if they not looking for bugs)

Copy link
Member

@GreyCat GreyCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I concur with @generalmimon: adding parenthesis into AST tree is fundamentally wrong.

KS expression language has certain operator precedence order, and target languages have their own precedence order. In theory, target language might not have any concept of parenthesis at all — e.g. assembler-style languages don't have them, instead spelling out individual operations one by one in correct order.

Another example is C: you would routinely will have expressions broken down into multiple lines, e.g. when dealing with string operations with strcat / strcpy.

@Mingun
Copy link
Contributor Author

Mingun commented Nov 22, 2020

You are wrongly think that new node is the "parenthesis node". This is not true. It's a Group node which is represented as expression in parenthesis in many languages including Kaitai expression language. It is naturally present in the AST, and trying to pretend that it doesn't exist only complicates things. Each time as you produce parenthesis in the AbstractTranslator descendants you are using that node but in complicated and implicit form which is much harder to understand. I bet that this led to the half of errors in the tests

KS expression language has certain operator precedence order, and target languages have their own precedence order.

This is only representation. AST in both cases will be the same. Some nodes may be redundant, but this should be handled in the translator (if you should worry about it at all. If you want a beautiful readable source after generation, then it is easier to use the means of automatic formatting of the target language)

In theory, target language might not have any concept of parenthesis at all — e.g. assembler-style languages don't have them, instead spelling out individual operations one by one in correct order.

Again, this is specific representation that very weakly related to AST structure. Remember, AST is Abstract Syntax Tree. By default Group node renders as expression in parenthesizes but you are not forced to use that representation. Of course, I had no chance to check that generated sources in all languages is correct because I even couldn't see test output, but direction is right. Now I know how to see test output and can check this.

Another example is C: you would routinely will have expressions broken down into multiple lines, e.g. when dealing with string operations with strcat / strcpy.

This is orthogonal example: you should split out expressions into statements if in the target language some Kaitai expressions represented as statements. This is not related to the Group node

@Mingun
Copy link
Contributor Author

Mingun commented Nov 23, 2020

Anyway, you feel free to close this PR is you think that Group node shouldn't exist

@Mingun
Copy link
Contributor Author

Mingun commented May 13, 2021

Ping. If none of you have changed your mind, then it is better to close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants