Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Add unstable formatter #10971

Closed
wants to merge 58 commits into from
Closed

Conversation

JCWasmx86
Copy link
Contributor

@JCWasmx86 JCWasmx86 commented Oct 31, 2022

Opening this for eventual discussion.

This is a draft.

Things that are at least done partially:

  • Test cases (Albeit they will be extended over the time)
  • Release note snippet
  • Added to commands.md
  • Add ParenthesizedNode in order to be able to reconstruct the AST. The other tests pass, so I assume there will be no fallout for other components.
  • Formatting seems to be idempotent, e.g. format(format(code)) seems to give the same output as format(code) in around 90%
  • The formatting attempts to approximate the formatting of muon, but does not match 100% (More like 80%)
  • 94% code coverage for formatter.py

Things that aren't good at the moment:

  • Readding comments is especially in weird locations quite spotty.
  • The maximum line length is not a hard limit, e.g. if you have a limit of 80, some lines can sometimes reach around 85-87 chars.

Important things before merging:

  • Remove all unrelated output like the comments that couldn't be readded or the warning that it may eat your comments

Important non-code things before merging:

  • Cleanup git history

And maybe setup a CI-workflow that downloads some big meson files (Like from mesa, GNOME-Builder) to check for these things:

  • That all comments are readded
  • Idempotency
  • No invalid code is generated

@tristan957
Copy link
Contributor

It might help to split some of these commits into their own PRs. For instance, the parentesized node stuff

@tristan957
Copy link
Contributor

Probably also logically squashing commits would help too

@lgtm-com
Copy link

lgtm-com bot commented Oct 31, 2022

This pull request introduces 2 alerts when merging 6a0d258 into 0ddca4d - view on LGTM.com

new alerts:

  • 1 for Unused import
  • 1 for Implicit string concatenation in a list

@JCWasmx86 JCWasmx86 force-pushed the master branch 2 times, most recently from be044f5 to 1a6b24d Compare October 31, 2022 15:43
@lgtm-com
Copy link

lgtm-com bot commented Oct 31, 2022

This pull request introduces 1 alert when merging 1a6b24d into 0ddca4d - view on LGTM.com

new alerts:

  • 1 for Unused import

@tristan957
Copy link
Contributor

What is missing to help put comments in the right places?

@JCWasmx86
Copy link
Contributor Author

JCWasmx86 commented Nov 1, 2022

It's complicated, I can pinpoint it to exactly one thing. A bit how the comment-readding heuristics work:

Basically everything are CodeBlockNodes. At the start of each CodeBlock, I check, whether there are some comments before, e.g.

# Foo
# Bar
x = 5
y = 5

Then for every line, I check, whether there is a comment block above

x = 5
# FooBar
y = 6

and whether a comment is inline and after a line

y = 6 # FooBar

And after the codeblock I check if there is a trailing code block.

y = 6
# Foo
# Bar
<EOF>

This works as long as you do normal stuff and don't put comments in weird locations like this one: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/meson.build#L540
as the comment is in the middle of an expression and I currently don't break up many expressions. The comments below are readded, because visit_ArgumentsCall is probably adding those

But at least it is the only comment remove in this big meson file

@tristan957
Copy link
Contributor

So basically just more checks for comments is all that is missing?

@JCWasmx86
Copy link
Contributor Author

A bit simplified, but yes, that's correct.

@lgtm-com
Copy link

lgtm-com bot commented Nov 5, 2022

This pull request introduces 1 alert when merging 5522fe3 into 97ec20e - view on LGTM.com

new alerts:

  • 1 for Unused import

@JCWasmx86
Copy link
Contributor Author

JCWasmx86 commented Nov 12, 2022

Does it make sense to add a CI pass that does the following:

  1. Clone a few big repos, because more code means more coverage of existing constructs (E.g. systemd, mesa)
  2. Reformat all meson files + All comments must be readded
  3. Check that all files are parseable
  4. Check that it configures (Optional)
  5. Reformat all meson files and assert that they didn't change

As the formatter is platform-independent, it just would have to run on e.g. linux and e.g. only if the file formatter.py was changed

@lgtm-com
Copy link

lgtm-com bot commented Nov 12, 2022

This pull request introduces 1 alert when merging 4be7fae into 611bd4e - view on LGTM.com

new alerts:

  • 1 for Unused import

@tristan957
Copy link
Contributor

@annacrombie what are your thoughts here?

@annacrombie
Copy link
Contributor

I just checked and muon also deletes comments after a opening paren, e.g.

a = ( # comment
1
)

This is impossible handle with the strategy both muon and this PR use because you only have so many nodes to attach comments to, but you can comment anywhere inside a parenthesized expression. For example, this expression:

(a = (1+2))

corresponds roughly to the following AST:

  paren
    |
assignment
  /   \
id     paren
         |
        plus        
        / \
       1   2

Which gives us 7 nodes to attach comments to.

But it is entirely possible to write:

( # comment 1
# comment 2
# comment 3
a # comment 4
= # comment 5
# comment 6
( # comment 7
1 # comment 8
+ # comment 9
2 # comment 10
) # comment 11
# comment 12
) # comment 13

Which already greatly exceeds the amount of slots we have, and this example could technically have an unlimited amount of comments inside the () anyway.

muon partially solves this problem for lists, by adding another special formatting node fmt_eol, which keeps newlines inside of [], {}, and () expressions (the standard parser throws them away), and special code in the list parser to handle them. Extending this handling to the general parser so that newlines could be handled anywhere inside a statement could be difficult, I'm not sure.


Unrelated, but @JCWasmx86, how would you feel about renaming some of the formatting options, e.g. kwa_ml? I find it isn't a very intuitive name and would be happy to change it in muon.

@lgtm-com
Copy link

lgtm-com bot commented Nov 15, 2022

This pull request introduces 2 alerts when merging a3be218 into 8dfa550 - view on LGTM.com

new alerts:

  • 1 for Unused import
  • 1 for Nested loops with same variable

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@JCWasmx86
Copy link
Contributor Author

JCWasmx86 commented Nov 15, 2022

Thanks a lot for your thoughts.
With those new commits (I will clean it up, as soon as #10972 is reviewed+merged, I was able to find those numbers:

Project name Number of comments Number of lost comments Percentage of lost comments
systemd 340 0 0.00%
mesa 3931 1 0.025%
gnome-builder 335 0 0.00%
gstreamer 1045 6 (*) 0.57%
gtk 273 1 0.36%
glib 619 3 0.48 %

As you can see from these huge meson-code bases (Assuming a lot of syntax is used), we only have a quite low error rate. For gstreamer, 36 of those are from one bug, the other 6 are more difficult to solve

(*) Those were 42, but I fixed a bug, so only 6 comments are lost now.

muon partially solves this problem for lists, by adding another special formatting node fmt_eol, which keeps newlines inside of [], {}, and () expressions (the standard parser throws them away), and special code in the list parser to handle them. Extending this handling to the general parser so that newlines could be handled anywhere inside a statement could be difficult, I'm not sure.

I have a lot of special cases, too in order to improve the formatting output, but I use more some heuristics and a bit of praying. But interesting approach, I will look into it.

But it is entirely possible to write:

I would say we shouldn't optimize for the 0.001% of insane code. Sure you could consider it a bug, but at the end you probably invest a few hours/days to have a similar formatting for this one insane case, while using this time would have benefited a huge majority. I think those really-really-edge cases aren't important right now.

Unrelated, but @JCWasmx86, how would you feel about renaming some of the formatting options, e.g. kwa_ml? I find it isn't a very intuitive name and would be happy to change it in muon.

Yes, just go ahead, I will follow you :)

@JCWasmx86
Copy link
Contributor Author

And in addition to that, I did a few experiments regarding idempotency. After maximum 3 iterations every project i have tested did converge to one formatting, so it would be probably wise to do something like:

lines = ...
for i in range(0, MAX_ITERATIONS):
    new_lines = format(lines)
    if equal(lines, new_lines):
       break
    lines = new_lines

Sure it would cost a bit of performance, but I think that it is worth it

@lgtm-com
Copy link

lgtm-com bot commented Nov 15, 2022

This pull request introduces 1 alert when merging 4fa74cb into 8dfa550 - view on LGTM.com

new alerts:

  • 1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@lgtm-com
Copy link

lgtm-com bot commented Nov 16, 2022

This pull request introduces 1 alert when merging b81eb65 into 8ee4660 - view on LGTM.com

new alerts:

  • 1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@lgtm-com
Copy link

lgtm-com bot commented Nov 16, 2022

This pull request introduces 1 alert when merging 910c7c2 into 8ee4660 - view on LGTM.com

new alerts:

  • 1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@lgtm-com
Copy link

lgtm-com bot commented Nov 16, 2022

This pull request introduces 1 alert when merging f0ed332 into bfc8132 - view on LGTM.com

new alerts:

  • 1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@codecov
Copy link

codecov bot commented Nov 16, 2022

Codecov Report

Merging #10971 (bfc8132) into master (8ee4660) will not change coverage.
The diff coverage is 100.00%.

❗ Current head bfc8132 differs from pull request most recent head df4d72a. Consider uploading reports for the commit df4d72a to get more accurate results

@@           Coverage Diff           @@
##           master   #10971   +/-   ##
=======================================
  Coverage   68.58%   68.58%           
=======================================
  Files         412      412           
  Lines       87861    87861           
  Branches    20728    20728           
=======================================
  Hits        60261    60261           
  Misses      23093    23093           
  Partials     4507     4507           
Impacted Files Coverage Δ
mesonbuild/interpreter/interpreter.py 84.58% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@lgtm-com
Copy link

lgtm-com bot commented Nov 16, 2022

This pull request introduces 1 alert when merging df4d72a into bfc8132 - view on LGTM.com

new alerts:

  • 1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@JCWasmx86
Copy link
Contributor Author

JCWasmx86 commented Nov 16, 2022

Add a script for automatically formatting a project, counting the number of lost comments and how long it takes to converge to one formatting, example results:

systemd has 340, lost 0 during formatting (0.0%)
systemd needed 2 formatting passes to be stable
mesa has 3931, lost 1 during formatting (0.025%)
mesa needed 3 formatting passes to be stable
gnome-builder has 335, lost 0 during formatting (0.0%)
gnome-builder needed 2 formatting passes to be stable
gstreamer has 1338, lost 6 during formatting (0.448%)
gstreamer needed 2 formatting passes to be stable
gtk has 273, lost 1 during formatting (0.366%)
gtk needed 2 formatting passes to be stable
glib has 619, lost 3 during formatting (0.485%)
glib needed 2 formatting passes to be stable
Found 6836 comments, but lost 11 during formatting (0.161%)

This shows that probably after five passes the vast majority of files won't change

@tristan957
Copy link
Contributor

I think it would help to get this reviewed if the commit history was cleaned up a bit.

@JCWasmx86
Copy link
Contributor Author

Hey, I still need to fully implement the approach described here: #10971 (comment) But yes, I can clean up the history (Albeit I play with the thought to make just one/two commits of it ^^)

@bruchar1
Copy link
Member

bruchar1 commented Mar 2, 2023

@JCWasmx86 I had a similar idea. However, my approach has subtle differences with yours that I think could solve or simplify some of your problems:
1- I return a Token object for comments as well
2- There is a Node object for standalone comments (i.e. comments on their own line outside any other nodes)
3- Every Node can have a comment attached to it

Therefore, if you have something like this:

# this belong to a CommentNode
function( # comment belongs to args node
    posarg, # comment belongs to the posarg node (e.g. stringnode)
    kw: kwargvalue,  # comment belongs to the kwargvalue
)  # comment belongs to the function node

when the printer encounters a comment, it prints it and breaks the line.

since comments are added to the nodes by the parser, it should not be possible to have a comment in a wrong place. But since every line is represented by a node, there should be no places where comments are not possible (except between args (but we could allow a CommentNode as a pos arg) and between kwargs (not sure how to handle this...)

My proof of concept is here: 72c3773

@JCWasmx86
Copy link
Contributor Author

Interesting idea, albeit I don't know how well it would handle e.g. this file https://git.sr.ht/~lattis/muon/tree/master/item/tests/fmt/crazy_comments.meson

But after a few more thoughts I doubt the sense of a second formatter. There is already muon - working perfectly- , so why have a second formatter, that while approximating the formatting of muon, will probably never reach equivalent output for each possible input?

@bruchar1
Copy link
Member

bruchar1 commented Mar 3, 2023

But after a few more thoughts I doubt the sense of a second formatter. There is already muon - working perfectly- , so why have a second formatter, that while approximating the formatting of muon, will probably never reach equivalent output for each possible input?

I see the formatter itself as only one possible application of having a complete representation of the meson.build file in AST.

Another important application, imho, is to allow the rewriter to keep comments in modified nodes, and in a more general way, to allow writing script for generating and modifying meson.build files.

@tristan957
Copy link
Contributor

I still think there is value in Meson having a formatter, but obviously don't waster your time on it if you don't see value. Until the entire world moves to muon, having to have 2 developer tools kind of sucks.

@JCWasmx86
Copy link
Contributor Author

Superseded by #12318

@JCWasmx86 JCWasmx86 closed this Oct 3, 2023
@JCWasmx86 JCWasmx86 mentioned this pull request Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants