Draft: Add unstable formatter #10971

JCWasmx86 · 2022-10-31T15:27:53Z

Opening this for eventual discussion.

This is a draft.

Things that are at least done partially:

Test cases (Albeit they will be extended over the time)
Release note snippet
Added to commands.md
Add ParenthesizedNode in order to be able to reconstruct the AST. The other tests pass, so I assume there will be no fallout for other components.
Formatting seems to be idempotent, e.g. format(format(code)) seems to give the same output as format(code) in around 90%
The formatting attempts to approximate the formatting of muon, but does not match 100% (More like 80%)
94% code coverage for formatter.py

Things that aren't good at the moment:

Readding comments is especially in weird locations quite spotty.
The maximum line length is not a hard limit, e.g. if you have a limit of 80, some lines can sometimes reach around 85-87 chars.

Important things before merging:

Remove all unrelated output like the comments that couldn't be readded or the warning that it may eat your comments

Important non-code things before merging:

Cleanup git history

And maybe setup a CI-workflow that downloads some big meson files (Like from mesa, GNOME-Builder) to check for these things:

That all comments are readded
Idempotency
No invalid code is generated

tristan957 · 2022-10-31T15:34:14Z

It might help to split some of these commits into their own PRs. For instance, the parentesized node stuff

tristan957 · 2022-10-31T15:34:41Z

Probably also logically squashing commits would help too

lgtm-com · 2022-10-31T15:38:40Z

This pull request introduces 2 alerts when merging 6a0d258 into 0ddca4d - view on LGTM.com

new alerts:

1 for Unused import
1 for Implicit string concatenation in a list

lgtm-com · 2022-10-31T16:01:06Z

This pull request introduces 1 alert when merging 1a6b24d into 0ddca4d - view on LGTM.com

new alerts:

1 for Unused import

tristan957 · 2022-11-01T16:05:31Z

What is missing to help put comments in the right places?

JCWasmx86 · 2022-11-01T16:16:51Z

It's complicated, I can pinpoint it to exactly one thing. A bit how the comment-readding heuristics work:

Basically everything are CodeBlockNodes. At the start of each CodeBlock, I check, whether there are some comments before, e.g.

# Foo
# Bar
x = 5
y = 5

Then for every line, I check, whether there is a comment block above

x = 5
# FooBar
y = 6

and whether a comment is inline and after a line

y = 6 # FooBar

And after the codeblock I check if there is a trailing code block.

y = 6
# Foo
# Bar
<EOF>

This works as long as you do normal stuff and don't put comments in weird locations like this one: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/meson.build#L540
as the comment is in the middle of an expression and I currently don't break up many expressions. The comments below are readded, because visit_ArgumentsCall is probably adding those

But at least it is the only comment remove in this big meson file

tristan957 · 2022-11-01T16:40:49Z

So basically just more checks for comments is all that is missing?

JCWasmx86 · 2022-11-01T16:41:49Z

A bit simplified, but yes, that's correct.

lgtm-com · 2022-11-05T20:22:17Z

This pull request introduces 1 alert when merging 5522fe3 into 97ec20e - view on LGTM.com

new alerts:

1 for Unused import

JCWasmx86 · 2022-11-12T19:02:48Z

Does it make sense to add a CI pass that does the following:

Clone a few big repos, because more code means more coverage of existing constructs (E.g. systemd, mesa)
Reformat all meson files + All comments must be readded
Check that all files are parseable
Check that it configures (Optional)
Reformat all meson files and assert that they didn't change

As the formatter is platform-independent, it just would have to run on e.g. linux and e.g. only if the file formatter.py was changed

lgtm-com · 2022-11-12T19:16:26Z

This pull request introduces 1 alert when merging 4be7fae into 611bd4e - view on LGTM.com

new alerts:

1 for Unused import

tristan957 · 2022-11-13T22:06:22Z

@annacrombie what are your thoughts here?

annacrombie · 2022-11-15T13:52:43Z

I just checked and muon also deletes comments after a opening paren, e.g.

a = ( # comment
1
)

This is impossible handle with the strategy both muon and this PR use because you only have so many nodes to attach comments to, but you can comment anywhere inside a parenthesized expression. For example, this expression:

(a = (1+2))

corresponds roughly to the following AST:

  paren
    |
assignment
  /   \
id     paren
         |
        plus        
        / \
       1   2

Which gives us 7 nodes to attach comments to.

But it is entirely possible to write:

( # comment 1
# comment 2
# comment 3
a # comment 4
= # comment 5
# comment 6
( # comment 7
1 # comment 8
+ # comment 9
2 # comment 10
) # comment 11
# comment 12
) # comment 13

Which already greatly exceeds the amount of slots we have, and this example could technically have an unlimited amount of comments inside the () anyway.

muon partially solves this problem for lists, by adding another special formatting node fmt_eol, which keeps newlines inside of [], {}, and () expressions (the standard parser throws them away), and special code in the list parser to handle them. Extending this handling to the general parser so that newlines could be handled anywhere inside a statement could be difficult, I'm not sure.

Unrelated, but @JCWasmx86, how would you feel about renaming some of the formatting options, e.g. kwa_ml? I find it isn't a very intuitive name and would be happy to change it in muon.

lgtm-com · 2022-11-15T18:29:14Z

This pull request introduces 2 alerts when merging a3be218 into 8dfa550 - view on LGTM.com

new alerts:

1 for Unused import
1 for Nested loops with same variable

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

JCWasmx86 · 2022-11-15T18:30:47Z

Thanks a lot for your thoughts.
With those new commits (I will clean it up, as soon as #10972 is reviewed+merged, I was able to find those numbers:

Project name	Number of comments	Number of lost comments	Percentage of lost comments
systemd	340	0	0.00%
mesa	3931	1	0.025%
gnome-builder	335	0	0.00%
gstreamer	1045	6 (*)	0.57%
gtk	273	1	0.36%
glib	619	3	0.48 %

As you can see from these huge meson-code bases (Assuming a lot of syntax is used), we only have a quite low error rate. For gstreamer, 36 of those are from one bug, the other 6 are more difficult to solve

(*) Those were 42, but I fixed a bug, so only 6 comments are lost now.

muon partially solves this problem for lists, by adding another special formatting node fmt_eol, which keeps newlines inside of [], {}, and () expressions (the standard parser throws them away), and special code in the list parser to handle them. Extending this handling to the general parser so that newlines could be handled anywhere inside a statement could be difficult, I'm not sure.

I have a lot of special cases, too in order to improve the formatting output, but I use more some heuristics and a bit of praying. But interesting approach, I will look into it.

But it is entirely possible to write:

I would say we shouldn't optimize for the 0.001% of insane code. Sure you could consider it a bug, but at the end you probably invest a few hours/days to have a similar formatting for this one insane case, while using this time would have benefited a huge majority. I think those really-really-edge cases aren't important right now.

Unrelated, but @JCWasmx86, how would you feel about renaming some of the formatting options, e.g. kwa_ml? I find it isn't a very intuitive name and would be happy to change it in muon.

Yes, just go ahead, I will follow you :)

JCWasmx86 · 2022-11-15T18:33:23Z

And in addition to that, I did a few experiments regarding idempotency. After maximum 3 iterations every project i have tested did converge to one formatting, so it would be probably wise to do something like:

lines = ...
for i in range(0, MAX_ITERATIONS):
    new_lines = format(lines)
    if equal(lines, new_lines):
       break
    lines = new_lines

Sure it would cost a bit of performance, but I think that it is worth it

lgtm-com · 2022-11-15T19:22:12Z

This pull request introduces 1 alert when merging 4fa74cb into 8dfa550 - view on LGTM.com

new alerts:

1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

lgtm-com · 2022-11-16T07:44:03Z

This pull request introduces 1 alert when merging b81eb65 into 8ee4660 - view on LGTM.com

new alerts:

1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

lgtm-com · 2022-11-16T08:22:32Z

This pull request introduces 1 alert when merging 910c7c2 into 8ee4660 - view on LGTM.com

new alerts:

1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

lgtm-com · 2022-11-16T08:44:39Z

This pull request introduces 1 alert when merging f0ed332 into bfc8132 - view on LGTM.com

new alerts:

1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

codecov · 2022-11-16T08:46:01Z

Codecov Report

Merging #10971 (bfc8132) into master (8ee4660) will not change coverage.
The diff coverage is 100.00%.

❗ Current head bfc8132 differs from pull request most recent head df4d72a. Consider uploading reports for the commit df4d72a to get more accurate results

@@           Coverage Diff           @@
##           master   #10971   +/-   ##
=======================================
  Coverage   68.58%   68.58%           
=======================================
  Files         412      412           
  Lines       87861    87861           
  Branches    20728    20728           
=======================================
  Hits        60261    60261           
  Misses      23093    23093           
  Partials     4507     4507

Impacted Files	Coverage Δ
mesonbuild/interpreter/interpreter.py	`84.58% <100.00%> (ø)`

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

lgtm-com · 2022-11-16T09:07:32Z

This pull request introduces 1 alert when merging df4d72a into bfc8132 - view on LGTM.com

new alerts:

1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

JCWasmx86 · 2022-11-16T09:08:19Z

Add a script for automatically formatting a project, counting the number of lost comments and how long it takes to converge to one formatting, example results:

systemd has 340, lost 0 during formatting (0.0%)
systemd needed 2 formatting passes to be stable
mesa has 3931, lost 1 during formatting (0.025%)
mesa needed 3 formatting passes to be stable
gnome-builder has 335, lost 0 during formatting (0.0%)
gnome-builder needed 2 formatting passes to be stable
gstreamer has 1338, lost 6 during formatting (0.448%)
gstreamer needed 2 formatting passes to be stable
gtk has 273, lost 1 during formatting (0.366%)
gtk needed 2 formatting passes to be stable
glib has 619, lost 3 during formatting (0.485%)
glib needed 2 formatting passes to be stable
Found 6836 comments, but lost 11 during formatting (0.161%)

This shows that probably after five passes the vast majority of files won't change

Currently only prints the amount of comments, but will print other stats in future (Speed, comment-readding-quote, etc.)

This allows readding all comments from systemd

for a selected number of projects

…o one formatting

tristan957 · 2023-01-13T21:42:02Z

I think it would help to get this reviewed if the commit history was cleaned up a bit.

JCWasmx86 · 2023-01-14T08:14:30Z

Hey, I still need to fully implement the approach described here: #10971 (comment) But yes, I can clean up the history (Albeit I play with the thought to make just one/two commits of it ^^)

bruchar1 · 2023-03-02T18:40:52Z

@JCWasmx86 I had a similar idea. However, my approach has subtle differences with yours that I think could solve or simplify some of your problems:
1- I return a Token object for comments as well
2- There is a Node object for standalone comments (i.e. comments on their own line outside any other nodes)
3- Every Node can have a comment attached to it

Therefore, if you have something like this:

# this belong to a CommentNode
function( # comment belongs to args node
    posarg, # comment belongs to the posarg node (e.g. stringnode)
    kw: kwargvalue,  # comment belongs to the kwargvalue
)  # comment belongs to the function node

when the printer encounters a comment, it prints it and breaks the line.

since comments are added to the nodes by the parser, it should not be possible to have a comment in a wrong place. But since every line is represented by a node, there should be no places where comments are not possible (except between args (but we could allow a CommentNode as a pos arg) and between kwargs (not sure how to handle this...)

My proof of concept is here: 72c3773

JCWasmx86 · 2023-03-03T16:22:43Z

Interesting idea, albeit I don't know how well it would handle e.g. this file https://git.sr.ht/~lattis/muon/tree/master/item/tests/fmt/crazy_comments.meson

But after a few more thoughts I doubt the sense of a second formatter. There is already muon - working perfectly- , so why have a second formatter, that while approximating the formatting of muon, will probably never reach equivalent output for each possible input?

bruchar1 · 2023-03-03T16:27:42Z

But after a few more thoughts I doubt the sense of a second formatter. There is already muon - working perfectly- , so why have a second formatter, that while approximating the formatting of muon, will probably never reach equivalent output for each possible input?

I see the formatter itself as only one possible application of having a complete representation of the meson.build file in AST.

Another important application, imho, is to allow the rewriter to keep comments in modified nodes, and in a more general way, to allow writing script for generating and modifying meson.build files.

tristan957 · 2023-03-03T16:50:30Z

I still think there is value in Meson having a formatter, but obviously don't waster your time on it if you don't see value. Until the entire world moves to muon, having to have 2 developer tools kind of sucks.

JCWasmx86 · 2023-10-03T17:21:54Z

Superseded by #12318

JCWasmx86 force-pushed the master branch 2 times, most recently from be044f5 to 1a6b24d Compare October 31, 2022 15:43

JCWasmx86 mentioned this pull request Oct 31, 2022

Add ParenthesizedNode #10972

Closed

JCWasmx86 force-pushed the master branch from 1a6b24d to 5522fe3 Compare November 5, 2022 20:12

JCWasmx86 force-pushed the master branch from 083caea to 4be7fae Compare November 12, 2022 19:07

JCWasmx86 force-pushed the master branch from 08a220c to a3be218 Compare November 15, 2022 18:14

JCWasmx86 force-pushed the master branch from 4fa74cb to b81eb65 Compare November 16, 2022 07:31

JCWasmx86 added 17 commits December 28, 2022 20:42

Add support for -q flag

9029bed

Add verbose option.

6b6cab7

Currently only prints the amount of comments, but will print other stats in future (Speed, comment-readding-quote, etc.)

Fix linter

2a0fcb6

Allow invalid files during recursing

eb785db

Add comments in dict literals.

82ccd55

This allows readding all comments from systemd

Add more stuff

615f8d2

Minor bugfixes

d9cd7fe

Add support for things at the end of file

f975df2

Add reminder for broken gstreamer

11f9ab5

Add tests for GStreamer dicts

bafaeed

Fix LGTM warning

c7e28b2

Add more stuff for GStreamer

2c4216d

Partial fixes for GLib

d303f19

Print to stderr

6bc73e9

Add stats script for easily showing the amount of removed comments

c7f12c0

for a selected number of projects

stats.sh: Add confirmation prompt and tests until when it converges t…

7cf756e

…o one formatting

Update script to add total stats

23da1d7

JCWasmx86 force-pushed the master branch from b98b6bd to 23da1d7 Compare December 28, 2022 19:42

JCWasmx86 added 2 commits December 28, 2022 20:44

Merge branch 'comments'

d4c0776

Add formatter2

ca2a515

JCWasmx86 force-pushed the master branch from 96cea7f to ca2a515 Compare December 28, 2022 20:10

JCWasmx86 closed this Oct 3, 2023

JCWasmx86 mentioned this pull request Oct 3, 2023

Meson format #12318

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft: Add unstable formatter #10971

Draft: Add unstable formatter #10971

JCWasmx86 commented Oct 31, 2022 •

edited

Loading

tristan957 commented Oct 31, 2022

tristan957 commented Oct 31, 2022

lgtm-com bot commented Oct 31, 2022

lgtm-com bot commented Oct 31, 2022

tristan957 commented Nov 1, 2022

JCWasmx86 commented Nov 1, 2022 •

edited

Loading

tristan957 commented Nov 1, 2022

JCWasmx86 commented Nov 1, 2022

lgtm-com bot commented Nov 5, 2022

JCWasmx86 commented Nov 12, 2022 •

edited

Loading

lgtm-com bot commented Nov 12, 2022

tristan957 commented Nov 13, 2022

annacrombie commented Nov 15, 2022

lgtm-com bot commented Nov 15, 2022

JCWasmx86 commented Nov 15, 2022 •

edited

Loading

JCWasmx86 commented Nov 15, 2022

lgtm-com bot commented Nov 15, 2022

lgtm-com bot commented Nov 16, 2022

lgtm-com bot commented Nov 16, 2022

lgtm-com bot commented Nov 16, 2022

codecov bot commented Nov 16, 2022 •

edited

Loading

lgtm-com bot commented Nov 16, 2022

JCWasmx86 commented Nov 16, 2022 •

edited

Loading

tristan957 commented Jan 13, 2023

JCWasmx86 commented Jan 14, 2023

bruchar1 commented Mar 2, 2023

JCWasmx86 commented Mar 3, 2023

bruchar1 commented Mar 3, 2023

tristan957 commented Mar 3, 2023

JCWasmx86 commented Oct 3, 2023

Draft: Add unstable formatter #10971

Draft: Add unstable formatter #10971

Conversation

JCWasmx86 commented Oct 31, 2022 • edited Loading

tristan957 commented Oct 31, 2022

tristan957 commented Oct 31, 2022

lgtm-com bot commented Oct 31, 2022

lgtm-com bot commented Oct 31, 2022

tristan957 commented Nov 1, 2022

JCWasmx86 commented Nov 1, 2022 • edited Loading

tristan957 commented Nov 1, 2022

JCWasmx86 commented Nov 1, 2022

lgtm-com bot commented Nov 5, 2022

JCWasmx86 commented Nov 12, 2022 • edited Loading

lgtm-com bot commented Nov 12, 2022

tristan957 commented Nov 13, 2022

annacrombie commented Nov 15, 2022

lgtm-com bot commented Nov 15, 2022

JCWasmx86 commented Nov 15, 2022 • edited Loading

JCWasmx86 commented Nov 15, 2022

lgtm-com bot commented Nov 15, 2022

lgtm-com bot commented Nov 16, 2022

lgtm-com bot commented Nov 16, 2022

lgtm-com bot commented Nov 16, 2022

codecov bot commented Nov 16, 2022 • edited Loading

Codecov Report

lgtm-com bot commented Nov 16, 2022

JCWasmx86 commented Nov 16, 2022 • edited Loading

tristan957 commented Jan 13, 2023

JCWasmx86 commented Jan 14, 2023

bruchar1 commented Mar 2, 2023

JCWasmx86 commented Mar 3, 2023

bruchar1 commented Mar 3, 2023

tristan957 commented Mar 3, 2023

JCWasmx86 commented Oct 3, 2023

JCWasmx86 commented Oct 31, 2022 •

edited

Loading

JCWasmx86 commented Nov 1, 2022 •

edited

Loading

JCWasmx86 commented Nov 12, 2022 •

edited

Loading

JCWasmx86 commented Nov 15, 2022 •

edited

Loading

codecov bot commented Nov 16, 2022 •

edited

Loading

JCWasmx86 commented Nov 16, 2022 •

edited

Loading