Comment parser #11252

leana8959 · 2025-10-10T00:52:34Z

This is the first part of the exact-print parser. In this PR I changed the lexer so that, instead of dropping comments, it emits them to the parser, which then stores them in GenericPackageDescription.

Please let me know your thoughts!
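Here is a minimal sketch of the general idea behind the lexer change. It is a simplification: it works line by line, and its names (`Token`, `TokComment`, `TokLine`, `lexLine`, `lexFile`) are hypothetical. They do not correspond to the real lexer in Cabal-syntax. The sketch only shows full-line comments (lines whose first non-blank characters are `--`) being emitted as tokens with their line number, rather than discarded.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical, simplified tokens; each carries its 1-based line number.
data Token
  = TokComment Int String -- full-line comment, text after "--"
  | TokLine Int String    -- any other non-blank line, kept verbatim
  deriving (Eq, Show)

-- Classify one line. A lexer that drops comments would return Nothing for
-- comment lines; here they become TokComment so later stages can keep them.
lexLine :: Int -> String -> Maybe Token
lexLine n l
  | all isSpace l = Nothing -- blank lines produce no token
  | "--" `isPrefixOf` stripped = Just (TokComment n (drop 2 stripped))
  | otherwise = Just (TokLine n l)
  where
    stripped = dropWhile isSpace l

-- Lex a whole file, numbering lines from 1.
lexFile :: String -> [Token]
lexFile src = [t | (n, l) <- zip [1 ..] (lines src), Just t <- [lexLine n l]]
```

Because each `TokComment` keeps its line number, a later stage can decide where the comment belongs instead of losing it.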

Checklist below:

This PR modifies behaviour or interface

Include the following checklist in your PR:

Patches conform to the coding conventions.
Any changes that could be relevant to users have been recorded in the changelog.
- Is the change significant? If so, remember to add significance: significant in the changelog file.
The documentation has been updated, if necessary.
Manual QA notes have been included.
Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

andreabedini

Thank you for taking on this work. I think adding a Comment constructor to Field is not a good design. It changes the meaning of the type (and indeed this forces you to make functions like elementInLayoutContext return [Field Position]).

I think this has already been discussed before: we have the ann parameter, which can be used to keep hold of the comments. E.g.

data Comment ann = Comment !ann !ByteString
type FieldWithComments ann = Field ([Comment ann], ann)

In this design each Field carries the comments preceding it, annotated with their position. An extra annotation marks the position of the field itself. Any comment at the end of the file would need to be captured separately.

This is already the practice of a few packages developed by the community. Is there a reason to deviate from this?
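Below is a minimal sketch of the "comments preceding each field" scheme. It makes a few simplifying assumptions. `Field` here is a hypothetical stand-in for Cabal-syntax's `Field`, reduced to an annotation and a name. `String` replaces `ByteString` so the code runs with `base` alone. The input is a hypothetical flat stream of comments and fields in source order. `attachComments` is an illustrative name, not an existing API.

```haskell
-- Simplified stand-ins: the real Field in Cabal-syntax also has field lines
-- and sections, and Comment would hold a ByteString.
data Comment ann = Comment ann String
  deriving (Eq, Show)

data Field ann = Field ann String
  deriving (Eq, Show)

-- Each field's annotation pairs its preceding comments with its own position.
type FieldWithComments ann = Field ([Comment ann], ann)

-- Walk the stream in source order, buffering comments until the next field.
-- Comments after the last field are returned separately as trailing comments.
attachComments
  :: [Either (Comment ann) (Field ann)]
  -> ([FieldWithComments ann], [Comment ann])
attachComments = go []
  where
    -- `pending` holds comments seen since the last field, newest first.
    go pending [] = ([], reverse pending)
    go pending (Left c : rest) = go (c : pending) rest
    go pending (Right (Field ann name) : rest) =
      let (fields, trailing) = go [] rest
       in (Field (reverse pending, ann) name : fields, trailing)
```

Under this convention, the in-between comment in "comment element comment element comment" goes to the following field. A file containing only comments yields no fields, and all of its comments become trailing ones.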

andreabedini · 2025-10-15T06:38:11Z

Cabal-syntax/src/Distribution/Fields/Parser.hs

 -- elementInLayoutContext ::= ':'  fieldLayoutOrBraces
 --                          | arg* sectionLayoutOrBraces
-elementInLayoutContext :: IndentLevel -> Name Position -> Parser (Field Position)
+elementInLayoutContext :: IndentLevel -> Name Position -> Parser [Field Position]


The comment suggests this function parses one field, but the signature now returns a list of fields.

andreabedini · 2025-10-15T06:38:58Z

Cabal-syntax/src/Distribution/Fields/Parser.hs

-fieldLayoutOrBraces :: IndentLevel -> Name Position -> Parser (Field Position)
+-- fieldLayoutOrBraces   ::= '\\n'? '{' comment* (content comment*)* '}'
+--                         | comment* line? comment* ('\\n' line comment*)*
+fieldLayoutOrBraces :: IndentLevel -> Name Position -> Parser [Field Position]


Same as above.

leana8959 · 2025-10-15T11:45:00Z

Here's a preliminary benchmark run with hyperfine on my computer, with as few other programs running as possible and under the same conditions (apart from the two runs being 6 hours apart).
The difference in mean time (about 4.2 s) is smaller than the standard deviation of either run, so there's no noticeable performance degradation.

Upstream:

~/r/haskell/cabal λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
  Time (mean ± σ):     203.400 s ± 21.816 s    [User: 150.484 s, System: 39.487 s]
  Range (min … max):   183.805 s … 277.905 s    30 runs

This branch:

…/wt/haskell/cabal/exact-pp-leana λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
  Time (mean ± σ):     199.168 s ± 10.540 s    [User: 156.373 s, System: 39.818 s]
  Range (min … max):   184.443 s … 242.563 s    30 runs
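The "within the standard deviation" reading can be checked a little more formally with a Welch's t-statistic on the two hyperfine summaries. This test is my own choice; hyperfine does not report it. The calculation treats hyperfine's σ as the sample standard deviation and uses n = 30 runs on each side:

```latex
\begin{aligned}
\Delta &= \bar{x}_{\mathrm{upstream}} - \bar{x}_{\mathrm{branch}} = 203.400 - 199.168 = 4.232\ \mathrm{s} \\
\mathrm{SE} &= \sqrt{\frac{s_{\mathrm{upstream}}^2}{n} + \frac{s_{\mathrm{branch}}^2}{n}}
  = \sqrt{\frac{21.816^2}{30} + \frac{10.540^2}{30}}
  \approx \sqrt{15.865 + 3.703} \approx 4.42\ \mathrm{s} \\
t &= \frac{\Delta}{\mathrm{SE}} \approx \frac{4.232}{4.42} \approx 0.96
\end{aligned}
```

A t below 1 means the difference in means is smaller than its own standard error, which is consistent with no detectable slowdown. This is still a rough comparison. The upstream σ is inflated by the 277.905 s maximum, and the two batches ran 6 hours apart on one machine.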

Thank you for your review, Andrea. I'll write up a reply and get back to you soon :)

leana8959 · 2025-10-16T02:09:47Z

Thank you for your comment @andreabedini :)

I think this has already been discussed before: we have the ann parameter, which can be used to keep hold of the comments.

That looks very interesting, but how would I deal with files that are just comments? From the point of view of readFields they should be valid, yet we would have no Field to attach them to. Whether a comment should attach to the field above or below it is yet another question. For example, in the sequence "comment element comment element comment", which element should grab the comment in between?

I do think your model is very interesting, so if you have the time, please open a working PR against mine so we can simply merge it in 🙏

This is already the practice of a few packages developed by the community. Is there a reason to deviate from this?

Could you elaborate which packages are these? I would love to have more insight on how people solve similar problems.

Are there other design issues that need to be addressed?

Bodigrim · 2025-10-16T23:51:48Z

That looks very interesting, but how would I deal with files that are just comments?

Are they valid Cabal files if there is nothing but comments? I don't think so.

Could you elaborate which packages are these? I would love to have more insight on how people solve similar problems.

For instance, cabal-add has a function

annotateFieldsWithSource :: ByteString -> [Field Position] -> [Field ByteString]

which annotates each field with its source, including all adjacent comments. It can be modified to return [Field (Position, ByteString)] if you need both.
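This is a rough, hypothetical illustration of source slicing. It is not cabal-add's actual code, which works on `[Field Position]`. Here we assume we know only the 1-based start line of each top-level field. Each field then owns the lines from its start up to the line before the next field. As a result, a comment between two fields lands in the earlier field's chunk. That is the opposite of the "preceding comments" convention, which shows that the attachment rule has to be chosen explicitly. `sliceByStartLines` is an illustrative name.

```haskell
-- | Split source lines into per-field chunks, given each field's 1-based
-- start line in ascending order. Lines before the first field are returned
-- separately as a header chunk.
sliceByStartLines :: [String] -> [Int] -> ([String], [[String]])
sliceByStartLines src starts = (header, chunks)
  where
    eof = length src + 1
    firstStart = case starts of
      (s : _) -> s
      [] -> eof
    header = take (firstStart - 1) src
    -- Each chunk runs from its start line to just before the next start (or EOF).
    ends = drop 1 starts ++ [eof]
    chunks = [take (e - s) (drop (s - 1) src) | (s, e) <- zip starts ends]
```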

Bodigrim · 2025-10-16T23:55:36Z

Cabal-syntax/src/Distribution/FieldGrammar.hs

    match _ = Nothing
+
+-- | Collect comments into a map. The second field of the output will have no comment
+extractComments :: Ord ann => [Field ann] -> (Map.Map ann ByteString, [Field ann])


Could you possibly outline how the output of this function is supposed to be used? Now that we detached comments from fields, how do we reconstruct the original document? How do we do it if [Field ann] is programmatically updated (say, adding or removing elements)?
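As a minimal sketch, here is one possible answer for the unmodified case. It assumes, hypothetically, that both the extracted comments and the remaining content are keyed by their 1-based source line rather than by a full `Position`. Merging the two maps and reading them back in key order then restores the original line order. `reassemble` is an illustrative name, not a function in the PR.

```haskell
import qualified Data.Map.Strict as Map

-- Comments as extracted, keyed by source line (a stand-in for Position).
type Comments = Map.Map Int String

-- Non-comment content lines, also keyed by source line.
type Content = Map.Map Int String

-- Rebuild the document by interleaving both maps in line order.
-- Keys are assumed disjoint; on a clash, Map.union keeps the comment.
reassemble :: Comments -> Content -> [String]
reassemble comments content = Map.elems (Map.union comments content)
```

This works only while the positions are still accurate. If a field is inserted or removed programmatically, the content keys shift but the comment keys do not. Comments then drift or collide, which is exactly the concern raised above.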

geekosaur · 2025-10-16T23:57:11Z

Are they valid Cabal files if there is nothing but comments? I don't think so.

Pretty sure you need at minimum name and one target.

jappeace · 2025-10-17T07:38:47Z

Hi friends, thanks for all your responses. Leana needs some time to read up on the exact proposal to see how it all fits together before replying.
After chatting with her, I think she wants to go with Andrea's design for the comment field parser. As you can see, she's deep in the weeds on many of the details of the parser; she even corrected some of the field grammar comments!
I don't know what you guys think, but I think it's good progress 🚀

Bodigrim · 2025-10-17T21:54:43Z

Cabal is one of the toughest code bases I ever worked on, so I'm quite amazed by Leana making progress so quickly!

leana8959 added 30 commits September 25, 2025 18:58

add lexer tokens and rules

4e8ba54

remove lexer "Whitespace" token

df715b0

This token is not needed, we will later use the position information to pad each token.

implement "Comment" handling

c67466b

... which does not handle them at all for the time being

temporary fix by dropping comments before parseGenericPackageDescription

14e384c

make metaFields a map of positions

6503ce8

rearrange and simplify field

58ad099

make lexer emit comments wherever they occur

fa23509

stop parser from emitting indentation warning for comments

9d09bb5

fix: restore checkIndentation behaviour for Field

9a7d664

test: add dummy tests

f5dca10

test: accept new golden expressions

c329464

test: accept new golden expressions

edd270c

test: rename comment test group

53703c0

debug: trace tokens

2ef72c5

fix: split comments recursively

e5d0e91

fix: consume comments after colon in FieldLayoutOrBraces

a4455b3

debug: remove tracing

3bf77b2

test: update expected

29bf6af

test: improve comment tests

0a887c5

test: correct comment tests

3fc392c

test: assert interleaving comment parsing

2287f93

fix: correct interleaving comment parsing

a61a9e8

test: update expected

d5769d9

debug: remove tracing

bc9bd40

test: assert parsing of fieldline flag

8ad684e

test: update expected

f1abd47

fix: correct parsing fieldLine starting with -- as comment

70177ee

test: update expected

dc92b24

test: remove test case that doesn't pass on upstream

9091f3b

minor fixes

c42a3ba

leana8959 added 4 commits October 10, 2025 11:30

docs: update grammar specification for comments

d3a5620

ref: run hlint

d03d90d

improve describeToken on comments

b0e8d87

ref: make diff smaller

8828454

leana8959 mentioned this pull request Oct 10, 2025

Cabal exact printer in stages? #11227

Open

leana8959 added 16 commits October 10, 2025 12:14

test: fix no-thunks test

2ebec82

test: fix md5Check test

4e0876f

fix compiler errors and warnings

51fd822

test: add expectation for failing hackage test

4cee6fd

We also reintroduced the flag "CABAL_PARSEC_DEBUG" to debug the lexer/parser.

fix hackage test 001

b948bc4

fix hackage test

e317efb

test: disable comments in comparison in roundtrip hackage test

2f68c50

refactor parser

85d3016

refactor test

b3a1db3

style: run fourmolu

d2811d6

remove todos

0772f15

yay

test: remove test dependencies

40f9099

move ToExpr to orphan module

test: simplify

ac9e9bb

restore accidentally formatted cabal

fa09f6d

restore previous debug behaviour

c8c6f65

refactor: don't use liftA2 and liftA3

e3b6a66

andreabedini reviewed Oct 15, 2025


leana8959 marked this pull request as ready for review October 16, 2025 10:21

Bodigrim reviewed Oct 16, 2025
