@leana8959 leana8959 commented Oct 10, 2025

This is the first part of the exact-print parser. In this PR I changed the lexer so that, instead of dropping comments, it emits them to the parser, which then stores them in GenericPackageDescription.

Please let me know your thoughts!


Checklist below:

This PR modifies behaviour or interface

Include the following checklist in your PR:

Collaborator

@andreabedini andreabedini left a comment

Thank you for taking on this work. I think adding a Comment constructor to Field is not a good design. It changes the meaning of the type (and indeed this forces you to make functions like elementInLayoutContext return [Field]).

I think this has already been discussed before: we have the ann parameter, which can be used to hold on to the comments. E.g.

data Comment ann = Comment !ann !ByteString
type FieldWithComments ann = Field ([Comment ann], ann)

In this design each Field carries the comments preceding it, annotated with their position. An extra annotation marks the position of the field itself. Any comment at the end of the file would need to be captured separately.

This is already the practice of a few packages developed by the community. Is there a reason to deviate from this?
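
A minimal sketch of the design described above, under stated assumptions: Field here is a simplified one-constructor stand-in rather than Cabal's actual Field type, Token and attachComments are hypothetical names, and positions are plain line numbers. It walks a flat token stream, attaches each run of comments to the field that follows it, and returns the comments after the last field separately.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BS

-- Simplified stand-in for Cabal's Field type (assumption: a single
-- constructor, no sections), parameterised over its annotation.
data Field ann = Field !ann !ByteString
  deriving (Eq, Show)

data Comment ann = Comment !ann !ByteString
  deriving (Eq, Show)

type FieldWithComments ann = Field ([Comment ann], ann)

-- Hypothetical lexer output: comments and fields in source order.
data Token ann
  = TComment (Comment ann)
  | TField (Field ann)

-- Attach each run of comments to the next field. Comments that follow
-- the last field are returned separately as trailing comments.
attachComments :: [Token ann] -> ([FieldWithComments ann], [Comment ann])
attachComments = go []
  where
    go pending [] = ([], reverse pending)
    go pending (TComment c : rest) = go (c : pending) rest
    go pending (TField (Field ann name) : rest) =
      let (fields, trailing) = go [] rest
       in (Field (reverse pending, ann) name : fields, trailing)

-- Illustrative input: comment, field, comment, field, comment.
example :: ([FieldWithComments Int], [Comment Int])
example =
  attachComments
    [ TComment (Comment 1 (BS.pack "-- header"))
    , TField (Field 2 (BS.pack "name"))
    , TComment (Comment 3 (BS.pack "-- about version"))
    , TField (Field 4 (BS.pack "version"))
    , TComment (Comment 5 (BS.pack "-- footer"))
    ]
```

With this shape, an input that contains no fields yields an empty field list and all of its comments in the second component of the result.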

-- elementInLayoutContext ::= ':' fieldLayoutOrBraces
-- | arg* sectionLayoutOrBraces
- elementInLayoutContext :: IndentLevel -> Name Position -> Parser (Field Position)
+ elementInLayoutContext :: IndentLevel -> Name Position -> Parser [Field Position]
Collaborator

The comment suggests this function parses one field, but the signature has been changed to return a list of fields.

fieldLayoutOrBraces :: IndentLevel -> Name Position -> Parser (Field Position)
-- fieldLayoutOrBraces ::= '\\n'? '{' comment* (content comment*)* '}'
-- | comment* line? comment* ('\\n' line comment*)*
fieldLayoutOrBraces :: IndentLevel -> Name Position -> Parser [Field Position]
Collaborator

Same as above.

@leana8959
Author

Here's a preliminary benchmark done with hyperfine on my computer, with as few other programs running as possible and under the same conditions (apart from the two runs being 6 hours apart).
The means differ by about 4 s, which is less than the standard deviation of either run, so there's no noticeable performance degradation.

Upstream:

~/r/haskell/cabal λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
  Time (mean ± σ):     203.400 s ± 21.816 s    [User: 150.484 s, System: 39.487 s]
  Range (min … max):   183.805 s … 277.905 s    30 runs

This branch:

…/wt/haskell/cabal/exact-pp-leana λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
  Time (mean ± σ):     199.168 s ± 10.540 s    [User: 156.373 s, System: 39.818 s]
  Range (min … max):   184.443 s … 242.563 s    30 runs

Thank you for your response Andrea, I'll write up a response and get back to you soon :)

@leana8959
Author

Thank you for your comment @andreabedini :)

I think this has already been discussed before: we have the ann parameter, which can be used to hold on to the comments.

That looks very interesting, but how would I deal with files that are just comments? From the point of view of readFields they should be valid, yet we would have no Field to attach them to. Whether we should attach the comments above or below is yet another question. For example, in the sequence "comment element comment element comment", which element should grab the comment in between?

I do think your model is very interesting, so if you have the time, please open a working PR against mine so we can simply merge it in 🙏

This is already the practice of a few packages developed by the community. Is there a reason to deviate from this?

Could you elaborate on which packages these are? I would love to have more insight into how people solve similar problems.

Are there other design issues that need to be addressed?

@leana8959 leana8959 marked this pull request as ready for review October 16, 2025 10:21
@Bodigrim
Collaborator

That looks very interesting, but how would I deal with files that are just comments?

Are they valid Cabal files if there is nothing but comments? I don't think so.

Could you elaborate which packages are these? I would love to have more insight on how people solve similar problems.

For instance, cabal-add has a function

annotateFieldsWithSource :: ByteString -> [Field Position] -> [Field ByteString]

which annotates each field with its source, including all adjacent comments. It can be modified to return [Field (Position, ByteString)] if you need both.
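
A rough sketch of the idea of annotating fields with slices of the source. This is not cabal-add's implementation, and it rests on assumptions: Field is a simplified stand-in for Cabal's type, annotateWithSourceLines is a hypothetical name, and the annotation is just the 1-based line on which a field starts. Each field receives the lines from its own start up to the next field's start, and the first field also receives anything that precedes it. Which field should own the comments in between is a policy choice; this is only one option.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BS

-- Simplified stand-in for Cabal's Field type (assumption: flat fields only).
data Field ann = Field !ann !ByteString
  deriving (Eq, Show)

-- Hypothetical helper: the input annotation is the 1-based start line of
-- each field, and fields are given in source order.
annotateWithSourceLines :: ByteString -> [Field Int] -> [Field ByteString]
annotateWithSourceLines src fields = zipWith annotate fields chunks
  where
    srcLines = BS.lines src
    starts = [l | Field l _ <- fields]
    -- Chunk boundaries as 0-based line offsets: the first chunk starts at
    -- the top of the file, the last one runs to the end of the file.
    bounds = 0 : map (subtract 1) (drop 1 starts) ++ [length srcLines]
    chunks =
      [ BS.unlines (take (hi - lo) (drop lo srcLines))
      | (lo, hi) <- zip bounds (drop 1 bounds)
      ]
    annotate (Field _ name) chunk = Field chunk name
```

Because the chunks partition the file, concatenating the annotations in order gives back the original source whenever it ends with a newline, which is what makes this shape convenient for edits that must preserve comments.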

match _ = Nothing

-- | Collect comments into a map. The second field of the output will have no comment
extractComments :: Ord ann => [Field ann] -> (Map.Map ann ByteString, [Field ann])
Collaborator

Could you possibly outline how the output of this function is supposed to be used? Now that we detached comments from fields, how do we reconstruct the original document? How do we do it if [Field ann] is programmatically updated (say, adding or removing elements)?

@geekosaur
Collaborator

Are they valid Cabal files if there is nothing but comments? I don't think so.

Pretty sure you need at minimum name and one target.

@jappeace
Collaborator

Hi friends, thanks for all your responses. Leana needs some time to read up on the exact proposal to see how it all fits together before replying.
After chatting with her, I think she wants to go with Andrea's design for the comment field parser. As you can see, she's deeply in the weeds about many of the details of the parser; she even corrected some of the field grammar comments!
I don't know what you guys think, but I think it's good progress 🚀

@Bodigrim
Collaborator

Cabal is one of the toughest code bases I've ever worked on, so I'm quite amazed by Leana making progress so quickly!
