Comment parser #11252
This token is not needed, we will later use the position information to pad each token.
... which is not handled at all for the time being
We also reintroduced the flag "CABAL_PARSEC_DEBUG" to debug the lexer/parser.
yay
move ToExpr to orphan module
Thank you for taking on this work. I think adding a Comment constructor to Field is not a good design. It modifies the meaning of the type (and indeed this forces you to make functions like elementInLayoutContext return [Fields]).
I think this has already been discussed before: we have the ann parameter, which can be used to keep hold of the comments. E.g.
data Comment ann = Comment !ann !ByteString
type FieldWithComments ann = Field ([Comment ann], ann)
In this design each Field carries the comments preceding it, annotated with their position. An extra annotation marks the position of the field itself. Any comment at the end of the file would need to be captured separately.
This is already the practice of a few packages developed by the community. Is there a reason to deviate from this?
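A minimal sketch of how the annotation-based design could work, assuming a hypothetical flat stream of parsed items. Position, Field, Item and attachComments here are simplified stand-ins invented for illustration (String replaces ByteString, and Field has no sections); they are not Cabal's actual definitions.

```haskell
-- Simplified stand-ins for Cabal's types; only the shape matters here.
newtype Position = Position Int            -- line number only (assumption)
  deriving (Eq, Ord, Show)

data Comment ann = Comment ann String      -- String instead of ByteString
  deriving (Eq, Show)

-- A toy field: annotation, name and value. No sections (assumption).
data Field ann = Field ann String String
  deriving (Eq, Show)

type FieldWithComments ann = Field ([Comment ann], ann)

-- Hypothetical parser output: comments and fields in source order.
data Item ann = ItemComment (Comment ann) | ItemField (Field ann)

-- Attach each run of comments to the field that follows it.
-- Comments after the last field are returned separately.
attachComments :: [Item ann] -> ([FieldWithComments ann], [Comment ann])
attachComments = go []
  where
    -- 'pending' holds the comments seen since the last field, newest first.
    go pending [] = ([], reverse pending)
    go pending (ItemComment c : rest) = go (c : pending) rest
    go pending (ItemField (Field ann name value) : rest) =
      let (fields, trailing) = go [] rest
      in (Field (reverse pending, ann) name value : fields, trailing)
```

In this sketch, an input that contains only comments yields an empty field list, with every comment in the second component of the result.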
  -- elementInLayoutContext ::= ':' fieldLayoutOrBraces
  --                          | arg* sectionLayoutOrBraces
- elementInLayoutContext :: IndentLevel -> Name Position -> Parser (Field Position)
+ elementInLayoutContext :: IndentLevel -> Name Position -> Parser [Field Position]
The comment suggests this function parses one field, but the signature has been changed to return a list of fields.
- fieldLayoutOrBraces :: IndentLevel -> Name Position -> Parser (Field Position)
+ -- fieldLayoutOrBraces ::= '\\n'? '{' comment* (content comment*)* '}'
+ --                       | comment* line? comment* ('\\n' line comment*)*
+ fieldLayoutOrBraces :: IndentLevel -> Name Position -> Parser [Field Position]
Same as above.
Here's a preliminary benchmark.

Upstream:
~/r/haskell/cabal λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
  Time (mean ± σ):     203.400 s ± 21.816 s    [User: 150.484 s, System: 39.487 s]
  Range (min … max):   183.805 s … 277.905 s    30 runs

This branch:
…/wt/haskell/cabal/exact-pp-leana λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
  Time (mean ± σ):     199.168 s ± 10.540 s    [User: 156.373 s, System: 39.818 s]
  Range (min … max):   184.443 s … 242.563 s    30 runs

Thank you for your response Andrea, I'll write up a response and get back to you soon :)
Thank you for your comment @andreabedini :)
That looks very interesting, but how would I deal with files that are just comments? From the point of view of readFields they should be valid, yet we would have no
I do think your model is very interesting, so if you have the time, please show a working PR against mine so we can simply merge it in 🙏
Could you elaborate on which packages these are? I would love to have more insight into how people solve similar problems. Are there other design issues that need to be addressed?
Are they valid Cabal files if there is nothing but comments? I don't think so.
For instance, annotateFieldsWithSource :: ByteString -> [Field Position] -> [Field ByteString] which annotates each field with its source including all adjacent comments. It can be modified to return
match _ = Nothing

-- | Collect comments into a map. The second field of the output will have no comment
extractComments :: Ord ann => [Field ann] -> (Map.Map ann ByteString, [Field ann])
Could you possibly outline how the output of this function is supposed to be used? Now that we detached comments from fields, how do we reconstruct the original document? How do we do it if [Field ann] is programmatically updated (say, adding or removing elements)?
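To make the question concrete, here is a toy model of the round trip, not the PR's implementation. It assumes a flat Field type with a Comment constructor, String in place of ByteString, and a hypothetical reinsertComments that merges comments back purely by annotation order; fieldAnn and reinsertComments are names made up for this sketch.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortOn)

-- Toy field type with a Comment constructor; no sections (assumption).
data Field ann
  = Field ann String String   -- annotation, name, value
  | Comment ann String        -- annotation, comment text
  deriving (Eq, Show)

fieldAnn :: Field ann -> ann
fieldAnn (Field ann _ _) = ann
fieldAnn (Comment ann _) = ann

-- Same shape as the signature under review, with String for ByteString.
extractComments :: Ord ann => [Field ann] -> (Map.Map ann String, [Field ann])
extractComments = foldr step (Map.empty, [])
  where
    step (Comment ann txt) (m, fs) = (Map.insert ann txt m, fs)
    step f (m, fs) = (m, f : fs)

-- Hypothetical inverse: put comments back by sorting on the annotation.
-- This only reproduces the document if annotation order is source order.
reinsertComments :: Ord ann => Map.Map ann String -> [Field ann] -> [Field ann]
reinsertComments m fs =
  sortOn fieldAnn (fs ++ [Comment ann txt | (ann, txt) <- Map.toList m])
```

Under these assumptions the round trip works for an unmodified field list. A field added programmatically has no source position, so it would need an invented annotation to be ordered relative to the comments, which is the situation the question above is about.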
Pretty sure you need at minimum
Hi friends, thanks for all your responses. Leana needs some time to read up on the exact proposal to see how it all fits together before replying.
Cabal is one of the toughest code bases I ever worked on, so I'm quite amazed by Leana making progress so quickly!
This is the first part of the exact print parser. In this PR I changed the lexer so that, instead of dropping comments, it emits them to the parser, which then stores them in GenericPackageDescription.
Please let me know your thoughts!
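The difference between dropping and emitting comments can be sketched with a toy line-based lexer. This is only an illustration: Token, lexDropping and lexKeeping are hypothetical names, and the sketch assumes a comment is a whole line whose first non-space characters are "--". Cabal's real lexer is considerably more involved.

```haskell
import Data.Char (isSpace)

-- Hypothetical token type for the sketch.
data Token
  = TokComment Int String   -- line number and the raw comment line
  | TokLine Int String      -- line number and any other non-blank line
  deriving (Eq, Show)

-- Assumption: a comment is a line starting with "--" after optional spaces.
isComment :: String -> Bool
isComment line = take 2 (dropWhile isSpace line) == "--"

-- Behaviour before the change, as described: comment lines are dropped.
lexDropping :: String -> [Token]
lexDropping src =
  [ TokLine n l
  | (n, l) <- zip [1 ..] (lines src)
  , not (all isSpace l)
  , not (isComment l)
  ]

-- Behaviour after the change: comment lines are emitted as tokens.
lexKeeping :: String -> [Token]
lexKeeping src =
  [ if isComment l then TokComment n l else TokLine n l
  | (n, l) <- zip [1 ..] (lines src)
  , not (all isSpace l)
  ]
```

Both functions record line numbers, so a later stage could in principle use the positions to restore the original layout.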
Checklist below:
This PR modifies behaviour or interface
Include the following checklist in your PR:
significance: significant
in the changelog file.