Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prefix of the part-end delimiter to the part #387

Merged
merged 19 commits into from
Sep 21, 2022
Merged

Add prefix of the part-end delimiter to the part #387

merged 19 commits into from
Sep 21, 2022

Conversation

Leonidas-from-XIV
Copy link
Collaborator

#374 reports that the last line gets cut off when using (* $MDX part-end *). This is generally true because Part.Parse_parts.parse_line can only parse a line into a Part_end or a Normal, but a line with an end-marker is sort of both.

This PR adds the prefix of the line to the Part_end constructor so if there is any code that is not whitespace, it will get added to the constructor from where the code can attach the missing line to the part.

I've been thinking about the design of this solution because attaching the value to the Part_end constructor feels wrong since Ocaml_delimiter only deals with delimiters and not the payload of the block and this is changing it. I feel that maybe the parsing ought to be done differently and signal that there is something else on the line that still needs to be processed so it would get processed into something more like [Normal prefix; Part_end].

As such, I welcome other suggestions how to solve this in a "nice" way, without breaking the separation between the parsing and the delimiters in ugly ways.

Fixes #374.

@NathanReb
Copy link
Contributor

Ocaml_delimiter is only used by Parts so removing it entirely would not be too bad if we test the latter properly.

Parts is in a pretty sad state from the look of it so we could cease this as an occasion to improve it a little bit.

Another question I'm asking myself is whether this is the right way to fix this issue. It seems that enforcing delimiters sit on their own line is sensible. Maybe we should start a conversation with the ocamlformat team to try to see how we could come to an agreement on this.

If there is nothing to be done on their side we can revert to allow inline part, or at least end delimiters I guess. The reason I'd no be too kin to support it is that I think it can lead to pretty ambiguous cases, especially if in the example from the linked issue, the end delimiter was directly followed by a begin one. Inline begin delimiters can be tedious to deal with, especially regarding indentation.

@Leonidas-from-XIV
Copy link
Collaborator Author

I agree that the current implementation is not great, but I think it is good as a starting point for discussion. I think we agree that the current way, where it silently drops the last line is bad - even a version where it would error out would be better because it wouldn't lead to potentially subtle errors where the inlined code is simply missing lines but it is only noticed way too late. Throwing an error in that case would be pretty simple and require fewer changes than what's in this branch already.

As a matter of fact, we weren't aware of the issue until the bugreport either, so it shows how fragile the current baheviour is.

Making OCamlformat not put the delimiters in-line and emitting an error message if people do that by hand could be a decent compromise, because as you say, the semantics of in-line delimiters are a bit complicated.

If we do decide to go forward with allowing inline end-delimiters, I'll probably drop Ocaml_delimiter because the more I think about it the less sense it makes to split the code this way.

@NathanReb
Copy link
Contributor

Oh no don't get me wrong, there's no particular problem with the implementation. I'm merely wondering what the best fix would be here!

Copy link
Contributor

@NathanReb NathanReb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

I have a few minor comments but hen we should be good. I think some follow up refactoring of Parts could help improve the code overall but we can do this a bit later, in particular, we could do it once we drop the attributes based delimiters support!

lib/part.mli Outdated Show resolved Hide resolved
lib/part.ml Outdated
let msg = Printf.sprintf "File ended before part %s." p in
Error (msg, nline)
| File_end, None -> Ok [ make_part ~is_begin_end_part:true part_lines ]
match input_line_err input with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I find the new implementation, with nested pattern matching, a bit harder to follow.

Maybe we could extract a separate, mutually recursive function to handle the Ok _ case.

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit easier to follow because it really had two concerns:

  1. EOF handling
  2. Parsing

File_end was not a result of parsing, it is just a signal to the parser to stop. But admittedly, the function is quite complex and nested (and it took me a while to understand), so you correctly point out that that it still conflates parsing and IO. It's a good point, let me try to improve it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote parse_parts further:

  1. Instead of reading from the channel, I read into a list, thus the pattern match on the End_of_file marker became a pattern match on [], that is, the end of the input. This of course means it still has to deal with these two cases.
  2. Then I realized that the function doesn't need to be recursive. What was making the function hard to read for me was the recursive calls with >>| to cons things to the list. I noted that fold_left can do the same thing:
    • Supply an element to work on and terminate when done
    • Deal with the state in the accumulator
      Thus thus the function could be split into "deal with the inputs" via fold_left and "deal with the tail" based on the result of the accumulator. Given our minimum version is 4.08, let* helps to unwrap the arguments.
  3. Finally, given I changed from a version that lazily reads out of an in_channel to a version that eagerly reads the input into a list, I thought I can put the laziness back in by switching from List.fold_left to Seq.fold_left which was indeed a drop-in replacement.

I believe the resulting function is a bit easier to understand given each piece of code only deals with one aspect and the fold is a bit simpler to follow through than the recursion. Fortunately the tests are pretty comprehensive so I immediately got feedback when some part of the logic was implemented incorrectly.

@Leonidas-from-XIV
Copy link
Collaborator Author

I've been iterating on and off on this, thinking of smaller improvements that lead to other improvements, that again lead to other refactorings. As such the removal of the File_end constructor from Parts_part.t and the addition of Content in Ocaml_delimiter.t has shown that these types are identical, just spelled somewhat differently. Thus this made me unify the types, at which point it seemed to me that it would be most sensible to just join these two modules.

I've reworked the test a bit: the test cases of Parts_part and of Ocaml_delimiter were mostly similar except the latter tested the error cases as well. I've exposed a function that returns the result type and added these test cases to the Parts test as well.

I think this is overall an improvement, because the tests tested very similar functionality, if the Ocaml_delimiter test would fail, the Parts test with the same input would fail in the same way which did not add much value to the tests. Unifying this is in my opinion a bit easier to understand and also solves the design problem that lines previously could only be a separator or content.

(Error
(`Msg
"'part-end' delimiter does not accept a value. Please write '(* \
$MDX part-end *)' instead."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important but since this part of the API is now internal, wouldn't it better to test it with ppx_expect instead ? So we can not export this module anymore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's well possible, but it would introduce an additional dependency since we don't depend on ppx_expect at the moment. While this could certainly be done in the future, by refactoring (all the relevant) tests, I wouldn't add it to this PR because that would be scope-creep.

Copy link
Contributor

@gpetiot gpetiot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is a clear improvement indeed. Maybe in the future it would be neat to support multiline comments such as

(* $MDX
       part-end *)

I tend to prefer ocamllex over regexps so I'm a bit biased. Maybe not having to maintain the token flow line-by-line would make it easier to parse as well? But mdx is already designed this way in its current state so this can wait.

@Leonidas-from-XIV
Copy link
Collaborator Author

I would generally agree on ocamllex. However I kept the parsers as regexp for now since one of them is deprecated (and we should try to remove it for MDX 3, #396), at which point it should be easier to switch the other one to ocamllex instead of porting both now and deleting one later or having a mix of two different ways of parsing.

@Leonidas-from-XIV Leonidas-from-XIV merged commit cb9c545 into realworldocaml:main Sep 21, 2022
@Leonidas-from-XIV Leonidas-from-XIV deleted the mdx-markers branch September 21, 2022 09:37
Leonidas-from-XIV added a commit to Leonidas-from-XIV/opam-repository that referenced this pull request Jan 6, 2023
CHANGES:

#### Added

- Report all parsing errors in Markdown files (realworldocaml/mdx#389, @NathanReb)

#### Changed

- Preserve indentation in multiline OCaml blocks in .mli files (realworldocaml/mdx#395, @panglesd)

#### Fixed

- Fixed compatibility with Cmdliner 1.1.0 (realworldocaml/mdx#371, @Leonidas-from-XIV)
- Report errors and exit codes of toplevel directives (realworldocaml/mdx#382, @talex5,
  @Leonidas-from-XIV)
- Fix block locations in error reporting (realworldocaml/mdx#389, @NathanReb)
- Include the content of the line that features the `part-end` MDX directive in
  the output, before that line would've been dropped (realworldocaml/mdx#374, realworldocaml/mdx#387,
  @Leonidas-from-XIV)
- Handle EINTR signal on waitpid call by restarting the syscall. (realworldocaml/mdx#409, @tmcgilchrist)
- Fix parsing of multiline toplevel phrases in .mli files (realworldocaml/mdx#394, realworldocaml/mdx#397,
  @Leonidas-from-XIV)

#### Removed

- Removed warning about missing semicolons added in MDX 1.11.0 and the
  automatic insertion of semicolons in the corrected files introduced in MDX
  2.0.0. (realworldocaml/mdx#398, @Leonidas-from-XIV)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

$MDX delimiters in ocaml cut off any code on the same line
3 participants