OCaml formatting #16
Comments
Cc @Niols |
I want to clarify that it is not an objective to match the result of either of these tools (not that it's a bad idea to look at these tools, I just wanted to prevent ambiguities). |
I believe it would be a great and exciting idea to target OCaml in this project:
I had started writing down a few considerations about formatters, and about what I'd expect from an OCaml formatter in particular (I wanted to start writing an “Ormolu-for-OCaml”, but this pretty much gets superseded by tree-sitter-formatter). This is not exactly the place to discuss this, but it might help define what we want to expect (or not) from tree-sitter-formatter's OCaml query file. I'll first share an opinion on what makes a “good” formatter for a rich language like OCaml (but this applies, in my opinion, to many others). After that, I will share a few random notes on things to pay attention to. I can't wait to see how this project unfolds!

OCamlFormat vs. Ormolu

In my (very personal) opinion, formatters should be opinionated but they should take into account the choices the user made in their input. Basically, I am saying that Ormolu is getting right what OCamlFormat is getting wrong. I personally have never managed to use OCamlFormat in a satisfactory way and I truly believe that it comes from the following two points:
My tests of OCamlFormat were therefore always along the lines of:
I haven't done a survey of the OCaml ecosystem, but I know I share this kind of experience with at least some OCaml programmer friends. In my opinion, Ormolu makes two points that are crucial for a formatter to be actually used (and that OCamlFormat gets wrong):
I am a bit afraid that those might be difficult to achieve with tree-sitter-formatter, though:
Random Considerations

Some of these considerations are specific to OCaml formatting, while others apply to formatting in general. They are also my opinion, not an absolute truth.
|
Great input! We are going to follow Ormolu's principle of letting the user input decide whether a block should be single-line or multi-line, and also of staying opinionated. It remains to be seen if tree-sitter-formatter will be able to give you satisfactory OCaml formatting, but hopefully the query language is powerful enough to let us. I'm curious, though, when you say "taking the user's input into consideration", are you just referring to line breaks, or more? Because detecting if the user wants to break a block into multiple lines is indeed within the scope of this project. See #14. There doesn't seem to be any tree-sitter grammar for lex and yacc, unfortunately, but maybe you should just use tree-sitter instead of those 😛 There is some support for ReasonML, though. |
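To make the single-line vs. multi-line rule above concrete, here is a minimal sketch in OCaml. It is not how tree-sitter-formatter implements it; the names `spans_lines` and `layout`, and the idea of passing byte offsets and pre-formatted children, are assumptions made only for illustration.

```ocaml
(* Hypothetical helper: true when the input text between the two byte
   offsets (start inclusive, stop exclusive) contains a line break. *)
let spans_lines src ~start ~stop =
  let rec go i = i < stop && (src.[i] = '\n' || go (i + 1)) in
  go start

(* Hypothetical layout rule: if the user broke the block over several
   lines, emit one child per line; otherwise keep it on a single line. *)
let layout ~indent src ~start ~stop children =
  if spans_lines src ~start ~stop then
    let pad = String.make indent ' ' in
    String.concat "\n" (List.map (fun c -> pad ^ c) children)
  else String.concat " " children
```

The formatter stays opinionated about spacing and indentation, and only the choice between the two layouts is taken from the input.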
Very nice! I mostly meant that, although I've always been wondering if it wouldn't be worth also detecting users that prefer operators at the beginning or the end of a line, e.g.

    do_something
    |> do_something_else

    do_something |>
    do_something_else

or stuff like

    do_something @@ fun x ->
    do_something_else

    do_something
    @@ fun x -> do_something_else

or in types:
etc. I am not sure if it is possible to just get it right. Besides, it might depend a lot on the user and even on the operator (I tend to use |
I suppose it would not be the worst! But a lot of people are using those, so I suppose it would make sense to support them (at least in Tree Sitter-based syntax highlighting). I know there's been some work on supporting OCamlLex, OCamlYacc and Menhir in Tree Sitter, but I don't know how stable that is. |
@Niols How would you expect a formatter to treat this:
Specifically, would this be a good output?
I don't even know if that last semicolon is allowed. |
The last semicolon is allowed and I suppose it makes sense to add it as per the rule to avoid diffs. I tend to go for:

    type t = {
      mutable buffer : bytes;
      position : int;
    }

and that's what Tuareg does. Out of the box,

    type t = {
      mutable buffer : bytes;
      position : int;
    }

In any case, I wouldn't align |
Let me make it clearer that what Tuareg and

    type t =
      { mutable buffer : bytes;
        position : int }

but I suppose we can rule that out as per the diff rule. I also suppose that, for short records, a user might want to go for:

    type t = { mutable buffer : bytes; position : int }

so maybe there is a vertical vs. horizontal layout consideration here. |
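The two record layouts discussed in this thread can be sketched as a small printer. This is only an illustration under stated assumptions: the `field` type and the functions `field_to_string` and `record_to_string` are hypothetical and are not part of tree-sitter-formatter or any OCaml tool.

```ocaml
(* Hypothetical description of one record field. *)
type field = { name : string; typ : string; is_mutable : bool }

let field_to_string f =
  (if f.is_mutable then "mutable " else "") ^ f.name ^ " : " ^ f.typ

(* Horizontal: everything on one line, no trailing semicolon.
   Vertical: one field per line, each ending with ';', so that adding a
   field later only touches one line of the diff. *)
let record_to_string ~multiline name fields =
  let fs = List.map field_to_string fields in
  if multiline then
    "type " ^ name ^ " = {\n"
    ^ String.concat "" (List.map (fun f -> "  " ^ f ^ ";\n") fs)
    ^ "}"
  else "type " ^ name ^ " = { " ^ String.concat "; " fs ^ " }"
```

In this sketch the `multiline` flag would come from the user's input, and the final semicolon is normalised in the vertical layout only.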
(I should have answered this more clearly.)
Yes! |
That last one is allowed, because the user decides between single/multi-line. The |
That's definitely a good reason to go for |
Yes, either that, or we simply extend our query to support Tuareg. Anyway, let's see what other challenges I run into first. Such as this kind of indenting:
|
I'm under the – possibly naive – belief that Let's normalise the final semicolon, please. |
This should format to

    let to_seq b =
      let rec aux i () =
        if i >= b.position then
          Seq.Nil
        else
          let x = Bytes.unsafe_get b.buffer i in
          Seq.Cons (x, aux (i+1))
      in
      aux 0

|
I'd go exactly like Arnaud on this one. |
I think so. And OCaml is definitely used outside of Emacs a lot nowadays.
Yes! |
I added #42. |
Do you want to put a hard line break after if i >= b.position then Seq.Nil or do you consider this a soft line that the user has decided to be a single line? |
I think that

    if u then
      v
    else
      w

|
Thing is that in the CST it is structured like this:
By the multi-line rule everything before If you really want a line break there, it is up to you as the programmer to put it there, isn't it? |
Another thing: ">=" and "+" are both |
I'm not sure I understand the CST considerations. But does that mean that all four of

1)

    if u then let x = v1 in v2
    else let y = w1 in w2

2)

    if u then let x = v1 in v2
    else
      let y = w1 in
      w2

3)

    if u then
      let x = v1 in
      v2
    else let y = w1 in w2

4)

    if u then
      let x = v1 in
      v2
    else
      let y = w1 in
      w2

are possible depending on the user's choice? It sounds reasonable, especially if it makes for a simpler query, but 2) and 3) look quite odd to me. |
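One way to picture why all four variants fall out of the same rule is to let each branch decide its own layout independently. The sketch below assumes a hypothetical `branch` record carrying a flag for whether the user broke the line after `then` or `else`; it does not reflect the actual CST or query file.

```ocaml
(* Hypothetical model of one branch: its body lines, and whether the
   user's input had a line break right after the keyword. *)
type branch = { broke_after_keyword : bool; body : string list }

(* A branch is either kept on the keyword's line or indented below it. *)
let render_branch keyword_line b =
  if b.broke_after_keyword then
    keyword_line :: List.map (fun l -> "  " ^ l) b.body
  else [ keyword_line ^ " " ^ String.concat " " b.body ]

let render_if cond then_ else_ =
  String.concat "\n"
    (render_branch ("if " ^ cond ^ " then") then_
     @ render_branch "else" else_)
```

Because the two flags are independent, the mixed layouts appear whenever the user breaks only one of the branches.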
Let me check as soon as I am done with the example further up 😄 |
Currently getting this:

    let to_seq b =
      let rec aux i () =
        (* Note that b.position is not a constant and cannot be lifted out of aux *)
        if i >= b.position then Seq.Nil
        else
          let x = Bytes.unsafe_get b.buffer i in
          Seq.Cons (x, aux (i + 1))
      in
      aux 0

I think that's pretty close 🎉 |
Pretty good indeed! |
I guess that if we want to allow both (1) and (4) from @Niols (which is a reasonable choice) we also need to allow (2) and (3). Fair enough. |
I'm now able to format the entire OCaml sample file reasonably well. Not perfectly. But we can start logging specific formatting issues now, I think. @Niols Your four examples are formatted as expected, see https://github.com/tweag/tree-sitter-formatter/pull/53/files#diff-4b9fd3e2640276ee705e5851f570dca4d2cf44e2884a6c3038ed788cb3a84a85 |
Look at how these do it:
https://github.com/OCamlPro/ocp-indent
https://github.com/ocaml-ppx/ocamlformat