Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate 'visitors' and 'alphalib' for building SugarTraversals? #255

Closed
yallop opened this issue Sep 18, 2017 · 21 comments
Closed

Investigate 'visitors' and 'alphalib' for building SugarTraversals? #255

yallop opened this issue Sep 18, 2017 · 21 comments

Comments

@yallop
Copy link
Member

yallop commented Sep 18, 2017

The SugarTraversals module module was once generated automatically from the datatype definition by a Camlp4 extension. Nowadays it seems to be maintained by hand.

It may be worth investigating @fpottier's visitors library for generating visitor classes, and alphaLib library for dealing with binding, which are both described in Visitors Unchained (ICFP 2017), and which support a principled approach to traversing ASTs with binding using modern OCaml tools such as ppx, with the aim of returning to a situation where SugarTraversals no longer needs to be updated by hand.

@jamescheney jamescheney added this to the 0.8 milestone Dec 7, 2017
@jamescheney
Copy link
Contributor

We should do this for 0.8.

@jamescheney jamescheney removed this from the 0.8 milestone Dec 7, 2017
@fpottier
Copy link

fpottier commented Dec 7, 2017

Hi! visitors is released and fully documented.
alphaLib is not really released or documented,
but I'd be happy to answer questions about it.

@dhil dhil added this to the 0.8 milestone Dec 7, 2017
@dhil
Copy link
Member

dhil commented May 10, 2018

Large portions of the code base depends on the visitors defined in SugarTraversals. To assess whether we want to use these libraries to derive visitors, it may be worthwhile, as a starting point, to use them to define visitors for the backend IR. Contrary to the front-end AST, the back-end IR has only a few (unused) optimisation passes/visitors, meaning that we can rapidly prototype the libraries for the IR. It may relevant for @frank-emrich's work on type checking the IR.

@jamescheney
Copy link
Contributor

jamescheney commented May 10, 2018 via email

@frank-emrich
Copy link
Contributor

Frank agrees :)
There are currently only three traversals of the IR, including the type checking. I'm happy to try out how it would look like to use the visitors library for this.

@frank-emrich
Copy link
Contributor

I have now had a more detailed look at the visitors library/preprocessor and identified a few problems:

  1. The library does not support polymorphic variants. However, virtually all of Links (the surface syntax, IR syntax, types,...) is represented by polymorphic variants. I don't know if this is really necessary or if we could switch to "normal" variants, but it would require a lot of work to perform those changes.

  2. The library cannot be used for .mli files, i.e. it cannot create type definitions for generated visitor classes, only the implementation of the class. The library is designed to emit code traversing a particular type, but the emitted code must have its types inferred. In particular, the types are not known to the visitors library when emitting visitor classes (so this limitation won't be fixed anytime soon).

We could try to navigate around this as follows, but neither of these solutions seems desirable to me:

  • Once a visitor class has been generated, we could have its type inferred and paste the result into the .mli file. This would have to be repeated each time we change the underlying type.
  • We could stop using .mli files for those modules that include a generated visitor. Since this involes modules like Ir and Types, this does not seem like a good idea.
  • We could put all of our customized visitor implementations (i.e., classes inheriting from the generated visitor) into the same file as the type and the generated visitor. This may work for the IR, where we only have a few, small traversals/visitors, but it would not be feasible for the sugar traversals in the frontend.
  1. I can't get the visitors preprocessor to work with our current jbuilder/dune build setup. In theory, it is implemented as a plugin for ppx_deriving, which should make the integration trivial. However, the interplay between jbuilder and ppx_deriving seems to be a mess right now (cf. understanding the current mess ocaml-ppx/ppx_deriving#153).
    There has recently been a commit to the visitors library claiming to add support for jbuilder by someone other than Francois Pottier. So if we decide to use the library, I could email that person and ask how they made it work.

@fpottier
Copy link

Hello @frank-emrich! Concerning (1), you are correct: I have not implemented support for polymorphic variants in visitors, mainly because I was not sure there was a need for it, and also because I am not familiar with polymorphic variants. I suppose it could in principle be done.

Concerning (2), you are also correct, that is, the auto-generated visitor classes are just code. They are intended to appear in an .ml file which has no corresponding .mli file. The inferred class types would be huge anyway -- typically larger than the auto-generated code, and harder to read than the code. My recommendation would be to place just (a copy of) the type definition of the interest, with a deriving visitors incantation, in an .ml file, with no .mli file. Sometimes this can be done without actually duplicating the type definition (see how ppx_import is used in section 2.14 of the manual). Sometimes the type definition is manually copied and possibly decorated with attributes that influence the generation of the visitor classes; see e.g. how the Rotor people at Kent create visitors for the OCaml AST.

@frank-emrich
Copy link
Contributor

Thanks for your help, @fpottier!
Regarding (1): It's good to know that this seems to be more of a technical limitation rather than a general one.

Regarding (2): Your idea of "copying" (using ppx_import) existing type definitions into a dedicated file to create the visitor there sounds very nice and much better than my workarounds above. Thanks for that!

@jamescheney
Copy link
Contributor

I think we have decided not to try to do this for 0.8 so untagging for now.

@jamescheney jamescheney removed this from the 0.8 milestone Jun 15, 2018
@jamescheney
Copy link
Contributor

After further investigation by @jstolarek it seems that we are making use of polymorphic variants in nontrivial (or at least hard-to-remove) ways, so we can't easily migrate to using ordinary datatypes or alphalib/visitors. While it still might make sense for the IR, the real benefit would be auto-generating the much larger traversals for the front-end ASTs. We could revisit this at some point.

@dhil
Copy link
Member

dhil commented Dec 12, 2018

After further investigation by @jstolarek it seems that we are making use of polymorphic variants in nontrivial (or at least hard-to-remove) ways

I think it is only true for type algebra in types.ml. I don't think that it is the case for the term algebra sugartypes.ml. Nevertheless, it might be a tedious task to migrate the code. Incidentally, I would be interested in whether there is any practical compile-time performance gain from using ordinary variant types -- in theory there is, as the compiler can use clever decision tree encodings of match expressions.

We have previously discussed whether we should collapse the type algebra into a single monolithic algebra. I think it is worth revisiting this idea. In doing so, we could opt to use ordinary variant types, and then we could use Pottier's visitor library to generate all the boilerplate code.

@jstolarek
Copy link
Contributor

I don't think that it is the case for the term algebra sugartypes.ml.

sugartypes.ml has some benignly-looking data types like:

type location = [`Client | `Server | `Native | `Unknown]
    [@@deriving show]
type restriction = [ `Any | `Base | `Session | `Effect ]
    [@@deriving eq,show]
type linearity   = [ `Any | `Unl ]
    [@@deriving eq,show]
type freedom = [`Flexible | `Rigid]
    [@@deriving show]
type primary_kind = [`Type | `Row | `Presence]
    [@@deriving show]
type fieldconstraint = [ `Readonly | `Default ]
    [@@deriving show]

IIRC I started my attempts of refactoring with location and, after failing, freedom. I am not saying the refactoring is impossible - it certainly is - but in many places I've run into problems that went beyond simply removing ticks, e.g. coercions in types.ml:

match Unionfind.find point with
| `Var (var, _, freedom) -> [var, ((freedom :> flavour), `Presence, `Free)]

It might be a simple refactoring for someone more experienced with OCaml and with better familiarity of Links source code. But it wasn't simple for me.

@frank-emrich
Copy link
Contributor

I have played around with the visitors library once again to see if Janek's work on removing polymorphic variants has enabled us to use it.

No support for our style of accumulation via the visiting object

One major obstacle that would prevent us from porting our existing traversals based on SugarTraversals is the following: Our fold and fold_map classes use the traversal object as an accumulator during the traversal. In the fold class, this looks as follows:

| TryInOtherwise (_p1, _pat, _p2, _p3, _ty) ->
          let o = o#phrase _p1 in
          let o = o#pattern _pat in
          let o = o#phrase _p2 in
          let o = o#phrase _p3 in
          o

Similarly, in the fold_map class, it looks like this:

| TryInOtherwise (_p1, _pat, _p2, _p3, _ty) ->
          let (o, _p1) = o#phrase _p1 in
          let (o, _pat) = o#pattern _pat in
          let (o, _p2) = o#phrase _p2 in
          let (o, _p3) = o#phrase _p3 in
          (o, (TryInOtherwise (_p1, _pat, _p2, _p3, _ty)))

However, the visitors library has no support for this kind of traversal, where either the updated object or some "accumulator" is returned from each sub-traversal and passed to the next. The library does offer a different style of generated folds and also a "reducing" traversal, but I assume that it would require a lot of effort to adapt all of our existing traversals to this style.

Naming conventions

The visitors library makes fewer hard-coded assumptions about how things should be named. This means that we need to provide many annotations to specify names and where to look for other visitors.

  1. While the deriver for printing functions will automatically create functions show_foo and pp_foo if used for a type foo, the visitors library doesn't have such a convention. A map traversal will by default yield a class named map. This is justified because it may be a map for a group of mutually recursive types, without an obvious candidate to name the class after. On the other hand, it means that we need to specify the names manually in many cases.

  2. We have to explicitly specify where to look for the existing visitors for others types when requesting a new one by specifying the "ancestors" of a generated traversal class.

In summary, this results in a load of syntactic noise. For example

type kind = PrimaryKind.t  * subkind option [@@deriving show]

becomes

type kind = (PrimaryKind.t [@name "primary_kind"]) * subkind option
    [@@deriving show,
     visitors { name = "map_kind"; variety = "map"; ancestors = ["map_name"; "PrimaryKind.map"; "map_subkind"]},
     visitors { name = "mapreduce_kind"; variety = "mapreduce"; ancestors = ["mapreduce_name"; "PrimaryKind.mapreduce"; "mapreduce_subkind"] },
     visitors { name = "fold_kind"; variety = "fold"; ancestors = ["fold_name"; "PrimaryKind.fold"; "fold_subkind"]}]

We may however find a solution for moving these noisy annotations into dedicated files by using ppx_import. We would use it to duplicate our type definitions into separate files and only add the deriving annotations there.

@jamescheney
Copy link
Contributor

The second problem doesn't seem so bad, if it means replacing hundreds lines of boilerplate with tens of lines of boilerplate.

For the first problem, so how does visitors allow for accumulating traversals? Does it assume that the traversing object maintains mutable state? I don't think that would be much of a stretch, as the current way this is done is basically the Haskell way of making the mutable state explicit, and there is no particular reason to do this in an ML-like language. Alternatively, perhaps it wouldn't be too hard to modify visitors to produce pure fold-like traversals, by transforming from implicitly side-effecting to explicitly state-passing style.

@frank-emrich
Copy link
Contributor

so how does visitors allow for accumulating traversals?

Mutual state is one solution, indeed. Another related solution is a "reduce" visitor, which would yield code similar to this:

| TryInOtherwise (_p1, _pat, _p2, _p3, _ty) ->
          let r1 = o#phrase env _p1 in
          let r2 = o#pattern env _pat in
          let r3 = o#phrase env _p2 in
          let r4 = o#phrase env _p3 in
         o#plus (o#plus (o#plus r1 r2) r3) r4

where plus is a dedicated combination method that one must implement. Note that this does something slightly different, as it doesn't pass the result of the first sub-traversal to the second one.

The generated code always passes an environment around, but again not in the "accumulating" fashion we currently have in SugarTraversals (shown in the code above). Finally, there are the generated fold traversals which traverse the sub-terms of each constructor Foo and then call a virtual build_Foo method using the intermediate results.

In any case, I assume that we could express each of the traversals we currently have in terms of one generated by the visitors library. However, I'm worried that this would be more of a careful re-write rather than quickly applying a series of straightforward modifications to our existing code. Simply because there is no generated drop-in replacement for the exact structure of traversals we currently have.

@fpottier
Copy link

For the first problem, so how does visitors allow for accumulating traversals? Does it assume that the traversing object maintains mutable state?

If you need to accumulate state during a traversal, then either the visitor object or the environment should be made mutable.

So far, I have decided not to generate visitor classes where a state is explicitly threaded, as that would multiply the number of visitor variants, without an obvious gain.

@fpottier
Copy link

In summary, this results in a load of syntactic noise.

I can see how the visitors version is currently very verbose. I don't know offhand how to reduce this noise, but if you have suggestions of simple changes that I could make in visitors to reduce this noise, please do send them.

@jamescheney
Copy link
Contributor

Thanks @fpottier !

I think the above comments aren't intended as criticism of visitors per se, just reflect that it would still involve some work / boilerplate to modify Links to use visitors due to the fact the corresponding Links code was developed independently and thus the Haskell-style accumulation traversals and naming conventions are different (not better or worse).

@jamescheney
Copy link
Contributor

"reduce" visitor

I think the reduce visitor can simulate a state-threading one by accumulating state -> state functions and taking plus to be function composition. Then you take the end result, again a state -> state function, and apply it to the initial state.

@fpottier
Copy link

I think the reduce visitor can simulate a state-threading one

Yes, that should work. It would be a bit inefficient (essentially building a tree of closures whose shape is the same as that of the original tree; therefore doing one tree copy which in principle is not needed).

I have published a blog post on a related use of a reduce visitor to construct a tree iterator.

@jamescheney
Copy link
Contributor

Cool! I think we are already allocating potentially one new object per intermediate state, which is also more inefficient than in-place updates would be (though perhaps most of the "new" states are aliases of the previous one, and maybe the past states can be GCd more eagerly than the tree of closures). I'd rather not have to rewrite a lot of explicit state-passing code to use implicit mutation all at once though, seems like a recipe for introducing hard-to-find bugs...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants