[DRAFT] Continuation specialization, aka match-in-match #3501

Gbury · 2025-01-23T16:59:04Z

This PR adds to flambda the capacity to specialize continuations.

For now, it's meant for testing. Once testing is done, code will be cleaned up, and some parts will be split up and submitted as independent PRs for an easier review.

Context and overview

Following the continuation lifting introduced in #2295, this PR (finally) adds continuation specialization, i.e. given one continuation k with more than one call site, generate one specialized version of k for each call site. The trickiest part of that process is related to continuation lifting, which is needed to ensure that the minimal amount of code is duplicated when specializing one continuation (more on that later).

Consider an expression of the form:

let cont k x =
  let y = untag_int x in
  let cont k' z =
    ..
  in
  switch y with
  | 0 -> k' ...
  | 1 -> k' ...
in
switch .. with
| 0 -> k 0
| 1 -> k 1
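
For intuition, here is a source-level pattern that could plausibly give rise to a term of this shape: a match whose scrutinee is itself computed by a conditional. This is only an illustrative assumption, not an example taken from the PR or its tests; `classify` and `classify_specialized` are hypothetical names, and the second function is written by hand to show the kind of result one would hope for after specialization.

```ocaml
(* Hypothetical example. The conditional plays the role of the two call
   sites [k 0] and [k 1]; the match plays the role of the switch in the
   handler of [k]. *)
type t = A | B

let classify (b : bool) : string =
  let x = if b then A else B in (* two call sites of the join point *)
  match x with                  (* switch inside the handler *)
  | A -> "a"
  | B -> "b"

(* Hand-written equivalent of specializing the join point at each call
   site: in each copy the scrutinee is known, so the switch disappears. *)
let classify_specialized (b : bool) : string =
  if b then "a" else "b"
```

Both functions compute the same result; the second one no longer allocates or inspects the intermediate value `x`.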

The process for continuation specialization is the following.

1. When simplifying (downwards) the let_cont for k, we first simplify the body (as usual) and record all the uses of k (as usual).
2. Once the downwards pass on the body has been completed, we start the downwards pass on the handler of k.
3. After some simplifications, we simplify the let_cont for k' and start with the downwards traversal of the body of that let_cont.
4. We reach the switch at the end of the handler of k (i.e. the switch on y). At that point we have all the information we need to know whether specializing k will eliminate the switch (for the match-in-match optimization), so we can decide whether to specialize or not. Note: the criterion for deciding when to specialize is not implemented in this PR and will be done later; for now we always specialize if the lifting budget allows for it.
5. Let's assume we do decide to specialize k. We then also decide to lift any continuation inside the handler of k, to avoid duplicating those.
6. We have now finished the downwards pass on the body of the let_cont of k', and we know that we want to specialize k and thus also lift k'. So instead of simplifying the handler of k', we lift it by storing it in the dacc (this is what continuation lifting, #2295, does).
7. We now reach the down_to_up for the handler of k. We first pop the lifted continuations out of the dacc and craft a new down_to_up function that will simplify them. We can then start the specialization of k.
8. We generate the specialized versions of k by copying the original handler (before simplification) for each callsite, and then doing the downwards pass on each of these "new" continuations.
9. For now, we always completely specialize (i.e. generate exactly one continuation for each callsite), so we adjust the free names and other metadata to completely erase information coming from the downwards traversal of the original handler for k (in the future we might consider specializing only some callsites and keeping a "generic" version, but that will be for later).
10. During the downwards pass on the specialized versions of k, we must replace any bound continuation (whose name will be fresh, as the let_cont will be freshly opened by the downwards traversal) by its lifted version, and compute adequate lifted arguments. This is where we use the notion of Replay_history to relate the freshly bound continuation (and the lifted arguments) with the names from the first downwards traversal on k.
11. We record the mapping from callsites to the names of the new continuations, as that will be used later (in the data flow analysis and during rebuilding).
12. During data flow analysis, we rewrite the internal graphs using the mapping created at point 11 (to reflect the term that will be rebuilt).
13. When rebuilding terms, we rewrite the calls of k to refer to the specialized version, using the mapping from point 11.
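
The last part of the process (recording a mapping from callsites to new continuations, then using it to rewrite calls) can be sketched on a toy model. This is not the flambda2 data structure: continuations are plain strings, callsites are integers, and the names `call`, `specialize` and `rewrite` are hypothetical.

```ocaml
(* Toy model, assumed for illustration only. *)
type call = { site : int; cont : string; arg : int }

(* One fresh specialized continuation per callsite of [k]; the result is
   the mapping from callsites to specialized continuation names. *)
let specialize (k : string) (calls : call list) : (int * string) list =
  calls
  |> List.filter (fun c -> c.cont = k)
  |> List.mapi (fun i c -> (c.site, Printf.sprintf "%s_%d" k i))

(* When rebuilding, a call of [k] is redirected to the continuation that
   was specialized for its callsite; other calls are left untouched. *)
let rewrite (k : string) (mapping : (int * string) list) (c : call) : call =
  if c.cont <> k then c
  else
    match List.assoc_opt c.site mapping with
    | Some specialized -> { c with cont = specialized }
    | None -> c

let calls =
  [ { site = 10; cont = "k"; arg = 0 }; { site = 11; cont = "k"; arg = 1 } ]

let mapping = specialize "k" calls
let rewritten = List.map (rewrite "k" mapping) calls
```

With the two callsites of the running example, `rewritten` calls `k_0` and `k_1` instead of `k`.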

Replay History

The notion of replay_history is introduced in #3302 (not merged yet) precisely to make it possible to lift a continuation during the first downwards pass on k (the specialized continuation), and then refer to the lifted continuation in the specialized versions of k.

Let's detail a bit more why that is necessary, using the same example as above. When we perform the downwards pass on the specialized versions of k (let's call them k_0 and k_1), what the downwards traversals effectively observe is a term that would look like the following:

let cont k x =
  let y = untag_int x in
  let cont k' z =
    ..
  in
  switch y with
  | 0 -> k' ...
  | 1 -> k' ...
in
let cont k_0 x_0 =
  let y_0 = untag_int x_0 in
  let cont k'_0 z_0 =
    ..
  in
  switch y_0 with
  | 0 -> k'_0 ...
  | 1 -> k'_0 ...
in
let cont k_1 x_1 =
  let y_1 = untag_int x_1 in
  let cont k'_1 z_1 =
    ..
  in
  switch y_1 with
  | 0 -> k'_1 ...
  | 1 -> k'_1 ...
in
switch .. with
| 0 -> k_0 0
| 1 -> k_1 1

In particular, because opening a name abstraction automatically renames its bound names, we have no easy way to relate the lifted continuation k' to the corresponding continuations k'_0 and k'_1, except that each of them is the first continuation bound in the handler of k (or of its copy). Additionally, when lifting k', we add to it a lifted parameter y' standing for y, which was in scope at the original binding site of k' and must therefore remain available in its handler. This raises a second problem: we need to supply an argument for this new parameter at every callsite of k', including those in the specialized continuations k_0 and k_1, where its value will be y_0 and y_1 respectively. These can only be related to y (the value of the extra parameter y' in the original handler of k) because each is the first variable bound in its handler (and similarly for x, the first parameter of k).

Therefore, in order to correctly lift continuations out of the handler of specialized continuations, we need to be able to relate the names of parameters, variables and continuations that are bound in the handler of k between successive downwards traversals. More precisely, given a parameter/variable/continuation bound in the handler of a specialized version of k, we want to know the corresponding name generated during the first downwards pass (the one on the original handler of k).

That's where the notion of Replay_history comes from: basically, it's an ordered list of the successive parameters/variables/continuations that are bound in the handler of a given continuation. By creating and storing that list during the first pass (i.e. the downwards pass on the original handler of k), we can then on successive downwards passes establish a mapping from the names of the current (specialized) pass to the names of the original pass.
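
A minimal sketch of that idea, under simplifying assumptions: names are modeled as a base string plus a stamp, and the `history` type and the functions `define`, `replay` and `original_of` are hypothetical, not the interface of the actual Replay_history module.

```ocaml
(* Toy model: a name is a base string plus a unique stamp. *)
type name = { base : string; stamp : int }

type history =
  | Recording of name list
      (* first pass: names bound so far, most recent first *)
  | Replaying of name list * (name * name) list
      (* later pass: original names still to consume, and the mapping
         (current name, original name) built so far *)

let first_pass () : history = Recording []

(* Called each time a binder is opened during a downwards pass. *)
let define (n : name) (h : history) : history =
  match h with
  | Recording acc -> Recording (n :: acc)
  | Replaying (orig :: rest, map) -> Replaying (rest, (n, orig) :: map)
  | Replaying ([], _) -> failwith "replay history: too many binders"

(* Turn a finished recording into a history that can be replayed. *)
let replay (h : history) : history =
  match h with
  | Recording acc -> Replaying (List.rev acc, [])
  | Replaying _ -> invalid_arg "replay: already replaying"

(* Original name corresponding to a name of the current pass, if any. *)
let original_of (n : name) (h : history) : name option =
  match h with
  | Recording _ -> None
  | Replaying (_, map) -> List.assoc_opt n map

(* Running example: x, y and k' in the handler of k, then in k_0. *)
let x = { base = "x"; stamp = 1 }
let y = { base = "y"; stamp = 2 }
let k' = { base = "k'"; stamp = 3 }
let x_0 = { base = "x"; stamp = 4 }
let y_0 = { base = "y"; stamp = 5 }
let k'_0 = { base = "k'"; stamp = 6 }

let h_first = first_pass () |> define x |> define y |> define k'
let h_k_0 = replay h_first |> define x_0 |> define y_0 |> define k'_0
```

Here `original_of y_0 h_k_0` relates `y_0` back to `y` purely by position in the sequence of binders, which is the only information that survives renaming.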

The replay history tries to be as "secure" as possible, i.e. to guard against establishing a mapping between two downwards passes on different expressions. It raises an error if the order of the bound variables/continuations does not match, and it verifies that corresponding names are renamed versions of one another. This does not protect against every misuse, since in some situations two unrelated variables/continuations that happen to have the same name can be seen as renamed versions of one another, but it at least offers some measure of safeguard.
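
The check and its limitation can be illustrated as follows. This is a sketch under assumptions: the `kind`/`name` types, `same_renaming_class`, `relate` and the `Replay_mismatch` exception are hypothetical stand-ins, and "renamed version" is approximated by "same kind and same base name".

```ocaml
type kind = Var | Cont
type name = { kind : kind; base : string; stamp : int }

(* Approximation of "renamed versions of one another": same kind and
   same base name; only the stamp is allowed to differ. *)
let same_renaming_class (a : name) (b : name) : bool =
  a.kind = b.kind && String.equal a.base b.base

exception Replay_mismatch of name * name

(* Relate a name of the current pass to the original one, or fail. *)
let relate ~(original : name) ~(current : name) : name * name =
  if same_renaming_class original current then (current, original)
  else raise (Replay_mismatch (original, current))

let y = { kind = Var; base = "y"; stamp = 2 }
let y_0 = { kind = Var; base = "y"; stamp = 5 }

(* Two unrelated continuations without a name: the check cannot tell
   them apart, which is the weakness mentioned above. *)
let anon_a = { kind = Cont; base = ""; stamp = 7 }
let anon_b = { kind = Cont; base = ""; stamp = 8 }
```

In this model, relating `y` to `y_0` succeeds, relating `y` to `anon_a` raises, and relating `anon_a` to `anon_b` succeeds even though nothing guarantees they correspond.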

Lifted continuations

In the handler of the specialized continuations, we replace the handler of bound continuations by a call to the lifted continuation, and rely on the usual mechanism to compute the adequate arguments for the lifted parameters. This works seamlessly because the lifted_cont_params structure interacts with the replay_history to correctly map the unique key of lifted params (i.e. the name of the variables in the original downwards pass) to their name in the current pass.
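
As a rough illustration of that key-to-name translation, assume variables are plain strings and that a replay history can provide a map from original names to names of the current pass. `lifted_args` and the two example values are hypothetical and do not reflect the real lifted_cont_params interface.

```ocaml
module SMap = Map.Make (String)

(* Lifted parameters of the lifted continuation k', identified by the
   name of the variable seen during the original downwards pass. *)
let lifted_params_of_k' = [ "y" ]

(* Mapping from original names to names in the pass on k_0, as a replay
   history could provide it (assumed shape). *)
let renaming_in_k_0 = SMap.(empty |> add "x" "x_0" |> add "y" "y_0")

(* Arguments to pass for the lifted parameters at a callsite, expressed
   with the names of the current pass. *)
let lifted_args (renaming : string SMap.t) (params : string list) :
    string list =
  List.map
    (fun key ->
      match SMap.find_opt key renaming with
      | Some current -> current
      | None -> key (* original pass: the key itself is in scope *))
    params
```

A call to the lifted `k'` from inside `k_0` would then receive `lifted_args renaming_in_k_0 lifted_params_of_k'`, i.e. `["y_0"]`, in addition to its regular arguments.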

Conclusion

General remarks/comments/questions about how everything works are welcome. Detailed remarks about the code are likely to be of limited use until testing is done and the remaining bugs are fixed, contrary to remarks about the general design and organisation of the feature ^^

I'll ping the relevant persons when the PR is ready for reviews, but anyone should feel free to look at the code (keeping in mind that there's a fair bit of code cleanup to do).

Currently, any and all variables in scope, including parameters and extra parameters (coming from CSE and unboxing), are added to the lifted_cont_params. This is superfluous, since the handlers of lifted continuations do not contain any reference to these extra params prior to simplification, and these extra params can be recomputed for the lifted continuations. As it stands, this commit alone may lead to regressions, since we currently do not perform unboxing on the lifted_cont_params. This will be fixed in a later commit.

In some cases (mostly the upcoming continuation specialization for match-in-match), we are interested in doing multiple downwards passes on a single term. In such cases, it might be necessary to relate the names of variables and continuations from the first pass to those from the subsequent passes. The predominant use of that for now should be to correctly handle continuations that have been lifted during the first pass, and whose calls should be rewritten in subsequent passes (including their lifted arguments).

This is done by storing the sequence of names generated when opening name abstractions during the first downwards pass. Note that we only need to do so for names present in the term before the downwards traversal, so we can skip any extra param added by Simplify. On subsequent passes, each new variable and continuation "consumes" an element of that sequence, which makes it possible to establish the correspondence between names. As a safety measure, we check that corresponding names are in the same "renaming equivalence class", though note that this is not a guarantee (for instance, by default all continuations have no names and therefore will all be in the same "renaming equivalence class").

Additionally, and this is more specific to continuation specialization but generalizable to other future uses, it is necessary to have some kind of replayability of inlining decisions. Indeed, each inlining decision will change the sequence of bound names that are opened during the downwards pass, which would currently break the hypothesis of replay histories that exactly the same sequence of binders is opened. In the case of match-in-match this is simple: the handler that we want to specialize ends with a switch, which means that any call inside the handler has been inlined, so the replay history has a boolean to denote that we want to inline everything while replaying the handler downwards pass. For other more complex uses, inlining decisions could be stored alongside the bound variables and continuations, so that they can be replayed, either as is, or within some notion of compatibility.

Finally, we also need to make it so that recursive continuations are bound in an order that is stable through renaming, so we change the map of continuations to an Lmap for recursive continuation bindings. In the future, this replay mechanism would also be useful to do widening for recursive continuations (but that's a far away future).
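
To see why a key-ordered map is a problem for binding order, here is a small sketch. It assumes continuations are identified by integer stamps and models an Lmap as a plain association list that keeps insertion order; both are simplifications for illustration, and `map_order` and `list_order` are hypothetical helpers.

```ocaml
module IMap = Map.Make (Int)

(* Two recursive continuations, in the order in which they are bound. *)
let original = [ (5, "first"); (3, "second") ]

(* The same bindings after a renaming that assigned fresh stamps. *)
let renamed = [ (6, "first"); (7, "second") ]

(* Order in which a key-ordered map enumerates the bindings. *)
let map_order (bindings : (int * string) list) : string list =
  List.fold_left (fun m (k, v) -> IMap.add k v m) IMap.empty bindings
  |> IMap.bindings
  |> List.map snd

(* Order in which an insertion-ordered list enumerates the bindings. *)
let list_order (bindings : (int * string) list) : string list =
  List.map snd bindings
```

In this example `map_order` gives a different order before and after renaming, since it depends on the stamps, while `list_order` gives the same order both times, which is the property a replay needs.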

This should have no observable effect, but will simplify the continuation specialization work, particularly when paired with the replay histories feature, so that lifted cont params can be correctly tracked between a first pass and subsequent passes during continuation specialization.

Interestingly, the blind specialization currently done (since we do not yet have a heuristic) actually removes the errors on some tests, which makes the tests fail. We therefore disable specialization on these tests, and we'll see once we have a proper heuristic whether we change/promote the tests.

TODO: Revert this before merging the branch/PR. This will allow testing the lifting and continuation specialization without having to alter cli arguments or env variables.

Gbury added the flambda2 (Prerequisite for, or part of, flambda2) and match-in-match (prerequisites, or part of, match-in-match) labels Jan 23, 2025

Gbury force-pushed the cont_spec branch 2 times, most recently from 03666f7 to e993d77, February 12, 2025 17:00

Gbury force-pushed the cont_spec branch 2 times, most recently from a55c03d to 6efd2bf, February 24, 2025 08:21

Gbury added 7 commits February 27, 2025 13:48

Continuation specialization

8b62801

[for test only] Set non-zero default lifting&spec budget

f98f8f0

Promote test

fcb9637

Gbury force-pushed the cont_spec branch from 8184959 to fcb9637, February 27, 2025 13:14
