[WIP] analyze: refactor mir_op to explicitly track per-subloc info #1191

spernsteiner · 2024-12-17T01:05:06Z

This is a WIP refactor of mir_op. I don't have time to finish it at the moment, but I'm posting this PR and including some notes here so it doesn't get lost. Currently it works on trivial examples like offset1, but fails on more interesting ones like algo_md5. It mostly seems to be failing while trying to produce nonsensical casts, but there are also a lot of unimplemented Callee cases in the new mir_op that will surely cause other problems later on.

This branch refactors the MIR rewrite generation pass (rewrite::expr::mir_op) to separate LTy/TypeDesc handling from the actual generation of casts and other rewrites. It divides mir_op into three separate passes: the first collects type and other metadata for each MIR node, the second determines which casts are needed to produce a well-typed program after type rewriting, and the third inserts the casts and any other necessary rewrites.

These passes work on a representation called SublocInfo. A "subloc" or "node" is a piece of MIR at finer granularity than a Location. For example, given the statement _2 = Use(move _1), a SubLoc path can refer to the whole statement, the destination place _2, the rvalue Use(move _1), the operand move _1, or the place _1. Each of these can have its own SublocInfo that describes its type and other information about the surrounding context or how it can be used.
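As a rough illustration of how a subloc path picks out nodes within a single statement, here is a minimal, self-contained sketch. The `Node`, `Step`, `resolve`, and `example_stmt` names are made up for this example and are not the actual types in the branch; the toy tree only models the statement _2 = Use(move _1).

```rust
/// Toy model of a MIR statement tree (hypothetical, not rustc's MIR types).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// `dest = rvalue`
    Assign { dest: Box<Node>, rvalue: Box<Node> },
    /// `Use(operand)`
    Use(Box<Node>),
    /// `move place`
    Move(Box<Node>),
    /// A local such as `_1`.
    Local(u32),
}

/// One step of a subloc path (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Dest,
    Rvalue,
    RvalueOperand,
    OperandPlace,
}

/// Follow `path` from `root`. The empty path refers to the whole statement.
/// Returns `None` if a step does not apply to the node it is used on.
fn resolve<'a>(root: &'a Node, path: &[Step]) -> Option<&'a Node> {
    let mut cur = root;
    for step in path {
        cur = match (cur, step) {
            (Node::Assign { dest, .. }, Step::Dest) => &**dest,
            (Node::Assign { rvalue, .. }, Step::Rvalue) => &**rvalue,
            (Node::Use(op), Step::RvalueOperand) => &**op,
            (Node::Move(place), Step::OperandPlace) => &**place,
            _ => return None,
        };
    }
    Some(cur)
}

/// Build the statement `_2 = Use(move _1)`.
fn example_stmt() -> Node {
    Node::Assign {
        dest: Box::new(Node::Local(2)),
        rvalue: Box::new(Node::Use(Box::new(Node::Move(Box::new(Node::Local(1)))))),
    }
}
```

With this model, the five nodes listed above correspond to the paths `[]`, `[Dest]`, `[Rvalue]`, `[Rvalue, RvalueOperand]`, and `[Rvalue, RvalueOperand, OperandPlace]`.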

The three new passes in more detail:

SublocInfo collection: This pass computes the "new type" of each node, which is the type it would have after the types of all defs and locals are rewritten to match their LTys. This can produce inconsistent results, such as giving the LHS and RHS of an assignment different types. This pass also records other metadata, such as the access mode (imm or mut) for Places.
SublocInfo typechecking: This pass checks for inconsistencies and computes the "expected type" of each node, which is the type it should have in order to make it usable in the surrounding context. By default, the node's expected type is identical to its new type, but it may be changed to resolve a type error. For example, in an inconsistent assignment (where the LHS and RHS have different new types), the expected type of the RHS will be set to match the new (and expected) type of the LHS. There are also some cases, mostly around special functions like offset, where this pass will adjust a node's new type instead of its expected type.
Rewrite generation: This pass adds casts around any node whose expected type doesn't match its new type, and also adds rewrites for special functions like offset. This is similar to the behavior of the existing mir_op pass, but it's driven entirely by SublocInfo entries, rather than directly consulting LTys.
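To make the division of labor concrete, here is a small sketch of the three passes applied to a single assignment. It is a simplified model, not the code in this branch: `Ty`, `Info`, `Cast`, `Assign`, and the three function names are hypothetical, and the real passes handle many more MIR constructs.

```rust
/// Toy type universe (hypothetical): a raw pointer or a safe reference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    RawPtr,
    Ref,
}

/// Per-node info: the type after rewriting, and the type the context needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Info {
    new_ty: Ty,
    expect_ty: Ty,
}

/// A cast that the rewrite pass would insert around a node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Cast {
    from: Ty,
    to: Ty,
}

/// The two sides of an assignment `lhs = rhs`.
struct Assign {
    lhs: Info,
    rhs: Info,
}

/// Pass 1 (collection): record new types. The expected type defaults to
/// the new type. The two sides may disagree at this point.
fn collect(lhs_new: Ty, rhs_new: Ty) -> Assign {
    let info = |ty| Info { new_ty: ty, expect_ty: ty };
    Assign { lhs: info(lhs_new), rhs: info(rhs_new) }
}

/// Pass 2 (typechecking): resolve an inconsistent assignment by making the
/// RHS's expected type match the LHS's new type.
fn typecheck(a: &mut Assign) {
    if a.rhs.new_ty != a.lhs.new_ty {
        a.rhs.expect_ty = a.lhs.new_ty;
    }
}

/// Pass 3 (rewrite generation): emit a cast for every node whose expected
/// type differs from its new type. Only `Info` is consulted here.
fn gen_casts(a: &Assign) -> Vec<Cast> {
    [a.lhs, a.rhs]
        .iter()
        .filter(|i| i.new_ty != i.expect_ty)
        .map(|i| Cast { from: i.new_ty, to: i.expect_ty })
        .collect()
}
```

In this model, `collect(Ty::Ref, Ty::RawPtr)` followed by `typecheck` yields a single cast from `RawPtr` to `Ref` on the RHS, while an assignment whose two sides already agree yields no casts at all.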

Advantages of the new design:

Easier debugging: In case of a bad rewrite, we'll be able to inspect the SublocInfos to determine whether it's an issue with the rewrite itself or with SublocInfo generation.
Better targetability: This approach should make it easier to suppress rewrites in parts of the code where they're not wanted. Specifically, if a group of nodes have their new types set to match their old, unrewritten types, then there will be no inconsistencies detected in the typechecking pass, the expected types will all be set to match the new types, and no casts will be inserted.
Decoupling from analysis: Only the SublocInfo collection phase interacts directly with analysis results (LTys). This means we could implement an alternate version of that pass with a different strategy for determining new types, while reusing all the rest of the rewriting machinery.

Limitations:

Handling of each special function is now spread across the three passes. The three pieces for each function are tightly coupled (in many cases there are comments along the lines of "the rewriting pass will do X, so here in collection/typechecking we can do Y"). This can probably be refactored to put the collection, typechecking, and rewriting logic for each function in one place and have the passes dispatch to the appropriate code for each Callee they encounter.
A similar issue applies to non-function MIR constructs. This is somewhat inherent to the design, as we need the two SublocInfo passes to only request casts that the rewriting pass can handle.
To further improve targetability, we should be more explicit about which calls to special functions should be fully rewritten (e.g. converting offset to a subslice operation) and which should be left alone. Currently this is handled in a roundabout way: some of the inputs and/or outputs of the function are marked FIXED in the analysis, so their new types are left as raw pointers, and the rewriting pass knows to skip the normal rewrite if it sees raw pointers there.
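One possible shape for the per-function refactor mentioned in the first limitation is a handler trait with one method per pass, so that the logic for each Callee lives in one place. This is only a sketch under assumed names: `CalleeHandler`, `OffsetHandler`, `DefaultHandler`, `handler_for`, and the toy `Callee` and `Ty` enums are invented here and do not exist in the branch.

```rust
/// Toy stand-ins (hypothetical) for the callee kinds and types involved.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Callee {
    PtrOffset,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    RawPtr,
    Slice,
}

/// All three pieces of per-callee logic, grouped in one trait.
trait CalleeHandler {
    /// Collection: the new type of the pointer argument.
    fn collect(&self, analysis_ty: Ty) -> Ty {
        analysis_ty
    }
    /// Typechecking: the expected type of the pointer argument.
    fn typecheck(&self, new_ty: Ty) -> Ty {
        new_ty
    }
    /// Rewriting: name of the rewrite to apply, or `None` to leave the call alone.
    fn rewrite(&self, _new_ty: Ty) -> Option<&'static str> {
        None
    }
}

/// Handler for `offset`: rewrite to a subslice operation, except when the
/// argument is still a raw pointer (e.g. because it was marked FIXED).
struct OffsetHandler;

impl CalleeHandler for OffsetHandler {
    fn rewrite(&self, new_ty: Ty) -> Option<&'static str> {
        match new_ty {
            Ty::Slice => Some("subslice"),
            Ty::RawPtr => None,
        }
    }
}

/// Fallback handler: uses the trait defaults and never rewrites.
struct DefaultHandler;

impl CalleeHandler for DefaultHandler {}

/// Each pass would call this and then invoke only its own method.
fn handler_for(callee: Callee) -> &'static dyn CalleeHandler {
    match callee {
        Callee::PtrOffset => &OffsetHandler,
        Callee::Other => &DefaultHandler,
    }
}
```

The raw-pointer check inside `OffsetHandler::rewrite` mirrors the roundabout FIXED-based mechanism described above. A more explicit design might replace it with a per-call flag, but that choice is left open here.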

spernsteiner added 9 commits December 10, 2024 16:40

analyze: rewrite: add SublocInfo collection pass

1cf8bb2

analyze: rewrite: various cleanup in subloc_info

9b331d9

analyze: revert some bad clippy fixes

d5603b6

analyze: subloc_info: add limited pointee support

60bc61b

analyze: subloc_info: factor out LTy to SublocType conversion

cd84e87

analyze: subloc_info: implement pass to collect SublocTypes for global defs

7a61c65

analyze: subloc_info: initial implementation of typecheck pass

e3dcd08

analyze: subloc_info: add more callee cases to typecheck pass

0c0d1be

[wip] rework mir_op to use SublocInfo

f0470b8
