[WIP] analyze: refactor mir_op to explicitly track per-subloc info #1191

Draft
wants to merge 9 commits into
base: master

spernsteiner
Collaborator

This is a WIP refactor of mir_op. I don't have time to finish it at the moment, but I'm posting this PR and including some notes here so it doesn't get lost. Currently it works on trivial examples like offset1, but fails on more interesting ones like algo_md5. It mostly seems to be failing while trying to produce nonsensical casts, but there are also a lot of unimplemented Callee cases in the new mir_op that will surely cause other problems later on.


This branch refactors the MIR rewrite generation pass (rewrite::expr::mir_op) to separate LTy/TypeDesc handling from the actual generation of casts and other rewrites. It divides mir_op into three separate passes: the first collects type and other metadata for each MIR node, the second determines which casts are needed to produce a well-typed program after type rewriting, and the third inserts the casts and any other necessary rewrites.

These passes work on a representation called SublocInfo. A "subloc" or "node" is a piece of MIR at finer granularity than a Location. For example, given the statement _2 = Use(move _1), a subloc path can refer to the whole statement, the destination place _2, the rvalue Use(move _1), the operand move _1, or the place _1. Each of these can have its own SublocInfo that describes its type and other information about the surrounding context or how it can be used.
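As a rough illustration of the five nodes listed above, the sketch below spells out one possible path encoding for `_2 = Use(move _1)`. The `SubLoc` variants and the function name are hypothetical, chosen for this example; the actual types in the branch may be named and structured differently.

```rust
/// One step of a path from a statement down to a nested node.
/// Hypothetical variants, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum SubLoc {
    /// The destination place of an assignment.
    Dest,
    /// The rvalue of an assignment.
    Rvalue,
    /// The i-th operand of an rvalue.
    RvalueOperand(usize),
    /// The place inside a `move` or `copy` operand.
    OperandPlace,
}

/// Every node of `_2 = Use(move _1)`, paired with the path that reaches it.
/// The empty path refers to the whole statement.
pub fn sublocs_of_use_assignment() -> Vec<(Vec<SubLoc>, &'static str)> {
    vec![
        (vec![], "_2 = Use(move _1)"),
        (vec![SubLoc::Dest], "_2"),
        (vec![SubLoc::Rvalue], "Use(move _1)"),
        (vec![SubLoc::Rvalue, SubLoc::RvalueOperand(0)], "move _1"),
        (
            vec![
                SubLoc::Rvalue,
                SubLoc::RvalueOperand(0),
                SubLoc::OperandPlace,
            ],
            "_1",
        ),
    ]
}
```

In this encoding, a SublocInfo table would be keyed by a (Location, path) pair, so each of the five nodes gets its own entry.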

The three new passes in more detail:

  • SublocInfo collection: This pass computes the "new type" of each node, which is the type it would have after the types of all defs and locals are rewritten to match their LTys. This can produce inconsistent results, such as giving the LHS and RHS of an assignment different types. This pass also records other metadata, such as the access mode (imm or mut) for Places.
  • SublocInfo typechecking: This pass checks for inconsistencies and computes the "expected type" of each node, which is the type it should have in order to make it usable in the surrounding context. By default, the node's expected type is identical to its new type, but it may be changed to resolve a type error. For example, in an inconsistent assignment (where the LHS and RHS have different new types), the expected type of the RHS will be set to match the new (and expected) type of the LHS. There are also some cases, mostly around special functions like offset, where this pass will adjust a node's new type instead of its expected type.
  • Rewrite generation: This pass adds casts around any node whose expected type doesn't match its new type, and also adds rewrites for special functions like offset. This is similar to the behavior of the existing mir_op pass, but it's driven entirely by SublocInfo entries, rather than directly consulting LTys.
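The three passes can be sketched on a toy model of a single assignment between two locals. Everything here is an assumption made for illustration: the `Ty`, `SublocInfo`, `Assign`, and `Cast` types, the function names, and the use of a plain map as a stand-in for the LTy-derived types. It is not the code in this branch.

```rust
use std::collections::HashMap;

/// Toy type universe (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    RawPtr,
    Ref,
}

/// Per-node info: the type after rewriting and the type the context expects.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SublocInfo {
    pub new_ty: Ty,
    pub expected_ty: Ty,
}

/// Toy statement `dest = src` between two locals.
pub struct Assign {
    pub dest: &'static str,
    pub src: &'static str,
}

/// A cast requested by the rewrite generation pass.
#[derive(Debug, PartialEq, Eq)]
pub struct Cast {
    pub node: &'static str,
    pub from: Ty,
    pub to: Ty,
}

/// Pass 1: compute each node's new type from the rewritten local types.
/// The expected type starts out equal to the new type.
/// Panics if a local is missing from `rewritten`.
pub fn collect(
    assign: &Assign,
    rewritten: &HashMap<&'static str, Ty>,
) -> (SublocInfo, SublocInfo) {
    let info = |local: &str| {
        let ty = rewritten[local];
        SublocInfo { new_ty: ty, expected_ty: ty }
    };
    (info(assign.dest), info(assign.src))
}

/// Pass 2: resolve an inconsistent assignment by making the RHS's
/// expected type match the LHS.
pub fn typecheck(dest: &SublocInfo, src: &mut SublocInfo) {
    if src.new_ty != dest.new_ty {
        src.expected_ty = dest.expected_ty;
    }
}

/// Pass 3: emit a cast for every node whose expected type differs from
/// its new type. This pass looks only at SublocInfo entries.
pub fn gen_rewrites(nodes: &[(&'static str, &SublocInfo)]) -> Vec<Cast> {
    nodes
        .iter()
        .filter(|(_, info)| info.new_ty != info.expected_ty)
        .map(|(node, info)| Cast {
            node: *node,
            from: info.new_ty,
            to: info.expected_ty,
        })
        .collect()
}

/// Run all three passes on one assignment.
pub fn rewrite_assign(
    assign: &Assign,
    rewritten: &HashMap<&'static str, Ty>,
) -> Vec<Cast> {
    let (dest, mut src) = collect(assign, rewritten);
    typecheck(&dest, &mut src);
    gen_rewrites(&[(assign.dest, &dest), (assign.src, &src)])
}
```

With `_2` mapped to `Ref` and `_1` left as `RawPtr`, this model requests a single cast on `_1`. With both locals left as `RawPtr`, it requests none.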

Advantages of the new design:

  • Easier debugging: In case of a bad rewrite, we'll be able to inspect the SublocInfos to determine whether it's an issue with the rewrite itself or with SublocInfo generation.
  • Better targetability: This approach should make it easier to suppress rewrites in parts of the code where they're not wanted. Specifically, if a group of nodes has its new types set to match the old, unrewritten types, then no inconsistencies will be detected in the typechecking pass, the expected types will all be set to match the new types, and no casts will be inserted.
  • Decoupling from analysis: Only the SublocInfo collection phase interacts directly with analysis results (LTys). This means we could implement an alternate version of that pass with a different strategy for determining new types, while reusing all the rest of the rewriting machinery.

Limitations:

  • Handling of each special function is now spread across the three passes. The three pieces for each function are tightly coupled (in many cases there are comments along the lines of "the rewriting pass will do X, so here in collection/typechecking we can do Y"). This can probably be refactored to put the collection, typechecking, and rewriting logic for each function in one place and have the passes dispatch to the appropriate code for each Callee they encounter.
  • A similar issue applies to non-function MIR constructs. This is somewhat inherent to the design, as we need the two SublocInfo passes to only request casts that the rewriting pass can handle.
  • To further improve targetability, we should be more explicit about which calls to special functions should be fully rewritten (e.g. converting offset to a subslice operation) and which should be left alone. Currently this is handled in a roundabout way: some of the inputs and/or outputs of the function are marked FIXED in the analysis, so their new types are left as raw pointers, and the rewriting pass knows to skip the normal rewrite if it sees raw pointers there.
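One possible shape for the refactoring suggested in the first and third limitations is sketched below: a per-callee handler bundles the collection, typechecking, and rewriting hooks, and an explicit mode replaces the FIXED-based signal. The trait, the `RewriteMode` enum, the two-variant `Callee`, and all other names are hypothetical and only meant to make the idea concrete.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    RawPtr,
    Slice,
}

/// Simplified stand-in for the real Callee enum (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Callee {
    PtrOffset,
    Other,
}

/// Explicit choice of whether a special call is fully rewritten.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RewriteMode {
    Full,
    LeaveAlone,
}

/// Info for the pointer argument of one call.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CallInfo {
    pub mode: RewriteMode,
    pub ptr_new_ty: Ty,
    pub ptr_expected_ty: Ty,
}

/// All three hooks for one callee live in one impl.
/// The defaults describe a call with no special handling.
pub trait CalleeHandler {
    fn collect(&self, analysis_ty: Ty, mode: RewriteMode) -> CallInfo {
        CallInfo { mode, ptr_new_ty: analysis_ty, ptr_expected_ty: analysis_ty }
    }
    fn typecheck(&self, _info: &mut CallInfo) {}
    fn rewrite(&self, _info: &CallInfo) -> Option<&'static str> {
        None
    }
}

struct OffsetHandler;
struct DefaultHandler;

impl CalleeHandler for DefaultHandler {}

impl CalleeHandler for OffsetHandler {
    fn collect(&self, analysis_ty: Ty, mode: RewriteMode) -> CallInfo {
        // When the call is left alone, keep the raw pointer type so that
        // typechecking sees no inconsistency.
        let ty = match mode {
            RewriteMode::Full => analysis_ty,
            RewriteMode::LeaveAlone => Ty::RawPtr,
        };
        CallInfo { mode, ptr_new_ty: ty, ptr_expected_ty: ty }
    }

    fn typecheck(&self, info: &mut CallInfo) {
        // `rewrite` below produces a subslice operation, so in Full mode
        // the pointer argument must arrive as a slice.
        if info.mode == RewriteMode::Full {
            info.ptr_expected_ty = Ty::Slice;
        }
    }

    fn rewrite(&self, info: &CallInfo) -> Option<&'static str> {
        match info.mode {
            RewriteMode::Full => Some("offset -> subslice"),
            RewriteMode::LeaveAlone => None,
        }
    }
}

/// Dispatch from a callee to its handler.
pub fn handler_for(callee: Callee) -> Box<dyn CalleeHandler> {
    match callee {
        Callee::PtrOffset => Box::new(OffsetHandler),
        Callee::Other => Box::new(DefaultHandler),
    }
}

/// Run the three hooks in pass order for a single call.
pub fn process_call(
    callee: Callee,
    analysis_ty: Ty,
    mode: RewriteMode,
) -> (CallInfo, Option<&'static str>) {
    let handler = handler_for(callee);
    let mut info = handler.collect(analysis_ty, mode);
    handler.typecheck(&mut info);
    let rewrite = handler.rewrite(&info);
    (info, rewrite)
}
```

In a real implementation each pass would call only its own hook while walking the MIR; `process_call` runs them back to back here just to keep the sketch short.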
