Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IR node pointers are not consistent? #5131

Open
bevin-hansson opened this issue Feb 18, 2025 · 3 comments
Open

IR node pointers are not consistent? #5131

bevin-hansson opened this issue Feb 18, 2025 · 3 comments
Labels
question This is a topic requesting clarification.

Comments

@bevin-hansson
Copy link

From what I've read in the documentation, it should be viable to use pointers to IR nodes so long as you make sure to get the original node when using nodes directly from a visitor callback.

However, I'm noticing extreme inconsistency in IR behavior that makes it impossible to use pointers to nodes for almost anything. I can understand why many backends simply use names of things as their map keys, because referring to things by node just doesn't work.

If I walk the IR of my P4Program using a visitor and stop on Type_Extern to grab the type of e.g. packet_in, I get one node:

>>> p N
$1 = (const P4::IR::Type_Extern *) 0x7ffff7f0a2a0

If I then later have a b.extract(h.ethernet) in a parser, and use MethodInstance::resolve to get the information from the MethodCallExpression, I get a totally different one from the ExternMethod:

>>> p EM->originalExternType
$2 = (const P4::IR::Type_Extern *) 0x7ffff45e9540

If I instead drill down into the expression of the MCE and check the type of the Member subexpression itself:

>>> p ((P4::IR::Member*)E->method)->expr->type
$9 = (const P4::IR::Type *) 0x7ffff4653000

It's a totally different node again! These nodes are all the same packet_in; I've printed them and checked.

Is it supposed to be like this? At no point do I modify the IR, but it still does this. It's very frustrating to use the IR when it behaves like this.

@fruffy fruffy added the question This is a topic requesting clarification. label Feb 18, 2025
@fruffy
Copy link
Collaborator

fruffy commented Feb 18, 2025

A clarification question, what do you mean with consistent?

In general, the pointers should be consistent for a single IR tree. But it often happens that the entire program is cloned from compiler pass to compiler pass, which means that previous pointers are stale.

I personally do not use MethodInstance::resolve but it is possible this is using a ref or type map which is stale.

@ChrisDodd can likely clarify.

@ChrisDodd
Copy link
Contributor

In general, it is preferable to use names rather than node pointers, since, once you run the UniqueNames pass, all names will be "mangled" (have an _N suffixes added/modified as needed) so as to be globally unique. The only place this does not work is where there is some overloading going on (in which case the overloaded things will all still have the same name, but you'll need to figure out which one is the right one).

This is where MethodInstance::resolve comes in -- it looks up things by name and deals with (resolves) overloading as needed. This can be done either with a ReferenceMap (which must be up to date -- if it is stale, you'll get old pointers from an older snapshot of the IR.) or with a ResolutionContext (which is basically a Visitor::Context -- the "stack" of nodes in the current depth-first traversal being done by a Visitor)

The way modifications to the IR work, it is always done "copy-on-write" -- once a node is "in" the IR, it will never change -- any modification will be done by cloning the node and modifying the copy, and then "relinking" it into its parents, which involves modifying (and thus cloning) all of them, up to the root IR node (usually a P4Program). So while a Modifier or Transform is running, there may be two instances of a node (the "before" and "after") version available -- the ones passed to the preorder/postorder methods are always the "after" (they're clones, that why they are not const). Which means if you call MethodInstance::resolve on something and it leads to some node that has not yet been visited, you'll get the "old" version, but if it has been visited and modified (which is likely just due to some child/descendent being modified) you'll get the new one which is a distinct object object (though it may appear identical). So doing any kind of name/object lookup in a Modifier/Transform pass is tricky.

This is why many passes are a two-step process -- first an Inspector that does name lookups (often with MethodInstance::resolve) and makes decisions on what to change (but doesn't actually change anything yet) and stores those decisions in some auxiliary data structure (possibly indexed by Node *), and then a Modifier/Transform pass (using getOriginal to get the original Node * to do the lookup). But again, here getOriginal() can only be used to get the original of a node currently being visited, and not the original of any child or sibliing (though you can use findOrigCtxt() to get original parent/ancestor nodes.)

Perhaps there should be a version of MethodInstant::resolve with a ResolutionContext that looks as the original pointers in the context rather than the node (new, post clone/transform) pointers.

@bevin-hansson
Copy link
Author

A clarification question, what do you mean with consistent?

I would expect that nodes representing specific things (types, functions, parsers, controls, packages, etc) remain the same if the IR is not modified through the use of a Transform or similar.

In general, the pointers should be consistent for a single IR tree. But it often happens that the entire program is cloned from compiler pass to compiler pass, which means that previous pointers are stale.

Yes, I'm aware that the tree can change when it's modified by a pass, and that makes sense. But this behavior happens even when no modifications have been made, e.g. just walking the IR with an Inspector. Nodes which should refer to the same "abstract object" in the program do not match up.

The way modifications to the IR work, it is always done "copy-on-write" -- once a node is "in" the IR, it will never change -- any modification will be done by cloning the node and modifying the copy, and then "relinking" it into its parents, which involves modifying (and thus cloning) all of them, up to the root IR node (usually a P4Program). So while a Modifier or Transform is running, there may be two instances of a node (the "before" and "after") version available -- the ones passed to the preorder/postorder methods are always the "after" (they're clones, that why they are not const).

This matches my understanding as well. But I'm using an Inspector. The behavior in my post occurs even when no modifications are done. Unless resolve clones parts of the tree for no reason when dredging up its information?

This is why many passes are a two-step process -- first an Inspector that does name lookups (often with MethodInstance::resolve) and makes decisions on what to change (but doesn't actually change anything yet) and stores those decisions in some auxiliary data structure (possibly indexed by Node *),

Yes, this is what I'm doing. But the nodes do not match what I've looked up previously.

I do have multiple "layers" of Inspectors. Does an Inspector clone the whole tree when it starts visiting?


I can't really share a significant amount of code, but I can at least show a small example. On the top level, I have something like this:

bool Backend::preorder(const IR::Type_Extern *N) {
    N = getOriginal<IR::Type_Extern>();
    if (N->name.name == "packet_in") {
        PacketInType = N;
        ExternMap[N] = new PacketInEmitter(*this);
    } else if (N->name.name == "packet_out") {
        PacketOutType = N;
        ExternMap[N] = new PacketOutEmitter(*this);
    }
    return false;
}

Later, when encountering MethodCallExpressions (not in the same Inspector, though, but two levels deeper):

P4::MethodInstance *MI = P4::MethodInstance::resolve(MCE, refMap, typeMap, true, Ctx, false);

...

if (auto *EM = MI->to<P4::ExternMethod>()) {
    ExternEmitter *EE = ExternMap[EM->originalExternType];
    return EE->EmitMethod(EM, this);
}

This fails as the map does not contain the extern type node that was looked up earlier. The IR has not been modified between these two pieces of code.

It's very odd, because I do something similar to this for P4Parser and P4Control, and that works fine. 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question This is a topic requesting clarification.
Projects
None yet
Development

No branches or pull requests

3 participants