How to get types for typevars in each function (like v_7693)?
#7
Comments
Hi @am009, getting types for local variables is possible but it would require some changes in Retypd. This is actually something that we have considered implementing, but haven't had time to do it yet. It could be done as a post-processing step once the inter-procedural propagation has finished. Once we have types for function arguments and return values, you could propagate internally and get types for local variables. However, we probably don't want to do that for all variables in the program eagerly, but rather allow querying specific procedures (otherwise it would be potentially very costly). If this is something you'd like to contribute, I'd be happy to provide more guidance. Otherwise, we will probably end up doing it ourselves, but I am not sure when.
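To make the "query specific procedures rather than solve everything eagerly" idea concrete, here is a minimal sketch. It assumes a hypothetical `solve_locals` callback that does the per-procedure propagation; `LazyLocalTypes`, `solve_locals`, `fake_solver`, and the `"int32"` result are all made up for illustration and are not retypd APIs.

```python
from typing import Callable, Dict, List


class LazyLocalTypes:
    """Compute local-variable types per procedure, only when asked.

    `solve_locals` is a hypothetical callback (not a retypd API) that takes
    a procedure name and returns a mapping from local variable to type.
    """

    def __init__(self, solve_locals: Callable[[str], Dict[str, str]]):
        self._solve_locals = solve_locals
        self._cache: Dict[str, Dict[str, str]] = {}

    def types_for(self, proc: str) -> Dict[str, str]:
        # Solve each procedure at most once; later queries hit the cache.
        if proc not in self._cache:
            self._cache[proc] = self._solve_locals(proc)
        return self._cache[proc]


# Stand-in solver that records how often it is invoked.
calls: List[str] = []


def fake_solver(proc: str) -> Dict[str, str]:
    calls.append(proc)
    return {"v_7693": "int32"}  # placeholder result, not a real inference


lazy = LazyLocalTypes(fake_solver)
lazy.types_for("main")
lazy.types_for("main")  # second query is served from the cache
```

Procedures that are never queried are never solved, which is the point of doing this lazily instead of for the whole program.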
Hi @aeflores, I urgently need this feature and I am really happy to contribute to retypd. I have been reading the code these days. The following are my understandings.

I think there are two important entries of the algorithm, one is the

I guess, after the

To solve the infinite nodes problem caused by recursive types, retypd uses a method related to pushdown systems. Is this related to papers like "Synchronized Pushdown Systems for Pointer and Data-Flow Analysis"? I wonder how the authors came up with the idea; I probably need to learn more to understand the paper further.

By the way, because retypd may spend a lot of time on large programs, can we speed it up by using a faster interpreter than CPython, such as PyPy?
Hi @am009
That's right.
I think you don't have to get into those details to get this working. The basic steps for "solving" a procedure (or an SCC of mutually recursive procedures) are the following:
So let's say you want to get a sketch for a set of variables:

    # Written as a method (it takes `self` and calls Solver helpers).
    from typing import Dict, Set

    import networkx

    def get_type_of_variables(
        self,
        sketches_map: Dict[DerivedTypeVariable, Sketch],
        type_schemes: Dict[DerivedTypeVariable, ConstraintSet],
        proc: DerivedTypeVariable,
        vars: Set[DerivedTypeVariable],
    ):
        # Start from the constraints generated for this procedure.
        constraints = self.program.proc_constraints.get(
            proc, ConstraintSet()
        )
        # Instantiate the already-solved callees at their call sites.
        callees = set(
            networkx.DiGraph(self.program.callgraph).successors(proc)
        )
        fresh_var_factory = FreshVarFactory()
        constraints |= Solver.instantiate_calls(
            callees,
            constraints,
            sketches_map,
            type_schemes,
            fresh_var_factory,
        )
        # Add what is already known about the procedure itself.
        constraints |= sketches_map[proc].instantiate_sketch(
            proc, fresh_var_factory
        )
        # Infer the shape of each requested variable.
        var_sketches = self.infer_shapes(
            vars,
            self.program.types,
            constraints,
        )
        # Attach primitive type information to each variable's sketch.
        for var in vars:
            primitive_constraints = self._generate_primitive_constraints(
                constraints,
                frozenset({var}),
                self.program.types.internal_types,
            )
            var_sketches[var].add_constraints(primitive_constraints)
        return var_sketches

NOTE: I haven't actually run the code, and it probably won't run as is, but hopefully it gives you an idea of what needs to happen. In your example, you would call it with e.g.

This code recomputes a lot of stuff; doing that more efficiently would probably require more extensive changes.
I think we tried running it with PyPy at some point, and it worked... or it was close to working, so yes, that is definitely an option. There are also plenty of optimization opportunities in this code (caching some of the graphs, better data structures, etc.).
@am009 please note that we require signing a "Contributor License Agreement" https://github.com/GrammaTech/retypd/blob/master/GrammaTech-CLA-retypd.pdf before reviewing and accepting contributions into the official repo (the CLA should be sent to [email protected]).
Sorry for the late reply. I was busy with other things, but I have finally settled down and started to learn something. I am really grateful for the detailed response, and it works. I created a PR for it: #8 (I sent a mail with a signed CLA on December 6, 2023.)

Actually, I'm very interested in binary type recovery and am trying to do some research in this area, so I would like to learn more about the internals of retypd. I still find the retypd paper difficult to read. I guess I need to learn more about pushdown systems. What kind of pushdown system is it related to? Do you have any recommended learning resources (or any introduction or tutorial about pushdown systems)? There is a reference to "Saturation algorithms for model checking pushdown systems." in the retypd paper. Is it related? Thanks in advance.
Hi @am009, thanks for the PR, I'll try to review it soon.
Me too! @peteraldous and I were planning to write another document/paper revisiting the algorithm, but we never finished it.
Despite the name, pushdown systems are not used explicitly in the implementation. If you are not doing that yet, I'd recommend focusing on the arXiv version https://arxiv.org/pdf/1603.05495. Most of the implementation details are in the appendices. In very rough terms, we have a type system, defined in Figure 3 of the paper, and we need to compute all the "implications" of our initial set of constraints (all the implied type constraints). We can do that by building a transducer that captures all possible type derivations. Appendix D describes how to build that transducer from an initial set of constraints. The construction of the transducer has several steps, and the pushdown system is basically an intermediate step. Again in very rough terms, Appendix D does the following steps:
In order to go from pushdown system to transducer, we:
This implementation just goes from the constraint set to the graph directly. Saturation and shadowing in the implementation can be found here respectively:
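For readers new to the idea, the following is a toy illustration of what "saturation" means on a graph with push and pop edges. It is not retypd's code: the edge encoding, the `saturate` function, and the node and label names are assumptions made for this sketch, and it ignores variance and everything else the real implementation tracks. The rule it applies is that whenever a `push l` edge is followed by zero or more unlabeled edges and then a matching `pop l` edge, an unlabeled shortcut edge is added, repeating until nothing changes.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

# (source, kind, label, destination); kind is "eps", "push", or "pop".
Edge = Tuple[str, str, Optional[str], str]


def saturate(initial: Set[Edge]) -> Set[Edge]:
    """Add an eps shortcut for every balanced push/pop path (toy version)."""
    edges = set(initial)
    changed = True
    while changed:
        changed = False
        # reaching[n]: pushes (origin, label) that reach n via eps edges only.
        reaching: Dict[str, Set[Tuple[str, Optional[str]]]] = defaultdict(set)
        for src, kind, label, dst in edges:
            if kind == "push":
                reaching[dst].add((src, label))
        stable = False
        while not stable:
            stable = True
            for src, kind, _label, dst in edges:
                if kind == "eps" and not reaching[src] <= reaching[dst]:
                    reaching[dst] |= reaching[src]
                    stable = False
        # A pop that matches a reaching push yields a shortcut edge.
        for src, kind, label, dst in list(edges):
            if kind != "pop":
                continue
            for origin, pushed in list(reaching[src]):
                shortcut = (origin, "eps", None, dst)
                if pushed == label and shortcut not in edges:
                    edges.add(shortcut)
                    changed = True
    return edges


# Nested example: push l1, push l2, eps, pop l2, pop l1.
graph: Set[Edge] = {
    ("x", "push", "l1", "a"),
    ("a", "push", "l2", "b"),
    ("b", "eps", None, "c"),
    ("c", "pop", "l2", "d"),
    ("d", "pop", "l1", "y"),
}
saturated = saturate(graph)
```

In the example, the inner pair on `l2` first produces the shortcut from `a` to `d`, and only then can the outer pair on `l1` be matched to produce the shortcut from `x` to `y`, which is why the rule has to be iterated to a fixed point.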
Hi @am009, it looks like we have not received your email. Would you mind sending it again?
Finally, I can understand most of the algorithm now, after reading "Saturation Algorithms for Model-checking Pushdown Systems" and "Advanced Topics in Types and Programming Languages" Chapter 10. Thanks for your previous explanations. The code is also very helpful. Now I am trying to reimplement the algorithm in Rust to have a better understanding of the algorithm.
I found a potential issue related to the algorithm, but it may also be a misunderstanding on my part: Lines 707 to 711 in 8f7f72b

There is an

Also, there are two arguments for
I'm trying to write a decompiler using retypd. If I understand the Python source code correctly, the existing frontends (retypd-ghidra-plugin and gtirb-ddisasm-retypd) only generate function types (arguments and return value) for each function.
I also generated type constraints for local variables and registers. Is it possible to get the types of these variables (the v_xxx variables in the following example constraints)?
Thanks in advance. Retypd is really powerful in type recovery. I'm quite grateful that it is open source.