A call graph is a directed graph that represents relationships between
functions of a program. The nodes (vertices) of the graph are functions
defined within the program (or its dependencies). A directed edge from some
node1
to some node2
represents that node1
calls node2
at some point
in the program.
In addition, the call graph generated by MIRAI includes type information
as part of the edges of the graph. Supposing node1
calls node2
with two
arguments, having types t1
and t2
, respectively, there will be two edges
in the call graph between node
and node2
, each edge labeled with one of the
two types.
MIRAI derives the call graph of a program through static analysis (see this documentation for an overview of MIRAI). As such the derived call graph is not guaranteed to be precise; it may not always be possible to determine which functions are called in the presence of indirection. However, benchmarking (see the section below titled "Resolved calls") indicates the generated call graph is highly precise.
After the target program is statically analyzed, MIRAI supports output of the generated call graph in two output formats:
dot
: for use with Graphviz to create a visualization of the call graph.ddlog
: the call graph in the form of Datalog input relations.
Please see their respective sections below for more information.
MIRAI's call graph generation is highly precise; correctly resolving calls to all of the tests in the call graph generator benchmark.
Below we present the results of MIRAI on the benchmark:
Call type | Resolved % |
---|---|
static dispatch | 100% |
dynamic dispatch | 100% |
generic | 100% |
function pointer | 100% |
macro | 100% |
conditionally compiled | 50% |
In all cases but "conditionally compiled", MIRAI is able to resolve all of the calls. For the "conditionally compiled", MIRAI receives a 50% score because conditional compilation is performed by Cargo before MIRAI is invoked (see the Architecture documentation). MIRAI does not have access to the functions that were not included in the conditional compilation. For the calls that are included, MIARI does resolve them.
MIRAI's call graph generator output can be configured with the CLI option
--call_graph_config <path_to_config>
. A valid JSON configuration file must exist
at that path using the following schema:
{
"call_sites_output_path": "path/to/call_sites.json",
"dot_output_path": "path/to/graph.dot",
"reductions": [
{"Slice": "function name"},
"Fold",
"Clean",
"Deduplicate",
],
"included_crates": ["crate_name"],
"datalog_config": {
"ddlog_output_path": "path/to/graph.dat" | "path/to/datalog/",
"type_map_output_path": "path/to/types.json",
"type_relations_path": "path/to/type_relations.json",
"datalog_backend": "DifferentialDatalog" | "Souffle"
},
}
Below, we explain each field in detail:
"call_sites_output_path"
: (Optional) Path where call site output of graph will be saved, if provided. See the section below on "Call site output"."dot_output_path"
: (Optional) Path where dot output of graph will be saved, if provided. See the section below on "Dot output"."reductions"
: Possibly empty list of reductions to apply to the call graph. See the subsection below on "Graph reductions"."included_crates"
: List of crate names to include in the graph, if theFold
reduction is used. Can be empty if theFold
reduction is not being used."datalog_config"
: (Optional) Configuration for Datalog output (see below). If left unspecified, no Datalog output will be generated.
Datalog configuration:
"ddlog_output_path"
: Path where Datalog output of graph will be saved. See the section below on "Datalog output"."type_map_output_path"
: Path where type map output of graph will be saved. See the section below on "Type Map output"."type_relations_path"
: (Optional) Path to file where manually-added type relations will read from, if provided. See the section below on "Type relations""datalog_backend"
: The Datalog output backend to use. Currently Differential Datalog and Soufflé are supported. Note that if Soufflé is used"ddlog_output_path"
should be the path to a directory rather than a file.
A number of graph reductions are supported for the generated call graph. These reductions reduce the size of the graph by excluding nodes and edges that may not be relevant to the user in order to make the graph easier to read and analyze.
Four call graph reductions are supported:
Slice(function_name)
: This reduction returns a sub-graph of nodes that are reachable from the node associated with the given function name. Expected configuration format is:{"Slice": "function name"}
.Fold
: This reduction removes all nodes that do not belong to at least one of the crates specified in"included_crates"
. If there exists a path from an included node to another included node that goes through one or more excluded nodes, that path is preserved through the creation of a new edge connecting the included nodes. Expected configuration format is"Fold"
.Deduplicate
: This reduction reduces the set of edges in the graph such that each there exists at most one edge connecting any two nodes. This has the effect of removing type information from the graph, and makes it more suitable for visualization. Expected configuration format is"Deduplicate"
.Clean
: This reduction removes orphan nodes from the graph. These are nodes that have no incoming or outgoing edges. Expected configuration format is"Clean"
.
Reductions may be specified in any order (and even multiple times), and they are applied sequentially in the order specified.
This is a JSON encoded file providing a list of call sites where each call site is contains
a source location, the caller and the callee. The source location consists of an index into
the vector contents of the files
property, along with a 1-based start line number and start column
of the call expression. The caller and callee values are indices into the vector contents of the
callables
property.
File entries are paths: Relative paths from the crate root directory for files that are compiled by this build. Absolute (but "virtual") paths for files from prebuild libraries, such as the Rust standard library.
Callable entries are fully qualified names of functions, where the first qualifier is the name of the crate in which the callable is defined. The other qualifiers are module names and type names. Qualified names are unique.
When a call site is inside a macro invocation (such as println!(..)), the source location is that
of the macro invocation rather than the source file where the macro is defined. This has the consequence
that there may be more than one entry in the call_sites
property with the same source location.
The call graph generator supports dot output of the call graph. This output format is intended for use with Graphviz, which ingests the dot file format to produce a visualization of the graph in multiple formats.
In this form nodes are labeled with their corresponding function name while edges
are not labeled. It is recommended to use the Deduplicate
reduction to remove
extra edges between nodes.
Here is an example of the dot output:
digraph {
0 [ label = "\"static[8787]::main\"" ]
1 [ label = "\"static[8787]::fn1\"" ]
2 [ label = "\"static[8787]::fn2\"" ]
3 [ label = "\"static[8787]::fn3\"" ]
0 -> 1 [ ]
1 -> 2 [ ]
2 -> 3 [ ]
}
Supposing this is stored in a file called graph.dot
, this can be
used to produce a PDF of the graph using the Graphviz dot
tool:
$ dot -Tpdf graph.dot -o graph.pdf
The call graph generator also supports Datalog output of the call graph. Datalog is a logic programming language that can be used, in conjunction with the generated call graph, to write an analysis that operates on the call graph.
Here is an example of the Differential Datalog output for the prior graph:
start;
insert Edge(0,0,1);
insert Edge(1,1,2);
insert Edge(2,2,3);
insert EdgeType(0,0);
insert EdgeType(1,0);
insert EdgeType(2,0);
commit;
The call graph generator generates two types of relations:
- Graph relations: Specify properties about the graph itself.
- Type relations: Specify properties about the types of the edges of the graph.
These relations are explained in detail below.
Note: To preserve type information it is recommended to not use the
Deduplicate
reduction.
There are three graph relations output by MIRAI:
Edge(id,n1,n2)
: This specifies that there is a directed (call) edge fromn1
ton2
. The edge has the identifierid
.EdgeType(id,t)
: This specifies that the edge with the identifierid
has typet
.Dom(n1,n2)
: This specifies that call to noden1
dominates the call to noden2
in the call graph of the function body wheren1
andn2
are called; the call ton1
occurs beforen2
on all paths through the function body.
Note that all identifiers of these relations, e.g., id
, n1
, t
, are output
as u32
indexes. Node indexes can be traced back to their associated function name
by looking at the dot file output. Type indexes can be traced back to an associated
type string but looking at the output Type Map.
The output Datalog relations use indexes (unique identifiers) to refer to nodes of the graph and types of edges. As explained above, node identifers can be traced back to their function name by looking at the dot output of the call graph generator.
Type indexes can be traced back to their associated type string by looking at the output Type Map. Here is an example type map:
{
"0": "u32",
"1": "u8",
}
This map encodes that if relation is found that refers to the type 0
, e.g.,
EdgeType(1,0)
, that relation is referring to the (Rust) type u32
.
There are two type relations supported by MIRAI:
EqType(t1,t2)
: This encodes that typest1
andt2
are equivalent for the purpose of analysis.Member(t1,t2
): This encodes that typet2
is a "member" of typet1
. This is more general than subtyping. For example, we might say thatT
is a "member" ofVec<T>
.
Type relations are determined in two ways. First, the call graph generator includes
heuristics for deriving type relations for some common patterns. For example,
Member(Vec<T>, T)
is always derived when a type Vec<T>
is encountered (as long
as type T
by itself is used elsewhere in the program).
Second, type relations may be manually input via the path specified in the config
file: type_relations_path
. This should be a path to a JSON file with the following schema:
{
"relations": [
{
"kind": "Member" | "Eq",
"type1": "T1",
"type2": "T2"
},
...
]
}
A list of relations is provided. For each relation, its kind ("Member" or "Eq") is specified and the two types of the relation are specified. Concrete types (non-generic) must be used for the generated type relations to be included because MIRAI produces concrete type information.
Below, we show a simple Datalog analysis that finds all nodes reachable from a given node. In this example we use the Differential Datalog implementation of Datalog.
The Datalog analysis saved in reachable.dl
:
input relation Edge(id: u32, node1: u32, node2: u32)
input relation EdgeType(id: u32, type_id: u32)
output relation Reachable(node1: u32, node2: u32)
Reachable(node1, node2) :- Edge(_, node1, node2).
Reachable(node1, node3) :- Edge(_, node1, node2), Reachable(node2, node3).
The call graph Datalog output generated by MIRAI (in graph.dat
):
start;
insert Edge(0,0,1);
insert Edge(1,1,2);
insert Edge(2,2,3);
insert EdgeType(0,0);
insert EdgeType(1,0);
insert EdgeType(2,0);
commit;
dump Reachable;
To run the analysis with Differential Datalog:
ddlog -i reachable.dl &&
(cd base_ddlog && cargo build --release) &&
./reachable_ddlog/target/release/reachable_cli < ../graph.dat
Finally, the output:
Reachable{.node1 = 0, .node2 = 1}
Reachable{.node1 = 0, .node2 = 2}
Reachable{.node1 = 0, .node2 = 3}
Reachable{.node1 = 1, .node2 = 2}
Reachable{.node1 = 1, .node2 = 3}
Reachable{.node1 = 2, .node2 = 3}
As we can see, there is an output relation for every pair of nodes n1
, n2
such that n2
is reachable from n1
.
This example could also be done with Soufflé Datalog with minor changes to the syntax:
.decl Edge(id: number, node1: number, node2: number)
.input Edge(io=file, delimiter=",")
.decl EdgeType(id: number, type_id: number)
.input EdgeType(io=file, delimiter=",")
.decl Reachable(node1: number, node2: number)
Reachable(node1, node2) :- Edge(_, node1, node2).
Reachable(node1, node3) :- Edge(_, node1, node2), Reachable(node2, node3).
.output Reachable(io=file, delimiter=",")
The input relations will also need to be formatted differently; each relation need to be placed into a separate file titled by the relation:
Edge.facts
0,0,1 1,1,2 2,2,3
EdgeType.facts
0,0 1,0 2,0
To run the analysis with Soufflé: souffle reachable.dl
. The output will
be stored into a file called Reachable.csv
and will match the results
we saw before.
An interesting feature of Soufflé is the ability to see proof trees for the
output. To do this, first run Soufflé's explain command:
souffle -t explain reachable
. Then, a particular output relation can be
explained:
Enter command > explain Reachable(1,3)
Edge(2, 2, 3)
-----------(R1)
Edge(1, 1, 2) Reachable(2, 3)
--------------------------(R2)
Reachable(1, 3)
We can see how Reachable(1, 3)
was derived transitively.