TODO: Add a running example.
DDlog's profiling features are designed to help the programmer identify the parts of a program that use the most CPU and memory. DDlog supports two profilers:

- DDlog's self-profiler runs as a thread inside the DDlog program and generates textual memory and CPU profiles that can be output at any point during the runtime of the program.
- DDShow is a standalone profiler that ingests performance-related events from DDlog or any other program using Timely Dataflow. It produces a visual profile that can be explored in a browser, as well as a textual profile similar to the one generated by the self-profiler.

DDShow offers a richer (and growing) set of features and is the preferred development-time profiler. The self-profiler, on the other hand, is more resource-efficient and can even be enabled in production environments to troubleshoot performance issues.
The user can enable one of the two profilers when starting a DDlog program either from the CLI or via the Rust, C, or Java API.
To enable self-profiling, pass the `--self-profiler` flag to the DDlog-generated CLI executable:

```
./tutorial_ddlog/target/release/tutorial_cli --self-profiler < tutorial.dat
```
In Rust, use the `run_with_config()` function:
TODO
In C, use the `ddlog_run_with_config()` function:
TODO
In Java, use the `DDlogConfig` class to create a DDlog configuration with self-profiling enabled:
```
import ddlogapi.DDlogAPI;
import ddlogapi.DDlogConfig;

// Configure DDlog with 2 worker threads and the self-profiler enabled.
DDlogConfig config = new DDlogConfig(2);
config.setProfilingConfig(DDlogConfig.selfProfiling());
this.api = new DDlogAPI(config, false);
```
When self-profiling is enabled, DDlog supports two profiling commands. Both are also available through the Rust, C, and Java APIs:

- `profile cpu on/off;` - enables or disables the recording of CPU usage information in addition to memory profiling. CPU profiling is not enabled by default because it can slow the program down somewhat, especially large programs that handle many small updates.
- `profile;` - returns information about the program's CPU and memory usage.

CPU usage is expressed as the total time DDlog spent evaluating each operator, assuming CPU profiling was enabled. For example, the following CPU profile record:
```
CPU profile
...
0s005281us ( 112calls) Join: DdlogDependency(.parent=parent, .child=child), LabeledNode(.node=parent, .scc=parentscc), LabeledNode(.node=child, .scc=childscc) 165
...
```
indicates that the program spent 5,281 microseconds in 112 activations of the join operator that joins the prefix of the rule (`DdlogDependency(.parent=parent, .child=child), LabeledNode(.node=parent, .scc=parentscc)`) with the `LabeledNode(.node=child, .scc=childscc)` literal.
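The Java sketch below makes the record format concrete. It is a hypothetical helper, not part of the DDlog API. It parses one CPU profile line into its total time, call count, and operator description, then derives the average time per activation. It assumes the `<seconds>s<microseconds>us ( <n>calls) <operator>` layout seen in the sample record. The trailing number on each line is kept as part of the description and is not interpreted.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the DDlog API: parses one record of the
// self-profiler's textual CPU profile, assuming the
// "<sec>s<usec>us ( <n>calls) <operator description>" layout.
final class CpuProfileRecord {
    // Groups: seconds, zero-padded microseconds, call count, operator description.
    private static final Pattern RECORD =
        Pattern.compile("^\\s*(\\d+)s(\\d+)us\\s*\\(\\s*(\\d+)calls\\)\\s*(.*)$");

    final long totalMicros;
    final long calls;
    final String operator; // includes the trailing number, which is not interpreted here

    private CpuProfileRecord(long totalMicros, long calls, String operator) {
        this.totalMicros = totalMicros;
        this.calls = calls;
        this.operator = operator;
    }

    static CpuProfileRecord parse(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a CPU profile record: " + line);
        }
        long seconds = Long.parseLong(m.group(1));
        long micros = Long.parseLong(m.group(2)); // "005281" parses as 5281
        return new CpuProfileRecord(seconds * 1_000_000L + micros,
                                    Long.parseLong(m.group(3)), m.group(4));
    }

    // Average time per operator activation, in microseconds.
    double averageMicrosPerCall() {
        return calls == 0 ? 0.0 : (double) totalMicros / calls;
    }

    public static void main(String[] args) {
        CpuProfileRecord r = parse("0s005281us ( 112calls) Join: DdlogDependency(.parent=parent, "
            + ".child=child), LabeledNode(.node=parent, .scc=parentscc), "
            + "LabeledNode(.node=child, .scc=childscc) 165");
        System.out.printf(Locale.ROOT, "%d us in %d calls, %.1f us per call%n",
                          r.totalMicros, r.calls, r.averageMicrosPerCall());
    }
}
```

For the sample record this gives 5,281 µs over 112 calls, or roughly 47 µs per activation. When a profile is long, sorting the parsed records by total time is one simple way to rank the operators.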
The memory profile reports the current (at the time the profile is generated) and peak (since the start of the program) number of records in each DDlog arrangement. An arrangement is similar to an indexed representation of a relation in a database. Arrangements account for most of a DDlog program's memory consumption. For example, the following memory profile fragment:
```
Arrangement peak sizes
...
451529 Arrange: LabeledNode{.node=_, .scc=_0} 136
372446 Arrange: LabeledNode{.node=_0, .scc=_} 132
```
indicates that the program contains two different arrangements of the `LabeledNode` relation, indexed by the second and first fields, whose peak sizes are 451,529 and 372,446 records respectively. The numbered variables (e.g., `_0`) mark the field or fields the relation is indexed by.
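The indexing rule just described can be applied mechanically. The Java sketch below (Java 10 or later) is a hypothetical parser, not part of the DDlog API. It reads one line of the "Arrangement peak sizes" section and treats every field bound to a numbered variable as part of the index key. It assumes the `<count> Arrange: <Relation>{<fields>} <n>` layout from the example, ignores the trailing number, and has not been checked against nested struct patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the DDlog API: parses one line of the
// "Arrangement peak sizes" section and recovers the fields the arrangement
// is indexed by, i.e. the fields bound to a numbered variable such as _0.
final class ArrangementRecord {
    // Groups: peak record count, relation name, field pattern inside {...}.
    private static final Pattern RECORD =
        Pattern.compile("^\\s*(\\d+)\\s+Arrange:\\s*([^{\\s]+)\\{(.*)\\}.*$");
    // A field bound to _0, _1, ... is part of the key; a field bound to a bare _ is not.
    private static final Pattern KEY_FIELD = Pattern.compile("\\.(\\w+)=_\\d+");

    final long peakRecords;
    final String relation;
    final List<String> keyFields;

    private ArrangementRecord(long peakRecords, String relation, List<String> keyFields) {
        this.peakRecords = peakRecords;
        this.relation = relation;
        this.keyFields = keyFields;
    }

    static ArrangementRecord parse(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an arrangement record: " + line);
        }
        List<String> keys = new ArrayList<>();
        Matcher f = KEY_FIELD.matcher(m.group(3));
        while (f.find()) {
            keys.add(f.group(1)); // field name, in the order it appears
        }
        return new ArrangementRecord(Long.parseLong(m.group(1)), m.group(2), List.copyOf(keys));
    }

    public static void main(String[] args) {
        for (String line : List.of(
                "451529 Arrange: LabeledNode{.node=_, .scc=_0} 136",
                "372446 Arrange: LabeledNode{.node=_0, .scc=_} 132")) {
            ArrangementRecord a = parse(line);
            System.out.println(a.relation + " indexed by " + a.keyFields
                               + ": peak " + a.peakRecords + " records");
        }
    }
}
```

On the two sample lines this reports `LabeledNode` indexed by `[scc]` with a peak of 451529 records, and by `[node]` with a peak of 372446 records. This matches the reading given in the text.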
To use DDShow, first install it using `cargo`:

```
cargo install --git https://github.com/Kixiron/ddshow
```
DDShow supports two profiling modes:

- Live profiling, where the target program streams performance-related events to DDShow via network sockets.
- Post-mortem profiling, where the target program writes profiling events to disk and DDShow later scans the recording and constructs a program profile.
The CLI executable supports several options that configure DDShow-based profiling (use `--help` for a complete list). The easiest way to enable live profiling is the `--ddshow` switch, which tells DDlog to start DDShow in the background and connect to it. By default, DDlog only records Timely Dataflow events, which is enough to generate the CPU profile of the program. The `--profile-differential` switch additionally enables the recording of Differential Dataflow events, which are used to generate the arrangement size profile:
```
./tutorial_ddlog/target/release/tutorial_cli --ddshow --profile-differential < tutorial.dat > tutorial.dump
[0s] Waiting for 1 connection on 127.0.0.1:51317: connected to 1 trace source
[0/1, 0s] ⠂ Replaying timely events: 0 events, 0/s
[0/1, 0s] ⡀ Replaying timely events: 13361 events, 47630/s
```
Alternatively, you can start DDShow manually in a separate shell:
```
# Shell 1: start DDShow
ddshow --stream-encoding rkyv --differential --connections 2 --disable-timeline
[5s] ⠁ Waiting for 2 connections on 127.0.0.1:51317: connected to 0/2 sockets

# Shell 2: start the target program
./tutorial_ddlog/target/release/tutorial_cli -w 2 --profile-timely --profile-differential < tutorial.dat
```
Note the DDShow command-line arguments used above:

- `--stream-encoding rkyv` - use the `rkyv` format for event traces. This is required when debugging DDlog applications.
- `--differential` - enable reading of the Differential Dataflow trace for arrangement size profiling.
- `--connections` - the number of timely worker threads in the target application. This argument must match the `-w` DDlog CLI flag.
- `--disable-timeline` - disable event timeline generation, which can be very expensive and slow to render for non-trivial programs.
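Because `--connections` must equal `-w`, a script for the manual setup can avoid a mismatch by deriving both command lines from one worker count. The Java sketch below is a hypothetical helper, not part of DDlog or DDShow. It only assembles argument lists from the flags shown above and does not launch anything.

```java
import java.util.List;

// Hypothetical helper, not part of DDlog or DDShow: builds the two command
// lines for manual live profiling from a single worker count, so that
// DDShow's --connections value always matches the DDlog CLI's -w value.
final class LiveProfilingCommands {
    static List<String> ddshowCommand(int workers) {
        requirePositive(workers);
        return List.of("ddshow", "--stream-encoding", "rkyv", "--differential",
                       "--connections", Integer.toString(workers), "--disable-timeline");
    }

    static List<String> targetCommand(String cliPath, int workers) {
        requirePositive(workers);
        return List.of(cliPath, "-w", Integer.toString(workers),
                       "--profile-timely", "--profile-differential");
    }

    // A worker count below one makes no sense for either process.
    private static void requirePositive(int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("worker count must be positive: " + workers);
        }
    }

    public static void main(String[] args) {
        int workers = 2;
        System.out.println(String.join(" ", ddshowCommand(workers)));
        System.out.println(String.join(" ",
            targetCommand("./tutorial_ddlog/target/release/tutorial_cli", workers)));
    }
}
```

To run the commands, each list could be passed to `new ProcessBuilder(list)`. On the target's builder, `redirectInput(new File("tutorial.dat"))` stands in for the shell's `< tutorial.dat`. DDShow should be started first, since it waits for the target program to connect.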
Use `--timely-trace-dir` and `--differential-trace-dir` to record performance events for post-mortem profiling:

```
./tutorial_ddlog/target/release/tutorial_cli --timely-trace-dir "dd_trace" --differential-trace-dir "dd_trace" < tutorial.dat
```
In Rust:
TODO
In C:
TODO
In Java:

```
import ddlogapi.DDlogAPI;
import ddlogapi.DDlogConfig;

// One worker thread; both trace destinations point to the same directory,
// as DDShow requires (see the note below).
DDlogConfig config = new DDlogConfig(1);
config.setProfilingConfig(
    DDlogConfig.timelyProfiling(
        DDlogConfig.logToDisk("timely_trace"),
        DDlogConfig.logDisabled(),
        DDlogConfig.logToDisk("timely_trace"))
);
this.api = new DDlogAPI(config, true);
```
NOTE: DDShow currently requires using the same directory for timely and differential traces.
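Given this requirement, a script that records and later replays traces can pass a single directory to both trace flags and to `--replay-logs`. The Java sketch below is a hypothetical helper, not part of DDlog or DDShow. It only builds those argument lists from the flags used in this section.

```java
import java.util.List;

// Hypothetical helper, not part of DDlog or DDShow: builds the recording and
// replay command lines for post-mortem profiling from one trace directory,
// because DDShow currently expects timely and differential traces in the same place.
final class PostMortemCommands {
    static List<String> recordCommand(String cliPath, String traceDir) {
        // The same directory is passed to both trace flags.
        return List.of(cliPath, "--timely-trace-dir", traceDir,
                       "--differential-trace-dir", traceDir);
    }

    static List<String> replayCommand(String traceDir) {
        return List.of("ddshow", "--replay-logs", traceDir);
    }

    public static void main(String[] args) {
        String traceDir = "dd_trace";
        System.out.println(String.join(" ",
            recordCommand("./tutorial_ddlog/target/release/tutorial_cli", traceDir)));
        System.out.println(String.join(" ", replayCommand(traceDir)));
    }
}
```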
Run `ddshow` to analyze recorded event traces:

```
ddshow --replay-logs dd_trace/
[0s] Loading Timely replay from dd_trace/: loaded 2 replay files
Press enter to finish loading trace data (this will cause data to not be fully processed)...
[0/1, 5s] Finished replaying timely events: 36717 events, 17073/s
Processing data... done!
Wrote report file to report.txt
Wrote output graph to file:////home/lryzhyk/projects/differential-datalog/test/datalog_tests/dataflow-graph/graph.html
```
When running in either live or post-mortem mode, DDShow generates a text report in `report.txt` and an HTML profile in `dataflow-graph/graph.html`.