# Tutorial: Profiling DDlog Programs

TODO: Add a running example.

DDlog's profiling features are designed to help the programmer identify parts of the program that use the most CPU and memory. DDlog supports two profilers:

* DDlog's self-profiler runs as a thread inside the DDlog program and generates textual memory and CPU profiles that can be output at any point during the runtime of the program.

* DDShow is a standalone profiler that ingests performance-related events from DDlog or any other program using Timely Dataflow and produces a visual profile that can be explored in a browser, as well as a textual profile similar to the one generated by the self-profiler.

DDShow offers a richer (and growing) set of features and is the preferred development-time profiler. The self-profiler, on the other hand, is more resource-efficient and can even be enabled in a production environment to troubleshoot performance issues.

The user can enable one of the two profilers when starting a DDlog program either from the CLI or via the Rust, C, or Java API.

## Self-profiling

To enable self-profiling, pass the --self-profiler flag to the DDlog-generated CLI executable:

```
./tutorial_ddlog/target/release/tutorial_cli --self-profiler < tutorial.dat
```

In Rust, use the run_with_config() function: TODO

In C, use the ddlog_run_with_config() function: TODO

In Java, use the DDlogConfig class to create a DDlog configuration with self-profiling enabled:

```
import ddlogapi.DDlogAPI;
import ddlogapi.DDlogConfig;

// Create a configuration with 2 worker threads.
DDlogConfig config = new DDlogConfig(2);
// Enable the self-profiler.
config.setProfilingConfig(DDlogConfig.selfProfiling());
this.api = new DDlogAPI(config, false);
```

When self-profiling is enabled, DDlog supports two profiling commands (also available through Rust, C, and Java APIs):

  1. profile cpu on/off; - enables/disables recording of CPU usage info in addition to memory profiling. CPU profiling is not enabled by default, as it can slow down the program somewhat, especially for large programs that handle many small updates.

  2. profile; - returns information about the program's CPU and memory usage. CPU usage is expressed as the total amount of time DDlog spent evaluating each operator, assuming CPU profiling was enabled. For example, the following CPU profile record:

```
CPU profile
    ...
       0s005281us (        112calls)     Join: DdlogDependency(.parent=parent, .child=child), LabeledNode(.node=parent, .scc=parentscc), LabeledNode(.node=child, .scc=childscc) 165
    ...
```

indicates that the program spent 5,281 microseconds in 112 activations of the join operator that joins the prefix of the rule (DdlogDependency(.parent=parent, .child=child), LabeledNode(.node=parent, .scc=parentscc)) with the LabeledNode(.node=child, .scc=childscc) literal.
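
For instance, a command script like the following (a minimal sketch; the relation names are taken from the fragments in this section and the inserted values are purely illustrative, since they depend on the actual program) turns on CPU profiling before applying updates and dumps the profile at the end:

```
profile cpu on;

start;
insert DdlogDependency(.parent = "p1", .child = "c1"),
insert LabeledNode(.node = "p1", .scc = "scc0"),
insert LabeledNode(.node = "c1", .scc = "scc1");
commit;

profile;
```

Feed such a script to the CLI executable started with the --self-profiler flag, as shown above.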

The memory profile reports the current (at the time the profile is generated) and peak (since the start of the program) number of records in each DDlog arrangement. An arrangement is similar to an indexed representation of a relation in a database. Arrangements account for the bulk of a DDlog program's memory consumption. For example, the following memory profile fragment:

```
Arrangement peak sizes
...
451529      Arrange: LabeledNode{.node=_, .scc=_0} 136
372446      Arrange: LabeledNode{.node=_0, .scc=_} 132
```

indicates that the program contains two different arrangements of the LabeledNode relation, indexed by the second and first fields respectively, whose peak sizes are 451,529 and 372,446 records (numbered variables, e.g., _0, mark the fields by which the relation is indexed).
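
The same information can also be obtained programmatically. Below is a minimal Java sketch, under the assumption that the ddlogapi bindings expose enableCpuProfiling() and profile() methods for this purpose (the exact method names may differ; consult DDlogAPI):

```
import ddlogapi.DDlogAPI;
import ddlogapi.DDlogConfig;

// Start the program with the self-profiler enabled (2 worker threads).
DDlogConfig config = new DDlogConfig(2);
config.setProfilingConfig(DDlogConfig.selfProfiling());
DDlogAPI api = new DDlogAPI(config, false);

// Assumed wrapper for the `profile cpu on;` command.
api.enableCpuProfiling(true);

// ... apply transactions here ...

// Assumed wrapper for the `profile;` command: returns the CPU and
// memory profile as a string.
System.out.println(api.profile());

api.stop();
```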

## DDShow-based profiling

To use DDShow, first install it using cargo:

```
cargo install --git https://github.com/Kixiron/ddshow
```

DDShow supports two profiling modes:

  1. Live profiling, where the target program streams performance-related events to DDShow via network sockets.

  2. Post-mortem profiling, where the target program writes profiling events to disk and DDShow later scans the recording and constructs a program profile.

### Live profiling

The CLI executable supports several options that configure DDShow-based profiling (use --help for a complete list). The easiest way to enable live profiling is with the --ddshow switch, which tells DDlog to start DDShow in the background and connect to it. By default, DDlog records only Timely Dataflow events, which is sufficient to generate the CPU profile of the program. The --profile-differential switch additionally enables the recording of Differential Dataflow events, which is used to generate the arrangement size profile:

```
./tutorial_ddlog/target/release/tutorial_cli --ddshow --profile-differential < tutorial.dat > tutorial.dump

[0s] Waiting for 1 connection on 127.0.0.1:51317: connected to 1 trace source
[0/1, 0s] ⠂ Replaying timely events: 0 events, 0/s
[0/1, 0s] ⡀ Replaying timely events: 13361 events, 47630/s
```

Alternatively, you can start DDShow manually in a separate shell:

```
# Shell 1: start DDShow
ddshow --stream-encoding rkyv --differential --connections 2 --disable-timeline
[5s] ⠁ Waiting for 2 connections on 127.0.0.1:51317: connected to 0/2 sockets

# Shell 2: start the target program
./tutorial_ddlog/target/release/tutorial_cli -w 2 --profile-timely --profile-differential < tutorial.dat
```

Note the DDShow command-line arguments used above:

  * --stream-encoding rkyv - use the rkyv format for event traces. This is required when debugging DDlog applications.
  * --differential - enable reading of the Differential Dataflow trace for arrangement size profiling.
  * --connections - the number of timely worker threads in the target application. This argument must match the -w DDlog CLI flag.
  * --disable-timeline - disable event timeline generation, which can be very expensive and slow to render for non-trivial programs.

### Post-mortem profiling

Use --timely-trace-dir and --differential-trace-dir to record performance events for post-mortem profiling:

```
./tutorial_ddlog/target/release/tutorial_cli --timely-trace-dir "dd_trace" --differential-trace-dir "dd_trace" < tutorial.dat
```

In Rust: TODO

In C: TODO

In Java:

```
import ddlogapi.DDlogAPI;
import ddlogapi.DDlogConfig;

// Create a configuration with 1 worker thread.
DDlogConfig config = new DDlogConfig(1);
// Write timely and differential event traces to the same
// directory (see the NOTE below).
config.setProfilingConfig(
        DDlogConfig.timelyProfiling(
            DDlogConfig.logToDisk("timely_trace"),
            DDlogConfig.logDisabled(),
            DDlogConfig.logToDisk("timely_trace"))
        );
this.api = new DDlogAPI(config, true);
```

NOTE: DDShow currently requires using the same directory for timely and differential traces.

Run ddshow to analyze recorded event traces:

```
ddshow --replay-logs dd_trace/

[0s] Loading Timely replay from dd_trace/: loaded 2 replay files
Press enter to finish loading trace data (this will cause data to not be fully processed)...
[0/1, 5s]   Finished replaying timely events: 36717 events, 17073/s
Processing data... done!
Wrote report file to report.txt
Wrote output graph to file:////home/lryzhyk/projects/differential-datalog/test/datalog_tests/dataflow-graph/graph.html
```

When running in either live or post-mortem mode, DDShow generates a text report in report.txt and an HTML profile in dataflow-graph/graph.html.