Description
Motivation
When we try to understand why proofs are slow, we only have either very coarse-grained or very fine-grained timing data. The coarse-grained data is either the wall-clock time of running `kontrol` and other tools, or, at the level of `kore-rpc-booster`, the `--log-stats` option, which times the requests. The fine-grained tool is the GHC profiler, which gives us Haskell-level cost centres.
To better inform our decisions on what to work on to improve performance, we need something that sits in between.
Use cases
- Users of the backend (semantics writers) want to know:
  - whether and how often simplification rules apply
  - how often `booster` aborts on rewrite rules (falling back to `kore-rpc`)
- Backend implementors want to know:
  - how much time is spent in different phases of `execute` and `simplify` requests (internal and external)
  - statistics about the terms being rewritten or simplified (probably based on indexes to group them)
Functionality
We would like to have a proof-level profiling tool that would allow us to find out, for both Booster and Kore (see the sketch after this list):
- how much time is spent in the steps of rewrite rule and equation application, both for all rules and for specific rules/sets of rules:
  - matching
  - checking the side condition
- how much time is spent in calls to the SMT solver
Additionally, for Booster we would like to know:
- how much time is spent in calls to LLVM
- simplifier cache utilisation
- Nice to have: statistics about the current configuration, perhaps for certain cells only, to know how big the terms are.
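As a rough sketch of the statistics we have in mind, the tool could aggregate something like the record below per rule; the field names, the Booster-only markers, and the keying by rule label are assumptions for illustration, not a settled design.

```haskell
import Data.Map.Strict (Map)
import Data.Time.Clock (NominalDiffTime)

-- Hypothetical per-rule aggregate; field names are illustrative only.
data RuleStats = RuleStats
  { attempts      :: Int              -- how often the rule was tried
  , applications  :: Int              -- how often it actually applied
  , aborts        :: Int              -- Booster only: fall-backs to kore-rpc
  , matchTime     :: NominalDiffTime  -- time spent matching
  , conditionTime :: NominalDiffTime  -- time spent checking side conditions
  , smtTime       :: NominalDiffTime  -- time spent in SMT solver calls
  , llvmTime      :: NominalDiffTime  -- Booster only: time spent in LLVM calls
  }

-- One entry per rule label (or per rule set), for both Booster and Kore.
type Report = Map String RuleStats
```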
Data source
We have implemented contextual JSON logging which, in conjunction with timestamps, should give us all the information we need. We may need to disable the buffering of timestamps in `fast-logger` or, if that is not possible, figure out another way to provide timestamps.
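For illustration only: assuming the server builds its timestamped logger through `fast-logger`'s `TimedFastLogger` interface, a sub-second time source could be supplied in place of the cached one (the usual time cache only refreshes about once per second). This is a minimal sketch of the idea, not how the server is currently wired up.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 (pack)
import Data.Time.Clock (getCurrentTime)
import Data.Time.Format (defaultTimeLocale, formatTime)
import System.Log.FastLogger

-- Uncached, microsecond-precision timestamps, instead of the
-- once-per-second formatted time produced by 'newTimeCache'.
preciseTime :: IO FormattedTime
preciseTime =
  pack . formatTime defaultTimeLocale "%Y-%m-%dT%H:%M:%S%6QZ" <$> getCurrentTime

main :: IO ()
main = do
  (logger, cleanup) <- newTimedFastLogger preciseTime (LogStderr defaultBufSize)
  logger (\t -> toLogStr t <> " example context log line\n")
  cleanup
```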
Tool workflow and design
The tool's input will be the JSON log file, with every line being a JSON object. We should design the tool to stream the JSON objects from the file rather than loading it all into memory.
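As a rough illustration of that streaming design (the grouping function and the report shape are placeholders, not a committed interface), the core loop could look something like this:

```haskell
{-# LANGUAGE LambdaCase #-}
import qualified Data.Aeson as Json
import qualified Data.ByteString.Char8 as BS
import qualified Data.Map.Strict as Map
import System.Environment (getArgs)
import System.IO (IOMode (ReadMode), hIsEOF, withFile)

-- Fold over the log file one line at a time so that it is never held in
-- memory as a whole.
main :: IO ()
main = do
  [logFile] <- getArgs
  stats <- withFile logFile ReadMode (go Map.empty)
  mapM_ print (Map.toList stats)
  where
    go acc handle =
      hIsEOF handle >>= \case
        True -> pure acc
        False -> do
          line <- BS.hGetLine handle
          let acc' = case Json.decodeStrict line :: Maybe Json.Value of
                Nothing -> acc -- skip lines that are not valid JSON
                Just entry -> Map.insertWith (+) (classify entry) (1 :: Int) acc
          go acc' handle

    -- Placeholder: derive a grouping key (e.g. a rule label taken from the
    -- log context) from the decoded entry.
    classify :: Json.Value -> String
    classify _ = "all"
```

Computing durations would then amount to pairing the timestamps of matching start/end entries within the same fold, keeping memory use proportional to a single entry.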
The log file will be generated by running the server with context logging enabled. This can either be done directly by the downstream tooling, e.g. `kontrol prove --profile`, or by a script that consumes bug reports and runs the server with logging enabled.
The output of the tool will likely be a Markdown/HTML document.