diff --git a/docs/booster.md b/docs/booster.md new file mode 100644 index 000000000..60df440f5 --- /dev/null +++ b/docs/booster.md @@ -0,0 +1,124 @@ +# Booster backend + +The booster backend is a re-write of the symbolic rewrite engine for K, also written in Haskell. The aims of this project are to build a symbolic execution engine which is faster and more easy to understand and debug than the current backend. To accomplish these goals, several trade-offs and restrictions were introduced into the design og the booster and in it's current state, the booster is not (yet) able to execute all the programs the old backend can. As a result, the booster backend falls back to the original backend, when it cannot proceed. This document serves as a high level overview of the architecture and the procedures inside the booster and should help the writer of a semantics in K, optimise their rules to take full advantage of the booster and the speedup and better explainanability it offers. + + +## RPC server architecture + +The booster backend has been built from the ground up to be modular and tries to keep clear distinctions between various phases of symbolic execution (see ?). Unlike the old backend, which was originally built as a full CLI reachability prover, the booster backend is instead a JSON RPC server, which only exposes the basic building blocks of a symbolic execution engine via its API and leaves higher level functionality to the client. What this means is that the booster server is (mostly) stateless and only contains three main operations, namely: + * execute + * simplify + * check implication / subsumption + +The client can then build a prover based on this API by chosing how to store and explore the state-space, generated by calling the execute endpoint. This is useful, because different tools based on the same underlying re-writing engine might have different exploration strategies when execution branches, or might be interested in extrtracting different information from the state-space exploration. The small API allows the booster backend to focus on the core principles of symbolic execuition, keeping the codebase much leaner and (hopefully) more understandable than the previous backend. + + + +### Execute + +Once the server is started with the rules for a given semanics, the execute end-point expects a configuration composed of a `Term`, reperesenting the state of execution in a given semantics, along with boolean `Predicate`s/constraints imposed upon any symbolic parts of the state. Given this state, the server will apply semantic rules to the configuration, until either: + +* no more rules apply +* a user specified break point is reached, based on a specific semantic rule +* the booster cannot decide if a rule should apply + +We will go into much more detail on how the execute endoint works and which situations it fails in. For a general overview of what happens when an execute request is received, see this diagram: + + +``` + Receive execute request ──────────────────────────┐ + │ + │ │ + ▼ ▼ + + Internalise KoreJSON into pattern P Unsupported KoreJSON pattern + + │ │ + ▼ ▼ + + P /= _|_ Return an error + + │ + │ ┌──────────────────────────────────────────────────────────────────────────────────────────────────┐ +┌────────────────────────────────────────┐ │ │ │ +│ ▼ ▼ ▼ │ +│ │ +│ ┌──────────────────────────── Apply rule ◄───────────────────────────────────────────────────────────────────────────────────────────┐ │ +│ │ │ │ +│ │ │ │ │ +│ │ └─────────────┐ │ │ +│ ▼ ▼ │ │ +│ │ │ +│ Rewrite aborted ┌──────────────────── Rewrite finished ─────────────────────────┐ │ │ +│ │ │ │ │ +│ │ │ │ │ │ │ +│ │ │ │ │ │ │ +│ ▼ ▼ ▼ ▼ │ │ +│ │ │ +│ Return aborted No rules apply Rewrite to P' ───┐ Rewrite to PS ─────────────────┬───────┐ │ │ +│ │ │ │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ │ │ └──────────┐ │ │ │ │ +│ │ ▼ ▼ ▼ ▼ │ ▼ │ │ +│ │ │ │ │ +│ │ P' == _|_ P' /= _|_ /\ PS == _|_ PS simplify to │ PS simplify to ──┘ │ +│ │ [] │ single P' │ +│ ┌──────────────┼─────────────┐ │ │ │ │ │ │ +│ │ │ │ │ │ │ │ │ │ │ +│ ▼ ▼ ▼ │ │ │ │ │ └───────┐ │ +│ │ │ │ │ │ ▼ │ +│ Does not Simplified Simplifies │ ┌────────┼─┼───────────────────┘ │ │ +│ simplify already │ │ │ │ │ PS simplify to │ +│ │ │ │ │ │ PS' │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ │ ▼ ▼ │ │ │ │ │ +│ │ │ │ │ └─────────────────┐ │ ▼ │ +│ │ │ │ Return vacuous P │ │ │ │ +│ │ │ │ │ │ │ Return branching │ +│ │ │ │ │ │ │ │ +│ └───────┐ │ │ ▼ ▼ │ │ +│ ▼ ▼ │ │ │ +│ │ Depth/rule bound Unbounded ───────────────┼─────────────────────────────────┘ +│ Return stuck P │ │ +│ │ │ │ +│ ▲ │ │ │ +│ │ │ │ │ +└────────────────────────────┼──────────────┘ ▼ │ + │ │ + │ Return simplified P' │ + │ │ + │ │ + └────────────────────────────────────────────────────────────────────────────────────┘ +``` + +Below, we give details of some of the steps/states in this diagram: + +### Internalise KoreJSON into pattern `P` + + +When the execute endpoint receives a KoreJSON term (which is a JSON representation of a KORE term; see ?), it first tries to internalise it into a `Term` and a set of `Predicate`s. There are a few things to note at this step. Unlike the original backend, which is (roughly) an implementation of matching logic that KORE is based on (but does not stricly adhere to), the booster backend take a simpler and more pragmatic approach to representing program configuations. Namely, the booster makes a strict distinction between the configuration state (i.e. data) and the boolean predicates/conditions over this state. As a result, the state is represented by the following simplified subset of KORE, refered to as `Term`s previously: + +```haskell +data Term = + SymbolApplication Symbol [Sort] [Term] + | DomainValue Sort String + | Var Variable + | Injection Sort Sort Term +``` + +_Note: In the actual codebase, there are some further internal data structures for more efficient handling of collections (maps, lists and sets). However, these can mostly be thought of as simply sugar notation for `SymbolApplication`s._ + +Boolean predicates of the configuration have the type `Predicate` and are simply `Term`s of sort `Bool`. When internalising, we therefore go from a much richer/less constrained KORE language, containing various ML connectives such as `#Equals`, `#Ceil`, `#Exists`, etc. to the simplified `Term` langage described above. As a result, not every KORE term is a "correct" configuration and there are times when the old backend produces KORE terms the booster is currently unable to process. This is usually the case with predicates, which are usually encoded in KORE as + +``` +{ true #Equal 1 >Int 0 } +``` + +whereas we simply store them as `1 >Int 0` in the booster (see ? for details of internalising predicates in the booster). + +_Note: The old backend works with predicates at the ML level and may return something which is a valid predicate, but the booster doesn't currently recognise. If this happens, the booster will emit a warning and treat the given predicate as an opaque term it isn't able to manipulate. This might cause issues later, if said predicate is needed to simplify some part of the configuration or prune a branch._ + +### `P /= _|_` + +Currently, we assume that every configuration `P` received by the server is defined, i.e. all partial functions inside the configuration are defined for the given inputs and it is therefore not `#Bottom`/`_|_`. It is currently up to the client to ensure that this condition holds and can be checkced by sending a simplify request of `#Ceil(P)` to the old backend, which should return `#Top`, if `P` is indeed defined. Definedness is a very important invariant for the booster, which is carefully preserved throughout execution. This restriction of always preserving definedness is often the reason why the booster aborts. This happens when the booster cannot be sure if a rule/simplification preserves definedness and is therefore unable to proceed in applying said rule/ simplification, intead falling back to the old backend, which has much more powerful but slower mechanisms to deal with partiality. +