diff --git a/docs/offline.md b/docs/offline.md new file mode 100644 index 00000000..00da5f55 --- /dev/null +++ b/docs/offline.md @@ -0,0 +1,86 @@ +# Offline partial evaluation + +## Motivation + +The offline partial evaluation is motivated by improving the experience +for users seeking to integrate ASLp within other tools. +With the original online method, +only one opcode could be processed at a time, and +each new opcode would require a complete traversal and simplification +of the ASL specification. +This is obviously inefficient +and necessitates a tight coupling between ASLp +and programs looking to use it. + +## Introduction + +Offline partial evaluation aims to improve this +by performing much of the partial evaluation _ahead-of-time_ +instead of once per-instruction. +In the offline method, +partial evaluation operates with only the knowledge +that the opcode is constant, +but without knowledge of the _value_ of the opcode. + +Further, +we desire a separation of the "decoding" phase +(where opcode bits are examined to determine _which_ instruction is represented), +and the "execution" phase (where opcodes are executed and their actions performed). +This delineation is implemented by hand-written lifters, +where the language of the semantics IR is separate from the language used to implement the lifter itself. + +ASL itself has no such separation, but we can compute it with a _binding-time analysis_. +In this analysis, +constants and the instruction opcode are marked as *lift-time*, then the analysis +traverses the AST in this way: +- if an expression uses only lift-time values, it is also marked as *lift-time*, otherwise +- the expression is marked as *run-time*. +Lift-time values are simply emitted into the lifter's language, and +will not be visible within the final semantics. +Run-time-marked values are translated to a `gen_` prefixed function, +indicating that this should be emitted into the semantics and deferred until run-time. +This gives us an AST for a program which takes an opcode and then constructs a residual program +representing the semantics of the instruction. + +In particular, this representation +within an AST +enables efficient translation of the lifter to arbitrary lifter languages and semantics languages. +This is desirable, entirely eliminating the need to (de)serialise +the semantics and communicate across language boundaries. + +## Overview + +The entry-point for the offline transformation process is the `run` function, +near the end of [symbolic_lifter.ml](/libASL/symbolic_lifter.ml). +Here, the process is divided into stages: +- **Stage 1** - mock decoder & instruction encoding definitions: + Converts the ASL decoder tree into separate functions making up a proper ASL program. + Makes use of [decoder_program.ml](/libASL/decoder_program.ml). +- **Stage 2** - call graph construction: + Identifies reachable functions, using [call_graph.ml](/libASL/call_graph.ml). +- **Stage 3** - simplification: + Minor cleanup of temporary dynamic-length bit-vectors and unsupported structures. +- **Stage 4** - specialisation: + Performs requirement analysis to ensure bit-vector lengths + and loop iterations are statically known. + Inserts splits to branch on conditions determining these quantities. [req_analysis.ml](/libASL/req_analysis.ml). +- **Stage 5** - disassembly: + Invokes the online partial evaluation to reduce and inline structures, then + simplifies with BDDs (an opcode-sensitive value analysis to, e.g., eliminate always-true assertions), +- **Stage 6** - cleanup: + Remove unused variables within each instruction function. +- **Stage 7** - offline transform: + Performs binding-time analysis and transformations, + then BDD-based copy-propagation transform and one final cleanup. [offline_opt.ml](/libASL/offline_opt.ml). + +After the lifter is generated, it is passed to a "backend" +which performs the (mostly syntactic) translation +to a targeted lifter language. + +## Backends + +TODO: mechanism of a backend and how to implement one. + +- given a set of instruction functions +- instruction-building interface +- required: mutable variables, bitvector, bool, and int representations diff --git a/docs/online.md b/docs/online.md new file mode 100644 index 00000000..e03c401d --- /dev/null +++ b/docs/online.md @@ -0,0 +1,157 @@ +# Online partial evaluation + +## Background + +Online partial evaluation traverses the specification, +closely mirroring concrete evaluation, +and specialises expressions as it encounters them. +It does this "online" by recording, +for each variable, +whether that variable is a concrete, symbolic, or unknown value +(in order of simplification potential). + +- Concrete values are simple literals which can almost always be immediately evaluated + in expressions. +- Pure symbolic expressions are pure transformations of a combination + of concrete and symbolic/unknown values. + For example, this includes "(x + 1) - 1" and this can be used to simplify + using algebraic properties of integers and bitvectors. +- The last type, unknown, is any computation which is opaque at partial-evaluation time. + Most often, this is register and memory accesses. + When encountered in an expression, this will emit code to perform the computation and store + it into a read-only variable, transforming it into a pure symbolic expression. + +## Implementation + +Within ASLp, the partial evaluation features are implemented by three files +with different responsibilities. +These are: symbolic.ml, dis.ml, and transforms.ml. +This corresponds to the order from lower-level concepts (expressions) +to higher-level ones (statements and programs). + +### symbolic.ml + +In the implementation, this begins at the [symbolic.ml](/libASL/symbolic.ml) +file which decribes expression-level symbolic operations. +Here, the type of `sym` is defined simply as either +a concrete value or a pure symbolic expression +(the concept of "unknown" values lives outside this file). + +Moreover, this file implements symbolic analogues +of existing primitive operations on concrete values +([value.ml](/libASL/value.ml) and [primops.ml](/libASL/primops.ml])). +Essentially, these functions perform a concrete evaluation +if all (or sufficiently many) operands are concretely known, and +return a symbolic expression otherwise. +The expression may be simplified through the +expression-level simplification whenever +its structure allows. +Some notable instances are: +- identities of bitwise "and" and "or" and integer "mul", "add", and "sub", +- slices of slices, +- (sign/zero) extension of extensions. + +### dis.ml + +[dis.ml](/libASL/dis.ml) performs a recursive traversal of the ASL AST, +calling functions from symbolic.ml as needed. +This file is responsible for inter-statement and inter-procedural effects, +including the collection of the residual program. + +The structure in this file is designed to mirror that of the interpreter's +evaluation methods ([eval.ml](/libASL/eval.ml)). +For instance, there are corresponding functions for expressions: +```ocaml +val dis_expr : AST.l -> AST.expr -> sym rws +val eval_expr : AST.l -> Env.t -> AST.expr -> value +``` +Here, the _'a rws_ return value is a computation inside the reader-writer-state monad (discussed below). + +The entry-point to the entire partial evaluator is the _dis\_core_ method. +This invokes the appropriate "dis" functions then performs a number of post-processing +transformations, described next. + +#### LocalEnv / DisEnv + +The LocalEnv structure stores and manipulates location-sensitive information during the +partial evaluation. +Most importantly, this includes the stack +mappings of variables to their concrete value or pure symbolic expression. +To facilitate easier debugging, this also manually records a stack of the +AST nodes currently being evaluated +(the ordinary stack trace is not useful in the presence of the monad abstraction). + +The DisEnv contains the _'a rws_ type — a reader-writer-state monad: +```ocaml +type 'a rws = Eval.Env.t -> LocalEnv.t -> ('a, LocalEnv.t, stmt list) +``` +This implements an abstraction for: +- a read-only view of globals and constants ("eval env"), +- a mutable LocalEnv state ("local env"), and +- a write-only list of statements making up the residual program. + +This mechanism is invoked by the `let@` and `let+` syntaxes +which composes RWS computations in a sensible way. +As an example, with many type annotations, consider +```ocaml +val int_is_zero : int rws -> bool rws = + fun (x : int rws) -> + let@ y : int = x in + DisEnv.pure (y = 0) +``` +Essentially, the `let@` notation has the function of "unwrapping" a rws computation +and placing it within another rws computation. +`let+` can be used where the final result (at the bottom of the let expression) +is a `DisEnv.pure` computation. +By using `let+`, the `DisEnv.pure` can be omitted. +Other functions exist to manipulate particular rws computations in particular ways +(e.g. a list of rws can become a rws of list). + +The DisEnv struct also contains methods to retrieve variables +(searching the local bindings and then the global bindings), +all returning rws computations. +To emit a statement into the residual program, it provides a `write` function. + +### transforms.ml + +[transforms.ml](/libASL/transforms.ml) implements transforms which are executed after the +main partial evaluation phase. +Some of these are conventional compiler transformations +(e.g. removing unused variables, common subexpression factoring, and copy-propagation). +The majority, however, are designed to address deficiencies +(verbosity or unsupported constructs) in the residual program. +We list a few transformations which are of particular interest. + +- StatefulIntToBits / IntToBits: + This transformation converts expressions involving arbitrary-precision integers + into expressions of fixed-length bitvectors + by way of an interval analysis. + + The StatefulIntToBits modules implements this through abstract interpretation + with a domain of $\text{Width}\times \text{IsSigned} \times \mathbb Z^2$, + recording bit-width, signedness, and lower and upper bounds. + +- RedundantSlice: Eliminates slice operations which do not reduce the bit-width. + Crucially, this is also responsible for converting slices with non-static indices + into shift/truncate operations. +- bits_coerce_narrow: Pushes slices into sub-expressions where possible. +- CaseSimp: Removes unreachable assertions placed after exhaustive case checks. +- RemoveRegisters: Replaces expressions of ASL's register type with ordinary bit-vectors. +- CopyProp +- CommonSubExprElim +- RemoveUnused + +## Grammar + +The ASL grammar is defined in [asl.ott](/libASL/asl.ott). +The ott program translates this into a Menhir grammar +which then generates a parser and an associated Ocaml AST. +The generated code modules include Asl_ast, Asl_parser, and Asl_parser_pp. + + +For printing AST nodes and other structures, functions beginning with `pp_` are available +in [asl_utils.ml](/libASL/asl_utils.ml) and [utils.ml](/libASL/utils.ml). +These will produce an indented human-readable string. +To emit AST nodes in the structured "aslt" format, +use `Utils.to_string (Asl_parser_pp.pp_raw_stmt stmt)` or similar. + diff --git a/docs/readme.md b/docs/readme.md new file mode 100644 index 00000000..2a051c1c --- /dev/null +++ b/docs/readme.md @@ -0,0 +1,14 @@ +# ASLp documentation + +This folder will provide documentation for ASLp, +primarily of interest to developers of ASLp and +people seeking to integrate ASLp. + +ASLp is a partial evaluator for the ARM specification language. +The most established functionality is in the [online partial evaluation](./online.md). +The new [offline partial evaluation](./offline.md), building on the online features, +is available for easier integration with projects. + +The project is validated by a number of [tests](./testing.md) which are run +on pushes and merge requests to the repository. + diff --git a/docs/testing.md b/docs/testing.md new file mode 100644 index 00000000..4dcdc515 --- /dev/null +++ b/docs/testing.md @@ -0,0 +1,7 @@ +# Testing infrastructure + +test files are stored within the [tests/](/tests) directory. + +- unit testing (very limited) +- differential tests within coverage folder (bespoke infrastructure to implement pipeline, then dune diff) +- cram expect tests ending in .t (golden tests, also dune based) diff --git a/docs/vectoriser.md b/docs/vectoriser.md new file mode 100644 index 00000000..11af4184 --- /dev/null +++ b/docs/vectoriser.md @@ -0,0 +1,3 @@ +# vectoriser + +todo