-
Notifications
You must be signed in to change notification settings - Fork 121
Sail Internals
The Sail abstract syntax tree (AST) is defined by an ott grammar in
sail.ott. This
generates a OCaml (and lem) version of the ast in src/ast.ml
and
src/ast.lem
. Technically the OCaml version of the AST is generated
by Lem, which allows the Sail
OCaml source to seamlessly interoperate with parts written in
Lem. Although we do not make much use of this, it in principle allows
us to implement parts of Sail in Lem, which would enable them to be
verified in Isabelle or HOL.
The Sail AST looks something like:
and 'a exp =
| E_aux of ( 'a exp_aux) * 'a annot
and 'a exp_aux = (* expression *)
| E_block of ( 'a exp) list (* sequential block *)
| E_id of id (* identifier *)
| E_lit of lit (* literal constant *)
| E_cast of typ * ( 'a exp) (* cast *)
| E_app of id * ( 'a exp) list (* function application *)
| E_tuple of ( 'a exp) list (* tuple *)
| E_if of ( 'a exp) * ( 'a exp) * ( 'a exp) (* conditional *)
...
Each constructor is prefixed by a unique code (in this case E
for
expressions), and is additionally wrapped by an aux constructor
which attaches an annotation to each node in the AST, consisting of an
arbitrary type that parameterises the AST, and a location identifying
the position of the AST node in the source code:
type 'a annot = l * 'a
There are various types of locations:
type l =
| Unknown
| Unique of int * l
| Generated of l
| Range of Lexing.position * Lexing.position
| Documented of string * l
Range
defines a range of positions given by the parser, the
Documented
constructor attaches doc-comments to
locations. Generated
is used to signify that code is generated based
on code from some other location. Unique
is used internally to tag
locations with unique integers so they can be referred to
later. Unknown
is used for Sail that has no obvious corresponding
location, although this should be avoided as much possible as it leads
to poor error messages. Ast nodes programatically generated initially
have Unknown
locations, but Ast_util.locate
can be used to
recursively attach Generated
locations for error purposes.
A convention in the Sail source is that a single variable l
is
always a location, usually the closest location.
There are various functions for manipulating the AST defined in
ast_util. These
include constructor functions like mk_exp
for generating untyped
expressions and various functions for destructuring AST nodes. It also
defines various useful Sets and Maps, e.g. sets of identifiers, type
variables etc. Note that type variables in Sail are often internally
referred to as kids, I think this is because type variables are
defined as having a specific kind, i.e. 'n : Int
is a type
variable with kind Int
. (Although kinded_id
is technically a
separate type which is a type variablem * kind pair).
There is a separate AST parse_ast.ml which the parser generates. It is very similar to the main AST except it contains some additional syntactic sugar. The parse ast is desugared by the initial_check file, which performs some basic checks in addition from mapping the parse AST to the full AST.
The main entry point for Sail is the file
sail.ml. Each
backend option e.g. -c
, -lem
, -ocaml
etc, is referred to as a
target, and the set_target
function is used to set the
opt_target
variable, for example
( "-c",
Arg.Tuple [set_target "c"; Arg.Set Initial_check.opt_undefined_gen],
" output a C translated version of the input");
defines the -c
option. Each Sail option is configured via via opt_
variables defined at the top of each relevant file. In this case we
tell Sail that when we generate C, we also want to generated
undefined_X
functions for each type X
. In general such opt_
variables should be set when we start Sail and remain immutable
thereafter.
The first function called by main
is Sail.load_files
. This
function parses all the files passed to Sail, and then concatenates
their ASTs. The preprocessor is then run, which evaluates $directive
statements in Sail, such as
$include <prelude.sail>
Unlike the C preprocessor the Sail pre-processor operates over actual
Sail ASTs rather than strings. This can recursively include other
files into the AST, as well as add/remove parts of the AST with
$ifdef
etc. Directives that are used are preserved in the AST, so
they also function as a useful way to pass auxilliary information to
the various Sail backends.
The initial check mentioned above is then run to desugar the AST, and then the type-checker is run which produces a fully type-checked AST. Type annotations are attached to every node (for which an annotation makes sense) using the aux constructors. The type-checker is discussed in more details later.
After type-checking the Sail scattered definitions are de-scattered into single functions.
All the above is shared by each target and performed by the
load_files
function.
The next step is to apply various target-specific rewrites to the AST before passing it to the backend for each target.
The file rewrites.ml defines a list of rewrites:
let all_rewrites = [
("pat_string_append", Basic_rewriter rewrite_defs_pat_string_append);
("mapping_builtins", Basic_rewriter rewrite_defs_mapping_patterns);
("mono_rewrites", Basic_rewriter mono_rewrites);
("toplevel_nexps", Basic_rewriter rewrite_toplevel_nexps);
("monomorphise", Basic_rewriter monomorphise);
...
and each target specifies a list of rewrites to apply like so:
let rewrites_interpreter = [
("no_effect_check", []);
("realise_mappings", []);
("toplevel_string_append", []);
("pat_string_append", []);
("mapping_builtins", []);
("undefined", [Bool_arg false]);
("vector_concat_assignments", []);
("tuple_assignments", []);
("simple_assignments", [])
]
Once these rewrites have occurred the Sail.target
function is called
which invokes the backend for each target, e.g. for OCaml:
| Some "ocaml" ->
let ocaml_generator_info =
match !opt_ocaml_generators with
| [] -> None
| _ -> Some (Ocaml_backend.orig_types_for_ocaml_generator ast, !opt_ocaml_generators)
in
let out = match !opt_file_out with None -> "out" | Some s -> s in
Ocaml_backend.ocaml_compile out ast ocaml_generator_info
There is also a Sail.prover_regstate
function that allows the
register state to be Set up in a prover-agnostic way for each of the
theorem-prover targets.
The Sail type-checker is contained within src/type_check.ml. This file is long, but is structured as follows:
-
The type checking environment is defined. The functions that operate on the typing environment are contained with a separate
Env
module. The purpose of this module is to hide the internal representation of the typing environment and ensure that the main type-checker code can only interact with it using the functions defined in this module, which can be set up to guarantee any invariant we need. Code outside the type-checker itself can only interact with an even more restricted subset of these functions exported via the mli interface file. -
Helper definitions for subtyping and constraint solving are set up. The code that actually talks to an external SMT solver is contained within a separate file src/constraint.ml, wheras the code here sets up the interface between the typing environment and the solver.
-
Next comes the definitions for unification and instantiation of types. There is some additional code (3.5) to handle subtyping with existential types, which can use unification to instantiate existentially quantified type variables.
-
Sail allows some type-level constructs to appear in term-level variables, but these are then eliminated during type-checking in a process called sizeof-rewriting, after the (somewhat-awkwardly) named sizeof keyword.
-
Finally all the typing rules are given. Sail has a bi-directional type-checking approach. So there are checking rules like
check_exp
,check_pat
, etc., as well as inference rules likeinfer_exp
,infer_pat
, etc. -
Effects added by the previous typing rules are now propagated upwards through the AST
-
Finally we have rules for checking and processing toplevel definitions of functions and datatypes.
The interface by which the rest of Sail can interact with the type-checker and type annotations is strictly controlled by it's mli interface file. As much of the type-checking internals is kept as abstract as possible.
The rewriting framework used by the various rewrites mentioned previously is defined in src/rewriter.ml. It contains various large structs with functions for every single AST node, and allows data to be threaded through each re-write in various ways. Most of the re-writes are defined in src/rewrites.ml, although the re-writer is used for other rewrite like passes such as e.g. constant folding in src/constant_fold.ml which combines the rewriter with the Sail interpreter.
The rewriter also acts as the interface to the Sail monomorphisation code, in src/monomorphise.ml.
The C and SMT backends for Sail use a custom intermediate representation (IR) called Jib (it's a type of Sail). Like the full AST this is defined as an ott grammar in language/jib.ott. The sail "-ir" option allows Sail to be converted into this IR without any further processing.
The Jib related files are contained within a subdirectory
src/jib/
. First we convert Sail to A-normal form (ANF) in
src/jib/anf.ml,
then
src/jib/jib_compile.ml
converts this into Jib.
The Jib representation has the advantage of being much simpler than
the full Sail AST, and it removes a lot of the typing complexity, as
types are replaced with simpler ones (ctyp
). Note that many
Jib-related types are prefixed by c
as it was originally only used
when generating C.
The key optimization we do when generating Jib is we analyze the Sail types using SMT to try to fit arbitrary-precision types into smaller fixed-precision machine-word types that exist within Jib. To aid in this we have a specialization pass that removes polymorphism by creating specialised copies of functions based on how their type-quantifiers are instantiated. This can be used in addition to the Sail monomorphisation above.
Once we have generated Jib, the code generator from Jib into C is fairly straightforward.
Starting with some Sail such as:
default order dec
$include <prelude.sail>
register r : bits(32)
$property
function property(xs: bits(32)) -> bool = {
ys : bits(32) = 0x1234_5678;
if (r[0] == bitzero) then {
ys = 0xffff_ffff
} else {
ys = 0x0000_0000
};
(ys == sail_zeros(32) | ys == sail_ones(32))
}
We first compile to Jib, then inline all functions and flatten the resulting code into a list of instructions as below. The Sail->Jib step can be parameterised in a few ways so it differs slightly to when we compile Sail to C. First, a specialisation pass specialises integer-polymorphic functions and builtins, which is reflected in the name mangling scheme so, e.g.
zz7mzJzK0zCz0z7nzJzK32#bitvector_access
is a specialised version of bitvector access for 'n => 32 & 'm => 0. This lets us generate optimal SMT operations for monomorphic code, as the SMTLIB operations like ZeroExtend and Extract are only defined for natural number constants and bitvectors of known lengths. We also have to treat zero-length bitvectors differently to C, as SMT does not allow zero-length bitvectors, and unlike when we compile to C, we can have fixed-precision bitvectors of greater that 64-bits in the generated Jib.
var ys#u12_l#9 : fbits(32, dec)
ys#u12_l#9 : fbits(32, dec) = UINT64_C(0x12345678)
var gs#2#u12_l#15 : bool
var gs#1#u12_l#17 : bit
gs#1#u12_l#17 : bit = zz7mzJzK0zCz0z7nzJzK32#bitvector_access(R, 0l)
gs#2#u12_l#15 : bool = eq_bit(gs#1#u12_l#17, UINT64_C(0))
kill gs#1#u12_l#17 : bit
var gs#6#u12_l#16 : unit
jump gs#2#u12_l#15 then_13
ys#u12_l#9 : fbits(32, dec) = UINT64_C(0x00000000)
gs#6#u12_l#16 : unit = UNIT
goto endif_14
then_13:
ys#u12_l#9 : fbits(32, dec) = UINT64_C(0xFFFFFFFF)
gs#6#u12_l#16 : unit = UNIT
endif_14:
kill gs#2#u12_l#15 : bool
var gs#5#u12_l#10 : bool
var gs#3#u12_l#14 : fbits(32, dec)
gs#3#u12_l#14 : fbits(32, dec) = zz7nzJzK32#sail_zeros(32l)
gs#5#u12_l#10 : bool = zz7nzJzK32#eq_bits(ys#u12_l#9, gs#3#u12_l#14)
kill gs#3#u12_l#14 : fbits(32, dec)
var gs#7#u12_l#11 : bool
jump gs#5#u12_l#10 then_11
var gs#4#u12_l#12 : fbits(32, dec)
var gs#0#u9_l#13 : fbits(32, dec)
gs#0#u9_l#13 : fbits(32, dec) = zz7nzJzK32#sail_zeros(32l)
gs#4#u12_l#12 : fbits(32, dec) = zz7nzJzK32#not_vec(gs#0#u9_l#13)
kill gs#0#u9_l#13 : fbits(32, dec)
goto end_inline_10
end_inline_10:
gs#7#u12_l#11 : bool = zz7nzJzK32#eq_bits(ys#u12_l#9, gs#4#u12_l#12)
kill gs#4#u12_l#12 : fbits(32, dec)
goto endif_12
then_11:
gs#7#u12_l#11 : bool = true
endif_12:
return : bool = gs#7#u12_l#11
kill gs#5#u12_l#10 : bool
kill ys#u12_l#9 : fbits(32, dec)
end
undefined bool
The above Jib is then turned into a control-flow-graph in SSA form.
The conditionals that affect control-flow are put into separate nodes (in yellow), so we can easily compute the global path conditional for each block (stored in a function pi(cond0, ... , condn) by using the yellow conditional nodes between each node and the start node.
From this form the conversion to SMT is as follows:
A variable declaration such as
var x : fbits(32, dec)
would become
(declare-const x (BitVec 32))
A call to a builtin
x : T = f(y0, ... , yn)
would become
(define-const x T' exp)
where exp encodes the builtin f using SMT bitvector operations
Phi functions are mapped to muxers as follows - for a function
phi(x0,...,xn)
we turn that into an if-then-else statement which
selects x0
to xn
based on the global path conditionals of the
parent blocks corresponding to each argument. Each phi function in a
block always has the same number of arguments as the number of parent
nodes, and the arguments are in the same order as the node index for
each parent.
The above scheme generates a lot of declare-const and define-const statements that may not be needed so we do some simple dead-code elimination and constant propagation, which results in the following SMT:
(set-logic QF_AUFBVDT)
(declare-const zR/0 (_ BitVec 32))
(define-const zysz3u12_lz30/5 (_ BitVec 32) (ite (not (= ((_ extract 0 0) zR/0) #b0)) #x00000000 #xFFFFFFFF))
(assert (and (not (ite (not (= zysz3u12_lz30/5 #x00000000)) (= zysz3u12_lz30/5 (bvnot #x00000000)) true))))
(check-sat)
For monomorphic-bitvector manipulating code we can generate very compact SMT. Both specialisation and monomorphisation can be used to help monomorphise bitvectors. For variable-length bitvectors we can represent them as length-bitvector pairs, up to a certain bounded max length (default 256). This is less efficient but unavoidable in certain places. Integers are currently mapped to either 128 bit bitvectors (or any configurable max length) or 64 bit bitvectors.
The one slightly tricky thing to support is register references, e.g.
(*r) : T = 0xFFFF_FFFF
where r
is a register reference. For this we look for all registers
with type T (as a Jib type, where type-equality is trivial), and
convert the above to
if r = ref R1 then
R1 = 0xFFFF_FFFF
else if r = ref R2 then
R2 = 0xFFFF_FFFF
else ...
if we did some smarter constant folding (e.g. propagating the GPRs letbinding in some of our specs) this could potentially generate SMT that is just as good as a manual if-then-else implementation of the register read/write functions. This step is done before converting to SSA so each register in the if-then-else cascade gets the correct indices.