
TRLC Architecture

This document provides (somewhat minimal) documentation of how the tools work and why certain design decisions were made. It is required reading if you want to contribute to the code base or the language design.

It is assumed that the reader understands basic compiler design; if not, the dragon book is a good place to start.

Coding style

Most of these are checked with make lint. Generally normal Python, with some changes:

  • Class names follow Ada naming convention (e.g. LASER_Is_An_Acronym)

  • Methods are lowercase with underscores (e.g. eat_potato)

  • Just use simple % string formatting

  • Horizontal alignment where reasonable, for example

    x = {
      "potatos" : 12,
      "cats"    : 3,
    }
  • Asserts for the types of all parameters (except self), for all functions or methods, all the time, e.g.:

    def do_something(self, other, thing=None):
        assert isinstance(thing, str) or thing is None
        assert isinstance(other, Some_Object)
  • Never assume anything in conditions; always deal with all cases. In practice this means the final else clause of an if statement that distinguishes between three shapes should always raise an ICE (internal compiler error). That way, if somebody adds a new AST node in the future or changes the tree in some way, we immediately get a failure instead of silent success or, worse, silent failure. This is essential in making sure we can qualify this tool under ISO 26262 with minimal effort. (A sketch of this pattern follows after this list.)

  • Avoid throwing or catching exceptions, unless there is no reasonable alternative. Instead write code that works. :) An exception here is the actual error mechanism of TRLC, but again there just use the provided interface, never raise your own exceptions.
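
A minimal sketch of the "final else raises an ICE" pattern mentioned above (the walk_* method names and the exact mh.ice_loc signature are illustrative, not copied from the code base; the type names are the ones used elsewhere in this document):

    def walk_type(self, n_typ):
        assert isinstance(n_typ, Type)

        if isinstance(n_typ, Record_Type):
            self.walk_record_type(n_typ)
        elif isinstance(n_typ, Tuple_Type):
            self.walk_tuple_type(n_typ)
        elif isinstance(n_typ, Enumeration_Type):
            self.walk_enumeration_type(n_typ)
        else:
            # Never assume: if somebody adds a new Type subclass later
            # we fail loudly instead of silently doing nothing.
            self.mh.ice_loc(n_typ.location,
                            "unexpected type %s" % n_typ.__class__.__name__)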

Tool architecture

Files

The main files are:

  • trlc.py The main entry point and the Source_Manager
  • errors.py The single place to generate any message produced by the tools. Defines Message_Handler (the API to create messages) and Location (a way to point to a specific place in input sources).
  • lexer.py Defines the Token type and implements a lexer for the TRLC language.
  • parser.py Implements parsers for the TRLC language and the Markup_String embedded language.
  • ast.py Defines the AST. Refer to ast_hierarchy for a visual overview of this.
  • lint.py Implements a linter operating on the TRLC AST.
  • vcg.py Used by the linter to prove absence of run-time errors on user-defined checks.

There are also a few utility files:

  • version.py Defines the current tool version.
  • nested.py Used for creating sub-parsers for languages embedded in TRLC strings.
  • math.py Implements the integer remainder function (it's not the same as Python %) and the rounding function for decimal to integer conversions.
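
As a purely illustrative reminder of why Python's % cannot be used directly: Python's operator follows the sign of the divisor (floor division), while a truncation-based remainder follows the sign of the dividend. The exact convention TRLC requires is defined in the LRM and implemented in math.py; the snippet below only shows the difference:

    def truncated_remainder(lhs, rhs):
        # Remainder matching truncating division (as in e.g. C or Ada's
        # rem); illustrative only, not the math.py implementation.
        q = abs(lhs) // abs(rhs)
        if (lhs < 0) != (rhs < 0):
            q = -q
        return lhs - rhs * q

    print(-7 % 2)                       # 1  (Python, floor-based)
    print(truncated_remainder(-7, 2))   # -1 (truncation-based)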

Overall dataflow

The TRLC tool is actually two tools (the evaluator and the linter); however, there is significant overlap in their dataflow.

  • A shared command-line interface in trlc.py determines the file set to process and mode of operation.

  • The Source_Manager registers files and decides on the parse order. For all modes of operation all .rsl files are parsed first, and then all .check files.

  • Then (unless --no-lint mode is active) the lint.py and vcg.py modules are used to sanity check types and user checks, and we then stop with messages and a return code of 0 or 1.

  • Otherwise in normal mode the Source_Manager proceeds to parse all .trlc files.

  • Afterwards any missing references (i.e. references from things declared in one file to things declared in another file) are resolved.

  • Then all user-defined checks are evaluated. This is implemented as a tree-walk through the perform_checks function in ast.py.

At this point the command-line tool stops; but when the Source_Manager is invoked from the API then the final symbol table is passed on to the user.
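
A minimal sketch of such API use (assuming the public entry points Source_Manager, Message_Handler, register_directory, process and iter_record_objects; check the actual code for the exact signatures, and note the directory name is made up):

    import sys

    from trlc.errors import Message_Handler
    from trlc.trlc import Source_Manager

    mh = Message_Handler()
    sm = Source_Manager(mh)

    # Register the files to process, then parse them and run all checks.
    sm.register_directory("requirements")
    symbols = sm.process()
    if symbols is None:
        sys.exit(1)   # messages have already been printed

    # The returned symbol table is the same structure the tools use
    # internally, e.g. we can walk all record objects.
    for obj in symbols.iter_record_objects():
        print(obj.name)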

Errors and Locations

There is a single class, Location, an instance of which is attached to every single node in the AST and every single token generated by the lexer.

If you want to create a message of any kind, there is a single way to do it. You need to have a reference to the Message_Handler (usually called mh in the code).

You then call mh.warning, mh.check, mh.error, mh.ice_loc to create a message. The intended semantics of these are:

  • warning should be called for issues that are not outright banned by the LRM, but are problematic nonetheless. For example duplicate imports.

  • check should be called for user-defined checks or for lint discovered problems.

  • error should be called for lexing issues, syntax errors, and other static violations of language rules.

  • ice_loc generates an internal compiler error and hard aborts the tool execution. Think of this as an assert that should never fail, but if it does something is truly broken. There should be no way to actually execute this.

For example:

if t_previous.value == t_import.value:
   mh.warning(t_import.location,
              "duplicate import of package %s" % t_import.value)
   return

This generates a useful and rich message:

import Foo
       ^^^ duplicate-imports/bar.trlc:4: warning: duplicate import of package Foo

Note that you don't have to worry about all the fancy pointing or reproduction of source text; the Message_Handler will do that for you.

The mh.error method has an optional parameter fatal (True by default). If set, then after creating the error an exception of type TRLC_Error is raised.

The tool catches this exception at the top level; there should be no other place where it is caught.
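
A minimal sketch of that pattern (assuming TRLC_Error can be imported from errors.py; the helper name is hypothetical):

    from trlc.errors import TRLC_Error

    def main(mh, sm):
        try:
            return run_the_tool(mh, sm)   # hypothetical helper doing the real work
        except TRLC_Error:
            # The message was already emitted by the Message_Handler; we
            # only need to stop and signal failure here.
            return 1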

Includes and discovery

The mechanism to discover which files are necessary to parse is a bit complex. Most of the logic happens in the build_graph function of the Source_Manager.

  • First we register all include files that haven't been explicitly requested on the command-line. We register them with primary = False to indicate that they are not explicitly requested.

  • We then parse just the preamble of all files. After that we resolve the imports of all packages. This means we could raise errors on include files that are never used. This was a design decision, as the alternative is that an include file is silently ignored and then the user is left with a more confusing error message.

  • During this we build a dependency graph, for each package we create three nodes in a graph: a rsl node, a check node, and a trlc node. Initially we link the check to the rsl; and the trlc to both the check and rsl.

  • When we resolve imports we then add links in this graph. Note that an import in an RSL only links to the RSL node of the imported package, but a link in the TRLC links to the TRLC node of the imported package.

  • We then compute the transitive closure of the slice of the graph for all explicitly requested items.

  • We then use that to expand into the final list of files required. Note that for check nodes all check files are included, and similarly for trlc nodes all trlc files are included.
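
A condensed sketch of the closure step (illustrative only; the real logic lives in build_graph and uses its own node representation, and the package names here are made up):

    # Illustrative only: three nodes per package (rsl, check, trlc),
    # edges as described above, and a transitive closure over the
    # explicitly requested items.
    def transitive_closure(graph, roots):
        needed = set()
        work   = list(roots)
        while work:
            node = work.pop()
            if node in needed:
                continue
            needed.add(node)
            work.extend(graph.get(node, ()))
        return needed

    graph = {
        ("Potato", "check") : {("Potato", "rsl")},
        ("Potato", "trlc")  : {("Potato", "rsl"), ("Potato", "check")},
        ("Dish",   "rsl")   : {("Potato", "rsl")},   # Dish.rsl imports Potato
        ("Dish",   "check") : {("Dish", "rsl")},
        ("Dish",   "trlc")  : {("Dish", "rsl"), ("Dish", "check")},
    }
    print(transitive_closure(graph, [("Dish", "trlc")]))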

Lexer and Parser overview

The class hierarchy for the lexer and parser is a bit more complex:

Lexer/Parser Overview

In short, the _Base classes are abstract bases. There are two lexers and parsers: one for TRLC itself, and one for the Markup_String mini-language. The Nested_Lexer is used to build a new lexer where the input is not a source file, but instead a string from TRLC. There is a lot of magic involved in translating token error locations from the "inside" string to the outside string.

Location is an abstract base for filename / line / column information. The Source_Reference is a more detailed version of this linked to a lexer, which also contains source context (e.g. the original source line).

The LRM generator defines its own Nested_Lexer, which should eventually be unified with the Nested_Lexer from TRLC.

Lexer

The lexer is a hand-crafted lexer. We have chosen to do this over e.g. using PLY because:

  • No dependencies
  • Very precise error location generation was a very important criterion. It is possible to do this with PLY, but it's a major hassle.

The lexer does not use the re module; instead it considers a sliding window of 3 characters without back-tracking.

The only reason the window is 3 characters instead of just 2 is the .. operator for ranges. We need to be able to distinguish between 1.2 and 1..2.
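
An illustrative fragment (not the actual lexer.py code) showing why two characters of look-ahead are enough:

    # Illustrative only: while lexing a number, the current character
    # plus two characters of look-ahead are enough to tell "1.2" (a
    # decimal literal) apart from "1..2" (integer, "..", integer)
    # without back-tracking.
    def number_continues(cur, la1, la2):
        assert cur.isdigit()
        if la1 == "." and la2.isdigit():
            return True    # "1.2": the decimal literal continues
        else:
            return False   # "1..2" etc.: the integer ends here

    print(number_continues("1", ".", "2"))   # True  (1.2)
    print(number_continues("1", ".", "."))   # False (1..2)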

Parser

The parser is a hand-crafted recursive descent parser. The parser only considers a single token of look-ahead. Again, we do not want to use a generated parser like PLY or ANTLR because:

  • No dependencies
  • Performance (especially when compared to ANTLR)
  • Emphasis on doing as much of the semantic resolution as possible, as early as possible, when generating the AST.
  • Better error messages that come naturally with a recursive descent parser.
  • Two-stage parsing (see below).

Parsing is a bit weird for .rsl and .trlc files due to the import mechanism. When the Source_Manager first registers a .rsl file we immediately parse the file_preamble to understand dependencies and determine a parse order. When we parse the .trlc files we again parse them in two parts: first we pre-parse the preamble for all files; and then in a second step we parse the rest of all files.

The naming of parse methods generally follows the naming of non-terminals in the BNF grammar, e.g. parse_qualified_name for parsing the qualified_name non-terminal.

In general symbols are immediately resolved to their entities; there are no unresolved references, except in two very specific cases:

  • References to record objects are in general resolved late, because we need to parse all files in order to know which ones actually exist.
  • In the parse_name method we delay lookup until we know what we're supposed to be looking for. The main culprit here is the builtin functions; a record may have a field called len and we won't know if we should look into the record symbol table or the global symbol table until we encounter the (.

Speaking of symbol tables, the design of TRLC here mirrors the philosophy from the GNAT Ada compiler: there is no central symbol table, instead everything is a tree and some nodes have "symbol tables" attached to them.

For example an Enumeration_Type has a mini table that stores all its literals; and in turn this type is in the mini table of its Package. Only the packages are stored in the global symbol table (which is provided by the Source_Manager).
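
A condensed sketch of the idea (illustrative only; the real Symbol_Table class in ast.py has a richer interface, and the fall-back-to-parent lookup shown here is an assumption about its behaviour):

    # Illustrative only: small per-node symbol tables, where a failed
    # lookup is assumed to fall back to the enclosing table, ending in
    # the global table owned by the Source_Manager.
    class Mini_Symbol_Table:
        def __init__(self, parent=None):
            self.parent  = parent
            self.entries = {}

        def register(self, name, entity):
            self.entries[name] = entity

        def lookup(self, name):
            if name in self.entries:
                return self.entries[name]
            elif self.parent is not None:
                return self.parent.lookup(name)
            else:
                return None

    global_stab  = Mini_Symbol_Table()                      # Source_Manager
    package_stab = Mini_Symbol_Table(parent=global_stab)    # a Package
    enum_stab    = Mini_Symbol_Table(parent=package_stab)   # an Enumeration_Type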

You can visualise the AST by using the --debug-dump option of the tools. For this file:

package Test2

type T2 {
  x Integer
}

checks T2 {
  x > 1, fatal "must be positive", x
  x != 0, warning "potato"
}

It generates something like this:

Symbol_Table
   Builtin_Integer
   Builtin_Decimal
   Builtin_Boolean
   Builtin_String
   Builtin_Markup_String
   Builtin_Function len
   Builtin_Function startswith
   Builtin_Function endswith
   Builtin_Function matches
   Package Test2
      Declared_Late: False
      Record_Type T2
         Composite_Component x
            Optional: False
            Type: Integer
         Checks
            Fatal error 'must be positive'
               Anchor: x
               Binary Binary_Operator.COMP_GT Expression
                  Type: Boolean
                  Name Reference to x
                  Integer Literal 1
            Warning 'potato'
               Binary Binary_Operator.COMP_NEQ Expression
                  Type: Boolean
                  Name Reference to x
                  Integer Literal 0

The AST

The most useful entry point here is the hierarchy picture:

AST Hierarchy

Only leaf nodes can be created.

The AST is especially picky with its assertions. Each node carries in its constructor very specific assumptions about the type of its children. For example:

class Quantified_Expression(Expression):
    def __init__(self, mh, location,
                 typ,
                 universal,
                 n_variable,
                 n_source,
                 n_expr):
        super().__init__(location, typ)
        assert isinstance(typ, Builtin_Boolean)
        assert isinstance(universal, bool)
        assert isinstance(n_variable, Quantified_Variable)
        assert isinstance(n_expr, Expression)
        assert isinstance(n_source, Name_Reference)
        self.universal = universal
        self.n_var     = n_variable
        self.n_expr    = n_expr
        self.n_source  = n_source
        self.n_expr.ensure_type(mh, Builtin_Boolean)

For example, instead of asserting that n_source is an Expression, or even a Node, we are very precise here in what we require.

Similarly, functions that add or set things must also be very specific. For example:

    def add_item(self, node):
        assert isinstance(node, (Concrete_Type,
                                 Check_Block,
                                 Record_Object))
        self.items.append(node)

Other than this the AST is pretty boring; there is no rewriting or simplification. The only node worth pointing out as special is Name_Reference, as that is the only node where the entity it refers to may be filled in late.

Evaluation

Evaluation of expressions is a major component of the language. Each Expression node has a function evaluate:

    def evaluate(self, mh, context):
        assert isinstance(mh, Message_Handler)
        assert context is None or isinstance(context, dict)

The context is a mapping of local names to values; this is used when evaluating a user-defined check to bind the component names used in the check to concrete values.

The context can be None, in which case evaluation fails with "cannot be used in a static context" if the expression is not static. This is used in a few places to enforce static values:

  • The right-hand side of the power operator must be static
  • The regular expression given to matches must be static

Evaluation in general uses Value (also defined in ast.py) as a polymorphic nullable value. Each individual evaluation method is responsible for the appropriate type checking; this is not delegated to the Value class (which is really just a container). The Value class has a type (typ) and a value. The type is an instance of the node type Type (e.g. Builtin_Integer or Record_Type).

The value is a Python value representing it, or None (for null values):

  • str (for Builtin_String or Builtin_Markup_String)
  • int (for Builtin_Integer)
  • bool (for Builtin_Boolean)
  • list of Value (for Array_Type)
  • dict of str -> Value (for Tuple_Aggregate)
  • fractions.Fraction (for Builtin_Decimal)
  • Record_Reference (for Record_Type; i.e. we store the reference itself not the object referred to)
  • Enumeration_Literal_Spec (for Enumeration_Type)
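
A condensed sketch of the container (illustrative only; the real class in ast.py also asserts the types of its arguments):

    # Illustrative only: Value is just a container of a location, a
    # Python value (or None for null) and a typ (an instance of the
    # AST class Type); all type checking happens in the evaluate
    # methods themselves.
    class Value:
        def __init__(self, location, value, typ):
            self.location = location
            self.value    = value
            self.typ      = typ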

Evaluation itself is pretty simple: we just apply the relevant Python operator or something from math.py. For example, for the unary operators:

    def evaluate(self, mh, context):
        assert isinstance(mh, Message_Handler)
        assert context is None or isinstance(context, dict)

        v_operand = self.n_operand.evaluate(mh, context)
        if v_operand.value is None:
            mh.error(v_operand.location,
                     "input to unary expression %s (%s) must not be null" %
                     (self.to_string(),
                      mh.cross_file_reference(self.location)))

        if self.operator == Unary_Operator.MINUS:
            return Value(location = self.location,
                         value    = -v_operand.value,
                         typ      = self.typ)
        elif self.operator == Unary_Operator.PLUS:
            ...

There is a big assumption made here, in that the tree is correctly formed (in terms of typ), but this is statically checked at construction. This is why we don't typecheck when we evaluate unary minus, but just rely on / assume that the value will be an integer or fraction.

In evaluation there are (currently) three places where run-time errors can be created:

  • Evaluation of null in any context other than equality (or inequality).
  • Division by zero (both integers and decimals).
  • Array out-of-bounds.

Here is a good place to explain a design decision regarding the null rules of the language and why null is only allowed in a very specific context.

Every expression is typed, and null should have the correct type, i.e. if we say 1 == null then the null should be a Builtin_Integer null. However we also do not want to do multiple passes over the parse-tree to do type resolution (e.g. like Ada), and so expressions like (if a then null elseif (...) else null) would be infuriating to deal with.

Name references that happen to be null are not an issue, because we always know the type they are supposed to be. For null we don't actually do this (have a polymorphic null); instead we have a Null_Literal that has no type, and we just restrict the places where it can appear.

This effectively dodges the issue, without any real loss of generality in the language; and we can parse everything in one pass and resolve all types immediately.

Linter

There are two parts to the linter: the classical static analysis part, the "linter" proper (in lint.py), and the more formal-methods-based analysis, "verification condition generation" (in vcg.py).

The traditional linter is pretty simple: after parsing everything we do another tree-walk and emit additional messages.

There are a few exceptions, as some problems are best detected when parsing for the first time. Hence the Parser has an attribute lint_mode that can enable additional messages in this case. However, a design goal in general is separation of concerns, so this method is only chosen if the alternative would be impossible or extremely awkward.

An example is the lint check for clarifying final. In the AST we just know if a record is final or not, we do not know if this was inherited, or if the keyword was present. In the parser we do. We could of course modify the AST to record this, but that is generally not helpful since nobody else (currently) cares if this attribute was there or not.

The linter itself is simple and reading the code should make it obvious how it works. The only important design goal is to try and implement new checks in lint.py and not parser.py, if at all possible.

Verification condition generation

Once other lint checks have been performed, the linter constructs an object of class VCG (from vcg.py) for record types and tuple types, and calls the analysis method. This intends to find deeper problems with the user types and rules.

The overall approach is to encode a model of our source language in SMTLIB and feed these problems to an SMT Solver. Depending on how the solver answers, we can then issue messages informing the user about problems.

This feature is gated behind the --verify option, as it requires two additional run-time dependencies: PyVCG and CVC5.

Terminology

First some terminology that we shall be using:

  • VC: a Verification Condition. This is a small logical problem where we have known facts (e.g. x > 10) followed by a conclusion we want to prove (e.g. x != 0). Generally we generate a number of verification conditions for an expression / program / function, and if we manage to prove all of them, then we can conclude something (e.g. "we don't crash" or "we don't divide by zero").
  • Path: a possible execution trace through an expression or program
  • Sound: a property of an analysis. A sound analysis has no false negatives, i.e. it does not miss bugs.
  • Complete: a property of an analysis. A complete analysis has no false alarms, i.e. all messages raised really do correspond to a problem.
  • Automatic: a property of an analysis. An automatic analysis does not require human intelligence or input.
  • SAT: SATisfiable, i.e. there is an assignment to a formula that makes it true. For example x >= 0 and x <= 10 is SAT, since x = 3 would make the entire thing true.
  • UNSAT: UNSATisfiable, i.e. there is no assignment to a formula that makes it true. For example x == 1 and x == 2 is UNSAT, since there is no possible value for x that would make both equalities true.

Example

We shall explain the overall process based on a trivial example:

package Potato

type Kitten {
  a Integer
  b Integer
  c Integer
}

checks Kitten {
  a > 17 and 100 / (a + b * c) > 50, warning "Example"
}

In this case there are no null dereference problems, but there is a potential division by zero.

First we need to understand how many possible execution paths there are through this:

  • The warning is raised
    • (1) Both a > 17 and 100 / (a + b * c) are true
  • The warning is not raised
    • (2) a <= 17 (in which case we do not care about the rest, as the and operator has short-circuit semantics)
    • (3) a > 17 is true, but 100 / (a + b * c) > 50 is false

So here only paths (1) and (3) are exciting, since those are the paths where we can get to the division. Path (2) can never raise a division by zero error, if the implementation follows the LRM.

One of the features of PyVCG is that we can build a graph that models the execution paths of the source language; and we can then ask it to build VCs for all possible paths. The graph for this looks like this:

graph for example

We start at the top. First we need to introduce the things we talk about (these are the components of our record).

;; value for a declared on potato.rsl:4:3
(declare-const |Potato.Kitten.a.value| Int)
(define-const |Potato.Kitten.a.valid| Bool true)
;; value for b declared on potato.rsl:5:3
(declare-const |Potato.Kitten.b.value| Int)
(define-const |Potato.Kitten.b.valid| Bool true)
;; value for c declared on potato.rsl:6:3
(declare-const |Potato.Kitten.c.value| Int)
(define-const |Potato.Kitten.c.valid| Bool true)

First note that SMTLIB should be read like LISP: ( function operands+ ). This means in the first line we declare a constant (the SMT term for a free variable) by using the declare-const function, and its arguments are |Potato.Kitten.a.value| (i.e. the name of this constant) and Int (i.e. its type, or sort in SMT).

We have two constants that we use to model each record field: one for the actual value, and one indicating if it's null or not. Since none of the fields are optional, the validity constants are all directly defined to be true. Note that for the values we didn't do this, since we want the solver to figure out what they could be.

Moving to the next node in the graph we see:

;; validity check for a
goal: |Potato.Kitten.a.valid|

This corresponds to the very first use of a in our expression:

  a > 17 and 100 / (a + b * c) > 50, warning "Example"
  ^ this one right here

We need to check if this a could possibly be null. It's kinda obvious, but we need to do it anyway.

So we now want to get an SMT solver to figure out if |Potato.Kitten.a.valid| is always true, in every possible instance of this record.

To do that we negate the goal and ask if there is a SAT assignment. If we can find one, the goal is obviously not always true, and even better, we have a counter-example we can give to the user. If the problem turns out to be UNSAT, then we can conclude, by the law of the excluded middle, that the original statement must always be true.

The complete VC for this check looks like this:

(We have removed the matches function from this, because it's not relevant and we'll explain it later.)

(set-logic QF_UFLIA)
(set-option :produce-models true)

;; value for a declared on potato.rsl:4:3
(declare-const |Potato.Kitten.a.value| Int)
(define-const |Potato.Kitten.a.valid| Bool true)
;; value for b declared on potato.rsl:5:3
(declare-const |Potato.Kitten.b.value| Int)
(define-const |Potato.Kitten.b.valid| Bool true)
;; value for c declared on potato.rsl:6:3
(declare-const |Potato.Kitten.c.value| Int)
(define-const |Potato.Kitten.c.valid| Bool true)
;; validity check for a
(assert (not |Potato.Kitten.a.valid|))
(check-sat)
(get-value (|Potato.Kitten.a.value|))
(get-value (|Potato.Kitten.a.valid|))
(get-value (|Potato.Kitten.b.value|))
(get-value (|Potato.Kitten.b.valid|))
(get-value (|Potato.Kitten.c.value|))
(get-value (|Potato.Kitten.c.valid|))
(exit)

Note: You can see these files if you use the --debug-vcg option.

If we now feed this to a solver, we will immediately get an UNSAT result.

After this check we can proceed to the next step. We can compute a value for the result of a > 17.

;; result of a > 17 at potato.rsl:10:5
(define-const |tmp.1| Bool (> |Potato.Kitten.a.value| 17))

Here we store this in a new (intermediate) constant we define just for this purpose. This activity is not unlike SSA (static single assignment), which is a common step in compilers.

We then introduce a branch. In the left hand side we assert that not |tmp.1| is true, and in the right-hand side we assert that |tmp.1| is true. This models the and semantics, where we do not proceed to the right-hand side if the left-hand side is false.

We then continue adding more validity checks (for a (again), and also for b and c). You might think it would be good to not check for a again, but generally VCG is hard enough as it is, and such optimisations are more likely to introduce errors than not. So we leave this for much later if it turns out to be a real problem.

Afterwards we define intermediates for b * c (tmp.3) and then a + tmp.3 (tmp.4). We then need to check if this could be zero. So again we assert that it's not and ask for an assignment:

(define-const |tmp.1| Bool
  (> |Potato.Kitten.a.value| 17))
(assert |tmp.1|)

;; result of b * c at potato.rsl:10:27
(define-const |tmp.3| Int
  (* |Potato.Kitten.b.value| |Potato.Kitten.c.value|))

;; result of a + b * c at potato.rsl:10:23
(define-const |tmp.4| Int
  (+ |Potato.Kitten.a.value| |tmp.3|))

;; division by zero check for 100 / a + b * c
(assert (not (not (= |tmp.4| 0))))
(check-sat)

The not not seems surprising, but again it's just easier this way. What we want to be true is tmp.4 != 0, which translates to (not (= |tmp.4| 0)). So that is our goal, and to find out if the goal is always true we negate it. Hence the (not (not ...

And now we get a SAT result, with an assignment:

$ cvc5 trlc-Potato-Kitten_0002.smt2
sat
((Potato.Kitten.a.value 18))
((Potato.Kitten.a.valid true))
((Potato.Kitten.b.value 2))
((Potato.Kitten.b.valid true))
((Potato.Kitten.c.value (- 9)))
((Potato.Kitten.c.valid true))

So now we have a counter-example and we can feed this back to the user:

a > 17 and 100 / (a + b * c) > 50, warning "Example"
               ^ potato.rsl:10: warning: divisor could be 0 [vc_id = 2] [vcg-div-by-zero]
               | example record_type triggering error:
               |   Kitten bad_potato {
               |     a = 18
               |     b = 2
               |     c = -9
               |   }

Overall architecture

The general structure of VCG is a tree transformation from the TRLC AST to the PyVCG graph.

We start at a composite type, walk over each check's expressions and build the graph. In general for each expression node in the AST there is a corresponding tr_ (for translate) function in VCG. For example to process Unary_Expression we have tr_unary_expression.

Because it may be difficult to support new constructs in VCG when they are first created, we have a special mechanism to deal with "not yet supported" features. We have a special exception Unsupported and a flag_unsupported method that can be called with a node. For example at the end of tr_expression:

        elif isinstance(n_expr, Field_Access_Expression):
            return self.tr_field_access_expression(n_expr)

        else:  # pragma: no cover
            self.flag_unsupported(n_expr)

        return value, smt.Boolean_Literal(True)

This is normally dead code, but if somebody adds a new expression type (maybe let expressions) then this will gracefully fail, e.g:

   let potato => (x + 1)
   ^^^ warning: LET_EXPRESSION not supported yet in VCG

After the translation is done we use PyVCG to generate individual VCs for all paths and attempt to solve them. For each VC we fail to prove, we just emit the error message that was stored in the VC (when we build VCs we already attach the message in case it fails; we don't generate the message on-the-fly).

Model

This section describes how we model each language construct.

Checks

For each user-defined check we translate the expression and emit checks (e.g. division by zero) on the way. For fatal checks we then assert that the expression is true (to model that we only proceed with execution if the fatal error did not get raised). For any other check (warnings and errors) we forget the knowledge.

In general this works by branching off the root node instead of following the current path in the graph.

For example the following check sequence:

checks Example {
   ..., error   "normal_error_1"
   ..., fatal   "fatal_error_1"
   ..., warning "normal_warning_1"
   ..., fatal   "fatal_error_2"
   ..., error   "normal_error_2"
}

Would generate a (reduced) graph like this:

knowledge accumulation example

For example, when proving anything about normal_warning_1 we know that fatal_error_1 could not have occurred. But we cannot make the same assumption about normal_error_1: any checks under fatal_error_1 do not get the truth of normal_error_1 asserted.

Types

We model types as follows:

| TRLC Type        | PyVCG Sort      | SMTLIB Sort |
|------------------|-----------------|-------------|
| Builtin_Boolean  | BUILTIN_BOOLEAN | Bool        |
| Builtin_Integer  | BUILTIN_INTEGER | Int         |
| Builtin_Decimal  | BUILTIN_REAL    | Real        |
| Builtin_String   | BUILTIN_STRING  | String      |
| Record_Type      | BUILTIN_INTEGER | Int         |
| Tuple_Type       | Record          | datatype    |
| Enumeration_Type | Enumeration     | datatype    |
| Array_Type       | Sequence        | Seq         |

A few of these need some additional explanation.

Decimal

The model of Builtin_Decimal as Real is not precise. A decimal number cannot be 1 / 3, but a real can be. In TRLC expressions we can actually get rationals, but as a user we cannot assign 1 / 3 to a decimal value (the syntax does not permit it). This means something like

   100 / (x - 1 / 3)

Should actually be fine, but with our chosen model it is not, as there is an obvious counter-example. However, at least this modelling is sound (but not complete).

I have created a ticket with CVC5 for a feature request to assert that some real is actually a decimal, but this is hard.

We could also model our decimals as integers and then divide them, for example:

(declare-const |foo.T.x.value.int| Int)
(define-const |foo.T.x.value| Real (/ (to_real |foo.T.x.value.int|)
                                      1000000))

However this is unsound (but complete), as we would need to guess the maximum accuracy that the user would use. We could do some tricks with interval arithmetic to estimate a required accuracy.

In addition, converting between Real and Int in SMTLIB is generally hard for solvers, so there may be other problems attached to this.

Finally, we could also model our decimals as rationals, and assert that the denominator must be a product of 2s and 5s. We tried this with a recursive function, but these things immediately make everything undecidable in general.

Records

It may appear surprising to model records as integers, but we can't actually access records in the check language. We just need to model that there are N different records the user could pick when referencing them.

Tuples

Tuples are modelled as SMTLIB datatypes, but there is an additional complication here. A tuple is not just a random collection of data; there are additional constraints.

Specifically we know that:

  • A tuple is valid only if all error and all fatal checks have passed (compare this to our record, where we can only assume fatal checks).
  • A tuple with optional components has an additional constraint: once an optional component is null, all trailing components are also null.

This knowledge is generated with emit_tuple_constraints for a specific tuple. However this is currently not done for tuples in arrays.

Note also that we don't worry about validity: in the spirit of modular analysis this would have been checked when generating VCs for the tuple type itself.

We could simply add a universal quantifier for all tuples of type T, but adding quantifiers is something we should only do as a last resort.

In the future we can detect if we have an array, and only in that specific case add the quantified constraint.

This may sound academic, but experience has shown that frivolous use of quantifiers can be a real problem.

String and Markup_String

We are using the newly added string theory in SMTLIB to model Strings.

There are some constraints about our strings that are not yet asserted:

  • A string can either contain no newlines (it's a "..." String)
  • Or a string cannot contain ''' (it's a '''...''' String)
  • A markup string has additional constraints when it comes to [[, ]] and what is permissible between these.

Right now it didn't seem important to add these constraints, as we hope that users don't write checks that actually depend on this, as that seems nuts. But you never know... :)

Arrays

Arrays are not modelled using the array theory; instead we use the non-standard sequence theory.

There are two good reasons for this choice:

  • Arrays do not have a cardinality constraint, so we would need to model that separately if we want to reason about array length.
  • Sequences provide some other builtins that we want to use (contains, prefixof, and suffixof).

If we want to target a solver in the future that doesn't support this theory then we can of course do it with Arrays, but it'll be annoying.

Expressions

Most of the expressions should be fairly simple, with a few exceptions that we mention here.

A general note on the translation functions (e.g. tr_expression): each of these always returns a tuple of two values: the translated expression tree for the value of the expression, and the translated expression tree for the validity of the expression assuming no errors in sub-expressions.

Especially the latter part ("assuming no errors") is important and needs to be explained in more detail.

The value is simple enough: for an expression like x + y the value is some SMT tree that calculates this.

The validity is more or less a null check. But it's a null check for this expression, not its sub-expressions. We build VCs in a modular deductive fashion, and so we have emitted checks earlier.

Hence for almost all expressions (including x + y) the validity is statically true.

However for the sub-expressions the validity is NOT always true; e.g. for x it is true or false depending on whether x is optional.

We do it this way because otherwise we have to consider three-valued logic, or arithmetic with null, e.g. is the value of x or y true if x = null but y = true?

There is one big downside that we'll come to later (hint: quantifiers).

Short-circuits

The short-circuit expressions (and, or, implies, and if) are modelled in the graph as branches.

Equality

Equality is the only place where null is considered properly, and it's weird as a result.

In general, two things are equal if they are either both null, or they are not and their values match. But due to our modelling we can't just do:

(or (= x_value y_value)
    (= x_valid y_valid))

Specifically this gives the wrong answer if x is null, but y is not and they both happen to have the same value (since value is independently modelled from validity).

In an earlier draft we considered modelling all values as datatypes that could be null OR carry the value; but this approach was pretty nasty as it complicated a lot of simple things.

So instead we define equality as:

(and (= x_valid y_valid)
     (=> x_valid (= x_value y_value)))

There is a bit of optimisation in tr_op_equality to simplify this to just (= x_value y_value) in most cases, but we can't always do that.

In addition tuples also make this awful, because we may need to consider the sequence of optional items. So for tuples we build a very big conjunction of these kinds of equalities in tr_core_equality and tr_core_equality_tuple_component.

It should be noted that this conjunction for tuple equality relies on the tuple constraints, so currently for tuples in arrays we don't get quite the correct answer.

Quantifiers

Quantifiers are always horrific, and we've made a number of decisions when it comes to dealing with them. The primary design goal is to not have them, if in any way possible. This is also why we have not used tools like Why3 as our intermediate language, and instead wrote something new (PyVCG).

First, quantifiers and validity checks work differently. We decided to over-approximate how they work in order to keep things sane, as the LRM suggests.

For the validity checks in quantifiers we perform a quantifier elimination: we create a new free variable that represents an arbitrary index into the subject array, and then walk our tree for the quantifier expression emitting validity checks.

However for the value of the quantifier and its body we perform a very different evaluation, eliminating short-circuit semantics. We can do this because we've proved earlier that there are no run-time errors for an arbitrary element.

The VCG class has for this purpose an attribute functional, which we enable and disable to generate the body. If this is set then instead of modifying the graph we just build one monolithic SMT expression for the value:

        temp, self.functional = self.functional, True
        b_value, _ = self.tr_expression(n_expr.n_expr)
        self.functional = temp

Additional caveat: right now nested quantifiers remain unsupported by VCG and it's unclear if users ever need this feature.

A note on expressive power: in our source language we can only ever quantify over elements of some array; we cannot quantify over the index. This makes some things easier, although in the future this may be a possible language extension.

However we've chosen to model quantifiers by quantifying over the index (with some constraints as to its range) instead of quantifying over element type and asserting this element is in the array; as the former gave much better results.

The matches function

Finally, the builtin matches regular expression test is modelled as an uninterpreted function right now.

This means that for any given string and regex, the solver just decides arbitrarily whether it matches, as long as it's consistent.

Right now this doesn't lose us much, but we could add proper support for regular expressions as the SMTLIB String theory supports it.

Building the graph

When creating the PyVCG graph for our model, we use several utility functions in VCG to do so:

  • We keep track of the "current" node in self.current_start; the functions below create new things, glue them to the current start, and then advance current start to point to the new thing. Several functions that deal with branches make a note of the current start and restore it at appropriate times.
  • attach_validity_check is used to create a new check testing if the given expression is true
  • attach_int_division_check and attach_real_division_check take a divisor expression and emit a VC to make sure it's not zero
  • attach_index_check takes an index expression and emits a VC to make sure it fits in the bounds of the corresponding array
  • attach_feasability_check works a bit differently. It creates a VC to make sure that the given expression can actually be false. We use this to figure out if checks are redundant (i.e. always true).
  • attach_assumption simply injects some new knowledge in the current branch
  • attach_temp_declaration also injects new knowledge by creating a new temporary variable (and optionally statically defining it to be some value)
  • attach_empty_assumption just creates a new do-nothing node. This is used in quantifier elimination.