Towards a Finer grained Incremental Typechecker for F*

A main obstacle towards scaling F* towards verifying large developments is its usability. Usability covers many dimensions, ranging from documentation to better error reports. This note focuses on just a few of them: notably, latency of typechecking; better error localization; and control over proof contexts.

So far, perhaps the biggest step towards improving the usability of F* has been its support for incrementally typechecking top-level elements of a module via its emacs interface. The main value of this feature is that it significantly improves the latency of typechecking results, when compared to batch-mode verification. This has allowed us to develop large verified codebases composed of many modules, with each module consisting of dozens of small top-level elements.

But, since incremental typechecking is only supported for top-level elements, there are still significant limitations:

Programs must be broken into several smaller top-level fragments to benefit from incremental typechecking
- This comes at the cost of destroying the structure of the original code, copying specifications, manually closure-converting, etc.
A single verification condition is produced for a top-level term, potentially containing 100s of separate assertions that are all checked together
- This can lead to brittleness of SMT-based proofs, since a single proof needs to cover many different aspects of a program, with little control over which hypotheses are relevant for which parts of a proof (a local variant of the "context pollution" problem).
- When SMT proofs timeout, this leads to poor error localization, since without a model from the SMT solver, F* can do little but blame the entire top-level term
- This leads to further ad hoc strategies, e.g., the "sliding admit" style (https://github.com/FStarLang/FStar/wiki/Sliding-admit-verification-style), to localize verification errors. However, these remain non-incremental, e.g., each time the admit is advanced, a VC for the whole prefix for the program is recomputed and must be proven by the SMT solver, with, say, just one new assertion.
- Aside from SMT solving, just running F*'s VC generator on a large term can itself be expensive. This adds latency when re-typechecking large terms. Worse, for enormous terms (e.g., a large nest of mutually recursive functions spanning 1000s of lines), the VC generator is likely to just explode when computing a single VC for all of it.
Certain forms of abstraction, that are easily expressed within F*'s theory are not actually easily expressible in practice, e.g., a variable whose scope spans a large definition (e.g., a "section variable", encompassing several top-level functions), since this requires repeatedly abstracting and applying for each small top-level function.

To address these issues, this note proposes

a revised syntactic structure of F* programs, more amenable to incremental parsing
a revision to the structure of the F* typechecker to provide incremental typechecking for partial programs
revisions to the F* ide to exploit these features

Revised syntactic structure

What makes the top-level syntactic structure amenable to incremental processing? Well, the top-level of a module is simply a sequence of declarations.

Processing a partial module amounts to just processing a prefix of its declarations
Extending a partially processed module with some more declarations involves
- First, persisting the state of the compiler after it has processed a prefix of a module. This is easy, since the top-level processing loop is just a fold over the list of declarations. So, we just persist the accumulated state in the fold.
- When more declarations arrive, we simply continue folding over them with the additional declarations and persisting the resulting accumulated state for further incremental processing.

Applying this structure to the syntax of F* terms is not so easy, since the terms are in an applicative syntax of lambda terms and it's hard to represent a partial term, to extend a partial term, and to persist the intermediate state of the compiler while it is processing a partial term so that it can be resumed.

However, revising the syntax of terms to follow a similar sequential structure as the top-level may allow us to process terms incrementally, just as we do top-level terms.

Towards that end, here's a first sketch of a revised syntax, presented in a stratified form that separates complete terms from extensible term fragments.

But, before presenting the syntax, first a disclaimer:

I am intentionally proposing a rather heavy concrete syntax: I can imagine many ways in which this could be made lighter and I welcome suggestions for the nicest concrete notation.

Second, I will gloss over the type annotation convention until the next section.

The main idea of the revised syntax is to introduce introduce two new syntactic forms for sequential and branching control flow that are easily extensible with additional fragments. For example, one could write the following block of sequentially composed terms:

begin_seq
var x1 = e1
...
var xn = en
return e
end_seq

where the block of var-lines is processed simply by folding the state of the compiler over each var-line. Given an incomplete block of vars, one can easily add additional var lines and continue processing, until the end_seq is encountered.

So, here's the core of the syntax. It should be entirely familiar, except that in several places in the grammar, we use a non-terminal E1..En to allow for term fragments where previously we only allowed terms.


constants
c ::= () | 0 | 1 ...

computation types
C ::= M t e1..en

terms
t, e ::=
       | c                              //constant
       | x                              //variable
       | e1 e2                          //application
       | e <: C                         //ascription
       | fun (x1:t1) (xn:tm) -> E1..En  //abstraction
       | let x = e in E1..En        //atomic sequential composition
       | match e with | Pi -> E1..En    //atomic branching

Term fragments are defined quite loosely. Not all sequences of term fragments can be reconstituted into well-formed terms. But, sequences of term fragments are trivial to parse incrementally.

fragment
   E ::=
       | e                              //promote an expression as a fragment

       | begin_seq : C                  //start incremental seq
       | var x = E1..En                 //incremental seq op
       | end_seq                        //end incremental seq

       | begin_match e return C              //start incremental branching
       | case P1 -> E1..En              //incremental branch case
       | end_match                      //end incremental branch

Some examples

Here are some examples of programs, extended incrementally to a complete programs in stages.

Double:

Every line is a valid partial term. It can be extended one line at a time producing a fully defined term.

1. let double = fun (x:ref int) -> begin_seq : _
2.        var v = !x
3.        (x := 2*v)
4.      end_seq

Append

Here, the first three lines are processed intially, then 4, 5, 6 incrementally.

1. let rec append =
2.     fun (x y : list 'a) ->
3.        begin_match x return _
4.          case [] -> y
5.          case hd::tl -> hd :: append  tl y
6.        end_match

Invalid programs

Not all sequences of fragments yield valid programs, e.g.,

begin_seq
case [] -> []

will be parsed as a sequence of fragments, but will be rejected as syntactically invalid

Nested fragments

Every line is processed incrementally

fun (x y z:ref int) ->  begin_seq : _
  var xy = begin_seq : _
       var x0 = !x
       var y0 = !y
       x0 + y0
     end_seq

  var z0 = !z;
  return (z := z0 + xy)
end_seq

Controlling the granularity of checking

Where larger blocks of code are to be typechecked together, the programmer can always just use the existing functional syntax, e.g., in this variant of the program in 4, the expression bound to xy is typechecked atomically.

fun (x y z:ref int) ->  begin_seq : _
  var xy =
    let x0 = !x in
    let y0 = !y in
    x0 + y0

  var z0 = !z;
  z := z0 + xy
end_seq

Some non-examples

Incremental typechecking

Incremental processing of program fragments is more than just about incrementally parsing fragments. Internally, as explained already in how we incrementally process top-level terms, we need to persist the state of the typechecker as it processes a sequence of fragments.

Conceptually, checking the following block of code

let go () =
var x1 = e1
...
var xn = en
var y  =
 begin_seq : C
  var y1 = f1
  ...
  var ym = fn
  ym
 end_seq
var z1 = ...
...
var zk =

should be equivalent to checking

let hoist_y (x1:t1) (xn:tn) : C =
    let y1 = f1 in
    let ym = fm in
    ym

let go () =
  let x1 = e1 in
  ...
  let xn = en in
  let y = hoist_y x1 x2 in
  let z1 = ..
  ...
  let zk = ...

In other words, when checking a begin_seq : C block, the typechecker only makes use of the types of other variables in scope (x1:t1, ..., xn:tn), and whatever effectful preconditions are available from the annotated computation type C. Dually, the context of the seq block (z1...zk in this case) can only make use of the computation type proven about the seq block, not its definition.

Internally, to support incrementally checking a seq block, the accumulated state of the typechecker is something like

type seq_state = {
    env: TcEnv.env;                       //all variable bindings in scope
    goal: Syntax.comp;                    //the goal type to be eventually checked
    fragments: list (fragment * comp);    //type-checked fragments seen so far
    guard : guard_t;                      //accumulated typechecking guard
}

The seq_state is initialized with the enclosing environment, the goal, an empty guard, empty fragments.

To extend the state with another fragment, we use something like

  let extend (s:seq_state) (fs:fragments) : seq_state
        = match f with
          | Var(y, e) ->
            let e, g, c = tc_term env e in
            let fs = (e, c)::fs in
            let check =
                //check that the partial computation with an `admit`
                //continuation validates the goal
                let c = bind_l ((admit, admit_c) :: fs) in
                force_trivial_guard (sub_comp c goal)
            in
            let env = Env.push_bv (y, c_f.result_typ) in
            { env = env;
              goal = goal;
              fragments = fs;
              guard = conj_guard g s.guard }

          | EndSeq ->
             let c = bind_l fs in
             force_trivial_guard (mk_conj s.guard (sub_comp c goal))

          | _ -> fail "Invalid seq fragment"

This is simple and incrementalizes type inference, VC generation etc.

But, the check in the first branch repeatedly checks that the partial computations validate the goal, e.g., perhaps repeatedly proving some assertions in prior fragments already processed. We could try to be finer about this and turn those prior assertions into assumes when checking the continuation. Or, perhaps support for "effectful calc" statements would help here.

Similarly, when checking a begin_match x : C each case is checked in isolation, and when the end_match is processed, we emit an exhaustiveness check.

Editor support, for non-linear editing

As described so far, we need no particular support from the editor environment to support incrementality.

However, the form of incrementality that we get is limited in that programs are built only by textually appending fragments to it. As such, incomplete program fragments only appear in a suffix of the program text.

This may be fine for many of our uses. But, with more work on the editor side, one should be able to see a

begin_seq : C
prefix
[edit here]
end_seq

as a hole in the program that can be filled in and type-checked incrementally.

I am not sure about the editor support needed to handle this (Agda does something like it already, right?).

But on the compiler side of things, given support for "linear incrementality" supporting programs with multiple holes that are filled incrementally doesn't seem too much harder---we just need a way to focus on a hole within an expression tree, perhaps using a zipper like structure on our syntax trees.

More lightweight syntax (added by Catalin)

At least for cases where the top level type annotation is anyway inferred, we could make it easier to switch code from non-incremental to incremental mode and back, by considering a more lightweight syntax. For instance, code written in { and } could be automatically switched from non-incremental (let) to incremental (var).

A less ambitious idea is that we could make the current sliding admit less heavy by introducing syntax for "admitted blocks". Any code within say {{ and }} is desugared to admit().

Provide feedback

Saved searches