-
Notifications
You must be signed in to change notification settings - Fork 232
Towards a Finer grained Incremental Typechecker for F*
A main obstacle towards scaling F* towards verifying large developments is its usability. Usability covers many dimensions, ranging from documentation to better error reports. This note focuses on just a few of them: notably, latency of typechecking; better error localization; and control over proof contexts.
So far, perhaps the biggest step towards improving the usability of F* has been its support for incrementally typechecking top-level elements of a module via its emacs interface. The main value of this feature is that it significantly improves the latency of typechecking results, when compared to batch-mode verification. This has allowed us to develop large verified codebases composed of many modules, with each module consisting of dozens of small top-level elements.
But, since incremental typechecking is only supported for top-level elements, there are still significant limitations:
-
Programs must be broken into several smaller top-level fragments to benefit from incremental typechecking
- This comes at the cost of destroying the structure of the original code, copying specifications, manually closure-converting, etc.
-
A single verification condition is produced for a top-level term, potentially containing 100s of separate assertions that are all checked together
-
This can lead to brittleness of SMT-based proofs, since a single proof needs to cover many different aspects of a program, with little control over which hypotheses are relevant for which parts of a proof (a local variant of the "context pollution" problem).
-
When SMT proofs timeout, this leads to poor error localization, since without a model from the SMT solver, F* can do little but blame the entire top-level term
-
This leads to further ad hoc strategies, e.g., the "sliding admit" style (https://github.com/FStarLang/FStar/wiki/Sliding-admit-verification-style), to localize verification errors. However, these remain non-incremental, e.g., each time the admit is advanced, a VC for the whole prefix for the program is recomputed and must be proven by the SMT solver, with, say, just one new assertion.
-
Aside from SMT solving, just running F*'s VC generator on a large term can itself be expensive. This adds latency when re-typechecking large terms. Worse, for enormous terms (e.g., a large nest of mutually recursive functions spanning 1000s of lines), the VC generator is likely to just explode when computing a single VC for all of it.
-
-
Certain forms of abstraction, that are easily expressed within F*'s theory are not actually easily expressible in practice, e.g., a variable whose scope spans a large definition (e.g., a "section variable", encompassing several top-level functions), since this requires repeatedly abstracting and applying for each small top-level function.
To address these issues, this note proposes
-
a revised syntactic structure of F* programs, more amenable to incremental parsing
-
a revision to the structure of the F* typechecker to provide incremental typechecking for partial programs
-
revisions to the F* ide to exploit these features
What makes the top-level syntactic structure amenable to incremental processing? Well, the top-level of a module is simply a sequence of declarations.
-
Processing a partial module amounts to just processing a prefix of its declarations
-
Extending a partially processed module with some more declarations involves
-
First, persisting the state of the compiler after it has processed a prefix of a module. This is easy, since the top-level processing loop is just a fold over the list of declarations. So, we just persist the accumulated state in the fold.
-
When more declarations arrive, we simply continue folding over them with the additional declarations and persisting the resulting accumulated state for further incremental processing.
-
Applying this structure to the syntax of F* terms is not so easy, since the terms are in an applicative syntax of lambda terms and it's hard to represent a partial term, to extend a partial term, and to persist the intermediate state of the compiler while it is processing a partial term so that it can be resumed.
However, revising the syntax of terms to follow a similar sequential structure as the top-level may allow us to process terms incrementally, just as we do top-level terms.
Towards that end, here's a first sketch of a revised syntax, presented in a stratified form that separates complete terms from extensible term fragments.
But, before presenting the syntax, first a disclaimer:
I am intentionally proposing a rather heavy concrete syntax: I can imagine many ways in which this could be made lighter and I welcome suggestions for the nicest concrete notation.
Second, I will gloss over the type annotation convention until the next section.
The main idea of the revised syntax is to introduce introduce two new syntactic forms for sequential and branching control flow that are easily extensible with additional fragments. For example, one could write the following block of sequentially composed terms:
begin_seq
var x1 = e1
...
var xn = en
return e
end_seq
where the block of var
-lines is processed simply by folding the
state of the compiler over each var
-line. Given an incomplete block
of var
s, one can easily add additional var lines and continue
processing, until the end_seq
is encountered.
So, here's the core of the syntax. It should be entirely familiar,
except that in several places in the grammar, we use a non-terminal
E1..En
to allow for term fragments where previously we only allowed
terms.
constants
c ::= () | 0 | 1 ...
computation types
C ::= M t e1..en
terms
t, e ::=
| c //constant
| x //variable
| e1 e2 //application
| e <: C //ascription
| fun (x1:t1) (xn:tm) -> E1..En //abstraction
| let x = e in E1..En //atomic sequential composition
| match e with | Pi -> E1..En //atomic branching
Term fragments are defined quite loosely. Not all sequences of term fragments can be reconstituted into well-formed terms. But, sequences of term fragments are trivial to parse incrementally.
fragment
E ::=
| e //promote an expression as a fragment
| begin_seq : C //start incremental seq
| var x = E1..En //incremental seq op
| end_seq //end incremental seq
| begin_match e return C //start incremental branching
| case P1 -> E1..En //incremental branch case
| end_match //end incremental branch
Here are some examples of programs, extended incrementally to a complete programs in stages.
- Double:
Every line is a valid partial term. It can be extended one line at a time producing a fully defined term.
1. let double = fun (x:ref int) -> begin_seq : _
2. var v = !x
3. (x := 2*v)
4. end_seq
- Append
Here, the first three lines are processed intially, then 4, 5, 6 incrementally.
1. let rec append =
2. fun (x y : list 'a) ->
3. begin_match x return _
4. case [] -> y
5. case hd::tl -> hd :: append tl y
6. end_match
- Invalid programs
Not all sequences of fragments yield valid programs, e.g.,
begin_seq
case [] -> []
will be parsed as a sequence of fragments, but will be rejected as syntactically invalid
- Nested fragments
Every line is processed incrementally
fun (x y z:ref int) -> begin_seq : _
var xy = begin_seq : _
var x0 = !x
var y0 = !y
x0 + y0
end_seq
var z0 = !z;
return (z := z0 + xy)
end_seq
- Controlling the granularity of checking
Where larger blocks of code are to be typechecked together, the
programmer can always just use the existing functional syntax, e.g.,
in this variant of the program in 4, the expression bound to xy
is
typechecked atomically.
fun (x y z:ref int) -> begin_seq : _
var xy =
let x0 = !x in
let y0 = !y in
x0 + y0
var z0 = !z;
z := z0 + xy
end_seq
- Some non-examples
Incremental processing of program fragments is more than just about incrementally parsing fragments. Internally, as explained already in how we incrementally process top-level terms, we need to persist the state of the typechecker as it processes a sequence of fragments.
Conceptually, checking the following block of code
let go () =
var x1 = e1
...
var xn = en
var y =
begin_seq : C
var y1 = f1
...
var ym = fn
ym
end_seq
var z1 = ...
...
var zk =
should be equivalent to checking
let hoist_y (x1:t1) (xn:tn) : C =
let y1 = f1 in
let ym = fm in
ym
let go () =
let x1 = e1 in
...
let xn = en in
let y = hoist_y x1 x2 in
let z1 = ..
...
let zk = ...
In other words, when checking a begin_seq : C
block, the typechecker
only makes use of the types of other variables in scope (x1:t1, ..., xn:tn
), and whatever effectful preconditions are available from the
annotated computation type C
. Dually, the context of the seq block
(z1...zk
in this case) can only make use of the computation type
proven about the seq block, not its definition.
Internally, to support incrementally checking a seq block, the accumulated state of the typechecker is something like
type seq_state = {
env: TcEnv.env; //all variable bindings in scope
goal: Syntax.comp; //the goal type to be eventually checked
fragments: list (fragment * comp); //type-checked fragments seen so far
guard : guard_t; //accumulated typechecking guard
}
The seq_state
is initialized with the enclosing environment, the
goal, an empty guard, empty fragments.
To extend
the state with another fragment, we use something like
let extend (s:seq_state) (fs:fragments) : seq_state
= match f with
| Var(y, e) ->
let e, g, c = tc_term env e in
let fs = (e, c)::fs in
let check =
//check that the partial computation with an `admit`
//continuation validates the goal
let c = bind_l ((admit, admit_c) :: fs) in
force_trivial_guard (sub_comp c goal)
in
let env = Env.push_bv (y, c_f.result_typ) in
{ env = env;
goal = goal;
fragments = fs;
guard = conj_guard g s.guard }
| EndSeq ->
let c = bind_l fs in
force_trivial_guard (mk_conj s.guard (sub_comp c goal))
| _ -> fail "Invalid seq fragment"
This is simple and incrementalizes type inference, VC generation etc.
But, the check
in the first branch repeatedly checks that the
partial computations validate the goal, e.g., perhaps repeatedly
proving some assertions in prior fragments already processed. We could
try to be finer about this and turn those prior assertions into
assumes when checking the continuation. Or, perhaps support for
"effectful calc" statements would help here.
Similarly, when checking a begin_match x : C
each case is checked in
isolation, and when the end_match
is processed, we emit an
exhaustiveness check.
As described so far, we need no particular support from the editor environment to support incrementality.
However, the form of incrementality that we get is limited in that programs are built only by textually appending fragments to it. As such, incomplete program fragments only appear in a suffix of the program text.
This may be fine for many of our uses. But, with more work on the editor side, one should be able to see a
begin_seq : C
prefix
[edit here]
end_seq
as a hole in the program that can be filled in and type-checked incrementally.
I am not sure about the editor support needed to handle this (Agda does something like it already, right?).
But on the compiler side of things, given support for "linear incrementality" supporting programs with multiple holes that are filled incrementally doesn't seem too much harder---we just need a way to focus on a hole within an expression tree, perhaps using a zipper like structure on our syntax trees.
At least for cases where the top level type annotation is anyway inferred, we could make it easier to switch code from non-incremental to incremental mode and back, by considering a more lightweight syntax. For instance, code written in {
and }
could be automatically switched from non-incremental (let) to incremental (var).
A less ambitious idea is that we could make the current sliding admit less heavy by introducing syntax for "admitted blocks". Any code within say {{
and }}
is desugared to admit()
.