Skip to content

Memoizing typechecking

Aseem Rastogi edited this page Dec 12, 2018 · 6 revisions

Consider a term of the form f ?u1 ?u2, and consider the scenario where the unification variables ?u1 and ?u2 are inferred to be the same term e. In the 1-phase typechecker, the term e would have been typechecked just once. However, in the 2-phase typechecker, the term is elaborated to f e e and the second phase ends up typechecking e term twice, leading to unnecessary duplication of work.

This note is about making the typechecker aware of such sharing and memoizing the typechecking results. The basic idea is to maintain a cache in the typechecker that maps physical addresses of the terms to their computed types, and consult it before typechecking the term.

Scoping issues

We need to be careful about maintaining well-scoped terms in the cache. But we also don't want to cache the whole type environment with each term. So we rely on the locally nameless representation of terms, that the typechecker already maintains, and keep types with the names in the term. That way, the terms can be typechecked independent of the type environment. In the typechecker AST terminology, we already maintain a sort field with every bv -- we just need to make sure that this is used and populated consistently.

Even after that, we can have a sanity check, where if a term e finds the cache hit, the free variables of e are in the environment.

Preserving sharing

The cache hit rate is going to be heavily dependent on the typechecker consciously maintaining sharing of terms. One place where we break it is substitution. For e[v/x], even if x is not free in e, we currently traverse and output a fresh e during the substitution and this breaks sharing.

For this, we can rely on the free variables information at every AST node, that the typechecker already maintains, and not traverse the term if the variable that we are substituting does not appear free in it. We need to benchmark if this is going to be a performance hit.

Rolling back the cache

We need to be careful about caching the results of typechecking paths that are rolled back or discarded during unification. We can maintain the cache along with the Union-Find graph that the unifier maintains and roll it back whenever the UF graph is.

Second-level cache

We can also implement a second-level cache that maps hash of the term e to the tuple (e, t), where t is its type. Once there is a hit in the cache for some term e', we would avoid the possibility of hash-collision by checking eq_term e e' before taking the type t from the cache entry. If it is indeed a successful hit, the entry (&e', t) will be promoted to the first-level cache.

To avoid hash computations again and again, we can memoize the hash of the term in the AST node.

Clone this wiki locally