This is a meta-issue that I’ll use to track my progress in creating a toolkit for faster kernel reduction. This is to help me organize the work, but also to provide more visibility to interested members of the community.
Background
Lean has two mechanisms for computations in proofs: by decide, which uses Kernel.whnf, and by native_decide, which uses the Lean compiler and then runs native code. The former tends to be slow; the latter has a significantly larger TCB. I investigated ways to find a middle ground here: something that allows larger computations than what we can handle well in Kernel.whnf, but still with a noticeably smaller TCB.
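For concreteness, the two existing mechanisms look like this (the examples are illustrative, not taken from any benchmark):

```lean
-- `by decide` evaluates the `Decidable` instance inside the kernel
-- via `Kernel.whnf`:
example : 2 ^ 10 = 1024 := by decide

-- `by native_decide` compiles the same check with the Lean compiler
-- and runs it as native code, trading kernel time for a larger TCB:
example : 2 ^ 24 % 7 = 1 := by native_decide
```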
While a faster Kernel.whnf is possible by using a different evaluator (e.g. something like Sestoft's abstract machine) and would give a speed-up of 2-3×, this does not quite justify the increase in kernel code. Other ideas, such as certified compilation to some lower-level VM that is interpreted by the kernel, are appealing, but too large a task to tackle right away.
In these investigations I noticed that the code we send to the kernel is not particularly optimized (recursive functions use complex encodings, numerical operations are overloaded, computations on subtypes like Fin perform a lot of wrapping and unwrapping). A quick experiment shows that a 10× speed-up is possible in some cases.
However, just applying this trick broadly can lead to an overall slow-down, because for small calculations it is cheaper to reduce the expression directly than to optimize it first.
Goal
Therefore the goal of this project is to create a toolkit that can be used by advanced users to carefully optimize a specific expensive proof, or to implement a particular proof-by-reflection tactic.
Design
The idea is to have a by rsimp_decide tactic that uses the simplifier (which is a versatile tool for certified rewrites of terms) with a custom simpset (rsimp) to rewrite the expression that the kernel will evaluate first.
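A rough sketch of the intended surface syntax (the tactic does not exist yet, and Nat.sqrt merely stands in for some expensive computation):

```lean
-- Hypothetical usage: `rsimp_decide` first rewrites the goal's
-- expression with the `rsimp` simp set, then hands the optimized
-- term to the kernel as `decide` would.
example : Nat.sqrt 10000000 = 3162 := by rsimp_decide
```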
Component overview
(These are rough specifications, some syntax or interaction will become clearer as I build these things.)
A rsimp simp set. Initially without any promise of standard library coverage; instead, early adopters are expected to extend it as needed.
A rsimp_decide tactic using this simpset
A rsimp_optimize attribute or command that takes a definition f, takes its RHS, optimizes it, defines f.rsimp_opt and adds f = f.rsimp_opt to the rsimp simpset.
The above, when run on a recursive function (more generally: any theorem that has the shape of an unfolding theorem), to use Nat.rec and a huge initial fuel value to create a non-recursive definition that reduces efficiently. This will, in particular, unblock the use of well-founded recursion in such applications.
(Maybe) A variant of the above that can take a theorem @decide (P x) (inst x) = … @decide (P y) (inst y) …, and produce a rsimp rule @decide (P x) (inst x) = Nat.rec …, to simplify recursive Decidable instances.
(Maybe) Pursue a Decidable refactoring so that @decide P inst is a plain function on Bool, and may reduce more easily (see Make Decidable a subtype of Bool #2038)
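To illustrate the fuel idea behind the recursive-function item above, here is a minimal sketch (names and the fuel amount are mine): a structurally recursive list sum re-expressed as a non-recursive definition via Nat.rec over an explicit fuel argument, which the kernel can reduce without consulting equation lemmas.

```lean
-- Sketch: recursion via `Nat.rec` and a huge fuel value. The kernel
-- peels off one fuel unit per recursive call, so only as many
-- reduction steps as the list is long are actually performed.
def listSum (xs : List Nat) : Nat :=
  Nat.rec (motive := fun _ => List Nat → Nat)
    (fun _ => 0)                    -- fuel exhausted (unreachable in practice)
    (fun _ ih xs =>
      match xs with
      | []      => 0
      | x :: xs => x + ih xs)
    1000000000 xs
```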
Data structures optimized for in-kernel reduction
A binary search tree for Nat → α maps, optimized for querying, to replace static arrays
(Maybe) Ensure that using large Nat literals as packed arrays or bitvectors works well (i.e. that their representation in C files is efficient, see Nat.repr slow for very large literals #5771)
See if the ongoing work on weighted balanced trees results in something that might work well here.
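As a sketch of the first item in this list, assuming nothing beyond core Lean (all names are illustrative): a binary search tree that is built once outside the kernel and only queried during reduction, so only the lookup path needs to reduce efficiently.

```lean
-- Minimal query-optimized binary search tree keyed by `Nat`.
inductive NatTree (α : Type) where
  | leaf : NatTree α
  | node : NatTree α → Nat → α → NatTree α → NatTree α

-- Lookup reduces in O(depth) kernel steps, unlike a linear scan
-- over a static array-like structure.
def NatTree.find? : NatTree α → Nat → Option α
  | .leaf, _ => none
  | .node l k v r, n =>
    if n < k then l.find? n
    else if k < n then r.find? n
    else some v
```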
Kernel tweaks
Despite the goal of not touching the kernel too much, there might be room for small useful tweaks.
See if letting the kernel reduce Nat.rec without consulting the environment gives a worthwhile performance boost, given the above use of Nat.rec for recursion.
Maybe add strictLet : a → (a → b) → b which, when reduced by the kernel, causes the first argument to be whnf’ed first. Otherwise left-folds will leak memory and stack space. This could also be used to ensure sharing of results.
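A plain definition of such a combinator is trivial; the point is that the kernel would recognize it and reduce the first argument to whnf before substituting (the definition below only sketches the intended signature):

```lean
-- Under ordinary Lean semantics this substitutes `x` lazily. The
-- proposed kernel tweak would force `x` to whnf first, so that e.g.
-- a left fold's accumulator is evaluated at each step instead of
-- building up a chain of unevaluated additions.
def strictLet {α β : Type _} (x : α) (f : α → β) : β := f x
```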
Evaluation
Once things are in place, it would be good to see that it can be applied successfully to
Computations with well-founded recursion, as used in mathlib in Nat.sqrt and Nat.minFacAux
Try to tweak some of the bv_decide machinery using this toolkit, and see if we can handle smallish SAT proofs without native_decide that way.
nomeata changed the title from “Optimizing terms for kernel reduction meta issue” to “Optimizing terms for kernel reduction (meta issue)” on Oct 22, 2024.
For some cases, the computation can be verified more quickly than it can be performed. For example, in Nat.decidableExistsLE, from what I understand the kernel currently checks all values until one works, but optimally the value that satisfies the predicate can be found in native code and then given to the kernel.
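The split described here can be made concrete with a small self-written lemma: the native side searches for the witness, and the kernel only has to check the predicate at that one value.

```lean
-- The kernel only needs to verify `hw` and `hp` for the witness `w`
-- found natively, rather than scanning all candidates up to `bound`.
theorem exists_le_of_witness (p : Nat → Prop) [DecidablePred p]
    (bound w : Nat) (hw : w ≤ bound) (hp : p w) :
    ∃ n, n ≤ bound ∧ p n := ⟨w, hw, hp⟩
```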
Other examples of this phenomenon (although they require more mathematics) are computing Nat.gcd a b, where native code can compute values x, y, z, w such that z * (x * a - y * b) = a and w * (x * a - y * b) = b, and then the gcd is x * a - y * b; or factorization, where native code can factor the number and produce primality certificates.
Absolutely true that often you can split the computation into unverified data generation and verified checking; that is not the topic of this issue, though.
Better profiling for kernel reduction would be great, and I'm missing it too, but isn't really available right now.