
Optimizing terms for kernel reduction (meta issue) #5806

Open · 11 tasks
nomeata opened this issue Oct 22, 2024 · 3 comments
Labels: P-high We will work on this issue

nomeata (Contributor) commented Oct 22, 2024

This is a meta-issue that I’ll use to track my progress in creating a toolkit for faster kernel reduction. This is to help me organize the work, but also to provide more visibility to interested members of the community.

Background

Lean has two mechanisms for computations in proofs: `by decide`, which uses `Kernel.whnf`, and `by native_decide`, which uses the Lean compiler and then runs native code. The former tends to be slow; the latter has a significantly larger TCB (trusted code base). I investigated ways to find a middle ground here: something that allows larger computations than what we can handle well in `Kernel.whnf`, but still with a noticeably smaller TCB than native code.
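For concreteness, a small illustrative example of the two mechanisms; both close the goal by computation, and differ only in where the evaluation happens:

```lean
-- `decide` evaluates the `Decidable` instance with the kernel evaluator
-- (`Kernel.whnf`); `native_decide` compiles it and runs native code,
-- which is faster but enlarges the trusted code base.
example : 2 ^ 16 % 7 = 2 := by decide
example : 2 ^ 16 % 7 = 2 := by native_decide
```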

While a faster `Kernel.whnf` is possible by using a different evaluator (e.g. something like Sestoft's abstract machine) and would give a speed-up of 2–3×, this does not quite justify the increase in kernel code. Other ideas, such as certified compilation to some lower-level VM that is interpreted by the kernel, are appealing, but too large a task to tackle right away.

In these investigations I noticed that the code we send to the kernel is not particularly optimized (recursive functions use complex encodings, numerical operations are overloaded, computations on subtypes like `Fin` perform a lot of wrapping and unwrapping). A quick experiment shows that a 10× speed-up is possible in some cases.
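As an illustration of the overloading overhead: even `x + y` on `Nat` elaborates to a type-class application that the kernel must unfold before reaching the efficient built-in arithmetic. A rewrite rule of the following shape (a hypothetical member of such a simp set, not an existing library lemma) lets an optimizer strip these layers up front:

```lean
-- `x + y` elaborates to `HAdd.hAdd Nat Nat Nat instHAdd x y`; rewriting
-- to `Nat.add` up front saves the kernel from unfolding the instance layers.
theorem nat_add_unfold (x y : Nat) : x + y = Nat.add x y := rfl
```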

However, just applying this trick broadly can lead to an overall slow-down: for small calculations it is cheaper to reduce the term directly than to optimize it first.

Goal

Therefore the goal of this project is to create a toolkit that can be used by advanced users to carefully optimize a specific expensive proof, or to implement a particular proof-by-reflection tactic.

Design

The idea is to have a `by rsimp_decide` tactic that uses the simplifier (which is a versatile tool for certified rewrites of terms) with a custom simp set (`rsimp`) to rewrite the expression before the kernel evaluates it.
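A minimal sketch of that design, assuming a freshly registered `rsimp` simp attribute (names and syntax here are illustrative, not the final API):

```lean
-- Declare the simp set that optimization rules get added to.
register_simp_attr rsimp

-- First rewrite the goal with the `rsimp` rules, then let the kernel
-- evaluate the (hopefully cheaper) resulting expression via `decide`.
macro "rsimp_decide" : tactic =>
  `(tactic| (simp only [rsimp]; decide))
```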

Component overview

(These are rough specifications, some syntax or interaction will become clearer as I build these things.)

  • A `rsimp` simp set. Initially without any promise of standard library coverage; instead, early adopters are expected to extend it as needed.

  • A rsimp_decide tactic using this simpset

  • A `rsimp_optimize` attribute or command that takes a definition `f`, optimizes its RHS, defines `f.rsimp_opt`, and adds `f = f.rsimp_opt` to the `rsimp` simp set.

  • The above, when run on a recursive function (more generally: any theorem that has the shape of an unfolding theorem), should use `Nat.rec` and a huge initial fuel value to create a non-recursive definition that reduces efficiently. This will, in particular, unblock the use of well-founded recursion in such applications.

  • (Maybe) A variant of the above that can take a theorem @decide (P x) (inst x) = … @decide (P y) (inst y) …, and produce a rsimp rule @decide (P x) (inst x) = Nat.rec …, to simplify recursive Decidable instances.

  • (Maybe) Pursue a Decidable refactoring so that @decide P inst is a plain function on Bool, and may reduce more easily (see Make Decidable a subtype of Bool #2038)

  • Data structures optimized for in-kernel reduction

    • A binary search tree for Nat → α maps, optimized for querying, to replace static arrays
    • (Maybe) Ensure that using large Nat literals as packed arrays or bitvectors works well (i.e. that their representation in C files is efficient, see Nat.repr slow for very large literals #5771)
    • See if the ongoing work on weighted balanced trees results in something that might work well here.
  • Kernel tweaks

    Despite the goal of not touching the kernel too much, there might be room for small useful tweaks.

    • See if letting the kernel reduce Nat.rec without consulting the environment gives a worthwhile performance boost, given the above use of Nat.rec for recursion.
    • Maybe add strictLet : a → (a → b) → b which, when reduced by the kernel, causes the first argument to be whnf’ed first. Otherwise left folds will leak memory and stack space. This could also be used to ensure sharing of results.
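To illustrate the fuel encoding from the component list above, here is an illustrative hand-written instance (not the code the attribute would actually generate):

```lean
-- A gcd that recurses on an explicit fuel argument via `Nat.rec`.
-- During kernel reduction only as many successors of the fuel literal
-- are peeled off as the computation actually needs, so a huge initial
-- fuel is essentially free. Marked `noncomputable` because the code
-- generator does not compile bare recursor applications; the kernel
-- reduces it fine.
noncomputable def gcdOpt (a b : Nat) : Nat :=
  Nat.rec (motive := fun _ => Nat → Nat → Nat)
    (fun a _ => a)                                      -- fuel ran out
    (fun _ ih a b => if a = 0 then b else ih (b % a) a) -- one gcd step
    1000000000 a b

-- The kernel needs only a few `Nat.rec` unfoldings for this:
example : gcdOpt 12 18 = 6 := by decide
```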

Evaluation

Once things are in place, it would be good to see that it can be applied successfully to

  • Computations with well-founded recursion, as used in Mathlib in Nat.sqrt and Nat.minFacAux
  • Try to tweak some of the bv_decide machinery using this toolkit, and see if we can handle smallish SAT proofs without native_decide that way.
nomeata changed the title from “Optimizing terms for kernel reduction meta issue” to “Optimizing terms for kernel reduction (meta issue)” on Oct 22, 2024
leanprover-bot added the P-high (We will work on this issue) label on Oct 25, 2024
Command-Master (Contributor) commented Nov 2, 2024

For some cases, the computation can be verified more quickly than it can be done. For example, in Nat.decidableExistsLE, from what I understand the kernel currently checks all values until one works, but optimally the value which satisfies the predicate can be found in native code and then given to the kernel.
Other examples of this phenomenon (although they require more mathematics) are computing Nat.gcd a b, where native code can compute values x, y, z, w such that z * (x * a - y * b) = a and w * (x * a - y * b) = b, so that the gcd is x * a - y * b; or factorization, where native code can factor the number and produce primality certificates.
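The first point can be illustrated without any new machinery (the numbers here are made up for illustration): the untrusted search happens outside the kernel, and the kernel only checks the witness:

```lean
-- Searching for `n` by kernel reduction of the existential's `Decidable`
-- instance would try candidates one by one; supplying a witness found by
-- untrusted (e.g. native) code leaves only a cheap check to the kernel.
example : ∃ n, n ≤ 1000 ∧ 257 * n = 62708 :=
  ⟨244, by decide⟩
```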

Command-Master (Contributor) commented:
There was also some discussion about this topic in https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Large-ish.20computation.20without.20Lean.2EofReduceBool . Is there some way to profile kernel reduction, to understand where the term could be optimized?

nomeata (Contributor, Author) commented Nov 2, 2024

Absolutely true that often you can split the computation into unverified data generation and verified checking; that is not the topic of this issue, though.

Better profiling for kernel reduction would be great, and I'm missing it too, but isn't really available right now.
