We conduct a preliminary study examining the potential of applying GFlowNet Tuning to formal reasoning, specifically, neural theorem proving (NTP).
Our study is motivated by (1) the observation that standard reasoning benchmarks (e.g., GSM8K) are increasingly overfit to and no longer reliably reflect model performance in real-world frontier use cases, and (2) the development of GFlowNet Tuning as a well-principled approach to improving sampling diversity and search performance by "amortizing" the cost of sampling more completions at inference time into a post-training phase.
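To make the training objective concrete, here is a minimal sketch of the trajectory-balance loss that GFlowNet fine-tuning typically optimizes; the function name, tensor shapes, and the identity-backward-policy simplification are illustrative assumptions, not this repository's API.

```python
import torch

def trajectory_balance_loss(
    log_pf: torch.Tensor,      # (batch,) summed log-probs of each sampled completion under the policy
    log_reward: torch.Tensor,  # (batch,) log R(tau), e.g. derived from proof-checker feedback
    log_z: torch.Tensor,       # scalar learned estimate of the log partition function
) -> torch.Tensor:
    # Trajectory balance: (log Z + log P_F(tau) - log R(tau))^2, averaged over
    # the batch. The backward-policy term is dropped because an autoregressive
    # generation trajectory is uniquely determined by its terminal sequence.
    return ((log_z + log_pf - log_reward) ** 2).mean()
```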
Preliminary results and discussion using the base model ReProver and a subset of the LeanDojo benchmark dataset can be found in the paper (`workshop_paper.pdf`).
Minimal installation:
- clone this repository
- install LeanDojo dependencies (a quick smoke test is sketched after this list)
- install packages: `pip install -r requirements.txt`
- update paths in `gfn_ntp/configs/paths/default.yaml` to point to the correct directories
- prepare dataset: select one of the following options (an inspection sketch follows this list)
  - download `shuffled_balanced1k.json` and `val20.json` to `gfn_ntp/data/` from here
  - download the raw files from the LeanDojo benchmark and run the filtering script (`python -m proof_flow.scripts.data_preprocessing.filter_theorems`) (DETAILED INSTRUCTIONS COMING SOON)
- start training with `python -m proof_flow.scripts.gfn_tuning.train`
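As referenced in the list above, a quick way to verify the LeanDojo installation is to run a single tactic through a `Dojo` environment. This sketch is adapted from LeanDojo's quickstart; the example repository, theorem, and tactic come from LeanDojo's documentation, not from this codebase.

```python
from lean_dojo import Dojo, LeanGitRepo, ProofFinished, Theorem

# Example repo and theorem from LeanDojo's quickstart; any traced Lean repo works.
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "main")
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")

# Enter the proof environment and run one tactic against the initial state.
with Dojo(theorem) as (dojo, init_state):
    result = dojo.run_tac(init_state, "rw [add_assoc, add_comm b, ←add_assoc]")
    print("LeanDojo is working" if isinstance(result, ProofFinished) else result)
```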
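For the dataset step, the following hypothetical snippet sanity-checks the downloaded files; it assumes the filtered JSON keeps the LeanDojo benchmark entry schema (`full_name`, `file_path`, `traced_tactics`), so adjust the keys if your files differ.

```python
import json
from pathlib import Path

data_dir = Path("gfn_ntp/data")
theorems = json.loads((data_dir / "shuffled_balanced1k.json").read_text())

print(f"loaded {len(theorems)} theorems")
for thm in theorems[:3]:
    # Assumed LeanDojo-benchmark-style fields; verify against the actual files.
    print(thm["full_name"], "from", thm["file_path"])
    for step in thm.get("traced_tactics", [])[:2]:
        print("  tactic:", step["tactic"])
```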
The codebase is under active development. If you encounter any issues, please open an issue or contact the authors.
The workshop paper's experiments were conducted with commit `c3bc55c` (`c3bc55cc9159278024799f8a2ed7c522042377fb`), so if you run into bugs introduced by newer changes, we recommend checking out that specific version (`git checkout c3bc55c`) and comparing against it.
As of October 13, 2024, the configuration files have been reorganized for modularity and proof search evaluation has been optimized to remove redundant computation, but the underlying logic remains the same.