This directory contains the code for reproducing the TroVE ablation performed in the paper "Library Learning Doesn't: The Curious Case of the Single Use Library".
Because TroVE executes arbitrary LLM-generated code, we strongly suggest using a sandbox to limit TroVE's network and read/write access. In our case we use a Singularity container, which also contains the requirements.
To install Singularity, see https://github.com/sylabs/singularity. Note that we used Singularity Community Edition 3.8.2 in our experiments.
We have provided an Ubuntu 22.04 Singularity container that already contains the Python 3.10.12 environment with all its dependencies. The .sif file can be downloaded from this link.
If you want to use a different sandboxing method, you can set up the Python environment by running:
pip install -r sing_requirements.txt
Note that sing_requirements.txt contains the exact package versions used in our Singularity environment. For a list of just the high-level dependencies (which may stop working as functionality is deprecated in newer package versions), see requirements.txt.
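If you install the environment yourself, it can help to confirm that the installed packages match the pinned versions. The sketch below is one possible check, assuming sing_requirements.txt uses pip's `name==version` line format (an assumption; the file's layout is not described here). The helper names `parse_pins` and `find_mismatches` are hypothetical and not part of this repository.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for every `name==version` line.

    Assumes pip-style pins; comments, environment markers, blank lines,
    and unpinned entries are ignored.
    """
    pins = {}
    for raw in lines:
        # Drop trailing comments and environment markers such as "; python_version < '3.11'".
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (expected, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(parse_pins(open("sing_requirements.txt")))` should return an empty dict when every pinned package is installed at the expected version.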
The results of the log analysis can be found in a series of .md files, one per function in the library. For example, the files for the counting split of MATH can be found in ablation_experiment_results/baseline_run0/CodeLlama-7b-Instruct-hf/math/counting/results; the other splits can be found in the corresponding subfolders of math.
To generate the markdown file for each function from the raw TroVE outputs at ablation_experiment_results/baseline_run0, run:
./data_exploration/function_displayer.sh
NOTE: To reduce the storage requirements, we searched for overly long program outputs and either truncated them or replaced them with None using the provided script, shrink.py.
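The results layout described above can be enumerated with a short sketch. It assumes every split follows the same directory pattern as the counting example, using the split names accepted by DSNAME; the helper names `results_dir` and `list_function_reports` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Split names as they appear in the math/<split> task names.
SPLITS = [
    "algebra", "counting", "geometry", "intermediate",
    "number", "prealgebra", "precalculus",
]


def results_dir(split,
                root="ablation_experiment_results/baseline_run0",
                model="CodeLlama-7b-Instruct-hf"):
    """Build the path to the per-function markdown files for one MATH split."""
    return Path(root) / model / "math" / split / "results"


def list_function_reports(split, **kwargs):
    """Return the sorted .md files for a split, or [] if the folder is absent."""
    folder = results_dir(split, **kwargs)
    return sorted(folder.glob("*.md")) if folder.is_dir() else []
```

For instance, `{s: len(list_function_reports(s)) for s in SPLITS}` gives a quick count of generated reports per split.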
- Locally download/cache the required CodeLlama model files by running:
python one_time_setup/download_hf_files.py
- Download the Singularity file trove_sing_v4.sif from this link, and place it in this directory.
- Go into exec_baseline_trove.sh and exec_ablated_trove.sh and edit the environment variables in the header:
  - EXPERIMENT_NAME: The name for this experiment. A subfolder with this name will be created to store the experiment results.
  - RUN_IDX: Each run must have a unique --run_index; otherwise, if there is an existing run with said index, the program will assume that the previous run was interrupted and will attempt to resume said run.
  - SOURCE_FOLDER: The absolute path to the parent directory of this file, ending in a forward slash, i.e., the result of running:
echo `realpath ..`'/'
  - OUTPUT_FOLDER: Path to the directory in which results will be saved, ending in a slash (i.e., the result of running the command below). TroVE will have read-write access to this directory, so its contents could be altered, deleted, or otherwise compromised by the LLM-generated code. Make sure there is nothing important here! This should not be a subfolder of SOURCE_FOLDER, or vice versa.
echo `realpath ~/path/to/output/directory`'/'
  - DSNAME: Which MATH split to run the model on. The options are: math/algebra, math/counting, math/geometry, math/intermediate, math/number, math/prealgebra, or math/precalculus.
- To run the baseline, run:
./exec_baseline_trove.sh
To run the ablation, run:
./exec_ablated_trove.sh
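The constraints on SOURCE_FOLDER and OUTPUT_FOLDER in the steps above are easy to get wrong, so a quick check before launching may be worthwhile. The following is a minimal sketch, assuming POSIX-style paths; `check_folders` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path


def check_folders(source_folder, output_folder):
    """Return a list of problems with the two settings (empty if none found)."""
    problems = []
    for label, value in (("SOURCE_FOLDER", source_folder),
                         ("OUTPUT_FOLDER", output_folder)):
        if not value.endswith("/"):
            problems.append(f"{label} must end in a forward slash")
        if not Path(value).is_absolute():
            problems.append(f"{label} should be an absolute path")

    # Neither folder may be equal to, or nested inside, the other.
    src = Path(source_folder).resolve()
    out = Path(output_folder).resolve()
    if src == out or src in out.parents or out in src.parents:
        problems.append("OUTPUT_FOLDER and SOURCE_FOLDER must not contain one another")
    return problems
```

An empty list means the two values satisfy the rules listed above; it does not verify that the output directory is free of important files.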
These instructions are unsafe, as they execute arbitrary LLM-generated Python code on your machine. They are provided solely to explain how to run TroVE, so that you can set up a different sandbox should you wish.
To prevent accidents, we have added the --disable_llm flag so that the commands below do not execute LLM-generated Python code. Inside a sandbox, you would run the commands below with this flag removed.
To run baseline TroVE on the intermediate split of MATH, you would run:
# (UNSAFE) executes LLM-generated Python code without sandboxing if --disable_llm is removed
python run_trove.py --task_name math/intermediate --exec_file "tmp_exec_`date -Ins`.py" --verbose --preemptable --preemption_log_freq 12 --run_index 0 --disable_llm
To run the ablated model, you would add the --ablation1 flag:
# (UNSAFE) executes LLM-generated Python code without sandboxing if --disable_llm is removed
python run_trove.py --task_name math/intermediate --exec_file "tmp_exec_`date -Ins`.py" --verbose --preemptable --preemption_log_freq 12 --run_index 0 --ablation1 --disable_llm
These commands would save logs and checkpoints to the default location, ./outputs; this behaviour can be overridden by setting the OUTPUT_FOLDER environment variable, e.g., export OUTPUT_FOLDER="/my/output/folder/"
In general, specify the task name as math/${dataset_name}, e.g., math/algebra. See data/math/ for the available MATH splits. Note that the specified --task_name argument should be lowercased.
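The command pattern above can be assembled programmatically, which makes it harder to mistype a split name or forget the safety flag. This is a sketch only: it uses just the flags shown in the commands above, `build_trove_command` is a hypothetical helper that is not part of this repository, and the timestamp is a stand-in for the `date -Ins` substitution. It builds the argument list without running anything.

```python
import shlex
from datetime import datetime

MATH_SPLITS = {
    "algebra", "counting", "geometry", "intermediate",
    "number", "prealgebra", "precalculus",
}


def build_trove_command(dataset_name, run_index, ablated=False, disable_llm=True):
    """Return the run_trove.py argument list for one MATH split."""
    split = dataset_name.lower()  # --task_name should be lowercased
    if split not in MATH_SPLITS:
        raise ValueError(f"unknown MATH split: {dataset_name!r}")
    stamp = datetime.now().strftime("%Y-%m-%dT%H-%M-%S-%f")
    cmd = [
        "python", "run_trove.py",
        "--task_name", f"math/{split}",
        "--exec_file", f"tmp_exec_{stamp}.py",
        "--verbose", "--preemptable",
        "--preemption_log_freq", "12",
        "--run_index", str(run_index),
    ]
    if ablated:
        cmd.append("--ablation1")
    if disable_llm:
        # Kept on by default; only drop it inside a sandbox.
        cmd.append("--disable_llm")
    return cmd


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

Keeping `disable_llm=True` as the default mirrors the commands above, so that LLM-generated code is only executed when the caller explicitly opts in.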
Each run must have a unique --run_index; otherwise, if there is an existing run with said index, the program will assume that the previous run was interrupted and will attempt to resume said run.
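To avoid accidentally resuming an earlier run, one option is to pick the lowest index that has not been used yet. The sketch below assumes run folders are named `<experiment>_run<index>`, a pattern inferred only from the example path baseline_run0 and not confirmed elsewhere in this document; all helper names are hypothetical and not part of this repository.

```python
import os
import re


def parse_run_indices(names, experiment_name):
    """Extract the run indices from folder names like '<experiment>_run<index>'."""
    pattern = re.compile(rf"{re.escape(experiment_name)}_run(\d+)")
    indices = set()
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            indices.add(int(match.group(1)))
    return indices


def first_unused_index(used):
    """Return the smallest non-negative integer not in `used`."""
    index = 0
    while index in used:
        index += 1
    return index


def next_free_run_index(output_folder, experiment_name):
    """Suggest a run index with no existing folder under output_folder."""
    if not os.path.isdir(output_folder):
        return 0  # nothing has been written yet
    names = os.listdir(output_folder)
    return first_unused_index(parse_run_indices(names, experiment_name))
```

If your output folders follow a different naming scheme, only the regular expression in `parse_run_indices` would need to change.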
To produce the values in the table above, run:
python data_exploration/calc.py
This repository is a modification of the original TroVE codebase, which was made available under a CC-BY-SA-4.0 license. As per the license, this modified work is shared under the same license.
Note that the data/math directory contains data from the MATH dataset, which is licensed under an MIT license. We do not change this directory from the TroVE repository.