This document describes how to reproduce the experiments presented in the SquirrelFS paper.
All experiments in the paper were run with the following system configurations:
- Debian Bookworm
- Intel processor supporting `clwb`
- 64 cores
- 128GB Intel Optane DC Persistent Memory
- 128GB DRAM
Minimum recommended environment for running experiments elsewhere (VM or bare metal):
- Debian Bookworm or Ubuntu 22.04
- Intel processor supporting `clwb` or `clflush`/`clflushopt`
- 8 cores
- 20 GB PM (emulated or real)
- 16GB DRAM (in addition to DRAM used to emulate PM)
- Approximately 20GB free storage space
We have set up SquirrelFS and all benchmarks on a machine with these configurations for artifact evaluators. We will provide information about how to access this machine to evaluators during the review period. Running multiple experiments concurrently on this machine will impact their results, so please coordinate with the other reviewers to ensure experiments do not conflict.
If you would prefer to use a different machine, please follow the setup instructions in the README to compile and install SquirrelFS. If running SquirrelFS on a different machine, please take note of the following:
- SquirrelFS can be run without `clwb` support on processors that have `clflush`/`clflushopt` support, but this may negatively impact performance, as these instructions are slower. Support for these instructions can be checked by running `lscpu` (see the snippet after this list).
- SquirrelFS can be run on emulated or real PM. We suggest using the provided hardware or another machine with Intel Optane DC PMM; if this is not an option, note that running the experiments on PM emulated with DRAM will have different performance results.
- The filebench workload `webproxy` may experience errors on machines with a relatively low number of threads or less PM. If the test fails, try reducing the value of `nthreads` in `evaluation/filebench/workloads/webproxy.f`.
- The default configurations of the RocksDB and LMDB benchmarks will run out of space on devices smaller than 128GB. If using a smaller PM device, please reduce the number of operations/records proportionally to the size of the device (e.g., if using 64GB of PM, use half of the default values). Please see the Modifying experiments section for info on changing these values.
- Most experiments can be done with less than 20GB of PM; the exception is the Linux checkout experiment, as the kernel source code takes up approximately 16GB of storage space.
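As a concrete example, the following one-liner checks for these flags (a minimal sketch; it assumes `lscpu` reports a CPU flags line, as recent util-linux versions do):

```sh
# Prints whichever of the three flush instructions the CPU supports;
# no output means none of them are available.
lscpu | grep -o -w -e clwb -e clflushopt -e clflush | sort -u
```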
Scripts to run all experiments and parse and plot their results are included in the `evaluation/` directory. This section provides details on how to run each experiment and how to compare results to those presented in the paper. The raw output of each script will be placed in the specified output folder. Note that experiments should not be run in parallel, as this will impact results.

All of the following commands should be run from within the `evaluation/` directory.
All experiments use default arguments for iterations, thread count, and/or other experiment-specific parameters that work on the provided machine. Unless otherwise noted, the default values are the same ones we used in the paper. If you are running experiments on a different machine, please see the Modifying experiments section for info on changing these values.
Run `scripts/build_benchmarks.sh` to compile filebench and LMDB and to install dependencies required by the evaluation scripts. All other experiment scripts use pre-built binaries or compile the required tests.
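For example, starting from the directory containing the repository (the `squirrelfs/` path is taken from the output-directory examples later in this guide):

```sh
cd squirrelfs/evaluation   # all subsequent commands run from this directory
scripts/build_benchmarks.sh
```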
Please also create an SSH key at `~/.ssh/id_ed25519` and add it to your GitHub account. One of the experiments clones a repository from GitHub and requires a key to complete this operation.
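For example, the key can be generated at the expected path and its public half printed for upload to GitHub (a minimal sketch using standard OpenSSH tooling):

```sh
# Generate an ed25519 keypair at the path the scripts look for
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
# Copy this public key into your GitHub account (Settings -> SSH and GPG keys)
cat ~/.ssh/id_ed25519.pub
```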
Each experiment script requires some subset of the following arguments:
- `mount_point`: the location to mount the file system under test. If you are using the provided machine, we suggest using `/mnt/local_ssd/mnt/pmem/`. If you are using the pre-built VM, we suggest `/mnt/pmem`.
- `output_dir`: the directory to place all output data in. The scripts expect a relative path to a subdirectory of `squirrelfs/evaluation` (e.g., using `results` as `output_dir` will put all output in `squirrelfs/evaluation/results`). The same directory can be passed to all experiments; each experiment creates subdirectories to keep results organized.
- `pm_device`: the path to the PM device file to use. This will generally be `/dev/pmem0`.
Run `sudo -E scripts/run_all.sh <mount_point> <output_dir> <pm_device>` to run all experiments on SquirrelFS. It takes approximately 18 hours to run all of the experiments on the provided machine. Each experiment can also be run separately following the instructions below.
Recommended arguments:
- If using the provided machine: `scripts/run_all.sh /mnt/local_ssd/mnt/pmem output_ae /dev/pmem0`
- If using a VM: `scripts/run_all.sh /mnt/pmem output_ae /dev/pmem0`
If you would instead like to run each experiment individually, we include instructions to invoke the helper scripts for each experiment below. `scripts/run_all.sh` uses these same scripts.
To run the system call latency tests on all evaluated file systems, run:

`sudo -E scripts/run_syscall_latency_tests.sh <mount_point> <output_dir> <pm_device>`

It takes approximately 15-20 minutes to run all latency tests on all file systems on the provided machine.
To run the syscall latency test on a single file system, run

`sudo -E scripts/run_syscall_latency.sh <fs> <mount_point> <output_dir> <pm_device>`

where `fs` specifies the file system to test (`squirrelfs`, `nova`, `winefs`, or `ext4`).
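For example, to test SquirrelFS using the VM paths suggested above (`output_ae` is the output directory from the earlier examples):

```sh
sudo -E scripts/run_syscall_latency.sh squirrelfs /mnt/pmem output_ae /dev/pmem0
```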
To run all filebench workloads, run:

`sudo -E scripts/run_filebench_tests.sh <mount_point> <output_dir> <pm_device>`

It takes approximately 1.5-2 hours to run all filebench workloads on all file systems on the provided machine.
To specify the workload and file system to test, run

`sudo -E scripts/run_filebench.sh <mount_point> <fs> <workload> <output_dir> <pm_device> <iterations>`

where `workload` specifies the filebench workload to run (`fileserver`, `varmail`, `webproxy`, or `webserver`) and `fs` specifies the file system to test (`squirrelfs`, `nova`, `winefs`, or `ext4`).
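For example, to run `varmail` on SquirrelFS with the VM paths from above (the iteration count here is illustrative; see `scripts/run_filebench_tests.sh` for the default):

```sh
sudo -E scripts/run_filebench.sh /mnt/pmem squirrelfs varmail output_ae /dev/pmem0 5
```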
To run all YCSB workloads on RocksDB, run:

`sudo -E scripts/run_rocksdb_tests.sh <mount_point> <output_dir> <pm_device>`

It takes approximately 4-4.5 hours to run these experiments on all file systems on the provided machine.
To evaluate a specific file system, run

`sudo -E scripts/run_rocksdb.sh <fs> <mount_point> <output_dir> <pm_device> <operation count> <record count> <num threads> <iterations>`

where `fs` specifies the file system to test (`squirrelfs`, `nova`, `winefs`, or `ext4`). This script runs all tested YCSB workloads by default, as some YCSB workloads depend on each other, but a subset can be selected by commenting out workloads to skip in the script.
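For example, an invocation on SquirrelFS with the VM paths from above might look like the following (the operation count, record count, thread count, and iteration count are illustrative placeholders; the defaults live in `scripts/run_rocksdb_tests.sh`):

```sh
sudo -E scripts/run_rocksdb.sh squirrelfs /mnt/pmem output_ae /dev/pmem0 1000000 1000000 8 5
```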
To run all LMDB workloads, run:

`sudo -E scripts/run_lmdb_tests.sh <mount_point> <output_dir> <pm_device>`
To specify the workload and file system to test, run

`sudo -E scripts/run_lmdb.sh <mount_point> <workload> <fs> <output_dir> <pm_device> <operation_count> <iterations>`

where `workload` specifies the LMDB workload to run (`fillseqbatch`, `fillrandbatch`, or `fillrandom`) and `fs` specifies the file system to test (`squirrelfs`, `nova`, `winefs`, or `ext4`).
Note: we used 10 iterations for the paper, but this takes 14-15 hours to run. The provided scripts use 5 iterations by default to reduce the runtime of this experiment.
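For example, to run `fillseqbatch` on SquirrelFS with the VM paths from above and the default 5 iterations (the operation count is an illustrative placeholder; see `scripts/run_lmdb_tests.sh` for the default):

```sh
sudo -E scripts/run_lmdb.sh /mnt/pmem fillseqbatch squirrelfs output_ae /dev/pmem0 1000000 5
```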
To run the Linux checkout experiment on all file systems, run:

`sudo -E scripts/run_linux_checkout.sh <mount_point> <output_dir> <pm_device>`

It takes approximately 2 hours to run these experiments on all file systems on the provided machine.
Note: this experiment uses `git clone` and looks in `$HOME/.ssh/id_ed25519` for the private key to use. To change the path to the key, update the value of `key_path` in `scripts/linux_checkout.sh`.
To run the experiment on a single file system, run

`sudo -E scripts/linux_checkout.sh <fs> <mount_point> <output_dir> <pm_device> <iterations>`

where `fs` specifies the file system to test (`squirrelfs`, `nova`, `winefs`, or `ext4`).
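For example, on SquirrelFS with the VM paths from above (the iteration count is illustrative):

```sh
sudo -E scripts/linux_checkout.sh squirrelfs /mnt/pmem output_ae /dev/pmem0 5
```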
To measure compilation times of all file systems, run:

`scripts/run_compilation_tests.sh <output_dir>`

It takes approximately 15 minutes to run these experiments on all file systems on the provided machine. Note: Do not run this script with `sudo`.
To run the experiment on a single file system, run

`scripts/compilation.sh <fs> <iterations>`

where `fs` specifies the file system to test (`squirrelfs`, `nova`, `winefs`, or `ext4`).
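For example (note the absence of `sudo`; the iteration count is illustrative):

```sh
scripts/compilation.sh squirrelfs 5
```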
To run experiments to measure the mount time of SquirrelFS, run:

`sudo -E scripts/run_remount_tests.sh <mount_point> <output_dir> <pm_device>`
It takes approximately 1 hour to run these experiments on SquirrelFS on the provided machine.
Note: When filling up the device to measure the remount timing on a full system, the scripts spawn many processes to create files until the device runs out of space and attempting to create or write to a file returns an error. You may see errors indicating that there is no space left on the device when running this experiment -- this is expected.
We only provide mount time measurements for SquirrelFS in the paper, but if you would like to measure them for other file systems or run a specific test on SquirrelFS, run

`sudo -E scripts/remount_timing.sh <fs> <mount_point> <test> <output_dir> <pm_device>`

where `fs` specifies the file system to test (`squirrelfs`, `nova`, `winefs`, or `ext4`) and `test` specifies which experiment to run (`init`, `empty`, or `fill_device`). The script supports several more experiments, including filling the device with only data files or only directories, but we did not include results from these experiments in the paper. Note that the script only supports automatically running post-crash recovery code for SquirrelFS, as SquirrelFS has a mount-time argument (`force_recovery`) to force recovery code to run on a clean unmount. The other file systems do not have mount-time arguments to force crash recovery and must be manually modified to make this code run if a crash has not occurred.
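For example, to measure SquirrelFS's mount time on an empty file system using the VM paths from above:

```sh
sudo -E scripts/remount_timing.sh squirrelfs /mnt/pmem empty output_ae /dev/pmem0
```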
Fully checking the Alloy model of SquirrelFS takes weeks and cannot be done within the artifact evaluation period. Instead, we provide a set of simulations to run on the model that produce example traces of various operations that SquirrelFS supports.
To run this set of simulations, run:

`scripts/run_model_sims.sh <threads> <output_dir>`

where `threads` is the number of threads to use to run the simulations in parallel.
To achieve the best results, we suggest using half of the cores in your machine as the number of threads (e.g., for a 64-core machine, use 32 threads), as some simulations are memory-intensive. It takes approximately 30 minutes to run all simulations with 32 threads.
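For example, on a 64-core machine, following the half-the-cores suggestion above:

```sh
scripts/run_model_sims.sh 32 output_ae
```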
Evaluating SquirrelFS on a machine with fewer resources or less PM space may require modifying some experiment parameters. The scripts that run all tests in each experiment supply default values; we describe below how to change these parameters in each script. As described above, the experiments can also be customized directly by running tests on individual file systems.
- Syscall latency: The system call latency experiments are small and can be run on as little as 1GB of PM. The number of iterations is hardcoded into `tests/syscall_latency.c`; changing it requires modifying this file. The syscall latency helper scripts automatically recompile this file.
- Filebench: To change the number of iterations, update the `iterations` value in `scripts/run_filebench_tests.sh`. Other experiment-specific parameters are hardcoded into the filebench workload files, which can be found at `filebench/workloads`; note that changing these values can cause errors and/or significantly change results.
- YCSB on RocksDB: The number of operations and records per test, as well as the number of threads and iterations, can be modified by changing the corresponding values in `scripts/run_rocksdb_tests.sh`. The number of records/operations may need to be decreased for PM devices less than 128GB in size; however, note that this may cause more variation in results.
- LMDB: The number of operations per test and the number of iterations to run can be modified by changing the corresponding values in `scripts/run_lmdb_tests.sh`.
- Linux checkout: The number of iterations can be modified in `scripts/run_linux_checkout.sh`. Note that this experiment requires a minimum of 20GB of PM (emulated or real).
- Compilation: The number of iterations can be modified in `scripts/run_compilation_tests.sh`.
- Remount: The number of iterations can be modified in `scripts/run_remount_tests.sh`. Note that changing the number of iterations will not significantly impact the runtime of this experiment, as the most time-consuming part is filling up the device in the `fill_device` experiment, which is only done once.
- Model simulations: The number of threads can be modified in `scripts/run_all.sh` or by running `scripts/run_model_sims.sh` and specifying the desired number of threads.
We first describe SquirrelFS's key claims, then describe how to generate the tables and figures in the paper after running all experiments.
- SquirrelFS achieves performance similar to or better than other in-kernel PM file systems on various benchmarks. We demonstrate this by running several microbenchmarks, macrobenchmarks, and applications on SquirrelFS and on the prior PM file systems ext4-DAX, NOVA, and WineFS. Figure 5 in the paper presents these results.
- SquirrelFS obtains statically-checked crash-consistency guarantees quickly. We demonstrate this by comparing the average single-threaded compile time for each tested file system. The results of this experiment are shown in Table 3 in the paper.
- SquirrelFS trades off mount and recovery performance for simpler ordering rules. A limitation of the current prototype of SquirrelFS is that it must scan the entire PM device at mount (plus additional cleanup work during recovery), which results in slower mount and recovery times than other systems. We present average mount and recovery times for SquirrelFS in Table 2 in the paper.
- SquirrelFS's design is correct. We model-checked SquirrelFS's design to gain confidence that the ordering rules enforced by the compiler provide crash consistency.
We provide scripts to generate Figure 5 and Tables 2 and 3 from the paper, and to process other data (Linux checkout times, model checking simulation results) into a readable format.
To generate all figures and tables, run `scripts/process_results.sh <output_dir>`. If you have modified any of the experiment scripts (e.g., output directory, number of iterations), please update this script accordingly.
This script creates a directory `results-ae`, generates the following files, and places them in that directory:
- `figure5.pdf`: A PDF with bar charts showing latency or throughput for the system call latency, filebench, RocksDB, and LMDB workloads. This graph should look similar to Figure 5 in the paper.
- `checkout_timing.txt`: A text file with a table containing the average time in seconds to check out different versions of the Linux kernel with `git` on each file system. We do not provide these numbers directly in the paper due to space limitations, but we expect the averages for each file system to be within roughly 10% of each other for a given version.
- `remount_timing.txt`: A text file with a table containing the average time in seconds to mount SquirrelFS in different configurations. The exact numbers will differ on different systems, but we expect them to be roughly proportional to the numbers in Table 2 after accounting for differences in PM device size.
- `compilation.txt`: A text file with a table containing the average time to compile each file system, corresponding to Table 3 in the paper. The exact numbers will differ on different systems, but we expect them to follow the same pattern as in the paper.
- `model_results`: A text file indicating how many model simulations passed. This file's contents should be:
  Passed: 110
  Failed: 0
  Total simulations run: 110