A Postgres extension for quickly creating "branches" of individual databases within a Postgres cluster using copy-on-write file systems like BTRFS
.
Warning This is pre-alpha software, meant as an experimental proof-of-concept. Use at your own risk!
Postgres makes it easy to create new, empty databases with the CREATE DATABASE
command. It's so easy, in fact, that one would think that creating new databases from existing databases would be easy, too. But it's not.
Postgres provides the option to create one database from another using CREATE DATABASE name [WITH] [TEMPLATE template]
, but doing so has two major restrictions:
- there can be no active connections to the
template
database, and... - performance degrades rapidly as the size of the database increases
pg_branch
is a Postgres extension that solves those problems by giving CREATE DATABASE
the power of snapshots. If your PGDATA
directory is on a copy-on-write file system like BTRFS
, the pg_branch
extension turns every CREATE DATABASE
into an atomic file system snapshot that takes seconds instead of minutes (or hours). In addition, the copy-on-write strategy keeps disk usage low by only writing new segment data files to disk when they're modified (rather than read).
TL;DR: CREATE EXTENSION pg_branch
makes CREATE DATABASE <dbname> WITH TEMPLATE <bigdbname>
super fast
Before installing pg_branch
, it's important to configure the file system that the database cluster will use. The following steps will get you started:
Disclaimer: these steps are written with Linux in mind, and have only been testing on Linux.
-
install prerequisites
You'll need an installation of
btrfs
(usually packaged asbtrfs-progs
) as well as an up-to-date Rust toolchain and thepgrx
subcommand forcargo
. -
format a disk as BTRFS
The easiest thing to do here is plug in a USB and check which disk it is with
lsblk
. Once you've figure out which disk you'd like to reformat, you can do so with:sudo mkfs.btrfs /dev/sdX # replace sdX with your drive
-
mount your
btrfs
-formatted diskYou need a directory to mount this disk to, first. Something like:
sudo mkdir /mnt/database
...which you can then use as a mount point for your new
btrfs
drive with:sudo mount /dev/sdX /mnt/database
-
intialize a Postgres cluster on your mounted disk
cargo pgrx
can take care of initialization as long as it knows where to initialize the data through thePGRX_HOME
variable. Something like:PGRX_HOME=/mnt/database cargo pgrx init
-
clone this repo
The rest of these steps will be done from within this repo, so make sure you've run
git clone [email protected]:NAlexPear/pg_branch.git
andcd pg_branch
. -
convert all segment data directories to subvolumes
Before
pg_branch
can take over database creation, the subdirectories in the newly-initialized data directory of your database need to be converted tobtrfs
subvolumes. This repo provides aninit.sh
script for doing just this that, as long as it's provided aPGDATA
variable that points to the data directory of your cluster.pgrx
data directories have a structure of$PGRX_HOME/data-$PG_VERSION
. So if you initialized your project as instructed in step 3, you should be able to run theinit.sh
script in this repository like so:PGDATA=/mnt/database/data-15 ./init.sh
...and you should have successfully converted all of the initial databases in your cluster to subvolumes.
-
get into
psql
The quickest way to jump into a
psql
session that recognizespg_branch
is to run the following:PGX_HOME=/your/mounted/btrfs/disk cargo pgrx run
-
create the extension in
psql
withCREATE EXTENSION pg_branch
-
create some databases
After creating the extension, you can run
CREATE DATABASE <dbname> WITH TEMPLATE <template_dbname>
commands to quickly and atomically copy databases without requiring an exclusive lock or dedicated connection. To use the defaultCREATE DATABASE
behavior again, pick an explicitSTRATEGY
other thanSNAPSHOT
(i.e.WAL_COPY
orFILE_COPY
).
- distribute as pre-compiled extension
- implement a cluster-wide
fork
- support more of the options supported by
CREATE DATABASE
- streamline setup of the data directory and its file system
- support additional copy-on-write file systems like
ZFS
andXFS
- include an example Dockerfile
This project's use of file system snapshots as a branching mechanism is heavily inspired by pgcow
and Postgres.ai. And credit for the concept of "forking" Postgres clusters goes to Heroku's Database Fork feature.