This repository accompanies the following research paper:
Joseph Gentle and Martin Kleppmann. Collaborative Text Editing with Eg-walker: Better, Faster, Smaller. 20th European Conference on Computer Systems (EuroSys), March 2025. doi:10.1145/3689031.3696076, arXiv:2409.14252
It contains everything needed to produce the Eg-walker paper, including everything required to reproduce all of the results it contains.
A rough breakdown of the files and folders:
`tools/`
: A series of implementations of various code used for data conversion and benchmarking. See below for details.

`results/`
: This folder contains all benchmarking results, in JSON format. The files here are generated by various scripts. See the included README.md for more details.

`datasets/`
: The editing traces (datasets) used in this paper. The raw datasets are contained in `datasets/raw`, and all other files are generated from the raw datasets.

`egwalker-reference`
: A reference implementation of the eg-walker algorithm described in this paper, written in TypeScript.

`svg-plot`
: This tool generates charts in SVG format in the `diagrams/` folder from JSON data in `results/`.

`eg-walker.tex`
: LaTeX source for the paper itself. The paper embeds SVGs from `diagrams/` and uses some stats from the `results/` directory for tables.
This repository contains everything you need to fully reproduce all results in the paper.
Install Rust and Node.js. There are two ways to run all our benchmarks.
Option 1: Use make:

```
make clean
make
```
Option 2: Use the included shell scripts:

```
./step0-clean.sh
./step1-prepare.sh
./step2a-memusage.sh
./step2b-benchmarks.sh
node collect.js
```
Note: the benchmarks take roughly 12 hours to run. (Almost all of this time is the OT benchmark on the A2 trace, which takes about 1 hour per sample; we collect 10 samples.)
You can use `git diff` on `results/timings.json` (and other files) to see how your experimental results compare to ours.
- Diamond-types (sometimes abbreviated to DT) is the name of our optimized Rust eg-walker implementation.
- DT-CRDT is the name of our reference CRDT implementation.
Tools: You will need the following tools installed on your computer:
- Rust compiler & toolchain: Any "recent" version of Rust should work. The published version of the paper used Rust 1.78. The easiest way to install Rust is via rustup.
- Node.js: Node.js is only used for scripting, like extracting benchmarking results into 'clean' JSON files and generating the charts used in the paper.
- (Optional) GNU Make 4.3: We have alternative shell scripts if you don't have make available, but it's less convenient.
We used Rust 1.80 and Node.js v21 when generating the results in the current version of the paper.
This process has only been tested on Linux, but it should also work on other broadly supported platforms (like macOS).
Our results are checked in to the repository. If you would like to regenerate all of our benchmarking results, we recommend removing ours first; you can then use `git diff` afterwards to compare your results with ours:

```
rm -f results/*
```
The 7 raw editing traces are checked into the repository at `datasets/raw`. These files are stored in the diamond-types packed binary format.
Before the benchmarks can be run, we perform the following preprocessing steps on these files:
- The traces are "duplicated" to make them all about the same size. (For example, `datasets/raw/friendsforever.dt` is duplicated 25 times and repacked as `datasets/C1.dt`.)
- The datasets are exported as JSON.
- The JSON traces are converted to "native" Yjs and Automerge formats.

Conversion is slow, and this step is optional. For convenience, the converted files are already checked into git in the `datasets` directory.
The first 2 steps make use of a CLI tool in `tools/diamond-types/crates/dt-cli`. The final step uses `tools/crdt-converter`. Both tools are built automatically.
You can re-run step 1 as follows (time taken: 4 hours or so):

```
rm datasets/*
./step1-prepare.sh
```

Output: `datasets/*.am`, `*.yjs`, `*.json`

These files should be byte-for-byte identical to the files distributed via this git repository; i.e., after regenerating these files, `git status` should report no changes.
The size of the files produced during this step is measured to produce Figure 11 and Figure 12 in the paper.
This process takes an unreasonably long time to run! Sorry! We just didn't optimize this process much, since you only need to do it once. You can also run these scripts in parallel using make. For example:

```
make -j16 all_datasets
```

(The `all_datasets` make rule corresponds to this conversion step.)
There's a series of benchmarks to run. For each algorithm, for each editing trace, we do the following tests:
- Measure memory usage
- Measure time taken to merge in the editing trace (as if from a remote peer)
Algorithms:
- DT (our optimized eg-walker implementation)
- DT-CRDT (our reference CRDT implementation)
- Automerge
- Yjs
- Yrs (Yjs rust port)
- OT (our reference OT implementation)
Note: Our OT implementation takes 1 hour to replay the A2 editing trace.
We consider the core benchmark in the paper to be the "remote time": the amount of time (and memory) it takes to merge all edits arriving from a remote peer over the network.
We run this test for Automerge, Yjs (and Yrs), diamond-types (our optimized eg-walker implementation), DT-CRDT (our reference CRDT implementation), and our reference OT implementation.
Each speed benchmark also has a corresponding memory usage benchmark, initiated with a different command.
Measure memory usage (time estimate: 1h10min):

```
./step2a-memusage.sh
```

Benchmark remote merging across all algorithms (time estimate: 12 hours):

```
./step2b-benchmarks.sh
```
Output: lots of files in `results/`. Criterion benchmarks are stored in directories like `target/criterion/automerge/remote/A1/`.

Most of the benchmarks are run using Criterion.rs. Criterion results are stored in JSON files like `target/criterion/automerge/remote/A1/base/estimates.json`. (Criterion also records the time taken for every sample, and various other cool things! Take a look at `target/criterion/report/index.html` in your browser!)
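As a rough illustration of how these Criterion files can be consumed (the actual extraction logic lives in collect.js; this sketch assumes the `estimates.json` layout of recent Criterion.rs releases, where times are reported in nanoseconds):

```typescript
// Sketch only: read one Criterion result file and print the mean merge time.
// The benchmark path is taken from the example above; adjust as needed.
import { readFileSync } from 'fs'

const path = 'target/criterion/automerge/remote/A1/base/estimates.json'
const estimates = JSON.parse(readFileSync(path, 'utf8'))

// Criterion stores its point estimates in nanoseconds.
const meanSecs = estimates.mean.point_estimate / 1e9
console.log(`automerge/remote/A1 mean merge time: ${meanSecs.toFixed(3)} s`)
```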
The next step is to pull that data out into a simple, usable JSON file (time taken: 1 second):

```
node collect.js
```
This creates `results/timings.json`, which is used for many of the charts. It also emits `results/yjs_am_sizes.json`, containing the file sizes for the Yjs & Automerge binary formats.
Our experimental results are checked in to git. If you're trying to reproduce our experimental results, you can run `git diff results/timings.json` to see how your remote merge times differ from ours.
Our diagrams are generated by our `svg-plot` tool. You can regenerate them like this:

```
cd svg-plot
npm i # only needed once to install dependencies
node render.js
```

This tool outputs a set of SVGs in `diagrams/*.svg`. To convert them to PDF (required for embedding into LaTeX), we manually open them in Inkscape and export to PDF. (We previously used `rsvg-convert`, but it embeds Type 3 fonts, which are not allowed in the final paper.)
Our paper is written using LaTeX, and you can build it in the usual way:

```
pdflatex eg-walker
bibtex eg-walker
pdflatex eg-walker
pdflatex eg-walker
```
An earlier version of the paper, which was written using Typst, can be found in `reg-text.typ`. The LaTeX version is the more up-to-date of the two.
After you have done all of the above steps, you can compare your experimental results to ours.
Note that some of the most important claims in the paper are properties of the algorithm, not something we can benchmark. The biggest one is the claim that a peer can emit editing events directly, without reference to any metadata.
This falls out of the algorithm (since there is no equivalent of a CRDT's prepare function), but it isn't something we can measure.
However, you should still be able to validate all our performance claims!
`diagrams/timings.svg` should match Figure 8. Obviously, your computer may be faster or slower than ours, but the relative speed of the various algorithms should remain intact. This diagram is generated from `results/timings.json`. You can compare the raw benchmarking results with ours using `git diff results/timings.json`.
`diagrams/memusage.svg` should match Figure 10. Run-to-run variance in this test should be very small. This diagram is generated from files matching the pattern `results/(alg)_memusage.json`.
`diagrams/filesize_full.svg` and `diagrams/filesize_smol.svg` should match Figure 11 and Figure 12. These diagrams are generated from the size of the files in the `datasets/` directory. The "base sizes" (the size of all raw text) are shown in `results/dataset_stats.json`, along with various other stats. (This file is generated during step 2a by running `tools/diamond-types/target/release/paper-stats`.)
`diagrams/plot_ff.svg` was ultimately cut from the paper to keep the size down. You can ignore this one!
Table 1 in the paper is generated from `results/dataset_stats.json`. This file is generated from the source code in `tools/diamond-types/crates/paper-stats/src/main.rs`, and converted into a LaTeX table by running `node datasets-table.js`. The columns are populated as follows:

- Repeats: Hardcoded in the table. But these numbers should match the equivalent `-n3` / `-n25` / etc. numbers in the calls to `bench-duplicate` from `step1-prepare.sh` and the `Makefile`.
- Events (k): The `total_keystrokes` field from dataset_stats. Our eg-walker implementation treats every single-character insert or delete as a distinct event. (Events are eagerly run-length encoded throughout the code for performance, but semantically, each insert or delete of a single character is distinct.)
- Avg Concurrency: `concurrency_estimate` from dataset_stats. This estimate is calculated by taking the mean, over all events in the trace, of the number of other events which are concurrent with that event. Code for this is in `estimate_concurrency` in `graph/tools.rs`. (A small sketch of this calculation appears after this list.)
- Graph Runs: Calculated by looking at the length of the run-length encoded graph data structure. The graph is stored as a list; each item in the list corresponds to a sequence of events such that each event n after the first event in the range has event n - 1 as its parent. How many such runs do we need to encode the graph? This is populated from `graph_rle_size` in dataset_stats.
- Authors: The number of unique authors who contributed to the trace. This is mostly not recorded in the files at all.
- Chars remaining: Calculated from `data.final_doc_len_chars / data.num_insert_keystrokes`, as a percentage.
- Final size: `data.final_doc_len_utf8 / 1024`.
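For concreteness, here is a minimal, unoptimized sketch of the average-concurrency calculation described above, written in TypeScript against a hypothetical graph representation (each event stores the indices of its parent events). The real implementation is `estimate_concurrency` in `graph/tools.rs`, which may differ in detail.

```typescript
// Hypothetical event graph: event i lists its parents, all with index < i.
type EventGraph = { parents: number[] }[]

// Compute the transitive ancestor set of every event.
function ancestorSets(graph: EventGraph): Set<number>[] {
  const ancestors: Set<number>[] = []
  for (let i = 0; i < graph.length; i++) {
    const anc = new Set<number>()
    for (const p of graph[i].parents) {
      anc.add(p)
      for (const a of ancestors[p]) anc.add(a)
    }
    ancestors.push(anc)
  }
  return ancestors
}

// Two events are concurrent if neither is an ancestor of the other.
// "Avg concurrency" is the mean number of concurrent events per event.
function estimateConcurrency(graph: EventGraph): number {
  const ancestors = ancestorSets(graph)
  let total = 0
  for (let i = 0; i < graph.length; i++) {
    for (let j = 0; j < graph.length; j++) {
      if (i !== j && !ancestors[i].has(j) && !ancestors[j].has(i)) total++
    }
  }
  return total / graph.length
}

// Tiny example: events 1 and 2 are concurrent siblings, merged by event 3.
const graph: EventGraph = [
  { parents: [] },
  { parents: [0] },
  { parents: [0] },
  { parents: [1, 2] },
]
console.log(estimateConcurrency(graph)) // 0.5
```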
The easiest way to use our editing traces is via the JSON formatted versions in `datasets`. These files contain lists of editing transactions. Each transaction contains a non-empty set of patches (edits to the document) made by some user agent at some point in time (i.e. after some other set of transactions have been merged). The file format is described in the editing-traces repository.
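As a rough illustration only (the field names below are an approximation, not the spec; the editing-traces repository is the authoritative reference for the schema), a trace has roughly this shape in TypeScript terms:

```typescript
// Approximate shape of a JSON editing trace, for orientation only.
// Consult the editing-traces repository for the authoritative format.
type Patch = [position: number, numDeleted: number, insertedContent: string]

interface Txn {
  // Transactions this one causally depends on (i.e. the "other set of
  // transactions" that had been merged before this edit was made).
  parents: number[]
  // The user agent which made this edit.
  agent: number
  // Non-empty list of edits, expressed against the document state produced
  // by merging the parent transactions.
  patches: Patch[]
}

interface Trace {
  txns: Txn[]
  endContent: string // final document content, useful as a sanity check
}
```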
If you want to benchmark a CRDT using these editing traces, you need to convert them to your CRDT's local format. We do this by simulating (in memory) a set of collaborating peers. The peers fork and merge their changes. See `tools/crdt-converter/src/main.rs` for code which performs this process using Automerge and Yjs (Yrs).
This simulation process is usually very slow, but that doesn't really matter. Our benchmarks take this converted file and merge it directly.
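As an illustration of the fork-and-merge idea (this is not the actual converter, which is written in Rust in `tools/crdt-converter`; this sketch just uses the yjs npm package to show two peers forking and then merging):

```typescript
import * as Y from 'yjs'

const alice = new Y.Doc()
const bob = new Y.Doc()

// Alice types something, then "fork": bob receives a copy of her state.
alice.getText('text').insert(0, 'Hello!')
Y.applyUpdate(bob, Y.encodeStateAsUpdate(alice))

// Both peers now edit concurrently.
alice.getText('text').insert(5, ', world')
bob.getText('text').insert(5, ' there')

// "Merge": exchange updates in both directions; the documents converge.
Y.applyUpdate(alice, Y.encodeStateAsUpdate(bob))
Y.applyUpdate(bob, Y.encodeStateAsUpdate(alice))

console.log(alice.getText('text').toString() === bob.getText('text').toString()) // true

// A converted benchmark file would then essentially be the merged document's
// binary encoding, e.g. Y.encodeStateAsUpdate(alice).
```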
The paper text is copyright Joseph Gentle and Martin Kleppmann. It is made available under a Creative Commons CC-BY 4.0 license.
Most of the raw editing traces are also available at josephg/editing-traces. See that repository for details & licensing information.
All source code provided herein (the code in the `tools/` folder) is licensed under the ISC license:
ISC License
Copyright 2024 Joseph Gentle
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.