diff --git a/dev-docs/fptree-agg-from-root.png b/dev-docs/fptree-agg-from-root.png
new file mode 100644
index 0000000000..024d0b8418
Binary files /dev/null and b/dev-docs/fptree-agg-from-root.png differ
diff --git a/dev-docs/fptree-agg-lca.png b/dev-docs/fptree-agg-lca.png
new file mode 100644
index 0000000000..e88fed677e
Binary files /dev/null and b/dev-docs/fptree-agg-lca.png differ
diff --git a/dev-docs/fptree-agg-limit-wraparound.png b/dev-docs/fptree-agg-limit-wraparound.png
new file mode 100644
index 0000000000..73fa84762a
Binary files /dev/null and b/dev-docs/fptree-agg-limit-wraparound.png differ
diff --git a/dev-docs/fptree-agg-limit.png b/dev-docs/fptree-agg-limit.png
new file mode 100644
index 0000000000..9700d69fe0
Binary files /dev/null and b/dev-docs/fptree-agg-limit.png differ
diff --git a/dev-docs/fptree-agg-wraparound.png b/dev-docs/fptree-agg-wraparound.png
new file mode 100644
index 0000000000..cc536d3f06
Binary files /dev/null and b/dev-docs/fptree-agg-wraparound.png differ
diff --git a/dev-docs/fptree-with-values.png b/dev-docs/fptree-with-values.png
new file mode 100644
index 0000000000..40ee6cb72d
Binary files /dev/null and b/dev-docs/fptree-with-values.png differ
diff --git a/dev-docs/fptree.excalidraw.gz b/dev-docs/fptree.excalidraw.gz
new file mode 100644
index 0000000000..1d018b5d87
Binary files /dev/null and b/dev-docs/fptree.excalidraw.gz differ
diff --git a/dev-docs/fptree.png b/dev-docs/fptree.png
new file mode 100644
index 0000000000..93ef265ddf
Binary files /dev/null and b/dev-docs/fptree.png differ
diff --git a/dev-docs/sync2-set-reconciliation.md b/dev-docs/sync2-set-reconciliation.md
index 0ffdbe2cfe..b66248116f 100644
--- a/dev-docs/sync2-set-reconciliation.md
+++ b/dev-docs/sync2-set-reconciliation.md
@@ -27,6 +27,14 @@
   - [Redundant ItemChunk messages](#redundant-itemchunk-messages)
   - [Range checksums](#range-checksums)
   - [Bloom filters for recent sync](#bloom-filters-for-recent-sync)
+- [FPTree Data Structure](#fptree-data-structure)
+  - [Tree structure](#tree-structure)
+  - [Aggregation](#aggregation)
+    - [Aggregation of normal ranges](#aggregation-of-normal-ranges)
+    - [Aggregation of wraparound ranges](#aggregation-of-wraparound-ranges)
+  - [Splitting ranges and limited aggregation](#splitting-ranges-and-limited-aggregation)
+  - [Tree node representation](#tree-node-representation)
+  - [Accessing the database](#accessing-the-database)
 - [Multi-peer Reconciliation](#multi-peer-reconciliation)
   - [Deciding on the sync strategy](#deciding-on-the-sync-strategy)
   - [Split sync](#split-sync)
@@ -775,11 +783,291 @@ just want to bring them closer to each other. That being said, a
 sufficient size of the Bloom filter needs to be chosen to minimize
 the number of missed elements.
 
+# FPTree Data Structure
+
+FPTree (fingerprint tree) is a data structure intended to facilitate
+synchronization of objects stored in an SQLite database, with
+hash-based IDs. It stores fingerprints (IDs XORed together) and item
+counts for ID ranges.
+
+## Tree structure
+
+FPTree has the following properties:
+
+1. FPTree is an in-memory structure that provides efficient item counts
+   and fingerprints for ID (item/key) ranges, doing its best to avoid
+   database queries. The queries may be avoided entirely if the ranges
+   are aligned on node boundaries.
+1. FPTree is a binary trie (prefix tree), following the bits in the
+   IDs starting from the highest one. The intent is to convert it to a
+   proper radix tree instead, but that's not implemented yet.
+1. FPTree relies on IDs being hashes, and thus uniformly distributed,
+   to keep the tree balanced, instead of using a balancing mechanism
+   such as the one in a red-black tree.
+1. FPTree provides a range split mechanism (needed for pairwise sync)
+   which tries to ensure that the ranges are aligned on node
+   boundaries up to a certain subdivision depth.
+1. A full FPTree copy operation is `O(1)` in terms of both time and
+   memory. The copies are safe for concurrent use.
+1. FPTree can also store the actual IDs without the use of an
+   underlying table.
+1. FPTrees can be "stacked" together. The FPTree-based `OrderedSet`
+   implementation uses 2 FPTrees, one database-bound and another one
+   fully in-memory. The in-memory FPTree is used to store fresh items
+   received via the [Recent sync](#recent-sync) mechanism.
+1. FPTree performs queries on ranges `[x,y)`, supporting normal
+   `x < y` ranges, as well as wraparound `x > y` ranges and the full
+   set range `[x,x)` (see [Range representation](#range-representation)).
+1. Each FPTree node has a corresponding bit prefix by which it can be
+   reached.
+
+The tree structure is shown in the diagram below. The leaf nodes
+correspond to the rows in the database table with IDs having the bit
+prefix corresponding to the leaf node.
+
+![FPTree structure](fptree.png)
+
+As mentioned above, FPTree itself can also store the actual IDs,
+without using an underlying database table.
+
+![FPTree with values](fptree-with-values.png)
+
+## Aggregation
+
+Aggregation means calculating the fingerprint and item count for a
+range. The aggregation is done using different methods depending on
+whether the `[x,y)` range is normal (`x<y`), wraparound (`x>y`) or
+indicates the whole set (`x=y`). Aggregation may also be bounded by
+the maximum number of items to include. The easiest case is full set
+aggregation, in which we just take the fingerprint and count values
+from the root node of the FPTree.
+
+### Aggregation of normal ranges
+
+In case of a normal range `[x,y)` with `xy`, `aggregateLeft` and
+`aggregateRight` are used, too. Somewhat unintuitively, in this case
+`aggregateLeft` is used on the right side of the tree, because that is
+where the beginning ("left side") of the wrapped-around `[x,y)` range
+lies, whereas `aggregateRight` is applied to the left side of the tree,
+corresponding to the end ("right side") of the range.
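The recursive walks described above can be illustrated with a small Python sketch. It is not the FPTree implementation: it assumes hypothetical 8-bit IDs, a plain binary trie whose nodes keep an item count and an XOR fingerprint, and hypothetical helper names (`aggregate_left`, `aggregate_right`, `aggregate`) that only mirror the roles of `aggregateLeft` and `aggregateRight`. For wraparound ranges it starts `aggregate_left` at the node reached by the leading `1` bits of `x` and `aggregate_right` at the node reached by the leading `0` bits of `y`. The normal-range branch, which splits at the lowest common ancestor of `x` and `y`, is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BITS = 8  # hypothetical ID width for illustration; real IDs are long hashes

@dataclass
class Node:
    count: int = 0
    fp: int = 0                     # XOR of all IDs below this node
    left: Optional["Node"] = None   # child for bit 0
    right: Optional["Node"] = None  # child for bit 1

Agg = Tuple[int, int]  # (count, fingerprint)

def bit(v: int, depth: int) -> int:
    """Bit of v at the given depth, counting from the highest bit."""
    return (v >> (BITS - 1 - depth)) & 1

def combine(a: Agg, b: Agg) -> Agg:
    return a[0] + b[0], a[1] ^ b[1]

def full(n: Optional[Node]) -> Agg:
    return (n.count, n.fp) if n else (0, 0)

def insert(root: Node, id_: int) -> None:
    """Add a (distinct) ID, updating count and fingerprint along its path."""
    n = root
    n.count += 1
    n.fp ^= id_
    for d in range(BITS):
        if bit(id_, d):
            n.right = n.right or Node()
            n = n.right
        else:
            n.left = n.left or Node()
            n = n.left
        n.count += 1
        n.fp ^= id_

def aggregate_left(n: Optional[Node], x: int, depth: int) -> Agg:
    """IDs >= x in subtree n, whose prefix equals the first `depth` bits of x."""
    if n is None:
        return (0, 0)
    if depth == BITS:
        return full(n)  # this leaf holds x itself, which is included
    if bit(x, depth) == 0:
        # everything in the right subtree is greater than x
        return combine(aggregate_left(n.left, x, depth + 1), full(n.right))
    return aggregate_left(n.right, x, depth + 1)

def aggregate_right(n: Optional[Node], y: int, depth: int) -> Agg:
    """IDs < y in subtree n, whose prefix equals the first `depth` bits of y."""
    if n is None or depth == BITS:
        return (0, 0)  # a leaf here holds y itself, which is excluded
    if bit(y, depth) == 1:
        # everything in the left subtree is less than y
        return combine(full(n.left), aggregate_right(n.right, y, depth + 1))
    return aggregate_right(n.left, y, depth + 1)

def descend(root: Node, v: int, depth: int) -> Optional[Node]:
    """Follow the first `depth` bits of v from the root."""
    n: Optional[Node] = root
    for d in range(depth):
        if n is None:
            return None
        n = n.right if bit(v, d) else n.left
    return n

def leading(v: int, b: int) -> int:
    """Number of leading bits of v equal to b."""
    k = 0
    while k < BITS and bit(v, k) == b:
        k += 1
    return k

def aggregate(root: Node, x: int, y: int) -> Agg:
    if x == y:  # full set: just read the root
        return full(root)
    if x > y:  # wraparound: [x, max] plus [0, y)
        kx, ky = leading(x, 1), leading(y, 0)
        return combine(aggregate_left(descend(root, x, kx), x, kx),
                       aggregate_right(descend(root, y, ky), y, ky))
    # normal range (assumption): split at the lowest common ancestor
    d = 0
    while bit(x, d) == bit(y, d):
        d += 1
    lca = descend(root, x, d)
    if lca is None:
        return (0, 0)
    return combine(aggregate_left(lca.left, x, d + 1),
                   aggregate_right(lca.right, y, d + 1))
```

For example, after inserting IDs with `insert(root, id_)`, `aggregate(root, 0xD1, 0x29)` returns the count and XOR of all IDs that are `>= 0xD1` or `< 0x29`, touching only the nodes along the paths of `x` and `y`.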
+
+The subtree on which `aggregateLeft` is done is rooted at the node
+reachable by following the longest prefix of `x` consisting entirely
+of `1`s. Conversely, the subtree on which `aggregateRight` is done is
+rooted at the node reachable by following the longest prefix of `y`
+consisting entirely of `0`s.
+
+The figure below shows aggregation of the `[x,y)` range with
+`x=0xD1..` and `y=0x29`.
+
+![Aggregation of a wrapped-around range](fptree-agg-wraparound.png)
+
+## Splitting ranges and limited aggregation
+
+During recursive set reconciliation, a range split operation often
+needs to be performed. This involves partitioning the range roughly in
+half with respect to the number of items in each new subrange, and
+calculating the item count and fingerprint for each part resulting
+from the split. FPTree will try to perform such an operation on a node
+boundary, but if the range is too small or not aligned to a node
+boundary, the following is done:
+
+1. The number of items in the range (`N`) is obtained.
+2. The items in the range are aggregated with the cap on maximum
+   aggregated count equal to `N/2`, and the non-inclusive upper bound
+   of the aggregated subrange is noted (`m`). The aggregated items
+   can be said to lie in the range `[x,m)`.
+3. The second half of the range is aggregated starting with `m`. This
+   part of the range is `[m,y)`.
+
+In both cases, the operation is based on imposing a limit on the
+number of items aggregated. In the easy, node-aligned case, the
+aggregation continues after exhausting the limit on the total item
+count, but the remaining nodes' fingerprints and counts are
+accumulated separately. The first accumulated fingerprint and count
+are returned for the first resulting subrange, and the second
+accumulated fingerprint and count are returned for the second subrange
+resulting from the partition. If the node-aligned "easy split"
+cannot be done, aggregation stops after exhausting the limit.
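The three fallback steps for a split that is not node-aligned can be sketched in Python. For brevity, this sketch works on a plain sorted list of integer IDs instead of the tree, so it models only the "aggregation stops after exhausting the limit" path, not the node-aligned easy split. The names `items_in_range`, `aggregate_limited` and `split_range` are hypothetical and not part of the FPTree API. The sketch follows the `[x,y)` conventions above: normal, wraparound and full set ranges.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

def items_in_range(ids: List[int], x: int, y: int) -> List[int]:
    """IDs in [x,y) in range order; `ids` must be sorted.

    x < y is a normal range, x > y wraps around past the maximum ID,
    and x == y denotes the full set, starting at x.
    """
    i = bisect_left(ids, x)
    j = bisect_left(ids, y)
    if x < y:
        return ids[i:j]
    # wraparound / full set: [x, max] followed by [min, y)
    return ids[i:] + ids[:j]

def aggregate_limited(ids: List[int], x: int, y: int,
                      limit: Optional[int] = None) -> Tuple[int, int, int]:
    """Aggregate at most `limit` items of [x,y).

    Returns (count, fingerprint, m), where m is the non-inclusive upper
    bound of the aggregated part: y if everything fit, otherwise the
    first item that did not fit.
    """
    items = items_in_range(ids, x, y)
    if limit is not None and limit < len(items):
        taken, m = items[:limit], items[limit]
    else:
        taken, m = items, y
    fp = 0
    for v in taken:
        fp ^= v  # fingerprint = XOR of the IDs
    return len(taken), fp, m

def split_range(ids: List[int], x: int, y: int):
    """Split [x,y) into [x,m) and [m,y) holding roughly half the items each."""
    n, _, _ = aggregate_limited(ids, x, y)             # step 1: N
    if n < 2:
        raise ValueError("need at least two items to split a range")
    c1, fp1, m = aggregate_limited(ids, x, y, n // 2)  # step 2: [x,m)
    c2, fp2, _ = aggregate_limited(ids, m, y)          # step 3: [m,y)
    return (x, m, c1, fp1), (m, y, c2, fp2)
```

Requiring at least two items keeps `m` distinct from both `x` and `y`, so neither half degenerates into `[m,m)`, which would otherwise be read as the full set.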
+
+When limited aggregation is done, full subtrees are not always taken
+as a whole. Normally, `aggregateLeft` includes full right subtrees,
+`aggregateRight` includes full left subtrees, and `[x,x)` (full set)
+range aggregation includes the whole tree. If a subtree's count
+exceeds the limit remaining after processing all the nodes visited so
+far, that subtree is descended into instead to find the cutoff point.
+
+Below limited aggregation is shown for a normal `x= ? AND
+  "rowid" <= ? ORDER BY "id" LIMIT ?
+```
+
+Select the number of recently received items for recent sync
+(which is not done using FPTree):
+```sql
+SELECT count("id") FROM "atxs" WHERE "epoch" = ? AND
+  "rowid" <= ? AND "received" >= ?
+```
+
+Select recently received IDs:
+```sql
+SELECT "id" FROM "atxs" WHERE "epoch" = ? AND "id" >= ? AND
+  "rowid" <= ? AND "received" >= ? ORDER BY "id" LIMIT ?
+```
+
 # Multi-peer Reconciliation
 
-The multi-peer reconciliation approach is loosely based on
-[SREP: Out-Of-Band Sync of Transaction Pools for Large-Scale Blockchains](https://people.bu.edu/staro/2023-ICBC-Novak.pdf)
-paper by Novak Boškov, Sevval Simsek, Ari Trachtenberg, and David Starobinski.
+The multi-peer reconciliation approach is loosely based on [SREP:
+Out-Of-Band Sync of Transaction Pools for Large-Scale
+Blockchains](https://people.bu.edu/staro/2023-ICBC-Novak.pdf) paper by
+Novak Boškov, Sevval Simsek, Ari Trachtenberg, and David Starobinski.
 
 ![Multi-peer set reconciliation](multipeer.png)