Tracing GC for data segments #8

dgrnbrg · 2016-05-08T03:05:37Z

We need a tracing GC for data segments, so that we can know which blocks are no longer active and can be garbage collected. We'll build this as a storage middleware.

This will ensure the DB doesn't grow to use unbounded backend storage space.

dgrnbrg · 2016-05-08T03:33:47Z

The general algorithm will probably be to enumerate all the keys in the retained trees, and then scan the old roots and delete all keys which are not in any retained tree.

One thing we could try is to store a bloom filter of each tree's root. Then, when we want to GC starting from an old root, we could check each of its nodes against all the newer bloom filters, and only delete the node if it's not in a newer bloom filter. We could quickly keep these filters up-to-date by augmenting the HH tree with them, but they may waste a lot of storage space.

It would be better if we had an algorithm that traded off storage space & # of keys we need to scan from the live trees. Perhaps we could augment the HH tree, but skip some of the levels?

dgrnbrg modified the milestone: Alpha Release May 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracing GC for data segments #8

Tracing GC for data segments #8

dgrnbrg commented May 8, 2016

dgrnbrg commented May 8, 2016

Tracing GC for data segments #8

Tracing GC for data segments #8

Comments

dgrnbrg commented May 8, 2016

dgrnbrg commented May 8, 2016