Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing GC for data segments #8

Open
dgrnbrg opened this issue May 8, 2016 · 1 comment
Open

Tracing GC for data segments #8

dgrnbrg opened this issue May 8, 2016 · 1 comment
Milestone

Comments

@dgrnbrg
Copy link
Collaborator

dgrnbrg commented May 8, 2016

We need a tracing GC for data segments, so that we can know which blocks are no longer active and can be garbage collected. We'll build this as a storage middleware.

This will ensure the DB doesn't grow to use unbounded backend storage space.

@dgrnbrg dgrnbrg modified the milestone: Alpha Release May 8, 2016
@dgrnbrg
Copy link
Collaborator Author

dgrnbrg commented May 8, 2016

The general algorithm will probably be to enumerate all the keys in the retained trees, and then scan the old roots and delete all keys which are not in any retained tree.

One thing we could try is to store a bloom filter of each tree's root. Then, when we want to GC starting from an old root, we could check each of its nodes against all the newer bloom filters, and only delete the node if it's not in a newer bloom filter. We could quickly keep these filters up-to-date by augmenting the HH tree with them, but they may waste a lot of storage space.

It would be better if we had an algorithm that traded off storage space & # of keys we need to scan from the live trees. Perhaps we could augment the HH tree, but skip some of the levels?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant