handler: avoid repetitive bitmap writebacks on evict #53
This commit changes the behavior of the database handler, which manages the state of the on-disk allocation bitmaps. The old behavior relied on keeping a single root node in cache to avoid repeated writebacks of updates in between syncs. An evicted root node with the current inter-sync data attached (e.g. bitmaps and size tracking info from the last root node writeback at the very end of the last sync, which is written back to the superblock) would be in the "modified" state and would therefore require a writeback whenever new entries were added. That writeback does two things: it moves the node to the "in writeback" state, and it triggers a new allocation in the writeback sequence. The allocation moves the node back to the "modified" state and invalidates the writeback, which restarts the cycle. This tanked performance on read-heavy queries. Preliminary results from testing on my machine show a 2x speedup in some benchmarks and much better scaling behavior after this patch. The main causes of the slowdown seemed to be the repeated cache eviction calls, which lock the cache for all reading threads, and the additional load on the storage device, which leads to degradation especially in read-only scenarios.
Instead of committing all changes directly to the root tree on every update, the handler now buffers these changes together with their allocators. The guarantee of disk consistency is unchanged, as we only guarantee persistence on a successful sync; all buffered changes are simply written to the root tree when sync is invoked. This requires caching the changes in a format that allows further allocations without overwriting the old ones, so a small cache is added directly in the handler to accommodate the changed bitmaps. The cache is emptied on each sync so that copy-on-write-affected regions are freed accurately after the sync. The cache is unbounded at the moment, but it requires only 32 MiB for 1 TiB of active data without any sync in between, which I would consider unlikely.
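To illustrate the buffering idea, here is a minimal sketch in Rust. It is not the code from this patch: `Handler`, `RootTree`, `SegmentId`, `BLOCKS_PER_SEGMENT`, and `bitmap_bytes` are hypothetical names, the root tree is reduced to a map with a write counter, and freeing of copy-on-write regions is not modeled. The 4 KiB block size used for the size estimate is an assumption that happens to reproduce the 32 MiB per 1 TiB figure with one bit per block.

```rust
use std::collections::HashMap;

/// Hypothetical identifier of an allocation segment (not the project's real type).
type SegmentId = u64;

/// Number of blocks tracked by one bitmap in this sketch (assumed value).
const BLOCKS_PER_SEGMENT: usize = 1024;

/// Stand-in for the on-disk root tree; it is only written during `sync`.
#[derive(Default)]
struct RootTree {
    bitmaps: HashMap<SegmentId, Vec<u8>>,
    /// Counts bitmap writebacks so the effect of buffering is observable.
    writes: usize,
}

/// Handler that buffers changed bitmaps instead of writing each update back.
#[derive(Default)]
struct Handler {
    root: RootTree,
    /// Bitmaps changed since the last sync, keyed by segment.
    dirty: HashMap<SegmentId, Vec<u8>>,
}

impl Handler {
    /// Mark `block` in `segment` as allocated, touching only the in-memory cache.
    fn allocate(&mut self, segment: SegmentId, block: usize) {
        assert!(block < BLOCKS_PER_SEGMENT);
        let root = &self.root;
        // On first touch, start from the last synced bitmap (or an empty one).
        let bitmap = self.dirty.entry(segment).or_insert_with(|| {
            root.bitmaps
                .get(&segment)
                .cloned()
                .unwrap_or_else(|| vec![0u8; BLOCKS_PER_SEGMENT / 8])
        });
        bitmap[block / 8] |= 1u8 << (block % 8);
    }

    /// Look up a block, preferring the buffered bitmap over the synced one.
    fn is_allocated(&self, segment: SegmentId, block: usize) -> bool {
        self.dirty
            .get(&segment)
            .or_else(|| self.root.bitmaps.get(&segment))
            .map_or(false, |b| (b[block / 8] & (1u8 << (block % 8))) != 0)
    }

    /// Write all buffered bitmaps to the root tree and empty the cache.
    /// Returns the number of bitmaps that were flushed.
    fn sync(&mut self) -> usize {
        let flushed = self.dirty.len();
        for (segment, bitmap) in self.dirty.drain() {
            self.root.bitmaps.insert(segment, bitmap);
            self.root.writes += 1;
        }
        flushed
    }
}

/// Bytes needed to track `data_bytes` of data with one bit per block.
const fn bitmap_bytes(data_bytes: u64, block_size: u64) -> u64 {
    data_bytes / block_size / 8
}
```

In this sketch, any number of `allocate` calls between two syncs results in at most one root tree write per touched segment, and `bitmap_bytes(1 << 40, 4096)` evaluates to 32 MiB under the assumed block size.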