
handler: avoid repetitive bitmap writebacks on evict #53

Merged 2 commits into parcio:main on Mar 7, 2024

Conversation


fia0 commented Feb 16, 2024

This commit changes the behavior of the database handler, which manages the state of on-disk allocation bitmaps. The old behavior relied on keeping a single root node in cache to avoid consecutive writebacks of updates between syncs. An evicted root node with the current inter-sync data attached (e.g. bitmaps and size-tracking info from the last root node writeback at the very end of the previous sync, which is written back to the superblock) would be in the "modified" state and therefore require a writeback whenever new entries were added. This does two things: it moves the node to the "in writeback" state, and it triggers a new allocation in the writeback sequence, which moves the node back to the "modified" state and erases the writeback validity, restarting the cycle. This tanked performance on read-heavy queries; preliminary results from testing on my machine show a 2x speedup in some benchmarks and much better scaling behavior after this patch. The main causes of the slowdown seemed to be the repetitive cache eviction calls, which lock down the cache for all reading threads, and the additional load on the storage device, leading to some degradation especially in read-only scenarios.
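
To untangle the cycle, here is a minimal illustrative sketch of the cache states involved; the state names follow the description above and are not the actual implementation's types:

```rust
/// Illustrative cache states; names follow the description above,
/// not the real handler code.
enum CacheState {
    Unmodified,  // clean; can be dropped on eviction
    Modified,    // dirty; must be written back before eviction
    InWriteback, // writeback in flight
}

// The cycle, as described: the evicted root node carrying inter-sync
// data is `Modified`. Adding a new entry forces a writeback
// (`Modified` -> `InWriteback`), but the writeback itself allocates
// space, which updates a bitmap stored in that same root node. That
// marks the node `Modified` again, invalidates the in-flight
// writeback, and the sequence restarts.
```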

This commit changes that behavior: instead of committing all changes directly to the root tree on updates, the handler buffers these changes together with their allocators. The guarantee of disk consistency stays the same, as we only guarantee persistency on successful syncs; all changes are simply dumped into the root tree on the invocation of sync. Furthermore, this requires us to cache the changes in a format that allows further allocations without overwriting the old ones; for this purpose a small cache is added directly in the handler to accommodate the changed bitmaps. This cache is emptied on syncs to accurately free copy-on-write-afflicted regions after sync calls. The cache is unbounded at the moment, but it requires only 32 MiB for 1 TiB of *active* data without any syncs in between, which I would consider unlikely.
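
A minimal sketch of the buffered approach described above; all type and method names (`Handler`, `update_allocation_bitmap`, `SegmentId`, etc.) are hypothetical stand-ins, not the actual handler API:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for the real handler internals.
type SegmentId = u64;
struct Allocator {
    bitmap: Vec<u8>, // in-memory allocation bitmap for one segment
}
struct RootTree;
impl RootTree {
    fn insert_bitmap(&self, _id: SegmentId, _bitmap: &[u8]) {
        // In the real code this inserts a message into the root tree.
    }
}

struct Handler {
    root_tree: RootTree,
    // Unbounded delta cache; per the description above it stays around
    // 32 MiB per 1 TiB of active data between syncs.
    dirty: Mutex<HashMap<SegmentId, Allocator>>,
}

impl Handler {
    // Updates are buffered in the handler instead of being committed to
    // the root tree, so the cached root node is no longer dirtied (and
    // repeatedly written back on evict) between syncs.
    fn update_allocation_bitmap(&self, id: SegmentId, alloc: Allocator) {
        self.dirty.lock().unwrap().insert(id, alloc);
    }

    // On sync, all buffered bitmaps are dumped into the root tree and
    // the cache is emptied so copy-on-write regions can be freed
    // accurately afterwards. Persistency is still only guaranteed after
    // a successful sync, matching the existing consistency contract.
    fn sync(&self) {
        let mut dirty = self.dirty.lock().unwrap();
        for (id, alloc) in dirty.drain() {
            self.root_tree.insert_bitmap(id, &alloc.bitmap);
        }
    }
}
```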

[Attached image: Notes_240217_161640]

Johannes Wünsche added 2 commits February 16, 2024 20:17
handler: avoid repetitive bitmap writebacks on evict
Erasing the old root ptr lock led to reallocation of root node pages, which breaks consistency. This commit simply updates this entry to the new value.

fia0 commented Feb 18, 2024

As an example of a less extreme case than the 2x (which is a YCSB-workload-C-like benchmark), here is a comparison:

This is a random 50/50 read/write case over multiple sizes in an object store before the patch:
[Plot: Before]

This is the same case after the patch:
[Plot: evaluation_rw]

The major change between these two runs is the reduction of outliers at the upper bound of the latency range. These are replaced by a few outliers which, in contrast, lie higher than the previously observed ones. For context, this case performs a sync every second, so some of the observed outliers likely occur because of this: we no longer dissipate messages down the root tree over a longer time but batch them together.

fia0 merged commit f49de67 into parcio:main on Mar 7, 2024
8 checks passed
fia0 mentioned this pull request on Apr 8, 2024