
handler: avoid repetitive bitmap writebacks on evict #53

Merged 2 commits into parcio:main on Mar 7, 2024

Conversation


fia0 commented Feb 16, 2024

This commit changes the behavior of the database handler, which manages the state of on-disk allocation bitmaps. The old behavior relied on keeping a single root node in cache to avoid consecutive writebacks of updates between syncs. An evicted root node with the current inter-sync data attached (e.g. bitmaps and size-tracking info from the last root node writeback at the very end of the previous sync, which is written back to the superblock) would be in the "modified" state and therefore require a writeback whenever new entries were added. This does two things: it moves the node to the "in writeback" state, and it triggers a new allocation in the writeback sequence, which moves the node back to the "modified" state and erases the writeback validity, restarting the cycle. This tanked performance on read-heavy queries; preliminary results from testing on my machine show a 2x speedup in some benchmarks and much better scaling behavior after this patch. The main causes of the slowdown seemed to be the repetitive cache eviction calls, which lock down the cache for all reading threads, and the additional load on the storage device, leading to some degradation especially in read-only scenarios.
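
To untangle the cycle, here is a minimal illustrative sketch of the cache states involved; the state names follow the description above and are not the actual implementation's types:

```rust
/// Illustrative cache states; names follow the description above,
/// not the real handler code.
enum CacheState {
    Unmodified,  // clean; can be dropped on eviction
    Modified,    // dirty; must be written back before eviction
    InWriteback, // writeback in flight
}

// The cycle, as described: the evicted root node carrying inter-sync
// data is `Modified`. Adding a new entry forces a writeback
// (`Modified` -> `InWriteback`), but the writeback itself allocates
// space, which updates a bitmap stored in that same root node. That
// marks the node `Modified` again, invalidates the in-flight
// writeback, and the sequence restarts.
```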

This commit changes that behavior: instead of committing all changes directly to the root tree on updates, the handler buffers these changes together with their allocators. The guarantee of disk consistency stays the same, as we only guarantee persistency on successful syncs; all changes are simply dumped into the root tree on the invocation of sync. Furthermore, this requires us to cache the changes in a format that allows further allocations without overwriting the old ones; for this purpose a small cache is added directly in the handler to accommodate the changed bitmaps. This cache is emptied on syncs to accurately free copy-on-write-afflicted regions after sync calls. The cache is unbounded at the moment, but it requires only 32 MiB for 1 TiB of *active* data without any syncs in between, which I would consider unlikely.
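
A minimal sketch of the buffered approach described above; all type and method names (`Handler`, `update_allocation_bitmap`, `SegmentId`, etc.) are hypothetical stand-ins, not the actual handler API:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for the real handler internals.
type SegmentId = u64;
struct Allocator {
    bitmap: Vec<u8>, // in-memory allocation bitmap for one segment
}
struct RootTree;
impl RootTree {
    fn insert_bitmap(&self, _id: SegmentId, _bitmap: &[u8]) {
        // In the real code this inserts a message into the root tree.
    }
}

struct Handler {
    root_tree: RootTree,
    // Unbounded delta cache; per the description above it stays around
    // 32 MiB per 1 TiB of active data between syncs.
    dirty: Mutex<HashMap<SegmentId, Allocator>>,
}

impl Handler {
    // Updates are buffered in the handler instead of being committed to
    // the root tree, so the cached root node is no longer dirtied (and
    // repeatedly written back on evict) between syncs.
    fn update_allocation_bitmap(&self, id: SegmentId, alloc: Allocator) {
        self.dirty.lock().unwrap().insert(id, alloc);
    }

    // On sync, all buffered bitmaps are dumped into the root tree and
    // the cache is emptied so copy-on-write regions can be freed
    // accurately afterwards. Persistency is still only guaranteed after
    // a successful sync, matching the existing consistency contract.
    fn sync(&self) {
        let mut dirty = self.dirty.lock().unwrap();
        for (id, alloc) in dirty.drain() {
            self.root_tree.insert_bitmap(id, &alloc.bitmap);
        }
    }
}
```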

[Attached image: Notes_240217_161640]

Johannes Wünsche added 2 commits February 16, 2024 20:17
handler: avoid repetitive bitmap writebacks on evict
Erasing the old root ptr lock led to reallocation of root node pages, which breaks consistency. This commit simply updates this entry to the new value.

fia0 commented Feb 18, 2024

As an example of a less extreme case than the 2x (which is a YCSB-workload-C-like benchmark), here is a comparison:

This is a random 50/50 read/write case over multiple sizes in an object store before the patch:
[Plot: Before]

This is the same case after the patch:
[Plot: evaluation_rw]

The major change between these two runs is the reduction of outliers at the upper bound of the latency range. These are replaced by a few outliers which, in contrast, lie higher than the previously observed ones. For context, this case performs a sync every second, so some of the observed outliers likely occur because of this: we no longer dissipate messages down the root tree over a longer time but batch them together.

fia0 merged commit f49de67 into parcio:main on Mar 7, 2024
8 checks passed
fia0 mentioned this pull request on Apr 8, 2024