feat: Mmap page manager only flushes dirty pages #166
base: main
Conversation
🟡 Heimdall Review Status
Force-pushed from 889b11a to 4954405
src/page/manager/mmap.rs (outdated)
file_len: AtomicU64,
page_count: AtomicU32,
// A set of reallocated dirty pages which need to be flushed to disk.
dirty_set: Mutex<HashSet<usize>>,
nit: would it be better to provide a capacity, so that the set doesn't have to reallocate during the first few iterations?
Possibly; otherwise this is just part of the natural prewarming of the process. The capacity realistically depends on both the size of the database and the maximum number of writes per transaction, which makes it difficult to estimate generically.
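For illustration, a minimal sketch of what a capacity hint could look like; `DIRTY_SET_CAPACITY` is a hypothetical tuning knob and not part of this PR:

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Hypothetical tuning knob (not in the PR); a real value would be derived
// from the expected number of page writes per transaction.
const DIRTY_SET_CAPACITY: usize = 1024;

struct DirtyTracking {
    // Pre-allocating avoids rehash/regrow cycles during the first few
    // transactions, at the cost of a fixed upfront allocation.
    dirty_set: Mutex<HashSet<usize>>,
}

impl DirtyTracking {
    fn new() -> Self {
        Self {
            dirty_set: Mutex::new(HashSet::with_capacity(DIRTY_SET_CAPACITY)),
        }
    }
}
```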
src/page/manager/mmap.rs (outdated)
pub fn sync(&self) -> io::Result<()> {
    if cfg!(not(miri)) {
        self.mmap.flush()
        let mut dirty_set = self.dirty_set.lock();
I think this will increase writes a lot. Curious about the performance numbers, though.
src/page/manager/mmap.rs (outdated)
for offset in dirty_set.drain() {
    self.mmap.flush_range(offset, Page::SIZE)?;
}
if let Some(range) = self.dirty_range.lock().take() {
Note that, when this is called, the `dirty_set` is still locked, so this second lock is a bit redundant. I would use a single `Mutex` around a new `DirtyPages` struct that holds both the set and the range, but this can be left as an improvement for a future PR.
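For illustration, a rough sketch of that consolidation; the `DirtyPages` name comes from the comment above, everything else here is an assumption:

```rust
use std::collections::HashSet;
use std::ops::Range;
use std::sync::Mutex;

// Both pieces of dirty-page state live behind a single lock, so sync()
// takes one guard instead of two.
#[derive(Default)]
struct DirtyPages {
    set: HashSet<usize>,
    range: Option<Range<usize>>,
}

struct Manager {
    dirty: Mutex<DirtyPages>,
}

impl Manager {
    fn sync(&self) {
        let mut dirty = self.dirty.lock().unwrap();
        for _offset in dirty.set.drain() {
            // self.mmap.flush_range(offset, Page::SIZE)? would go here
        }
        if let Some(_range) = dirty.range.take() {
            // flushing the tracked dirty range would go here
        }
    }
}
```

With this shape, the drain and the range flush happen in one critical section, which is the redundancy the comment points out.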
The pages are added to the dirty set when they are requested, and removed from it when they are dropped. Given how our code is structured, this is a purely theoretical problem.
Force-pushed from 4954405 to 77a89e6
Approved review 3090473464 from nqd is now dismissed due to a new commit. Re-request approval.
This seems to imply that the pages must contain a reference back to their manager in order to perform the cleanup hook. Should we add a pointer back to the manager?
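One hypothetical shape for such a back-reference, assuming `Arc` and a drop-based cleanup hook (none of these names are from the PR):

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

struct Manager {
    dirty_set: Mutex<HashSet<usize>>,
}

// A page handle that keeps an Arc back to its manager, so that dropping
// the handle can run the cleanup hook discussed above (removing the
// page's offset from the dirty set).
struct PageHandle {
    offset: usize,
    manager: Arc<Manager>,
}

impl Drop for PageHandle {
    fn drop(&mut self) {
        self.manager.dirty_set.lock().unwrap().remove(&self.offset);
    }
}
```

The cost is one extra pointer per page handle plus reference-count traffic, which is the trade-off the question above is weighing.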
Improves sync performance by only flushing dirty pages instead of scanning the complete file.

Tracks dirty pages as a set of non-overlapping, non-adjacent runs. Marking a page dirty scales with `O(log R)` for `R` runs (worst case `O(log P)` for `P` dirty pages), but ensures that only `R` syscalls need to be made in order to flush to disk.

This may be further improved by rounding to discrete chunk sizes, such that `R` can be divided by a fixed multiple at the cost of some kernel overhead to filter out non-dirty pages.
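As a sketch of the run-based tracking described above, assuming a `BTreeMap` keyed by run start (the PR's actual data structure may differ):

```rust
use std::collections::BTreeMap;

const PAGE_SIZE: usize = 4096; // stand-in for Page::SIZE

/// Non-overlapping, non-adjacent dirty runs: start offset -> run length.
#[derive(Default)]
struct DirtyRuns {
    runs: BTreeMap<usize, usize>,
}

impl DirtyRuns {
    /// Marks one page-aligned offset dirty in O(log R), merging with
    /// touching runs so the non-overlapping, non-adjacent invariant holds.
    fn mark(&mut self, offset: usize) {
        let mut start = offset;
        let mut len = PAGE_SIZE;

        // Merge with a preceding run that ends at or after this page's start.
        let prev = self.runs.range(..=offset).next_back().map(|(&s, &l)| (s, l));
        if let Some((prev_start, prev_len)) = prev {
            if prev_start + prev_len >= offset + PAGE_SIZE {
                return; // page already covered by an existing run
            }
            if prev_start + prev_len >= offset {
                start = prev_start;
                len = offset + PAGE_SIZE - prev_start;
                self.runs.remove(&prev_start);
            }
        }

        // Merge with a following run that starts exactly at the new end.
        if let Some(next_len) = self.runs.remove(&(start + len)) {
            len += next_len;
        }

        self.runs.insert(start, len);
    }

    /// Drains all runs; flushing then costs one syscall per run.
    fn drain(&mut self) -> impl Iterator<Item = (usize, usize)> {
        std::mem::take(&mut self.runs).into_iter()
    }
}
```

Flushing would then be one `flush_range(start, len)` call per drained run, and rounding `start`/`len` up to larger chunks would implement the `R`-reduction mentioned in the last paragraph.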