
Conversation

BrianBland (Collaborator) commented Aug 6, 2025

Improves sync performance by only flushing dirty pages instead of scanning the complete file.

Tracks dirty pages as a set of non-overlapping, non-adjacent runs. Marking a page dirty scales as O(log R) for R runs (worst case O(log P) for P dirty pages), but ensures that only R syscalls are needed to flush to disk.

This could be improved further by rounding runs to discrete chunk sizes, so that R shrinks by a fixed factor at the cost of some kernel overhead spent flushing pages that are not actually dirty.
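For illustration only (this is not the code in this PR, and the names are invented), a run set like this can be kept in a BTreeMap keyed by run start, merging with the neighboring runs on insert:

```rust
use std::collections::BTreeMap;

/// Runs are stored as `start -> end` (end exclusive), kept non-overlapping
/// and non-adjacent by merging on insert.
#[derive(Default)]
struct DirtyRuns {
    runs: BTreeMap<usize, usize>,
}

impl DirtyRuns {
    /// Mark one page dirty; O(log R) map operations for R runs.
    fn mark(&mut self, page: usize) {
        let mut start = page;
        let end = page + 1;

        // Merge with a preceding run that covers or touches `page`.
        let prev = self.runs.range(..=page).next_back().map(|(&s, &e)| (s, e));
        if let Some((s, e)) = prev {
            if e >= end {
                return; // already covered by an existing run
            }
            if e >= page {
                start = s;
                self.runs.remove(&s);
            }
        }

        // Merge with a following run that starts inside or right after the new run.
        let mut new_end = end;
        let next = self.runs.range(start + 1..=end).next().map(|(&s, &e)| (s, e));
        if let Some((s, e)) = next {
            new_end = new_end.max(e);
            self.runs.remove(&s);
        }

        self.runs.insert(start, new_end);
    }

    /// Drain all runs as (byte_offset, byte_len) pairs, one flush per run.
    fn drain(&mut self, page_size: usize) -> Vec<(usize, usize)> {
        std::mem::take(&mut self.runs)
            .into_iter()
            .map(|(s, e)| (s * page_size, (e - s) * page_size))
            .collect()
    }
}
```

With a structure like this, sync() would loop over drain(Page::SIZE) and issue one flush_range(offset, len) per run.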

cb-heimdall (Collaborator) commented Aug 6, 2025

🟡 Heimdall Review Status

Reviews: 🟡 0/1

BrianBland requested review from and-cb and nqd on August 6, 2025 00:33
BrianBland force-pushed the brianbland/mmap-flush-range branch from 889b11a to 4954405 on August 6, 2025 00:47
file_len: AtomicU64,
page_count: AtomicU32,
// A set of reallocated dirty pages which need to be flushed to disk.
dirty_set: Mutex<HashSet<usize>>,
Collaborator

nit: would it be better to provide a capacity, so that the set doesn't have to allocate during the first few iterations?

Collaborator Author

Possibly; otherwise this is just part of the natural prewarming of the process. The right capacity realistically depends on both the size of the database and the maximum number of writes per transaction, which makes it difficult to estimate generically.
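If a hint were ever added, it could look something like the sketch below. The constant is purely illustrative, and parking_lot::Mutex is assumed from the bare `.lock()` call in the diff:

```rust
use parking_lot::Mutex;
use std::collections::HashSet;

// Hypothetical starting capacity; the right value depends on database size
// and on the maximum number of page writes per transaction.
const DIRTY_SET_CAPACITY_HINT: usize = 1024;

fn new_dirty_set() -> Mutex<HashSet<usize>> {
    // Pre-sizing trades a fixed up-front allocation for fewer rehashes
    // during the first few transactions.
    Mutex::new(HashSet::with_capacity(DIRTY_SET_CAPACITY_HINT))
}
```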

 pub fn sync(&self) -> io::Result<()> {
     if cfg!(not(miri)) {
-        self.mmap.flush()
+        let mut dirty_set = self.dirty_set.lock();
Collaborator

I think this will increase writes a lot. Curious about the performance numbers, though.

nqd
nqd previously approved these changes Aug 6, 2025
for offset in dirty_set.drain() {
self.mmap.flush_range(offset, Page::SIZE)?;
}
if let Some(range) = self.dirty_range.lock().take() {
Collaborator

Note that, when this is called, the dirty_set is still locked, so this second lock is a bit redundant. I would use a single Mutex around a new DirtyPages struct that holds both the set and the range, but this can be left as an improvement for a future PR.
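A rough sketch of that suggestion, purely for illustration (field names are invented; parking_lot::Mutex assumed):

```rust
use parking_lot::Mutex;
use std::collections::HashSet;
use std::ops::Range;

/// Hypothetical combined state: one lock guards both the dirty page set
/// and the overall dirty byte range.
#[derive(Default)]
struct DirtyPages {
    set: HashSet<usize>,       // page offsets to pass to flush_range()
    range: Option<Range<u64>>, // overall dirty file range, if any
}

struct PageManager {
    // ...existing fields...
    dirty: Mutex<DirtyPages>,
}
```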

and-cb (Collaborator) commented Aug 6, 2025

The pages are added to the dirty set when they are requested, and removed from it when sync() is called. This means that if someone obtains a PageMut, calls sync(), modifies the page, and then calls sync() again, the second sync() won't perform any write, contrary to expectations. I think the pages should be added to the dirty set when PageMut is dropped instead.

Given how our code is structured, this is a purely theoretical problem: StorageEngine always operates in a get pages -> write pages -> drop pages -> sync fashion, so what I wrote above can't happen, but I would at least document this behavior.

BrianBland force-pushed the brianbland/mmap-flush-range branch from 4954405 to 77a89e6 on August 22, 2025 20:24
cb-heimdall dismissed nqd’s stale review on August 22, 2025 20:24

Approved review 3090473464 from nqd is now dismissed due to a new commit. Re-request approval.

BrianBland (Collaborator, Author) commented Aug 22, 2025

> I think the pages should be added to the dirty set when PageMut is dropped instead.

This seems to imply that the pages must contain a reference back to their manager in order to perform the cleanup hook. Should we add a pointer back to the PageManager on the PageMut struct?
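One possible shape of the Drop-based approach, purely as a sketch (an Arc back-reference is only one option; a lifetime-bound &PageManager would also work, and all names besides PageMut/PageManager are invented):

```rust
use parking_lot::Mutex;
use std::collections::HashSet;
use std::sync::Arc;

// Illustrative stand-in for the shared page-manager state.
struct PageManagerInner {
    dirty_set: Mutex<HashSet<usize>>,
}

impl PageManagerInner {
    fn mark_dirty(&self, page_index: usize) {
        self.dirty_set.lock().insert(page_index);
    }
}

// A mutable page handle that remembers its manager and marks itself dirty
// on drop, so a later sync() always sees the modification.
struct PageMut {
    page_index: usize,
    manager: Arc<PageManagerInner>,
    // ...pointer/slice into the mapping would live here...
}

impl Drop for PageMut {
    fn drop(&mut self) {
        self.manager.mark_dirty(self.page_index);
    }
}
```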

