pageserver: batch InMemoryLayer `put`s, remove need to sort items by LSN during ingest #8591
Merged
Conversation
jcsp added the labels c/storage/pageserver (Component: storage: pageserver) and a/tech_debt (Area: related to tech debt) on Aug 2, 2024
2198 tests run: 2134 passed, 0 failed, 64 skipped (full report). Flaky tests (2): Postgres 15. Code coverage* (full report). * collected from Rust tests only. This comment gets automatically updated with the latest test results: 67ef0e4 at 2024-08-22T10:14:09.856Z :recycle:
jcsp changed the title from "pageserver: avoid a spurious sort during ingest" to "pageserver: batch InMemoryLayer `put`s, remove need to sort items by LSN during ingest" on Aug 2, 2024
jcsp force-pushed the jcsp/ingest-refactor-pt0 branch from 933cf8f to 30bae03 on August 2, 2024 18:12 (Compare)
jcsp force-pushed the jcsp/ingest-refactor-pt0 branch 3 times, most recently from d559ee2 to cb393b1 on August 6, 2024 14:58 (Compare)
jcsp force-pushed the jcsp/ingest-refactor-pt0 branch from cb393b1 to 1c0264d on August 14, 2024 10:41 (Compare)
VladLazar reviewed on Aug 14, 2024

This should be good to go, but I plan on merging it after Monday's release branch so that it gets a week in staging.
VladLazar reviewed on Aug 16, 2024

Looks good to me - just nits
VladLazar approved these changes on Aug 16, 2024
jcsp added a commit that referenced this pull request on Sep 3, 2024:
…8621)

## Problem

Currently, DatadirModification keeps a key-indexed map of all pending writes, even though we (almost) never need to read back dirty pages for anything other than metadata pages (e.g. relation sizes).

Related: #6345

## Summary of changes

- commit() modifications before ingesting database creation WAL records, so that they are guaranteed to be able to get() everything they need directly from the underlying Timeline.
- Split dirty pages in DatadirModification into pending_metadata_pages and pending_data_pages. The data ones don't need to be in a key-addressable format, so they just go in a Vec instead.
- Special-case handling of zero-page writes in DatadirModification, putting them in a map which is flushed at the end of a WAL record. This handles the case where, during ingest, we might first write a zero page and then ingest a postgres write to that page. We used to do this via the key-indexed map of writes, but in this PR we change the data page write path to not bother indexing these by key.

My least favorite thing about this PR is that I needed to change the DatadirModification interface to add the on_record_end call. This is not very invasive because there's really only one place we use it, but it changes the object's behaviour from being clearly an aggregation of many records to having some per-record state. I could avoid this by implicitly doing the work when someone calls set_lsn or commit -- I'm open to opinions on whether that's cleaner or dirtier.

## Performance

There may be some efficiency improvement here, but the primary motivation is to enable an earlier stage of ingest to operate without access to a Timeline. The `pending_data_pages` part is the "fast path" bulk write data that can in principle be generated without a Timeline, in parallel with other ingest batches, and ultimately on the safekeeper.

`test_bulk_insert` on AX102 shows approximately the same results as in the previous PR #8591:

```
------------------------------ Benchmark results -------------------------------
test_bulk_insert[neon-release-pg16].insert: 23.577 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 637 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 18.264 s
test_bulk_insert[neon-release-pg16].compaction: 0.052 s
```
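The metadata/data split in the commit message above can be sketched roughly as follows. This is an illustrative toy, not the actual pageserver types: the field and method names mirror the commit message, but the key and page types are simplified placeholders. The point is that only metadata pages need to be key-addressable (so they can be read back during ingest), while data pages are write-only until flushed and can live in an append-only Vec.

```rust
use std::collections::HashMap;

// Illustrative sketch of the split: real keys and pages are richer types.
#[derive(Default)]
struct DatadirModification {
    // Key-indexed: may be read back before commit (e.g. relation sizes).
    pending_metadata_pages: HashMap<u32, Vec<u8>>,
    // Not key-indexed: (lsn, key, page) tuples, append-only until flushed.
    pending_data_pages: Vec<(u64, u32, Vec<u8>)>,
}

impl DatadirModification {
    fn put_metadata(&mut self, key: u32, page: Vec<u8>) {
        self.pending_metadata_pages.insert(key, page);
    }

    fn put_data(&mut self, lsn: u64, key: u32, page: Vec<u8>) {
        self.pending_data_pages.push((lsn, key, page));
    }

    // Only metadata needs a read-back path; data pages are never
    // looked up by key on this path.
    fn get_metadata(&self, key: u32) -> Option<&[u8]> {
        self.pending_metadata_pages.get(&key).map(|v| v.as_slice())
    }
}

fn main() {
    let mut m = DatadirModification::default();
    m.put_metadata(1, vec![0u8; 8]); // e.g. a relation-size page
    m.put_data(100, 42, vec![1u8; 8]); // ordinary data page, append-only
    assert!(m.get_metadata(1).is_some());
    assert_eq!(m.pending_data_pages.len(), 1);
    println!(
        "metadata pages: {}, data pages: {}",
        m.pending_metadata_pages.len(),
        m.pending_data_pages.len()
    );
}
```

Because `pending_data_pages` is a flat Vec rather than a map, it can in principle be serialized early and shipped around (ultimately to the safekeeper) without the Timeline-dependent read-back machinery.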
Problem/Solution
`TimelineWriter::put_batch` is currently just a loop over individual `put`s. Each `put` acquires and releases locks and checks whether a new layer needs to be started. Batching these is more efficient, but more importantly it unlocks future changes where we can pre-build serialized buffers much earlier in the ingest process, potentially even on the safekeeper (imagine a future model where some variant of `DatadirModification` lives on the safekeeper).
Ensuring that the values in `put_batch` are written to one layer also enables a simplification upstream: we no longer need to write values in LSN order. This saves us a sort, but it also simplifies follow-on refactors to `DatadirModification`: we can store metadata keys and data keys separately at that level, without needing to zip them together in LSN order later.
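The per-put vs. batched locking difference can be sketched as below. This is a minimal illustration, not the real `TimelineWriter`/`InMemoryLayer` API: the struct, its `(lsn, key, value)` tuple entries, and both method names are stand-ins. The batched variant takes the lock once for the whole batch, and because all entries land under that single lock, they are guaranteed to go into the same layer.

```rust
use std::sync::Mutex;

// Illustrative stand-in for an in-memory layer: entries are
// (lsn, key, serialized value) tuples appended under a lock.
struct InMemoryLayer {
    entries: Mutex<Vec<(u64, u32, Vec<u8>)>>,
}

impl InMemoryLayer {
    fn new() -> Self {
        Self {
            entries: Mutex::new(Vec::new()),
        }
    }

    // One put = one lock acquisition (plus, in the real code,
    // a check for whether a new layer must be started).
    fn put(&self, lsn: u64, key: u32, value: Vec<u8>) {
        self.entries.lock().unwrap().push((lsn, key, value));
    }

    // Batched variant: a single lock acquisition covers the whole
    // batch, and every value is guaranteed to land in this layer.
    fn put_batch(&self, batch: Vec<(u64, u32, Vec<u8>)>) {
        let mut entries = self.entries.lock().unwrap();
        entries.extend(batch);
    }
}

fn main() {
    let layer = InMemoryLayer::new();
    // Note the batch need not be sorted by LSN: since it all goes to
    // one layer atomically, LSN order within the batch is irrelevant.
    layer.put_batch(vec![
        (11, 1, b"c".to_vec()),
        (10, 1, b"a".to_vec()),
        (10, 2, b"b".to_vec()),
    ]);
    layer.put(12, 3, b"d".to_vec());
    assert_eq!(layer.entries.lock().unwrap().len(), 4);
}
```

The same atomicity is what removes the need for the upstream sort: if a batch cannot straddle a layer boundary, the caller no longer has to present values in LSN order.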
Why?
In this PR, these changes are simply optimizations, but they are motivated by evolving the ingest path in the direction of disentangling DatadirModification from Timeline. It may not be obvious how right now, but the general idea is that we'll end up with three phases of ingest:
Related: #8452
Caveats
Writing one big monolithic buffer of values to disk is ordinarily an anti-pattern: we prefer nice streaming I/O. However:
Benchmarks
This PR is primarily a stepping stone to safekeeper ingest filtering, but it also provides a modest efficiency improvement to the `wal_recovery` part of `test_bulk_ingest`.

`test_bulk_ingest`:
Checklist before requesting a review
Checklist before merging