safekeeper: send AppendResponse on segment flush
#9692
@arssher Just submitting this for an early look to see if you agree with the overall approach. This needs a few tweaks and tests/benchmarks.
5391 tests run: 5170 passed, 1 failed, 220 skipped (full report)
Failures on Postgres 17
Test coverage report is not available
The comment gets automatically updated with the latest test results
4bfdf46 at 2024-11-13T10:49:55.004Z
10223d1 to 1cf5b69
Here are some benchmark results to illustrate the need for #9698 before merging this.
Before this change, With this change, the Without #9698, this increases the number of control file flushes from 2 to 60 (each with 3 fsyncs on the ingest path), reducing throughput by 30%:
With #9698, only 5 control file flushes happen, and more importantly, these happen off of the ingest hot path. Thus throughput remains unchanged:
So #9698 is a necessary prerequisite to this PR.
1cf5b69 to 71f04f9
This should be ready for review now. We don't have any tests for The benchmarks in #9692 (comment) confirm that this results in more frequent commits and no performance regression (assuming #9698 merges first). I'll add a separate benchmark measuring commit latency as part of #9690.
Approach LGTM.
Thanks! I think this should be good for a final review -- anything you think is missing?
/// The last LSN flushed to disk. May be in the middle of a record.
///
/// NB: when the rest of the system refers to `flush_lsn`, it usually
/// actually refers to `flush_record_lsn`. This ambiguity can be dangerous
Let's proceed with renaming flush_lsn to flush_record_lsn everywhere in wal_storage-related code? Plus leave a comment noting that places outside it might continue to refer to flush_record_lsn as flush_lsn, because they only care about whole records.
Sure. I'll submit a follow-up PR, to avoid too much churn in this one.
async fn write_exact(&mut self, pos: Lsn, mut buf: &[u8]) -> Result<()> {
    // TODO: this shouldn't be possible, except possibly with write_lsn == 0.
Let's do this todo :)
Will submit a follow-up PR.
);
}
// We have unflushed data (write_lsn != flush_lsn), but no file. This
// shouldn't happen, since the segment is flushed on close.
This still might happen because write_in_segment temporarily takes the file; if fsync/rename fails, it is not put back. So either the assert should be removed completely (and the file reopened) or write_in_segment shouldn't take the file out.
This is not directly related to this PR and shouldn't be a problem in practice, because on error the reconnection would do truncate_wal, establishing self.flush_record_lsn == self.write_record_lsn, so I'm OK with leaving this out.
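To illustrate the failure mode described in this thread, here is a minimal sketch, not the actual safekeeper code. `FakeFile`, `Storage`, `write_taking`, and `write_borrowing` are hypothetical names, and the `fsync_ok` flag stands in for a real fsync/rename result. It shows how taking the handle out of an `Option` leaves the field empty on an early error return, while borrowing it in place does not.

```rust
/// Stand-in for an open segment file handle (hypothetical; the real code holds a file).
#[derive(Debug)]
struct FakeFile;

/// Hypothetical storage holding an optional open segment file.
struct Storage {
    file: Option<FakeFile>,
}

impl Storage {
    /// Takes the file out of `self.file` for the duration of the write.
    /// On the error path the handle is dropped and `self.file` stays `None`.
    fn write_taking(&mut self, fsync_ok: bool) -> Result<(), String> {
        let file = self.file.take().ok_or_else(|| "no open file".to_string())?;
        if !fsync_ok {
            // Early return: `file` is dropped here and never installed back.
            return Err("fsync failed".to_string());
        }
        self.file = Some(file);
        Ok(())
    }

    /// Borrows the file in place, so an error cannot leave `self.file` empty.
    fn write_borrowing(&mut self, fsync_ok: bool) -> Result<(), String> {
        let _file = self.file.as_mut().ok_or_else(|| "no open file".to_string())?;
        if !fsync_ok {
            return Err("fsync failed".to_string());
        }
        Ok(())
    }
}
```

With the taking variant, a failed write leaves unflushed data and no file, which is the state the assert above assumes cannot occur.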
Problem
When processing pipelined `AppendRequest`s, we explicitly flush the WAL every second and return an `AppendResponse`. However, the WAL is also implicitly flushed on segment bounds, but this does not result in an `AppendResponse`. Because of this, concurrent transactions may take up to 1 second to commit and writes may take up to 1 second before sending to the pageserver.

Separately, we should consider flushing the WAL on transaction commits -- see #9690.
Resolves #9688.
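As a rough illustration of the delay described above, the following sketch models when a write is acknowledged. It is a toy timing model, not the safekeeper's ingest loop: the function names are hypothetical, and it assumes periodic flushes fire at exact multiples of the interval.

```rust
/// Time (ms) of the first periodic flush at or after `write_at_ms`.
/// Assumption: flushes tick at interval, 2 * interval, and so on.
fn next_periodic_flush(write_at_ms: u64, interval_ms: u64) -> u64 {
    assert!(interval_ms > 0);
    let ticks = (write_at_ms + interval_ms - 1) / interval_ms;
    ticks.max(1) * interval_ms
}

/// Time (ms) at which the write is acknowledged with an AppendResponse.
/// `segment_flush_at_ms` is when an implicit segment flush happens, if any;
/// `respond_on_segment_flush` toggles the behaviour proposed in this PR.
fn ack_at(
    write_at_ms: u64,
    interval_ms: u64,
    segment_flush_at_ms: Option<u64>,
    respond_on_segment_flush: bool,
) -> u64 {
    let periodic = next_periodic_flush(write_at_ms, interval_ms);
    match segment_flush_at_ms {
        // A segment flush after the write acknowledges it early.
        Some(t) if respond_on_segment_flush && t >= write_at_ms => periodic.min(t),
        // Otherwise the write waits for the next periodic flush.
        _ => periodic,
    }
}
```

In this model, a write at 10 ms followed by a segment flush at 50 ms is acknowledged at 1000 ms when segment flushes send no response, and at 50 ms when they do.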
Summary of changes
Advance `flush_lsn` when a WAL segment is closed and flushed, and emit an `AppendResponse`. To accommodate this, track the `flush_lsn` in addition to the `flush_record_lsn`.

Note that this will result in more frequent commits during pipelined WAL ingestion, resulting in a control file flush (3 fsyncs) on every segment bound. We should address #9663 first, e.g. by taking control file flushes off of the ingest hot path.
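The bookkeeping described above can be sketched roughly as follows. This is a simplified, synchronous model under stated assumptions, not the PR's implementation: `WalModel` and `append_record` are hypothetical names, each append is one whole record, and closing a segment is assumed to flush everything up to the segment boundary.

```rust
type Lsn = u64;

/// Simplified response carrying the record-aligned flush position.
#[derive(Debug, PartialEq)]
struct AppendResponse {
    flush_lsn: Lsn,
}

/// Hypothetical model of the LSN tracking; field names follow the PR text.
struct WalModel {
    seg_size: u64,
    /// End of the last complete record written.
    write_record_lsn: Lsn,
    /// Last LSN flushed to disk; may be in the middle of a record.
    flush_lsn: Lsn,
    /// End of the last complete record flushed to disk.
    flush_record_lsn: Lsn,
}

impl WalModel {
    fn new(seg_size: u64) -> Self {
        assert!(seg_size > 0);
        WalModel { seg_size, write_record_lsn: 0, flush_lsn: 0, flush_record_lsn: 0 }
    }

    /// Appends one record of `len` bytes. If the write closes a segment,
    /// advances `flush_lsn` to the boundary and returns a response when
    /// `flush_record_lsn` moved forward.
    fn append_record(&mut self, len: u64) -> Option<AppendResponse> {
        let start = self.write_record_lsn;
        let end = start + len;
        self.write_record_lsn = end;

        // Last segment boundary at or before the end of this record.
        let boundary = end - end % self.seg_size;
        if boundary <= self.flush_lsn {
            return None; // no segment was closed by this write
        }
        self.flush_lsn = boundary;

        // Only whole records count: this record is flushed only if it ends
        // exactly on the boundary; otherwise the previous record end applies.
        let flushed_record = if end == boundary { end } else { start };
        if flushed_record > self.flush_record_lsn {
            self.flush_record_lsn = flushed_record;
            Some(AppendResponse { flush_lsn: flushed_record })
        } else {
            None
        }
    }
}
```

With a 16-byte segment, two 10-byte records leave `flush_lsn` at 16 (mid-record) while the response reports 10, which is why the two positions are tracked separately.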