Skip to content

Make WAL keys TLI aware #491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: TDE_REL_17_STABLE
Choose a base branch
from

Conversation

dAdAbird
Copy link
Member

WIP, needs tests

Before this commit, WAL keys didn't mind TLI at all. But after pg_rewind, for example, pg_wal/ may contain segments from two timelines. And the wal reader choosing the key may pick the wrong one because LSNs of different TLIs may overlap. There was also another bug: There is a key with the start LSN 0/30000 in TLI 1. And after the start in TLI 2, the wal writer creates a new key with the SN 0/30000, but in TLI 2. But the reader wouldn't fetch the latest key because w/o TLI, these are the same.

This commit adds TLI to the Internal keys and makes use of it along with LSN for key compares.

@AndersAstrand
Copy link
Collaborator

AndersAstrand commented Jul 29, 2025

This will break existing clusters as anything added to InternalKey will also be added to TDEMapEntry which is read directly from the existing file for relation keys.

This needs to be dealt with somehow now that we're past 1.0.

My suggestion is to completely separate key storage for wal keys and relation keys so that relation key file format can stay as it is.

That would also let us get rid of that awful fake rlocator we use to find the wal key file.

@dAdAbird
Copy link
Member Author

Yeah, that totally makes sense. Will do that with additional commit

Before this commit, WAL keys didn't mind TLI at all. But after
pg_rewind, for example, pg_wal/ may contain segments from two
timelines. And the wal reader choosing the key may pick the wrong one
because LSNs of different TLIs may overlap. There was also another bug:
There is a key with the start LSN 0/30000 in TLI 1. And after the start
in TLI 2, the wal writer creates a new key with the SN 0/30000, but in
TLI 2. But the reader wouldn't fetch the latest key because w/o TLI,
these are the same.

This commit adds TLI to the Internal keys and makes use of it along
with LSN for key compares.
@codecov-commenter
Copy link

codecov-commenter commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 94.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.94%. Comparing base (a329aeb) to head (f25f29d).
⚠️ Report is 19 commits behind head on TDE_REL_17_STABLE.

❌ Your project status has failed because the head coverage (81.94%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           TDE_REL_17_STABLE     #491      +/-   ##
=====================================================
- Coverage              83.68%   81.94%   -1.75%     
=====================================================
  Files                     21       24       +3     
  Lines                   2771     3018     +247     
  Branches                 435      490      +55     
=====================================================
+ Hits                    2319     2473     +154     
- Misses                   368      444      +76     
- Partials                  84      101      +17     
Components Coverage Δ
access 81.81% <78.57%> (-0.46%) ⬇️
catalog 87.86% <ø> (+0.01%) ⬆️
common 77.77% <ø> (ø)
encryption 73.68% <100.00%> (+0.23%) ⬆️
keyring 73.21% <ø> (ø)
src 94.15% <ø> (+6.59%) ⬆️
smgr 94.88% <ø> (+0.02%) ⬆️
transam ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dAdAbird dAdAbird force-pushed the keys_tli branch 2 times, most recently from 17f892c to d9c85aa Compare July 30, 2025 16:21
@dAdAbird dAdAbird marked this pull request as ready for review August 1, 2025 15:05
Copy link
Collaborator

@jeltz jeltz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we bump the file format version?

@@ -369,22 +370,23 @@ pg_tde_delete_principal_key(Oid dbOid)
* needs keyfile_path
*/
void
pg_tde_wal_last_key_set_lsn(XLogRecPtr lsn, const char *keyfile_path)
pg_tde_wal_last_key_set_lsn(XLogRecPtr lsn, TimeLineID tli, const char *keyfile_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this take a WalLocation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants