-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persist pg_stat information in pageserver #6560
Conversation
7557 tests run: 7184 passed, 0 failed, 373 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
f188149 at 2025-02-19T22:02:29.612Z :recycle: |
036e235
to
1b1fbf1
Compare
Thanks for the ping. Can we enforce a size limit of this WAL write on the postgres side? (i.e. check I'm thinking that since pgstat file is meant to be small, a limit shouldn't affect it, but it will protect us in case this generic "write any file to the WAL" function is misused in future. |
Just check how large statistic file can be: I created 10000 tables, perform some manipulations with them and do |
To factor in the issues we've had with capacity/replay scaling, what do you think about having separate stores for "big values" (such as pgstats) and "small values" (such as logical replication state)? Whatever the implementation on the pageserver evolves to, I think it will be helpful to store the ~< 8KiB items separately from the ~<4MiB items. This doesn't have to be a whole separate API, we just need some recognizable bit that the pageserver can use to store them separately. |
Cross referencing with #6874 (review) -- since that PR is writing image values regularly for the AuxFilesDirectory structure, it's important that we don't put the multi-megabyte stats value in the same structure, to avoid using unreasonable amounts of storage when logical replication is used on small tenants. The scenario described in that PR was 70,000 writes, so if we were writing a 4MB image every 1000 writes, that would be 280MB of storage for a tenant that could be as small as 30MB -- the rule of thumb I'm using here is that logical replication metadata should never account for the majority of storage in a tenant. |
pgstat information is saved only on systems shutdown and unlike snapshots files it is the single key. Please also notice that:
|
It doesn't matter how often the compute is restarted, if its large stats object is written to the same AuxFileDirectory as all the logical replication keys: every 1024 writes to LR slots will trigger a large (>4MiB) image value being written.
Let's design defensively so that we are not relying on good luck to avoid a situation where someone creates a lot of relations (big stats) and the enables logical replication (frequent updates).
dbdir is indeed an issue, but the dbdir key isn't re-written continuously as a result of logical replication slot updates, and each value in the dbdir is only a couple of bytes. The difference is that the AuxFilesDirectory is subject to frequent, continuous writes, so we must not put anything big in that structure because the frequency of writes will lead to repetition of that big object. |
Yes, it is true. Size of each hash map element is only 9 bytes. But still storage size of creation of 10k relations is 4.4Gb. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM for the storage part of the change, but for now, AUX file v2 is not enabled for all regions, so I don't plan to get this merged for now, unless there could be a flag to control whether or not to enable this feature.
I see the Postgres-side of the pull request writes pg_stat data during shutdown checkpoint, so I assume that's not a lot of write traffic and page server should be able to handle that relatively easily. Please correct me if I'm wrong :)
Also, I didn't see pg_stat-related files getting encoded in the aux file v2 mapping? What are the paths of the files produced by pg_stat, and probably we need to assign a prefix ID to that? (see encode_aux_file_key
in aux_file.rs
)
Yes, pgstat data is written on shutdown checkpoint.
Sorry, I missed it. |
229076b
to
ddeb08a
Compare
f6b620b
to
bf5c7fb
Compare
Extracted from #6560, currently we include multiple copies of aux files in the basebackup. ## Summary of changes Fix the loop. Signed-off-by: Alex Chi Z <[email protected]> Co-authored-by: Konstantin Knizhnik <[email protected]>
Extracted from #6560, currently we include multiple copies of aux files in the basebackup. ## Summary of changes Fix the loop. Signed-off-by: Alex Chi Z <[email protected]> Co-authored-by: Konstantin Knizhnik <[email protected]>
This is for PR #6560. It implements the scheme I envisioned at #6560 (comment)
752a6b4
to
bb427b5
Compare
(please, don't forget to squash commits in this PR and in associated postgres PRs) |
This is for PR #6560. It implements the scheme I envisioned at #6560 (comment)
bb427b5
to
c02d727
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the storage side except that:
- Ingestion path -- need Vlad's review to ensure it works with the recent WAL fanout / filter project.
- Production -- I'm fairly uncertain about the pg_stat write pattern and whether the current compaction handles it well, so I'd prefer we test this in staging with large databases, and also make "whether to persist/load pg_stat" as a pageserver-level flag so that we can disable it if things go wrong. I can add the flag once the patch gets merged.
Right now setting |
This is for PR #6560. It implements the scheme I envisioned at #6560 (comment)
262b404
to
6d1102b
Compare
Add neon_pgstat_file_size_limit GUC to limit size of AUX file for pg_stat Drop pg_stat data in case of abnormal Postgres termination Use redo field to distinguish shutdown andonine checkpoints Disable persisting pgstat file by default
6d1102b
to
f188149
Compare
If this PR added a GUC in the Postgres fork or
If you're an external contributor, a Neon employee will assist in |
The test added here is going to be unstable: when running 10 test instances in parallel (following the recipe #10442 (comment)), I got failures on iterations 2, 2, 2:
|
Problem
Statistic is saved in local file and so lost on compute restart.
Persist in in page server using the same AUX file mechanism used for replication slots
See more about motivation in https://neondb.slack.com/archives/C04DGM6SMTM/p1703077676522789
Summary of changes
Persist postal file using AUX mechanism
Postgres PRs:
neondatabase/postgres#547
neondatabase/postgres#446
neondatabase/postgres#445
Related to #6684 and #6228