Epic: Pageserver Timeline Archival #8088

jcsp · 2024-06-18T09:25:28Z

Purpose

Enable users to create branches fearlessly, without worrying about hitting branch count limits & without having to worry about cleaning up old branches unless they want to.

Background

Currently, all timelines have significant physical overhead on the pageserver, even if they haven't been used for days/weeks/months:

scanning the timeline's remote storage path on tenant startup & loading its index
pinning some of the timeline's layers into local storage for logical size calculations
running a wal receiver for the timeline

Changes

This section isn't an authoritative design, but calls out functional areas that will need work.

We'll need some manifest in remote storage that the tenant can read on startup to learn which timelines should be loaded in an active state, vs. which timelines are hibernated. Keeping this properly up to date with timeline create/delete operations will be a key correctness point.
Persist enough information about hibernated timelines that we can know their logical size (& any other key stats) without having to load them fully. It probably makes sense to inline this into the per-tenant object that lists the timelines.
Our runtime state in Tenant will need to only store active timelines in Tenant::timelines, and have some other map of hibernated timelines.
APIs that list timelines will need either to change their semantics to only report active timelines, to avoid unreasonably large responses when users have many thousands of branches -- or to become paginated/queryable.
An external API to enable the control plane to tell us when a timeline should be hibernated or awoken. We could also choose to auto-hibernate after some period of inactivity, but that might be duplicative wrt the externally driven mechanism.
A cache-warming routine that loads enough layers to serve reads at the tip of the branch, so that when we activate a timeline, the user doesn't encounter a long slow period while data is promoted to local storage.
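
As a rough illustration of the manifest and the split runtime state described above, here is a minimal sketch. The manifest layout, the field names (`hibernated`, `logical_size`) and the `HibernatedTimeline` type are assumptions made for this example, not the pageserver's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class HibernatedTimeline:
    """Cheap summary kept for a hibernated timeline (hypothetical fields)."""
    timeline_id: str
    logical_size: int  # bytes, persisted so the timeline need not be loaded


@dataclass
class Tenant:
    timelines: dict = field(default_factory=dict)   # active timelines only
    hibernated: dict = field(default_factory=dict)  # summaries of the rest


def load_tenant(manifest: dict) -> Tenant:
    """Split manifest entries into the active map and the hibernated map."""
    tenant = Tenant()
    for entry in manifest["timelines"]:
        tid = entry["timeline_id"]
        if entry["hibernated"]:
            tenant.hibernated[tid] = HibernatedTimeline(tid, entry["logical_size"])
        else:
            # A real pageserver would load the timeline's index at this point.
            tenant.timelines[tid] = {"timeline_id": tid, "loaded": True}
    return tenant


manifest = {
    "timelines": [
        {"timeline_id": "main", "hibernated": False, "logical_size": 1000},
        {"timeline_id": "old-branch", "hibernated": True, "logical_size": 400},
    ]
}
tenant = load_tenant(manifest)
```

With this shape, the logical size of `old-branch` is available from `tenant.hibernated` without loading any of its layers.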

Tasks


pageserver: write timeline archival rfc #8218

c/storage/pageserver t/feature
pageserver: add supplementary branch usage stats #8131

a/tech_debt c/storage/pageserver
Add archival_config endpoint to pageserver #8414
Mark body of archival_config endpoint as required #8458
timeline archival: persistence #8459 / Persist archival information #8479
pageserver: implement visible layer housekeeping, for use in warm-ups
Timeline archival test #8824
controller: add pass-through for archival_config API: Implement archival_config timeline endpoint in the storage controller #8680
Forbid creation of child timelines of archived timeline #9122
Add timeline offload mechanism #8907
Shut down timelines during offload and add offload tests #9289
Also consider offloaded timelines for obtaining retain_lsn #9308
Activate timelines during unoffload #9399
Synthetic size should exclude archived timelines #9384
Add config variable for timeline offloading #9421
persistence for offloaded state #9386
offloaded timeline list API #9461
Support offloaded timelines during shard split #9489
Offloaded timeline deletion #9519
Fix unoffload_timeline races with creation #9525
Assert the tenant to be active in unoffload_timeline #9539
Disallow archived timelines to be detached or reparented #9578
Disallow offloaded children during timeline deletion #9582
Don't keep around the timeline's remote_client #9583
Add tenant config option to allow timeline_offloading #9598
Add a retain_lsn test #9599
pageserver: expose billing metrics for active size vs. archived size (decided to use logical size in the end, stop sending it after offload)
Don't preload offloaded timelines #9646
Correct mistakes in offloaded timeline retain_lsn management #9760
offloaded timeline query API
test for many timelines depending on each other
Impede external getpage requests for archived timelines #9548

c/storage/pageserver
pageserver: generation numbers for manifest objects #9543

c/storage/pageserver t/feature
test that offloaded timelines are excluded from heatmaps and never downloaded to secondaries
pytest for archival/unarchival together with storage controller and old generations
controller: ensure that timeline passthrough operations (incl. archival) land on shards with the latest generation (check generation is still current after they ack)
resume deletion instead of logging warning upon unoffloading
--- Milestone: archived branches are cheap locally -- (no index load on startup, no layers on disk, no Timeline at runtime)
pageserver: implement warm-up API
tests: after warming up, a read workload should not result in any on-demand downloads
add timeline flattening (including some way to block offload for it)
--- Milestone: archived branches are cheap in remote storage -- eventually written as compressed image layers at a single LSN
make scrubber check S3 invariants: a) timeline that is offloaded must be archived, b) timeline that is archived must have all of its children archived as well
unified lock for offloaded/timelines/loading timelines: eliminates some race conditions and inconsistent states
test: offload but pageserver crashes somewhere in delete_local_timeline_directory: can the pageserver deal with remnants after a restart?
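
The two scrubber invariants listed in the tasks above (an offloaded timeline must be archived, and an archived timeline must have all of its children archived) could be checked along these lines. This is only a sketch over an in-memory description of a tenant; the dictionary layout is an assumption for illustration and is not the scrubber's real input.

```python
def check_archival_invariants(timelines: dict) -> list:
    """Return a list of human-readable invariant violations.

    `timelines` maps a timeline id to a dict with the (hypothetical) keys
    "parent" (id or None), "archived" (bool) and "offloaded" (bool).
    """
    violations = []
    for tid, t in timelines.items():
        # a) a timeline that is offloaded must be archived
        if t["offloaded"] and not t["archived"]:
            violations.append(f"{tid}: offloaded but not archived")
        # b) a timeline that is archived must have all children archived
        parent = t["parent"]
        if parent is not None and timelines[parent]["archived"] and not t["archived"]:
            violations.append(f"{tid}: parent {parent} is archived but child is not")
    return violations


good = {
    "main": {"parent": None, "archived": False, "offloaded": False},
    "dev": {"parent": "main", "archived": True, "offloaded": True},
}
bad = {
    "main": {"parent": None, "archived": True, "offloaded": False},
    "dev": {"parent": "main", "archived": False, "offloaded": True},
}
good_violations = check_archival_invariants(good)
bad_violations = check_archival_invariants(bad)
```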

## Problem

The metrics we have today aren't convenient for planning around the impact of timeline archival on costs.

Closes: #8108

## Summary of changes

- Add metric `pageserver_archive_size`, which indicates the logical bytes of data which we would expect to write into an archived branch.
- Add metric `pageserver_pitr_history_size`, which indicates the distance between last_record_lsn and the PITR cutoff.

These metrics are somewhat temporary: when we implement #8088 and associated consumption metric changes, these will reach a final form. For now, an "archived" branch is just any branch outside of its parent's PITR window: later, archival will become an explicit state (which will _usually_ correspond to falling outside the parent's PITR window).

The overall volume of timeline metrics is something to watch, but we are removing many more in #8245 than this PR is adding.

A design for a cheap low-resource state for idle timelines: - #8088

This adds an archival_config endpoint to the pageserver. Currently it has no effect, and always "works", but later the intent is that it will make a timeline archived/unarchived.

- [x] add yml spec
- [x] add endpoint handler

Part of #8088

arpad-m · 2024-07-22T14:58:47Z

This week:

As pointed out in #8414 (comment) Part of #8088

Persists whether a timeline is archived or not in `index_part.json`. We only return success if the upload has actually succeeded. Also introduces a new `index_part.json` version number. Fixes #8459 Part of #8088

arpad-m · 2024-08-19T13:29:58Z

This week:

get storage controller PR merged (tests missing): Implement archival_config timeline endpoint in the storage controller #8680
make offload MVP PR (ideally also reviewed plus merged)

Add a way to list the offloaded timelines. Before, one had to look at logs to figure out if a timeline has been offloaded or not, or use the non-presence of a certain timeline in the list of normal timelines. Now, one can list them directly. Part of #8088

Persist timeline offloaded state to S3.

Right now, as of #8907, at each restart of the pageserver, all offloaded state is lost, so we load the full timeline again. As it starts with an empty local directory, we might potentially download some files again, leading to downloads that are ultimately wasteful.

This patch adds support for persisting the offloaded state, allowing us to never load offloaded timelines in the first place. The persistence feature is facilitated via a new file in S3 that is tenant-global, which contains a list of all offloaded timelines. It is updated each time we offload or unoffload a timeline, and otherwise never touched. This choice means that tenants where no offloading is happening will not immediately get a manifest, keeping the change very minimal at the start.

We leave generation support for future work. It is important to support generations, as in the worst case, the manifest might be overwritten by an older generation after a timeline has been unoffloaded (and unarchived), so the next pageserver process instantiation might wrongly believe that some timeline is still offloaded even though it should be active.

Part of #9386, #8088
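
The overwrite hazard described above can be made concrete with a small simulation. This is a sketch under assumptions: remote storage is modelled as a plain dict, and the key name `tenant-manifest.json`, the generation suffix format, and the `offloaded_timelines` field are hypothetical choices for the example rather than the format used by the PR.

```python
import json


def manifest_key(generation=None):
    # Hypothetical key naming: an optional hex generation suffix.
    base = "tenant-manifest.json"
    return base if generation is None else f"{base}-{generation:08x}"


def write_manifest(storage, offloaded, generation=None):
    body = json.dumps({"offloaded_timelines": sorted(offloaded)})
    storage[manifest_key(generation)] = body


def read_latest_manifest(storage):
    """Read the manifest written by the highest generation."""
    keys = [k for k in storage if k.startswith("tenant-manifest.json-")]
    latest = max(keys, key=lambda k: int(k.rsplit("-", 1)[1], 16))
    return json.loads(storage[latest])["offloaded_timelines"]


# Without generations: a late write from an old process wins.
plain = {}
write_manifest(plain, {"branch-a"})  # old process offloads branch-a
write_manifest(plain, set())         # new process unoffloads it
write_manifest(plain, {"branch-a"})  # old process writes again, too late
stale_view = json.loads(plain[manifest_key()])["offloaded_timelines"]

# With generation-suffixed keys: the old process cannot clobber newer state.
versioned = {}
write_manifest(versioned, {"branch-a"}, generation=1)
write_manifest(versioned, set(), generation=2)
write_manifest(versioned, {"branch-a"}, generation=1)
fresh_view = read_latest_manifest(versioned)
```

In the first case the next startup would wrongly treat `branch-a` as offloaded; in the second, the reader picks the manifest from generation 2 and sees no offloaded timelines.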

Before, we didn't copy over the `index_part.json` of offloaded timelines to the new shard's location, resulting in the new shard not knowing the timeline even exists. In #9444, we copy over the manifest, but we also need to do this for `index_part.json`. As the operations to do are mostly the same between offloaded and non-offloaded timelines, we can iterate over all of them in the same loop, after the introduction of a `TimelineOrOffloadedArcRef` type to generalize over the two cases. This is analogous to the deletion code added in #8907. The added test also ensures that the sharded archival config endpoint works, something that has not yet been ensured by tests. Part of #8088

This PR does two things:

1. Obtain a `TimelineCreateGuard` object in `unoffload_timeline`. This prevents two unoffload tasks from racing with each other. While they already obtain locks for `timelines` and `offloaded_timelines`, those aren't sufficient, as we have already constructed an entire timeline at that point. We shouldn't ever have two `Timeline` objects for the same timeline in the same process at the same time.
2. Don't allow timeline creations for timelines that have been offloaded. They already exist, so we should not allow creation. The previous logic only looked at the timelines list.

Part of #8088
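
The idea of a creation guard can be sketched as follows. This is not the pageserver's `TimelineCreateGuard`; it is a simplified Python analogue in which the `_creating` set, the exception types and the `allow_offloaded` flag are all invented for illustration.

```python
import threading
from contextlib import contextmanager


class TimelineAlreadyExists(Exception):
    pass


class TimelineBusy(Exception):
    pass


class Tenant:
    def __init__(self):
        self._lock = threading.Lock()
        self.timelines = set()
        self.offloaded_timelines = set()
        self._creating = set()  # ids with a create/unoffload in flight

    @contextmanager
    def create_guard(self, timeline_id, allow_offloaded):
        with self._lock:
            if timeline_id in self._creating:
                raise TimelineBusy(timeline_id)
            if timeline_id in self.timelines:
                raise TimelineAlreadyExists(timeline_id)
            # Creation must also consult the offloaded list.
            if timeline_id in self.offloaded_timelines and not allow_offloaded:
                raise TimelineAlreadyExists(timeline_id)
            self._creating.add(timeline_id)
        try:
            yield
        finally:
            with self._lock:
                self._creating.discard(timeline_id)

    def create_timeline(self, timeline_id):
        with self.create_guard(timeline_id, allow_offloaded=False):
            self.timelines.add(timeline_id)

    def unoffload_timeline(self, timeline_id):
        with self.create_guard(timeline_id, allow_offloaded=True):
            self.offloaded_timelines.remove(timeline_id)
            self.timelines.add(timeline_id)
```

The guard is held for the whole operation, so a second unoffload or create for the same id fails fast instead of constructing a second timeline object.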

Archived timelines should not count towards synthetic size. Closes #9384. Part of #8088.

arpad-m · 2024-10-28T12:50:34Z

Last week:

merged Timeline offloading persistence #9444
merged offloaded timeline list API #9461
filed and merged Support offloaded timelines during shard split #9489
filed and merged Don't consider archived timelines for synthetic size calculation #9497
filed and merged Fix unoffload_timeline races with creation #9525
filed Offloaded timeline deletion #9519
went over all locations where timelines are listed, identified some locations where offloaded should be listed too.

This week:

after 9519 merges, enable offloading on staging
test for retain_lsn functionality of offloaded branches
return error if trying to detach ancestor of archived or offloaded timeline, demanding all children to be unarchived first.
make deletion code check that the timeline also has no offloaded children in addition to no non-offloaded ones

Currently, all callers of `unoffload_timeline` ensure that the tenant the unoffload operation is called on is active. We rely on it being active as we activate the timeline below and don't want to race with the activation code of the tenant (in the worst case, activating a timeline twice). Therefore, add this assertion. Part of #8088

As pointed out in #9489 (comment), we previously didn't support deletion of offloaded timelines after the timeline had been loaded from the manifest instead of having been offloaded. This was because the upload queue had not been initialized yet. This PR thus initializes the timeline and shuts it down immediately. Part of #8088

Disallow a request for timeline ancestor detach if either the to-be-detached timeline or any of the to-be-reparented timelines is offloaded or archived. In theory we could support timelines that are archived but not offloaded, but archived timelines are at risk of being offloaded, so we treat them like offloaded timelines. As for offloaded timelines, any code to "support" them would amount to unoffloading them, at which point we can just demand that the timelines be unarchived. Part of #8088

Constructing a remote client is no big deal. Yes, it means an extra download from S3 but it's not that expensive. This simplifies code paths and scenarios to test. This unifies timelines that have been recently offloaded with timelines that have been offloaded in an earlier invocation of the process. Part of #8088

If we delete a timeline that has children, those children will have their data corrupted. Therefore, extend the already existing safety check to offloaded timelines as well. Part of #8088
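
A minimal sketch of such a safety check, assuming both the loaded and the offloaded timelines are available as simple child-to-parent maps (the function names and data layout here are illustrative only):

```python
def children_of(timeline_id, timelines, offloaded_timelines):
    """Collect children from both maps; each maps child id -> parent id."""
    return sorted(
        child
        for mapping in (timelines, offloaded_timelines)
        for child, parent in mapping.items()
        if parent == timeline_id
    )


def check_deletable(timeline_id, timelines, offloaded_timelines):
    """Raise if the timeline still has loaded or offloaded children."""
    children = children_of(timeline_id, timelines, offloaded_timelines)
    if children:
        raise ValueError(f"timeline {timeline_id} still has children: {children}")


timelines = {"main": None, "dev": "main"}
offloaded_timelines = {"old": "dev"}
```

Checking only `timelines` would wrongly allow deleting `dev` here, because its only child `old` is offloaded.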

arpad-m · 2024-11-04T15:43:28Z

Last week I made and merged a lot of pull requests. Although some of them are quite small, they fix a lot of possible misuses/edge cases that can lead to corruption:

merged: Offloaded timeline deletion #9519
enabled offloading on staging: https://github.com/neondatabase/infra/pull/2205
filed and merged Fix unoffload_timeline races with creation #9525
filed and merged Assert the tenant to be active in unoffload_timeline #9539
filed and merged Disallow archived timelines to be detached or reparented #9578
filed and merged Disallow offloaded children during timeline deletion #9582
filed and merged Don't keep around the timeline's remote_client #9583
filed Add tenant config option to allow timeline_offloading #9598
filed Add a retain_lsn test #9599 <--- this one took me a long time of debugging, but it was worth the time, it's important to ensure that everything is correct.

There has also been work by John for #9386, to make manifests more robust/generation ready:

This week:

merge the two open PRs to add a test and make it possible to enable offloading for single tenants
scrubber changes for persistence for offloaded state #9386, also to delete old generations of manifest
monitor staging and see if there are corruptions

So it's going really well, and work is mostly complete. Now the main task is to ensure it rolls out safely, with us reducing the impact of any possible issue by doing a staged rollout.

Allow us to enable timeline offloading for single tenants without having to enable it for the entire pageserver. Part of #8088.

arpad-m · 2024-11-11T15:33:51Z

last week:

merged Add tenant config option to allow timeline_offloading #9598
filed and merged Don't attach is_archived to debug output #9679
filed Don't preload offloaded timelines #9646
Chi filed and merged fix(pageserver): drain upload queue before offloading timeline #9682
I've analyzed a staging tenant that had issues due to offloading. The issues have been caused by an already fixed (in Shut down timelines during offload and add offload tests #9289) bug caused by the timeline not having been shut down properly.

this week:

I'll focus on the scrubber side of #9386, and continuing to analyze tenants.

Add a test that ensures the `retain_lsn` functionality works. Right now, there is not a single test that breaks if offloaded or non-offloaded timelines don't get registered at their parents, which is what prevents gc from discarding the ancestor_lsns of the children. This PR fills that gap.

The test has four modes:

* `offloaded`: offload the child timeline, run compaction on the parent timeline, unarchive the child timeline, then try reading from it. Hopefully the data is still there.
* `offloaded-corrupted`: offload the child timeline, then corrupt the manifest in a way that makes the pageserver believe the timeline was flattened. This is the closest we can get to pretending the `retain_lsn` mechanism doesn't exist for offloaded timelines, so we can avoid adding endpoints to the pageserver that do this manually for tests. The test then checks that the data is indeed corrupted and the endpoint can't be started. That way we know that the test is actually working, and actually tests the `retain_lsn` mechanism, instead of, say, the lsn lease mechanism, or one of the many other mechanisms that impede gc.
* `archived`: the child timeline gets archived but doesn't get offloaded. This currently matches the `None` case, but we might have refactors in the future that make archived timelines sufficiently different from non-archived ones.
* `None`: the child timeline doesn't even get archived. This tests that normal timelines participate in `retain_lsn`. I've made them locally not participate in `retain_lsn` (via commenting out the respective `ancestor_children.push` statement in tenant.rs) and ran the testsuite, and not a single test failed. So this test is the first of its kind.

Part of #8088.
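
To see why a missing `retain_lsn` registration corrupts a child, consider a deliberately simplified model of gc for a single page. This is an illustration only: the real pageserver works on layers rather than individual page versions, and the function and parameter names below are assumptions.

```python
def gc_versions(version_lsns, cutoff, retain_lsns):
    """Return the version LSNs of one page that gc must keep.

    Versions at or above the cutoff are kept, plus the newest version at or
    below each read point (the cutoff itself and every retain_lsn).
    """
    keep = {lsn for lsn in version_lsns if lsn >= cutoff}
    for read_point in [cutoff, *retain_lsns]:
        older = [lsn for lsn in version_lsns if lsn <= read_point]
        if older:
            keep.add(max(older))
    return sorted(keep)


versions = [10, 20, 30, 40]

# Child branched at LSN 15 and is registered at the parent.
with_child = gc_versions(versions, cutoff=35, retain_lsns=[15])

# Same child, but its branch point was never registered.
without_child = gc_versions(versions, cutoff=35, retain_lsns=[])
```

In the second case the version at LSN 10 is discarded, so a read on the child at its branch point (LSN 15) has nothing to reconstruct from. That is the failure the `offloaded-corrupted` mode is meant to provoke.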

PR #9308 has modified tenant activation code to take offloaded child timelines into account for populating the list of `retain_lsn` values. However, there are more places than just tenant activation where one needs to update the `retain_lsn`s.

This PR fixes some bugs of the current code that could lead to corruption in the worst case:

1. Deleting an offloaded timeline would not get its `retain_lsn` purged from its parent. With the patch we now do it, but as the parent can be offloaded as well, the situation is a bit trickier than for non-offloaded timelines, which can just keep a pointer to their parent. Here we can't keep a pointer because the parent might get offloaded, then unoffloaded again, creating a dangling pointer situation. Keeping a pointer to the *tenant* is not good either, because we might drop the offloaded timeline in a context where an `offloaded_timelines` lock is already held: so we don't want to acquire a lock in the drop code of OffloadedTimeline.
2. Unoffloading a timeline would not get its `retain_lsn` values populated, leading to it maybe garbage collecting values that its children might need. We now call `initialize_gc_info` on the parent.
3. Offloading a timeline would not get its `retain_lsn` values registered as offloaded at the parent. So if we drop the `Timeline` object, and its registration is removed, the parent would not have any of the child's `retain_lsn`s around. Also, before, the `Timeline` object would delete anything related to its timeline ID; now it only deletes `retain_lsn`s that have `MaybeOffloaded::No` set.

Incorporates Chi's reproducer from #9753. cc neondatabase/cloud#20199

The `test_timeline_retain_lsn` test is extended:

1. It gains a new dimension, duplicating each mode, to either have the "main" branch be the direct parent of the timeline we archive, or the "test_archived_parent" branch as an intermediary, creating a three-timeline structure. This doesn't test anything fixed by this PR in particular; it just explores the vast space of possible configurations a little bit more.
2. It gains two new modes: `offload-parent`, which tests the second point, and `offload-no-restart`, which tests the third point.

It's easy to verify the test actually is "sharp" by removing one of the respective `self.initialize_gc_info()`, `gc_info.insert_child()` or `ancestor_children.push()` calls.

Part of #8088

---------

Signed-off-by: Alex Chi Z <[email protected]>
Co-authored-by: Alex Chi Z <[email protected]>
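
The third fix in the PR text above (register the child as offloaded at the parent, and let the dropped `Timeline` remove only its `MaybeOffloaded::No` entries) can be modelled roughly like this. Only the names `insert_child` and `MaybeOffloaded` come from the PR text; the class layout and the `remove_child_*` helpers are hypothetical.

```python
from enum import Enum


class MaybeOffloaded(Enum):
    YES = "yes"
    NO = "no"


class ParentGcInfo:
    def __init__(self):
        # Entries are (retain_lsn, child_id, MaybeOffloaded).
        self.retain_lsns = []

    def insert_child(self, child_id, lsn, offloaded):
        self.retain_lsns.append((lsn, child_id, offloaded))

    def remove_child_not_offloaded(self, child_id):
        """Called when a loaded Timeline object is dropped (hypothetical name)."""
        self._remove(child_id, MaybeOffloaded.NO)

    def remove_child_offloaded(self, child_id):
        """Called when an offloaded timeline is deleted or unoffloaded."""
        self._remove(child_id, MaybeOffloaded.YES)

    def _remove(self, child_id, offloaded):
        self.retain_lsns = [
            e for e in self.retain_lsns
            if not (e[1] == child_id and e[2] is offloaded)
        ]


parent = ParentGcInfo()
parent.insert_child("child", 100, MaybeOffloaded.NO)   # child is loaded

# Offload: register the offloaded entry first, then drop the Timeline.
parent.insert_child("child", 100, MaybeOffloaded.YES)
parent.remove_child_not_offloaded("child")
after_offload = list(parent.retain_lsns)

# Deleting the offloaded timeline finally purges its retain_lsn.
parent.remove_child_offloaded("child")
after_delete = list(parent.retain_lsns)
```

Because the drop path only removes `NO` entries, the parent keeps protecting LSN 100 for the whole time the child is offloaded.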

jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic labels Jun 18, 2024

jcsp mentioned this issue Jun 19, 2024

logical size limit is broken during PS restart #5963

Open

jcsp changed the title ~~Epic: Pageserver Timeline Hibernation~~ Epic: Pageserver Timeline Archival Jul 1, 2024

This was referenced Jul 1, 2024

rfcs: add RFC for timeline archival #8221

Merged

pageserver: add supplementary branch usage stats #8131

Merged

jcsp assigned arpad-m Jul 8, 2024

jcsp added a commit that referenced this issue Jul 11, 2024

rfcs: add RFC for timeline archival (#8221)

69b6675

A design for a cheap low-resource state for idle timelines: - #8088

skyzh pushed a commit that referenced this issue Jul 15, 2024

rfcs: add RFC for timeline archival (#8221)

32f668f

A design for a cheap low-resource state for idle timelines: - #8088

arpad-m mentioned this issue Jul 17, 2024

Add archival_config endpoint to pageserver #8414

Merged

2 tasks

This was referenced Jul 22, 2024

Mark body of archival_config endpoint as required #8458

Merged

timeline archival: persistence #8459

Closed

timeline archival: slimmed down timeline object #8460

Closed

arpad-m added a commit that referenced this issue Jul 22, 2024

Mark body of archival_config endpoint as required (#8458)

f17fe75

As pointed out in #8414 (comment) Part of #8088

arpad-m mentioned this issue Jul 23, 2024

Persist archival information #8479

Merged

arpad-m mentioned this issue Aug 9, 2024

Implement archival_config timeline endpoint in the storage controller #8680

Merged

arpad-m mentioned this issue Aug 24, 2024

Timeline archival test #8824

Merged

This was referenced Oct 23, 2024

Support offloaded timelines during shard split #9489

Merged

Don't consider archived timelines for synthetic size calculation #9497

Merged

This was referenced Oct 25, 2024

Offloaded timeline deletion #9519

Merged

Fix unoffload_timeline races with creation #9525

Merged

arpad-m added a commit that referenced this issue Oct 26, 2024

Don't consider archived timelines for synthetic size calculation (#9497)

e727788

Archived timelines should not count towards synthetic size. Closes #9384. Part of #8088.

arpad-m mentioned this issue Oct 28, 2024

Assert the tenant to be active in unoffload_timeline #9539

Merged

jcsp mentioned this issue Oct 28, 2024

Epic: Pageserver internal catalog #4636

Closed

arpad-m mentioned this issue Oct 28, 2024

Impede external getpage requests for archived timelines #9548

Open

This was referenced Oct 29, 2024

Disallow archived timelines to be detached or reparented #9578

Merged

Disallow offloaded children during timeline deletion #9582

Merged

Don't keep around the timeline's remote_client #9583

Merged

This was referenced Oct 31, 2024

Add tenant config option to allow timeline_offloading #9598

Merged

Add a retain_lsn test #9599

Merged

arpad-m added a commit that referenced this issue Nov 4, 2024

Add tenant config option to allow timeline_offloading (#9598)

ee68bbf

Allow us to enable timeline offloading for single tenants without having to enable it for the entire pageserver. Part of #8088.

arpad-m mentioned this issue Nov 5, 2024

Don't preload offloaded timelines #9646

Open

arpad-m mentioned this issue Nov 14, 2024

Correct mistakes in offloaded timeline retain_lsn management #9760

Merged
