Epic: Pageserver Timeline Archival #8088
## Problem

The metrics we have today aren't convenient for planning around the impact of timeline archival on costs.

Closes: #8108

## Summary of changes

- Add metric `pageserver_archive_size`, which indicates the logical bytes of data which we would expect to write into an archived branch.
- Add metric `pageserver_pitr_history_size`, which indicates the distance between last_record_lsn and the PITR cutoff.

These metrics are somewhat temporary: when we implement #8088 and associated consumption metric changes, these will reach a final form. For now, an "archived" branch is just any branch outside of its parent's PITR window: later, archival will become an explicit state (which will _usually_ correspond to falling outside the parent's PITR window).

The overall volume of timeline metrics is something to watch, but we are removing many more in #8245 than this PR is adding.
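As a rough illustration of the second metric, the distance between the last record LSN and the PITR cutoff can be sketched as a subtraction of two WAL positions. The `Lsn` newtype and the `pitr_history_size` function below are hypothetical stand-ins, not the pageserver's actual types, and the saturating behaviour when the cutoff is ahead of the last record is an assumption.

```rust
/// Hypothetical stand-in for an LSN: a byte position in the WAL.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

/// Distance in WAL bytes between the last record LSN and the PITR cutoff.
/// Assumption: report 0 if the cutoff is ahead of the last record.
pub fn pitr_history_size(last_record_lsn: Lsn, pitr_cutoff: Lsn) -> u64 {
    last_record_lsn.0.saturating_sub(pitr_cutoff.0)
}
```

A timeline whose last record is at byte 1000 with a cutoff at byte 400 would report 600 bytes of PITR history under this sketch.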
A design for a cheap low-resource state for idle timelines: - #8088
This adds an archival_config endpoint to the pageserver. Currently it has no effect, and always "works", but later the intent is that it will make a timeline archived/unarchived.

- [x] add yml spec
- [x] add endpoint handler

Part of #8088
As pointed out in #8414 (comment) Part of #8088
This week:
Add a way to list the offloaded timelines. Before, one had to look at logs to figure out if a timeline has been offloaded or not, or use the non-presence of a certain timeline in the list of normal timelines. Now, one can list them directly. Part of #8088
Persist timeline offloaded state to S3. Right now, as of #8907, at each restart of the pageserver, all offloaded state is lost, so we load the full timeline again. As it starts with an empty local directory, we might potentially download some files again, leading to downloads that are ultimately wasteful. This patch adds support for persisting the offloaded state, allowing us to never load offloaded timelines in the first place. The persistence feature is facilitated via a new file in S3 that is tenant-global, which contains a list of all offloaded timelines. It is updated each time we offload or unoffload a timeline, and otherwise never touched. This choice means that tenants where no offloading is happening will not immediately get a manifest, keeping the change very minimal at the start. We leave generation support for future work. It is important to support generations, as in the worst case, the manifest might be overwritten by an older generation after a timeline has been unoffloaded (and unarchived), so the next pageserver process instantiation might wrongly believe that some timeline is still offloaded even though it should be active. Part of #9386, #8088
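The manifest idea can be sketched as a set of offloaded timeline IDs that changes only on offload or unoffload, and that is consulted at startup to decide which timelines to load. The `TenantManifest` struct, its method names, and `timelines_to_load` are hypothetical names for illustration; the real file format and upload path are not shown, and generation handling is left out, as it is in the PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory form of the tenant-global manifest.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct TenantManifest {
    pub offloaded_timelines: BTreeSet<String>,
}

impl TenantManifest {
    /// Record an offload. Returns true if the manifest changed and
    /// would therefore need to be written to remote storage again.
    pub fn mark_offloaded(&mut self, timeline_id: &str) -> bool {
        self.offloaded_timelines.insert(timeline_id.to_string())
    }

    /// Record an unoffload. Returns true if the manifest changed.
    pub fn mark_unoffloaded(&mut self, timeline_id: &str) -> bool {
        self.offloaded_timelines.remove(timeline_id)
    }
}

/// At startup, load every known timeline except the ones the manifest
/// lists as offloaded. A tenant that never offloaded has no manifest.
pub fn timelines_to_load(all: &[&str], manifest: Option<&TenantManifest>) -> Vec<String> {
    all.iter()
        .filter(|id| manifest.map_or(true, |m| !m.offloaded_timelines.contains(**id)))
        .map(|id| id.to_string())
        .collect()
}
```

Passing `None` for the manifest models a tenant where no offloading has happened yet, in which case every timeline is loaded as before.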
Before, we didn't copy over the `index-part.json` of offloaded timelines to the new shard's location, resulting in the new shard not knowing the timeline even exists. In #9444, we copy over the manifest, but we also need to do this for `index-part.json`. As the operations to do are mostly the same between offloaded and non-offloaded timelines, we can iterate over all of them in the same loop, after the introduction of a `TimelineOrOffloadedArcRef` type to generalize over the two cases. This is analogous to the deletion code added in #8907. The added test also ensures that the sharded archival config endpoint works, something that has not yet been ensured by tests. Part of #8088
This PR does two things:

1. Obtain a `TimelineCreateGuard` object in `unoffload_timeline`. This prevents two unoffload tasks from racing with each other. While they already obtain locks for `timelines` and `offloaded_timelines`, those locks aren't sufficient, as we have already constructed an entire timeline at that point. We shouldn't ever have two `Timeline` objects for the same timeline in the same process at the same time.
2. Don't allow timeline creations for timelines that have been offloaded. They already exist, so creation must be rejected. The previous logic only looked at the `timelines` list.

Part of #8088
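The two points above can be sketched with a small RAII guard: it registers the timeline ID in an "in progress" set before any timeline object is constructed, and removes it on drop. Everything here (`TenantState`, `CreateGuard`, `GuardError`, the method names) is a hypothetical simplification and not the pageserver's actual `TimelineCreateGuard` API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified tenant state: sets of timeline IDs.
#[derive(Debug, Default)]
pub struct TenantState {
    pub timelines: Mutex<HashSet<String>>,
    pub offloaded_timelines: Mutex<HashSet<String>>,
    pub in_progress: Mutex<HashSet<String>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GuardError {
    AlreadyInProgress,
    AlreadyExists,
}

/// Held for the whole create/unoffload operation; released on drop.
pub struct CreateGuard {
    state: Arc<TenantState>,
    id: String,
}

impl CreateGuard {
    /// Creation must fail if the timeline exists, loaded or offloaded.
    pub fn for_creation(state: &Arc<TenantState>, id: &str) -> Result<Self, GuardError> {
        if state.timelines.lock().unwrap().contains(id)
            || state.offloaded_timelines.lock().unwrap().contains(id)
        {
            return Err(GuardError::AlreadyExists);
        }
        Self::acquire(state, id)
    }

    /// Unoffload only needs exclusivity against concurrent operations.
    pub fn for_unoffload(state: &Arc<TenantState>, id: &str) -> Result<Self, GuardError> {
        Self::acquire(state, id)
    }

    fn acquire(state: &Arc<TenantState>, id: &str) -> Result<Self, GuardError> {
        // `insert` returns false if another operation already holds this ID.
        if !state.in_progress.lock().unwrap().insert(id.to_string()) {
            return Err(GuardError::AlreadyInProgress);
        }
        Ok(Self { state: Arc::clone(state), id: id.to_string() })
    }
}

impl Drop for CreateGuard {
    fn drop(&mut self) {
        self.state.in_progress.lock().unwrap().remove(&self.id);
    }
}
```

The design point is that the guard is taken before constructing the timeline, so a second unoffload task fails early instead of building a duplicate object.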
Last week:
This week:
Currently, all callers of `unoffload_timeline` ensure that the tenant the unoffload operation is called on is active. We rely on it being active as we activate the timeline below and don't want to race with the activation code of the tenant (in the worst case, activating a timeline twice). Therefore, add this assertion. Part of #8088
As pointed out in #9489 (comment), we did not support deletion of offloaded timelines when the timeline had been loaded from the manifest rather than offloaded in the current process. This was because the upload queue had not been initialized yet. This PR thus initializes the timeline and shuts it down immediately. Part of #8088
Disallow a request for timeline ancestor detach if either the to-be-detached timeline or any of the to-be-reparented timelines are offloaded or archived. In theory we could support timelines that are archived but not offloaded, but archived timelines are at risk of being offloaded, so we treat them like offloaded timelines. As for offloaded timelines, any code to "support" them would amount to unoffloading them, at which point we can just demand that the timelines be unarchived. Part of #8088
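A minimal sketch of that rule, assuming each timeline is in exactly one of three states. The `ArchivalState` enum and `check_detach_allowed` function are hypothetical names used only for illustration.

```rust
/// Hypothetical per-timeline state for the purpose of this check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchivalState {
    Active,
    Archived,
    Offloaded,
}

/// Reject ancestor detach if the detached timeline or any timeline that
/// would be reparented is archived or offloaded.
pub fn check_detach_allowed(
    detached: (&str, ArchivalState),
    reparented: &[(&str, ArchivalState)],
) -> Result<(), String> {
    for (id, state) in std::iter::once(&detached).chain(reparented.iter()) {
        if *state != ArchivalState::Active {
            return Err(format!(
                "timeline {} is {:?}; unarchive it before detaching",
                id, state
            ));
        }
    }
    Ok(())
}
```

Archived and offloaded timelines are rejected the same way here, mirroring the decision above to treat archived timelines like offloaded ones.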
Constructing a remote client is no big deal. Yes, it means an extra download from S3 but it's not that expensive. This simplifies code paths and scenarios to test. This unifies timelines that have been recently offloaded with timelines that have been offloaded in an earlier invocation of the process. Part of #8088
If we delete a timeline that has children, those children will have their data corrupted. Therefore, extend the already existing safety check to offloaded timelines as well. Part of #8088
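The extended safety check can be sketched as a child lookup that spans both the loaded and the offloaded timeline maps. The `Tenant` struct below is a hypothetical simplification in which each map goes from a timeline ID to its optional ancestor ID.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified tenant: timeline ID -> optional ancestor ID.
#[derive(Debug, Default)]
pub struct Tenant {
    pub timelines: HashMap<String, Option<String>>,
    pub offloaded_timelines: HashMap<String, Option<String>>,
}

impl Tenant {
    /// Children of `timeline_id`, loaded or offloaded, in sorted order.
    pub fn children_of(&self, timeline_id: &str) -> Vec<String> {
        let mut children: Vec<String> = self
            .timelines
            .iter()
            .chain(self.offloaded_timelines.iter())
            .filter(|(_, ancestor)| ancestor.as_deref() == Some(timeline_id))
            .map(|(id, _)| id.clone())
            .collect();
        children.sort();
        children
    }

    /// Deletion is refused while any child exists; the error lists them.
    pub fn check_can_delete(&self, timeline_id: &str) -> Result<(), Vec<String>> {
        let children = self.children_of(timeline_id);
        if children.is_empty() {
            Ok(())
        } else {
            Err(children)
        }
    }
}
```

With only the `timelines` map consulted, an offloaded child would go unnoticed and its parent could be deleted, which is the case the PR closes.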
Last week I made and merged a lot of pull requests. Although some of them are quite small, they fix a lot of possible misuses/edge cases that can lead to corruption:
There has also been work by John for #9386, to make manifests more robust/generation ready:
This week:
So it's going really well, and work is mostly complete. Now the main task is to ensure it rolls out safely, with us reducing the impact of any possible issue by doing a staged rollout.
Allow us to enable timeline offloading for single tenants without having to enable it for the entire pageserver. Part of #8088.
last week:
this week: I'll focus on the scrubber side of #9386, and continue to analyze tenants.
Add a test that ensures the `retain_lsn` functionality works. Right now, not a single test breaks if offloaded or non-offloaded timelines don't get registered at their parents, even though that registration is what prevents gc from discarding the ancestor_lsns of the children. This PR fills that gap.

The test has four modes:

* `offloaded`: offload the child timeline, run compaction on the parent timeline, unarchive the child timeline, then try reading from it. The data should still be there.
* `offloaded-corrupted`: offload the child timeline, then corrupt the manifest in a way that makes the pageserver believe the timeline was flattened. This is the closest we can get to pretending the `retain_lsn` mechanism doesn't exist for offloaded timelines, so we can avoid adding endpoints to the pageserver that do this manually for tests. The test then checks that the data is indeed corrupted and the endpoint can't be started. That way we know that the test is actually working, and actually tests the `retain_lsn` mechanism, instead of, say, the lsn lease mechanism or one of the many other mechanisms that impede gc.
* `archived`: the child timeline gets archived but doesn't get offloaded. This currently matches the `None` case, but we might have refactors in the future that make archived timelines sufficiently different from non-archived ones.
* `None`: the child timeline doesn't even get archived. This tests that normal timelines participate in `retain_lsn`.

I've made them locally not participate in `retain_lsn` (via commenting out the respective `ancestor_children.push` statement in tenant.rs) and ran the test suite, and not a single test failed. So this test is the first of its kind.

Part of #8088.
PR #9308 has modified tenant activation code to take offloaded child timelines into account for populating the list of `retain_lsn` values. However, there are more places than just tenant activation where one needs to update the `retain_lsn`s.

This PR fixes some bugs of the current code that could lead to corruption in the worst case:

1. Deleting an offloaded timeline would not get its `retain_lsn` purged from its parent. With the patch we now do it, but as the parent can be offloaded as well, the situation is a bit trickier than for non-offloaded timelines, which can just keep a pointer to their parent. Here we can't keep a pointer because the parent might get offloaded, then unoffloaded again, creating a dangling pointer situation. Keeping a pointer to the *tenant* is not good either, because we might drop the offloaded timeline in a context where an `offloaded_timelines` lock is already held: so we don't want to acquire a lock in the drop code of OffloadedTimeline.
2. Unoffloading a timeline would not get its `retain_lsn` values populated, leading to it maybe garbage collecting values that its children might need. We now call `initialize_gc_info` on the parent.
3. Offloading a timeline would not get its `retain_lsn` values registered as offloaded at the parent. So if we drop the `Timeline` object, and its registration is removed, the parent would not have any of the child's `retain_lsn`s around. Also, before, the `Timeline` object would delete anything related to its timeline ID; now it only deletes `retain_lsn`s that have `MaybeOffloaded::No` set.

Incorporates Chi's reproducer from #9753. cc neondatabase/cloud#20199

The `test_timeline_retain_lsn` test is extended:

1. It gains a new dimension, duplicating each mode, to either have the "main" branch be the direct parent of the timeline we archive, or the "test_archived_parent" branch as an intermediary, creating a three-timeline structure. This doesn't test anything fixed by this PR in particular, just explores the vast space of possible configurations a little bit more.
2. It gains two new modes, `offload-parent`, which tests the second point, and `offload-no-restart`, which tests the third point.

It's easy to verify the test actually is "sharp" by removing one of the respective `self.initialize_gc_info()`, `gc_info.insert_child()` or `ancestor_children.push()`.

Part of #8088

---------

Signed-off-by: Alex Chi Z <[email protected]>
Co-authored-by: Alex Chi Z <[email protected]>
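The third fix can be sketched as a parent-side list of `retain_lsn` registrations, each tagged with whether it belongs to an offloaded child. Only `MaybeOffloaded` and `insert_child` are names taken from the description above; the `GcInfo` layout, the use of plain `u64` and `String` for LSNs and IDs, and the two removal helpers are assumptions made for illustration.

```rust
/// Whether a registration belongs to an offloaded child.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaybeOffloaded {
    Yes,
    No,
}

/// Hypothetical parent-side bookkeeping: (ancestor LSN, child ID, offloaded?).
#[derive(Debug, Default)]
pub struct GcInfo {
    pub retain_lsns: Vec<(u64, String, MaybeOffloaded)>,
}

impl GcInfo {
    pub fn insert_child(&mut self, child_id: &str, ancestor_lsn: u64, offloaded: MaybeOffloaded) {
        self.retain_lsns.push((ancestor_lsn, child_id.to_string(), offloaded));
    }

    /// What dropping a `Timeline` object would do in this sketch:
    /// remove only the child's non-offloaded registration.
    pub fn remove_child_not_offloaded(&mut self, child_id: &str) {
        self.retain_lsns
            .retain(|(_, id, off)| !(id.as_str() == child_id && *off == MaybeOffloaded::No));
    }

    /// What deleting or unoffloading an offloaded timeline would do.
    pub fn remove_child_offloaded(&mut self, child_id: &str) {
        self.retain_lsns
            .retain(|(_, id, off)| !(id.as_str() == child_id && *off == MaybeOffloaded::Yes));
    }

    /// Lowest LSN that gc must keep for the sake of children, if any.
    pub fn min_retain_lsn(&self) -> Option<u64> {
        self.retain_lsns.iter().map(|(lsn, _, _)| *lsn).min()
    }
}
```

In this sketch, offloading registers the child a second time with `MaybeOffloaded::Yes` before the `Timeline` object is dropped, so the parent still holds the child's `retain_lsn` afterwards.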
Purpose
Enable users to create branches fearlessly, without worrying about hitting branch count limits & without having to worry about cleaning up old branches unless they want to.
Background
Currently, all timelines have significant physical overhead on the pageserver, even if they haven't been used for days/weeks/months:
Changes
This section isn't an authoritative design, but calls out functional areas that will need work.
`Tenant` will need to only store active timelines in `Tenant::timelines`, and have some other map of hibernated timelines.
Tasks
unoffload_timeline
#9539