Implement dataflow expiration to limit temporal data retention #29587

Merged
merged 47 commits into MaterializeInc:main from temporal_ignorance on Oct 10, 2024

Conversation

antiguru
Member

@antiguru antiguru commented Sep 17, 2024

Introduces a new feature to limit data retention in temporal filters by dropping retraction diffs beyond a configured expiration time.

Motivation and logic are explained in more detail in the design doc.

Fixes MaterializeInc/database-issues#7757
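As a rough sketch of the idea (illustrative only, not the PR's actual operator code): updates whose timestamps lie at or beyond the expiration time can be discarded, because the replica is guaranteed to restart before that time is reached.

```rust
/// Illustrative sketch, not the real operator: drop any update scheduled at or
/// beyond the configured expiration time. Temporal filters emit retractions at
/// future timestamps; those beyond the expiration never need to be processed,
/// because the replica restarts before then.
fn drop_expired<D>(updates: Vec<(D, u64, i64)>, expiration_ms: u64) -> Vec<(D, u64, i64)> {
    updates
        .into_iter()
        .filter(|(_data, time_ms, _diff)| *time_ms < expiration_ms)
        .collect()
}
```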

Tips for reviewer

  • Testing is in progress.
  • Added a new function on the Catalog in a separate file instead of piling on in catalog.rs.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

```rust
// does not advance past the expiration time. Otherwise, we might write down incorrect
// data.
if let Some(timestamp) = self.expire_at.as_option().copied() {
    oks.expire_at(timestamp);
```
Member Author

We could assign to oks to make sure that we're observing the frontier before any downstream operators.

Contributor

This change would need to recreate the arrangement after the inspect, right?

@sdht0 sdht0 force-pushed the temporal_ignorance branch 2 times, most recently from 054b232 to b9a2074 Compare September 18, 2024 03:12
@sdht0
Contributor

sdht0 commented Sep 19, 2024

Some notes:

  • Doesn't look like setting until in for_dataflow_in() is propagating to mfp.evaluate(), at least in my testing. I've traced back the calls and, in many cases, until is just set to empty. I might be missing something, but we might have to go back to explicitly passing t_limit around.
    • Specifically, mfp.evaluate() in PendingWork::do_work() does not use Context.until.
  • As per the Slack discussion, we'll have to handle the TimestampIndependent variant somewhere?

@antiguru
Member Author

until is just set to empty

That's expected! Dataflows that are valid for all times, and indexes/materialized views/subscribes without an up-to, should have an empty frontier. The meet of the empty frontier with the expiration time should, however, set the until.
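For intuition, a simplified model of that meet (plain Rust, not the actual timely frontier types): represent an until frontier as Option<u64>, with None standing in for the empty frontier, i.e. "valid for all times".

```rust
/// Simplified model of the meet described above (uses `Option<u64>` in place
/// of real frontier types): the meet of the empty frontier (`None`) with an
/// expiration time yields the expiration time, so the dataflow still ends up
/// with a finite until.
fn meet_until(until: Option<u64>, expiration: Option<u64>) -> Option<u64> {
    match (until, expiration) {
        // An empty frontier imposes no bound, so the other side wins.
        (None, bound) | (bound, None) => bound,
        // Otherwise take the earlier of the two bounds.
        (Some(a), Some(b)) => Some(a.min(b)),
    }
}
```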

@sdht0
Contributor

sdht0 commented Sep 19, 2024

until is just set to empty

That's expected! Dataflows that are valid for all times, and indexes/materialized views/subscribes without an up-to, should have an empty frontier. The meet of the empty frontier with the expiration time should, however, set the until.

Thanks for the reply! Yes, I meant the second part. I can see that until is being set to t_limit, but that value is not propagating to one of the three mfp.evaluate() calls in do_work().

Did you test temporal filters in an index? For me, the data is not being dropped because until is empty.

@sdht0 sdht0 force-pushed the temporal_ignorance branch 2 times, most recently from 0eb513d to 6da5d66 Compare September 19, 2024 18:15
@antiguru
Member Author

Did you test temporal filters in an index?

What index did you test this with?

@sdht0 sdht0 marked this pull request as ready for review September 19, 2024 19:33
@sdht0 sdht0 requested review from a team as code owners September 19, 2024 19:33
@sdht0 sdht0 requested review from ParkMyCar, a team and jkosh44 and removed request for a team and ParkMyCar September 19, 2024 19:33

shepherdlybot bot commented Sep 19, 2024

Risk Score: 83/100 | Bug Hotspots: 4 | Resilience Coverage: 66%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability 🔍 Detected
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

This pull request has a high-risk score of 83, driven by predictors such as the average line count in files and executable lines within files. Historically, PRs with these characteristics are 158% more likely to cause a bug than the repository baseline. Additionally, four modified files are recent hotspots for bug fixes. While the observed bug trend in the repository is increasing with recent spikes, the predicted trend is decreasing.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../memory/objects.rs 99
../src/as_of_selection.rs 93
../src/coord.rs 100
../src/catalog.rs 97

@sdht0 sdht0 changed the title temporal ignorance Allow replica expiration to limit data retention Sep 19, 2024
@sdht0 sdht0 changed the title Allow replica expiration to limit data retention Allow replica expiration to limit temporal data retention Sep 19, 2024
@sdht0
Contributor

sdht0 commented Sep 19, 2024

What index did you test this with?

I tested with a simple index and a materialized view. Had to move the until meet logic out of Context to make it propagate to all mfp.evaluate() calls.

@sdht0 sdht0 force-pushed the temporal_ignorance branch 2 times, most recently from 5275555 to 1606dfd Compare September 20, 2024 17:38
src/adapter/src/coord/introspection.rs Outdated Show resolved Hide resolved
src/compute/src/compute_state.rs Outdated Show resolved Hide resolved
src/compute/src/compute_state.rs Outdated Show resolved Hide resolved
src/compute/src/compute_state.rs Outdated Show resolved Hide resolved
src/compute-types/src/dataflows.rs Outdated Show resolved Hide resolved
```diff
@@ -241,7 +254,7 @@ pub fn build_compute_dataflow<A: Allocate>(
     source.storage_metadata.clone(),
     dataflow.as_of.clone(),
     snapshot_mode,
-    dataflow.until.clone(),
+    until.clone(),
```
Contributor

I'm not sure if this is correct. We are now passing the expiration time as the until to the storage source operators, which means that these operators are now free to produce all times up to the until and then jump directly to the empty frontier. expire_stream_at wouldn't panic in this case because it has an exception for the empty frontier, so we would show invalid data for times >= until. I think we still need to pass dataflow.until here, or stop ignoring the empty frontier in expire_stream_at.
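To make that exception concrete, here is a hypothetical version of such a check (a sketch in the spirit of the discussion, not the real expire_stream_at):

```rust
/// Hypothetical frontier check (not the real implementation): panic once the
/// frontier reaches the expiration time, except when the frontier is empty.
fn check_expiration(frontier: &[u64], expiration: u64) {
    let reached = frontier.iter().all(|t| *t >= expiration);
    // The empty-frontier exception is the loophole described above: a source
    // that jumps straight to the empty frontier passes this check without
    // panicking, even though data beyond the expiration was never corrected.
    if !frontier.is_empty() && reached {
        panic!("frontier {frontier:?} reached expiration {expiration}");
    }
}
```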

Contributor

Yes, the logic needs a rework wrt empty frontiers, as is also obvious from a CI failure.

Contributor

As we discussed, this is needed in mfp.evaluate().

Member Author

I think Jan has a point, because a source could produce all data at the correct point in time, and then drop its capability. The downstream operator could see the data, plus the frontier advancement at the same time, which would make it hard to reason about what the expiration logic should do.

I think this is mitigated by only allowing the milliseconds timeline, which implies that we won't observe the forward-jumping behavior. But it's very hand-wavy.

src/compute/src/expiration.rs Outdated Show resolved Hide resolved
sdht0 and others added 20 commits October 10, 2024 16:03
Adds a metric to report the timestamp of replica expiration, and an
approximate number of seconds that remain.

```
# HELP mz_dataflow_replica_expiration_remaining_seconds The remaining seconds until replica expiration. Can go negative, can lag behind.
# TYPE mz_dataflow_replica_expiration_remaining_seconds gauge
mz_dataflow_replica_expiration_remaining_seconds{worker_id="0"} 1727981.64199996
# HELP mz_dataflow_replica_expiration_timestamp_seconds The replica expiration timestamp in seconds since epoch.
# TYPE mz_dataflow_replica_expiration_timestamp_seconds gauge
mz_dataflow_replica_expiration_timestamp_seconds{worker_id="0"} 1730280383911
```
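A back-of-the-envelope sketch of how the remaining-seconds value could be derived (an assumption for illustration, not the PR's actual metrics code): the difference between the expiration timestamp and the current wall-clock time, in seconds, which goes negative once the expiration has passed.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative only (assumed, not the PR's metrics code): seconds remaining
/// until a millisecond expiration timestamp. Negative once expiration passed.
fn expiration_remaining_seconds(expiration_ms: u64) -> f64 {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_millis() as i128;
    (expiration_ms as i128 - now_ms) as f64 / 1000.0
}
```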

Signed-off-by: Moritz Hoffmann <[email protected]>
@sdht0 sdht0 force-pushed the temporal_ignorance branch from 51f0062 to 74635c2 Compare October 10, 2024 20:03
@def- def- force-pushed the temporal_ignorance branch from fcfac82 to 5b847d4 Compare October 10, 2024 21:44
@sdht0 sdht0 enabled auto-merge (squash) October 10, 2024 21:55
@sdht0 sdht0 merged commit 05acf95 into MaterializeInc:main Oct 10, 2024
75 checks passed
@sdht0
Contributor

sdht0 commented Oct 10, 2024

Thanks everyone for the reviews!

@antiguru antiguru deleted the temporal_ignorance branch October 11, 2024 07:07
antiguru added a commit that referenced this pull request Oct 31, 2024
Support expiration of dataflows depending on wall-clock time and with
refresh schedules.

This is a partial re-implementation of #29587 to enable more dataflows
to participate in expiration. Specifically, it introduces the
abstraction of _time dependence_ to describe how a dataflow follows
wall-clock time. Using this information, we can then determine how a
replica's expiration time relates to a specific dataflow. This allows us
to support dataflows that have custom refresh policies.
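A hypothetical shape for such a time-dependence abstraction (all names here
are assumed for illustration; they are not necessarily the ones this commit
introduces):

```rust
/// Hypothetical sketch only: describes how a dataflow's timestamps relate to
/// wall-clock time, so a replica's expiration can be mapped onto the dataflow.
enum TimeDependence {
    /// The dataflow's output does not follow wall-clock time; expiration
    /// never applies to it.
    Indeterminate,
    /// The dataflow follows wall-clock time, optionally adjusted by a
    /// refresh schedule.
    Wallclock(Option<RefreshSchedule>),
}

/// Placeholder for a refresh schedule; details omitted in this sketch.
struct RefreshSchedule;

impl TimeDependence {
    /// Map a replica expiration (in milliseconds) onto this dataflow, if it
    /// applies at all.
    fn apply(&self, expiration_ms: u64) -> Option<u64> {
        match self {
            TimeDependence::Indeterminate => None,
            // A refresh schedule would round the expiration up to the next
            // refresh; without one, the expiration applies directly.
            TimeDependence::Wallclock(_) => Some(expiration_ms),
        }
    }
}
```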

I'm not sold on the names introduced by this PR, but they're the best I
came up with. Open to suggestions!

The implementation deviates from the existing one in some important
ways:
* We do not panic in the dataflow operator that checks for frontier
advancements, but rather retain a capability until the dataflow is shut
down. This avoids a race condition where dataflow shutdown happens in
parallel with dropping the shutdown token, and it avoids needing to
reason about which dataflows produce error streams: some have an error
output that immediately advances to the empty frontier.
* We do not handle the empty frontier in a special way. Previously, we
considered advancing to the empty frontier acceptable. However, this
makes it difficult to distinguish a shutdown from a source reaching the
expiration time. In the first case, the operator should drop its
capability; in the second, it must not, for correctness reasons.
* We check in the worker thread whether the replica has expired and
panic if needed.

There are some problems this PR does not address:
* Caching the time dependence information in the physical plans seems
like a hack. I think a better place would be the controller. Happy to
try this in a follow-up PR.
* We need a separate kill-switch to disable the feature because as it is
implemented, we capture the expiration time in the controller once per
replica. A second kill-switch would enable us to override the expiration
to stabilize the system.

Fixes MaterializeInc/database-issues#8688.
Fixes MaterializeInc/database-issues#8683.

### Tips for the reviewer

Don't look at individual commits; it's a work log and does not have any
semantic meaning.

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [ ] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [ ] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [ ] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [ ] If this PR includes major [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note),
I have pinged the relevant PM to schedule a changelog post.

---------

Signed-off-by: Moritz Hoffmann <[email protected]>