Revert "Backport 0.57.9 reverts" #20202

antiguru · 2023-06-28T12:03:00Z

Reverts #20124.

~~Only merge after TimelyDataflow/differential-dataflow#398 is fixed.~~ We chose to address the issue with a solution purely within Materialize.

Best viewed commit-by-commit:

- Add the arrangement heap size feature, clean revert.
- Remove the consolidate functionality introduced by the first PR. This avoids the memory regression we encountered.
- Add the MzArranged wrapper to track trace usage. This resolves holding on to the trace, even in situations other than consolidate.
- Move compaction, which should only be a cosmetic change.
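
To illustrate the idea behind the wrapper commit, here is a minimal sketch of a type that records whether any client asked for the trace, so that size logging can be skipped when nobody holds it. All names (`TraceHandle`, `TrackedArrangement`, `should_log_size`) are hypothetical and are not the types used in this PR.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a trace handle; cloning it keeps the trace alive.
#[derive(Clone)]
struct TraceHandle {
    name: String,
}

/// Sketch of a wrapper that remembers whether the trace was ever handed out.
struct TrackedArrangement {
    trace: TraceHandle,
    trace_used: Cell<bool>,
}

impl TrackedArrangement {
    fn new(name: &str) -> Self {
        Self {
            trace: TraceHandle { name: name.to_string() },
            trace_used: Cell::new(false),
        }
    }

    /// Hands out the trace and records that a client now holds on to it.
    fn trace(&self) -> TraceHandle {
        self.trace_used.set(true);
        self.trace.clone()
    }

    /// Size logging is only worth installing if some client uses the trace.
    fn should_log_size(&self) -> bool {
        self.trace_used.get()
    }
}
```

In this sketch, an arrangement whose trace is never requested reports `false` from `should_log_size`, so no logging operator would keep the trace alive on its own.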

uce · 2023-07-12T10:07:37Z

Great to see this back on track to main. 🎉

Did you have a chance to verify that the test cases we added to detect increased memory usage fail without your additional changes (bc0ae72, 5d4f5c8)?

vmarcos

The new approach looks reasonable to me, but I feel that the size and complexity of the change after the revert/backport/revert rounds is making reviewing more difficult here (at least for me). For example, there is a change below regarding AHash that appears to have been necessary before, but is no longer needed?

Perhaps we should use complementary techniques to increase confidence in addition to reviewing. A few suggestions for your consideration of what is feasible:

- It would be good to get a full run of the feature benchmark for this PR, now that it includes memory measurements.
- It would also be interesting to see if this PR changes the results of testdrive: Check mz_arrangement_sharing #20180.
- It would be good to run this change in staging with a large-ish workload (e.g., the TPC-H load generator with updates and all 22 queries maintained as indexed views) and contrast memory requirements for the same without this change.
- Ask the QA team for what else they could throw at this.

src/compute/Cargo.toml

antiguru · 2023-07-12T13:52:37Z

> Did you have a chance to verify that the test cases we added to detect increased memory usage fail without your additional changes (bc0ae72, 5d4f5c8)?

I manually ran the feature benchmark with CustomerWorkload1, and it shows an increase in memory but doesn't report a regression. A change of 6.7% seems to be under the threshold for reporting regressions, so we might need to tune regression reporting on memory stats.
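
As a rough illustration of why a 6.7% increase can go unreported, here is a sketch of a threshold-based regression check. The function names and the threshold values in the usage note are assumptions for illustration; the thread does not state the feature benchmark's actual threshold.

```rust
/// Relative change of `new` against `baseline`, as a fraction (0.067 means 6.7%).
fn relative_change(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline
}

/// Flags a regression only when the relative increase exceeds `threshold`.
fn is_regression(baseline: f64, new: f64, threshold: f64) -> bool {
    relative_change(baseline, new) > threshold
}
```

With a hypothetical threshold of 10%, a 6.7% increase is not flagged; lowering the threshold to 5% would flag it, which is the kind of tuning mentioned above.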

antiguru · 2023-07-12T14:39:06Z

I restructured the PR to have a clean revert at the beginning, then removal of mz_consolidate[_if], followed by MzArrange and a compaction nit change. I kicked off a feature benchmark run on the initial commit to show that there was a memory regression, which should have disappeared now.

https://buildkite.com/materialize/nightlies/builds/2800

antiguru · 2023-07-13T19:29:03Z

The memory regression test would have caught the increased memory utilization: https://buildkite.com/materialize/nightlies/builds/2800#01894a9a-869b-4320-bf7b-d33ed2ef0351

This is a great signal and means that once this PR passes the feature benchmark, we have much higher confidence that it doesn't cause the same regression again.

def- · 2023-07-17T09:59:58Z

This seems like it causes some additional arrangements:

[2023-07-17T09:53:11Z] mz-arrangement-sharing.td:804:1: error: non-matching rows: expected:
[2023-07-17T09:53:11Z] [["Arrange Timely(Reachability)"], ["ArrangeByKey Compute(DataflowCurrent)"], ["ArrangeByKey Compute(DataflowDependency)"], ["ArrangeByKey Compute(FrontierCurrent)"], ["ArrangeByKey Compute(FrontierDelay)"], ["ArrangeByKey Compute(ImportFrontierCurrent)"], ["ArrangeByKey Compute(PeekCurrent)"], ["ArrangeByKey Compute(PeekDuration)"], ["ArrangeByKey Differential(ArrangementBatches)"], ["ArrangeByKey Differential(ArrangementRecords)"], ["ArrangeByKey Differential(Sharing)"], ["ArrangeByKey Timely(Addresses)"], ["ArrangeByKey Timely(Channels)"], ["ArrangeByKey Timely(Elapsed)"], ["ArrangeByKey Timely(Histogram)"], ["ArrangeByKey Timely(MessagesReceived)"], ["ArrangeByKey Timely(MessagesSent)"], ["ArrangeByKey Timely(Operates)"], ["ArrangeByKey Timely(Parks)"]]
[2023-07-17T09:53:11Z] got:
[2023-07-17T09:53:11Z] [["Arrange Timely(Reachability)"], ["ArrangeByKey Compute(ArrangementHeapAllocations)"], ["ArrangeByKey Compute(ArrangementHeapCapacity)"], ["ArrangeByKey Compute(ArrangementHeapSize)"], ["ArrangeByKey Compute(DataflowCurrent)"], ["ArrangeByKey Compute(DataflowDependency)"], ["ArrangeByKey Compute(FrontierCurrent)"], ["ArrangeByKey Compute(FrontierDelay)"], ["ArrangeByKey Compute(ImportFrontierCurrent)"], ["ArrangeByKey Compute(PeekCurrent)"], ["ArrangeByKey Compute(PeekDuration)"], ["ArrangeByKey Compute(ShutdownDuration)"], ["ArrangeByKey Differential(ArrangementBatches)"], ["ArrangeByKey Differential(ArrangementRecords)"], ["ArrangeByKey Differential(Sharing)"], ["ArrangeByKey Timely(Addresses)"], ["ArrangeByKey Timely(Channels)"], ["ArrangeByKey Timely(Elapsed)"], ["ArrangeByKey Timely(Histogram)"], ["ArrangeByKey Timely(MessagesReceived)"], ["ArrangeByKey Timely(MessagesSent)"], ["ArrangeByKey Timely(Operates)"], ["ArrangeByKey Timely(Parks)"]]
[2023-07-17T09:53:11Z] Poor diff:
[2023-07-17T09:53:11Z] + "ArrangeByKey Compute(ArrangementHeapAllocations)"
[2023-07-17T09:53:11Z] + "ArrangeByKey Compute(ArrangementHeapCapacity)"
[2023-07-17T09:53:11Z] + "ArrangeByKey Compute(ArrangementHeapSize)"
[2023-07-17T09:53:11Z] + "ArrangeByKey Compute(ShutdownDuration)"

Seen in #20590. Is this expected?

Edit: This is in the introspection debugging views, so probably fine.

teskje · 2023-07-17T10:15:05Z

> Edit: This is in the introspection debugging views, so probably fine.

Just confirming that these are expected!

def-

All other mz_arrangement_sharings were the same as before. Please rebase this on main, otherwise we'll have merge skew with the mz_arrangement_sharing checking in testdrive.

ggnall · 2023-07-17T15:20:22Z

Linking https://github.com/MaterializeInc/database-issues/issues/6042

teskje

Using unsafe to protect against the memory leak might be considered an antipattern, since unsafe is meant to protect against memory safety violations and memory leaks are considered safe in Rust. But I don't think we have a better way to force callers to double-check, so this seems fine. We should, however, also make MzArranged::trace unsafe, I think.
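
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of an `unsafe fn` used as a "read the contract first" marker rather than for memory safety. The type `Guarded`, the function `use_trace`, and the exact wording of the contract are assumptions made for illustration, not the code in this PR.

```rust
/// Hypothetical wrapper; a `String` stands in for a real trace handle.
struct Guarded {
    trace: String,
}

impl Guarded {
    /// Returns the wrapped trace.
    ///
    /// # Safety
    ///
    /// Illustrative contract (an assumption of this sketch): the caller must
    /// be able to argue that its use of the trace cannot keep memory alive
    /// indefinitely. Nothing here is a memory-safety requirement.
    unsafe fn inner(&self) -> &str {
        &self.trace
    }
}

fn use_trace(guarded: &Guarded) -> usize {
    // Safety: the borrowed trace is only read and released before returning.
    let trace = unsafe { guarded.inner() };
    trace.len()
}
```

The compiler rejects a call to `inner` outside an `unsafe` block, so every call site has to spell out its argument in a `// Safety:` comment, which is the double-checking effect described above.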

The plumbing for the feature flag is quite horrible, but I suppose we can live with it since we don't plan to keep it around forever. Is there a follow-up issue for removing it after a while, so we don't forget?

src/compute/src/extensions/arrange.rs

teskje · 2023-07-18T10:43:14Z

-    Arranged {
-        trace: arranged.trace,
-        stream,
-    }


Just to check my understanding: We now throw away the output of the unary operator instead of passing through its input. That means we will perform additional copies of the stream updates, right?

If we don't use the output anymore, should we make the operator a sink?

Your understanding is correct. The problem is that we need to attach the logging operator to the stream potentially after someone else attached themselves to the stream already, so we need to clone anyways. The clones are cheap, because we send Rc<Batch>, so I'm not worried about this.
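
A small sketch of why cloning reference-counted batches is cheap: cloning an `Rc` only bumps a counter and shares the same allocation. The `Batch` type and `fan_out` helper are hypothetical stand-ins, not the differential-dataflow types.

```rust
use std::rc::Rc;

/// Hypothetical batch type standing in for an arrangement batch.
struct Batch {
    updates: Vec<(u64, i64)>,
}

/// Hands the same batch to `n` consumers by cloning the `Rc`, not the data.
fn fan_out(batch: &Rc<Batch>, n: usize) -> Vec<Rc<Batch>> {
    (0..n).map(|_| Rc::clone(batch)).collect()
}
```

Every handle returned by `fan_out` points at the same `Batch`; the `updates` vector is never copied.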

We can't turn the operator into a sink because sink doesn't give us access to the OperatorInfo, which we need to determine the operator's address to be able to activate it from the outside. We could rewrite it in terms of builder_rc.

I think unary is fine. Though should we maybe remove the output.session(&time).give_container(&mut buffer); to not make readers think that the output will be used?

src/compute-client/src/protocol/command.rs

src/compute/src/logging/reachability.rs

src/compute/src/logging/timely.rs

teskje · 2023-07-18T11:32:54Z

src/compute/src/render/join/linear_join.rs

 {
    // Reuseable allocation for unpacking.
    let mut datums = DatumVec::new();
    let mut row_builder = Row::default();

+    // Safety: all `join_impl`s holds on to the trace.


Suggested change

-    // Safety: all `join_impl`s holds on to the trace.
+    // Safety: all `join_impl`s hold on to the trace.

The linear join impls drop their trace handles when the corresponding streams reach the empty frontier. I assume that's allowed under the safety requirements?

Yes, that's the expected behavior. In this case, the trace should only contain a single empty batch.

antiguru force-pushed the revert-20124-backport_57.9_reverts branch 3 times, most recently from 1ba6c81 to 5d4f5c8 Compare July 11, 2023 14:46

antiguru marked this pull request as ready for review July 11, 2023 16:30

antiguru requested review from a team July 11, 2023 16:30

antiguru requested review from a team as code owners July 11, 2023 16:30

antiguru removed request for a team July 11, 2023 17:00

vmarcos reviewed Jul 12, 2023

antiguru force-pushed the revert-20124-backport_57.9_reverts branch from 5d4f5c8 to 4e94ec2 Compare July 12, 2023 14:17

antiguru force-pushed the revert-20124-backport_57.9_reverts branch 4 times, most recently from ba93001 to a723330 Compare July 14, 2023 15:55

teskje self-requested a review July 17, 2023 09:11

def- approved these changes Jul 17, 2023

antiguru force-pushed the revert-20124-backport_57.9_reverts branch 2 times, most recently from 8b895eb to e25d57f Compare July 17, 2023 11:59

antiguru and others added 2 commits July 18, 2023 11:10

Revert "Merge pull request #20124 from umanwizard/backport_57.9_reverts"

5032396

This reverts commit 086e49b, reversing changes made to c4a8c41.

Revert "Backport 0.57.9 reverts"

4c217ec

Signed-off-by: Moritz Hoffmann <[email protected]>

antiguru added 7 commits July 18, 2023 11:10

Track trace usage in Materialize through a custom wrapper type

7cefacf

MzArranged. This tracks whether any client uses the trace, and only then installs the arrangement size logging operator.

Signed-off-by: Moritz Hoffmann <[email protected]>

Arrangement size logging: allow compaction to the empty frontier (#20203)

7b19127

Add arrangement size logging feature flag

ebcc499

This commit adds the enable_arrangement_size_logging feature flag and wires it up such that clusters construct dataflows with or without arrangement size logging.

Signed-off-by: Moritz Hoffmann <[email protected]>

Update documentation

57f0173

Signed-off-by: Moritz Hoffmann <[email protected]>

Add safety argument to MzArranged::inner calls

f4a80c2

Signed-off-by: Moritz Hoffmann <[email protected]>

Fix testdrive

03e7e13

Signed-off-by: Moritz Hoffmann <[email protected]>

Enable arrangement size logging in tests

5dc6505

Signed-off-by: Moritz Hoffmann <[email protected]>

antiguru force-pushed the revert-20124-backport_57.9_reverts branch from e25d57f to 5dc6505 Compare July 18, 2023 09:11

teskje requested changes Jul 18, 2023

Address review comments

5db53dd

Signed-off-by: Moritz Hoffmann <[email protected]>

antiguru mentioned this pull request Jul 19, 2023

Revert "Merge pull request #20124 from umanwizard/backport_57.9_reverts" #20645

Closed

5 tasks

antiguru closed this Aug 11, 2023

antiguru deleted the revert-20124-backport_57.9_reverts branch December 22, 2023 20:33
