Revert "Backport 0.57.9 reverts" #20202
Conversation
Force-pushed from 1ba6c81 to 5d4f5c8.
The new approach looks reasonable to me, but I feel that the size and complexity of the change after the revert/backport/revert rounds is making reviewing more difficult here (at least for me). For example, there is a change below regarding AHash that appears to have been necessary before, but is no longer now?
Perhaps we should use complementary techniques to increase confidence, in addition to reviewing. A few suggestions to consider, depending on what is feasible:
- It would be good to get a full run of the feature benchmark for this PR, now that it includes memory measurements.
- It would also be interesting to see if this PR changes the results of testdrive: Check mz_arrangement_sharing #20180.
- It would be good to run this change in staging with a large-ish workload (e.g., the TPC-H load generator with updates and all 22 queries maintained as indexed views) and contrast memory requirements for the same without this change.
- Ask the QA team for what else they could throw at this.
I manually ran the feature benchmark with CustomerWorkload1, and it shows an increase in memory but doesn't report a regression. A change of 6.7% seems to be under the threshold for reporting regressions, so we might need to tune regression reporting on memory stats.
Force-pushed from 5d4f5c8 to 4e94ec2.
I restructured the PR to have a clean revert at the beginning, then removal of …
The memory regression test would have caught the increased memory utilization: https://buildkite.com/materialize/nightlies/builds/2800#01894a9a-869b-4320-bf7b-d33ed2ef0351 This is a great signal and means that once this PR passes the feature benchmark, we have much higher confidence that it doesn't cause the same regression again.
Force-pushed from ba93001 to a723330.
This seems like it causes some additional arrangements, seen in #20590. Is this expected? Edit: This is in the introspection debugging views, so probably fine.
Just confirming that these are expected!
All other mz_arrangement_sharing results were the same as before. Please rebase this onto main, otherwise we'll have merge skew with the mz_arrangement_sharing check in testdrive.
Force-pushed from 8b895eb to e25d57f.
Signed-off-by: Moritz Hoffmann <[email protected]>
MzArranged. This tracks whether any client uses the trace, and only then installs the arrangement size logging operator. Signed-off-by: Moritz Hoffmann <[email protected]>
This commit adds the enable_arrangement_size_logging feature flag and wires it up such that clusters construct dataflows with or without arrangement size logging. Signed-off-by: Moritz Hoffmann <[email protected]>
Force-pushed from e25d57f to 5dc6505.
Using `unsafe` to protect against the memory leak might be considered an antipattern, since `unsafe` is meant to protect against memory safety violations, and memory leaks are considered safe in Rust. But I don't think we have a better way to force callers to double-check, so this seems fine. We should, however, also make `MzArranged::trace` unsafe, I think.
The plumbing for the feature flag is quite horrible, but I suppose we can live with it since we don't plan to keep it around forever. Is there a follow-up issue for removing it after a while, so we don't forget?
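To illustrate the pattern discussed above, here is a minimal, hypothetical sketch (using `std::rc::Rc` as a stand-in for the real trace handle; the type and method names are assumptions, not the Materialize code) of marking an accessor `unsafe` to force callers to read and acknowledge a contract that is about leaks rather than memory safety:

```rust
use std::rc::Rc;

/// Hypothetical stand-in for the arrangement wrapper discussed above.
struct MzArranged {
    // Placeholder for the real trace handle type.
    trace: Rc<Vec<u64>>,
}

impl MzArranged {
    /// Hand out the trace handle.
    ///
    /// # Safety
    ///
    /// The caller must hold on to the returned handle for as long as the
    /// dataflow needs the trace; dropping it early can leak resources.
    /// This is not a memory-safety requirement (leaks are safe in Rust),
    /// which is why using `unsafe` here is debatable, but it forces the
    /// caller to opt in explicitly at every call site.
    unsafe fn trace(&self) -> Rc<Vec<u64>> {
        Rc::clone(&self.trace)
    }
}

fn main() {
    let arranged = MzArranged { trace: Rc::new(vec![1, 2, 3]) };
    // SAFETY: we keep `handle` alive for the rest of this scope (sketch).
    let handle = unsafe { arranged.trace() };
    // The wrapper and the caller now share ownership of the trace.
    assert_eq!(Rc::strong_count(&handle), 2);
}
```

The `unsafe` keyword carries no runtime cost here; its only effect is that callers cannot obtain the trace without writing an explicit `unsafe { … }` block, which is the "force callers to double-check" mechanism the comment describes.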
```rust
Arranged {
    trace: arranged.trace,
    stream,
}
```
Just to check my understanding: We now throw away the output of the `unary` operator instead of passing through its input. That means we will perform additional copies of the stream updates, right?
If we don't use the output anymore, should we make the operator a `sink`?
Your understanding is correct. The problem is that we need to attach the logging operator to the stream potentially after someone else attached themselves to the stream already, so we need to clone anyway. The clones are cheap, because we send `Rc<Batch>`, so I'm not worried about this.
We can't turn the operator into a `sink` because `sink` doesn't give us access to the `OperatorInfo`, which we need to determine the operator's address to be able to activate it from the outside. We could rewrite it in terms of `builder_rc`.
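The "clones are cheap" point can be shown in isolation (plain `std::rc::Rc`, not the actual timely stream types, which are assumptions here): cloning an `Rc`-wrapped batch copies a pointer and bumps a reference count, so the batch data itself is never duplicated no matter how many operators attach to the stream:

```rust
use std::rc::Rc;

fn main() {
    // Stand-in for a batch of updates; in the real dataflow this would
    // be something like Rc<Batch>.
    let batch: Rc<Vec<(String, i64)>> = Rc::new(vec![("key".into(), 1)]);

    // "Attaching" a second consumer (e.g. the logging operator) clones
    // the Rc handle, not the underlying batch data.
    let for_logging = Rc::clone(&batch);

    // Both handles point at the same allocation; only the refcount grew.
    assert!(Rc::ptr_eq(&batch, &for_logging));
    assert_eq!(Rc::strong_count(&batch), 2);
}
```

So the extra copies introduced by not passing through the input are copies of these small handles, which is why the cost is negligible.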
I think `unary` is fine. Though should we maybe remove the `output.session(&time).give_container(&mut buffer);` call, so readers don't think the output will be used?
```rust
{
    // Reusable allocation for unpacking.
    let mut datums = DatumVec::new();
    let mut row_builder = Row::default();
```
// Safety: all `join_impl`s holds on to the trace. |
Suggested change:
```diff
- // Safety: all `join_impl`s holds on to the trace.
+ // Safety: all `join_impl`s hold on to the trace.
```
The linear join impls drop their trace handles when the corresponding streams reach the empty frontier. I assume that's allowed under the safety requirements?
Yes, that's the expected behavior. In this case, the trace should only contain a single empty batch.
Reverts #20124.
Only merge after TimelyDataflow/differential-dataflow#398 is fixed. We opted for a solution purely within Materialize to address the issue. Best viewed commit-by-commit:
- `MzArranged` wrapper to track trace usage. This resolves holding on to the trace, even in situations other than consolidate.
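The core idea of tracking trace usage can be sketched with `std::rc::Weak`: hand out strong handles to clients, and only install the size-logging operator if some client actually kept the trace alive. This is a minimal, hypothetical illustration (the `TraceTracker` name and structure are assumptions; the real `MzArranged` in the Materialize codebase differs in detail):

```rust
use std::rc::{Rc, Weak};

/// Hypothetical tracker: observes whether any client still holds a
/// strong handle to the trace, without keeping the trace alive itself.
struct TraceTracker<T> {
    trace: Weak<T>,
}

impl<T> TraceTracker<T> {
    fn new(trace: &Rc<T>) -> Self {
        TraceTracker { trace: Rc::downgrade(trace) }
    }

    /// Only install the arrangement-size logging operator if a client
    /// actually uses the trace; otherwise skip the extra dataflow work.
    fn should_install_logging(&self) -> bool {
        self.trace.upgrade().is_some()
    }
}

fn main() {
    let trace = Rc::new(vec![0u8; 16]); // stand-in for a trace
    let tracker = TraceTracker::new(&trace);
    assert!(tracker.should_install_logging()); // a client holds the trace

    drop(trace); // last client drops its handle
    assert!(!tracker.should_install_logging()); // nothing to log for
}
```

Because the tracker holds only a `Weak` reference, it never extends the trace's lifetime, which mirrors the goal of the wrapper: resolve holding on to the trace when no client needs it.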