Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt Materialize to use the columnated merge batcher #23121

Merged
merged 2 commits into from
Nov 20, 2023

Conversation

antiguru
Copy link
Member

@antiguru antiguru commented Nov 10, 2023

cc @frankmcsherry

Update our dependency on Differential to use the columnated merge batcher. This allows us to move some of the allocations during hydration to lgalloc.

As it's not obvious: The ColKeyBatch in differential changes such that its merge batcher will use columnated data instead of vector-allocated data. This doesn't show up in this PR because we still depend on the same type definition. For more details, see TimelyDataflow/differential-dataflow#418

Motivation

Part of MaterializeInc/database-issues#6848.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@antiguru antiguru force-pushed the columnated_merge_batcher branch 7 times, most recently from dcc6f97 to d21baad Compare November 17, 2023 20:41
@antiguru antiguru force-pushed the columnated_merge_batcher branch from d21baad to 7a99184 Compare November 20, 2023 14:27
@antiguru antiguru marked this pull request as ready for review November 20, 2023 14:28
@antiguru antiguru requested review from a team, umanwizard, vmarcos and frankmcsherry November 20, 2023 14:28
Copy link
Contributor

@vmarcos vmarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me; I included a few questions on details below.

src/compute/src/render/reduce.rs Outdated Show resolved Hide resolved
Comment on lines +137 to +138
T: Timestamp + Lattice + Columnation,
S::Timestamp: Lattice + Refines<T> + Columnation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not completely clear to me whether these trait bounds are absolutely required here? We are forcing CollectionBundle to require Columnation on its timestamp types, but we could require these bounds only in the implementation including ensure_collections?

Another way to phrase the question more conceptually: Do we want to ensure Columnation for timestamp types in connection with arrangement usage only or across all contexts (arrangements, dataflow edges) in this PR? The difference is not large now in that the timestamps used in arrangements and the ones in dataflow edges are today the same, but changing the requirements expresses an opinion that they should at least share some more behavior than we've required so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we need the bound on CollectionBundle because ArrangementFlavor needs it. We use both T and S::Timestamp in the handles, so the Columnation bound propagates everywhere.

Comment on lines +636 to +637
T: Timestamp + Lattice + Columnation,
S::Timestamp: Lattice + Refines<T> + Columnation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, even if we'd like Columnation in the definition of CollectionBundle, then we'd need to require the bound for the scope timestamp type S::Timestamp here, but not for the anchor timestamp type T?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needed:

error[E0277]: the trait bound `T: Columnation` is not satisfied
   --> src/compute/src/render/context.rs:372:13
    |
372 |     RowUnit(KeyValArrangementImport<S, Row, (), T>),
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Columnation` is not implemented for `T`
    |
    = note: required for `TStack<((mz_repr::Row, ()), T, i64), Overflowing<u32>>` to implement `differential_dataflow::trace::implementations::Layout`
    = note: required for `Spine<Rc<OrdValBatch<TStack<((mz_repr::Row, ()), T, i64), Overflowing<u32>>, TimelyStack<((mz_repr::Row, ()), T, i64)>>>>` to implement `TraceReader`
help: consider further restricting this bound
    |
369 |     T: Timestamp + Lattice + timely::container::columnation::Columnation,
    |                            +++++++++++++++++++++++++++++++++++++++++++++

@antiguru antiguru force-pushed the columnated_merge_batcher branch from 7a99184 to 91e092d Compare November 20, 2023 20:43
@antiguru antiguru enabled auto-merge November 20, 2023 20:44
@antiguru
Copy link
Member Author

Thanks for the reviews!

@antiguru antiguru merged commit c44fb2f into MaterializeInc:main Nov 20, 2023
72 checks passed
@antiguru antiguru deleted the columnated_merge_batcher branch November 20, 2023 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants