compute: simplify mz_join_core result computation #19667

teskje · 2023-06-02T11:24:46Z

Previously, the mz_join_core code wrapped the result closure into another work_result closure, to deal with the fact that matches from the todo1 list had their value and diff fields swapped.

Instead of having a closure to unswap the fields, we can simply pass them to the Deferred constructor in the correct order instead. That is, the Deferred constructor should always receive the cursor/storage from input 1 first and from input 2 second. If we make this change, no work_result closure is necessary and the code become easier to reason about.

Motivation

This PR refactors existing code.

This is something I stumbled over while working on support for a fueled merge join strategy.

Tips for reviewer

Part of this PR is renaming the fields of Deferred to reflect that (first, second) shouldn't be (trace, batch) but (input 1, input 2). The rest of the PR removes the work_result wrapper closure by inlining its logic into Deferred::work.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:
- N/A

vmarcos

Agreed from my side, thanks!

teskje · 2023-06-05T15:54:02Z

On request from @antiguru I opened a backport PR for DD as well: TimelyDataflow/differential-dataflow#392

Previously, the mz_join_core code wrapped the `result` closure into another `work_result` closure, to deal with the fact that matches from the `todo1` list had their value and diff fields swapped. Instead of having a closure to unswap the fields, we can simply pass them to the `Deferred` constructor in the correct order instead. That is, the `Deferred` constructor should always receive the cursor/storage from input 1 first and from input 2 second. If we make this change, no `work_result` closure is necessary and the code become easier to reason about.

antiguru

Thanks, I think this mostly looks good. One thing I'm not 100% sure about is renaming trace/batch to cursor 1/2. Can you confirm that they're used symmetrically, i.e., we could swap the arguments and produce the same result?

teskje · 2023-06-12T13:52:56Z

Here are the things that changes for new batches received from input 1 (nothing changes for batches from input 2):

We iterate over the batch in the outer loop and the trace in the inner loop. Previously that was flipped. That only affects the order in which outputs are produced, but we don't care about that (and the subsequent consolidation should sort anyway).
For an update we produce the time as (batch_t (join) meet) (join) trace_t whereas previously it was (trace_t (join) meet) (join) batch_t. AFAICT, join is associative and commutative, so we get the same results.
For an update we produce the diff as batch_diff.multiply(trace_diff), whereas previously it was trace_diff.multipy(batch_diff). Nothing states that multiply is a commutative operation, so we might get differences here. However, I would argue that it is more correct to always multiply input 1 and input 2 in the same order, rather than switching the order around depending on when updates arrive on the inputs.

teskje marked this pull request as ready for review June 2, 2023 12:29

teskje requested review from a team, antiguru and vmarcos June 2, 2023 12:29

vmarcos approved these changes Jun 5, 2023

View reviewed changes

teskje mentioned this pull request Jun 5, 2023

join: simplify logic passed to Deferred::work TimelyDataflow/differential-dataflow#392

Open

teskje force-pushed the simplify-mz_join_core-result branch from 870de8c to 8b6451b Compare June 5, 2023 15:56

antiguru approved these changes Jun 12, 2023

View reviewed changes

teskje merged commit 3c33bef into MaterializeInc:main Jun 14, 2023

teskje deleted the simplify-mz_join_core-result branch June 14, 2023 08:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compute: simplify mz_join_core result computation #19667

compute: simplify mz_join_core result computation #19667

teskje commented Jun 2, 2023 •

edited

Loading

vmarcos left a comment

teskje commented Jun 5, 2023

antiguru left a comment

teskje commented Jun 12, 2023

compute: simplify mz_join_core result computation #19667

compute: simplify mz_join_core result computation #19667

Conversation

teskje commented Jun 2, 2023 • edited Loading

Motivation

Tips for reviewer

Checklist

vmarcos left a comment

Choose a reason for hiding this comment

teskje commented Jun 5, 2023

antiguru left a comment

Choose a reason for hiding this comment

teskje commented Jun 12, 2023

teskje commented Jun 2, 2023 •

edited

Loading