Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient merging with empty batches #407

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

teskje
Copy link
Contributor

@teskje teskje commented Aug 28, 2023

This PR makes merging of batches with empty batches efficient. Previously this operation would be handled like a regular merge, by allocating a new batch and merging the input batches into it. However, if one of the batches is empty, there is no need to allocate a new batch, we can simply extend the bounds of the existing non-empty batch accordingly.

The implementation of this optimization is complicated by the fact that there are Batch implementations for Rc<Batch> and Abomonated<Batch>. Both of these only provide an immutable reference to the underlying batch, making it impossible to adjust the batch description to extend the batch's bounds. This commit solves this by introducing new RcBatch and AbomonatedBatch wrappers that, in addition to the Rc/Abomonated batch, hold their own Descriptions that override the inner batch's description.

Some adjustments to logging of merge events is also necessary, to account for the new merge path.

Fixes https://github.com/MaterializeInc/database-issues/issues/6368.

This commit makes merging of batches with empty batches efficient.
Previously this operation would be handled like a regular merge, by
allocating a new batch and merging the input batches into it. However,
if one of the batches is empty, there is no need to allocate a new
batch, we can simply extend the bounds of the existing non-empty batch
accordingly.

The implementation of this optimization is complicated by the fact that
there are `Batch` implementations for `Rc<Batch>` and
`Abomonated<Batch>`. Both of these only provide an immutable reference
to the underlying batch, making it impossible to adjust the batch
description to extend the batch's bounds. This commit solves this by
introducing new `RcBatch` and `AbomonatedBatch` wrappers that, in
addition to the `Rc`/`Abomonated` batch, hold their own `Description`s
that override the inner batch's description.
This commit makes sure completed merges are logged also for merges that
used the empty batch optimization.

Also, if we log the dropping of a merged batch, we shouldn't forget to
also log the merge event, so consumers don't become confused about the
whereabouts of the input batches.
@teskje
Copy link
Contributor Author

teskje commented Aug 28, 2023

Materialize CI is happy with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant