Another attempt at introducing a columnated merge batcher #418
Conversation
Force-pushed from 6eef06b to e232749.
Signed-off-by: Moritz Hoffmann <[email protected]>
Force-pushed from fe34a22 to fe33ecd.
@@ -163,11 +161,13 @@ impl<T: Columnation> RetainFrom<T> for TimelyStack<T> {

 /// An immutable collection of update tuples, from a contiguous interval of logical times.
 #[derive(Abomonation)]
-pub struct OrdValBatch<L: Layout> {
+pub struct OrdValBatch<L: Layout, C> {
I'm reading this, and .. while I can probably figure it out, I have no clue what `C` is meant to be. Would you be willing to add some doc text about it? It's a bit awkward that it plays no role for some time, leaving no hints about what it is.
I've read a bit more, and .. it seems like this is mostly a newtype hack? Like, `C` could be `bool` and `String` and line up with the `Vec` and `TimelyStack` merge batchers, if the `impl`s said that. It feels a bit weird!
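The "newtype hack" described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: all names (`VecMarker`, `StackMarker`, `MergeStrategy`) are invented. The point is that a type parameter can appear only in `PhantomData` and still steer which `impl` applies.

```rust
use std::marker::PhantomData;

// Marker types that carry no data; they exist only to select an impl.
struct VecMarker;
struct StackMarker;

// The extra parameter `C` plays no role in the stored data itself.
struct Batch<L, C> {
    data: Vec<L>,
    _marker: PhantomData<C>,
}

trait MergeStrategy {
    fn name() -> &'static str;
}

impl MergeStrategy for VecMarker {
    fn name() -> &'static str { "vec merge batcher" }
}

impl MergeStrategy for StackMarker {
    fn name() -> &'static str { "columnar merge batcher" }
}

impl<L, C: MergeStrategy> Batch<L, C> {
    // The behavior is dispatched entirely through the marker `C`.
    fn strategy(&self) -> &'static str {
        C::name()
    }
}

fn main() {
    let b: Batch<u64, VecMarker> = Batch { data: vec![1, 2, 3], _marker: PhantomData };
    println!("{}", b.strategy()); // prints "vec merge batcher"
}
```

As the comment notes, nothing ties `C` to a meaningful type here; any type with a `MergeStrategy` impl would do, which is the source of the awkwardness.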
I think we could make the merge batcher part of the layout and it would resolve all these issues. But it also feels odd because it's not really part of the layout... What I can't wrap my head around is how to get from an update container on the layout to a merge batcher implementation.
Nope, I can't see how to make it part of the layout. We don't know the `Batch` yet, and the batcher needs to be parametrized with `Self`, a.k.a. the batch type :/
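The circularity described above can be made concrete with a minimal sketch (trait and type names here are invented for illustration, not the crate's actual traits): the batch trait wants to name its batcher as an associated type, while the batcher trait is parameterized by the batch it produces, so each definition mentions the other.

```rust
// A batcher is parameterized by the batch type it eventually produces.
trait Batcher<B: Batch> {
    fn seal(self) -> B;
}

// A batch names its batcher, which in turn must mention `Self`:
// neither trait can be stated without referring to the other.
trait Batch: Sized {
    type MyBatcher: Batcher<Self>;
}

// A trivial pair of implementations closing the loop.
struct MyBatch(Vec<u64>);
struct MyBatcher(Vec<u64>);

impl Batcher<MyBatch> for MyBatcher {
    fn seal(self) -> MyBatch { MyBatch(self.0) }
}

impl Batch for MyBatch {
    type MyBatcher = MyBatcher;
}

fn main() {
    let batch = MyBatcher(vec![1, 2]).seal();
    println!("{}", batch.0.len()); // prints "2"
}
```

Rust tolerates the mutual reference between the two traits, but it explains why the batcher cannot simply be folded into a layout that is defined before any batch type exists.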
Signed-off-by: Moritz Hoffmann <[email protected]>
Force-pushed from 487b738 to cbd1dad.
I left several comments, only a few of which are "important" to chase down; most are either questions or stylistic nits. The main one I'd like to figure out is whether you want to double the capacity of `pending` to ensure you ship full containers. That seems relatively impactful. Several other discussion points that we can chase down here or 1:1.
keep.copy(datum);
} else {
    builder.push((key.clone(), val.clone(), time.clone(), diff.clone()));
Note to self that we should add a `copy` method to the builder trait.
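One way the suggested `copy` method might look, sketched as a hypothetical trait (the shape is invented; the real builder trait in the crate differs): the builder accepts a borrowed update, with a default implementation that clones, so that columnar builders can later override it to copy directly into their regions.

```rust
// Hypothetical builder trait; (key, val, time, diff) update tuples.
trait Builder<K, V, T, R> {
    fn push(&mut self, element: (K, V, T, R));

    /// Copy a borrowed update into the builder so the caller need not clone.
    fn copy(&mut self, element: (&K, &V, &T, &R))
    where
        K: Clone, V: Clone, T: Clone, R: Clone,
    {
        // Default falls back to cloning each field; a columnar builder
        // could override this to copy bytes into its regions instead.
        self.push((
            element.0.clone(),
            element.1.clone(),
            element.2.clone(),
            element.3.clone(),
        ));
    }
}

// A trivial Vec-backed builder using the default `copy`.
struct VecBuilder<K, V, T, R>(Vec<(K, V, T, R)>);

impl<K, V, T, R> Builder<K, V, T, R> for VecBuilder<K, V, T, R> {
    fn push(&mut self, element: (K, V, T, R)) {
        self.0.push(element);
    }
}

fn main() {
    let mut b = VecBuilder(Vec::new());
    b.copy((&"k".to_string(), &1u64, &0u64, &1i64));
    println!("{}", b.0.len()); // prints "1"
}
```

This would let the `else` branch above call `builder.copy((key, val, time, diff))` instead of cloning at every call site.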
}
}

struct TimelyStackQueue<T: Columnation> {
The name is weird. :D At least, a stack and a queue are different things. I guess this is a queue implemented over a stack, though.
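The "queue implemented over a stack" reading can be sketched generically (names invented; the real type wraps a `TimelyStack<T>` rather than a `Vec<T>`): the underlying container is append-only and never shrinks, and the queue is just a cursor that walks it front to back.

```rust
// A FIFO view over an immutable, stack-like container: elements are
// never removed, a head index simply advances past consumed items.
struct ListQueue<T> {
    list: Vec<T>,
    head: usize,
}

impl<T> ListQueue<T> {
    fn from(list: Vec<T>) -> Self {
        Self { list, head: 0 }
    }

    /// Yield a reference to the next unconsumed element, if any.
    fn pop(&mut self) -> Option<&T> {
        if self.head < self.list.len() {
            self.head += 1;
            self.list.get(self.head - 1)
        } else {
            None
        }
    }
}

fn main() {
    let mut q = ListQueue::from(vec![10, 20, 30]);
    assert_eq!(q.pop(), Some(&10));
    assert_eq!(q.pop(), Some(&20));
    println!("{:?}", q.pop()); // prints "Some(30)"
}
```

Handing out references instead of owned values matters for the columnar case, since elements of a `TimelyStack` cannot be moved out.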
/// Buffer size (number of elements) to use for new/empty buffers.
const fn buffer_size() -> usize {
    let size = std::mem::size_of::<(D, T, R)>();
    if size == 0 {
        Self::BUFFER_SIZE_BYTES
    } else if size <= Self::BUFFER_SIZE_BYTES {
        Self::BUFFER_SIZE_BYTES / size
    } else {
        1
    }
}
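For context, the sizing rule above can be exercised standalone. The byte budget below is an assumption for illustration; it is not necessarily the constant the PR uses.

```rust
// Assumed byte budget per buffer; the PR's BUFFER_SIZE_BYTES may differ.
const BUFFER_SIZE_BYTES: usize = 1 << 13; // 8 KiB

// Free-function version of the method above: derive the element count
// for a buffer of (D, T, R) tuples from a fixed byte budget.
const fn buffer_size<D, T, R>() -> usize {
    let size = std::mem::size_of::<(D, T, R)>();
    if size == 0 {
        BUFFER_SIZE_BYTES // zero-sized tuples: cap the element count
    } else if size <= BUFFER_SIZE_BYTES {
        BUFFER_SIZE_BYTES / size // as many tuples as fit in the budget
    } else {
        1 // a single tuple already exceeds the budget
    }
}

fn main() {
    // (u64, u64, i64) occupies 24 bytes, so 8192 / 24 = 341 elements.
    println!("{}", buffer_size::<u64, u64, i64>()); // prints "341"
}
```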
I think this is closely coupled to constants you've picked over in `lg_alloc`. It's probably wise to jot that down, in case someone changes that constant and then doesn't understand why nothing pages anymore.
Also, and worth discussing: I think this size has roughly nothing to do with what `lg_alloc` will have to absorb? It indicates how large a `Vec<(D, T, R)>` will be, but the container will instead hold on to the spilled data, which can be arbitrarily larger (e.g. if `D` is `mz_repr::Row`), or arbitrarily smaller. It might be a good set-up for when you are able to back the `Vec` itself in `lg_alloc`, but .. is it really doing the right thing until then?

Seems not terrible ofc, but idk if we are sure that the backing allocations will be able to be paged out using this logic. Perhaps it's fine, in that it ensures that we take a large enough bite that the `Vec` would be that large, and if the alloc is smaller then we have at most 2x the `Vec` in memory (it plus a bit less in the region, in the worst case), which is decent insurance to provide (vs a smaller `Vec` which might have proportionally more in the region, and still not spill out).
I added a `pending_buffer_size` function that returns `2*Self::buffer_size()` so that pending contains enough data to fill a whole buffer. I need to test the performance impact of this, but it seems like the right change to do.

I have another question too: could the merger be for arbitrary …
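The doubling described above can be sketched as follows. Constants and the free-function shape are assumptions for illustration (the real code works off `size_of::<(D, T, R)>()` and an associated constant): `pending` holds twice the normal buffer budget, so draining it can always fill at least one complete output container.

```rust
// Assumed byte budget per buffer, for illustration only.
const BUFFER_SIZE_BYTES: usize = 1 << 13; // 8 KiB

// Element count for a regular buffer, given the size of one tuple.
const fn buffer_size(element_bytes: usize) -> usize {
    if element_bytes == 0 {
        BUFFER_SIZE_BYTES
    } else if element_bytes <= BUFFER_SIZE_BYTES {
        BUFFER_SIZE_BYTES / element_bytes
    } else {
        1
    }
}

/// Capacity for the `pending` staging buffer: double the normal buffer,
/// so flushing `pending` yields at least one full container.
const fn pending_buffer_size(element_bytes: usize) -> usize {
    2 * buffer_size(element_bytes)
}

fn main() {
    // 24-byte tuples: regular buffers hold 341, pending holds 682.
    println!("{}", pending_buffer_size(24)); // prints "682"
}
```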
Signed-off-by: Moritz Hoffmann <[email protected]>
I'd prefer a single general implementation, but it's not easy to get there:
Signed-off-by: Moritz Hoffmann <[email protected]>
Force-pushed from 0a68e4f to 8fb8e61.
Looks good, and let's merge!
This builds on #414 but uses the new Layout feature recently merged.