-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log frontiers in progress tracking #539
base: master
Are you sure you want to change the base?
Log frontiers in progress tracking #539
Conversation
This commit makes the reachability tracker directly return changes to implications from `propagate_all`, rather than stashing them and expecting callers to access them through a separate `pushed` method. This API changes allows `propagate_all` to assume that the caller will consume the changes before the next call to `propagate_all`, which in turn guarantees that `pushed_changes` only contains changes from the last `propagate_all` call, rather than possibly changes from previous calls as well.
69285e3
to
a92df78
Compare
This commit changes the shape of `TrackerEvent` to include the source/target distinction in a `Location` rather than separate enum variants. Apart from less boilerplate, this speeds up the frontier logging in `propagate_all` as we can directly use the contents of `pushed_changes`, instead of having to split them into target and source changes. A side effect of this is that the definition of `ProgressEventTimestampVec` changes, which also affects the `TimelyProgressEvent` type.
Benchmark ResultsTo test the performance impact of this change, I ran The test was performed in two configurations:
Below are the results of 10 runs each, for both of the configurations against the master branch and each of the three commits in this PR: From the results we can see that in this benchmark the additional logging has no negative effect on runtime if logging is disabled, and has a small negative effect when logging is enabled. The latter is expected. The former is maybe expected, though there is some weirdness here in that the results show that on master disabling logging actually slows Timely down! I'm not sure why this is. This result occurs consistently when the benchmark is repeated at different times, so it doesn't seem to be caused by noise in the system. Maybe some edge case related to CPU caching? |
I wired operator frontier logging up in Mz to verify that this is useful for tracking hydration progress, and it seems to be! With a simple cross join dataflow: CREATE TABLE t (a int);
INSERT INTO t SELECT generate_series(1, 10000);
CREATE VIEW v AS SELECT t1.a + t2.a FROM t t1, t t2 WHERE t1.a < 1000000 AND t2.a > 0;
CREATE DEFAULT INDEX ON v; We get this frontier output: materialize=> SELECT id, min(time) as time
FROM mz_internal.mz_dataflow_operator_frontiers
JOIN mz_internal.mz_dataflow_addresses USING (address)
JOIN mz_internal.mz_dataflow_operators USING (id)
WHERE
time IS NOT NULL AND
list_length(address) = 3
GROUP BY id
ORDER BY time DESC, id ASC;
id | time
------+---------------
8587 | 1701260559660
8592 | 1701260558660
8594 | 1701260558660
-- ...
8636 | 1701260558660
8658 | 1701260383161
8659 | 1701260383161
8661 | 1701260383161
-- ...
8674 | 1701260383161
(31 rows) ... which implicates operator |
This PR adds logging of frontiers to the reachability tracker. This is in support of hydration progress tracking in Materialize, which would like to know the per-operator frontiers to know which ones have finished processing their snapshot inputs.
Look at the commit messages for more details. The actual logging change is in the second commit, commit 1 and 2 are supporting changes that try to limit the performance impact of the additional logging.
The performance impact of this change should be negligible (or even negative!) when reachability logging is disabled and small when reachability logging is enabled. I'm including microbenchmark results in a comment below.
The reason that commit 3 makes the code more performant is, I think, that is makes it possible to allocate the vec that's passed to
log_frontier_updates
with the correct size upfront, rather than having to geometrically grow two vecs of unknown final size. We can probably get a similar effect without changing howTrackerEvent
looks, by counting the number of target and source changes while pushing topushed_changes
. I like theTrackerEvent
change because it removes some duplication, but it's not a required one.