chore: refactor BuildProbeJoinMetrics to use BaselineMetrics
#16500
base: main
Thank you! I left some suggestions, looking forward to your thoughts.
```diff
@@ -632,7 +632,7 @@ impl<T: BatchTransformer> CrossJoinStream<T> {
         }

         self.join_metrics.output_batches.add(1);
-        self.join_metrics.output_rows.add(batch.num_rows());
+        self.join_metrics.baseline.record_output(batch.num_rows());
```
I suggest using `BaselineMetrics::record_poll` (`pub fn record_poll(`) to keep the implementation consistent with other operators. The same applies to the other join operators in this PR.
I have updated it to use `record_poll`. Could you check whether that's what you were looking for? It's my first time working with this codebase, so I'm a little unsure whether I did it right.
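For readers following the thread, here is a minimal sketch of the pattern being suggested, assuming `record_poll` behaves roughly as follows: it inspects a `Poll<Option<Result<_>>>`, counts the rows of each ready batch, treats an error or end of stream as "done", and hands the poll back unchanged so it can wrap an operator's `poll_next` result. `Batch` and `MiniBaselineMetrics` are hypothetical stand-ins, not DataFusion types; the real `BaselineMetrics` also tracks timing and other details not modelled here.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::task::Poll;

/// Hypothetical stand-in for an Arrow `RecordBatch`: only the row count matters here.
#[derive(Debug)]
pub struct Batch {
    pub rows: usize,
}

/// Hypothetical, simplified model of a baseline-metrics struct (NOT DataFusion's type):
/// an output-row counter plus an end-of-stream flag.
#[derive(Default)]
pub struct MiniBaselineMetrics {
    output_rows: AtomicUsize,
    done: AtomicBool,
}

impl MiniBaselineMetrics {
    /// Add `n` rows to the output counter.
    pub fn record_output(&self, n: usize) {
        self.output_rows.fetch_add(n, Ordering::Relaxed);
    }

    /// Mark the stream as finished.
    pub fn done(&self) {
        self.done.store(true, Ordering::Relaxed);
    }

    /// Update the metrics from a poll result and return the poll unchanged,
    /// so an operator can end `poll_next` with `metrics.record_poll(poll)`.
    pub fn record_poll<E>(
        &self,
        poll: Poll<Option<Result<Batch, E>>>,
    ) -> Poll<Option<Result<Batch, E>>> {
        if let Poll::Ready(maybe_batch) = &poll {
            match maybe_batch {
                // A ready batch: count its rows.
                Some(Ok(batch)) => self.record_output(batch.rows),
                // An error or the end of the stream both finish the stream.
                Some(Err(_)) | None => self.done(),
            }
        }
        // `Poll::Pending` falls through untouched.
        poll
    }

    pub fn output_rows(&self) -> usize {
        self.output_rows.load(Ordering::Relaxed)
    }

    pub fn is_done(&self) -> bool {
        self.done.load(Ordering::Relaxed)
    }
}
```

Centralizing the bookkeeping in one call at the end of `poll_next` (something like `self.join_metrics.baseline.record_poll(poll)`, field path assumed) is what keeps every operator counting output rows the same way, instead of each join incrementing a counter by hand.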
pub(crate) output_rows: metrics::Count, | ||
} | ||
|
||
impl Drop for BuildProbeJoinMetrics { |
I don't think it's the best way, but I guess it's okay to merge 🤔 If we forget to count a specific time period in some join operator, we can fix it in the future.
Could you also add a brief comment to explain this drop?
I agree. This is not intuitive at all. When I came back to this PR to act on your suggestions, it took me a while to remember how this works :)
I have added a comment for now. I would be open to other ways of handling this!
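Since the thread notes that the `Drop` approach is not intuitive, here is a simplified model of the idea (not the PR's actual code): when the join's metrics struct is dropped, its build and probe timers are folded into `elapsed_compute` exactly once. `NanosTimer` and `JoinMetricsSketch` are hypothetical; the timers are shared through `Arc` so the totals stay readable after the struct is dropped, loosely mirroring how operator metrics live on in a shared metrics set.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a time metric: accumulated nanoseconds.
#[derive(Default, Debug)]
pub struct NanosTimer(AtomicUsize);

impl NanosTimer {
    pub fn add_nanos(&self, n: usize) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }

    pub fn value(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical, simplified join metrics: per-phase timers plus the
/// baseline `elapsed_compute` timer that reports the operator's total.
pub struct JoinMetricsSketch {
    pub build_time: Arc<NanosTimer>,
    pub join_time: Arc<NanosTimer>,
    pub elapsed_compute: Arc<NanosTimer>,
}

impl Drop for JoinMetricsSketch {
    // Runs once, when the metrics struct goes out of scope (i.e. after the
    // join has finished): elapsed_compute += build_time + join_time.
    fn drop(&mut self) {
        self.elapsed_compute
            .add_nanos(self.build_time.value() + self.join_time.value());
    }
}
```

This also shows the reviewer's caveat: `elapsed_compute` only includes time that was recorded in `build_time` or `join_time`, and in this sketch it stays at zero until the metrics struct is dropped.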
Which issue does this PR close?

Closes #16495

Rationale for this change

What changes are included in this PR?

Add a `BaselineMetrics` to `BuildProbeJoinMetrics` and remove its `output_rows` field. The baseline's `elapsed_compute` is populated in the `Drop` implementation by summing `join_time` and `build_time`.

Are these changes tested?

Here's an example of an `explain analyze` of a hash join showing these metrics:

```
[(WatchID@0, WatchID@0)], metrics=[output_rows=100, elapsed_compute=2.313624ms, build_input_batches=1, build_input_rows=100, input_batches=1, input_rows=100, output_batches=1, build_mem_used=3688, build_time=865.832µs, join_time=1.369875ms]
```

Notice `output_rows=100, elapsed_compute=2.313624ms` in the above.

Are there any user-facing changes?

No