chore: refactor BuildProbeJoinMetrics to use BaselineMetrics #16500

Open: Samyak2 wants to merge 1 commit into base: main

@Samyak2 Samyak2 commented Jun 22, 2025

Which issue does this PR close?

Closes #16495

Rationale for this change

What changes are included in this PR?

Add a `BaselineMetrics` field to `BuildProbeJoinMetrics` and remove its standalone `output_rows` counter.

  • The `elapsed_compute` in the baseline metrics is populated in the `Drop` trait implementation by summing `join_time` and `build_time`.
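The Drop-based approach described above can be sketched roughly as follows. This is a std-only illustration, not the PR's code: `Time`, `JoinMetrics`, and `elapsed_after_drop` are hypothetical stand-ins, and the real DataFusion metric types may behave differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a shared timer metric, storing nanoseconds.
#[derive(Clone, Default)]
struct Time(Arc<AtomicUsize>);

impl Time {
    fn add_nanos(&self, nanos: usize) {
        self.0.fetch_add(nanos, Ordering::Relaxed);
    }

    fn value(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical join metrics: two stage timers plus a shared total.
struct JoinMetrics {
    elapsed_compute: Time,
    build_time: Time,
    join_time: Time,
}

impl Drop for JoinMetrics {
    /// Fold the per-stage timers into the total exactly once,
    /// when the metrics struct goes out of scope.
    fn drop(&mut self) {
        let total = self.build_time.value() + self.join_time.value();
        self.elapsed_compute.add_nanos(total);
    }
}

/// Records the two stage times, drops the metrics, and returns the total.
fn elapsed_after_drop(build_nanos: usize, join_nanos: usize) -> usize {
    let elapsed = Time::default();
    {
        let metrics = JoinMetrics {
            elapsed_compute: elapsed.clone(),
            build_time: Time::default(),
            join_time: Time::default(),
        };
        metrics.build_time.add_nanos(build_nanos);
        metrics.join_time.add_nanos(join_nanos);
        // Nothing has been folded into the total yet.
        assert_eq!(elapsed.value(), 0);
    } // `metrics` is dropped here, which populates the total.
    elapsed.value()
}

fn main() {
    println!("elapsed_compute = {} ns", elapsed_after_drop(300, 700));
}
```

One consequence of this design is that the total stays at zero until the metrics struct is dropped, so any time not covered by one of the stage timers is never counted.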

Are these changes tested?

Here's an example of an `explain analyze` of a hash join showing these metrics:

[(WatchID@0, WatchID@0)], metrics=[output_rows=100, elapsed_compute=2.313624ms, build_input_batches=1, build_input_rows=100, input_batches=1, input_rows=100, output_batches=1, build_mem_used=3688, build_time=865.832µs, join_time=1.369875ms]

Notice `output_rows=100, elapsed_compute=2.313624ms` in the above.

Are there any user-facing changes?

No

@github-actions (bot) added the physical-plan label ("Changes to the physical-plan crate") on Jun 22, 2025
Contributor

@2010YOUY01 2010YOUY01 left a comment

Thank you! I left some suggestions, looking forward to your thoughts.

@@ -632,7 +632,7 @@ impl<T: BatchTransformer> CrossJoinStream<T> {
}

self.join_metrics.output_batches.add(1);
- self.join_metrics.output_rows.add(batch.num_rows());
+ self.join_metrics.baseline.record_output(batch.num_rows());
Contributor

I suggest using


to keep the implementation consistent with other operators, and the same for other join operators in this PR.

Author

I have updated it to use `record_poll`. Could you check if that's what you were looking for? It's my first time with this codebase, so I'm a little unsure if I did the right thing.
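For readers unfamiliar with the `record_poll` pattern mentioned here, the rough idea is to pass each poll result through a helper that counts the rows of any ready batch and returns the poll unchanged. The sketch below is std-only and loosely modeled on that idea: `Batch`, `BatchPoll`, `Baseline`, and `rows_recorded` are hypothetical names, and the actual method on DataFusion's `BaselineMetrics` is not reproduced here.

```rust
use std::task::Poll;

/// Hypothetical batch type; only the row count matters for this sketch.
struct Batch {
    rows: usize,
}

/// Shape of a stream poll result in this sketch (the error type is an assumption).
type BatchPoll = Poll<Option<Result<Batch, String>>>;

/// Hypothetical baseline metrics holding only an output row counter.
#[derive(Default)]
struct Baseline {
    output_rows: usize,
}

impl Baseline {
    /// Count the rows of a ready, successful batch and hand the poll back untouched.
    fn record_poll(&mut self, poll: BatchPoll) -> BatchPoll {
        if let Poll::Ready(Some(Ok(batch))) = &poll {
            self.output_rows += batch.rows;
        }
        poll
    }
}

/// Feeds a sequence of poll results through `record_poll` and returns the row count.
fn rows_recorded(polls: Vec<BatchPoll>) -> usize {
    let mut baseline = Baseline::default();
    for poll in polls {
        // A real stream would return this value to its caller.
        let _ = baseline.record_poll(poll);
    }
    baseline.output_rows
}

fn main() {
    let polls = vec![
        Poll::Ready(Some(Ok(Batch { rows: 100 }))),
        Poll::Pending,
        Poll::Ready(Some(Err("boom".to_string()))),
        Poll::Ready(Some(Ok(Batch { rows: 25 }))),
        Poll::Ready(None),
    ];
    println!("output_rows = {}", rows_recorded(polls));
}
```

Wrapping the poll result keeps the row counting in one place, so each operator does not need its own `output_rows.add(...)` call.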

pub(crate) output_rows: metrics::Count,
}

impl Drop for BuildProbeJoinMetrics {
Contributor

I don't think it's the best way but I guess it's okay to merge 🤔 If we forget to count a specific time period in some join operator, we can fix it in the future.

Could you also add a brief comment to explain this drop?

Author

I agree. This is not intuitive at all. When I came back to this PR to act on your suggestions, it took me a while to remember how this works :)

I have added a comment for now. I would be open to other ways of handling this!

Successfully merging this pull request may close these issues.

Refactor BuildProbeJoinMetrics in Hash Join to reuse BaselineMetrics
2 participants