Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add initial scanner statistics #3075

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

westonpace
Copy link
Contributor

This just captures some very basic statistics. I'd like to eventually add bytes read, decode time, and time waiting on I/O to the mix. However, those will need to wait for #2977 because those stats will go in the scheduler and we want one scan scheduler to be used in the entire plan first.

@github-actions github-actions bot added enhancement New feature or request python labels Oct 31, 2024
@codecov-commenter
Copy link

codecov-commenter commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 56.77966% with 51 lines in your changes missing coverage. Please review.

Project coverage is 78.78%. Comparing base (bfacd7c) to head (5b84e93).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance/src/dataset/scanner/stats.rs 45.94% 40 Missing ⚠️
rust/lance/src/dataset/scanner.rs 71.79% 7 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3075      +/-   ##
==========================================
- Coverage   78.85%   78.78%   -0.08%     
==========================================
  Files         250      251       +1     
  Lines       91474    91641     +167     
  Branches    91474    91641     +167     
==========================================
+ Hits        72134    72196      +62     
- Misses      16379    16479     +100     
- Partials     2961     2966       +5     
Flag Coverage Δ
unittests 78.78% <56.77%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool. I have some minor suggestions.

Comment on lines +607 to +614
Python::with_gil(|py| {
let args = PyTuple::new_bound(py, vec![wrapped_stats.into_py(py)]);
stats_handler.call1(py, args)
})
.map_err(|err| lance_core::Error::Wrapped {
error: err.into(),
location: location!(),
})?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might recommend using format_python_error() error here. The nice thing it does is puts the traceback in the error message, which helps Python users a lot in identifying the source of an error.

Comment on lines +51 to +61
#[getter]
fn start(&self) -> PyResult<u64> {
Ok(self
.inner
.start
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.as_millis()
.try_into()
.unwrap())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're willing to make start and end datetime.datetime objects instead of ints, the PyO3 conversion table suggests you should just be able to do:

Suggested change
#[getter]
fn start(&self) -> PyResult<u64> {
Ok(self
.inner
.start
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.as_millis()
.try_into()
.unwrap())
}
#[getter]
fn start(&self) -> SystemTime {
self.inner.start.clone()
}

Comment on lines +93 to +96
#[getter]
fn wall_clock_duration(&self) -> PyResult<f64> {
Ok(self.inner.wall_clock_duration.as_secs_f64())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here with Duration. I think it can natively convert to Python's timedelta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants