-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add initial scanner statistics #3075
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3075 +/- ##
==========================================
- Coverage 78.85% 78.78% -0.08%
==========================================
Files 250 251 +1
Lines 91474 91641 +167
Branches 91474 91641 +167
==========================================
+ Hits 72134 72196 +62
- Misses 16379 16479 +100
- Partials 2961 2966 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
95519fd
to
fa1d090
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cool. I have some minor suggestions.
Python::with_gil(|py| { | ||
let args = PyTuple::new_bound(py, vec![wrapped_stats.into_py(py)]); | ||
stats_handler.call1(py, args) | ||
}) | ||
.map_err(|err| lance_core::Error::Wrapped { | ||
error: err.into(), | ||
location: location!(), | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might recommend using format_python_error()
error here. The nice thing it does is puts the traceback in the error message, which helps Python users a lot in identifying the source of an error.
#[getter] | ||
fn start(&self) -> PyResult<u64> { | ||
Ok(self | ||
.inner | ||
.start | ||
.duration_since(std::time::UNIX_EPOCH) | ||
.unwrap() | ||
.as_millis() | ||
.try_into() | ||
.unwrap()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're willing to make start
and end
datetime.datetime
objects instead of ints, the PyO3 conversion table suggests you should just be able to do:
#[getter] | |
fn start(&self) -> PyResult<u64> { | |
Ok(self | |
.inner | |
.start | |
.duration_since(std::time::UNIX_EPOCH) | |
.unwrap() | |
.as_millis() | |
.try_into() | |
.unwrap()) | |
} | |
#[getter] | |
fn start(&self) -> SystemTime { | |
self.inner.start.clone() | |
} |
#[getter] | ||
fn wall_clock_duration(&self) -> PyResult<f64> { | ||
Ok(self.inner.wall_clock_duration.as_secs_f64()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here with Duration
. I think it can natively convert to Python's timedelta
.
fbc682b
to
5b84e93
Compare
This just captures some very basic statistics. I'd like to eventually add bytes read, decode time, and time waiting on I/O to the mix. However, those will need to wait for #2977 because those stats will go in the scheduler and we want one scan scheduler to be used in the entire plan first.