feat: add initial scanner statistics #3075

westonpace · 2024-10-31T20:28:54Z

This just captures some very basic statistics. I'd like to eventually add bytes read, decode time, and time waiting on I/O to the mix. However, those will need to wait for #2977 because those stats will go in the scheduler and we want one scan scheduler to be used in the entire plan first.

codecov-commenter · 2024-10-31T20:51:55Z

Codecov Report

Attention: Patch coverage is 56.77966% with 51 lines in your changes missing coverage. Please review.

Project coverage is 78.78%. Comparing base (bfacd7c) to head (5b84e93).
Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
rust/lance/src/dataset/scanner/stats.rs	45.94%	40 Missing ⚠️
rust/lance/src/dataset/scanner.rs	71.79%	7 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3075      +/-   ##
==========================================
- Coverage   78.85%   78.78%   -0.08%     
==========================================
  Files         250      251       +1     
  Lines       91474    91641     +167     
  Branches    91474    91641     +167     
==========================================
+ Hits        72134    72196      +62     
- Misses      16379    16479     +100     
- Partials     2961     2966       +5

Flag	Coverage Δ
unittests	`78.78% <56.77%> (-0.08%)`	⬇️

wjones127

This is pretty cool. I have some minor suggestions.

wjones127 · 2025-01-20T23:05:39Z

python/src/dataset.rs

+                    Python::with_gil(|py| {
+                        let args = PyTuple::new_bound(py, vec![wrapped_stats.into_py(py)]);
+                        stats_handler.call1(py, args)
+                    })
+                    .map_err(|err| lance_core::Error::Wrapped {
+                        error: err.into(),
+                        location: location!(),
+                    })?;


I'd recommend using format_python_error() here. The nice thing is that it puts the traceback in the error message, which helps Python users a lot in identifying the source of an error.

wjones127 · 2025-01-20T23:08:59Z

python/src/scanner.rs

+    #[getter]
+    fn start(&self) -> PyResult<u64> {
+        Ok(self
+            .inner
+            .start
+            .duration_since(std::time::UNIX_EPOCH)
+            .unwrap()
+            .as_millis()
+            .try_into()
+            .unwrap())
+    }


If you're willing to make start and end datetime.datetime objects instead of ints, the PyO3 conversion table suggests you should just be able to do:

Suggested change

-    #[getter]
-    fn start(&self) -> PyResult<u64> {
-        Ok(self
-            .inner
-            .start
-            .duration_since(std::time::UNIX_EPOCH)
-            .unwrap()
-            .as_millis()
-            .try_into()
-            .unwrap())
-    }
+    #[getter]
+    fn start(&self) -> SystemTime {
+        self.inner.start.clone()
+    }
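
If the integer form were kept instead, the two unwrap() calls in the original getter are its two failure points: duration_since errors when the time is before the epoch, and the u128 returned by as_millis() may not fit in a u64. A minimal standard-library sketch of a non-panicking version follows. The helper name `epoch_millis` is hypothetical and is not part of this PR.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper (not from the PR): milliseconds since the Unix epoch.
/// Returns None when `t` is earlier than the epoch or when the millisecond
/// count does not fit in a u64, instead of panicking via unwrap().
fn epoch_millis(t: SystemTime) -> Option<u64> {
    // Fails if `t` is before UNIX_EPOCH (for example after a clock adjustment).
    let since_epoch = t.duration_since(UNIX_EPOCH).ok()?;
    // as_millis() yields u128; narrow it explicitly and surface overflow.
    u64::try_from(since_epoch.as_millis()).ok()
}
```

Returning SystemTime directly, as suggested above, avoids both conversions on the Rust side.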

wjones127 · 2025-01-20T23:09:50Z

python/src/scanner.rs

+    #[getter]
+    fn wall_clock_duration(&self) -> PyResult<f64> {
+        Ok(self.inner.wall_clock_duration.as_secs_f64())
+    }


Same thing here with Duration. I think it can natively convert to Python's timedelta.
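
To make the difference concrete, here is a plain-Rust sketch of the two getter shapes, without the PyO3 attributes. `ScanStats`, `wall_clock_secs`, and `timed` are hypothetical names for illustration, and measuring with Instant is an assumption; how the PR actually records the duration is not shown in this hunk.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the stats struct wrapped by the Python binding.
struct ScanStats {
    wall_clock_duration: Duration,
}

impl ScanStats {
    /// Shape of the getter in this diff: collapse to fractional seconds.
    fn wall_clock_secs(&self) -> f64 {
        self.wall_clock_duration.as_secs_f64()
    }

    /// Shape of the suggested getter: hand back the Duration unchanged
    /// (Duration is Copy, so no clone is needed).
    fn wall_clock_duration(&self) -> Duration {
        self.wall_clock_duration
    }
}

/// Assumed measurement approach: time `work` on the monotonic clock.
fn timed<T>(work: impl FnOnce() -> T) -> (T, ScanStats) {
    let timer = Instant::now();
    let out = work();
    let stats = ScanStats {
        wall_clock_duration: timer.elapsed(),
    };
    (out, stats)
}
```

With the second shape, the unit stays attached to the value until it crosses into Python, so callers never have to guess whether a bare float is seconds or milliseconds.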

github-actions bot added the enhancement and python labels Oct 31, 2024

westonpace mentioned this pull request Jan 2, 2025

Proposal: Introduce metrics reporting for lance #3325

Open

westonpace force-pushed the feat/scan-stats branch 2 times, most recently from 95519fd to fa1d090 · January 20, 2025 14:27

wjones127 approved these changes Jan 20, 2025

westonpace added 6 commits January 28, 2025 06:25

Add initial scan statistics

2fd27ac

ruff format

3a8b627

Address clippy warnings

13567e9

Upgraded ruff in precommit to match whats in the workflow

d4870ba

Ruff format

0179c9c

Avoid clone of stats_handler in a place where we don't have GIL

5b84e93

westonpace force-pushed the feat/scan-stats branch from fbc682b to 5b84e93 · January 28, 2025 14:25
