Have head() traverse all partitions #419
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage diff:
            main      #419      +/-
Coverage    95.78%    95.82%    +0.03%
Files       25        25
Lines       1755      1771      +16
Hits        1681      1697      +16
Misses      74        74
This looks good!
As was pointed out in #380, it is not uncommon for partitions to be empty, which can lead to unintuitive results for users, since `head` by default only looks at the first partition.
To make `head(n)` more intuitive, we can have it search across all partitions, stopping only once there are `n` rows to return. This differs from Dask's behavior, where a user must explicitly choose how many partitions to search across.
Warning: this removes the `npartitions` argument previously available in `EnsembleFrame.head`.
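The traversal described above can be illustrated with a minimal sketch. This is not the code from this PR: `head_across_partitions` is a hypothetical helper, and partitions are modeled as plain Python lists rather than Dask partitions, purely to show the stop-when-full logic and how it differs from looking at the first partition only.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def head_across_partitions(partitions: Iterable[Sequence[T]], n: int = 5) -> List[T]:
    """Collect up to n rows, scanning partitions in order.

    Hypothetical helper for illustration; partitions are any sliceable
    sequences (plain lists here, standing in for real partitions).
    """
    rows: List[T] = []
    if n <= 0:
        return rows
    for part in partitions:
        remaining = n - len(rows)
        # Take only as many rows as are still needed from this partition.
        rows.extend(part[:remaining])
        if len(rows) >= n:
            # Enough rows found: stop without touching later partitions.
            break
    return rows


# Several empty partitions, as in the situation described in #380.
partitions = [[], [], ["a", "b"], [], ["c", "d", "e"]]

# Looking only at the first partition yields nothing.
first_only = partitions[0][:3]

# Traversing all partitions returns the first three available rows.
result = head_across_partitions(partitions, n=3)
```

Because the loop breaks as soon as `n` rows are collected, later partitions are never read when earlier ones already hold enough rows. If fewer than `n` rows exist in total, the sketch simply returns everything it found.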
Solution Description
Here we iterate over each partition, calling `head()` on it, until we have found enough rows, similar to the implementation in LSDB.
Code Quality
New Feature Checklist