refactor: create a rust based PartitionSet
#3515
CodSpeed Performance Report
Merging #3515 will improve performances by 21.32%.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #3515      +/-   ##
==========================================
+ Coverage   77.50%   77.51%   +0.01%
==========================================
  Files         703      709       +6
  Lines       85645    86279     +634
==========================================
+ Hits        66378    66880     +502
- Misses      19267    19399     +132
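The coverage percentages in the diff follow from hits divided by lines, and misses are simply lines minus hits. A minimal sketch that reproduces the numbers, assuming Codecov's default of two-decimal precision with rounding down (that assumption explains why 66880 / 86279, about 77.516%, is shown as 77.51% rather than 77.52%). The helper names `coverage_bp` and `fmt_pct` are hypothetical and not part of any Codecov tooling.

```rust
/// Coverage in basis points (hundredths of a percent), truncated toward zero.
/// Assumption: this mirrors Codecov's default `precision: 2`, `round: down`.
fn coverage_bp(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

/// Format basis points as a percentage string, e.g. 7750 -> "77.50%".
fn fmt_pct(bp: u64) -> String {
    format!("{}.{:02}%", bp / 100, bp % 100)
}

fn main() {
    // Figures from the coverage diff: (hits, lines) on main and on #3515.
    let (main_hits, main_lines) = (66_378u64, 85_645u64);
    let (pr_hits, pr_lines) = (66_880u64, 86_279u64);

    let main_bp = coverage_bp(main_hits, main_lines); // 7750
    let pr_bp = coverage_bp(pr_hits, pr_lines); // 7751

    // Misses are the lines that were not hit.
    assert_eq!(main_lines - main_hits, 19_267);
    assert_eq!(pr_lines - pr_hits, 19_399);

    println!("main:  {}", fmt_pct(main_bp)); // main:  77.50%
    println!("#3515: {}", fmt_pct(pr_bp)); // #3515: 77.51%
    println!("delta: +{}", fmt_pct(pr_bp - main_bp)); // delta: +0.01%
}
```

The same bookkeeping explains the last column: the 634 new lines split into 502 new hits and 132 new misses.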
LGTM.
Any preference (specifically @universalmind303) on reviewers auto-merging if a review is requested and we ✅?
I generally prefer leaving it to the author to merge.
Note for reviewer: most of this is ported from the Python implementation in `daft/runners/partitioning.py`.
For context on why this is needed: the `DataFrame` class uses `PartitionSet` extensively for common operations such as `show` and `collect`. To add this functionality to our Spark Connect implementation, we need a similar construct in Rust.

Ideally, I'd like to port the Python implementation over to use this new Rust one, but there are still a few things I'm not entirely sure how to implement (such as `RayPartitionSet`).

Not all of the methods in `partitioning.rs` are used yet, but I intend to follow up this PR with an implementation for #3498. This PR is a prerequisite for that, since `show` relies on `get_preview_micropartitions`.
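To illustrate why `show` depends on a preview method, here is a minimal Rust sketch of a partition set. Everything in it is an assumption for illustration: the trait name echoes the PR title, but the method signatures, the `InMemoryPartitionSet` type, and the stand-in `MicroPartition` (a plain `Vec<i64>` of rows instead of Daft's real micropartition) are hypothetical and are not the API in `partitioning.rs`. The preview logic walks partitions in index order and slices the last one so that at most `num_rows` rows come back.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for Daft's MicroPartition: just a vector of rows.
type MicroPartition = Vec<i64>;

/// Hypothetical trait shape; the real `partitioning.rs` API may differ.
trait PartitionSet {
    fn num_partitions(&self) -> usize;
    fn len(&self) -> usize;
    fn get_partition(&self, idx: usize) -> Option<&MicroPartition>;
    fn set_partition(&mut self, idx: usize, part: MicroPartition);

    /// Collect partitions in index order until `num_rows` rows are gathered,
    /// slicing the last partition so the total never exceeds `num_rows`.
    fn get_preview_micropartitions(&self, num_rows: usize) -> Vec<MicroPartition>;
}

/// Hypothetical in-memory implementation keyed by partition index.
struct InMemoryPartitionSet {
    partitions: BTreeMap<usize, MicroPartition>,
}

impl InMemoryPartitionSet {
    fn new() -> Self {
        Self { partitions: BTreeMap::new() }
    }
}

impl PartitionSet for InMemoryPartitionSet {
    fn num_partitions(&self) -> usize {
        self.partitions.len()
    }

    fn len(&self) -> usize {
        // Total row count across all partitions.
        self.partitions.values().map(|p| p.len()).sum()
    }

    fn get_partition(&self, idx: usize) -> Option<&MicroPartition> {
        self.partitions.get(&idx)
    }

    fn set_partition(&mut self, idx: usize, part: MicroPartition) {
        self.partitions.insert(idx, part);
    }

    fn get_preview_micropartitions(&self, num_rows: usize) -> Vec<MicroPartition> {
        let mut remaining = num_rows;
        let mut preview = Vec::new();
        // BTreeMap iterates in ascending key (partition index) order.
        for part in self.partitions.values() {
            if remaining == 0 {
                break;
            }
            let take = remaining.min(part.len());
            preview.push(part[..take].to_vec());
            remaining -= take;
        }
        preview
    }
}

fn main() {
    let mut ps = InMemoryPartitionSet::new();
    ps.set_partition(0, vec![1, 2, 3]);
    ps.set_partition(1, vec![4, 5]);
    ps.set_partition(2, vec![6]);

    println!("{} partitions, {} rows", ps.num_partitions(), ps.len()); // 3 partitions, 6 rows
    println!("{:?}", ps.get_partition(1)); // Some([4, 5])

    // A `show`-style preview of 4 rows: all of partition 0, one row of partition 1.
    println!("{:?}", ps.get_preview_micropartitions(4)); // [[1, 2, 3], [4]]
}
```

By contrast, a `collect`-style operation in this sketch would read every partition; the preview path lets a `show` stop as soon as it has enough rows.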