refactor: create a rust based PartitionSet #3515

Merged: universalmind303 merged 8 commits into Eventual-Inc:main from universalmind303:rust-pset on Dec 9, 2024

universalmind303 (Contributor)

Most of this is ported from the Python implementation in daft/runners/partitioning.py.

Note for reviewers:

For context on why this is needed: the DataFrame class uses PartitionSet extensively for common operations such as show and collect. To add this functionality to our Spark Connect implementation, we need a similar construct in Rust.

Ideally, I'd like to port the Python implementation over to use this new Rust one, but there are still a few things I'm not entirely sure how to implement (such as RayPartitionSet).

Not all of the methods in partitioning.rs are used yet, but I intend to follow up this PR with an implementation for #3498. This PR is a prerequisite for that, because show relies on get_preview_micropartitions.
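To illustrate how a show-style preview can be served from a partition set, here is a minimal, self-contained sketch. It is not the code in src/daft-micropartition/src/partitioning.rs: `MicroPartition` is a hypothetical stand-in that only holds a `Vec<i64>`, and the `PartitionSet` trait, `InMemoryPartitionSet`, `set_partition`, `values`, and the exact signature of `get_preview_micropartitions` are all assumptions made for the example. The idea shown is that a preview only needs the leading partitions that cover the requested row count, truncating the last one.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical stand-in for a MicroPartition: just a column of values.
#[derive(Debug, Clone)]
pub struct MicroPartition {
    pub values: Vec<i64>,
}

impl MicroPartition {
    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Returns a new partition holding at most the first `n` rows.
    pub fn head(&self, n: usize) -> MicroPartition {
        MicroPartition {
            values: self.values.iter().take(n).copied().collect(),
        }
    }
}

/// Hypothetical partition-set interface (assumed, not Daft's actual trait).
pub trait PartitionSet {
    fn num_partitions(&self) -> usize;
    fn len(&self) -> usize;
    fn values(&self) -> Vec<Arc<MicroPartition>>;

    /// Takes leading partitions until `num_rows` rows are covered,
    /// truncating the last partition if it holds more rows than needed.
    fn get_preview_micropartitions(&self, num_rows: usize) -> Vec<Arc<MicroPartition>> {
        let mut remaining = num_rows;
        let mut preview = Vec::new();
        for part in self.values() {
            if remaining == 0 {
                break;
            }
            let take = remaining.min(part.len());
            if take == part.len() {
                // Whole partition fits: reuse it without copying.
                preview.push(part);
            } else {
                preview.push(Arc::new(part.head(take)));
            }
            remaining -= take;
        }
        preview
    }
}

/// Hypothetical in-memory implementation; BTreeMap keeps partitions in index order.
#[derive(Default)]
pub struct InMemoryPartitionSet {
    partitions: BTreeMap<usize, Arc<MicroPartition>>,
}

impl InMemoryPartitionSet {
    pub fn set_partition(&mut self, idx: usize, part: MicroPartition) {
        self.partitions.insert(idx, Arc::new(part));
    }
}

impl PartitionSet for InMemoryPartitionSet {
    fn num_partitions(&self) -> usize {
        self.partitions.len()
    }

    fn len(&self) -> usize {
        self.partitions.values().map(|p| p.len()).sum()
    }

    fn values(&self) -> Vec<Arc<MicroPartition>> {
        self.partitions.values().cloned().collect()
    }
}

fn main() {
    let mut pset = InMemoryPartitionSet::default();
    pset.set_partition(0, MicroPartition { values: vec![1, 2, 3] });
    pset.set_partition(1, MicroPartition { values: vec![4, 5, 6] });

    // A 4-row preview uses all of partition 0 and one row of partition 1.
    let preview = pset.get_preview_micropartitions(4);
    let rows: Vec<i64> = preview.iter().flat_map(|p| p.values.iter().copied()).collect();
    println!("{:?}", rows); // prints [1, 2, 3, 4]
}
```

A collect-style operation would instead read every partition via `values()`. This sketch deliberately says nothing about distributed sets such as the RayPartitionSet mentioned above.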

codspeed-hq bot commented Dec 6, 2024

CodSpeed Performance Report

Merging #3515 will improve performance by 21.32%

Comparing universalmind303:rust-pset (6a702a8) with main (ad175ae)

Summary

⚡ 1 improvement
✅ 16 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | universalmind303:rust-pset | Change |
|---|---|---|---|
| test_iter_rows_first_row[100 Small Files] | 187.6 ms | 154.6 ms | +21.32% |
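As a quick check on the reported change, assuming it is expressed as a speedup relative to the new timing (old / new − 1), the displayed timings give:

$$
\frac{187.6\ \text{ms}}{154.6\ \text{ms}} - 1 \approx 1.2135 - 1 = 0.2135 \approx 21.3\%
$$

The small gap to +21.32% is consistent with the table showing timings rounded to 0.1 ms.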

codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 25.18519% with 101 lines in your changes missing coverage. Please review.

Project coverage is 77.51%. Comparing base (9739bb6) to head (6a702a8).
Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/daft-micropartition/src/partitioning.rs | 17.39% | 95 Missing ⚠️ |
| src/daft-connect/src/translation/logical_plan.rs | 14.28% | 6 Missing ⚠️ |
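The patch-coverage figures can be cross-checked: if a fraction $p$ of a file's (or the patch's) changed lines is covered and $m$ lines are missing, the total number of changed lines is $m/(1-p)$. For the whole patch and for partitioning.rs this gives:

$$
\frac{101}{1 - 0.2518519} = 135, \qquad 135 - 101 = 34, \qquad \frac{34}{135} \approx 0.251852
$$

$$
\frac{95}{1 - 0.1739} \approx 115, \qquad 115 - 95 = 20, \qquad \frac{20}{115} \approx 0.1739
$$

So the patch adds 135 coverable lines, of which 34 are hit, and 115 of those lines (20 hit) are in partitioning.rs.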
@@            Coverage Diff             @@
##             main    #3515      +/-   ##
==========================================
+ Coverage   77.50%   77.51%   +0.01%     
==========================================
  Files         703      709       +6     
  Lines       85645    86279     +634     
==========================================
+ Hits        66378    66880     +502     
- Misses      19267    19399     +132     
| Files with missing lines | Coverage Δ |
|---|---|
| ...ect/src/translation/logical_plan/local_relation.rs | 61.66% <100.00%> (+0.64%) ⬆️ |
| src/daft-local-execution/src/pipeline.rs | 93.76% <100.00%> (-0.73%) ⬇️ |
| src/daft-local-execution/src/run.rs | 89.52% <100.00%> (+0.05%) ⬆️ |
| src/daft-local-execution/src/sources/in_memory.rs | 81.25% <100.00%> (-2.09%) ⬇️ |
| src/daft-micropartition/src/lib.rs | 50.00% <ø> (ø) |
| src/daft-micropartition/src/micropartition.rs | 90.81% <ø> (ø) |
| src/daft-connect/src/translation/logical_plan.rs | 73.17% <14.28%> (-12.55%) ⬇️ |
| src/daft-micropartition/src/partitioning.rs | 17.39% <17.39%> (ø) |

... and 75 files with indirect coverage changes

Resolved review threads:
src/daft-connect/Cargo.toml
src/daft-local-execution/src/pipeline.rs (outdated)
src/daft-micropartition/src/partitioning.rs (outdated)
src/daft-micropartition/src/partitioning.rs
src/daft-micropartition/src/partitioning.rs (outdated)
src/daft-micropartition/src/partitioning.rs
src/daft-micropartition/src/partitioning.rs
src/daft-micropartition/src/partitioning.rs (outdated)

andrewgazelka (Member) left a comment

LGTM.

andrewgazelka (Member)

Any preference (specifically @universalmind303) on reviewers auto-merging if a review is requested and we ✅?

universalmind303 (Contributor, Author)

> Any preference (specifically @universalmind303) on reviewers auto-merging if a review is requested and we ✅?

I generally prefer leaving it to the author to merge.

universalmind303 merged commit a99d2ab into Eventual-Inc:main on Dec 9, 2024
40 of 41 checks passed