Parallel processing for column-based TRowSet generation #5927

Closed
wants to merge 4 commits into from

Conversation

bowenliang123
Contributor

@bowenliang123 bowenliang123 commented Dec 28, 2023

🔍 Description

Issue References 🔗

Subtask of #5808

This pull request fixes #

Describe Your Solution 🔧

  • Support generating the columns of a column-based TRowSet in parallel, using a fork-join pool on the engine side
  • Column order in the TRowSet is still guaranteed by sorting the results by column index, which is a low-cost operation
  • Add a config option to enable or disable this feature
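
The approach in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: `Column`, `toColumn`, `toColumns`, and the `parallel` flag are hypothetical names, and the real generator builds Thrift `TColumn` values rather than strings. Each column is converted as a separate task on a `ForkJoinPool`, and the results are sorted by column index before being returned. `Future.sequence` already returns results in submission order, so the sort here only mirrors the ordering step described above.

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelColumnsSketch {
  // Hypothetical stand-in for a Thrift TColumn: one column's values as strings.
  final case class Column(values: Seq[String])

  // Hypothetical per-column conversion: take the ordinal-th field of every row.
  // Assumes no null fields, to keep the sketch short.
  def toColumn(rows: Seq[Seq[Any]], ordinal: Int): Column =
    Column(rows.map(row => row(ordinal).toString))

  // `parallel` plays the role of the enable/disable config (name is an assumption).
  def toColumns(
      rows: Seq[Seq[Any]],
      columnCount: Int,
      parallel: Boolean,
      pool: ForkJoinPool): Seq[Column] = {
    if (!parallel) {
      // Sequential path: convert columns one after another.
      (0 until columnCount).map(i => toColumn(rows, i))
    } else {
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
      // One task per column, each tagged with its column index.
      val tasks = (0 until columnCount).map { i =>
        Future((i, toColumn(rows, i)))
      }
      val done = Await.result(Future.sequence(tasks), Duration.Inf)
      // Restore column order by index, then drop the index.
      done.sortBy(_._1).map(_._2)
    }
  }
}
```

With the flag off, the sequential path produces the same columns in the same order, so the two code paths can be compared directly when benchmarking.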

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

I will provide a rough comparison benchmark for this feature in TRowSetGenerator.

Related Unit Tests


Checklists

📝 Author Self Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • This patch was not authored or co-authored using Generative Tooling

📝 Committer Pre-Merge Checklist

  • Pull request title is okay.
  • No license issues.
  • Milestone correctly set?
  • Test coverage is ok
  • Assignees are selected.
  • Minimum number of approvals
  • No changes are requested

Be nice. Be informative.

@codecov-commenter

codecov-commenter commented Dec 29, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (5d59cf1) 61.24% compared to head (02120f6) 61.16%.
Report is 1 commit behind head on master.

Files                                                     Patch %   Lines
...gine/spark/schema/SparkArrowTRowSetGenerator.scala     0.00%     1 Missing ⚠️
...apache/kyuubi/engine/result/TRowSetGenerator.scala     94.44%    0 Missing and 1 partial ⚠️
...ain/scala/org/apache/kyuubi/util/ThreadUtils.scala     88.88%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5927      +/-   ##
============================================
- Coverage     61.24%   61.16%   -0.08%     
  Complexity       23       23              
============================================
  Files           621      621              
  Lines         36864    36900      +36     
  Branches       5014     5016       +2     
============================================
- Hits          22576    22571       -5     
- Misses        11860    11895      +35     
- Partials       2428     2434       +6     


@bowenliang123 bowenliang123 deleted the rowset-column-par branch April 24, 2024 05:16
@bowenliang123 bowenliang123 restored the rowset-column-par branch September 4, 2024 15:19