Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KYUUBI #6070] Improve perf on assembling row-based TRowSet #6077

Closed
wants to merge 4 commits into from

Conversation

beryllw
Copy link
Contributor

@beryllw beryllw commented Feb 22, 2024

🔍 Description

Issue References 🔗

This pull request fixes #6070

Describe Your Solution 🔧

https://issues.apache.org/jira/browse/SPARK-47085

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklist 📝

Be nice. Be informative.

@cxzl25 cxzl25 changed the title [KYUUBI #6070] Performance Improvement for converting rows to thrift … [KYUUBI #6070] Performance Improvement for converting spark rows to thrift rows Feb 22, 2024
@beryllw beryllw requested a review from pan3793 February 23, 2024 08:22
@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.08%. Comparing base (e4f9c5d) to head (90a1256).
Report is 3 commits behind head on master.

❗ Current head 90a1256 differs from pull request most recent head 84114f3. Consider uploading reports for the commit 84114f3 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #6077      +/-   ##
============================================
- Coverage     61.12%   61.08%   -0.04%     
  Complexity       23       23              
============================================
  Files           623      623              
  Lines         37206    37200       -6     
  Branches       5041     5040       -1     
============================================
- Hits          22741    22725      -16     
+ Misses        12012    12011       -1     
- Partials       2453     2464      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pan3793 pan3793 changed the title [KYUUBI #6070] Performance Improvement for converting spark rows to thrift rows [KYUUBI #6070] Improve perf on assembling row-based TRowSet Feb 23, 2024
@pan3793 pan3793 added this to the v1.9.0 milestone Feb 24, 2024
@pan3793
Copy link
Member

pan3793 commented Feb 24, 2024

Thanks, merged to master

@pan3793 pan3793 closed this in 030ee70 Feb 24, 2024
@yaooqinn
Copy link
Member

TColumnGenerator.getColumnToList was missed?

zhaohehuhu pushed a commit to zhaohehuhu/incubator-kyuubi that referenced this pull request Mar 21, 2024
# 🔍 Description
## Issue References 🔗

This pull request fixes apache#6070

## Describe Your Solution 🔧

https://issues.apache.org/jira/browse/SPARK-47085

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes apache#6077 from Kwafoor/kyuubi_6070.

Closes apache#6070

84114f3 [wangjunbo] fix
90a1256 [wangjunbo] fix
97db3c9 [wangjunbo] fix
5442296 [wangjunbo] [KYUUBI apache#6070] Performance Improvement for converting rows to thrift rows

Authored-by: wangjunbo <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
@beryllw beryllw deleted the kyuubi_6070 branch April 30, 2024 05:54
@hh-cn
Copy link

hh-cn commented Aug 30, 2024

@beryllw @yaooqinn

I'm sorry to bother you, but it seems that this issue hasn't been fully resolved. The logic related to toRowBasedSet has been fixed, but in the toColumnBasedSet, the getColumnToList method still involves similar access to a non-IndexedSeq with val row = rows(idx). This results in significantly impacted serialization speed when Hive JDBC statements have a large fetchSize (> 300).

trait TColumnGenerator[RowT] extends TRowSetColumnGetter[RowT] {
protected def getColumnToList[T](
rows: Seq[RowT],
ordinal: Int,
defaultVal: T,
convertFunc: (RowT, Int) => T = null): (JList[T], ByteBuffer) = {
val rowSize = rows.length
val ret = new JArrayListT
val nulls = new JBitSet()
var idx = 0
while (idx < rowSize) {
val row = rows(idx)
val isNull = isColumnNullAt(row, ordinal)
if (isNull) {
nulls.set(idx, true)
ret.add(defaultVal)
} else {
val value = Option(convertFunc) match {
case Some(f) => f(row, ordinal)
case _ => getColumnAs[T](row, ordinal)
}
ret.add(value)
}
idx += 1
}
(ret, ByteBuffer.wrap(nulls.toByteArray))
}

@yaooqinn
Copy link
Member

@hh-cn Can you send a PR for that issue?

@hh-cn
Copy link

hh-cn commented Aug 30, 2024

@hh-cn Can you send a PR for that issue?

I am not familiar with the process and standards for submitting a PR. I will glad to see you fixing it soon. @yaooqinn

@yaooqinn
Copy link
Member

Can you create an issue then? I'm fully booked and don't have time to fix this. But I'm sure someone will be interested in fixing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Performance Improvement for converting spark rows to thrift rows
6 participants