[KYUUBI #6070] Improve perf on assembling row-based TRowSet #6077

beryllw · 2024-02-22T10:28:24Z

🔍 Description

Issue References 🔗

This pull request fixes #6070

Describe Your Solution 🔧

https://issues.apache.org/jira/browse/SPARK-47085

Types of changes 🔖

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests

Checklist 📝

This patch was not authored or co-authored using Generative Tooling

Be nice. Be informative.

kyuubi-common/src/main/scala/org/apache/kyuubi/engine/result/TRowSetGenerator.scala

codecov-commenter · 2024-02-23T08:23:05Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.08%. Comparing base (e4f9c5d) to head (90a1256).
Report is 3 commits behind head on master.

❗ Current head 90a1256 differs from pull request most recent head 84114f3. Consider uploading reports for the commit 84114f3 to get more accurate results

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #6077      +/-   ##
============================================
- Coverage     61.12%   61.08%   -0.04%     
  Complexity       23       23              
============================================
  Files           623      623              
  Lines         37206    37200       -6     
  Branches       5041     5040       -1     
============================================
- Hits          22741    22725      -16     
+ Misses        12012    12011       -1     
- Partials       2453     2464      +11

kyuubi-common/src/main/scala/org/apache/kyuubi/engine/result/TRowSetGenerator.scala

pan3793 · 2024-02-24T09:39:32Z

Thanks, merged to master

yaooqinn · 2024-02-26T02:24:02Z

TColumnGenerator.getColumnToList was missed?

# 🔍 Description
## Issue References 🔗

This pull request fixes apache#6070

## Describe Your Solution 🔧

https://issues.apache.org/jira/browse/SPARK-47085

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes apache#6077 from Kwafoor/kyuubi_6070.

Closes apache#6070

84114f3 [wangjunbo] fix
90a1256 [wangjunbo] fix
97db3c9 [wangjunbo] fix
5442296 [wangjunbo] [KYUUBI apache#6070] Performance Improvement for converting rows to thrift rows

Authored-by: wangjunbo <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>

hh-cn · 2024-08-30T08:33:04Z

@beryllw @yaooqinn

I'm sorry to bother you, but it seems this issue hasn't been fully resolved. The toRowBasedSet logic has been fixed, but in toColumnBasedSet, the getColumnToList method still does the same kind of indexed access on a non-IndexedSeq with val row = rows(idx). As a result, serialization slows down significantly when Hive JDBC statements use a large fetchSize (> 300).

trait TColumnGenerator[RowT] extends TRowSetColumnGetter[RowT] {
  protected def getColumnToList[T](
      rows: Seq[RowT],
      ordinal: Int,
      defaultVal: T,
      convertFunc: (RowT, Int) => T = null): (JList[T], ByteBuffer) = {
    val rowSize = rows.length
    val ret = new JArrayList[T](rowSize)
    val nulls = new JBitSet()
    var idx = 0
    while (idx < rowSize) {
      // Indexed access: linear time per call when rows is not an IndexedSeq
      val row = rows(idx)
      val isNull = isColumnNullAt(row, ordinal)
      if (isNull) {
        nulls.set(idx, true)
        ret.add(defaultVal)
      } else {
        val value = Option(convertFunc) match {
          case Some(f) => f(row, ordinal)
          case _ => getColumnAs[T](row, ordinal)
        }
        ret.add(value)
      }
      idx += 1
    }
    (ret, ByteBuffer.wrap(nulls.toByteArray))
  }
}
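
For illustration only, here is a minimal sketch of one way the repeated rows(idx) lookup could be avoided: walk the rows once with an iterator, so the cost per row no longer depends on the concrete Seq type. This is not the code merged in this PR or in #6661. The object name ColumnToListSketch and the Row alias are hypothetical stand-ins for the engine's row type, and the null check and cast replace isColumnNullAt and getColumnAs, which are not reproduced here.

```scala
import java.nio.ByteBuffer
import java.util.{ArrayList => JArrayList, BitSet => JBitSet, List => JList}

object ColumnToListSketch {
  // Hypothetical stand-in for the engine's row type: one nullable value per column.
  type Row = IndexedSeq[Any]

  // Collects column `ordinal` of every row into a Java list plus a null bitmap.
  // `rows` is traversed with an iterator instead of rows(idx), so a linked List
  // is walked once rather than once per row.
  def getColumnToList[T](
      rows: Seq[Row],
      ordinal: Int,
      defaultVal: T): (JList[T], ByteBuffer) = {
    val ret = new JArrayList[T](rows.length)
    val nulls = new JBitSet()
    val it = rows.iterator
    var idx = 0
    while (it.hasNext) {
      val value = it.next()(ordinal)
      if (value == null) {
        // Mark the position as null and store the placeholder value.
        nulls.set(idx, true)
        ret.add(defaultVal)
      } else {
        // Assumption: the caller asks for the type actually stored in the column.
        ret.add(value.asInstanceOf[T])
      }
      idx += 1
    }
    (ret, ByteBuffer.wrap(nulls.toByteArray))
  }
}
```

Another option under the same assumptions would be to convert the input once with rows.toIndexedSeq before the loop and keep the indexed access as it is.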

yaooqinn · 2024-08-30T08:42:03Z

@hh-cn Can you send a PR for that issue?

hh-cn · 2024-08-30T08:55:19Z

> @hh-cn Can you send a PR for that issue?

I am not familiar with the process and standards for submitting a PR. I would be glad to see you fix it soon. @yaooqinn

yaooqinn · 2024-08-30T09:04:24Z

Can you create an issue then? I'm fully booked and don't have time to fix this. But I'm sure someone will be interested in fixing it.

[KYUUBI apache#6070] Performance Improvement for converting rows to t…

5442296

…hrift rows

github-actions bot added the module:common label Feb 22, 2024

pan3793 reviewed Feb 22, 2024

kyuubi-common/src/main/scala/org/apache/kyuubi/engine/result/TRowSetGenerator.scala (outdated)

cxzl25 changed the title ~~[KYUUBI #6070] Performance Improvement for converting rows to thrift …~~ [KYUUBI #6070] Performance Improvement for converting spark rows to thrift rows Feb 22, 2024

fix

97db3c9

beryllw requested a review from pan3793 February 23, 2024 08:22

fix

90a1256

pan3793 reviewed Feb 23, 2024

kyuubi-common/src/main/scala/org/apache/kyuubi/engine/result/TRowSetGenerator.scala (outdated)

pan3793 approved these changes Feb 23, 2024

fix

84114f3

pan3793 changed the title ~~[KYUUBI #6070] Performance Improvement for converting spark rows to thrift rows~~ [KYUUBI #6070] Improve perf on assembling row-based TRowSet Feb 23, 2024

cxzl25 approved these changes Feb 23, 2024

pan3793 added this to the v1.9.0 milestone Feb 24, 2024

pan3793 assigned beryllw Feb 24, 2024

pan3793 closed this in 030ee70 Feb 24, 2024

beryllw deleted the kyuubi_6070 branch April 30, 2024 05:54

This was referenced Sep 2, 2024

[Improvement] Performance Improvement for converting spark rows to thrift rows #6070

Closed

[Improvement] Improve performance on converting spark rows to column-based thrift row set #6661

Closed
