[SPARK] Add benchmark for Spark TRowSet generation of row-based and column-based #5809
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #5809 +/- ##
============================================
- Coverage 61.44% 61.29% -0.15%
Complexity 23 23
============================================
Files 608 608
Lines 36094 36027 -67
Branches 4952 4952
============================================
- Hits 22178 22083 -95
- Misses 11522 11560 +38
+ Partials 2394 2384 -10
☔ View full report in Codecov by Sentry.
Should we create a new class separately? And you can refer to
Thanks for the advice. Moved to a new class
import org.apache.spark.sql.types._

class RowSetBenchmark extends BaseRowSetSuite {
  test("to row set benchmark") {
don't put benchmarks into tests
Updated. Could you take a look?
I used the existing TPCDSTableGenerateBenchmark as the example.
class TPCDSTableGenerateBenchmark extends KyuubiFunSuite with KyuubiBenchmarkBase {
I would like to keep running it via tests by setting RUN_BENCHMARK=1, just like other existing benchmarks such as TPCDSTableGenerateBenchmark.
JMH for isolated benchmark testing could be introduced later.
I am having trouble integrating JMH for Scala: there is no official Maven plugin support, JMH relies on Java annotations, and the proper execution entry point and the isolation path for JMH benchmarks are still unclear.
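As a rough illustration of the "run via tests only when RUN_BENCHMARK=1" idea, here is a minimal standalone Scala sketch. It does not use KyuubiBenchmarkBase or any Kyuubi API; the object name, the helper names, the exact check against the value "1", and the stand-in workload are all assumptions made for the example.

```scala
// Hypothetical sketch: the names below are illustrative, not Kyuubi APIs.
object GatedBenchmarkSketch {

  // Assumption: the benchmark runs only when RUN_BENCHMARK is exactly "1".
  def benchmarkEnabled(env: Map[String, String]): Boolean =
    env.get("RUN_BENCHMARK").contains("1")

  // Runs `body` once and returns its result with the elapsed wall-clock milliseconds.
  def timeMs[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    if (benchmarkEnabled(sys.env)) {
      // Stand-in workload; a real benchmark would generate a TRowSet here.
      val (sum, elapsed) = timeMs((1 to 1000000).map(_.toLong).sum)
      println(s"sum=$sum elapsed=${elapsed}ms")
    } else {
      println("RUN_BENCHMARK is not set to 1, skipping benchmark")
    }
  }
}
```

With a gate like this, a regular test run skips the measurement, and only an explicit opt-in through the environment pays the cost of running it.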
-1 to use Spark's Benchmark tools in engine modules
We could refactor this onto JMH in the follow-up PRs. These tests are not run with GA tests. This should not be a blocker issue here for evaluating the overall TRowSet generation.
-1 to use Spark's Benchmark tools in engine modules
What's the reason or major concern?
We could refactor this onto JMH in the follow-up PRs
We don't need to refactor if it's originally designed with JMH
089bb2b to 3619589
externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/kyuubi/TRowSetBenchmark.scala
And based on the screenshots in your PR description, what are the control group and the experimental groups?
There is no control or experimental group in this PR. It provides a benchmark tool for evaluating both column-based and row-based row sets, for access from V5 and from V6 and above. In upcoming experiments, the benchmark will be run on the base version and on different improved implementations for comparison.
ef0d3a1 to c7cfba8
LGTM, I agree to adopt this benchmark framework, because
- it is light
- there are existing benchmarks based on it
- many Kyuubi developers are familiar with it
We could also decouple it from Spark's utils and move it to the kyuubi-util module as a general lightweight benchmark kit in the future. Once JMH can be integrated into Kyuubi with Maven + sbt + Scala, this benchmark toolkit can be removed.
I will keep my -1, as the testing purpose here is not clear.
If there are a bunch of PRs, I suggest you create an umbrella issue. A KPIP (discussed and voted on in the dev list) is necessary to introduce a benchmarking framework.
As I understand,
I think the testing purpose is clear, this is for benchmarking the conversion performance from
You are correct about the purpose of this PR, but the benchmark itself needs to be corrected. Technically, if we introduce the Spark benchmark tool, the first line of results in each single benchmark should be the control group as it always produces 1x for The current test also varies the simple rule of
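To make the point about the first result line concrete, here is a small Scala sketch of how a relative-speed column can be derived when the first case is treated as the baseline. It is an illustration of the idea only, not Spark's actual Benchmark code; the object name, the `CaseResult` type, and the timings are hypothetical.

```scala
// Hypothetical sketch of baseline-relative reporting; not Spark's implementation.
object RelativeSpeedSketch {

  // One measured benchmark case: a label and its best observed time in milliseconds.
  final case class CaseResult(name: String, bestTimeMs: Double)

  // Relative speed of each case against the first case, so the first entry is always 1.0.
  def relative(results: Seq[CaseResult]): Seq[(String, Double)] = {
    require(results.nonEmpty, "at least one case (the baseline) is required")
    val baselineMs = results.head.bestTimeMs
    results.map(r => r.name -> baselineMs / r.bestTimeMs)
  }

  def main(args: Array[String]): Unit = {
    // Made-up timings, for illustration only.
    val results = Seq(
      CaseResult("baseline", 200.0),
      CaseResult("candidate A", 100.0),
      CaseResult("candidate B", 400.0))
    relative(results).foreach { case (name, x) => println(s"$name: ${x}X") }
  }
}
```

Because every ratio is taken against the first case, the order of cases determines what the numbers mean: putting the unmodified implementation first is what makes the other lines readable as speedups or slowdowns.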
Got it, thank you for your explanation.
Closing this PR, as there is not enough consensus on the purposes, the design, the changes and the approaches.
I'm strongly against your comment here. First, the umbrella issue was created for the whole task list, which is still extendable. Second, you did not allow me to use Spark's test-jars for the existing benchmark kit, unintentionally or intentionally ignoring that several benchmark tests have already been built on it. Third, you told me to raise a KPIP for such a duplicated framework from a copied implementation. I respect all your comments, but I am extremely unwilling to see every single effort to resolve this problem deliberately disregarded and pulled back a meter for every inch forward. I did no evil and have not violated any community code of conduct, now or ever! WHY make it difficult for me!
Hi @bowenliang123. First things first, calm down. I want to clarify that I am a regular contributor/PMC member of Apache Kyuubi, just like everyone else. My comments on this PR are simply my personal opinion. I have left a veto with explanations, which have also been challenged and discussed.
I know you well in person. Neither you nor anybody else has violated the CoC in this PR.
cba080a to d2a360f
🔍 Description
Issue References 🔗
Subtask of #5808.
Describe Your Solution 🔧
Add performance benchmark for Spark TRowSet generation for
Types of changes 🔖
Test Plan 🧪
Behavior Without This Pull Request ⚰️
Behavior With This Pull Request 🎉
Row-based:
Column-based:
Related Unit Tests
Added the "to row set benchmark" unit test in the Spark engine's RowSetSuite.
Checklists
📝 Author Self Checklist
📝 Committer Pre-Merge Checklist
Be nice. Be informative.