-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable kudo serializer by default #12222
Enable kudo serializer by default #12222
Conversation
Signed-off-by: liurenjie1024 <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liurenjie1024 can you provide performance numbers for NDS with and without MT shuffle?
build |
@@ -500,7 +500,7 @@ class AdaptiveQueryExecSuite | |||
val conf = new SparkConf() | |||
.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true") | |||
.set(SQLConf.LOCAL_SHUFFLE_READER_ENABLED.key, "true") | |||
.set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, "400") | |||
.set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, "50") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is required since shuffle size changed.
@@ -99,7 +99,7 @@ class GpuLoreSuite extends SparkQueryCompareTestSuite with FunSuiteWithTempDir w | |||
} | |||
|
|||
test("AQE broadcast") { | |||
doTestReplay("90[*]") { spark => | |||
doTestReplay("93[*]") { spark => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, shuffle size change leads to plan change.
@@ -2052,7 +2052,7 @@ val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression. | |||
.internal() | |||
.startupOnly() | |||
.booleanConf | |||
.createWithDefault(false) | |||
.createWithDefault(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For spark.rapids.shuffle.kudo.serializer.measure.buffer.copy.enabled
, should we enable that by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's reserved for experiments, and has performance impact. We should only enable it when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes themselves look fine. I mostly want to see the performance numbers to show that it is at least as good as the old code. I know we have done some of that in the past and that there have been a lot of optimizations recently so it should be good. But this is a big change so I want to see it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the results I got through our regression check and posted to the description. It looks good to me. It would be nice to understand if q67's change is explained by Kudo or not.
Closes #12202 .
Enable kudo serializer by default, and contains several fix due to shuffle size change.
Regression results in our performance cluster: