Enable kudo serializer by default #12222

Merged
liurenjie1024 merged 3 commits into NVIDIA:branch-25.04 from ray/issue-12202
Feb 27, 2025

@liurenjie1024 liurenjie1024 commented Feb 25, 2025

Closes #12202.

Enables the kudo serializer by default, and includes several test fixes required by the resulting change in shuffle size.

Regression results in our performance cluster:

query1: Previous (1966.0 ms) vs Current (2219.8 ms) Diff -253 E2E 0.89x
query2: Previous (1876.6 ms) vs Current (1731.0 ms) Diff 145 E2E 1.08x
query3: Previous (636.8 ms) vs Current (638.2 ms) Diff -1 E2E 1.00x
query4: Previous (11516.2 ms) vs Current (11913.2 ms) Diff -397 E2E 0.97x
query5: Previous (2437.6 ms) vs Current (2476.0 ms) Diff -38 E2E 0.98x
query6: Previous (922.4 ms) vs Current (888.2 ms) Diff 34 E2E 1.04x
query7: Previous (3589.8 ms) vs Current (3441.2 ms) Diff 148 E2E 1.04x
query8: Previous (1223.8 ms) vs Current (1176.4 ms) Diff 47 E2E 1.04x
query9: Previous (5396.6 ms) vs Current (5223.0 ms) Diff 173 E2E 1.03x
query10: Previous (1821.8 ms) vs Current (1808.6 ms) Diff 13 E2E 1.01x
query11: Previous (6031.6 ms) vs Current (6381.8 ms) Diff -350 E2E 0.95x
query12: Previous (698.6 ms) vs Current (767.0 ms) Diff -68 E2E 0.91x
query13: Previous (1658.0 ms) vs Current (1617.8 ms) Diff 40 E2E 1.02x
query14_part1: Previous (8006.0 ms) vs Current (8680.2 ms) Diff -674 E2E 0.92x
query14_part2: Previous (7066.8 ms) vs Current (7180.6 ms) Diff -113 E2E 0.98x
query15: Previous (1312.0 ms) vs Current (1322.6 ms) Diff -10 E2E 0.99x
query16: Previous (4183.0 ms) vs Current (4142.8 ms) Diff 40 E2E 1.01x
query17: Previous (2124.0 ms) vs Current (2384.4 ms) Diff -260 E2E 0.89x
query18: Previous (2571.6 ms) vs Current (2900.4 ms) Diff -328 E2E 0.89x
query19: Previous (1536.6 ms) vs Current (1507.8 ms) Diff 28 E2E 1.02x
query20: Previous (707.2 ms) vs Current (827.8 ms) Diff -120 E2E 0.85x
query21: Previous (688.0 ms) vs Current (697.0 ms) Diff -9 E2E 0.99x
query22: Previous (1559.6 ms) vs Current (1236.4 ms) Diff 323 E2E 1.26x
query23_part1: Previous (15344.2 ms) vs Current (14627.0 ms) Diff 717 E2E 1.05x
query23_part2: Previous (19364.4 ms) vs Current (19491.8 ms) Diff -127 E2E 0.99x
query24_part1: Previous (9705.4 ms) vs Current (9815.6 ms) Diff -110 E2E 0.99x
query24_part2: Previous (10125.2 ms) vs Current (10003.6 ms) Diff 121 E2E 1.01x
query25: Previous (1935.6 ms) vs Current (2087.6 ms) Diff -152 E2E 0.93x
query26: Previous (1204.2 ms) vs Current (1275.8 ms) Diff -71 E2E 0.94x
query27: Previous (1600.0 ms) vs Current (1512.0 ms) Diff 88 E2E 1.06x
query28: Previous (8589.4 ms) vs Current (8524.4 ms) Diff 65 E2E 1.01x
query29: Previous (2920.0 ms) vs Current (2997.6 ms) Diff -77 E2E 0.97x
query30: Previous (2373.8 ms) vs Current (2012.0 ms) Diff 361 E2E 1.18x
query31: Previous (2520.6 ms) vs Current (2914.6 ms) Diff -394 E2E 0.86x
query32: Previous (1441.4 ms) vs Current (1499.2 ms) Diff -57 E2E 0.96x
query33: Previous (1310.0 ms) vs Current (1383.6 ms) Diff -73 E2E 0.95x
query34: Previous (2644.4 ms) vs Current (2942.2 ms) Diff -297 E2E 0.90x
query35: Previous (2277.2 ms) vs Current (2416.2 ms) Diff -139 E2E 0.94x
query36: Previous (1632.4 ms) vs Current (1676.8 ms) Diff -44 E2E 0.97x
query37: Previous (1635.8 ms) vs Current (1469.0 ms) Diff 166 E2E 1.11x
query38: Previous (2796.8 ms) vs Current (3094.0 ms) Diff -297 E2E 0.90x
query39_part1: Previous (2121.2 ms) vs Current (2161.8 ms) Diff -40 E2E 0.98x
query39_part2: Previous (1633.2 ms) vs Current (1562.2 ms) Diff 71 E2E 1.05x
query40: Previous (1425.0 ms) vs Current (1526.2 ms) Diff -101 E2E 0.93x
query41: Previous (377.0 ms) vs Current (396.2 ms) Diff -19 E2E 0.95x
query42: Previous (436.8 ms) vs Current (430.8 ms) Diff 6 E2E 1.01x
query43: Previous (1266.8 ms) vs Current (1300.4 ms) Diff -33 E2E 0.97x
query44: Previous (774.0 ms) vs Current (816.8 ms) Diff -42 E2E 0.95x
query45: Previous (1363.2 ms) vs Current (1217.2 ms) Diff 146 E2E 1.12x
query46: Previous (1786.0 ms) vs Current (2001.4 ms) Diff -215 E2E 0.89x
query47: Previous (2360.2 ms) vs Current (2256.4 ms) Diff 103 E2E 1.05x
query48: Previous (1399.8 ms) vs Current (1687.0 ms) Diff -287 E2E 0.83x
query49: Previous (2814.0 ms) vs Current (2815.2 ms) Diff -1 E2E 1.00x
query50: Previous (9065.6 ms) vs Current (9034.0 ms) Diff 31 E2E 1.00x
query51: Previous (2627.4 ms) vs Current (2193.4 ms) Diff 434 E2E 1.20x
query52: Previous (687.4 ms) vs Current (818.0 ms) Diff -130 E2E 0.84x
query53: Previous (1013.4 ms) vs Current (1076.6 ms) Diff -63 E2E 0.94x
query54: Previous (1754.0 ms) vs Current (1798.0 ms) Diff -44 E2E 0.98x
query55: Previous (632.4 ms) vs Current (603.8 ms) Diff 28 E2E 1.05x
query56: Previous (1180.0 ms) vs Current (1140.0 ms) Diff 40 E2E 1.04x
query57: Previous (1845.4 ms) vs Current (1745.6 ms) Diff 99 E2E 1.06x
query58: Previous (1107.0 ms) vs Current (1112.8 ms) Diff -5 E2E 0.99x
query59: Previous (2775.0 ms) vs Current (2974.4 ms) Diff -199 E2E 0.93x
query60: Previous (1539.2 ms) vs Current (1555.2 ms) Diff -16 E2E 0.99x
query61: Previous (1643.2 ms) vs Current (1640.6 ms) Diff 2 E2E 1.00x
query62: Previous (1646.0 ms) vs Current (1751.2 ms) Diff -105 E2E 0.94x
query63: Previous (1144.8 ms) vs Current (1172.0 ms) Diff -27 E2E 0.98x
query64: Previous (17034.0 ms) vs Current (17343.6 ms) Diff -309 E2E 0.98x
query65: Previous (4208.0 ms) vs Current (4359.8 ms) Diff -151 E2E 0.97x
query66: Previous (3315.4 ms) vs Current (3468.2 ms) Diff -152 E2E 0.96x
query67: Previous (27362.0 ms) vs Current (24280.6 ms) Diff 3081 E2E 1.13x
query68: Previous (1380.6 ms) vs Current (1421.4 ms) Diff -40 E2E 0.97x
query69: Previous (1641.4 ms) vs Current (1777.8 ms) Diff -136 E2E 0.92x
query70: Previous (2221.4 ms) vs Current (2014.8 ms) Diff 206 E2E 1.10x
query71: Previous (4220.0 ms) vs Current (3778.0 ms) Diff 442 E2E 1.12x
query72: Previous (3527.4 ms) vs Current (3684.8 ms) Diff -157 E2E 0.96x
query73: Previous (1265.2 ms) vs Current (1457.8 ms) Diff -192 E2E 0.87x
query74: Previous (4290.6 ms) vs Current (4430.6 ms) Diff -140 E2E 0.97x
query75: Previous (8853.6 ms) vs Current (8597.8 ms) Diff 255 E2E 1.03x
query76: Previous (3388.2 ms) vs Current (3063.6 ms) Diff 324 E2E 1.11x
query77: Previous (1273.6 ms) vs Current (1433.0 ms) Diff -159 E2E 0.89x
query78: Previous (9151.8 ms) vs Current (9138.4 ms) Diff 13 E2E 1.00x
query79: Previous (1352.0 ms) vs Current (1206.6 ms) Diff 145 E2E 1.12x
query80: Previous (4912.4 ms) vs Current (5252.6 ms) Diff -340 E2E 0.94x
query81: Previous (2690.8 ms) vs Current (2454.0 ms) Diff 236 E2E 1.10x
query82: Previous (2457.8 ms) vs Current (2562.8 ms) Diff -105 E2E 0.96x
query83: Previous (836.4 ms) vs Current (814.2 ms) Diff 22 E2E 1.03x
query84: Previous (1395.4 ms) vs Current (1490.4 ms) Diff -95 E2E 0.94x
query85: Previous (1877.0 ms) vs Current (2070.6 ms) Diff -193 E2E 0.91x
query86: Previous (1209.2 ms) vs Current (1297.4 ms) Diff -88 E2E 0.93x
query87: Previous (2896.0 ms) vs Current (2944.6 ms) Diff -48 E2E 0.98x
query88: Previous (5839.2 ms) vs Current (5986.4 ms) Diff -147 E2E 0.98x
query89: Previous (1622.8 ms) vs Current (2188.0 ms) Diff -565 E2E 0.74x
query90: Previous (1404.8 ms) vs Current (1045.2 ms) Diff 359 E2E 1.34x
query91: Previous (1231.6 ms) vs Current (1290.8 ms) Diff -59 E2E 0.95x
query92: Previous (670.0 ms) vs Current (650.0 ms) Diff 20 E2E 1.03x
query93: Previous (10255.2 ms) vs Current (10063.0 ms) Diff 192 E2E 1.02x
query94: Previous (4808.0 ms) vs Current (4785.2 ms) Diff 22 E2E 1.00x
query95: Previous (6993.8 ms) vs Current (7260.4 ms) Diff -266 E2E 0.96x
query96: Previous (5022.4 ms) vs Current (4883.0 ms) Diff 139 E2E 1.03x
query97: Previous (2275.6 ms) vs Current (2321.8 ms) Diff -46 E2E 0.98x
query98: Previous (1724.8 ms) vs Current (1818.2 ms) Diff -93 E2E 0.95x
query99: Previous (2267.2 ms) vs Current (2245.6 ms) Diff 21 E2E 1.01x
benchmark: Previous (363000.0 ms) vs Current (363600.0 ms) Diff -600 E2E 1.00x

--------------------------------------------------------------------
Name = query67
Means = 27362.0, 24280.6
Time diff = 3081.4000000000015
Speedup = 1.1269079017816694
T-Test (test statistic, p value, df) = 7.987130606242702, 4.417720265634195e-05, 8.0
T-Test Confidence Interval = 2191.7537061652265, 3971.0462938347764
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
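
The query67 figures above can be cross-checked by hand. The following is a minimal sketch, not the regression tool's actual code: it assumes the speedup is previous mean divided by current mean, that the test statistic is the mean difference divided by its standard error, and that the interval is a two-sided 95% Student's t interval with 8 degrees of freedom (critical value about 2.306 from standard tables). The object and method names are hypothetical.

```scala
object RegressionCheck {
  /** Summary of a previous-vs-current timing comparison (all times in ms). */
  final case class Summary(diffMs: Double, speedup: Double, ciLow: Double, ciHigh: Double)

  // Two-sided 95% critical value of Student's t for 8 degrees of freedom
  // (standard table value, rounded; an assumption about how the interval was built).
  val TCrit95Df8: Double = 2.306

  def summarize(prevMeanMs: Double, currMeanMs: Double, tStat: Double, tCrit: Double): Summary = {
    require(tStat != 0.0, "test statistic must be non-zero to recover the standard error")
    val diff = prevMeanMs - currMeanMs   // positive means the current run is faster
    val stdErr = diff / tStat            // from t = diff / standard error
    val halfWidth = tCrit * stdErr       // half-width of the confidence interval
    Summary(diff, prevMeanMs / currMeanMs, diff - halfWidth, diff + halfWidth)
  }
}
```

Calling `RegressionCheck.summarize(27362.0, 24280.6, 7.987130606242702, RegressionCheck.TCrit95Df8)` gives a difference of about 3081.4 ms, a speedup of about 1.127, and an interval close to the reported one (the small gap comes from the rounded critical value). Eight degrees of freedom would be consistent with five runs per side under a pooled two-sample test, although the run count is not stated here.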

@liurenjie1024
Collaborator Author

build

Collaborator

@abellina abellina left a comment

@liurenjie1024 can you provide performance numbers for NDS with and without MT shuffle?

@sameerz added the "performance" label (A performance related task/issue) on Feb 25, 2025

@liurenjie1024
Collaborator Author

build

@@ -500,7 +500,7 @@ class AdaptiveQueryExecSuite
val conf = new SparkConf()
.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true")
.set(SQLConf.LOCAL_SHUFFLE_READER_ENABLED.key, "true")
- .set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, "400")
+ .set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, "50")
Collaborator Author

This change is required because the shuffle size changed.

@@ -99,7 +99,7 @@ class GpuLoreSuite extends SparkQueryCompareTestSuite with FunSuiteWithTempDir w
}

test("AQE broadcast") {
- doTestReplay("90[*]") { spark =>
+ doTestReplay("93[*]") { spark =>
Collaborator Author

Same as above: the shuffle size change leads to a plan change.
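
To illustrate why a shuffle size change can alter the plan, here is a simplified sketch of a size-based broadcast decision. It is not Spark's implementation; the object and method names are hypothetical, and the 200-byte size is a made-up value chosen only to fall between the two thresholds used in the test diff above.

```scala
object BroadcastDecision {
  // Simplified rule: a side may be broadcast only when its estimated size is
  // non-negative and does not exceed the threshold. A negative threshold
  // therefore disables broadcasting entirely.
  def canBroadcastBySize(sizeInBytes: Long, thresholdBytes: Long): Boolean =
    sizeInBytes >= 0 && sizeInBytes <= thresholdBytes
}
```

With the hypothetical 200-byte size, `BroadcastDecision.canBroadcastBySize(200L, 400L)` is `true` while `BroadcastDecision.canBroadcastBySize(200L, 50L)` is `false`. A test that expects a specific join strategy, or a specific operator id in the plan, can therefore need a new threshold or id once the serialized shuffle size shifts.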

@@ -2052,7 +2052,7 @@ val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression.
.internal()
.startupOnly()
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
Collaborator

For spark.rapids.shuffle.kudo.serializer.measure.buffer.copy.enabled, should we enable that by default?

Collaborator Author

No, it's reserved for experiments and has a performance impact. We should only enable it when necessary.
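
For a one-off experiment, the setting could be passed explicitly at launch rather than changing its default. The sketch below only renders `--conf key=value` argument pairs for `spark-submit`; the helper object and method names are hypothetical, and the only configuration key used is the one quoted in the question above. Whether that key can be changed after startup is not stated in this thread, so it is treated as a launch-time setting here.

```scala
object KudoExperimentConf {
  // Key quoted in the review thread above; it is left disabled by default.
  val MeasureBufferCopyKey: String =
    "spark.rapids.shuffle.kudo.serializer.measure.buffer.copy.enabled"

  // Render key/value pairs as spark-submit arguments: --conf key=value
  def toSubmitArgs(confs: Seq[(String, String)]): Seq[String] =
    confs.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```

For example, `KudoExperimentConf.toSubmitArgs(Seq(KudoExperimentConf.MeasureBufferCopyKey -> "true"))` yields the two arguments to append to a `spark-submit` command line for the experimental run only.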

Collaborator

@revans2 revans2 left a comment

The changes themselves look fine. I mostly want to see the performance numbers to show that it is at least as good as the old code. I know we have done some of that in the past and that there have been a lot of optimizations recently so it should be good. But this is a big change so I want to see it.

Collaborator

@abellina abellina left a comment

I ran the results I got through our regression check and posted to the description. It looks good to me. It would be nice to understand if q67's change is explained by Kudo or not.

@liurenjie1024 liurenjie1024 merged commit 62cada8 into NVIDIA:branch-25.04 Feb 27, 2025
51 of 52 checks passed
@liurenjie1024 liurenjie1024 deleted the ray/issue-12202 branch February 27, 2025 01:36