feat(hashjoin): Add fast row size estimation for hash probe #11558

tanjialiang · 2024-11-16T03:05:32Z

Add column stats to the row container to collect aggregated per-column stats. Hash probe uses the aggregated stats to decide whether row size estimation is applicable. If it is, the stats are used to compose a fast row size estimate, which avoids memory blowup when probing and listing results. This makes hash join more performant; in some extreme skew cases that we've seen in Meta internal queries, it decreased query latency by >20x.
This work also uncovered a bug in HashTable's SIMD fast path for result listing: when the max number of rows is smaller than kWidth, an unsigned integer overflow causes the max number of rows to be ignored. The bug is fixed and the new test covers that case.
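
The overflow described above can be illustrated with a minimal sketch. This is not the Velox HashTable code: the loop shape, the function names, and the value chosen for `kWidth` are all assumptions made for illustration. It only shows how an unsigned subtraction such as `maxRows - kWidth` wraps around when `maxRows` is smaller than `kWidth`, so that a row limit stops limiting anything.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical batch width; stands in for the SIMD width the PR calls kWidth.
constexpr uint64_t kWidth = 8;

// Buggy form (illustrative): when maxRows < kWidth, `maxRows - kWidth` wraps
// to a huge unsigned value, so the row limit is effectively ignored.
inline uint64_t rowsListedBuggy(uint64_t available, uint64_t maxRows) {
  uint64_t listed = 0;
  while (listed + kWidth <= available && listed <= maxRows - kWidth) {
    listed += kWidth;
  }
  return listed;
}

// Guarded form (illustrative): compare with an addition instead of a
// subtraction, so nothing can wrap below zero.
inline uint64_t rowsListedFixed(uint64_t available, uint64_t maxRows) {
  uint64_t listed = 0;
  while (listed + kWidth <= available && listed + kWidth <= maxRows) {
    listed += kWidth;
  }
  return listed;
}

// Usage: with 64 rows available and a limit of 4, the buggy form lists all
// 64 rows, while the guarded form lists none in the batched path.
inline void demoListing() {
  assert(rowsListedBuggy(64, 4) == 64);
  assert(rowsListedFixed(64, 4) == 0);
  assert(rowsListedBuggy(64, 20) == rowsListedFixed(64, 20));
}
```

In this sketch the two forms agree whenever `maxRows >= kWidth`; they differ only in the small-limit case that the PR description calls out.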

netlify · 2024-11-16T03:05:49Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`07ede69`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/6740f6ecd8c3000008da482b

facebook-github-bot · 2024-11-17T01:11:04Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-11-17T05:51:41Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-11-18T04:56:21Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaoxmeng

@tanjialiang thanks for the improvement % minors.

velox/exec/RowContainer.h

xiaoxmeng · 2024-11-19T19:13:03Z

velox/exec/RowContainer.h

+
+ private:
+  // Aggregated stats for non-null rows of the column.
+  int32_t minBytes_{0};


why not use uint32_t? Thanks!

int32_t is widely used for cell sizes in the row container. I'm trying to stay compatible so that no cast is needed when calling std::max and std::min. If we want, we can refactor the types in general in a separate PR.
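
As a rough illustration of the point about casts: `std::min` and `std::max` deduce a single argument type, so mixing `int32_t` and `uint32_t` operands does not compile without a cast. The class below is a hypothetical stand-in, not the class added in this PR; only `minBytes_` appears in the diff above, and every other member name is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for per-column size stats. Sizes are int32_t so they
// can be passed to std::min/std::max together with other int32_t cell sizes.
class ColumnStatsSketch {
 public:
  // Records the size of one non-null cell.
  void addCellSize(int32_t bytes) {
    assert(bytes >= 0);
    if (nonNullCount_ == 0) {
      minBytes_ = bytes;
      maxBytes_ = bytes;
    } else {
      // Both operands are int32_t, so no cast is needed here. With a
      // uint32_t member, std::min(minBytes_, bytes) would not compile.
      minBytes_ = std::min(minBytes_, bytes);
      maxBytes_ = std::max(maxBytes_, bytes);
    }
    sumBytes_ += bytes;
    ++nonNullCount_;
  }

  int32_t minBytes() const { return minBytes_; }
  int32_t maxBytes() const { return maxBytes_; }

  // Integer average over non-null cells; 0 when nothing was recorded.
  int64_t avgBytes() const {
    return nonNullCount_ == 0 ? 0 : sumBytes_ / nonNullCount_;
  }

 private:
  int32_t minBytes_{0};
  int32_t maxBytes_{0};
  int64_t sumBytes_{0};
  int64_t nonNullCount_{0};
};

// Usage: three cells of 4, 10 and 1 bytes.
inline void demoColumnStats() {
  ColumnStatsSketch stats;
  stats.addCellSize(4);
  stats.addCellSize(10);
  stats.addCellSize(1);
  assert(stats.minBytes() == 1);
  assert(stats.maxBytes() == 10);
  assert(stats.avgBytes() == 5);
}
```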

velox/exec/RowContainer.cpp

velox/exec/RowContainer.h

xiaoxmeng · 2024-11-19T19:32:32Z

velox/exec/RowContainer.cpp

+      if (columnStatsValid) {
+        for (uint32_t columnIndex = 0; columnIndex < rowColumnsStats_.size();
+             columnIndex++) {
+          if (types_[columnIndex]->isFixedWidth()) {


I'd keep this simple by also having column stats for fixed-width columns, e.g. null count or something similar.

velox/exec/RowContainer.cpp

velox/exec/HashProbe.h

velox/exec/HashProbe.cpp

xiaoxmeng

@tanjialiang LGTM % minors. Thanks!

velox/exec/RowContainer.h

velox/exec/RowContainer.cpp

velox/exec/HashProbe.h

velox/exec/HashProbe.cpp

xiaoxmeng · 2024-11-20T00:41:22Z

velox/exec/HashProbe.cpp

+    if (totalMaxBytes == 0) {
+      return 0;
+    }
+    return std::nullopt;


Why return null here?

After offline discussion, I thought about it again. The reason we want to return nullopt is as follows:
Imagine a case where 99999999 rows have size 0 and 1 row has size 1MB, and every left-side row joins with that one row. This will blow up memory.
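
To make the numbers concrete: 1 MB (1,048,576 bytes) spread over 100,000,000 rows averages about 0.01 bytes per row, which truncates to 0 under integer arithmetic, while the maximum is still 1 MB. The sketch below follows the two diff fragments quoted in this review (summing `avgBytes()` and `maxBytes()`, then checking for zero). The struct, the function name, and the final `return totalAvgBytes` are assumptions and not the merged code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-column summary; the PR exposes avgBytes() and maxBytes()
// accessors, modelled here as plain fields.
struct ColumnSizeStats {
  int64_t avgBytes;
  int64_t maxBytes;
};

// Returns an estimated row size, or std::nullopt when the stats cannot be
// trusted and the caller should fall back to a slower, exact path.
inline std::optional<int64_t> estimateRowSizeSketch(
    const std::vector<ColumnSizeStats>& columns) {
  int64_t totalAvgBytes = 0;
  int64_t totalMaxBytes = 0;
  for (const auto& stats : columns) {
    totalAvgBytes += stats.avgBytes;
    totalMaxBytes += stats.maxBytes;
  }
  if (totalAvgBytes == 0) {
    if (totalMaxBytes == 0) {
      // Every row is empty, so 0 is an exact answer.
      return 0;
    }
    // The average rounds to 0 but some row is large: the skewed case from
    // the comment above. Report "not applicable" rather than 0.
    return std::nullopt;
  }
  // Assumption: otherwise use the summed average as the estimate.
  return totalAvgBytes;
}

// Usage: an all-empty column, the skewed column, and a regular pair.
inline void demoEstimate() {
  using Columns = std::vector<ColumnSizeStats>;
  assert(estimateRowSizeSketch(Columns{ColumnSizeStats{0, 0}}).value() == 0);
  assert(!estimateRowSizeSketch(Columns{ColumnSizeStats{0, 1 << 20}})
              .has_value());
  assert(estimateRowSizeSketch(
             Columns{ColumnSizeStats{8, 8}, ColumnSizeStats{4, 16}})
             .value() == 12);
}
```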

velox/exec/HashProbe.cpp

xiaoxmeng

@tanjialiang thanks!

xiaoxmeng · 2024-11-20T01:00:02Z

velox/exec/HashProbe.cpp

+    totalAvgBytes += stats.avgBytes();
+    totalMaxBytes += stats.maxBytes();
+  }
+  if (totalAvgBytes == 0) {


if (totalAvgBytes == 0) { return 0; }

After offline discussion, I thought about it again. The reason we want to return nullopt is as follows:
Imagine a case where 99999999 rows have size 0 and 1 row has size 1MB, and every left-side row joins with that one row. This will blow up memory.

Why does it explode? Most of the rows are zero size, so isn't it OK to execute the fast path? There is a limit on the number of output rows.

facebook-github-bot · 2024-11-20T01:58:24Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-11-21T00:18:42Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaoxmeng

@tanjialiang can you also track the row column size for the non-join case? Prefix sort might require that as well, @zhli1142015. Thanks!

xiaoxmeng · 2024-11-21T01:45:49Z

velox/functions/prestosql/tests/SimpleComparisonMatcherTest.cpp

@@ -27,7 +27,7 @@ namespace facebook::velox::functions::prestosql {
 namespace {

 class SimpleComparisonMatcherTest : public testing::Test,
-                                    public test::VectorTestBase {
+                                    public velox::test::VectorTestBase {


Why do we need these namespace changes?

xiaoxmeng · 2024-11-21T01:46:08Z

velox/functions/sparksql/fuzzer/tests/SparkQueryRunnerTest.cpp

@@ -26,6 +26,7 @@
 #include "velox/parse/TypeResolver.h"
 #include "velox/vector/tests/utils/VectorTestBase.h"

+using namespace facebook;


Why do we need the additional facebook namespace?

To keep the compiler from being confused by the newly added test namespace in exec.
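
One way such a clash can arise is sketched below. This is an assumption about the mechanism, not a reproduction of the Velox sources: the helper type and the placement of the using-directive are hypothetical. Once a second namespace named `test` becomes visible through a using-directive, the unqualified `test::VectorTestBase` can no longer be resolved, while the qualified `velox::test::VectorTestBase` still can.

```cpp
#include <cassert>
#include <string>

namespace facebook::velox::test {
// Stand-in for the shared test base class.
struct VectorTestBase {
  std::string origin() const {
    return "velox::test";
  }
};
} // namespace facebook::velox::test

namespace facebook::velox::exec::test {
// Hypothetical helper living in the newly added exec::test namespace.
struct OperatorTestHelper {};
} // namespace facebook::velox::exec::test

namespace facebook::velox::functions::prestosql {
// Assumption: some header or test file pulls in exec like this.
using namespace facebook::velox::exec;

// With the directive above, unqualified lookup of `test` finds both
// velox::test and exec::test, so a compiler rejects this as ambiguous:
//   struct Broken : public test::VectorTestBase {};

// Qualifying from `velox` names exactly one namespace.
struct MatcherTestSketch : public velox::test::VectorTestBase {};
} // namespace facebook::velox::functions::prestosql

// Usage: the qualified base class resolves to velox::test.
inline void demoNamespaces() {
  facebook::velox::functions::prestosql::MatcherTestSketch fixture;
  assert(fixture.origin() == "velox::test");
}
```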

zhli1142015 · 2024-11-21T01:52:21Z

Yes, we also need the max length of string columns for prefix sort.
#11527 (comment)
#11272

tanjialiang · 2024-11-21T20:45:25Z

ot successful

I'll do a followup PR for that, to make sure there's no regression if enabled by default.

facebook-github-bot · 2024-11-21T20:48:06Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-11-22T06:38:26Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-11-22T07:18:32Z

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…incubator#11558)

Summary:
* Add column stats for row container to collect aggregated column stats. The aggregated column stats will be used in hash probe to decide if row size estimation is applicable. If it is applicable, column stats will be used to compose a fast row size estimation to avoid memory exploding when probing and listing results. This added feature makes hash join more performant, and in some extreme skew cases that we've seen in Meta internal queries, it helped to decrease the query latency by >20x.
* The work of this feature also helped to discover a bug in HashTable when using simd for fast path result listing -> when max number of rows is smaller than kWidth, the unsigned integer overflow bug will make the max number of rows be ignored. Fixed the bug and the new test covers that case.

Pull Request resolved: facebookincubator#11558
Reviewed By: xiaoxmeng
Differential Revision: D66064300
Pulled By: tanjialiang

facebook-github-bot · 2024-11-22T21:26:14Z

This pull request was exported from Phabricator. Differential Revision: D66064300

facebook-github-bot · 2024-11-22T23:32:30Z

@tanjialiang merged this pull request in 059337f.

conbench-facebook · 2024-11-23T00:05:32Z

Conbench analyzed the 1 benchmark run on commit 059337fc.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

FelixYBW · 2024-12-04T19:56:38Z

Thank you for the fix. @zhouyuan @zhztheplayer Is our Gluten Jenkins verified?

…incubator#11558)

Summary:
* Add column stats for row container to collect aggregated column stats. The aggregated column stats will be used in hash probe to decide if row size estimation is applicable. If it is applicable, column stats will be used to compose a fast row size estimation to avoid memory exploding when probing and listing results. This added feature makes hash join more performant, and in some extreme skew cases that we've seen in Meta internal queries, it helped to decrease the query latency by >20x.
* The work of this feature also helped to discover a bug in HashTable when using simd for fast path result listing -> when max number of rows is smaller than kWidth, the unsigned integer overflow bug will make the max number of rows be ignored. Fixed the bug and the new test covers that case.

Pull Request resolved: facebookincubator#11558
Reviewed By: xiaoxmeng
Differential Revision: D66064300
Pulled By: tanjialiang
fbshipit-source-id: 886cd943036350b1c1bf0b6741ebe7165883a30f

facebook-github-bot added the CLA Signed label Nov 16, 2024

tanjialiang force-pushed the rc_column_stats branch 2 times, most recently from 852cbe2 to 9ed88ab Compare November 17, 2024 01:10

tanjialiang force-pushed the rc_column_stats branch from 9ed88ab to 3f1b34c Compare November 17, 2024 05:49

tanjialiang force-pushed the rc_column_stats branch 2 times, most recently from a9dc3bc to b4758b6 Compare November 18, 2024 04:47

tanjialiang changed the title ~~[WIP] feature(hashjoin): Add fast path to list join result~~ feature(hashjoin): Add fast row size estimation for hash probe Nov 18, 2024

tanjialiang marked this pull request as ready for review November 18, 2024 04:55

xiaoxmeng reviewed Nov 19, 2024


tanjialiang force-pushed the rc_column_stats branch from b4758b6 to 261ab55 Compare November 19, 2024 23:58

xiaoxmeng reviewed Nov 20, 2024


xiaoxmeng approved these changes Nov 20, 2024


tanjialiang force-pushed the rc_column_stats branch 2 times, most recently from b69ee43 to 9a3270b Compare November 20, 2024 01:56

tanjialiang changed the title ~~feature(hashjoin): Add fast row size estimation for hash probe~~ feat(hashjoin): Add fast row size estimation for hash probe Nov 20, 2024

tanjialiang force-pushed the rc_column_stats branch 3 times, most recently from 03bbefd to 613aebe Compare November 20, 2024 09:16

tanjialiang requested a review from majetideepak as a code owner November 20, 2024 09:16

tanjialiang force-pushed the rc_column_stats branch 6 times, most recently from c041da5 to 5ab4a6e Compare November 20, 2024 19:53

tanjialiang force-pushed the rc_column_stats branch 3 times, most recently from f99c1be to ccfd4e7 Compare November 20, 2024 23:55

xiaoxmeng reviewed Nov 21, 2024


tanjialiang force-pushed the rc_column_stats branch 5 times, most recently from 1f56443 to 397baeb Compare November 22, 2024 01:14

tanjialiang force-pushed the rc_column_stats branch from 397baeb to f3724d3 Compare November 22, 2024 07:18

tanjialiang force-pushed the rc_column_stats branch from f3724d3 to 07ede69 Compare November 22, 2024 21:26

facebook-github-bot added the fb-exported label Nov 22, 2024

facebook-github-bot closed this in 059337f Nov 22, 2024

facebook-github-bot added the Merged label Nov 22, 2024

Yuhta mentioned this pull request Dec 3, 2024

Hash probe performance regression with https://github.com/facebookincubator/velox/pull/10652 #11438

Closed

zsmj2017 mentioned this pull request Dec 4, 2024

RowContainer's columns' HasNulls info #11741

Open
