Fix parallel build overflow for last bucket
A parallel build inserts one range of buckets per thread. If an insert does not fit in the last bucket in the range, it is added to overflows. The overflows are inserted sequentially at the end of the build.
When inserting overflows, there are no partition bounds, and as long as there is at least one free slot the insert cannot fail.

However, when inserting the overflows, the upper bound of the
partition must be -1 to indicate no bounds. If it is sizeMask + 1 and
the last bucket is full, the insert cannot wrap around to the first
bucket like it should.
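
The behavior described above can be illustrated with a small sketch. This is not the Velox code: `ToyTable`, `insert`, `kNoBound`, `kEmpty` and `overflowInsertSucceeds` are hypothetical names, and the table is a plain linear-probing array of integers. It only assumes what the message states: a bounded insert that leaves its range goes to overflows, and an upper bound of -1 means "no bounds", which lets the probe wrap around.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical constants for this sketch only.
constexpr int32_t kEmpty = -1;   // marks a free slot; stored values must be >= 0
constexpr int64_t kNoBound = -1; // upper bound meaning "no partition bounds"

// Toy open-addressing table with linear probing (not Velox's HashTable).
struct ToyTable {
  int64_t sizeMask;
  std::vector<int32_t> slots;

  explicit ToyTable(int64_t capacity)
      : sizeMask(capacity - 1), slots(capacity, kEmpty) {
    // Capacity must be a power of two so that '& sizeMask' wraps correctly.
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  }

  // Places 'value' at the first free slot at or after 'hash & sizeMask'.
  // If end >= 0, the probe must stay inside [begin, end); a value that does
  // not fit is appended to 'overflows' (if given) and false is returned.
  // If end == kNoBound, the probe wraps around past the last slot.
  bool insert(
      int32_t value,
      uint64_t hash,
      int64_t begin,
      int64_t end,
      std::vector<int32_t>* overflows) {
    int64_t index = static_cast<int64_t>(hash) & sizeMask;
    for (int64_t probed = 0; probed <= sizeMask; ++probed) {
      if (end != kNoBound && (index < begin || index >= end)) {
        break; // The probe left the partition.
      }
      if (slots[index] == kEmpty) {
        slots[index] = value;
        return true;
      }
      // Only an unbounded probe may wrap to the first slot.
      index = (end == kNoBound) ? ((index + 1) & sizeMask) : index + 1;
    }
    if (overflows != nullptr) {
      overflows->push_back(value);
    }
    return false;
  }
};

// Fills the last slot of an 8-slot table, overflows a second value from the
// partition [4, 8), then re-inserts it sequentially with 'upperBound'.
bool overflowInsertSucceeds(int64_t upperBound) {
  ToyTable table(8);
  std::vector<int32_t> overflows;
  table.insert(1, 7, 4, 8, &overflows); // Takes slot 7, the last one.
  table.insert(2, 7, 4, 8, &overflows); // Does not fit, goes to overflows.
  return table.insert(overflows[0], 7, 0, upperBound, nullptr);
}
```

In this sketch, `overflowInsertSucceeds(8)` (the analogue of passing sizeMask + 1) returns false even though seven slots are free, because the probe stops at the upper bound instead of wrapping. `overflowInsertSucceeds(kNoBound)` returns true because the probe wraps to slot 0.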
Orri Erling committed Dec 7, 2023
1 parent 13d54c4 commit 1195ef5
Showing 1 changed file with 4 additions and 6 deletions.
10 changes: 4 additions & 6 deletions velox/exec/HashTable.cpp
@@ -930,12 +930,7 @@ void HashTable<ignoreNullKeys>::parallelJoinBuild() {
         false,
         hashes);
     insertForJoin(
-        overflows.data(),
-        hashes.data(),
-        overflows.size(),
-        0,
-        sizeMask_ + 1,
-        nullptr);
+        overflows.data(), hashes.data(), overflows.size(), 0, -1, nullptr);
     auto table = i == 0 ? this : otherTables_[i - 1].get();
     VELOX_CHECK_EQ(table->rows()->numRows(), table->numParallelBuildRows_);
   }
@@ -1113,6 +1108,9 @@ FOLLY_ALWAYS_INLINE void HashTable<ignoreNullKeys>::buildFullProbe(
     PartitionBoundIndexType partitionBegin,
     PartitionBoundIndexType partitionEnd,
     std::vector<char*>* overflows) {
+  VELOX_DCHECK(
+      partitionEnd >= 0 ? overflows == nullptr : overflows != nullptr,
+      "if partition bounds are given, overflows must also be given.");
   auto insertFn = [&](int32_t /*row*/, PartitionBoundIndexType index) {
     if (index < partitionBegin || index >= partitionEnd) {
       overflows->push_back(inserted);