Fix simd::gatherBits for Mac M1 or when AVX2 is disabled (#8415)
Summary:
Fix #8377

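When AVX2 is disabled, or when running on Mac M1 (arm64), simd::gatherBits returns incorrect results: the loop hardcodes 8 indices per step, but xsimd::batch<int32_t> holds only 4 lanes with 128-bit registers.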

The fix follows the approach used in DecoderUtil::nonNullRowsFromSparse.

Pull Request resolved: #8415

Reviewed By: xiaoxmeng

Differential Revision: D52873543

Pulled By: Yuhta

fbshipit-source-id: ce8cbeb2069a809410b7a259e05285be2e1a70b5
icejoywoo authored and facebook-github-bot committed Jan 18, 2024
1 parent e09a927 commit 44cdc9e
Showing 1 changed file with 7 additions and 4 deletions.
11 changes: 7 additions & 4 deletions velox/common/base/SimdUtil.cpp
@@ -23,6 +23,7 @@ void gatherBits(
     const uint64_t* bits,
     folly::Range<const int32_t*> indexRange,
     uint64_t* result) {
+  constexpr int32_t kStep = xsimd::batch<int32_t>::size;
   const auto size = indexRange.size();
   auto indices = indexRange.data();
   uint8_t* resultPtr = reinterpret_cast<uint8_t*>(result);
@@ -37,14 +38,16 @@ void gatherBits(
   }
 
   int32_t i = 0;
-  for (; i + 8 < size; i += 8) {
-    *(resultPtr++) =
-        simd::gather8Bits(bits, xsimd::load_unaligned(indices + i), 8);
+  for (; i + kStep < size; i += kStep) {
+    uint16_t flags =
+        simd::gather8Bits(bits, xsimd::load_unaligned(indices + i), kStep);
+    bits::storeBitsToByte<kStep>(flags, resultPtr, i);
   }
   const auto bitsLeft = size - i;
   if (bitsLeft > 0) {
-    *resultPtr =
+    uint16_t flags =
         simd::gather8Bits(bits, xsimd::load_unaligned(indices + i), bitsLeft);
+    bits::storeBitsToByte<kStep>(flags, resultPtr, i);
   }
 }
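
To illustrate why the step width matters, here is a minimal scalar sketch of the same gather-and-pack idea. It is not the Velox implementation: `bitAt`, `storeFlags`, and `gatherBitsSketch` are hypothetical names, and `storeFlags` is only an assumed stand-in for what a lane-aware store such as `bits::storeBitsToByte<kStep>` needs to do. It also folds the main loop and the tail into one loop for brevity. With 4 lanes, two consecutive steps share one output byte, so writing a whole byte per step (as the old code did) would drop or overwrite bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the bit at position 'index' of the bitmap 'bits'.
inline bool bitAt(const uint64_t* bits, int32_t index) {
  return (bits[index / 64] >> (index % 64)) & 1;
}

// Hypothetical helper: writes the low kStep bits of 'flags' into 'out'
// starting at bit position 'bitOffset', leaving other bits of the byte intact.
template <int kStep>
void storeFlags(uint16_t flags, uint8_t* out, int32_t bitOffset) {
  static_assert(kStep == 4 || kStep == 8, "sketch covers 4 and 8 lanes only");
  assert(bitOffset % kStep == 0);
  const unsigned mask = (1u << kStep) - 1;
  const unsigned shift = bitOffset % 8;
  uint8_t& byte = out[bitOffset / 8];
  byte = static_cast<uint8_t>((byte & ~(mask << shift)) | ((flags & mask) << shift));
}

// Scalar model of gatherBits: output bit j is the bitmap bit at indices[j].
// kStep plays the role of the SIMD lane count (8 with AVX2, 4 with 128-bit SIMD).
template <int kStep>
std::vector<uint8_t> gatherBitsSketch(
    const uint64_t* bits,
    const std::vector<int32_t>& indices) {
  const int32_t size = static_cast<int32_t>(indices.size());
  std::vector<uint8_t> out((size + 7) / 8, 0);
  for (int32_t i = 0; i < size; i += kStep) {
    // The last step may cover fewer than kStep indices.
    const int32_t n = std::min<int32_t>(kStep, size - i);
    uint16_t flags = 0;
    for (int32_t lane = 0; lane < n; ++lane) {
      flags |= static_cast<uint16_t>(bitAt(bits, indices[i + lane]) << lane);
    }
    storeFlags<kStep>(flags, out.data(), i);
  }
  return out;
}
```

Under these assumptions, `gatherBitsSketch<4>` and `gatherBitsSketch<8>` produce identical output for the same input, which is the property the fix restores for the SIMD version.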

