Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimdUtilTest.gatherBits does not work for arm64(M1) #8377

Closed
icejoywoo opened this issue Jan 14, 2024 · 2 comments
Closed

SimdUtilTest.gatherBits does not work for arm64(M1) #8377

icejoywoo opened this issue Jan 14, 2024 · 2 comments
Labels
bug Something isn't working triage Newly created issue that needs attention.

Comments

@icejoywoo
Copy link
Contributor

icejoywoo commented Jan 14, 2024

Bug description

UT SimdUtilTest.gatherBits(added in #7726) is not passed in Mac M1 arm64, but can pass in Mac Intel. This also impact the PrestoVectorSerde, because only PrestoVectorSerializer use this function to handle nulls(see #7883).

I dig into this issue, and I tested on my 2 Macbook with intel chip and M1. Then I found that simd::gatherBits only works for xsimd::batch<int32_t>::size is 8 in Intel avx2, but it does NOT work for xsimd::batch<int32_t>::size is 4 in ARM neon.

I think the current implementation of simd::gatherBits is not correct for arm64 M1.

In #1505, function nonNullRowsFromSparse in DecoderUtil.cpp handle the different situtation with xsimd::batch<int32_t>::size = 4 or 8.

The code snippet in DecoderUtil.cpp follows:

constexpr int32_t kStep = xsimd::batch<int32_t>::size;
// ...
uint16_t flags = simd::gather8Bits(nulls, next8Rows, width);
if (outputNulls) {
  if constexpr (kStep == 8) {
    resultNullBytes[i / 8] = flags;
  } else {
    VELOX_DCHECK_EQ(kStep, 4);
    if (i % 8 == 0) {
      resultNullBytes[i / 8] = flags;
    } else {
      resultNullBytes[i / 8] |= flags << 4;
    }
  }
// ...
}

According to this snippet, I think this can fix this issue too.

System information

Velox System Info v0.0.2
Commit: ca8a0413aba800be0da7a0ece3d9c2ea26d6b7bf
CMake Version: 3.27.3
System: Darwin-22.5.0
Arch: arm64
C++ Compiler: /Library/Developer/CommandLineTools/usr/bin/c++
C++ Compiler Version: 14.0.3.14030022
C Compiler: /Library/Developer/CommandLineTools/usr/bin/cc
C Compiler Version: 14.0.3.14030022
CMake Prefix Path: /Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk/usr;/opt/homebrew;/usr/local;/usr;/;/opt/homebrew/Cellar/cmake/3.27.3;/usr/local;/usr/X11R6;/usr/pkg;/opt;/sw;/opt/local

Relevant logs

No response

@icejoywoo icejoywoo added bug Something isn't working triage Newly created issue that needs attention. labels Jan 14, 2024
@icejoywoo
Copy link
Contributor Author

@mbasmanova Could you please check this issue? I found an issue in simd::gatherBits. I already push a fix to this.

@mbasmanova mbasmanova changed the title SimdUtilTest.gatherBits does not work for arm64(M1) SimdUtilTest.gatherBits does not work for arm64(M1) Jan 17, 2024
@mbasmanova
Copy link
Contributor

CC: @oerling @Yuhta

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Newly created issue that needs attention.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants