__mm256_srl_epi64() returns different results on LDC when -mattr=+avx2 is on #143
Comments
Hello, your unittest seems to pass here, what's your LDC version? (EDIT: and OS?) |
1.11.20 now implements I think you got trapped by an old LDC promoting your int That said, it seems newer LDC versions prevent such implicit conversions. As said, I can't reproduce your unittest failure, which is odd. |
Thanks for responding and implementing _mm256_srli_epi64()! Looking at the output more, I realized while my application was being compiled with -mattr=+avx2, intel-intrinsics was not. After changing:
to
and also replacing _mm256_srl_epi64() with _mm256_srli_epi64(), my unittest now passes. It still fails, though, when intel-intrinsics is not built with -mattr=+avx2. I can reproduce this on Debian 12 (bookworm) with both the repo version of LDC v1.30.0 and a freshly compiled v1.39.0. |
I still can't repro on Windows or godbolt, so I'm going to leave it here. Probably need your LLVM version with |
Sorry for taking so long to reply. As best I can tell, the fallback implementation of _mm256_srli_epi64() (used when -mattr=+avx2 is not passed in) is incorrect:
|
Your compiler is an LDC 1.30 with LLVM 14.0.6. I can only get an LDC 1.30 with LLVM 14.0.3. How do you explain having an LDC based on LLVM 14.0.6, when the official Linux x86_64 build of LDC 1.30 uses LLVM 14.0.3? See https://github.com/ldc-developers/ldc/releases/tag/v1.30.0 No luck reproducing that on either Linux or Windows.
|
I am using the version of LDC shipped by Debian 12. I will try to reproduce later tonight with the upstream LDC 1.40 release. |
Here is me reproducing the issue with LDC 1.40:
|
Verbose dub output with LDC 1.40:
|
Reproduced something, thanks! Sounds related to what you said indeed.
import std.stdio;
void main()
{
import inteli.avx2intrin;
import std.stdio;
long4 start = [0xffff_ffff_0000_0000,
0xffff_ffff_0000_0000,
0xffff_ffff_0000_0000,
0xffff_ffff_0000_0000];
long4 res = _mm256_srli_epi64(start, 32);
long4 expected = [0x0000_0000_ffff_ffff,
0x0000_0000_ffff_ffff,
0x0000_0000_ffff_ffff,
0x0000_0000_ffff_ffff];
writefln!("start:\n%(0x%08x %)")(start.array);
writefln!"after shift of 32 bits\n%(0x%08x %)"(res.array);
assert(res.array == expected.array);
}
To reproduce: LDC 1.24 or LDC 1.28 +
After LDC 1.30 the result changes :) LDC 1.30 or LDC 1.36 or LDC 1.40 +
Workarounds:
Reproduced with: |
is a codegen issue. It needs a reduced test case, but reproducing it seems to require two dub packages. |
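When comparing outputs between the builds with and without -mattr=+avx2, a plain scalar reference can help pin down which result is the wrong one. The sketch below is only an illustration: `srli64Reference` is a hypothetical helper, not part of intel-intrinsics, and it assumes the usual semantics of a 64-bit logical right shift applied to four lanes independently, with counts above 63 giving all zeros.

```d
// Hypothetical scalar reference for a 4 x 64-bit logical right shift.
// Not part of intel-intrinsics; meant only for cross-checking results.
ulong[4] srli64Reference(const ulong[4] lanes, uint count) pure nothrow @nogc @safe
{
    ulong[4] result; // integer arrays are zero-initialized in D

    // Assumption: a count larger than 63 clears every lane.
    if (count > 63)
        return result;

    foreach (i, lane; lanes)
        result[i] = lane >> count; // >> on ulong shifts in zeros (logical shift)

    return result;
}

unittest
{
    // Same input pattern as the repro above.
    const ulong[4] start = [0xffff_ffff_0000_0000UL, 0xffff_ffff_0000_0000UL,
                            0xffff_ffff_0000_0000UL, 0xffff_ffff_0000_0000UL];
    const ulong[4] expected = [0x0000_0000_ffff_ffffUL, 0x0000_0000_ffff_ffffUL,
                               0x0000_0000_ffff_ffffUL, 0x0000_0000_ffff_ffffUL];

    assert(srli64Reference(start, 32) == expected);
    assert(srli64Reference(start, 0) == start);

    const ulong[4] zeros = [0, 0, 0, 0];
    assert(srli64Reference(start, 64) == zeros);
}
```

Since it uses no SIMD types, this helper should behave the same regardless of the -mattr flags, so it can be compared against the `.array` of the vector result in both builds.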
The following code fails on my machine when compiling (ldc) with -mattr=+avx2
When NOT building with -mattr=+avx2:
When building with -mattr=+avx2:
CPU information:
If _mm256_srli_epi64() was implemented I would just use that instead :)