-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gcc sanitizer fails on _mm_loadu_si128 #591
Comments
Hi @romange ,
According to the document,
Though using memcpy will let the vld1q_s32 conversion relies on C library, I think it can be a possible implementation. ReferencesThe disassemblies are originated from this function: // Originated from tests/impl.cpp
result_t test_mm_loadu_si128(const SSE2NEONTestImpl &impl, uint32_t iter)
{
const int32_t *_a = (const int32_t *) impl.mTestIntPointer1;
__m128i c = _mm_loadu_si128((const __m128i *) _a);
return VALIDATE_INT32_M128(c, _a);
} Armv7-A00046528 <_ZN8SSE2NEON19test_mm_loadu_si128ERKNS_16SSE2NEONTestImplEj>:
46528: b580 push {r7, lr}
4652a: b08e sub sp, #56 ; 0x38
4652c: af00 add r7, sp, #0
4652e: 6078 str r0, [r7, #4]
46530: 6039 str r1, [r7, #0]
46532: 687b ldr r3, [r7, #4]
46534: 68db ldr r3, [r3, #12]
46536: 60fb str r3, [r7, #12]
46538: 68fb ldr r3, [r7, #12]
4653a: 613b str r3, [r7, #16]
4653c: 693b ldr r3, [r7, #16]
4653e: 617b str r3, [r7, #20]
46540: 697b ldr r3, [r7, #20]
46542: f963 0a8f vld1.32 {d16-d17}, [r3]
46546: bf00 nop
46548: edc7 0b0a vstr d16, [r7, #40] ; 0x28
4654c: edc7 1b0c vstr d17, [r7, #48] ; 0x30
46550: edd7 0b0a vldr d16, [r7, #40] ; 0x28
46554: edd7 1b0c vldr d17, [r7, #48] ; 0x30
46558: bf00 nop
4655a: edc7 0b06 vstr d16, [r7, #24]
4655e: edc7 1b08 vstr d17, [r7, #32]
46562: 68fb ldr r3, [r7, #12]
46564: 6818 ldr r0, [r3, #0]
46566: 68fb ldr r3, [r7, #12]
46568: 3304 adds r3, #4
4656a: 6819 ldr r1, [r3, #0]
4656c: 68fb ldr r3, [r7, #12]
4656e: 3308 adds r3, #8
46570: 681a ldr r2, [r3, #0]
46572: 68fb ldr r3, [r7, #12]
46574: 330c adds r3, #12
46576: 681b ldr r3, [r3, #0]
46578: ed97 0b06 vldr d0, [r7, #24]
4657c: ed97 1b08 vldr d1, [r7, #32]
46580: f7d1 f8e0 bl 17744 <_ZN8SSE2NEON13validateInt32E17__simd128_int64_tiiii>
46584: 4603 mov r3, r0
46586: 4618 mov r0, r3
46588: 3738 adds r7, #56 ; 0x38
4658a: 46bd mov sp, r7
4658c: bd80 pop {r7, pc} Armv8-A0000000000423b90 <_ZN8SSE2NEON19test_mm_loadu_si128ERKNS_16SSE2NEONTestImplEj>:
423b90: a9ba7bfd stp x29, x30, [sp, #-96]!
423b94: 910003fd mov x29, sp
423b98: f9000fe0 str x0, [sp, #24]
423b9c: b90017e1 str w1, [sp, #20]
423ba0: f9400fe0 ldr x0, [sp, #24]
423ba4: f9400c00 ldr x0, [x0, #24]
423ba8: f90017e0 str x0, [sp, #40]
423bac: f94017e0 ldr x0, [sp, #40]
423bb0: f9001be0 str x0, [sp, #48]
423bb4: f9401be0 ldr x0, [sp, #48]
423bb8: f9001fe0 str x0, [sp, #56]
423bbc: f9401fe0 ldr x0, [sp, #56]
423bc0: 3dc00000 ldr q0, [x0]
423bc4: d503201f nop
423bc8: 3d8017e0 str q0, [sp, #80]
423bcc: 3dc017e0 ldr q0, [sp, #80]
423bd0: d503201f nop
423bd4: 3d8013e0 str q0, [sp, #64]
423bd8: f94017e0 ldr x0, [sp, #40]
423bdc: b9400004 ldr w4, [x0]
423be0: f94017e0 ldr x0, [sp, #40]
423be4: 91001000 add x0, x0, #0x4
423be8: b9400001 ldr w1, [x0]
423bec: f94017e0 ldr x0, [sp, #40]
423bf0: 91002000 add x0, x0, #0x8
423bf4: b9400002 ldr w2, [x0]
423bf8: f94017e0 ldr x0, [sp, #40]
423bfc: 91003000 add x0, x0, #0xc
423c00: b9400000 ldr w0, [x0]
423c04: 2a0003e3 mov w3, w0
423c08: 2a0403e0 mov w0, w4
423c0c: 3dc013e0 ldr q0, [sp, #64]
423c10: 97ff83a2 bl 404a98 <_ZN8SSE2NEON13validateInt32E11__Int64x2_tiiii>
423c14: a8c67bfd ldp x29, x30, [sp], #96
423c18: d65f03c0 ret |
I made a simple test here The implementation
The test function
The result:
If there any thing I could do to improve the test please point it out. |
Why not use |
Hi @aqrit, Based on my experiment, the Test CodeFORCE_INLINE __m128i old_mm_loadu_si128(const __m128i *p)
{
return vreinterpretq_m128i_s32(vld1q_s32((const int32_t *) p));
}
FORCE_INLINE __m128i new_mm_loadu_si128(const __m128i *p)
{
return vreinterpretq_m128i_u8(vld1q_u8((const uint8_t *) p));
} Test Function (thanks to @howjmay's code)result_t test_mm_loadu_si128(const SSE2NEONTestImpl &impl, uint32_t iter)
{
const int32_t *_a = (const int32_t *) impl.mTestIntPointer1;
const int test_times = 100000;
clock_t t;
double time_taken = 0;
t = clock();
for (int i = 0; i < test_times; i++) {
__m128i c = old_mm_loadu_si128((const __m128i *) _a+(i%8));
}
time_taken = ((double)t)/CLOCKS_PER_SEC;
printf("NEON implementation: %f\n", time_taken);
t = clock();
for (int i = 0; i < test_times; i++) {
__m128i c = new_mm_loadu_si128((const __m128i *) _a+(i%8));
}
time_taken = ((double)t)/CLOCKS_PER_SEC;
printf("NEON vld1q_u8 implementation: %f\n", time_taken);;
return TEST_FAIL;
} Test ResultsArmv8-A
Armv7-A test result
Armv8-A (32-bit)
|
should be fixed by #632 ? |
Close as confirmed |
sse2neon implements
_mm_loadu_si128
like this:which involves casting to
int32_t
. This, in turn, enforces 4-bytes alignment that the original pointer may not have. As a result, the sanitizer crashes the program in debug mode.The text was updated successfully, but these errors were encountered: