-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for loongarch compilers #1263
Conversation
simde/x86/sse2.h
Outdated
#define simde_mm_slli_epi16(a, imm8) ((imm8 & ~15) ? simde_mm_setzero_si128() : simde__m128i_from_lsx_i64(__lsx_vslli_h(simde__m128i_to_private(a).lsx_i64, ((imm8) & 15)))) | ||
#define simde_mm_slli_epi16(a, imm8) ((imm8 & ~15) ? simde_mm_setzero_si128() : simde__m128i_from_lsx_i64(__lsx_vsll_h(simde__m128i_to_private(a).lsx_i64, __lsx_vreplgr2vr_h((imm8) & 15)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you changing this? For simde_mm_slli_epi16
we require imm8
to be a constant value.
See 1715c16#diff-d5773576112e65731ce8f01f5beee4f665e6fcffd1017ee76c583fa0988a40b4R6554-R6555
Same for simde_mm_slli_epi32
, simde_mm_slli_epi64
, simde_mm_srli_epi16
, simde_mm_srli_epi32
and simde_mm_srli_epi64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you changing this?
That's because we found that end-users would use _mm_slli_epi16 like below :
int shift;
computed the shift in some time...
then use the shfit to do : __mm_slli_epi16(src_vec, shift)
At compile time the shift was unknow, in this case loongarch compilers would emit errors in the build process. Actually, we really found the strange use-cases in google's AV1 Codec Library aom project. So in order to avoid build failure on loongarch we made these modifications. What do you think for these use-cases, Michael ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wacky. Sure, it doesn't bother me either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <stdio.h>
#include "/data/home/Jin/data/community/simde-loong64/simde/x86/sse2.h"
int main()
{
simde__m128i vec0;
const int imm8 = 1; //emulate use-cases that end-users pass variable for shift count.
simde_mm_slli_epi16(vec0, imm8);
return 0;
}
I built above demo on real loongarch hardware with simde git master, and the compiler output errors:
Same for simde_mm_slli_epi32, simde_mm_slli_epi64, simde_mm_srli_epi16, simde_mm_srli_epi32 and simde_mm_srli_epi64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <stdio.h>
#include "/data/home/Jin/data/community/simde-loong64/simde/x86/sse2.h"
int main()
{
simde__m128i vec0;
//const int imm8 = 1; //emulate use-cases that end-users pass variable for shift count.
simde_mm_slli_epi16(vec0, 1);
return 0;
}
If user pass immediate values for shift count just like above demo, it's ok for loongarch compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, at present, we need to load imm8 into vectors and then use sll version instruction instead. If in the feature, the loongarch compiler can accept a variable for shift count for slli instruction then we can solve this dilemma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the LASX_NATIVE
versions don't require the use of immediate values, then please move those optimized implementaions into the function definitions instead of using macros.
Understood, thank you @mr-c , I have posted the test result based on the real loongarch hardware and made explanations for you. Thanks for you reply! |
It depends on how users use them. We do have slli version instructions that require the use of immediate values, but users must pass an immediate value rather than a variable value for shift count when using them. Now the problem we are taking about is that those slli version instructions that require the use of immediate values are passed variable values for shift count in some strange use-cases, so we need workround for those implementations using macros. |
At present, the instructions like slli/srli on loongarch only accept immediate constant values for shit count, so in order to avoid compiler errors in the build process, we need to use sll/srl version instead. In the feature, if the compiler supports the formers, we can restore them.
1. fix typo: Modify simde_mm256_bslli_epi128 to simde_mm256_bsrli_epi128. 2. fix loongarch compiler errors: At present, the instructions like slli/srli/srai on loongarch only accept immediate constant values for shit count, so in order to avoid compiler errors in the build process, we need to use sll/srl/sra version instead. In the feature, if the compiler supports the formers, we can restore them.
Done by the latest update, thank you Michael. |
Happy New Year, Michael @mr-c.
We rewrited the PR #1247 (can be closed now) and seperated it into two patches. We did two things on them.
1)The first thing is to fix a typo for lasx implemention in x86 avx2 that modified simde_mm256_bslli_epi128 to simde_mm256_bsrli_epi128, very sorry for this typo error.
2)The second thing is to work around compiler errors in use-cases that users would pass variables for shift count for slli/srli/srai instrutions, which can not support for loongarch at present.
We hope you could take some time to review this for us, if you have any questions please feel free to let us know, thank you Michael, thank you.