Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for loongarch compilers #1263

Merged
merged 2 commits into from
Jan 14, 2025
Merged

Fixes for loongarch compilers #1263

merged 2 commits into from
Jan 14, 2025

Conversation

jinboson
Copy link
Contributor

Happy New Year, Michael @mr-c.

We rewrited the PR #1247 (can be closed now) and seperated it into two patches. We did two things on them.

1)The first thing is to fix a typo for lasx implemention in x86 avx2 that modified simde_mm256_bslli_epi128 to simde_mm256_bsrli_epi128, very sorry for this typo error.
2)The second thing is to work around compiler errors in use-cases that users would pass variables for shift count for slli/srli/srai instrutions, which can not support for loongarch at present.

We hope you could take some time to review this for us, if you have any questions please feel free to let us know, thank you Michael, thank you.

simde/x86/sse2.h Outdated
#define simde_mm_slli_epi16(a, imm8) ((imm8 & ~15) ? simde_mm_setzero_si128() : simde__m128i_from_lsx_i64(__lsx_vslli_h(simde__m128i_to_private(a).lsx_i64, ((imm8) & 15))))
#define simde_mm_slli_epi16(a, imm8) ((imm8 & ~15) ? simde_mm_setzero_si128() : simde__m128i_from_lsx_i64(__lsx_vsll_h(simde__m128i_to_private(a).lsx_i64, __lsx_vreplgr2vr_h((imm8) & 15))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing this? For simde_mm_slli_epi16 we require imm8 to be a constant value.

See 1715c16#diff-d5773576112e65731ce8f01f5beee4f665e6fcffd1017ee76c583fa0988a40b4R6554-R6555

Same for simde_mm_slli_epi32, simde_mm_slli_epi64, simde_mm_srli_epi16, simde_mm_srli_epi32 and simde_mm_srli_epi64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing this?

That's because we found that end-users would use _mm_slli_epi16 like below :

int shift;
computed the shift in some time...
then use the shfit to do : __mm_slli_epi16(src_vec, shift)

At compile time the shift was unknow, in this case loongarch compilers would emit errors in the build process. Actually, we really found the strange use-cases in google's AV1 Codec Library aom project. So in order to avoid build failure on loongarch we made these modifications. What do you think for these use-cases, Michael ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wacky. Sure, it doesn't bother me either way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include <stdio.h>
#include "/data/home/Jin/data/community/simde-loong64/simde/x86/sse2.h"

int main()
{
        simde__m128i vec0;
        const int imm8 = 1;     //emulate use-cases that end-users pass variable for shift count.
        simde_mm_slli_epi16(vec0, imm8);
	return 0;
}

I built above demo on real loongarch hardware with simde git master, and the compiler output errors:
image
Same for simde_mm_slli_epi32, simde_mm_slli_epi64, simde_mm_srli_epi16, simde_mm_srli_epi32 and simde_mm_srli_epi64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include <stdio.h>
#include "/data/home/Jin/data/community/simde-loong64/simde/x86/sse2.h"

int main()
{
        simde__m128i vec0;
        //const int imm8 = 1;     //emulate use-cases that end-users pass variable for shift count.
        simde_mm_slli_epi16(vec0, 1);
	return 0;
}

If user pass immediate values for shift count just like above demo, it's ok for loongarch compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, at present, we need to load imm8 into vectors and then use sll version instruction instead. If in the feature, the loongarch compiler can accept a variable for shift count for slli instruction then we can solve this dilemma.

Copy link
Collaborator

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the LASX_NATIVE versions don't require the use of immediate values, then please move those optimized implementaions into the function definitions instead of using macros.

@mr-c
Copy link
Collaborator

mr-c commented Jan 13, 2025

@jinboson if SIMDe is important to the LOONGARCH ecosystem, it would be helpful if I had some assistance with #1177 and similar ; I'm the only maintainer these days..

@jinboson
Copy link
Contributor Author

@jinboson if SIMDe is important to the LOONGARCH ecosystem, it would be helpful if I had some assistance with #1177 and similar ; I'm the only maintainer these days..

Understood, thank you @mr-c , I have posted the test result based on the real loongarch hardware and made explanations for you. Thanks for you reply!

@jinboson
Copy link
Contributor Author

jinboson commented Jan 14, 2025

if the LASX_NATIVE versions don't require the use of immediate values, then please move those optimized implementaions into the function definitions instead of using macros.

It depends on how users use them. We do have slli version instructions that require the use of immediate values, but users must pass an immediate value rather than a variable value for shift count when using them. Now the problem we are taking about is that those slli version instructions that require the use of immediate values are passed variable values for shift count in some strange use-cases, so we need workround for those implementations using macros.

At present, the instructions like slli/srli on loongarch only
accept immediate constant values for shit count, so in order
to avoid compiler errors in the build process, we need to use
sll/srl version instead. In the feature, if the compiler
supports the formers, we can restore them.
1. fix typo:
  Modify simde_mm256_bslli_epi128 to simde_mm256_bsrli_epi128.
2. fix loongarch compiler errors:
  At present, the instructions like slli/srli/srai on loongarch
  only accept immediate constant values for shit count, so in order to
  avoid compiler errors in the build process, we need to use
  sll/srl/sra version instead. In the feature, if the compiler
  supports the formers, we can restore them.
@jinboson
Copy link
Contributor Author

move those optimized implementaions into the function definitions instead of using macros.

Done by the latest update, thank you Michael.

@mr-c mr-c enabled auto-merge (rebase) January 14, 2025 10:03
@mr-c mr-c merged commit 1bbb5af into simd-everywhere:master Jan 14, 2025
97 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants