Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable longlong.h #1633

Closed

Conversation

albinahlback
Copy link
Collaborator

@albinahlback albinahlback commented Nov 19, 2023

Just like with gmpcompat.h, it is now configurable. Unlike before, it now contains intrinsics for Windows. The longlong-generic_gnu.h.in also contains builtin functions from GNU which should lead to a speed-up, which is going to be used if compiler is GCC compatible and processor is not x86 or ARM.

Also removed unused flint_popcount. This was also set wrong during compilation, so I think it is for the better.

Also removed some FLINT_XXX macros I had introduced half a year ago, which I see was totally unnecessary now as they are still unused.

Also do not check for popcount, clz or ctz, or __builtin_constant_p during configuration process. For the former three, they would all be defined in longlong.h, for which the header is checked during the configuration process. For __builtin_constant_p, we assume that if you have defined __GNUC__ that it will be available.

@albinahlback albinahlback force-pushed the configure_more_in_flint_h branch 4 times, most recently from 6d109c0 to 916fcba Compare November 19, 2023 20:51
@albinahlback
Copy link
Collaborator Author

I can say that intrinsics/inline assembly was never used for 64-bit ARM processors. GCC or whatever never defines __arm64__.

@fredrik-johansson
Copy link
Collaborator

I have a small concern about code duplication here, where some of the generic functions are being copied and pasted in each version of this file instead of being included from a single source.

I think we will also think about how to handle platform-dependent code throughout FLINT. We shouldn't have to hardcode special rules in the build system for each file (longlong.h, fft_small, machine_vectors.h, crt_helpers.h, gmpcompat.h...).

Will try to do a more thorough review when I have some free time...

Namely
- arm, GNUC
- arm64, GNUC
- x86, GNUC
- x86_64, GNUC
- generic, GNUC
- x86(_64), Windows
- arm(64), Windows
- generic

Moreover, remove flint_popcount as it was never used and wasn't
configured correctly.
flint_clz and flint_ctz will be set during configuration based on which
longlong file to use. __builtin_constant_p will be defined only if
__GNUC__ is defined.
@albinahlback
Copy link
Collaborator Author

I'd be open to discuss this.

@albinahlback
Copy link
Collaborator Author

I think we can remove all architecture specific assembly instructions for GCC compilers. Looks like GCC will generate the fastest possible code with generics instead of inline assembly. Moreover, if I recall correctly, the compiler is not as good with optimizing the surrounding code when inline assembly is called.

@albinahlback
Copy link
Collaborator Author

albinahlback commented Nov 21, 2023

I think we can remove all architecture specific assembly instructions for GCC compilers. Looks like GCC will generate the fastest possible code with generics instead of inline assembly. Moreover, if I recall correctly, the compiler is not as good with optimizing the surrounding code when inline assembly is called.

Okay, this may not be true with add_ssaaaa. I want to be able to generate the fastest possible code with generics. I'll try to investigate this.

Edit: It is only add_sss{n}aaaa{2n} that are problematic. For add_ssaaa,

    unsigned __int128 a, b, r;
    a = ((unsigned __int128) a0) + (((unsigned __int128) a1) << 64);
    b = ((unsigned __int128) b0) + (((unsigned __int128) b1) << 64);
    r = a + b;
    s0 = (ulong) r;
    s1 = (ulong) (r >> 64);

is equivalent to

    add_ssaaaa(s1, s0, a1, a0, b1, b0);

@albinahlback albinahlback deleted the configure_more_in_flint_h branch January 23, 2024 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants