ARM CPU feature cleanups #355
Merged
ebiggers (Owner) commented Mar 17, 2024
- checksum_benchmarks.sh: handle adler32_arm_neon_dotprod()
- lib/arm: move selection of pmull_wide into arm_cpu_features
- lib/arm: drop the arm32 support for pmull and crc32 instructions
- lib/arm: simplify by not trying to skip target attributes
- lib/arm: fix arm64 builds with -march=armv8-a+nosimd
- lib/arm: centralize the intrinsic header inclusions
- lib/arm: simplify conditions for detecting intrinsics
- lib/arm: use asm fallback when clang intrinsics unusable
- lib/arm: remove unnecessary NATIVE macros
Handle the selection of crc32_arm_pmullx12_crc using a CPU feature flag, similar to X86_CPU_FEATURE_ZMM. This allows the code to be tested on platforms other than macOS.
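The idea can be sketched as follows. This is an illustration, not libdeflate's code: the flag names `ARM_CPU_FEATURE_PMULL` and `ARM_CPU_FEATURE_PREFER_PMULL` and the stub functions are hypothetical. The wide variant is chosen from a bit in the feature word rather than from a platform check, so a test can set that bit on any machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, in the style of X86_CPU_FEATURE_ZMM. */
#define ARM_CPU_FEATURE_PMULL        (1u << 0)
#define ARM_CPU_FEATURE_PREFER_PMULL (1u << 1) /* prefer the 12-way pmull variant */

typedef uint32_t (*crc32_func_t)(uint32_t crc, const uint8_t *p, size_t len);

/* Portable bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_generic(uint32_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
	}
	return ~crc;
}

/* Hypothetical stand-ins for the pmull implementations; they reuse the generic code. */
static uint32_t crc32_pmull_stub(uint32_t crc, const uint8_t *p, size_t len)
{
	return crc32_generic(crc, p, len);
}

static uint32_t crc32_pmullx12_stub(uint32_t crc, const uint8_t *p, size_t len)
{
	return crc32_generic(crc, p, len);
}

/* The choice depends only on the feature word, not on which OS we are built for. */
static crc32_func_t choose_crc32_impl(uint32_t features)
{
	if (!(features & ARM_CPU_FEATURE_PMULL))
		return crc32_generic;
	if (features & ARM_CPU_FEATURE_PREFER_PMULL)
		return crc32_pmullx12_stub;
	return crc32_pmull_stub;
}
```

A test harness could then pass `ARM_CPU_FEATURE_PMULL | ARM_CPU_FEATURE_PREFER_PMULL` to exercise the wide path on any arm64 system with pmull.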
Drop support for the pmull and crc32 optimized CRC-32 functions when building for 32-bit ARM. Not many people care about 32-bit ARM these days, and these optimizations were always a struggle to keep working on 32-bit due to compiler issues. They also only ever applied to processors that support 64-bit too.
As was done in lib/x86/, use the target function attribute even if the features are available natively, as this has no known downside. Exception: this cannot be done for plain simd (NEON), since old versions of clang don't accept the target attribute for it.
With MSVC it's necessary to assume that arm64 means NEON is available, but this logic should not be applied generally because gcc and recent versions of clang support arm64 without NEON.
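A minimal sketch of that rule, assuming the ACLE macro `__ARM_NEON` (which GCC and clang define only when SIMD is enabled) and MSVC's `_M_ARM64`; libdeflate's exact spelling may differ. Only MSVC gets the "arm64 implies NEON" shortcut, so a GCC or clang build with `-march=armv8-a+nosimd` still reports no native NEON.

```c
/*
 * Sketch: can NEON be assumed at compile time?
 * - GCC/clang: trust __ARM_NEON, which is absent under +nosimd.
 * - MSVC: assume arm64 always has NEON.
 */
#if defined(__ARM_NEON) || (defined(_MSC_VER) && defined(_M_ARM64))
#  define HAVE_NEON_NATIVE 1
#else
#  define HAVE_NEON_NATIVE 0
#endif
```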
Include all needed intrinsic headers from lib/arm/cpu_features.h so that includes don't need to be scattered in other places.
- Don't check *_NATIVE or HAVE_DYNAMIC_ARM_CPU_FEATURES, since technically these are orthogonal to intrinsic support. It's true that when building for an operating system that doesn't have runtime CPU feature detection enabled, there is no use in using intrinsics except when the features are supported natively. But we can still build the code; it just won't be called and will be optimized out as unused.
- Don't place conditions like defined(ARCH_ARM64) and !defined(_MSC_VER) on HAVE_SHA3_NATIVE and HAVE_DOTPROD_NATIVE. These conditions are only relevant to intrinsics, not the CPU feature per se.
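The separation between "can the compiler build it" and "will it be called" can be sketched like this. All names here (`HAVE_CRC32_INTRIN`, `ARM_CPU_FEATURE_CRC32`, `get_arm_cpu_features`, the enum) are hypothetical, and the compiler test for `HAVE_CRC32_INTRIN` is deliberately simplified. The intrinsic macro depends only on the compiler and target; whether the fast path is selected comes from runtime detection or from the compiler's own `__ARM_FEATURE_CRC32`.

```c
#include <stdint.h>

/* Hypothetical: the compiler can build CRC32-extension code.
 * Depends only on compiler/target, not on *_NATIVE or runtime detection. */
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
#  define HAVE_CRC32_INTRIN 1
#else
#  define HAVE_CRC32_INTRIN 0
#endif

#define ARM_CPU_FEATURE_CRC32 (1u << 0) /* hypothetical feature bit */

/* Hypothetical: a build for an OS without runtime CPU feature detection. */
static uint32_t get_arm_cpu_features(void)
{
	return 0;
}

typedef enum { CRC32_IMPL_GENERIC, CRC32_IMPL_ARM_CRC32 } crc32_impl_t;

static crc32_impl_t choose_crc32_impl(void)
{
	uint32_t features = get_arm_cpu_features();
#ifdef __ARM_FEATURE_CRC32
	/* The compiler-provided macro: CRC32 is enabled natively by -march. */
	features |= ARM_CPU_FEATURE_CRC32;
#endif
#if HAVE_CRC32_INTRIN
	/* With no runtime detection and no native CRC32 this branch is dead,
	 * so an optimized function referenced only here can be dropped as unused. */
	if (features & ARM_CPU_FEATURE_CRC32)
		return CRC32_IMPL_ARM_CRC32;
#endif
	(void)features;
	return CRC32_IMPL_GENERIC;
}
```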
Instead of manually defining macros like __ARM_FEATURE_CRC32 to get the intrinsic headers of clang 15 and earlier to work, just use inline assembly. This should be a better solution as it does not rely on clang implementation details as much. We already used an inline assembly fallback for veor3q_u8 with gcc 8, and with clang 7 through 12. This commit extends the same pattern to the crc32 and dotprod intrinsics, and extends the version range to clang 15. It also drops gcc 8 from the veor3q_u8 fallback, as that is just a single major version and not worth enabling the fallback for.
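A sketch of the inline-assembly approach for a single `crc32b` step. The names `crc32b_step`, `crc32_bytewise`, `CRC32_TARGET` and `USE_CRC32_ASM` are hypothetical, and the attribute spellings (`+crc` for GCC, `crc` for clang) are assumptions. For brevity it takes the assembly path on every arm64 GCC or clang build, whereas the change described here only needs it where the intrinsic headers are unusable; other targets get a portable bitwise equivalent so the sketch compiles anywhere. Like `__crc32b()`, the instruction updates a reflected CRC-32 without the initial and final inversion, so the caller applies those.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed attribute spellings for enabling the CRC extension per function. */
#if defined(__aarch64__) && defined(__clang__)
#  define CRC32_TARGET __attribute__((target("crc")))
#  define USE_CRC32_ASM 1
#elif defined(__aarch64__) && defined(__GNUC__)
#  define CRC32_TARGET __attribute__((target("+crc")))
#  define USE_CRC32_ASM 1
#else
#  define CRC32_TARGET
#  define USE_CRC32_ASM 0
#endif

#if USE_CRC32_ASM
/*
 * Hypothetical wrapper: emit CRC32B directly instead of calling __crc32b(),
 * so nothing depends on what <arm_acle.h> declares for this compiler version.
 * Only call this on CPUs that have the CRC extension.
 */
static inline CRC32_TARGET uint32_t crc32b_step(uint32_t crc, uint8_t byte)
{
	uint32_t res;
	__asm__("crc32b %w0, %w1, %w2" : "=r"(res) : "r"(crc), "r"((uint32_t)byte));
	return res;
}
#else
/* Portable equivalent of one CRC32B step (reflected polynomial 0xEDB88320). */
static inline uint32_t crc32b_step(uint32_t crc, uint8_t byte)
{
	crc ^= byte;
	for (int i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
	return crc;
}
#endif

/* Standard CRC-32: invert, feed each byte through the step, invert again. */
static CRC32_TARGET uint32_t crc32_bytewise(uint32_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;
	while (len--)
		crc = crc32b_step(crc, *p++);
	return ~crc;
}
```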
Since most of the uses of the HAVE_*_NATIVE macros have been removed, and most of them provide no additional value over the original compiler-provided macro like __ARM_FEATURE_CRC32 anyway, there's not much point in having them anymore. Remove them, except for HAVE_NEON_NATIVE which is still worthwhile to have.