Architecture-specific dispatching #6

Wunkolo · 2022-09-26T01:20:01Z

Implements patterns and abstractions to dispatch architecture-specific accelerations.

This also adds an ARM64-accelerated CRC32 implementation.

CRC and qCheck are different compilation targets now. CRC32 is still header-only at the moment, but this sets it up to have a more private implementation.

Removes the template-based function generation.

Statically generates the CRC tables for each of the polynomials rather than generating them at run-time.
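A minimal sketch of what compile-time table generation can look like, assuming the reflected CRC-32 (`0xEDB88320`) and CRC-32C (`0x82F63B78`) polynomials. The names `CrcTable`, `MakeCrcTable` and `Checksum` are made up for illustration and are not taken from this branch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper so the table can be returned from a constexpr function.
struct CrcTable
{
    std::uint32_t Entries[256];
};

// Builds the byte-wise lookup table for a reflected CRC polynomial.
// Evaluated by the compiler when used to initialize a constexpr variable.
constexpr CrcTable MakeCrcTable(std::uint32_t Polynomial)
{
    CrcTable Table{};
    for (std::uint32_t Index = 0; Index < 256; ++Index)
    {
        std::uint32_t Crc = Index;
        for (int Bit = 0; Bit < 8; ++Bit)
        {
            Crc = (Crc >> 1) ^ ((Crc & 1u) ? Polynomial : 0u);
        }
        Table.Entries[Index] = Crc;
    }
    return Table;
}

// One table per polynomial, baked into the binary instead of built at start-up.
constexpr CrcTable Crc32Table  = MakeCrcTable(0xEDB88320u); // assumed: CRC-32
constexpr CrcTable Crc32cTable = MakeCrcTable(0x82F63B78u); // assumed: CRC-32C

// Plain table-driven checksum, used here only to sanity-check the tables.
constexpr std::uint32_t Checksum(
    const CrcTable& Table, const char* Data, std::size_t Size)
{
    std::uint32_t Crc = 0xFFFFFFFFu;
    for (std::size_t Index = 0; Index < Size; ++Index)
    {
        const std::uint32_t Byte = static_cast<unsigned char>(Data[Index]);
        Crc = (Crc >> 8) ^ Table.Entries[(Crc ^ Byte) & 0xFFu];
    }
    return ~Crc;
}

// Standard check values for the ASCII string "123456789".
static_assert(Checksum(Crc32Table, "123456789", 9) == 0xCBF43926u, "CRC-32");
static_assert(Checksum(Crc32cTable, "123456789", 9) == 0xE3069283u, "CRC-32C");
```

Because the tables are `constexpr`, a mistake in the generator surfaces as a failed `static_assert` at build time rather than as a wrong checksum at run-time.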

Use a function rather than a lambda. Use specific memory-ordering for atomic variables.
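Roughly, the threading change has this shape. This is an illustrative sketch only: `CheckEntry`, `CheckerThread` and `CheckAll` are hypothetical names, and the per-entry check is a stand-in for the real CRC comparison.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for verifying one SFV entry.
bool CheckEntry(const std::string& Entry)
{
    return !Entry.empty();
}

// Worker written as a named function instead of a lambda.
void CheckerThread(
    const std::vector<std::string>& Entries,
    std::atomic<std::size_t>&       NextIndex,
    std::atomic<std::size_t>&       FailCount)
{
    for (;;)
    {
        // The counter only hands out unique indices. Entries is read-only
        // and was fully built before the threads started, so no ordering
        // stronger than relaxed is needed here.
        const std::size_t Index
            = NextIndex.fetch_add(1, std::memory_order_relaxed);
        if (Index >= Entries.size())
        {
            return;
        }
        if (!CheckEntry(Entries[Index]))
        {
            FailCount.fetch_add(1, std::memory_order_relaxed);
        }
    }
}

// Runs the workers and returns how many entries failed.
std::size_t CheckAll(const std::vector<std::string>& Entries, unsigned ThreadCount)
{
    std::atomic<std::size_t> NextIndex{0};
    std::atomic<std::size_t> FailCount{0};

    std::vector<std::thread> Workers;
    const unsigned Count = std::max(1u, ThreadCount);
    for (unsigned Index = 0; Index < Count; ++Index)
    {
        Workers.emplace_back(
            CheckerThread, std::cref(Entries), std::ref(NextIndex),
            std::ref(FailCount));
    }
    for (std::thread& Worker : Workers)
    {
        Worker.join();
    }
    // join() synchronizes with each worker's completion, so every relaxed
    // increment is visible to this load.
    return FailCount.load(std::memory_order_relaxed);
}
```

Spelling out the ordering on each atomic operation documents intent and avoids paying for the default `std::memory_order_seq_cst` where it is not needed.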

```
... In Armv8-A, this is an OPTIONAL instruction, and in Armv8.1 it is mandatory for all implementations to implement it. ...
```

https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/CRC32--CRC32-

Uses the `crc32{b,h,w,d}` and `crc32c{b,h,w,d}` instructions. Dispatch is determined at compile-time using the `__ARM_FEATURE_CRC32` preprocessor macro.

Tested on an M2 Mac Mini, it's not much faster. Might be something I'm doing wrong?

Before:

```
./qCheck-generic -c 26.95s user 3.82s system 104% cpu 29.345 total
```

After:

```
./qCheck-crc32 -c 26.85s user 3.69s system 104% cpu 29.204 total
```
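For reference, compile-time dispatch on `__ARM_FEATURE_CRC32` can be sketched as below. This is not the code in this branch: `Crc32cUpdate` is a hypothetical name, only the CRC-32C variant is shown, and the accelerated path assumes a little-endian target so that an 8-byte load matches byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h> // ACLE intrinsics: __crc32cb, __crc32cd, ...
#endif

// Continues a CRC-32C computation over Size bytes; pass 0 as the initial Crc.
std::uint32_t Crc32cUpdate(std::uint32_t Crc, const void* Data, std::size_t Size)
{
    const unsigned char* Bytes = static_cast<const unsigned char*>(Data);
    Crc = ~Crc;

#if defined(__ARM_FEATURE_CRC32)
    // Accelerated path: selected by the compiler, no run-time CPU check.
    for (; Size >= 8; Size -= 8, Bytes += 8)
    {
        std::uint64_t Word;
        std::memcpy(&Word, Bytes, sizeof(Word)); // alignment-safe load
        Crc = __crc32cd(Crc, Word);
    }
    for (; Size > 0; --Size, ++Bytes)
    {
        Crc = __crc32cb(Crc, *Bytes);
    }
#else
    // Portable bit-at-a-time fallback using the reflected polynomial.
    for (; Size > 0; --Size, ++Bytes)
    {
        Crc ^= *Bytes;
        for (int Bit = 0; Bit < 8; ++Bit)
        {
            Crc = (Crc >> 1) ^ (0x82F63B78u & (0u - (Crc & 1u)));
        }
    }
#endif

    return ~Crc;
}
```

With GCC or Clang the macro is defined when the target enables the extension, for example through `-march=armv8-a+crc`; without it the same source builds the fallback.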

Re-implements the x64 version of the carryless-multiply acceleration to fold 64 bytes of data at a time. Passes unit tests.

When multiplying the two high arguments together, `vmull_high_p64` can be used instead of extracting the lanes manually.
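The two forms can be compared in a small sketch. The helper names here are hypothetical, the NEON path is guarded on `__ARM_FEATURE_AES` as an assumption about how the target exposes `PMULL`, and a portable reference multiply is included so the result can be checked on any machine.

```cpp
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_FEATURE_AES)
#include <arm_neon.h>
#define SKETCH_HAS_PMULL 1 // hypothetical guard macro for this sketch
#endif

struct Product128
{
    std::uint64_t Low;
    std::uint64_t High;
};

// Portable reference: carry-less (polynomial over GF(2)) 64x64 -> 128 multiply.
Product128 ClmulReference(std::uint64_t A, std::uint64_t B)
{
    Product128 Result{0, 0};
    for (int Bit = 0; Bit < 64; ++Bit)
    {
        if ((B >> Bit) & 1u)
        {
            Result.Low ^= A << Bit;
            if (Bit != 0)
            {
                Result.High ^= A >> (64 - Bit);
            }
        }
    }
    return Result;
}

#if defined(SKETCH_HAS_PMULL)
static Product128 Unpack(poly128_t Product)
{
    const uint64x2_t Lanes = vreinterpretq_u64_p128(Product);
    return {vgetq_lane_u64(Lanes, 0), vgetq_lane_u64(Lanes, 1)};
}

// Manual form: pull lane 1 out of each vector, then multiply the scalars.
Product128 ClmulHighManual(uint64x2_t A, uint64x2_t B)
{
    return Unpack(vmull_p64(
        static_cast<poly64_t>(vgetq_lane_u64(A, 1)),
        static_cast<poly64_t>(vgetq_lane_u64(B, 1))));
}

// Special case: vmull_high_p64 multiplies the high lanes directly.
Product128 ClmulHighDirect(uint64x2_t A, uint64x2_t B)
{
    return Unpack(
        vmull_high_p64(vreinterpretq_p64_u64(A), vreinterpretq_p64_u64(B)));
}
#endif
```

Both NEON helpers should agree with `ClmulReference` applied to the upper lanes. The direct form leaves the operands in vector registers instead of moving the lanes out first.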

Let this be a user-provided variable since we can't always assume everyone has x64 and arm64 library files at the same time.

Fixes `std::max` compilation error

Wunkolo added the enhancement New feature or request label Sep 26, 2022

Wunkolo self-assigned this Sep 26, 2022

Wunkolo force-pushed the arch-dispatch branch 3 times, most recently from 7240c28 to dd1d62f Compare October 5, 2022 02:30

Wunkolo force-pushed the arch-dispatch branch from dd1d62f to 3c21db2 Compare February 2, 2023 06:32

Wunkolo force-pushed the arch-dispatch branch 2 times, most recently from c1a9b8f to f0c36c5 Compare March 7, 2023 07:19

Wunkolo added 4 commits March 6, 2023 23:22

Separate translation units for CRC and qCheck

21ce23b

CRC and qCheck are different compilation targets now. CRC32 is still header-only at the moment, but this sets it up to have a more private implementation.

Migrate CRC32 implementation to private implementation

e3f6812

Removes the template-based function generation.

Fix compile-time CRC table generation

a05a5dd

Statically generates the CRC tables for each of the polynomials rather than generating them at run-time.

Add ARM+Intel Universal-Binary support for MacOS

a6a075d

Wunkolo force-pushed the arch-dispatch branch from f0c36c5 to a6a075d Compare March 7, 2023 07:22

Wunkolo added 3 commits March 7, 2023 10:52

Update checker threading

fbebd34

Use a function rather than a lambda. Use specific memory-ordering for atomic variables.

Separate entry points to CheckSFV/GenerateSFV

3ebde49

Wunkolo force-pushed the arch-dispatch branch from 01f3674 to 08b2b89 Compare March 7, 2023 21:19

Wunkolo added 3 commits July 18, 2024 16:00

Add ARMv8-based carryless-multiply acceleration

7def524

Re-implements the x64 version of the carryless-multiply acceleration to fold 64 bytes of data at a time. Passes unit tests.

Add special-case pmull_p64

d9f2a16

When multiplying the two high arguments together, `vmull_high_p64` can be used instead of extracting the lanes manually.

Disable multi-arch builds by default

0b4881f

Let this be a user-provided variable since we can't always assume everyone has x64 and arm64 library files at the same time.

Wunkolo marked this pull request as ready for review July 18, 2024 23:56

Wunkolo added 3 commits September 11, 2024 09:34

Separate CRC32 into x64 and a64 translation units

fcb7f6a

Add missing algorithm header

5a97b8f

Fixes `std::max` compilation error

Add benchmarks to CRC32 test suite

b505fc1

Wunkolo changed the base branch from main to dev September 11, 2024 18:01

Wunkolo merged commit a548751 into dev Sep 11, 2024
2 checks passed

Wunkolo deleted the arch-dispatch branch September 11, 2024 18:02
