lib/x86/adler32: add an AVX-512 implementation #342

Merged: 4 commits, Feb 24, 2024

Commits on Feb 24, 2024

  1. Commit 2929aea
  2. lib/x86: disambiguate 512-bit vector from AVX-512F

    crc32_x86_vpclmulqdq_avx512vl and crc32_x86_vpclmulqdq_avx512f_avx512vl
    actually use the same CPU features, considering that vpternlog always
    requires at least avx512f, and compilers consider avx512vl to imply
    avx512f.  Rename them to *_avx512_vl256 and *_avx512_vl512 to reflect
    that they differ only in vector length, and fix the CPU feature checking
    to use a separate flag for whether 512-bit vectors are enabled.
    ebiggers committed Feb 24, 2024
    Commit ea028f1
  3. lib/x86: fix XCR0 check for AVX-512VL

    According to the Intel manual, the ZMM_Hi256 bit needs to be checked for
    all AVX-512 instructions, even if 512-bit vectors aren't being used.
    ebiggers committed Feb 24, 2024
    Commit 5b217a1
  4. lib/x86/adler32: add an AVX-512 implementation

    libdeflate used to have an AVX512BW implementation of Adler-32 (before
    commit 416bac3), but I removed it due to AVX-512's downclocking issues.
    Since then, newer Intel and AMD CPUs have come out with better AVX-512
    implementations, and these CPUs tend to have AVX512VNNI, which includes
    a dot product instruction that is useful for Adler-32.  Therefore, add
    an AVX512VNNI/AVX512BW implementation.
    ebiggers committed Feb 24, 2024
    Commit a026a04
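
The two feature-detection commits (ea028f1 and 5b217a1) can be illustrated with a small sketch. It is not libdeflate's code: the `FEAT_*` names, both function names, and the `allow_512bit` policy input are hypothetical. The XCR0 bit positions are the ones documented in the Intel manual, which asks for the opmask, ZMM_Hi256, and Hi16_ZMM bits to be set together (along with SSE and AVX state) before any AVX-512 instruction is used. Reading XCR0 itself (XGETBV with ECX=0, after confirming OSXSAVE) is left out so the sketch stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits, per the Intel manual. */
#define XCR0_SSE        (UINT64_C(1) << 1)  /* XMM registers */
#define XCR0_AVX        (UINT64_C(1) << 2)  /* upper halves of YMM0-15 */
#define XCR0_OPMASK     (UINT64_C(1) << 5)  /* k0-k7 mask registers */
#define XCR0_ZMM_HI256  (UINT64_C(1) << 6)  /* upper halves of ZMM0-15 */
#define XCR0_HI16_ZMM   (UINT64_C(1) << 7)  /* ZMM16-31 */
#define XCR0_AVX512_ALL (XCR0_SSE | XCR0_AVX | XCR0_OPMASK | \
                         XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* Hypothetical flag names for this sketch; not libdeflate's identifiers. */
#define FEAT_AVX512BW   (1u << 0)  /* reported by CPUID */
#define FEAT_AVX512VL   (1u << 1)  /* reported by CPUID */
#define FEAT_ZMM        (1u << 2)  /* policy: 512-bit vectors may be used */

/*
 * The OS must save and restore all AVX-512 state.  ZMM_Hi256 is required
 * even when only 128-bit or 256-bit vectors are used.
 */
static bool os_supports_avx512(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX512_ALL) == XCR0_AVX512_ALL;
}

/*
 * Combine CPUID-reported features with the OS check.  Vector length is
 * tracked by a separate flag rather than being inferred from AVX-512F
 * versus AVX-512VL.  allow_512bit is a caller-supplied policy decision
 * (assumed here; how it is derived is outside this sketch).
 */
static uint32_t usable_avx512_features(uint32_t cpuid_features,
                                       uint64_t xcr0, bool allow_512bit)
{
    /* FEAT_ZMM is derived below and is never a CPUID input. */
    assert((cpuid_features & FEAT_ZMM) == 0);

    if (!os_supports_avx512(xcr0))
        return 0;

    uint32_t features = cpuid_features & (FEAT_AVX512BW | FEAT_AVX512VL);
    if (features != 0 && allow_512bit)
        features |= FEAT_ZMM;
    return features;
}
```

With this split, a 256-bit code path would test for `FEAT_AVX512VL` alone, while a 512-bit path would additionally require `FEAT_ZMM`.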
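
To show why a dot product instruction helps Adler-32 (commit a026a04), here is a scalar sketch of the underlying arithmetic. It is not the PR's AVX-512 code, and the function names and the 64-byte block size are assumptions chosen for illustration. For a block of n bytes, the s2 update equals n times the old s1 plus the dot product of the bytes with the weights n, n-1, ..., 1. That inner loop is the part a byte-wise multiply-accumulate instruction, such as the one AVX512VNNI provides, can compute across a whole vector at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */
#define BLOCK     64      /* assumed block size: bytes in one 512-bit vector */

/* Byte-at-a-time reference implementation. */
static uint32_t adler32_ref(uint32_t adler, const uint8_t *p, size_t len)
{
    uint32_t s1 = adler & 0xFFFF, s2 = adler >> 16;

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + p[i]) % ADLER_MOD;
        s2 = (s2 + s1) % ADLER_MOD;
    }
    return (s2 << 16) | s1;
}

/*
 * Block form: per block of n bytes,
 *   s1' = s1 + sum(p[i])
 *   s2' = s2 + n*s1 + sum((n - i) * p[i])
 * The second sum is a dot product of the bytes with fixed weights.
 * With n <= 64, none of the 32-bit intermediates can overflow.
 */
static uint32_t adler32_blocks(uint32_t adler, const uint8_t *p, size_t len)
{
    uint32_t s1 = adler & 0xFFFF, s2 = adler >> 16;

    while (len > 0) {
        size_t n = len < BLOCK ? len : BLOCK;
        uint32_t sum = 0, dot = 0;

        for (size_t i = 0; i < n; i++) {
            sum += p[i];
            dot += (uint32_t)(n - i) * p[i];
        }
        s2 = (s2 + (uint32_t)n * s1 + dot) % ADLER_MOD;
        s1 = (s1 + sum) % ADLER_MOD;
        p += n;
        len -= n;
    }
    return (s2 << 16) | s1;
}

/* Usage: both forms agree on a well-known test vector. */
static void adler32_selfcheck(void)
{
    static const uint8_t msg[] = "Wikipedia";

    assert(adler32_ref(1, msg, 9) == 0x11E60398u);
    assert(adler32_blocks(1, msg, 9) == adler32_ref(1, msg, 9));
}
```

A vectorized version would typically defer the modulo reductions across many blocks rather than reducing after each one; the sketch reduces every block to keep the overflow reasoning simple.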