@iha-taisei
Contributor

Closes #5514
This pull request addresses issue #5514 by implementing loop unrolling of [SD]GER kernels.
This improves DGER single-thread performance by 1.3x on A64FX and 1.2x on Neoverse V1.
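
For readers unfamiliar with the technique, here is a minimal sketch of what unrolling a GER update (A := alpha * x * y^T + A) can look like in plain C. It is not the code merged in this PR: the function names `dger_ref` and `dger_unroll4`, the unroll factor of 4, and the choice to unroll over columns are all assumptions made for illustration, and the actual kernels may be structured quite differently.

```c
#include <stddef.h>

/* Reference update: A += alpha * x * y^T.
 * A is column-major, m rows by n columns, leading dimension lda >= m.
 * Hypothetical helper, not an OpenBLAS function. */
static void dger_ref(size_t m, size_t n, double alpha,
                     const double *x, const double *y,
                     double *a, size_t lda)
{
    for (size_t j = 0; j < n; j++) {
        double t = alpha * y[j];
        for (size_t i = 0; i < m; i++)
            a[i + j * lda] += x[i] * t;
    }
}

/* Same update with the column loop unrolled by 4 (assumed factor).
 * Each x[i] is loaded once and reused for four column updates, which
 * cuts loads of x and loop overhead per element of A. */
static void dger_unroll4(size_t m, size_t n, double alpha,
                         const double *x, const double *y,
                         double *a, size_t lda)
{
    size_t j = 0;

    /* Main loop: four columns per iteration. */
    for (; j + 4 <= n; j += 4) {
        double t0 = alpha * y[j];
        double t1 = alpha * y[j + 1];
        double t2 = alpha * y[j + 2];
        double t3 = alpha * y[j + 3];
        double *c0 = a + j * lda;
        double *c1 = c0 + lda;
        double *c2 = c1 + lda;
        double *c3 = c2 + lda;

        for (size_t i = 0; i < m; i++) {
            double xi = x[i];
            c0[i] += xi * t0;
            c1[i] += xi * t1;
            c2[i] += xi * t2;
            c3[i] += xi * t3;
        }
    }

    /* Remainder loop: the last n % 4 columns, one at a time. */
    for (; j < n; j++) {
        double t = alpha * y[j];
        double *c = a + j * lda;
        for (size_t i = 0; i < m; i++)
            c[i] += x[i] * t;
    }
}
```

Both functions perform the same arithmetic per element of A, so the unrolled version should match the reference result; whether it is faster depends on the compiler, the vector width, and the memory system of the target core.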

A64FX:
image

Neoverse V1:
image

@martin-frbg martin-frbg added this to the 0.3.31 milestone Oct 24, 2025
@martin-frbg martin-frbg merged commit 585e6d0 into OpenMathLib:develop Oct 24, 2025
78 of 88 checks passed
Development

Successfully merging this pull request may close these issues.

Improve [SD]GER performance on A64FX and Neoverse V1
