Improve single-thread performance of [SD]GER on A64FX and Neoverse V1#5515

Merged
martin-frbg merged 1 commit into OpenMathLib:develop from iha-taisei:feature/ger_unroll
Oct 24, 2025
Conversation

@iha-taisei
Contributor

Closes #5514
This pull request addresses issue #5514 by implementing loop unrolling of [SD]GER kernels.
This improves DGER single-thread performance by 1.3x on A64FX and 1.2x on Neoverse V1.

A64FX:
image

Neoverse V1:
image
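
For readers unfamiliar with the technique, the kind of loop unrolling described above can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel merged in this PR: the function names `dger_ref` and `dger_unroll4` are hypothetical, the unroll factor of 4 and the choice to unroll over columns are assumptions, and the real kernels for A64FX and Neoverse V1 may be structured quite differently. The sketch assumes column-major storage and unit strides for `x` and `y`.

```c
#include <assert.h>
#include <stddef.h>

/* Reference DGER update: A += alpha * x * y^T.
 * A is m x n, column-major, with leading dimension lda >= m.
 * Hypothetical helper for illustration; not an OpenBLAS function. */
static void dger_ref(size_t m, size_t n, double alpha,
                     const double *x, const double *y,
                     double *a, size_t lda)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; j++) {
        const double t = alpha * y[j];
        for (size_t i = 0; i < m; i++)
            a[i + j * lda] += t * x[i];
    }
}

/* Same update, unrolled over four columns at a time (assumed factor).
 * Each x[i] is loaded once and reused for four columns, which reduces
 * loads of x and loop overhead per updated element. */
static void dger_unroll4(size_t m, size_t n, double alpha,
                         const double *x, const double *y,
                         double *a, size_t lda)
{
    assert(lda >= m);
    size_t j = 0;

    /* Main loop: four columns per iteration. */
    for (; j + 4 <= n; j += 4) {
        const double t0 = alpha * y[j];
        const double t1 = alpha * y[j + 1];
        const double t2 = alpha * y[j + 2];
        const double t3 = alpha * y[j + 3];
        double *c0 = a + j * lda;
        double *c1 = c0 + lda;
        double *c2 = c1 + lda;
        double *c3 = c2 + lda;
        for (size_t i = 0; i < m; i++) {
            const double xi = x[i];
            c0[i] += t0 * xi;
            c1[i] += t1 * xi;
            c2[i] += t2 * xi;
            c3[i] += t3 * xi;
        }
    }

    /* Remainder loop: columns left over when n is not a multiple of 4. */
    for (; j < n; j++) {
        const double t = alpha * y[j];
        double *c = a + j * lda;
        for (size_t i = 0; i < m; i++)
            c[i] += t * x[i];
    }
}
```

Both functions perform the same per-element update, so they should agree on any input; the unrolled version only changes the order in which columns are visited and how often `x[i]` is reloaded. Whether a given unroll factor helps depends on the target core, which is why measurements like the ones above are needed.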

@martin-frbg martin-frbg added this to the 0.3.31 milestone Oct 24, 2025
@martin-frbg martin-frbg merged commit 585e6d0 into OpenMathLib:develop Oct 24, 2025
78 of 88 checks passed