SME2/SVE/NEON heuristic - ArmNN#820
Open
damdoo01-arm wants to merge 4 commits into
Signed-off-by: Damien Dooley <damien.dooley@arm.com>
@@ -1,5 +1,5 @@
//
// Copyright © 2017-2024 Arm Ltd and Contributors. All rights reserved.
@@ -1,10 +1,12 @@
//
// Copyright © 2020 Arm Ltd and Contributors. All rights reserved.
// Copyright © 2026 Arm Ltd and Contributors. All rights reserved.
@@ -1,5 +1,5 @@
//
// Copyright © 2020 Arm Ltd and Contributors. All rights reserved.
// Copyright © 2026 Arm Ltd and Contributors. All rights reserved.
ce27c85 to c93a29c
c93a29c to a667499
damdoo01-arm added a commit to damdoo01-arm/ComputeLibrary that referenced this pull request on Jun 16, 2026
Expose runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph, while preserving default hardware-based selection when no override is supplied.
Full context in the ArmNN PR: ARM-software/armnn#820
Signed-off-by: Damien Dooley <damien.dooley@arm.com>
Change-Id: I602cebdd58942930d248948788bfac9e2be56474
damdoo01-arm added a commit to damdoo01-arm/ComputeLibrary that referenced this pull request on Jun 18, 2026
Expose experimental runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph, while preserving default hardware-based selection when no override is supplied.
Full context in the ArmNN PR: ARM-software/armnn#820
Signed-off-by: Damien Dooley <damien.dooley@arm.com>
Change-Id: I602cebdd58942930d248948788bfac9e2be56474
damdoo01-arm added a commit to damdoo01-arm/ComputeLibrary that referenced this pull request on Jun 19, 2026
gunes-arm pushed a commit to ARM-software/ComputeLibrary that referenced this pull request on Jun 22, 2026
Title:
Add CpuAcc SME/SVE shape policy for Geekbench AI workloads
Description:
This PR adds a graph-level CpuAcc policy that controls whether SME/SME2 and SVE/SVE2 implementations are exposed to ACL for a given optimized ArmNN graph. See also the associated ACL PR: ARM-software/ComputeLibrary#1294.
Problem statement:
On SME2-capable client devices, some Geekbench AI workloads scored lower when SME2 kernels were selected at high thread counts. The main regression was in quantized INT8 models with awkward GEMM shapes, where the cost of SME2 packing and contention under 8-thread execution outweighed the matmul speedup. This is due to hardware/resource pressure that is particularly acute on devices with a single CME unit: SME2 can speed up some shapes, but a high thread count combined with unfavourable Conv2D/GEMM decompositions can lower the overall benchmark score.
Previous scores (SP = single precision, HP = half precision, Q = quantized). Note in particular the regression in the quantized score, due to quantization overhead on the SME2 core.
Device     SP     HP     Q
non-SME2   2655   2658   4305
SME2       2750   3991   3690
High-level approach:
The policy scans the optimized ArmNN graph and records the datatype and GEMM-like shape features of Convolution2d, FullyConnected, BatchMatMul, and DepthwiseConvolution2d layers.
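The PR text does not spell out how the shape features are computed. A common way to reduce such layers to GEMM-like features is the im2col view, where each layer becomes a C[M x N] = A[M x K] * B[K x N] product. The sketch below shows that mapping for Convolution2d and FullyConnected, assuming NHWC layout, no dilation and symmetric padding. The type and function names (GemmShape, Conv2dDesc, ConvOutDim, ConvAsGemm, FullyConnectedAsGemm) are hypothetical and are not ArmNN descriptors; DepthwiseConvolution2d does not reduce to a single GEMM this way and is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical GEMM view of a layer: C[M x N] = A[M x K] * B[K x N].
struct GemmShape {
    std::int64_t m;  // rows: output positions (or batch items)
    std::int64_t n;  // columns: output channels / output units
    std::int64_t k;  // reduction depth
};

// Hypothetical, simplified Conv2d parameters (NHWC, no dilation, symmetric padding).
struct Conv2dDesc {
    std::int64_t batch, inH, inW, inC;
    std::int64_t kernelH, kernelW, outC;
    std::int64_t strideH, strideW, pad;
};

// Output spatial size of a convolution without dilation.
inline std::int64_t ConvOutDim(std::int64_t in, std::int64_t kernel,
                               std::int64_t stride, std::int64_t pad) {
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

// im2col mapping: one GEMM row per output pixel, one column per output
// channel, reducing over the kernel window times the input channels.
inline GemmShape ConvAsGemm(const Conv2dDesc& d) {
    const std::int64_t outH = ConvOutDim(d.inH, d.kernelH, d.strideH, d.pad);
    const std::int64_t outW = ConvOutDim(d.inW, d.kernelW, d.strideW, d.pad);
    return {d.batch * outH * outW, d.outC, d.kernelH * d.kernelW * d.inC};
}

// FullyConnected: one row per batch item, one column per output unit,
// reducing over the input features.
inline GemmShape FullyConnectedAsGemm(std::int64_t batch, std::int64_t inputs,
                                      std::int64_t outputs) {
    return {batch, outputs, inputs};
}
```

For example, a 3x3, stride-1, pad-1 convolution from 64 to 128 channels on a 56x56 input maps to M = 3136, N = 128, K = 576; features such as these are what a shape-class check could then inspect.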
It then applies a conservative heuristic:
FP16 graphs: hide SME/SME2.
Quantized graphs: hide SME/SME2 for the known regressing shape classes, while keeping SVE/SVE2 available.
FP32 graphs: hide SME/SME2 only when a regression-risk spatial/dense graph pattern is detected.
Quantized graphs that keep SME enabled may also have the CpuAcc thread count capped for specific shape classes.
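The rules above can be read as a small decision table. The following is a minimal sketch under stated assumptions: the boolean fields stand in for the PR's shape classes, whose exact definitions are not given here, and the thread cap of 4 is a placeholder rather than the value the PR uses. All names (GraphDataType, GraphFeatures, CpuAccPolicy, DecidePolicy) are hypothetical.

```cpp
#include <cassert>
#include <optional>

// Graph-level datatype class produced by the scan (hypothetical).
enum class GraphDataType { Fp32, Fp16, Quantized };

// Hypothetical scan summary. Each flag stands in for one of the PR's shape
// classes; how those classes are detected is not shown here.
struct GraphFeatures {
    GraphDataType dataType = GraphDataType::Fp32;
    bool hasRegressingQuantShape = false;       // known regressing quantized shape class
    bool hasFp32RiskPattern = false;            // regression-risk spatial/dense FP32 pattern
    bool hasThreadSensitiveQuantShape = false;  // quantized shape class that gets a thread cap
};

// The three CpuAcc settings the policy can emit.
struct CpuAccPolicy {
    bool smeEnabled = true;
    bool sveEnabled = true;
    std::optional<unsigned> numberOfThreads;  // empty: keep the backend default
};

// Conservative decision table following the listed rules.
// quantThreadCap is a placeholder, not the value used by the PR.
inline CpuAccPolicy DecidePolicy(const GraphFeatures& f, unsigned quantThreadCap = 4) {
    assert(quantThreadCap > 0);
    CpuAccPolicy p;
    switch (f.dataType) {
        case GraphDataType::Fp16:
            p.smeEnabled = false;  // FP16: always hide SME/SME2
            break;
        case GraphDataType::Quantized:
            if (f.hasRegressingQuantShape) {
                p.smeEnabled = false;  // hide SME/SME2, keep SVE/SVE2
            } else if (f.hasThreadSensitiveQuantShape) {
                p.numberOfThreads = quantThreadCap;  // SME stays on, threads capped
            }
            break;
        case GraphDataType::Fp32:
            p.smeEnabled = !f.hasFp32RiskPattern;  // hide SME/SME2 only on risky patterns
            break;
    }
    return p;
}
```

The sketch never disables SVE, since none of the listed rules does. Keeping shape classification separate from the decision table means the classes can be retuned without touching the rules themselves.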
The heuristic does not rewrite the graph or force a specific kernel. It emits CpuAcc ModelOptions:
SmeEnabled = true/false
SveEnabled = true/false
NumberOfThreads = optional override
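In ArmNN, backend-specific settings like these travel as an armnn::BackendOptions for the "CpuAcc" backend inside armnn::ModelOptions. To keep the example self-contained, the sketch below uses a hypothetical stand-in struct (OptionStandIn) and helper (ToCpuAccOptions) rather than the real armnn types. It assumes SmeEnabled and SveEnabled are always emitted, while NumberOfThreads appears only when the policy overrides it.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for one named backend option. It only mimics the
// name/value shape of an ArmNN backend option so this sketch compiles alone.
struct OptionStandIn {
    std::string name;
    std::variant<bool, unsigned> value;
};

// Build the CpuAcc option list from a policy decision.
inline std::vector<OptionStandIn> ToCpuAccOptions(bool smeEnabled, bool sveEnabled,
                                                  std::optional<unsigned> numberOfThreads) {
    assert(!numberOfThreads || *numberOfThreads > 0);
    std::vector<OptionStandIn> opts{
        {"SmeEnabled", smeEnabled},
        {"SveEnabled", sveEnabled},
    };
    // NumberOfThreads is only emitted when the policy overrides the default.
    if (numberOfThreads) {
        opts.push_back({"NumberOfThreads", *numberOfThreads});
    }
    return opts;
}
```

Omitting NumberOfThreads when there is no override leaves the backend's existing thread selection untouched, which matches the "optional override" wording above.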
Those options are later consumed by the Neon backend model context and passed into ACL CPU feature masking.
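ACL's actual CPUInfo interface for this is not reproduced here. As an illustration of the masking idea described in the ACL commit message (mask SME/SME2 and SVE when asked, otherwise keep hardware-based selection), here is a sketch with hypothetical feature bits (IsaFeature), override struct (IsaOverrides) and helper (EffectiveFeatures). It assumes an override can only remove features, never add ones the hardware lacks.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical ISA feature bits; not ACL's real representation.
enum IsaFeature : std::uint32_t {
    kNeon = 1u << 0,
    kSve  = 1u << 1,
    kSve2 = 1u << 2,
    kSme  = 1u << 3,
    kSme2 = 1u << 4,
};

// Client overrides; std::nullopt means "no override, use hardware detection".
struct IsaOverrides {
    std::optional<bool> smeEnabled;
    std::optional<bool> sveEnabled;
};

// Effective features = detected features minus anything explicitly disabled.
// An override of `true` cannot add a feature the hardware lacks.
inline std::uint32_t EffectiveFeatures(std::uint32_t detected, const IsaOverrides& o) {
    std::uint32_t mask = 0;
    if (o.smeEnabled.has_value() && !*o.smeEnabled) {
        mask |= kSme | kSme2;
    }
    if (o.sveEnabled.has_value() && !*o.sveEnabled) {
        mask |= kSve | kSve2;
    }
    return detected & ~mask;
}
```

With no overrides the detected set passes through unchanged, which is the "default hardware-based selection" behaviour the commit message describes.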
Latest representative results:
Datatype-isolated latest run
Mode    SP     HP     Q
NEON    3032   5324   7117
SME2    3458   5193   7072
The latest run shows the quantized score recovering from the previous S26 CME result of 3690 to 7072, bringing it close to the non-SME/QMX-class quantized results while preserving strong SP/HP performance.
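To make the comparison concrete, the following sketch computes relative changes from the scores reported in the two result tables of this PR. The helper and constant names (Scores, PercentChange, Ratio, kPrevSme2, ...) are illustrative only.

```cpp
#include <cassert>

// Geekbench AI scores per precision: single (SP), half (HP), quantized (Q).
struct Scores {
    double sp;
    double hp;
    double q;
};

// Values copied from the result tables in this PR.
constexpr Scores kPrevNonSme2{2655.0, 2658.0, 4305.0};
constexpr Scores kPrevSme2{2750.0, 3991.0, 3690.0};
constexpr Scores kLatestNeon{3032.0, 5324.0, 7117.0};
constexpr Scores kLatestSme2{3458.0, 5193.0, 7072.0};

// Relative change of `value` against `baseline`, in percent.
inline double PercentChange(double baseline, double value) {
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

// How many times higher `value` is than `baseline`.
inline double Ratio(double baseline, double value) {
    assert(baseline != 0.0);
    return value / baseline;
}
```

With these numbers, the previous SME2 quantized score is about 14.3% below the non-SME2 one (while HP was about 50.2% higher), the latest SME2 run is about +14% SP, -2.5% HP and -0.6% Q relative to NEON, and the quantized score is roughly 1.92x the previous SME2 result.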