SME2/SVE/NEON heuristic - ArmNN by damdoo01-arm · Pull Request #820 · ARM-software/armnn

damdoo01-arm · 2026-06-12T11:35:14Z

Title:
Add CpuAcc SME/SVE shape policy for Geekbench AI workloads

Description:

This PR adds a graph-level CpuAcc policy that controls whether SME/SME2 and SVE/SVE2 implementations are exposed to ACL for a given optimized ArmNN graph. See the associated ACL PR: ARM-software/ComputeLibrary#1294

Problem statement:

On SME2-capable client devices, some Geekbench AI workloads scored lower when SME2 kernels were selected at high thread counts. The main regression was in quantized INT8 models with awkward GEMM shapes, where the cost of SME2 packing and contention under 8-thread execution outweighed the matmul benefit. This is due to a hardware resource-pressure issue that is particularly acute on devices with a single CME unit: SME2 can improve some shapes, but a high thread count combined with unfriendly Conv2D/GEMM decompositions can regress the overall benchmark score.

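Previous scores (SP = single precision, HP = half precision, Q = quantized). Note in particular the regression in the Quantized score due to quantization overhead on the SME2 core.

| Device | SP | HP | Q |
|---|---|---|---|
| non-SME2 | 2655 | 2658 | 4305 |
| SME2 | 2750 | 3991 | 3690 |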

High-level approach:

The policy scans the optimized ArmNN graph and records datatype and GEMM-like shape features from Convolution2d, FullyConnected, BatchMatMul, and DepthwiseConvolution2d.
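To make "GEMM-like shape features" concrete, here is a minimal sketch of how a convolution or fully connected layer can be viewed as an M x N x K matrix multiply. It assumes the conventional im2col-style lowering; the PR text does not state which decomposition the policy actually records. `GemmShape`, `Conv2dDesc`, `ConvToGemm` and `FullyConnectedToGemm` are hypothetical names for illustration, not ArmNN types, and BatchMatMul and DepthwiseConvolution2d are not covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical GEMM view of a layer: C[M x N] = A[M x K] * B[K x N].
struct GemmShape
{
    std::uint64_t m;
    std::uint64_t n;
    std::uint64_t k;
};

// Hypothetical description of a Convolution2d layer; not an ArmNN type.
struct Conv2dDesc
{
    std::uint32_t batch;
    std::uint32_t outHeight;
    std::uint32_t outWidth;
    std::uint32_t outChannels;
    std::uint32_t kernelHeight;
    std::uint32_t kernelWidth;
    std::uint32_t inChannels;
};

// Assumed im2col-style lowering: every output pixel becomes one GEMM row,
// every output channel one GEMM column, and K is the receptive-field size.
inline GemmShape ConvToGemm(const Conv2dDesc& c)
{
    return GemmShape{
        static_cast<std::uint64_t>(c.batch) * c.outHeight * c.outWidth,
        c.outChannels,
        static_cast<std::uint64_t>(c.kernelHeight) * c.kernelWidth * c.inChannels};
}

// FullyConnected maps directly: M = batch, N = output units, K = input units.
inline GemmShape FullyConnectedToGemm(std::uint32_t batch,
                                      std::uint32_t inputs,
                                      std::uint32_t outputs)
{
    return GemmShape{batch, outputs, inputs};
}
```

Under this assumption, a 3x3 convolution over 32 input channels that produces a 56x56x64 output maps to M = 3136, N = 64, K = 288. Shapes with a small or oddly sized M, N or K are the kind of "awkward" case a shape policy would want to classify.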

It then applies a conservative heuristic:

- FP16 graphs: hide SME/SME2.
- Quantized graphs: hide SME/SME2 for the known regressing shape classes, while keeping SVE/SVE2 available.
- FP32 graphs: hide SME/SME2 only for detected regression-risk spatial/dense graph patterns.
- Quantized graphs that keep SME enabled may have CpuAcc thread count capped for specific shape classes.

The heuristic does not rewrite the graph or force a specific kernel. It emits CpuAcc ModelOptions:

- SmeEnabled = true/false
- SveEnabled = true/false
- NumberOfThreads = optional override

Those options are later consumed by the Neon backend model context and passed into ACL CPU feature masking.
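The decision rules above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in this PR: `GraphFeatures`, `CpuAccPolicy`, `DecidePolicy` and the thread cap of 4 are hypothetical, and the boolean feature flags stand in for the shape classification that the real policy performs. The sketch also assumes SVE stays enabled in every branch, which the description only states explicitly for quantized graphs.

```cpp
#include <cassert>
#include <optional>

enum class GraphDataType { Float32, Float16, Quantized };

// Hypothetical summary of what the graph scan found.
struct GraphFeatures
{
    GraphDataType dataType;
    bool hasRegressingQuantShape; // quantized: known regressing shape class
    bool hasFp32RiskPattern;      // FP32: regression-risk spatial/dense pattern
    bool needsThreadCap;          // quantized with SME kept: cap the threads
};

// Mirrors the three emitted options: SmeEnabled, SveEnabled, NumberOfThreads.
struct CpuAccPolicy
{
    bool smeEnabled = true;
    bool sveEnabled = true;
    std::optional<unsigned int> numberOfThreads; // empty means no override
};

// Assumed cap value, chosen only for illustration.
constexpr unsigned int kAssumedThreadCap = 4;

inline CpuAccPolicy DecidePolicy(const GraphFeatures& f)
{
    CpuAccPolicy policy;
    switch (f.dataType)
    {
        case GraphDataType::Float16:
            // FP16 graphs: always hide SME/SME2.
            policy.smeEnabled = false;
            break;
        case GraphDataType::Quantized:
            if (f.hasRegressingQuantShape)
            {
                // Hide SME/SME2 but keep SVE/SVE2 available.
                policy.smeEnabled = false;
            }
            else if (f.needsThreadCap)
            {
                // SME stays on; only the thread count is reduced.
                policy.numberOfThreads = kAssumedThreadCap;
            }
            break;
        case GraphDataType::Float32:
            // Hide SME/SME2 only when a risk pattern was detected.
            policy.smeEnabled = !f.hasFp32RiskPattern;
            break;
    }
    return policy;
}
```

Keeping the decision separate from the graph scan makes each rule easy to unit test, and an empty `numberOfThreads` preserves whatever thread count the caller configured.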

Latest representative results:

Datatype-isolated latest run

| Mode | SP | HP | Q |
|---|---|---|---|
| NEON | 3032 | 5324 | 7117 |
| SME2 | 3458 | 5193 | 7072 |

The latest run shows the quantized path recovering from the previous S26 CME result of 3690 to 7072, bringing it close to the non-SME/QMX-class quantized results while preserving strong SP/HP performance.

Signed-off-by: Damien Dooley <damien.dooley@arm.com>

CianMcGriskinARM · 2026-06-12T11:46:45Z

@@ -1,5 +1,5 @@
 //
-// Copyright © 2017-2024 Arm Ltd and Contributors. All rights reserved.


2017-2024, 2026

CianMcGriskinARM · 2026-06-12T11:48:01Z

@@ -1,10 +1,12 @@
 //
-// Copyright © 2020 Arm Ltd and Contributors. All rights reserved.
+// Copyright © 2026 Arm Ltd and Contributors. All rights reserved.


CianMcGriskinARM · 2026-06-12T12:09:29Z

@@ -1,5 +1,5 @@
 //
-// Copyright © 2020 Arm Ltd and Contributors. All rights reserved.
+// Copyright © 2026 Arm Ltd and Contributors. All rights reserved.


Expose runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph while preserving default hardware-based selection when no override is supplied.
Full context in the ArmNN PR: ARM-software/armnn#820
Signed-off-by: Damien Dooley <damien.dooley@arm.com>
Change-Id: I602cebdd58942930d248948788bfac9e2be56474

Expose experimental runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph while preserving default hardware-based selection when no override is supplied.
Full context in the ArmNN PR: ARM-software/armnn#820
Signed-off-by: Damien Dooley <damien.dooley@arm.com>
Change-Id: I602cebdd58942930d248948788bfac9e2be56474
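The masking behaviour described in this commit message can be illustrated with a short sketch. It assumes an override can only hide a capability the hardware reports and can never enable one that is absent, and that a missing override leaves detection untouched. `IsaSupport`, `IsaOverrides` and `ApplyOverrides` are hypothetical names; the actual CPUInfo interface in ACL is not shown in this PR and may differ.

```cpp
#include <cassert>
#include <optional>

// Hypothetical snapshot of what the hardware reports.
struct IsaSupport
{
    bool sme;
    bool sme2;
    bool sve;
    bool sve2;
};

// Hypothetical client overrides; an empty optional means "no override".
struct IsaOverrides
{
    std::optional<bool> smeEnabled;
    std::optional<bool> sveEnabled;
};

// An override of false hides the ISA family. An override of true, or no
// override at all, keeps the hardware-detected value.
inline IsaSupport ApplyOverrides(const IsaSupport& detected, const IsaOverrides& o)
{
    const bool allowSme = o.smeEnabled.value_or(true);
    const bool allowSve = o.sveEnabled.value_or(true);
    return IsaSupport{detected.sme && allowSme,
                      detected.sme2 && allowSme,
                      detected.sve && allowSve,
                      detected.sve2 && allowSve};
}
```

Combining the override with the detected value using a logical AND is what preserves the default hardware-based selection: kernel selection only ever sees a subset of what the CPU supports.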

damdoo01-arm added 3 commits June 4, 2026 16:02

Add SME2 shape heuristic

4f628a2

Signed-off-by: Damien Dooley <damien.dooley@arm.com>

Updated heuristic for better coverage FP16 and FP32 use cases

3515aac

Separated heuristic defnition and tightened other logic

00fca53

damdoo01-arm mentioned this pull request Jun 12, 2026

SME2/SVE/NEON heuristic - ACL ARM-software/ComputeLibrary#1294

Closed

damdoo01-arm changed the title ~~Damdoo01/geekbench sme2 heuristic~~ SME2/SVE/NEON heuristic Jun 12, 2026

damdoo01-arm changed the title ~~SME2/SVE/NEON heuristic~~ SME2/SVE/NEON heuristic - ArmNN Jun 12, 2026

CianMcGriskinARM reviewed Jun 15, 2026


damdoo01-arm force-pushed the damdoo01/geekbench_sme2_heuristic branch from ce27c85 to c93a29c June 15, 2026 15:35

Widened M band for heuristic

a667499

damdoo01-arm force-pushed the damdoo01/geekbench_sme2_heuristic branch from c93a29c to a667499 June 15, 2026 15:38

damdoo01-arm mentioned this pull request Jun 16, 2026

feat: Add experimental SME/SVE runtime selection controls ARM-software/ComputeLibrary#1295

Merged

damdoo01-arm wants to merge 4 commits into ARM-software:main from damdoo01-arm:damdoo01/geekbench_sme2_heuristic


2 participants
