SME2/SVE/NEON heuristic - ArmNN#820

Open
damdoo01-arm wants to merge 4 commits into ARM-software:main from damdoo01-arm:damdoo01/geekbench_sme2_heuristic

@damdoo01-arm damdoo01-arm commented Jun 12, 2026

Title:
Add CpuAcc SME/SVE shape policy for Geekbench AI workloads

Description:

This PR adds a graph-level CpuAcc policy that controls whether SME/SME2 and SVE/SVE2 implementations are exposed to ACL for a given optimized ArmNN graph.

Note: the associated ComputeLibrary PR is ARM-software/ComputeLibrary#1294.

Problem statement:

On SME2-capable client devices, some Geekbench AI workloads scored lower when SME2 kernels were selected under a high thread count. The main regression was seen in quantized INT8 models with awkward GEMM shapes, where the cost of SME2 packing and contention under 8-thread execution outweighed the matmul benefit. This is due to a hardware resource-pressure issue that is particularly acute when only a single CME unit is present: SME2 can improve some shapes, but a high thread count combined with unfriendly Conv2D/GEMM decompositions can regress the overall benchmark score.

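Previous scores (SP = single precision, HP = half precision, Q = quantized). Note in particular the regression in the Quantized score, due to quantization overhead on the SME2 core.

| Device | SP | HP | Q |
|---|---|---|---|
| non-SME2 | 2655 | 2658 | 4305 |
| SME2 | 2750 | 3991 | 3690 |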

High-level approach:

The policy scans the optimized ArmNN graph and records datatype and GEMM-like shape features from Convolution2d, FullyConnected, BatchMatMul, and DepthwiseConvolution2d.
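As a rough illustration of what a "GEMM-like shape feature" can look like, the sketch below lowers a convolution and a fully connected layer to (M, N, K) using the common im2col view. The structs and function names are hypothetical and are not ArmNN types; the PR does not state which lowering the policy actually uses, so treat the formulas as one plausible assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified layer descriptor (NHWC input). Not an ArmNN type.
struct Conv2dDesc {
    std::uint32_t batch, inH, inW, inC;  // input tensor dimensions
    std::uint32_t outC, kH, kW;          // output channels and kernel size
    std::uint32_t strideH, strideW;      // strides
    std::uint32_t padH, padW;            // symmetric padding per side
};

// GEMM-like shape: an (M x K) matrix multiplied by a (K x N) matrix.
struct GemmShape {
    std::uint64_t m, n, k;
};

// Output extent of a convolution along one spatial axis (no dilation).
inline std::uint32_t OutDim(std::uint32_t in, std::uint32_t kernel,
                            std::uint32_t stride, std::uint32_t pad) {
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

// im2col view: every output pixel is a row, every filter is a column.
inline GemmShape ConvToGemm(const Conv2dDesc& d) {
    const std::uint64_t outH = OutDim(d.inH, d.kH, d.strideH, d.padH);
    const std::uint64_t outW = OutDim(d.inW, d.kW, d.strideW, d.padW);
    return GemmShape{d.batch * outH * outW,                // M
                     d.outC,                               // N
                     std::uint64_t{d.kH} * d.kW * d.inC};  // K
}

// A fully connected layer is already a GEMM: (batch x in) * (in x out).
inline GemmShape FullyConnectedToGemm(std::uint32_t batch,
                                      std::uint32_t inFeatures,
                                      std::uint32_t outFeatures) {
    return GemmShape{batch, outFeatures, inFeatures};
}
```

For example, a 7x7, stride-2, pad-3 convolution over a 1x224x224x3 input with 64 filters maps to M = 12544, N = 64, K = 147 under this view, which is the kind of tall, shallow-K shape a policy might want to classify.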

It then applies a conservative heuristic:

- FP16 graphs: hide SME/SME2.
- Quantized graphs: hide SME/SME2 for the known regressing shape classes, while keeping SVE/SVE2 available.
- FP32 graphs: hide SME/SME2 only for detected regression-risk spatial/dense graph patterns.
- Quantized graphs that keep SME enabled may have the CpuAcc thread count capped for specific shape classes.
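The rules above can be sketched as a single decision function. Everything here is hypothetical: the type names, the boolean feature flags, and the `cappedThreads` parameter are placeholders, and the actual shape-class tests and thread-cap values live in the PR's code rather than in this description.

```cpp
#include <optional>

// Hypothetical dominant datatype of the optimized graph.
enum class GraphDataType { Float32, Float16, Quantized };

// Hypothetical summary of the graph scan. The flags stand in for the
// shape-class checks, whose exact criteria are not given here.
struct GraphFeatures {
    GraphDataType dataType;
    bool hasRegressingQuantShape;   // quantized shape class known to regress
    bool hasRiskySpatialDenseMix;   // FP32 regression-risk pattern
    bool wantsThreadCap;            // quantized shape class that needs a cap
};

// Hypothetical policy result; mirrors the three options the PR emits.
struct IsaPolicy {
    bool smeEnabled = true;
    bool sveEnabled = true;
    std::optional<unsigned> numberOfThreads;  // empty means "no override"
};

inline IsaPolicy DecidePolicy(const GraphFeatures& f, unsigned cappedThreads) {
    IsaPolicy p;  // default: expose everything, no thread override
    switch (f.dataType) {
    case GraphDataType::Float16:
        p.smeEnabled = false;  // FP16 graphs: always hide SME/SME2
        break;
    case GraphDataType::Quantized:
        if (f.hasRegressingQuantShape) {
            p.smeEnabled = false;  // SVE/SVE2 stays available
        } else if (f.wantsThreadCap) {
            p.numberOfThreads = cappedThreads;  // keep SME, limit contention
        }
        break;
    case GraphDataType::Float32:
        if (f.hasRiskySpatialDenseMix) {
            p.smeEnabled = false;
        }
        break;
    }
    return p;
}
```

Keeping the defaults permissive means a graph that matches none of the rules is left on the normal hardware-based kernel selection, which is consistent with the conservative intent described above.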
The heuristic does not rewrite the graph or force a specific kernel. It emits CpuAcc ModelOptions:

- SmeEnabled = true/false
- SveEnabled = true/false
- NumberOfThreads = optional override
Those options are later consumed by the Neon backend model context and passed into ACL CPU feature masking.
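To make the masking step concrete, here is a minimal sketch of how the three options could be applied to a set of detected CPU features. The feature bits, the `CpuAccOptions` struct, and both functions are hypothetical stand-ins; they do not reproduce the ArmNN model context or the ACL `CPUInfo` interface, and treating SME and SVE as independent masks is an assumption.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical feature bits; the real detection lives in ACL.
constexpr std::uint32_t kNeon = 1u << 0;
constexpr std::uint32_t kSve  = 1u << 1;
constexpr std::uint32_t kSve2 = 1u << 2;
constexpr std::uint32_t kSme  = 1u << 3;
constexpr std::uint32_t kSme2 = 1u << 4;

// Hypothetical holder for the three options named in the PR description.
struct CpuAccOptions {
    bool smeEnabled = true;
    bool sveEnabled = true;
    std::optional<unsigned> numberOfThreads;  // empty means "no override"
};

// Clear the ISA bits the policy asked to hide. Bits that the hardware
// never reported stay cleared, so the mask can only remove features.
inline std::uint32_t MaskFeatures(std::uint32_t detected,
                                  const CpuAccOptions& opts) {
    std::uint32_t allowed = detected;
    if (!opts.smeEnabled) {
        allowed &= ~(kSme | kSme2);
    }
    if (!opts.sveEnabled) {
        allowed &= ~(kSve | kSve2);
    }
    return allowed;
}

// Use the override when present, otherwise the caller's default.
inline unsigned EffectiveThreads(unsigned defaultThreads,
                                 const CpuAccOptions& opts) {
    return opts.numberOfThreads.value_or(defaultThreads);
}
```

Because the mask only clears bits, leaving all options at their defaults returns the detected feature set unchanged, matching the stated goal of preserving hardware-based selection when no override is supplied.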

Latest representative results:

Datatype-isolated latest run

| Mode | SP | HP | Q |
|---|---|---|---|
| NEON | 3032 | 5324 | 7117 |
| SME2 | 3458 | 5193 | 7072 |

The latest run shows the quantized path recovering from the previous S26 CME result of 3690 to roughly 7072, bringing it close to the non-SME/QMX-class quantized results while preserving strong SP/HP performance.

@damdoo01-arm damdoo01-arm changed the title Damdoo01/geekbench sme2 heuristic SME2/SVE/NEON heuristic Jun 12, 2026
@damdoo01-arm damdoo01-arm changed the title SME2/SVE/NEON heuristic SME2/SVE/NEON heuristic - ArmNN Jun 12, 2026
@@ -1,5 +1,5 @@
//
// Copyright © 2017-2024 Arm Ltd and Contributors. All rights reserved.


2017-2024, 2026

@@ -1,10 +1,12 @@
//
// Copyright © 2020 Arm Ltd and Contributors. All rights reserved.
// Copyright © 2026 Arm Ltd and Contributors. All rights reserved.


2020, 2026

@@ -1,5 +1,5 @@
//
// Copyright © 2020 Arm Ltd and Contributors. All rights reserved.
// Copyright © 2026 Arm Ltd and Contributors. All rights reserved.


2020, 2026

@damdoo01-arm damdoo01-arm force-pushed the damdoo01/geekbench_sme2_heuristic branch from ce27c85 to c93a29c Compare June 15, 2026 15:35
@damdoo01-arm damdoo01-arm force-pushed the damdoo01/geekbench_sme2_heuristic branch from c93a29c to a667499 Compare June 15, 2026 15:38
damdoo01-arm added a commit to damdoo01-arm/ComputeLibrary that referenced this pull request Jun 16, 2026
Expose runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph while preserving default hardware-based selection when no override is supplied.

Full context in the ArmNN PR: ARM-software/armnn#820

Signed-off-by: Damien Dooley <damien.dooley@arm.com>

Change-Id: I602cebdd58942930d248948788bfac9e2be56474
damdoo01-arm added a commit to damdoo01-arm/ComputeLibrary that referenced this pull request Jun 18, 2026
Expose experimental runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph while preserving default hardware-based selection when no override is supplied.

Full context in the ArmNN PR: ARM-software/armnn#820

Signed-off-by: Damien Dooley <damien.dooley@arm.com>

Change-Id: I602cebdd58942930d248948788bfac9e2be56474
damdoo01-arm added a commit to damdoo01-arm/ComputeLibrary that referenced this pull request Jun 19, 2026
Expose experimental runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph while preserving default hardware-based selection when no override is supplied.

Full context in the ArmNN PR: ARM-software/armnn#820

Signed-off-by: Damien Dooley <damien.dooley@arm.com>

Change-Id: I602cebdd58942930d248948788bfac9e2be56474
gunes-arm pushed a commit to ARM-software/ComputeLibrary that referenced this pull request Jun 22, 2026
Expose experimental runtime controls in CPUInfo so clients can mask SME/SME2 and SVE capabilities when selecting CPU kernels. This lets higher-level frameworks steer ACL away from ISA paths that should not be used for a graph while preserving default hardware-based selection when no override is supplied.

Full context in the ArmNN PR: ARM-software/armnn#820

Signed-off-by: Damien Dooley <damien.dooley@arm.com>

Change-Id: I602cebdd58942930d248948788bfac9e2be56474