[CPU][RV64] Enhance RV64 convolution with GEMM optimizations and refactoring #312
strelkovkm wants to merge 16 commits into
Conversation
Reason: fix jit_utils
@strelkovkm, @aobolensk, is there any particular reason to do this in the fork instead of upstream oneDNN? There has been a lot of RISC-V activity upstream in the last 6 months.
The required convolution was used on top of the current OpenVINO trunk. It lags a little, by a version or two, so it was decided to publish it here for now, with upstreaming to follow. But, as far as I can see, we need to check the difference between this and https://github.com/uxlfoundation/oneDNN/blob/main/src/cpu/rv64/rvv_brgemm_conv.cpp
It would probably be more efficient to upstream first and then cherry-pick to the fork. Otherwise you might end up reimplementing things that are already available upstream :)
Description
Summary of the change:
This PR introduces an optimized convolution execution path for the RISC-V (RV64) architecture using RISC-V Vector (RVV) intrinsics. It implements a vectorized `im2col+GEMM` approach to accelerate spatial convolutions. Additionally, it introduces a robust fallback mechanism to reference primitives (ref) to ensure numerical stability and graceful degradation for tensor shapes or strides not currently covered by the optimized vector loops.
Motivation and context:
Native execution of heavy topological blocks (like Convolution) on RISC-V targets previously defaulted to suboptimal scalar reference implementations, resulting in severe inference bottlenecks on edge hardware. By utilizing RVV intrinsics, this patch significantly improves layout handling, hot loop execution efficiency, and overall compute throughput.
Validation was performed natively on an 8-core SpacemiT K1 RISC-V platform (Orange Pi RV2). The optimization was verified using `benchmark_app` with performance counters (`-pc`) enabled, alongside end-to-end functional validation on YOLOv5/v8/v11/v26 object detection models to guarantee absolute correctness and zero regressions.
Fixes # (N/A)
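For readers unfamiliar with the im2col+GEMM approach named in the summary, here is a minimal scalar sketch of the idea. It is not the code from this PR: `ConvShape`, `im2col`, `gemm`, `conv_im2col_gemm` and `conv_ref` are hypothetical names, and the sketch assumes a single image, no padding, no dilation, no groups, and plain row-major (NCHW-like) layouts. The inner GEMM loop is left scalar for portability; it is the unit-stride loop an RVV implementation would presumably strip-mine with vector intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor: one image, no padding, dilation or groups.
struct ConvShape {
    int ic, ih, iw; // input channels, height, width
    int oc, kh, kw; // output channels, kernel height, kernel width
    int stride;
    int oh() const { return (ih - kh) / stride + 1; }
    int ow() const { return (iw - kw) / stride + 1; }
};

// im2col: unfold every input patch into a column of a (ic*kh*kw) x (oh*ow) matrix.
std::vector<float> im2col(const std::vector<float> &src, const ConvShape &s) {
    const int K = s.ic * s.kh * s.kw, N = s.oh() * s.ow();
    std::vector<float> col(static_cast<std::size_t>(K) * N);
    for (int c = 0; c < s.ic; ++c)
        for (int ky = 0; ky < s.kh; ++ky)
            for (int kx = 0; kx < s.kw; ++kx) {
                const int row = (c * s.kh + ky) * s.kw + kx;
                for (int y = 0; y < s.oh(); ++y)
                    for (int x = 0; x < s.ow(); ++x)
                        col[row * N + y * s.ow() + x]
                                = src[(c * s.ih + y * s.stride + ky) * s.iw
                                        + x * s.stride + kx];
            }
    return col;
}

// GEMM: C[M x N] = A[M x K] * B[K x N], all row-major.
void gemm(const float *A, const float *B, float *C, int M, int N, int K) {
    for (int i = 0; i < M * N; ++i) C[i] = 0.f;
    for (int m = 0; m < M; ++m)
        for (int k = 0; k < K; ++k) {
            const float a = A[m * K + k];
            // Unit-stride loop over n: the natural candidate for vectorization.
            for (int n = 0; n < N; ++n) C[m * N + n] += a * B[k * N + n];
        }
}

// Convolution as weights[oc x K] * col[K x N].
std::vector<float> conv_im2col_gemm(const std::vector<float> &src,
        const std::vector<float> &wei, const ConvShape &s) {
    assert(src.size() == static_cast<std::size_t>(s.ic * s.ih * s.iw));
    const int K = s.ic * s.kh * s.kw, N = s.oh() * s.ow();
    std::vector<float> dst(static_cast<std::size_t>(s.oc) * N);
    const std::vector<float> col = im2col(src, s);
    gemm(wei.data(), col.data(), dst.data(), s.oc, N, K);
    return dst;
}

// Direct reference convolution, playing the role of the fallback path.
std::vector<float> conv_ref(const std::vector<float> &src,
        const std::vector<float> &wei, const ConvShape &s) {
    std::vector<float> dst(static_cast<std::size_t>(s.oc) * s.oh() * s.ow(), 0.f);
    for (int o = 0; o < s.oc; ++o)
        for (int y = 0; y < s.oh(); ++y)
            for (int x = 0; x < s.ow(); ++x)
                for (int c = 0; c < s.ic; ++c)
                    for (int ky = 0; ky < s.kh; ++ky)
                        for (int kx = 0; kx < s.kw; ++kx)
                            dst[(o * s.oh() + y) * s.ow() + x]
                                    += wei[((o * s.ic + c) * s.kh + ky) * s.kw + kx]
                                    * src[(c * s.ih + y * s.stride + ky) * s.iw
                                            + x * s.stride + kx];
    return dst;
}
```

Comparing `conv_im2col_gemm` against `conv_ref` on a few shapes is a cheap way to check the optimized path against the reference one. The trade-off is the extra `col` buffer, which costs memory in exchange for a regular GEMM inner loop.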
Checklist
General
`make test` and `make test_benchdnn_*`) pass locally for each commit?
Performance improvements
New features
Bug fixes
RFC PR