[CPU][RV64] Enhance RV64 convolution with GEMM optimizations and refactoring #312

Open
strelkovkm wants to merge 16 commits into openvinotoolkit:v3.10_for_ie_master from strelkovkm:dev_conv_rv64

@strelkovkm strelkovkm commented May 16, 2026

Description

Summary of the change:
This PR introduces an optimized convolution execution path for the RISC-V (RV64) architecture using RISC-V Vector (RVV) intrinsics. It implements a vectorized im2col + GEMM approach to accelerate spatial convolutions. It also adds a fallback to the reference primitives (ref), so that tensor shapes or strides not yet covered by the optimized vector loops still execute correctly, only slower.
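For readers unfamiliar with the technique, the im2col + GEMM idea can be sketched in plain scalar C++ as below. This is an illustration only, not the code in this PR: the `ConvShape` struct and the functions `im2col`, `gemm`, `conv_im2col_gemm`, and `conv_ref` are hypothetical names, and the sketch assumes a single NCHW image, OIHW weights, no padding, and equal strides in both spatial dimensions. It contains no RVV intrinsics; the unit-stride inner loop of `gemm` is simply the kind of loop a vectorized kernel would target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor: one NCHW image, OIHW weights, no padding.
struct ConvShape {
    int ic, ih, iw;  // input channels, height, width
    int oc, kh, kw;  // output channels, kernel height, kernel width
    int stride;      // same stride in both spatial dimensions
    int oh() const { return (ih - kh) / stride + 1; }
    int ow() const { return (iw - kw) / stride + 1; }
};

// im2col: unfold input patches into a row-major (ic*kh*kw) x (oh*ow) matrix.
std::vector<float> im2col(const std::vector<float>& src, const ConvShape& s) {
    const int cols = s.oh() * s.ow();
    std::vector<float> col(static_cast<std::size_t>(s.ic) * s.kh * s.kw * cols);
    for (int c = 0; c < s.ic; ++c)
        for (int ky = 0; ky < s.kh; ++ky)
            for (int kx = 0; kx < s.kw; ++kx) {
                const int row = (c * s.kh + ky) * s.kw + kx;
                for (int oy = 0; oy < s.oh(); ++oy)
                    for (int ox = 0; ox < s.ow(); ++ox) {
                        const int iy = oy * s.stride + ky;
                        const int ix = ox * s.stride + kx;
                        col[static_cast<std::size_t>(row) * cols + oy * s.ow() + ox] =
                            src[(static_cast<std::size_t>(c) * s.ih + iy) * s.iw + ix];
                    }
            }
    return col;
}

// GEMM: C[m x n] = A[m x k] * B[k x n], all row-major.
// The innermost loop over j is unit-stride in both B and C.
std::vector<float> gemm(const std::vector<float>& a, const std::vector<float>& b,
                        int m, int k, int n) {
    std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p) {
            const float av = a[static_cast<std::size_t>(i) * k + p];
            for (int j = 0; j < n; ++j)
                c[static_cast<std::size_t>(i) * n + j] +=
                    av * b[static_cast<std::size_t>(p) * n + j];
        }
    return c;
}

// Convolution as one GEMM: weights (oc x ic*kh*kw) times the im2col matrix.
std::vector<float> conv_im2col_gemm(const std::vector<float>& src,
                                    const std::vector<float>& wei,
                                    const ConvShape& s) {
    return gemm(wei, im2col(src, s), s.oc, s.ic * s.kh * s.kw, s.oh() * s.ow());
}

// Direct (reference-style) convolution used to cross-check the GEMM path.
std::vector<float> conv_ref(const std::vector<float>& src,
                            const std::vector<float>& wei, const ConvShape& s) {
    std::vector<float> dst(static_cast<std::size_t>(s.oc) * s.oh() * s.ow(), 0.0f);
    for (int o = 0; o < s.oc; ++o)
        for (int oy = 0; oy < s.oh(); ++oy)
            for (int ox = 0; ox < s.ow(); ++ox) {
                float acc = 0.0f;
                for (int c = 0; c < s.ic; ++c)
                    for (int ky = 0; ky < s.kh; ++ky)
                        for (int kx = 0; kx < s.kw; ++kx) {
                            const int iy = oy * s.stride + ky;
                            const int ix = ox * s.stride + kx;
                            acc += src[(static_cast<std::size_t>(c) * s.ih + iy) * s.iw + ix] *
                                   wei[((static_cast<std::size_t>(o) * s.ic + c) * s.kh + ky) * s.kw + kx];
                        }
                dst[(static_cast<std::size_t>(o) * s.oh() + oy) * s.ow() + ox] = acc;
            }
    return dst;
}
```

Calling `conv_im2col_gemm` and `conv_ref` on the same inputs should give matching outputs, which is also how a reference fallback can serve as a correctness oracle for an optimized path.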

Motivation and context:
Native execution of heavy topological blocks (like Convolution) on RISC-V targets previously defaulted to suboptimal scalar reference implementations, resulting in severe inference bottlenecks on edge hardware. By utilizing RVV intrinsics, this patch significantly improves layout handling, hot loop execution efficiency, and overall compute throughput.

Validation was performed natively on an 8-core SpacemiT K1 RISC-V platform (Orange Pi RV2). The optimization was verified using benchmark_app with performance counters (-pc) enabled, alongside end-to-end functional validation on YOLOv5/v8/v11/v26 object detection models to check correctness and catch regressions.

Fixes # (N/A)

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements? (Note: Validated up to 10x local latency reduction on target SpacemiT K1 hardware during YOLO inferencing).

New features

  • Have you published an RFC for the new feature?
  • Was the RFC approved?
  • Have you added relevant tests?

Bug fixes

  • Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
  • Have you added relevant regression tests?

RFC PR

  • Does RFC document follow the template?
  • Have you added a link to the rendered document?

@strelkovkm strelkovkm changed the title [CPU][RISC-V] Enhance RV64 convolution with JIT optimizations and refactoring [CPU][RISC-V] Enhance RV64 convolution with GEMM optimizations and refactoring May 16, 2026
@aobolensk

@vpirogov

vpirogov commented May 18, 2026

@strelkovkm, @aobolensk, any particular reason to do this in the fork instead of upstream oneDNN? There has been a lot of RISC-V activity upstream in the last 6 months.

@strelkovkm strelkovkm changed the title [CPU][RISC-V] Enhance RV64 convolution with GEMM optimizations and refactoring [CPU][RV64] Enhance RV64 convolution with GEMM optimizations and refactoring May 19, 2026
@aobolensk

> @strelkovkm, @aobolensk, any particular reason to do this in the fork instead of upstream oneDNN? There has been a lot of RISC-V activity upstream in the last 6 months.

The required convolution was developed on top of the current OpenVINO trunk, which lags upstream by a version or two, so it was decided to publish it here for now with further upstreaming. However, as far as I can see, we need to compare this against https://github.com/uxlfoundation/oneDNN/blob/main/src/cpu/rv64/rvv_brgemm_conv.cpp

@vpirogov

> so it was decided to publish it here for now with further upstreaming

It would probably be more efficient to upstream first and then cherry-pick to the fork. Otherwise you might end up reimplementing things that are already available upstream :)
