Add vector-accelerated assembly code for RISC-V 64-bit architecture#14894

Open
fengpengboa wants to merge 5 commits into
facebook:mainfrom
zte-riscv:crc32c_opt_zvbc

@fengpengboa fengpengboa commented Jun 29, 2026


This PR depends on the earlier PRs referenced below, which are still pending merge into the main branch, and its changes build directly on them. Please review the prerequisite PRs first; all new functionality and tests in this PR assume their changes are applied.
#14536
#14530

Add a vector-accelerated CRC32C path for RISC-V 64-bit using the Zvbc (vector carry-less multiply) extension, layered on top of the existing Zbc scalar path.
New vector path: vectorized CLMUL folding at 128 bytes per iteration, followed by a 128->64 fold and Barrett reduction. Buffers smaller than 128 bytes use a byte-wise table-walk fallback.
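For reference, a byte-wise table walk for CRC32C can be sketched as follows. This is a minimal illustration, not the code from this PR: the names `MakeCrc32cTable` and `Crc32cBytewise` are hypothetical, and the sketch assumes the common convention of inverting the CRC on entry and exit so that a previous result can be passed back in to extend it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reflected form of the CRC32C (Castagnoli) polynomial.
constexpr uint32_t kCrc32cPoly = 0x82F63B78u;

// Build the 256-entry lookup table: entry i is the CRC state after
// shifting the single byte i through the polynomial, one bit at a time.
inline std::array<uint32_t, 256> MakeCrc32cTable() {
  std::array<uint32_t, 256> table{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t crc = i;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ ((crc & 1u) ? kCrc32cPoly : 0u);
    }
    table[i] = crc;
  }
  return table;
}

// Hypothetical helper (name and signature are assumptions for this sketch).
// Extends `crc` over `len` bytes of `data`, one table lookup per byte.
// Pass 0 as `crc` to start a new checksum.
inline uint32_t Crc32cBytewise(uint32_t crc, const void* data, size_t len) {
  static const std::array<uint32_t, 256> table = MakeCrc32cTable();
  const auto* p = static_cast<const unsigned char*>(data);
  crc = ~crc;  // undo the final inversion of the previous call
  for (size_t i = 0; i < len; ++i) {
    crc = table[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
  }
  return ~crc;  // final inversion
}
```

With this convention, `Crc32cBytewise(0, "123456789", 9)` yields the standard CRC32C check value `0xE3069283`, which makes it a convenient reference when validating an accelerated path on short inputs.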
Runtime dispatch: the Zvbc vector path is taken only when both Zvbc and Zbb are available and VLEN == 128 (the Zvbc assembly is tuned for VLEN=128, which is read at runtime via the vlenb CSR). Otherwise it falls back to the Zbc scalar path (64 B/iter).
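The dispatch rule above can be sketched as a small pure function. This is an illustration under stated assumptions, not the PR's implementation: `RiscvCpuFeatures`, `Crc32cPath`, `VlenBitsFromVlenb`, and `SelectCrc32cPath` are hypothetical names, and how the feature flags are probed (for example via the OS) is left out. The only hardware fact relied on is that the vlenb CSR holds the vector register length in bytes, so VLEN in bits is `vlenb * 8`.

```cpp
#include <cstdint>

// Hypothetical names for this sketch; not taken from the PR.
enum class Crc32cPath { kZvbcVector, kZbcScalar };

struct RiscvCpuFeatures {
  bool has_zvbc;       // vector carry-less multiply available
  bool has_zbb;        // basic bit-manipulation available
  uint32_t vlen_bits;  // VLEN in bits, derived from the vlenb CSR
};

// The vlenb CSR reports the vector register length in bytes (VLEN / 8).
// On real hardware the value would come from something like
// `csrr t0, vlenb`; here it is passed in so the logic stays portable.
constexpr uint32_t VlenBitsFromVlenb(uint64_t vlenb) {
  return static_cast<uint32_t>(vlenb * 8);
}

// Take the vector path only when Zvbc and Zbb are both present and
// VLEN is exactly 128 bits; otherwise use the Zbc scalar path.
constexpr Crc32cPath SelectCrc32cPath(const RiscvCpuFeatures& f) {
  return (f.has_zvbc && f.has_zbb && f.vlen_bits == 128)
             ? Crc32cPath::kZvbcVector
             : Crc32cPath::kZbcScalar;
}
```

Keeping the decision separate from the feature probing makes the VLEN != 128 and missing-Zbb cases easy to unit test on any host, since no RISC-V hardware is needed to exercise the selection logic.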

1. crc32c test
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from CRC
[ RUN ] CRC.StandardResults
[ OK ] CRC.StandardResults (2 ms)
[ RUN ] CRC.Values
[ OK ] CRC.Values (0 ms)
[ RUN ] CRC.Extend
[ OK ] CRC.Extend (0 ms)
[ RUN ] CRC.Mask
[ OK ] CRC.Mask (0 ms)
[ RUN ] CRC.Crc32cCombineBasicTest
[ OK ] CRC.Crc32cCombineBasicTest (0 ms)
[ RUN ] CRC.Crc32cCombineOrderMattersTest
[ OK ] CRC.Crc32cCombineOrderMattersTest (0 ms)
[ RUN ] CRC.Crc32cCombineFullCoverTest
[ OK ] CRC.Crc32cCombineFullCoverTest (582 ms)
[ RUN ] CRC.Crc32cCombineBigSizeTest
[ OK ] CRC.Crc32cCombineBigSizeTest (1069 ms)
[----------] 8 tests from CRC (1654 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (1654 ms total)
[ PASSED ] 8 tests.

2. Performance Results
Performance was measured using the crc32c_bench microbenchmark:
./db_bench --benchmarks="crc32c"
On a VLEN=128 platform with Zvbc+Zbb enabled, the Zvbc vector path delivers roughly a 10% CRC32C throughput improvement over the Zbc scalar path.

fengpengboa and others added 5 commits June 26, 2026 15:26
…lems on RISC‑V:

* Broken native optimisation in build_detect_platform (typo + unsafe -march string)
* LLD linker detection false positives in both CMake and Makefile
  builds.
Fix multiple RISC‑V build issues: correct -march generation and improve LLD detection (CMake & Makefile)
…platforms that support the Zbc (Carry-less Multiplication) extension.
RISC-V: Optimize crc32c with Zbc extension
… with zvbc instruction set. The Zvbc path is used only when Zvbc+Zbb are available and VLEN is 128 bits (the Zvbc asm is VLEN=128-tuned), otherwise the Zbc scalar path is used.

2. VLEN is read at runtime via the vlenb CSR.
@meta-cla meta-cla Bot added the CLA Signed label Jun 29, 2026