Add vector-accelerated assembly code for RISC-V 64-bit architecture #14894
Open
fengpengboa wants to merge 5 commits into
…lems on RISC‑V:
* Broken native optimisation in build_detect_platform (typo + unsafe -march string)
* LLD linker detection false positives in both CMake and Makefile builds.
Fix multiple RISC‑V build issues: correct -march generation and improve LLD detection (CMake & Makefile)
…platforms that support the Zbc (Carry-less Multiplication) extension.
RISC-V: Optimize crc32c with Zbc extension
… with Zvbc instruction set. The Zvbc path is used only when Zvbc+Zbb are available and VLEN is 128 bits (the Zvbc asm is tuned for VLEN=128); otherwise the Zbc scalar path is used. 2. VLEN is read at runtime via the vlenb CSR.
This PR depends on previous PRs (#14536, #14530) that are still pending merge into the main branch. The changes here build directly on the modifications introduced in those earlier submissions, so please review the prerequisite PRs first for context. All new functionality and tests in this PR assume their changes are already applied.
Add a vector-accelerated CRC32C path for RISC-V 64-bit using the Zvbc (vector carry-less multiply) extension, layered on top of the existing Zbc scalar path.
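For readers unfamiliar with carry-less multiplication, the operation that Zbc (`clmul`/`clmulh`) and Zvbc (`vclmul`/`vclmulh`) provide in hardware can be modelled in portable C++. This is only an illustrative sketch, not code from this PR; the names `U128` and `ClmulSoft` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper type: a 128-bit product split into two 64-bit halves.
struct U128 {
  uint64_t lo;  // low 64 bits (what a `clmul`-style instruction returns)
  uint64_t hi;  // high 64 bits (what a `clmulh`-style instruction returns)
};

// Software model of a 64x64 -> 128-bit carry-less multiply.
// Partial products are combined with XOR instead of addition, which is
// polynomial multiplication over GF(2).
inline U128 ClmulSoft(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      // Shifting by 64 is undefined in C++, so skip the high part for i == 0.
      if (i != 0) r.hi ^= a >> (64 - i);
    }
  }
  return r;
}
```

For example, `ClmulSoft(3, 3)` yields `lo == 5`, because (x + 1)(x + 1) = x^2 + 1 over GF(2). A model like this can serve as a slow reference when checking a hardware folding routine.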
New vector path: vectorized CLMUL folding at 128 bytes per iteration, followed by a 128->64 fold and Barrett reduction. Buffers smaller than 128 bytes fall back to a byte-wise table walk.
Runtime dispatch: the Zvbc vector path is taken only when both Zvbc and Zbb are available and VLEN == 128 (the Zvbc assembly is tuned for VLEN=128, and VLEN is read at runtime via the vlenb CSR). Otherwise it falls back to the Zbc scalar path (64 B/iter).
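The dispatch rule and the small-buffer fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: every name here (`RiscvFeatures`, `Crc32cPath`, `ReadVlenb`, `SelectCrc32cPath`, `Crc32cBytewise`) is hypothetical, and the `kGeneric` case for hardware without Zbc is an assumption that the description does not cover. How the feature flags get populated is left open.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; none are taken from the PR.
enum class Crc32cPath { kZvbcVector, kZbcScalar, kGeneric };

struct RiscvFeatures {
  bool has_zbc = false;
  bool has_zbb = false;
  bool has_zvbc = false;
  unsigned vlenb = 0;  // vector register length in bytes (VLEN / 8)
};

// Reads the vlenb CSR when built for RISC-V with the vector extension;
// returns 0 elsewhere so the sketch still compiles on other targets.
inline unsigned ReadVlenb() {
#if defined(__riscv) && defined(__riscv_vector)
  unsigned long v;
  asm volatile("csrr %0, vlenb" : "=r"(v));
  return static_cast<unsigned>(v);
#else
  return 0;
#endif
}

// Vector path only with Zvbc + Zbb and VLEN == 128 bits (vlenb == 16).
inline Crc32cPath SelectCrc32cPath(const RiscvFeatures& f) {
  if (f.has_zvbc && f.has_zbb && f.vlenb * 8 == 128) {
    return Crc32cPath::kZvbcVector;
  }
  if (f.has_zbc) return Crc32cPath::kZbcScalar;
  return Crc32cPath::kGeneric;  // assumption: not described in the PR text
}

// Byte-wise table walk for CRC32C (reflected polynomial 0x82F63B78),
// the kind of fallback a vector path could use for buffers under 128 bytes.
inline uint32_t Crc32cBytewise(uint32_t crc, const uint8_t* data, size_t n) {
  static const std::array<uint32_t, 256> table = [] {
    std::array<uint32_t, 256> t{};
    for (uint32_t i = 0; i < 256; ++i) {
      uint32_t c = i;
      for (int k = 0; k < 8; ++k) {
        c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : (c >> 1);
      }
      t[i] = c;
    }
    return t;
  }();
  crc = ~crc;
  for (size_t i = 0; i < n; ++i) {
    crc = table[(crc ^ data[i]) & 0xFF] ^ (crc >> 8);
  }
  return ~crc;
}
```

With this convention, `Crc32cBytewise(0, "123456789", 9)` gives the standard CRC32C check value `0xE3069283`, which makes the function a convenient reference when validating an accelerated path. Keeping the selection logic separate from the CSR read also lets the dispatch rule be unit-tested on non-RISC-V hosts.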
1. crc32c test
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from CRC
[ RUN ] CRC.StandardResults
[ OK ] CRC.StandardResults (2 ms)
[ RUN ] CRC.Values
[ OK ] CRC.Values (0 ms)
[ RUN ] CRC.Extend
[ OK ] CRC.Extend (0 ms)
[ RUN ] CRC.Mask
[ OK ] CRC.Mask (0 ms)
[ RUN ] CRC.Crc32cCombineBasicTest
[ OK ] CRC.Crc32cCombineBasicTest (0 ms)
[ RUN ] CRC.Crc32cCombineOrderMattersTest
[ OK ] CRC.Crc32cCombineOrderMattersTest (0 ms)
[ RUN ] CRC.Crc32cCombineFullCoverTest
[ OK ] CRC.Crc32cCombineFullCoverTest (582 ms)
[ RUN ] CRC.Crc32cCombineBigSizeTest
[ OK ] CRC.Crc32cCombineBigSizeTest (1069 ms)
[----------] 8 tests from CRC (1654 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (1654 ms total)
[ PASSED ] 8 tests.
2. Performance Results
Performance was measured using the crc32c microbenchmark in db_bench:
./db_bench --benchmarks="crc32c"
On a VLEN=128 platform with Zvbc+Zbb enabled, the Zvbc vector path delivers roughly a 10% CRC32C throughput improvement over the Zbc scalar path.