Intel QuickAssist: multi-device utilization + software-fallback / Cavium fixes #10772

Draft
dgarske wants to merge 6 commits into wolfSSL:master from dgarske:qat_review

dgarske commented Jun 24, 2026

Changes

  • Interleave instances across devices. cpaCyGetInstances() returns instances grouped by device, so with fewer threads than instances the per-thread round-robin filled device 0 first and left later devices under-used. IntelQaInterleaveInstances() reorders the list by device so consecutive threads land on different devices. On by default; define QAT_NO_DEV_INTERLEAVE to opt out.
  • Software-fallback fix. The NUMA allocator returned NULL when the QAT service was not started, breaking RSA / TLS cert-verify (errors -142/-140/-173) whenever the device couldn't be opened. It now falls back to regular memory so crypto runs in software. The fallback is gated by IntelQaIsStarted(), so a live device still gets a clean error on real NUMA exhaustion.
  • Cavium/Nitrox req_count OOB write. wolfAsync_EventQueuePoll() did not reset req_count after the multi-request flush, so later requests were indexed past the end of multi_req.req[CAVIUM_MAX_POLL]. Affects only HAVE_CAVIUM builds. (CWE-787, Project Vanessa.)
  • RSA public free heap argument. The RSA public free path passed dev instead of dev->heap.
  • Docs (port/intel/README.md): sudo-free operation, serial make check, multi-device benchmark guidance, and a QAT health-diagnostics section.
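
The interleave described in the first bullet can be illustrated with a small standalone sketch. It is not the PR's IntelQaInterleaveInstances(): the Instance struct, MAX_INST, and interleave_by_device() are hypothetical names, and the device field stands in for whatever per-instance device identifier the QAT API reports. The sketch makes repeated passes over a device-grouped list and takes at most one unused instance per device in each pass, so a simple thread-to-instance mapping of `i % n` rotates across devices.

```c
#include <stddef.h>
#include <string.h>

#define MAX_INST 64 /* hypothetical upper bound for this sketch */

typedef struct {
    int device; /* hypothetical: device the instance belongs to */
    int id;     /* hypothetical: position in the original list */
} Instance;

/* Reorder inst[0..n) in place so consecutive entries rotate across devices.
 * Relative order within each device is preserved.
 * Returns 0 on success, -1 if n exceeds MAX_INST. */
int interleave_by_device(Instance* inst, size_t n)
{
    Instance out[MAX_INST];
    unsigned char used[MAX_INST] = {0};
    size_t filled = 0;
    size_t i, j;

    if (n > MAX_INST)
        return -1;
    if (n == 0)
        return 0;

    while (filled < n) {
        size_t pass_start = filled; /* entries taken during this pass */

        for (i = 0; i < n; i++) {
            int seen = 0;

            if (used[i])
                continue;
            /* skip devices that already contributed to this pass */
            for (j = pass_start; j < filled; j++) {
                if (out[j].device == inst[i].device) {
                    seen = 1;
                    break;
                }
            }
            if (!seen) {
                out[filled++] = inst[i];
                used[i] = 1;
            }
        }
    }

    memcpy(inst, out, n * sizeof(Instance));
    return 0;
}
```

With six instances grouped as devices 0,0,1,1,2,2, the reordered list is 0,1,2,0,1,2, so three threads mapped to the first three entries each get their own device.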

Performance (3x Intel C62x, RSA-2048 sign, ops/sec)

Interleaving spreads load across all 3 devices when the thread count is below the instance count (18) and is neutral at or above it. AES throughput is unchanged vs master.

| Threads | Before | After |
|---|---|---|
| 6 | 1.72M (device 0 only) | 2.01M (all 3 devices) |
| 16 | 9.39M | 12.66M (+35%) |
| 18 | 15.7M | 14.9M (noise) |

dgarske self-assigned this Jun 24, 2026
Copilot AI review requested due to automatic review settings June 24, 2026 22:28

Copilot AI left a comment

Pull request overview

This PR improves hardware-accelerated crypto offload behavior for Intel QuickAssist (QAT) and Cavium/Nitrox, focusing on better multi-device utilization, more robust software fallback when QAT isn’t available, and a Nitrox polling safety fix.

Changes:

  • Reorders QAT crypto instances to interleave across devices by default (opt-out via QAT_NO_DEV_INTERLEAVE) to improve utilization at lower thread counts.
  • Fixes software-fallback behavior in the QAT NUMA allocator when the QAT service isn’t started, allowing crypto to proceed in software.
  • Fixes a Cavium/Nitrox multi-request polling OOB condition by resetting req_count after buffer flush; also corrects an RSA public free heap parameter and expands Intel QAT documentation.
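
The req_count fix in the last bullet follows a common batching pattern, sketched here under assumptions. This is not the code from wolfAsync_EventQueuePoll(): MAX_POLL stands in for CAVIUM_MAX_POLL (whose real value is not shown in this conversation), and MultiReq, flush_batch(), and poll_requests() are hypothetical names. The point is the reset after each flush; without it, the next store would index past the end of the fixed-size array.

```c
#include <stddef.h>

#define MAX_POLL 4 /* stand-in for CAVIUM_MAX_POLL; real value not shown */

typedef struct {
    int    req[MAX_POLL]; /* fixed-size batch buffer */
    size_t flushed;       /* total requests flushed so far */
} MultiReq;

/* Hypothetical flush: a real driver would submit the batch to hardware. */
static void flush_batch(MultiReq* mr, size_t count)
{
    mr->flushed += count;
}

/* Queue 'total' request ids in batches of at most MAX_POLL.
 * Returns the total number of requests flushed. */
size_t poll_requests(MultiReq* mr, const int* ids, size_t total)
{
    size_t req_count = 0;
    size_t i;

    for (i = 0; i < total; i++) {
        mr->req[req_count++] = ids[i];

        if (req_count == MAX_POLL) {
            flush_batch(mr, req_count);
            /* Reset after the flush. Omitting this lets the next
             * iteration write to req[MAX_POLL], one past the end. */
            req_count = 0;
        }
    }

    /* flush any partial batch left over */
    if (req_count > 0)
        flush_batch(mr, req_count);

    return mr->flushed;
}
```

Feeding ten ids through a four-entry buffer flushes batches of 4, 4, and 2, and every store stays within req[0..MAX_POLL).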

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| wolfssl/wolfcrypt/port/intel/quickassist_mem.h | Adds an internal “is QAT started” query used by the QAT memory layer to decide when to fall back to regular memory. |
| wolfcrypt/src/port/intel/README.md | Updates Intel QAT usage docs (non-sudo operation, serialized testing guidance, multi-device benchmarking, diagnostics). |
| wolfcrypt/src/port/intel/quickassist.c | Adds IntelQaIsStarted() and instance interleaving logic; fixes RSA public free heap usage. |
| wolfcrypt/src/port/intel/quickassist_mem.c | Adds fallback to regular malloc when NUMA allocation fails and QAT service is not started. |
| wolfcrypt/src/async.c | Resets Cavium req_count after flushing multi-request poll buffer to avoid OOB writes. |
| Makefile.am | Serializes make execution when Intel QAT is enabled via .NOTPARALLEL. |

Comment on lines +326 to +333
/* Returns nonzero when the QAT crypto service is running. The memory layer
 * uses this to decide whether a failed NUMA allocation should fall back to
 * regular memory (service not started -> software mode) or remain NULL (real
 * NUMA exhaustion while the device is in use). */
int IntelQaIsStarted(void)
{
    return (g_cyServiceStarted == CPA_TRUE) ? 1 : 0;
}

Comment on lines +413 to +422
/* If the QAT memory subsystem is not available (async device not
 * opened, e.g. "Running without async") fall back to regular memory
 * so software crypto can proceed. A NULL while the subsystem IS up
 * means real NUMA exhaustion and is left NULL so the QAT operation
 * fails cleanly rather than receiving non-DMA memory. */
if (ptr == NULL && !IntelQaIsStarted()) {
    isNuma = 0;
    page_offset = QAE_NOT_NUMA_PAGE;
    ptr = malloc(size + sizeof(qaeMemHeader));
}
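
A minimal standalone sketch of the fallback pattern in the snippet above, under stated assumptions: sketch_service_started stands in for IntelQaIsStarted(), numa_alloc_stub() is a hypothetical NUMA allocator that always fails, and MemHeader is an illustrative header, not the real qaeMemHeader layout. Recording which allocator produced the block lets the free path pick the matching release call.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative header stored in front of every block (not qaeMemHeader). */
typedef struct {
    int    is_numa; /* 1 if the block came from the NUMA allocator */
    size_t size;    /* usable size requested by the caller */
} MemHeader;

/* Stand-in for IntelQaIsStarted(): nonzero when the service is running. */
int sketch_service_started = 0;

/* Hypothetical NUMA allocator that always fails, to exercise both paths. */
static void* numa_alloc_stub(size_t size)
{
    (void)size;
    return NULL;
}

/* Allocate 'size' usable bytes. Falls back to malloc only when the NUMA
 * allocation failed AND the service is not started; a failure while the
 * service is running is reported as NULL. */
void* sketch_alloc(size_t size)
{
    int is_numa = 1;
    MemHeader* hdr = (MemHeader*)numa_alloc_stub(size + sizeof(MemHeader));

    if (hdr == NULL && !sketch_service_started) {
        is_numa = 0;
        hdr = (MemHeader*)malloc(size + sizeof(MemHeader));
    }
    if (hdr == NULL)
        return NULL;

    hdr->is_numa = is_numa;
    hdr->size = size;
    return hdr + 1; /* caller sees the bytes after the header */
}

/* Release a block from sketch_alloc using the allocator named in its header. */
void sketch_free(void* ptr)
{
    MemHeader* hdr;

    if (ptr == NULL)
        return;
    hdr = (MemHeader*)ptr - 1;
    if (!hdr->is_numa)
        free(hdr);
    /* A NUMA block would be returned to the NUMA allocator here; the
     * stub never produces one, so this sketch has nothing to release. */
}
```

With the service flag set, a failed NUMA allocation stays NULL; with it cleared, the same request succeeds from the regular heap and is tagged as non-NUMA.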