diff --git a/docs/pipelines/HPC_CICD_ARCHITECTURE.md b/docs/pipelines/HPC_CICD_ARCHITECTURE.md
index d42fc6b..34f0bf3 100644
--- a/docs/pipelines/HPC_CICD_ARCHITECTURE.md
+++ b/docs/pipelines/HPC_CICD_ARCHITECTURE.md
@@ -90,7 +90,7 @@ Developer
 37. latency-analysis
 38. memory-bandwidth-tests
 
-**Primary tools:** pytest-benchmark, nvprof/nsight, Airspeed Velocity.
+**Primary tools:** pytest-benchmark, nvprof, Nsight Systems, Nsight Compute, Airspeed Velocity.
 
 ### Layer 8 — Security
 
@@ -154,6 +154,18 @@ Use artifact-backed baseline comparison:
 3. Compare with a fixed threshold (default: 5% slowdown).
 4. Fail PR when threshold is exceeded.
 
+The 5% slowdown value is intended as a conservative default that works well for many
+numerical and GPU/CPU kernel workloads where benchmark noise is typically low but
+non-zero. Projects SHOULD tune this threshold based on:
+
+- The inherent variance of each benchmark (e.g., noisy integration tests may need a higher tolerance).
+- The business and latency criticality of the kernel (e.g., low-latency paths may warrant a stricter threshold).
+- The stability of the underlying hardware and environment (e.g., shared/cloud GPUs vs. dedicated machines).
+
+In practice, configure the threshold as a parameter in your CI (per-suite or per-benchmark)
+rather than hard-coding 5% globally, and document the chosen values and rationale in
+the repository (e.g., in `CONTRIBUTING.md` or a benchmarking README).
+
 ## 6) Hardware Validation Strategy
 
 Recommended dedicated GPU runner pools:
@@ -187,7 +199,7 @@ Alternative cloud-backed runner pools:
 Recommended minimum gate:
 
 ```bash
-pytest --cov=kernels --cov=implementations --cov-fail-under=85
+pytest --cov=kernels --cov-fail-under=85
 ```
 
 Raise the threshold for critical packages over time as flaky suites are removed.