[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1972

2026-04-14T12:56:51Z

github-actions[bot]
bot Apr 14, 2026

📊 Current CI/CD Pipeline Status

The repository has a comprehensive CI/CD setup with 57 active workflows: 30 standard GitHub Actions workflows and 27 agentic (AWF-powered) workflows. A recent sample of 30 PR runs shows a 53% failure rate, driven largely by smoke tests and the Security Guard agentic workflow, both of which are non-blocking or reaction-gated.

Pipeline health summary (PR-triggered workflows):

| Workflow | Purpose | Trigger |
| --- | --- | --- |
| Build Verification | Lint + build + unit tests (Node 20/22) | All PRs |
| Lint | ESLint + markdownlint | All PRs |
| TypeScript Type Check | `tsc --noEmit` | All PRs |
| Test Coverage | Jest unit tests + regression guard | All PRs |
| Integration Tests | Domain/network/protocol/container tests | All PRs |
| Chroot Integration Tests | Language/package manager/procfs tests | All PRs |
| Examples Test | Shell example script execution | All PRs |
| Test Setup Action | GitHub Action install verification | All PRs |
| CodeQL | SAST (JS/TS + Actions workflows) | All PRs |
| Dependency Vulnerability Audit | `npm audit --audit-level=high` | All PRs |
| PR Title Check | Semantic commit format enforcement | All PRs |
| Security Guard | Agentic Claude security review | All PRs (agentic) |
| Smoke Tests (Claude/Copilot/Codex/Chroot/Services) | Full agent E2E runs | Reaction-gated PRs |
| Link Check | Markdown link validation | `.md` file PRs |
| Documentation Preview | Docs site build | All PRs |
| Performance Benchmarks | Startup/teardown latency | Daily only (not PRs) |

✅ Existing Quality Gates

The following checks are well established. Most run on every PR; the exceptions (daily benchmarks, path-filtered link checks) are noted in the table above:

ESLint with custom rules (eslint-rules/no-unsafe-execa.test.js)
Markdownlint via markdownlint-cli2
TypeScript strict type checking (tsconfig.check.json)
Multi-node build matrix (Node 20 and 22)
Unit test coverage with PR comparison comments and regression detection
Integration test suites split across 6 parallel jobs (domain, network, protocol/security, container ops, API proxy, chroot)
CodeQL SAST scanning of JS/TS code and Actions workflow files with the security-extended and security-and-quality query suites
Dependency vulnerability scanning with npm audit → SARIF upload to Security tab
Semantic PR title enforcement via amannn/action-semantic-pull-request
Example scripts tested end-to-end
GitHub Action self-test (latest version + specific version + invalid version rejection)
Link checker for documentation
Agentic security review (Claude-powered Security Guard)
Daily performance benchmarks with regression issue creation
AI-powered agentic workflows: security review, dependency monitoring, doc maintenance, token optimization, issue triage, CI doctor

🔍 Identified Gaps

High Priority

1. Critically Low Unit Test Coverage on Core Modules

Impact: Bugs in the most important code paths go undetected.

Coverage is very low, and the thresholds are set just below current levels:

cli.ts → 0% coverage (0/69 statements, 0/10 functions)
docker-manager.ts → 18% coverage (45/250 statements, 1/25 functions)
Overall: 38.39% statements, 31.78% branches

These two files are the core of the system, yet one has no automated test coverage and the other covers only 1 of its 25 functions. The coverage gate allows regressions as long as overall coverage stays above the already-low thresholds.

2. Performance Benchmarks Not Running on PRs

Impact: PRs that introduce startup/teardown latency regressions aren't caught until the next daily run, making it hard to identify the culprit commit.

performance-monitor.yml runs only on schedule (daily at 06:00 UTC) and workflow_dispatch. There is no PR integration, so a PR doubling container startup time would merge undetected.
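
One way to close this gap is to compare PR measurements against the latest daily baseline and flag anything beyond a tolerance. The sketch below is illustrative only: `LatencySample`, `findLatencyRegressions`, the sample numbers, and the 20% tolerance are assumptions, not taken from performance-monitor.yml.

```typescript
// Hypothetical helper: names, values, and the default tolerance are
// illustrative assumptions, not part of the existing benchmark workflow.
interface LatencySample {
  label: string;       // e.g. "container startup"
  baselineMs: number;  // latency from the most recent daily run
  candidateMs: number; // latency measured on the PR
}

// Returns the samples whose PR latency exceeds baseline * (1 + tolerance).
function findLatencyRegressions(
  samples: LatencySample[],
  tolerance = 0.2,
): LatencySample[] {
  return samples.filter(
    (s) => s.baselineMs > 0 && s.candidateMs > s.baselineMs * (1 + tolerance),
  );
}

// A PR that doubles startup time is flagged; a small teardown change is not.
const regressions = findLatencyRegressions([
  { label: "container startup", baselineMs: 4000, candidateMs: 8000 },
  { label: "container teardown", baselineMs: 1500, candidateMs: 1600 },
]);
console.log(regressions.map((s) => s.label));
```

A PR job could post the flagged labels as a comment rather than failing outright, since single-run latency on shared runners is noisy.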

3. No Container/Docker Image Vulnerability Scanning

Impact: Base images (ubuntu/squid:latest, ubuntu:22.04, node:*) may contain known CVEs.

npm audit covers Node.js dependencies in the host CLI and API proxy but doesn't scan Docker images. There's no Trivy, Grype, or Docker Scout integration checking the three container images for OS-level CVEs.
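
If a scanner such as Trivy were added, its JSON report (`trivy image --format json`) could be gated with a small filter. The sketch below assumes only a minimal subset of that report's fields; `blockingFindings` and the sample data (including the placeholder CVE IDs) are hypothetical.

```typescript
// Minimal subset of the fields in a Trivy JSON report. Only the fields used
// below are declared; the real report contains many more.
interface TrivyVulnerability {
  VulnerabilityID: string;
  PkgName: string;
  Severity: string; // e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"
}
interface TrivyResult {
  Target: string;
  Vulnerabilities?: TrivyVulnerability[] | null;
}
interface TrivyReport {
  Results?: TrivyResult[] | null;
}

// Hypothetical gate: lists findings at or above the blocking severities.
function blockingFindings(
  report: TrivyReport,
  blocking: string[] = ["HIGH", "CRITICAL"],
): string[] {
  const findings: string[] = [];
  for (const result of report.Results ?? []) {
    for (const vuln of result.Vulnerabilities ?? []) {
      if (blocking.includes(vuln.Severity)) {
        findings.push(
          `${result.Target}: ${vuln.VulnerabilityID} (${vuln.PkgName}, ${vuln.Severity})`,
        );
      }
    }
  }
  return findings;
}

// Made-up report for illustration; the CVE IDs are placeholders.
const sampleReport: TrivyReport = {
  Results: [
    {
      Target: "ubuntu:22.04",
      Vulnerabilities: [
        { VulnerabilityID: "CVE-0000-0001", PkgName: "openssl", Severity: "HIGH" },
        { VulnerabilityID: "CVE-0000-0002", PkgName: "bash", Severity: "LOW" },
      ],
    },
  ],
};
console.log(blockingFindings(sampleReport));
```

A CI step would fail when the returned list is non-empty and print it to the job log.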

4. No Dockerfile / Shell Script Static Analysis

Impact: Shell script bugs, insecure Dockerfile patterns, and Squid config errors go unreviewed by automated tooling.

containers/agent/entrypoint.sh, containers/agent/setup-iptables.sh, and the squid entrypoint are security-critical scripts with no automated linting (ShellCheck, Hadolint).

5. Smoke Tests Are Reaction-Gated (Not Required)

Impact: Full E2E agent runs (Claude, Copilot, Codex) are optional and only run when maintainers add specific emoji reactions. A breaking change could merge without ever running a real agent.

Smoke workflows use reaction: triggers (heart/eyes/hooray/rocket) rather than always running on PRs. This is understandable for cost, but creates a risk for changes to containers or agent entrypoint.
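
A middle ground between always-on and reaction-gated runs is to decide from the changed paths. The following is a minimal sketch of that decision, assuming a `containers/` prefix as the trigger; `shouldRunSmokeTests` and `SMOKE_TRIGGER_PREFIXES` are hypothetical names.

```typescript
// Path prefixes that should force a smoke run. "containers/" reflects the
// risk described above; the list itself is an assumption to be tuned.
const SMOKE_TRIGGER_PREFIXES: string[] = ["containers/"];

// Returns true when at least one changed file sits under a trigger prefix.
function shouldRunSmokeTests(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    SMOKE_TRIGGER_PREFIXES.some((prefix) => file.startsWith(prefix)),
  );
}

console.log(shouldRunSmokeTests(["containers/agent/entrypoint.sh"])); // true
console.log(shouldRunSmokeTests(["docs/usage.md"]));                  // false
```

In a workflow, the same effect is usually achieved declaratively with a `paths:` filter on the `pull_request` trigger; the function form is useful when the decision has to be made inside a script.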

Medium Priority

6. No Mutation Testing

Impact: Tests may pass without actually validating behavior (false green tests).

Jest unit tests exist but there's no mutation testing (e.g., Stryker) to verify tests would catch real bugs. Given the low overall coverage and the security-sensitive nature of the codebase, mutation testing would dramatically improve confidence in the test suite.
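
To make the "false green" risk concrete, the toy example below applies one hand-written mutant to a hypothetical function and shows that a weak suite still passes against it. Nothing here comes from the repository; a tool such as Stryker generates and runs mutants like this automatically.

```typescript
// Hypothetical function under test (not from the repository).
type PortCheck = (port: number) => boolean;
const isValidPort: PortCheck = (port) =>
  Number.isInteger(port) && port >= 1 && port <= 65535;

// A boundary mutant of the kind a mutation tool produces: ">=" became ">".
const boundaryMutant: PortCheck = (port) =>
  Number.isInteger(port) && port > 1 && port <= 65535;

// A suite is modelled as a function that returns true when all checks pass.
type Suite = (impl: PortCheck) => boolean;

// Weak suite: never exercises the lower boundary.
const weakSuite: Suite = (impl) => impl(80) === true && impl(70000) === false;

// Strong suite: adds the boundary cases.
const strongSuite: Suite = (impl) =>
  weakSuite(impl) && impl(1) === true && impl(0) === false;

// A mutant "survives" when the suite still passes against it.
function mutantSurvives(suite: Suite, mutant: PortCheck): boolean {
  return suite(mutant);
}

console.log(mutantSurvives(weakSuite, boundaryMutant));   // true: a false green
console.log(mutantSurvives(strongSuite, boundaryMutant)); // false: mutant killed
```

Both suites pass against the original function, so line coverage alone cannot tell them apart; only the surviving mutant reveals the weaker one.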

7. Coverage Thresholds Too Permissive

Impact: PRs can freely reduce coverage to the thresholds (38%/30%/35%/38%) without failing.

Current thresholds sit just below current coverage, so a PR can delete all tests for a file and still pass as long as overall coverage stays above the floor. The thresholds should be raised incrementally, and file-level thresholds should be added for critical modules.
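
Jest supports both a `global` entry and per-path entries under `coverageThreshold`, so file-level floors can sit alongside the overall ones. The sketch below is an assumption-heavy illustration: the `./src/` paths, the mapping of 38/30/35/38 to metrics, and the per-file numbers are not confirmed by the assessment.

```typescript
// Shape of a Jest coverage configuration. Paths and numbers are assumptions.
const coverageConfig = {
  collectCoverage: true,
  coverageThreshold: {
    // Assumed mapping of the current 38/30/35/38 global floors.
    global: { statements: 38, branches: 30, functions: 35, lines: 38 },
    // Per-file floor pinned at the reported current values (18% statements,
    // 4% functions) so this module cannot regress behind the global average.
    "./src/docker-manager.ts": { statements: 18, functions: 4 },
    // Hypothetical starting target; cli.ts is at 0% today, so this entry
    // would fail until the first tests for it land.
    "./src/cli.ts": { statements: 10, functions: 10 },
  },
};

console.log(Object.keys(coverageConfig.coverageThreshold));
```

In a real `jest.config.ts` this object would be the default export. Raising the numbers each quarter then becomes a small, reviewable diff.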

8. No SBOM (Software Bill of Materials) Generation

Impact: No machine-readable inventory of dependencies for supply chain security auditing.

There's no SBOM generation step in the release pipeline. Given that this is a security-focused tool, generating an SBOM (with npm sbom or syft) would improve supply chain transparency.

9. `test-integration.yml` File Naming Confusion

Impact: Developer confusion about what each workflow file does.

The file .github/workflows/test-integration.yml contains the TypeScript Type Check workflow, not integration tests. The actual integration test suite is in test-integration-suite.yml. This causes confusion when debugging CI failures.

10. No Secrets / Credential Scanning in PRs

Impact: Accidental secret commits could go undetected without a dedicated scanner.

While CodeQL covers some patterns, there's no dedicated gitleaks, trufflehog, or secret-scanning step in the PR pipeline. GitHub's native secret scanning may cover this at the org level, but it's not visible in the workflow files.

11. Integration Tests Not Enforced as Required Status Checks

Impact: Integration test failures do not block merging unless branch protection is configured to require those checks.

From the workflow files alone, it's not clear which checks are configured as required status checks in branch protection rules. If integration tests aren't required, PRs can merge even when they fail.

Low Priority

12. No Build Artifact Size Monitoring

Impact: Gradual dist bundle bloat goes unnoticed.

No workflow tracks dist/ size changes across PRs. A PR adding a large dependency could significantly increase the CLI install footprint without triggering any alerts.
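
A size check can be as simple as comparing per-file byte counts between the base branch and the PR. The sketch below assumes the sizes have already been collected into maps; `compareBundleSizes`, the file names, and the 10% limit are hypothetical.

```typescript
// File path -> size in bytes, collected separately for base and PR builds.
type SizeMap = Record<string, number>;

interface SizeReport {
  baseTotal: number;
  headTotal: number;
  deltaBytes: number;
  grewBeyondLimit: boolean;
}

// Sums both maps and flags growth above limitPct percent of the base total.
function compareBundleSizes(base: SizeMap, head: SizeMap, limitPct = 10): SizeReport {
  const total = (sizes: SizeMap): number =>
    Object.values(sizes).reduce((sum, bytes) => sum + bytes, 0);
  const baseTotal = total(base);
  const headTotal = total(head);
  const deltaBytes = headTotal - baseTotal;
  const grewBeyondLimit = baseTotal > 0 && deltaBytes / baseTotal > limitPct / 100;
  return { baseTotal, headTotal, deltaBytes, grewBeyondLimit };
}

// Illustrative numbers: a new 50 kB file on top of a 100 kB bundle.
const sizeReport = compareBundleSizes(
  { "dist/cli.js": 100000 },
  { "dist/cli.js": 100000, "dist/vendor.js": 50000 },
);
console.log(sizeReport);
```

The resulting report could be written to the job's step summary so reviewers see the delta without opening logs.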

13. No Snapshot Testing for Generated Configs

Impact: Unintended changes to squid.conf or docker-compose.yml output format may slip through.

squid-config.ts has 100% unit test coverage, but the tests likely assert behavior rather than exact output. Snapshot tests for the full generated squid.conf and Docker Compose YAML would catch unintended formatting/structure changes.
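
The idea behind a snapshot test is to pin the exact generated text for a fixed input and fail on any difference. The sketch below shows that comparison by hand; `renderSquidConf` is a hypothetical stand-in, since the real generator's signature and output are not shown in this assessment. In Jest, `expect(output).toMatchSnapshot()` stores and compares the expected text automatically.

```typescript
// Hypothetical stand-in for the real generator in squid-config.ts.
function renderSquidConf(allowedDomains: string[]): string {
  const lines = [
    ...allowedDomains.map((domain) => `acl allowed_domains dstdomain ${domain}`),
    "http_access allow allowed_domains",
    "http_access deny all",
  ];
  return lines.join("\n") + "\n";
}

// The stored "snapshot": the exact text expected for a fixed input.
const GOLDEN_SQUID_CONF = [
  "acl allowed_domains dstdomain .github.com",
  "acl allowed_domains dstdomain .npmjs.org",
  "http_access allow allowed_domains",
  "http_access deny all",
  "",
].join("\n");

// Returns the 1-based numbers of lines that differ; empty when texts match.
function diffLines(actual: string, expected: string): number[] {
  const actualLines = actual.split("\n");
  const expectedLines = expected.split("\n");
  const changed: number[] = [];
  const length = Math.max(actualLines.length, expectedLines.length);
  for (let i = 0; i < length; i++) {
    if (actualLines[i] !== expectedLines[i]) changed.push(i + 1);
  }
  return changed;
}

const generated = renderSquidConf([".github.com", ".npmjs.org"]);
console.log(diffLines(generated, GOLDEN_SQUID_CONF)); // empty array: output matches
```

A behavioral test that only checks "the domain is allowed" would still pass if rule order changed; the exact-text comparison would not.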

14. Performance Benchmarks Use Unpinned Actions

Impact: Supply chain risk.

performance-monitor.yml uses actions/checkout@v4, actions/setup-node@v4, actions/github-script@v7, actions/upload-artifact@v4 — all floating tags without SHA pins. All other workflows properly pin actions to SHAs.
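
A check like this can be automated by scanning workflow files for `uses:` references whose ref is not a full 40-character commit SHA. The following is a rough sketch using a line-based regex rather than a YAML parser; `findUnpinnedActions` and the `example/pinned-action` entry are hypothetical.

```typescript
// Matches "uses: owner/repo[/path]@ref", with or without a leading "- ".
const USES_PATTERN = /^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)/;
const FULL_SHA = /^[0-9a-f]{40}$/;

// Returns "action@ref" for every reference not pinned to a full commit SHA.
// Local actions ("uses: ./path") have no "@ref" and are skipped.
function findUnpinnedActions(workflowYaml: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowYaml.split("\n")) {
    const match = USES_PATTERN.exec(line);
    if (!match) continue;
    const action = match[1] ?? "";
    const ref = match[2] ?? "";
    if (!FULL_SHA.test(ref)) unpinned.push(`${action}@${ref}`);
  }
  return unpinned;
}

// Illustrative workflow fragment; the SHA and the pinned action are made up.
const sampleWorkflow = [
  "steps:",
  "  - uses: actions/checkout@v4",
  "  - name: Setup Node",
  "    uses: actions/setup-node@v4",
  "  - uses: example/pinned-action@0123456789abcdef0123456789abcdef01234567 # v1.0.0",
].join("\n");

console.log(findUnpinnedActions(sampleWorkflow));
```

Run over every file in `.github/workflows/`, this would keep a newly added workflow from reintroducing floating tags.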

15. No Automated Changelog Validation

Impact: PRs can merge without CHANGELOG entries for user-facing changes.

There's semantic PR title enforcement, but no check that CHANGELOG.md or release notes are updated for feat and fix type PRs.
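
Because PR titles already follow the semantic format, the missing check reduces to a small predicate over the title and the changed files. The sketch below is one possible shape; `isChangelogMissing` is a hypothetical name, and the assumption that the changelog lives at `CHANGELOG.md` in the repository root comes from the paragraph above.

```typescript
// Conventional-commit style title: "type(scope)!: subject".
const TITLE_PATTERN = /^(\w+)(?:\([^)]*\))?!?:\s/;
const CHANGELOG_TYPES: string[] = ["feat", "fix"];

// Returns true when the PR type calls for a changelog entry and
// CHANGELOG.md is not among the changed files.
function isChangelogMissing(prTitle: string, changedFiles: string[]): boolean {
  const match = TITLE_PATTERN.exec(prTitle);
  const commitType = match?.[1] ?? "";
  if (!CHANGELOG_TYPES.includes(commitType)) return false;
  return !changedFiles.includes("CHANGELOG.md");
}

console.log(isChangelogMissing("feat(cli): add flag", ["src/cli.ts"]));                 // true
console.log(isChangelogMissing("fix: handle timeout", ["src/a.ts", "CHANGELOG.md"]));   // false
console.log(isChangelogMissing("docs: fix typo", ["README.md"]));                       // false
```

The same logic could run inside an `actions/github-script` step, with the title and file list taken from the pull request event.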

📋 Actionable Recommendations

| # | Gap | Recommended Solution | Complexity | Impact |
| --- | --- | --- | --- | --- |
| 1 | Low `cli.ts`/`docker-manager.ts` coverage | Add unit tests with mocked Docker/execa; raise per-file thresholds | High | Critical |
| 2 | Performance not on PRs | Add lightweight benchmark job to `build.yml` measuring startup time with comment | Medium | High |
| 3 | No container CVE scanning | Add Trivy scan step in `build.yml` after `docker build` steps | Low | High |
| 4 | No Dockerfile/shell linting | Add `hadolint` for Dockerfiles and `shellcheck` for `.sh` files in `lint.yml` | Low | High |
| 5 | Smoke tests are optional | Add path-filtered smoke test for `containers/**` changes (not reaction-gated) | Medium | High |
| 6 | Mutation testing absent | Add Stryker.js to `package.json` and run weekly/on coverage regressions | High | Medium |
| 7 | Permissive coverage thresholds | Raise thresholds 2% per quarter; add per-file floors for `cli.ts`/`docker-manager.ts` | Low | Medium |
| 8 | No SBOM | Add `npx @cyclonedx/cyclonedx-npm` or `npm sbom` to release workflow | Low | Medium |
| 9 | Workflow file naming | Rename `test-integration.yml` to `type-check.yml` | Low | Low |
| 10 | No dedicated secrets scanning | Add the `trufflesecurity/trufflehog` action on PRs | Low | Medium |
| 11 | Integration tests not gated | Configure required status checks in branch protection for integration test jobs | Low | High |
| 12 | No artifact size tracking | Add `du -sh dist/` comparison step to `build.yml` with step summary | Low | Low |
| 13 | No snapshot tests | Add Jest snapshot tests for `generateSquidConfig()` and `generateDockerCompose()` | Medium | Medium |
| 14 | Unpinned actions in perf workflow | Pin all actions in `performance-monitor.yml` to commit SHAs | Low | Low |
| 15 | No changelog enforcement | Add a check for CHANGELOG update on `feat`/`fix` PRs using `actions/github-script` | Medium | Low |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total active workflows | 57 |
| Standard GitHub Actions workflows | 30 |
| Agentic (AWF-powered) workflows | 27 |
| Workflows triggering on every PR | ~15 |
| Reaction-gated smoke test workflows | 5 |
| Recent PR run overall failure rate (sample n=30) | ~53% |
| Unit test statement coverage | 38.39% |
| Unit test branch coverage | 31.78% |
| `cli.ts` coverage | 0% |
| `docker-manager.ts` function coverage | 4% |
| Integration test suites | 6 parallel jobs |
| Max integration test timeout | 45 minutes |
| Performance benchmarks frequency | Daily (not on PRs) |
| SAST coverage | JS/TS + Actions workflows (CodeQL) |
| Container image vulnerability scanning | ❌ None |
| Shell script static analysis | ❌ None |

Top 3 recommendations by ROI:

🔴 Add hadolint + shellcheck to lint.yml — 1 hour of work, catches security bugs in shell scripts
🔴 Add Trivy container scanning after docker build steps — 30 minutes, catches OS CVEs
🟡 Raise coverage thresholds incrementally + add file-level floors for cli.ts / docker-manager.ts — drives the most impactful long-term quality improvement

Generated by CI/CD Pipelines and Integration Tests Gap Assessment · ● 732.9K · ◷

expires on Apr 21, 2026, 12:56 PM UTC

2026-04-14T12:57:51Z

github-actions[bot]
bot Apr 14, 2026
Author

🔮 The ancient spirits stir, and the smoke-test oracle has walked this thread.
The runes report: network, build, and file rites were observed.
May the firewall wards remain unbroken until the next moon.

🔮 The oracle has spoken through Smoke Codex
