[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1902

2026-04-11T12:53:34Z

github-actions[bot]
bot Apr 11, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, comprehensive CI/CD pipeline, with roughly 20 workflows triggered on pull requests. Overall health is strong: conventional CI checks are combined with agentic workflows for security review and smoke testing.

Pipeline Overview:

Total workflow files: 45 (27 agentic .md + 18 conventional .yml)
Workflows triggered on PRs: ~20 (see below)
Recent PR workflow outcomes: Largely `action_required`, the conclusion GitHub reports when a run is waiting for maintainer approval before it executes (for example, runs from fork PRs)

✅ Existing Quality Gates

Code Quality

| Check | Workflow | Trigger |
| --- | --- | --- |
| ESLint (TypeScript) | `lint.yml` | Every PR |
| Markdown lint | `lint.yml` | Every PR |
| TypeScript type check (`tsc --noEmit`) | `test-integration.yml` | Every PR |
| Conventional commit PR title | `pr-title.yml` | Every PR |

Build Verification

| Check | Workflow | Trigger |
| --- | --- | --- |
| Build on Node 20 + 22 matrix | `build.yml` | Every PR |
| api-proxy container unit tests | `build.yml` | Every PR |
| cli-proxy container unit tests | `build.yml` | Every PR |
| Action installation testing | `test-action.yml` | Every PR |
| Examples script testing | `test-examples.yml` | Every PR |

Testing

| Check | Workflow | Trigger |
| --- | --- | --- |
| Unit test coverage with PR delta comment | `test-coverage.yml` | Every PR |
| Integration tests: domain, network, protocol, container ops, API proxy (5 parallel jobs) | `test-integration-suite.yml` | Every PR |
| Chroot integration tests: languages, package managers, procfs, edge cases (4 parallel jobs) | `test-chroot.yml` | Every PR |
| Multi-language build tests (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) | `build-test.md` (agentic) | Every PR |
| Smoke tests: Claude, Codex, Copilot, Services, Chroot | `smoke-*.md` (agentic) | Every PR |

Security

| Check | Workflow | Trigger |
| --- | --- | --- |
| CodeQL (JS/TS + Actions) with `security-extended,security-and-quality` | `codeql.yml` | Every PR |
| npm audit with SARIF upload (main + docs-site) | `dependency-audit.yml` | Every PR |
| AI-powered security code review (Claude) | `security-guard.md` (agentic) | Every PR |

Documentation

| Check | Workflow | Trigger |
| --- | --- | --- |
| Documentation link check | `link-check.yml` | PRs touching `*.md` only |
| Documentation site preview build | `docs-preview.yml` | PRs touching `docs/**` only |

Ongoing Monitoring (not PR-gated)

Performance benchmarking — daily, tracks startup time and memory usage
Security review & threat modeling — daily (agentic)
Dependency vulnerability monitoring — daily (agentic)
Token usage analysis (Claude + Copilot) — daily (agentic)
CLI flag consistency checker — weekly (agentic)
Documentation maintainer — daily (agentic)

🔍 Identified Gaps

🔴 High Priority

1. No Container Image Security Scanning (CVE Scanning)

Problem: The Docker images (agent, squid, api-proxy, cli-proxy) are built and published to GHCR, but never scanned for OS-level CVEs or known vulnerabilities in installed packages. CodeQL covers TypeScript source code but not the container base layers (Ubuntu 22.04, ubuntu/squid:latest), installed apt packages, or bundled binaries.

Why it matters: This is a security tool. A compromised or vulnerable base image could undermine the entire firewall's security posture. Ubuntu base images accumulate CVEs between major updates.

Recommended solution: Add Trivy or Grype scanning to build.yml:

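```yaml
- name: Scan agent container image
  uses: aquasecurity/trivy-action@...
  with:
    # On PRs, point image-ref at the image built earlier in this job rather
    # than the published :latest tag, so the scan reflects the PR's changes.
    image-ref: ghcr.io/github/gh-aw-firewall/agent:latest
    severity: HIGH,CRITICAL
    ignore-unfixed: true  # don't fail on CVEs that have no fixed package yet
    exit-code: 1
```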

Apply to all four container images. Also add to release.yml before publishing.

Complexity: Low | Impact: High

2. Performance Regression Not Gated on PRs

Problem: performance-monitor.yml runs only on a daily schedule. Performance regressions (startup latency, memory usage) are detected after merging to main, not before.

Why it matters: A PR that doubles container startup time (a key user-facing metric tracked in benchmarks/) would be merged silently. The repository already has benchmarking infrastructure (scripts/ci/benchmark-performance.ts) but it's never invoked on PRs.

Recommended solution: Add a lightweight performance smoke gate to PRs that compares p95 startup time against the stored baseline:

```yaml
on:
  pull_request:
    paths: ['src/**', 'containers/**']
```

Use a reduced iteration count (e.g., 5 instead of 30) for faster PR feedback.
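A minimal sketch of the comparison such a gate could perform, assuming the benchmark yields per-iteration startup times in milliseconds and that a baseline p95 is stored where the job can read it. The function names and the 10% tolerance are hypothetical and are not taken from `scripts/ci/benchmark-performance.ts`:

```typescript
// Sketch of a PR performance gate: compare the p95 of a short startup
// benchmark against a stored baseline. Names and the 10% default tolerance
// are hypothetical, not part of benchmark-performance.ts.

/** Nearest-rank percentile: the ceil(p * n / 100)-th smallest sample. */
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile: no samples");
  if (p <= 0 || p > 100) throw new Error("percentile: p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

interface GateResult {
  p95Ms: number;
  limitMs: number;
  passed: boolean;
}

function checkStartupRegression(
  samplesMs: readonly number[],
  baselineP95Ms: number,
  tolerance = 0.1, // allow up to 10% slowdown before failing
): GateResult {
  const p95Ms = percentile(samplesMs, 95);
  const limitMs = baselineP95Ms * (1 + tolerance);
  return { p95Ms, limitMs, passed: p95Ms <= limitMs };
}

// Example: 5 PR iterations against a 1250 ms baseline p95.
const result = checkStartupRegression([1200, 1250, 1190, 1300, 1220], 1250);
console.log(result.passed ? "startup p95 within budget" : "startup regression");
```

With only 5 iterations, the nearest-rank p95 is simply the slowest run, so the tolerance has to absorb runner noise; a CI step could exit non-zero whenever `passed` is false.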

Complexity: Medium | Impact: High

3. No ShellCheck Linting for Security-Critical Shell Scripts

Problem: The most security-sensitive shell scripts in the repository (containers/agent/setup-iptables.sh, containers/agent/entrypoint.sh, containers/squid/entrypoint.sh, scripts/ci/cleanup.sh) have no static analysis. containers/cli-proxy/tcp-tunnel.js is also security-relevant, but it is JavaScript, so ShellCheck cannot analyze it; it should be confirmed that ESLint covers it instead. Shell script bugs in setup-iptables.sh directly affect the firewall's security guarantees.

Why it matters: ShellCheck catches unquoted variables (injection risk), references to unassigned variables, incorrect use of `[ ]` vs `[[ ]]`, and many other classes of shell scripting bugs that could fail silently or create security holes.

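Recommended solution: Add a ShellCheck step to lint.yml. ShellCheck ships on the GitHub-hosted Ubuntu runner images, so a plain `run` step can cover both directories (the `additional_files` input of ludeeus/action-shellcheck matches file names inside `scandir`, not path globs, so `'scripts/ci/*.sh'` would not be picked up that way):

```yaml
- name: ShellCheck
  # Lint every .sh file under containers/ and scripts/ci/.
  run: |
    find containers scripts/ci -name '*.sh' -print0 \
      | xargs -0 -r shellcheck --severity=warning
```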

Complexity: Low | Impact: High

🟡 Medium Priority

4. Critically Low Unit Test Coverage for Core Modules

Problem: The two largest and most critical source files have near-zero unit test coverage:

cli.ts: 0% (0/69 statements) — the main entry point
docker-manager.ts: 18% (45/250 statements, 4% function coverage)

The current global thresholds (38% statements, 30% branches) sit just below current levels (38.39% and 31.78%), and there are no per-file minimums, so a small new module can be added with 0% coverage without failing CI.

Why it matters: docker-manager.ts orchestrates the entire container lifecycle and generates the Docker Compose and config files, while cli.ts handles argument parsing. Argument parsing, cleanup, error recovery, and log streaming paths are therefore largely untested at the unit level.

Recommended solution:

Add per-file coverage thresholds in jest.config.js for cli.ts and docker-manager.ts
Set a roadmap target: 50% → 70% → 80% for these files
The existing test-coverage-improver workflow (weekly agentic) already creates PRs to improve coverage — ensure it's given these files as priority targets
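A minimal sketch of the per-file thresholds, assuming both files live under `src/` (the exact paths are an assumption). Jest's `coverageThreshold` option accepts file paths and globs alongside `global`; the object below uses plain object syntax, so it could be merged into the existing `jest.config.js` as-is:

```typescript
// Sketch of a Jest coverageThreshold block with per-file minimums.
// Assumption: cli.ts and docker-manager.ts live under ./src/.
const firstStep = { statements: 50, branches: 50, functions: 50, lines: 50 };

const coverageThreshold = {
  // Existing global floor (values as reported in this assessment).
  global: { statements: 38, branches: 30 },
  // First milestone of the 50 -> 70 -> 80 roadmap for the core modules.
  './src/cli.ts': firstStep,
  './src/docker-manager.ts': firstStep,
};
```

Jest checks path-specific thresholds independently and subtracts the matching files from the `global` calculation. Because both files currently sit well below 50%, these entries would fail CI until tests land, so they are best introduced in the same PR that raises coverage, or set to current levels first and ratcheted upward.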

Complexity: High | Impact: High

5. No Dockerfile Linting (Hadolint)

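Problem: The four Dockerfiles in containers/ (squid/, agent/, api-proxy/, cli-proxy/) are not linted. Hadolint flags issues such as unpinned base image tags (rule DL3007 flags `:latest`, as in `ubuntu/squid:latest`), unpinned apt package versions, leftover apt caches, and a final `USER` of root, and it runs ShellCheck over `RUN` instructions.

Recommended solution: Add to lint.yml:

```yaml
- name: Lint Dockerfiles
  uses: hadolint/hadolint-action@...
  with:
    # With recursive: true, the action lints every file with this name in the
    # repository, including the four under containers/*/.
    dockerfile: Dockerfile
    recursive: true
```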

Complexity: Low | Impact: Medium

6. Link Check Not Triggered by Code Changes

Problem: link-check.yml only runs when .md files change. A code refactoring that renames a file, removes a section, or changes a URL structure won't trigger link checking, allowing broken documentation links to silently accumulate.

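Recommended solution: Add a weekly scheduled full run (one appears to be partially present already), and consider removing the path restriction so the check runs on all PRs with a reasonable timeout:

```yaml
pull_request:
  branches: [main]  # no `paths` filter: every PR to main runs the link check
schedule:
  - cron: '0 6 * * 1'  # e.g. a weekly full run, Mondays 06:00 UTC
```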

Complexity: Low | Impact: Medium

7. `performance-monitor.yml` Uses Unpinned Action Refs

Problem: While all other workflows pin actions to SHA hashes, performance-monitor.yml uses floating tags (actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4, actions/github-script@v7). This creates an inconsistent security posture.

Recommended solution: Pin all actions to SHA commits, matching the pattern used everywhere else in the repository.
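To keep the pinning consistent once it is fixed, a small check could scan workflow files for `uses:` references that are not pinned to a full 40-character commit SHA. This is a hypothetical, line-based sketch (the function name is illustrative); it skips local `./` actions and `docker://` references and does not fully parse YAML:

```typescript
// Sketch: flag `uses:` refs in a workflow file that are not pinned to a
// full 40-character commit SHA. Line-based; does not fully parse YAML.

interface UnpinnedRef {
  line: number; // 1-based line number in the workflow file
  ref: string;  // e.g. "actions/checkout@v4"
}

const USES_PATTERN = /^\s*-?\s*uses:\s*["']?([^\s"'#]+)/;
const FULL_SHA = /^[0-9a-f]{40}$/;

function findUnpinnedActions(workflowYaml: string): UnpinnedRef[] {
  const results: UnpinnedRef[] = [];
  workflowYaml.split("\n").forEach((text, i) => {
    const match = USES_PATTERN.exec(text);
    if (!match) return;
    const ref = match[1];
    // Local actions and docker:// images are not pinned with @<sha>.
    if (ref.startsWith("./") || ref.startsWith("docker://")) return;
    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    if (!FULL_SHA.test(version)) results.push({ line: i + 1, ref });
  });
  return results;
}

// Example: only the floating @v4 tag on line 2 is reported.
const workflow = [
  "steps:",
  "  - uses: actions/checkout@v4",
  "  - uses: ./.github/actions/setup",
].join("\n");
console.log(findUnpinnedActions(workflow));
```

Run over each file in `.github/workflows/`, a non-empty result could fail the lint job.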

Complexity: Low | Impact: Medium

8. No SBOM Generation in Release Pipeline

Problem: release.yml publishes container images to GHCR but generates no Software Bill of Materials (SBOM) or provenance attestation. For a security-focused tool, this is a notable gap — users cannot verify what's in the published images.

Recommended solution: Generate provenance attestations with actions/attest-build-provenance, and an SBOM with Syft (for example via anchore/sbom-action, which runs Syft) or Trivy, in release.yml:

```yaml
- name: Generate SBOM
  uses: anchore/sbom-action@...
  with:
    image: ghcr.io/github/gh-aw-firewall/agent:${{ env.VERSION }}
```

Complexity: Medium | Impact: Medium

🟢 Low Priority

9. No Mutation Testing

Problem: Test quality (not just quantity) is unverified. Statement coverage (38%) only shows which code the tests execute, not whether they would catch a regression; tests with weak or missing assertions can pass regardless of code changes.

Recommended solution: Add Stryker Mutator for TypeScript:

```bash
npx stryker run
```

Run weekly or on-demand rather than every PR (it's expensive).
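To illustrate what mutation testing measures, here is a hand-written sketch (not Stryker output). A mutation tool makes small changes to the code, such as turning `<=` into `<`; if every test still passes on the changed code, the mutant has "survived", which reveals a gap in the tests. `isWithinLimit` and both test suites are hypothetical:

```typescript
// Hand-written illustration of a surviving vs. killed mutant.
// isWithinLimit and the suites are hypothetical examples.

type Check = (fn: (used: number, limit: number) => boolean) => boolean;

// Original implementation.
const isWithinLimit = (used: number, limit: number): boolean => used <= limit;

// A boundary mutant of the kind mutation tools generate: <= becomes <.
const mutant = (used: number, limit: number): boolean => used < limit;

// Weak suite: never exercises the boundary, so it cannot tell them apart.
const weakSuite: Check = (fn) => fn(1, 10) === true && fn(20, 10) === false;

// Stronger suite: adds the boundary case used === limit.
const strongSuite: Check = (fn) => weakSuite(fn) && fn(10, 10) === true;

// A mutant survives if the suite passes on both the original and the mutant.
function mutantSurvives(suite: Check): boolean {
  return suite(isWithinLimit) && suite(mutant);
}

console.log(mutantSurvives(weakSuite));   // true: boundary never tested
console.log(mutantSurvives(strongSuite)); // false: boundary test kills it
```

Stryker automates this loop across the codebase and reports the share of killed mutants as a mutation score, which is why it is too slow to run on every PR.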

Complexity: High | Impact: Medium

10. No Spell Checking for Documentation

Problem: No spell checker (cspell, codespell) runs on documentation files, README, or inline code comments.

Recommended solution: Add cspell to lint.yml or link-check.yml with a project-specific wordlist for technical terms.

Complexity: Low | Impact: Low

11. Node.js Matrix Doesn't Include Node 18

Problem: build.yml tests on Node 20 and 22. Node 18 reached end-of-life in April 2025, so this only matters if the project still intends to support users who have not upgraded; in that case, a Node 18 matrix entry would catch compatibility issues early. While not critical, broader matrix coverage is cheap.

Complexity: Low | Impact: Low

📋 Actionable Recommendations Summary

| Gap | Priority | Complexity | Impact | Action |
| --- | --- | --- | --- | --- |
| Container image CVE scanning (Trivy/Grype) | 🔴 High | Low | High | Add to `build.yml` and `release.yml` |
| Performance regression gate on PRs | 🔴 High | Medium | High | Add lightweight PR benchmark job |
| ShellCheck for shell scripts | 🔴 High | Low | High | Add to `lint.yml` |
| Unit test coverage for `cli.ts`/`docker-manager.ts` | 🟡 Medium | High | High | Per-file thresholds + coverage improver priorities |
| Dockerfile linting (Hadolint) | 🟡 Medium | Low | Medium | Add to `lint.yml` |
| Link check path restriction | 🟡 Medium | Low | Medium | Add scheduled full run |
| Pin actions in `performance-monitor.yml` | 🟡 Medium | Low | Medium | Update 4 action refs to SHA |
| SBOM generation in releases | 🟡 Medium | Medium | Medium | Add to `release.yml` |
| Mutation testing | 🟢 Low | High | Medium | Weekly schedule with Stryker |
| Spell checking | 🟢 Low | Low | Low | Add cspell to `lint.yml` |
| Expand Node matrix to include 18 | 🟢 Low | Low | Low | Add `node-version: '18'` to matrix if Node 18 is still supported |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflow files | 45 |
| Conventional workflows (`.yml`) | 18 |
| Agentic workflows (`.md`) | 27 |
| Workflows triggered on PRs | ~20 |
| Unit test coverage: statements | 38.39% |
| Unit test coverage: branches | 31.78% |
| Unit test coverage: functions | 37.03% |
| Unit test coverage: lines | 38.31% |
| `cli.ts` coverage | 0% |
| `docker-manager.ts` coverage | 18% |
| Total unit tests | 135 |
| Integration test jobs | 9 parallel jobs (5 integration + 4 chroot) |
| Agentic multi-language build test languages | 8 (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) |
| AI agent smoke test coverage | Claude, Codex, Copilot |
| Performance benchmarks tracked | Daily (not PR-gated) |

Strengths: The pipeline's use of agentic workflows for security review (Claude), multi-language build testing (Copilot), and smoke testing is unusually sophisticated. The integration and chroot test suites are comprehensive for a firewall/sandbox tool. CodeQL with security-extended and AI-powered security guard provide strong security coverage.

Primary focus areas: Container image scanning, performance regression gating, shell script linting, and improving unit test coverage for core orchestration files are the highest-ROI improvements.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment · 854.6K

expires on Apr 18, 2026, 12:53 PM UTC

2026-04-11T12:55:35Z

github-actions[bot]
bot Apr 11, 2026

🔮 The ancient spirits stir in the firewall vaults.
The oracle of smoke has walked this thread and read the runes.
Signals seen: proxy paths bright, build forge lit, title omen true.
May guarded egress remain unbroken until the next moon of CI.

🔮 The oracle has spoken through Smoke Codex
