[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1972
🔮 The ancient spirits stir, and the smoke-test oracle has walked this thread.
📊 Current CI/CD Pipeline Status
The repository has a comprehensive and mature CI/CD setup with 57 active workflows: 30 standard GitHub Actions workflows and 27 agentic (AWF-powered) workflows. A recent sample of 30 PR runs shows a 53% failure rate, driven largely by smoke tests and the Security Guard agentic workflow, which are non-blocking or reaction-gated.
Pipeline health summary (PR-triggered workflows):
[Summary table not recovered; surviving cell fragments: tsc --noEmit, npm audit --audit-level=high, .md file PRs]
✅ Existing Quality Gates
The following checks are well-established and running on every PR:
- … eslint-rules/no-unsafe-execa.test.js)
- … markdownlint-cli2
- … tsconfig.check.json)
- … security-extended, security-and-quality query pack
- … npm audit → SARIF upload to Security tab
- … amannn/action-semantic-pull-request
🔍 Identified Gaps
High Priority
1. Critically Low Unit Test Coverage on Core Modules
Impact: Bugs in the most important code paths go undetected.
Current coverage is very low and the thresholds are set barely above current reality:
- cli.ts → 0% coverage (0/69 statements, 0/10 functions)
- docker-manager.ts → 18% coverage (45/250 statements, 1/25 functions)
These two files are the core of the system, yet they have little or no automated test coverage. The coverage gate allows regressions as long as coverage doesn't drop below already-low thresholds.
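One way to close that loophole is a ratchet: compare each file's coverage with a recorded baseline and fail when it drops, instead of comparing against a fixed floor. The sketch below is only illustrative. The function name, the tolerance default, and the sample PR data are assumptions; only the statement counts come from the figures above.

```typescript
// Per-file statement coverage, as a percentage (0-100).
type CoverageByFile = Record<string, number>;

// Returns the files whose coverage fell below the recorded baseline.
// `tolerance` absorbs tiny fluctuations; the 0.1-point default is an assumption.
function findCoverageRegressions(
  baseline: CoverageByFile,
  current: CoverageByFile,
  tolerance = 0.1,
): string[] {
  return Object.entries(baseline)
    .filter(([file, pct]) => (current[file] ?? 0) < pct - tolerance)
    .map(([file]) => file)
    .sort();
}

// Baseline built from the statement counts quoted above: 0/69 and 45/250.
const baselineCoverage: CoverageByFile = {
  "cli.ts": (0 / 69) * 100,
  "docker-manager.ts": (45 / 250) * 100,
};

// Hypothetical PR that removes some tests for docker-manager.ts.
const prCoverage: CoverageByFile = { "cli.ts": 0, "docker-manager.ts": 10 };

// Only docker-manager.ts is reported: it dropped from 18% to 10%.
const regressions = findCoverageRegressions(baselineCoverage, prCoverage);
```

A check like this would need the baseline stored somewhere durable (for example, an artifact from the default branch), which is outside the scope of this sketch.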
2. Performance Benchmarks Not Running on PRs
Impact: PRs that introduce startup/teardown latency regressions aren't caught until the next daily run, making it hard to identify the culprit commit.
performance-monitor.yml runs only on schedule (daily at 06:00 UTC) and workflow_dispatch. There is no PR integration, so a PR doubling container startup time would merge undetected.
3. No Container/Docker Image Vulnerability Scanning
Impact: Base images (ubuntu/squid:latest, ubuntu:22.04, node:*) may contain known CVEs.
npm audit covers Node.js dependencies in the host CLI and API proxy but doesn't scan Docker images. There's no Trivy, Grype, or Docker Scout integration checking the three container images for OS-level CVEs.
4. No Dockerfile / Shell Script Static Analysis
Impact: Shell script bugs, insecure Dockerfile patterns, and Squid config errors go unreviewed by automated tooling.
containers/agent/entrypoint.sh, containers/agent/setup-iptables.sh, and the squid entrypoint are security-critical scripts with no automated linting (ShellCheck, Hadolint).
5. Smoke Tests Are Reaction-Gated (Not Required)
Impact: Full E2E agent runs (Claude, Copilot, Codex) are optional and only run when maintainers add specific emoji reactions. A breaking change could merge without ever running a real agent.
Smoke workflows use reaction: triggers (heart/eyes/hooray/rocket) rather than always running on PRs. This is understandable for cost, but creates a risk for changes to containers or agent entrypoint.
Medium Priority
6. No Mutation Testing
Impact: Tests may pass without actually validating behavior (false green tests).
Jest unit tests exist but there's no mutation testing (e.g., Stryker) to verify tests would catch real bugs. Given the low overall coverage and the security-sensitive nature of the codebase, mutation testing would dramatically improve confidence in the test suite.
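To make the "false green" risk concrete, here is a toy illustration of what a mutation tool does. The port validator is hypothetical and is not code from this repository. The mutant flips one boundary operator; a suite that never probes the boundary lets it survive.

```typescript
type PortCheck = (port: number) => boolean;

// Hypothetical function under test: accepts TCP ports 1..65535.
const isValidPort: PortCheck = (port) =>
  Number.isInteger(port) && port >= 1 && port <= 65535;

// A mutant of the kind a mutation tool generates: `<=` became `<`.
const boundaryMutant: PortCheck = (port) =>
  Number.isInteger(port) && port >= 1 && port < 65535;

// Weak suite: never exercises the upper boundary.
const weakSuitePasses = (check: PortCheck): boolean =>
  check(80) && !check(0);

// Stronger suite: adds the boundary cases.
const strongSuitePasses = (check: PortCheck): boolean =>
  weakSuitePasses(check) && check(65535) && !check(65536);

// The mutant passes the weak suite (it "survives") ...
const mutantSurvivesWeakSuite = weakSuitePasses(boundaryMutant);
// ... but fails the stronger suite (it is "killed").
const mutantSurvivesStrongSuite = strongSuitePasses(boundaryMutant);
```

Both suites pass against the original function, so line coverage alone cannot tell them apart; only the mutant exposes the difference.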
7. Coverage Thresholds Too Permissive
Impact: PRs can freely reduce coverage to the thresholds (38%/30%/35%/38%) without failing.
Current thresholds are essentially set to the current minimum, meaning a PR can delete all tests for a file and still pass if overall coverage stays above the floor. The thresholds should be raised incrementally and file-level thresholds should be added for critical modules.
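Jest's coverageThreshold option accepts path or glob keys alongside global, which is one way to add file-level floors. The following is a rough sketch rather than the repository's actual configuration: the src/ paths and every number are assumptions chosen for illustration, and in a real jest.config.ts the object would be exported as the default export.

```typescript
// Sketch of a Jest configuration object with per-file coverage floors.
const jestConfig = {
  collectCoverage: true,
  coverageThreshold: {
    // Global floor. Illustrative values, not the repository's real thresholds.
    global: { statements: 40, branches: 32, functions: 37, lines: 40 },
    // File-level floors (paths are assumed). Deleting the tests for one of
    // these files fails the run even if the global numbers still pass.
    "./src/cli.ts": { statements: 20 },
    "./src/docker-manager.ts": { statements: 25, functions: 10 },
  },
};
```

Raising the per-file numbers a few points at a time, as tests are added, keeps the gate meaningful without blocking unrelated PRs.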
8. No SBOM (Software Bill of Materials) Generation
Impact: No machine-readable inventory of dependencies for supply chain security auditing.
There's no SBOM generation step in the release pipeline. Given this is a security-focused tool, an SBOM (npm sbom or syft) would improve supply chain transparency.
9. test-integration.yml File Naming Confusion
Impact: Developer confusion about what each workflow file does.
The file .github/workflows/test-integration.yml contains the TypeScript Type Check workflow, not integration tests. The actual integration test suite is in test-integration-suite.yml. This causes confusion when debugging CI failures.
10. No Secrets / Credential Scanning in PRs
Impact: Accidental secret commits could go undetected without a dedicated scanner.
While CodeQL covers some patterns, there's no dedicated gitleaks, trufflehog, or secret-scanning step in the PR pipeline. GitHub's native secret scanning may cover this at the org level, but it's not visible in the workflow files.
11. Integration Tests Not Enforced as Required Status Checks
Impact: Integration tests run in parallel but their failure doesn't clearly block merging if branch protection isn't configured to require them.
From the workflow files alone, it's not clear which checks are configured as required status checks in branch protection rules. If integration tests aren't required, PRs can merge even when they fail.
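The required checks can be read from the GitHub REST API (GET /repos/{owner}/{repo}/branches/{branch}/protection/required_status_checks) and compared with the checks that are expected to block merging. The sketch below covers only the comparison step. The check names and sample response are hypothetical, and the interface models just the two response fields used here.

```typescript
// Relevant part of the required_status_checks response (other fields omitted).
interface RequiredStatusChecks {
  strict: boolean;
  contexts: string[];
}

// Returns the expected checks that branch protection does not require.
function missingRequiredChecks(
  protection: RequiredStatusChecks,
  expected: string[],
): string[] {
  const required = new Set(protection.contexts);
  return expected.filter((check) => !required.has(check));
}

// Hypothetical data: integration tests run, but are not required.
const sampleProtection: RequiredStatusChecks = {
  strict: true,
  contexts: ["Lint", "Build"],
};
const expectedChecks = ["Build", "Integration Tests"];

// Contains only "Integration Tests".
const notRequired = missingRequiredChecks(sampleProtection, expectedChecks);
```

Fetching the response needs a token with permission to read branch protection, so this comparison would most likely run as a scheduled audit rather than on every PR.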
Low Priority
12. No Build Artifact Size Monitoring
Impact: Gradual dist bundle bloat goes unnoticed.
No workflow tracks dist/ size changes across PRs. A PR adding a large dependency could significantly increase the CLI install footprint without triggering any alerts.
13. No Snapshot Testing for Generated Configs
Impact: Unintended changes to squid.conf or docker-compose.yml output format may slip through.
squid-config.ts has 100% unit test coverage, but the tests likely assert behavior rather than exact output. Snapshot tests for the full generated squid.conf and Docker Compose YAML would catch unintended formatting/structure changes.
14. Performance Benchmarks Use Unpinned Actions
Impact: Supply chain risk.
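A floating tag such as @v4 can be moved to a different commit, whereas a full 40-character commit SHA cannot. As a minimal sketch of how unpinned references could be flagged automatically, the function below scans workflow text line by line. It is a simplification that assumes plain uses: lines and does not parse YAML; the function name and the sample SHA are hypothetical.

```typescript
// A full commit SHA is 40 hexadecimal characters.
const FULL_SHA = /^[0-9a-f]{40}$/i;

// Returns the `uses:` references that are not pinned to a full commit SHA.
function findUnpinnedActions(workflowText: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowText.split("\n")) {
    const match = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/.exec(line);
    const reference = match?.[1];
    if (!reference) continue;
    // Local actions and docker:// references are out of scope for this sketch.
    if (reference.startsWith("./") || reference.startsWith("docker://")) continue;
    const ref = reference.split("@")[1] ?? "";
    if (!FULL_SHA.test(ref)) unpinned.push(reference);
  }
  return unpinned;
}

// Sample workflow text; the SHA below is a made-up placeholder.
const sampleWorkflow = [
  "steps:",
  "  - uses: actions/checkout@v4",
  "  - uses: actions/setup-node@0123456789abcdef0123456789abcdef01234567 # v4",
  "  - name: Upload",
  "    uses: actions/upload-artifact@v4",
].join("\n");

// Flags the two @v4 references and accepts the SHA-pinned one.
const unpinnedRefs = findUnpinnedActions(sampleWorkflow);
```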
performance-monitor.yml uses actions/checkout@v4, actions/setup-node@v4, actions/github-script@v7, and actions/upload-artifact@v4, all floating tags without SHA pins. All other workflows properly pin actions to SHAs.
15. No Automated Changelog Validation
Impact: PRs can merge without CHANGELOG entries for user-facing changes.
There's semantic PR title enforcement, but no check that CHANGELOG.md or release notes are updated for feat and fix type PRs.
📋 Actionable Recommendations
[Recommendations table not recovered; surviving cell fragments, in order:]
- … cli.ts / docker-manager.ts coverage
- … build.yml measuring startup time with comment
- … build.yml after docker build steps
- … hadolint for Dockerfiles and shellcheck for .sh files in lint.yml
- … containers/** changes (not reaction-gated)
- … package.json and run weekly/on coverage regressions
- … cli.ts / docker-manager.ts
- … npx @cyclonedx/cyclonedx-npm or npm sbom to release workflow
- … test-integration.yml to type-check.yml
- … trufflesecurity/trufflehog-actions-scan on PRs
- … du -sh dist/ comparison step to build.yml with step summary
- … generateSquidConfig() and generateDockerCompose()
- … performance-monitor.yml to commit SHAs
- … feat/fix PRs using actions/github-script
📈 Metrics Summary
[Metrics table not recovered; surviving row labels: cli.ts coverage, docker-manager.ts function coverage]
Top 3 recommendations by ROI:
1. … hadolint + shellcheck to lint.yml: 1 hour of work, catches security bugs in shell scripts
2. … docker build steps: 30 minutes, catches OS CVEs
3. … cli.ts / docker-manager.ts: drives the most impactful long-term quality improvement