fix(guardrails): recover from provider outages#875
Conversation
zaxbysauce
left a comment
There was a problem hiding this comment.
Swarm PR Review — PR #875 fix(guardrails): recover from provider outages
Structured review via 6-phase swarm workflow: 6 parallel explorer lanes → independent reviewer → T2 targeted re-verification → synthesis.
PR is currently in DRAFT state — feedback below is intended to inform the remaining work before promotion.
Verdict: REQUEST_CHANGES
Reason: Two confirmed HIGH gaps (T2: new provider-recovery detection logic is entirely untested; T4: dedup suppression invariant O-002 has no test) plus one confirmed HIGH finding (T1: pre-existing mock.module() debt). Core implementation is sound — five HIGH/MEDIUM security and correctness explorer candidates were disproved after reviewer verification.
Explorer False Positives (DISPROVED — calling these out explicitly)
The following candidates were raised by explorers but are not real issues:
| ID | Explorer Claim | Verdict | Reason |
|---|---|---|---|
| C4 | HIGH — regex [_\s-] char class ambiguity |
DISPROVED | - at end of char class is always literal in JS regex; pattern is correct |
| S1 | MEDIUM — prompt injection via advisory embedding | DISPROVED | Advisory uses a hardcoded template; zero user/provider-controlled text is embedded |
| S2 | MEDIUM — HTML/Markdown escaping absent | DISPROVED | Output goes to system prompt string, not HTML; escaping is not applicable |
| C5 | MEDIUM — fingerprint set before advisory injected | DISPROVED | Actual code order is correct: advisory pushed at lines 2326–2328, fingerprint set at 2329 |
| D1 | MEDIUM — version mismatch 7.20.0 vs v7.20.1 | DISPROVED | Both package.json and docs/releases/v7.20.0.md are 7.20.0; explorer misread filename |
Pre-Existing Issues (not introduced by this PR)
| ID | Severity | Finding | Location |
|---|---|---|---|
| T1 | HIGH (pre-existing) | mock.module() used instead of _internals DI — violates invariant 7 |
tests/unit/hooks/guardrails-error-classification.test.ts lines 1138, 1168, 1211, 1255, 1318 |
T1 pre-dates PR #875 and is not a blocker for this PR, but is tracked tech debt.
Confirmed Findings (blockers and advisories)
🔴 T2 — HIGH | Missing test coverage for core new detection logic
tests/unit/hooks/advisory-injection.test.ts — verified by targeted reviewer agent reading the full file.
The five existing tests in advisory-injection.test.ts cover only the generic advisory queue mechanics (inject, clear, non-architect skip, create-when-absent, multi-join). The new provider-recovery detection path is entirely untested:
| Function / Behavior | Test Status |
|---|---|
isTransientProviderFailureText() |
❌ NOT TESTED |
getProviderFailureFingerprint() |
❌ NOT TESTED |
getMostRecentAssistantText() |
❌ NOT TESTED |
lastProviderRecoveryFingerprint dedup suppression |
❌ NOT TESTED |
| Non-transient error does NOT trigger advisory | ❌ NOT TESTED |
| Second identical fingerprint suppresses advisory (O-002) | ❌ NOT TESTED |
The invariant audit claims "advisory-injection.test.ts covers injection, dedupe, and false-positive prose behavior" — this is partially inaccurate. The dedup mechanism (O-002) is the central correctness claim of this PR and has no test.
Required tests before merge:
isTransientProviderFailureText()returns true forprovider_unavailable/network connection lost, false for auth errors and prosegetProviderFailureFingerprint()returns stable output for same input, different for different inputgetMostRecentAssistantText()returns empty string for empty array, correct text for non-empty- Same fingerprint on second call → no advisory injected (dedup)
- Non-transient error (e.g.
401 unauthorized) → no advisory injected
🟡 T4 — MEDIUM | No test for fingerprint dedup suppression
Directly related to T2: the core O-002 obligation ("bounded and session-scoped, no spam") has no test verifying that a repeated identical provider outage fingerprint causes the advisory to be skipped on the second detection.
🟡 T5 — MEDIUM | No test for non-transient error non-injection
No test confirms that a non-matching error pattern (auth error, rate limit, validation failure) does not trigger the recovery advisory. This is essential to prevent advisory noise on hard errors that shouldn't trigger recovery guidance.
🟡 A1 — MEDIUM | O(N) transcript scan on every messagesTransform call
getMostRecentAssistantText() iterates messages backward unconditionally on every messagesTransform invocation for Architect sessions. For long-running sessions with large transcripts, this adds measurable latency per turn. Consider an early-exit gate and caching the last-seen message index.
ℹ️ C1 — ADVISORY (downgraded from MEDIUM)
The fingerprint is read and written with no await between them. Explorer flagged this as a TOCTOU race; in practice JavaScript's single-threaded execution model makes this a non-issue in the current hook architecture. Noted for awareness only.
Obligation Assessment
| ID | Obligation | Status |
|---|---|---|
| O-001 | Add next-turn Architect recovery guidance | ✅ MET |
| O-002 | Session-scoped, no duplicate spam | |
| O-003 | Preserve existing retry path | ✅ MET |
| O-004 | Release notes (docs/releases/v7.20.0.md) |
✅ MET |
| O-005 | bun run build passes, dist included |
✅ MET |
| O-006 | 228 tests pass | |
| O-007 | lastProviderRecoveryFingerprint is session-scoped |
✅ MET |
| O-008 | New tests use _internals DI not mock.module |
Summary
The implementation is clean and correct — advisory injection uses a hardcoded safe template, fingerprint dedup is synchronous and correct, session scoping is correct, release notes are accurate, existing retry path is preserved. The gap is entirely in test coverage: the new detection helpers and the dedup mechanism that are the heart of this PR's correctness claim are not tested. Adding the five tests listed under T2 closes the blockers.
Generated by Claude Code
902e98c to
1a5a5f0
Compare
zaxbysauce
left a comment
There was a problem hiding this comment.
Swarm Re-validation — PR #875 (updated commit 1a5a5f0)
Re-ran targeted verification against all blockers from the previous review. Results:
| Blocker | Status | Evidence |
|---|---|---|
T2 (HIGH) — isTransientProviderFailureText() untested |
✅ RESOLVED | Test "provider recovery helpers classify, fingerprint, and select transcript text directly" — directly exercises all three helper functions with true/false cases including 401 auth error |
T2 (HIGH) — getProviderFailureFingerprint() untested |
✅ RESOLVED | Same test verifies stability (same input → same output) and differentiation (different inputs → different outputs) |
T2 (HIGH) — getMostRecentAssistantText() untested |
✅ RESOLVED | Same test verifies correct message selection, multi-part filtering, and non-text-part exclusion |
| T4 (MEDIUM) — No dedup suppression test | ✅ RESOLVED | Tests "does not duplicate provider recovery guidance for the same interrupted transcript" and "does not duplicate provider recovery guidance when OpenCode supplies fresh message arrays" |
| T5 (MEDIUM) — No non-transient error test | ✅ RESOLVED | Test "does not inject provider recovery guidance for non-transient auth failures" — 401 unauthorized produces no TRANSIENT PROVIDER RECOVERY advisory |
Only open item: A1 (MEDIUM advisory) — the backward scan of all messages in getMostRecentAssistantText() on every messagesTransform call is still O(N) with no early-exit gate. This was flagged as a performance advisory, not a blocker. Leaving it to the author's discretion.
Updated verdict: APPROVE_WITH_NOTES — all blockers resolved. A1 remains advisory-only. PR is still in DRAFT; when ready to promote, the one open item is the O(N) scan optimization.
Generated by Claude Code
…ble output Removes console.warn/console.info/console.error from the non-fatal diagnostic paths in executePhaseComplete and routes them to either the tool result warnings array (operator-visible, structured) or the debug-gated logger (OPENCODE_SWARM_DEBUG=1 only). Fixes issue #875. Changes: - safeWarn: console.warn → logger.warn (debug-gated) - Turbo-mode gate bypass: console.warn → warnings.push - drift_check disabled skip: console.info removed (warnings.push kept) - non-code phase skip: console.info removed (warnings.push kept) - Lock release failure: console.error → logger.warn (non-blocking, post-decision) Regression tests added for turbo-mode and non-code drift-skip paths asserting no console writes occur. ## Invariant audit - 1 (plugin init): not touched — no init path changes; phase-complete runs post-init - 2 (runtime portability): touched — dist/ rebuilt; node --input-type=module -e "await import('./dist/index.js')" → dist import OK - 3 (subprocesses): not touched — grep of changed source files shows no new spawn call sites - 4 (.swarm containment): not touched — no .swarm/ path changes - 5 (plan durability): not touched — no plan schema changes - 6 (test_runner safety): not touched — no test_runner changes - 7 (test writing): touched — two new bun:test regression tests with finally-block console restore; no mock.module usage - 8 (session state): not touched — no session state changes - 9 (guardrails/retry): not touched — no retry logic changes - 10 (chat/system msg): touched — grep src/tools/phase-complete.ts for console.warn/info/error returns zero matches; fix removes the violation - 11 (tool registration): not touched — no tool registration changes - 12 (release/cache): touched — docs/releases/v7.20.2.md created; package.json/CHANGELOG/.release-please-manifest.json untouched ## Pre-existing failures Tiers 2–4 contain pre-existing failures confirmed on clean main checkout: diagnostic-gating (3), phase-complete-lean-turbo-critic (9), phase-complete-lean-turbo-reviewer (7), sast-and-cochanger-tools (18), tool-registration-conformance (1), update-task-status.gate-fix (4), integration/phase-completion-e2e (subset), adversarial (10). None are in files touched by this PR. https://claude.ai/code/session_013N8A2pNYg7vwLtnjvP68cW
…ble output (#876) Removes console.warn/console.info/console.error from the non-fatal diagnostic paths in executePhaseComplete and routes them to either the tool result warnings array (operator-visible, structured) or the debug-gated logger (OPENCODE_SWARM_DEBUG=1 only). Fixes issue #875. Changes: - safeWarn: console.warn → logger.warn (debug-gated) - Turbo-mode gate bypass: console.warn → warnings.push - drift_check disabled skip: console.info removed (warnings.push kept) - non-code phase skip: console.info removed (warnings.push kept) - Lock release failure: console.error → logger.warn (non-blocking, post-decision) Regression tests added for turbo-mode and non-code drift-skip paths asserting no console writes occur. ## Invariant audit - 1 (plugin init): not touched — no init path changes; phase-complete runs post-init - 2 (runtime portability): touched — dist/ rebuilt; node --input-type=module -e "await import('./dist/index.js')" → dist import OK - 3 (subprocesses): not touched — grep of changed source files shows no new spawn call sites - 4 (.swarm containment): not touched — no .swarm/ path changes - 5 (plan durability): not touched — no plan schema changes - 6 (test_runner safety): not touched — no test_runner changes - 7 (test writing): touched — two new bun:test regression tests with finally-block console restore; no mock.module usage - 8 (session state): not touched — no session state changes - 9 (guardrails/retry): not touched — no retry logic changes - 10 (chat/system msg): touched — grep src/tools/phase-complete.ts for console.warn/info/error returns zero matches; fix removes the violation - 11 (tool registration): not touched — no tool registration changes - 12 (release/cache): touched — docs/releases/v7.20.2.md created; package.json/CHANGELOG/.release-please-manifest.json untouched ## Pre-existing failures Tiers 2–4 contain pre-existing failures confirmed on clean main checkout: diagnostic-gating (3), phase-complete-lean-turbo-critic (9), phase-complete-lean-turbo-reviewer (7), sast-and-cochanger-tools (18), tool-registration-conformance (1), update-task-status.gate-fix (4), integration/phase-completion-e2e (subset), adversarial (10). None are in files touched by this PR. https://claude.ai/code/session_013N8A2pNYg7vwLtnjvP68cW Co-authored-by: Claude <noreply@anthropic.com>
Summary
provider_unavailable/Network connection lost.Invariant audit
node scripts/repro-704.mjswas run and timed out on both this branch and cleanorigin/main.bun run buildpassed;node --input-type=module -e "await import('./dist/index.js'); console.log('dist import OK')"printeddist import OK..swarmstorage or working-directory containment changes.bun/nodecommands; no broad OpenCodetest_runnervalidation was used..opencode/skills/writing-tests/SKILL.md; new/updated tests usebun:testand_internalsdependency injection instead ofmock.modulefor changed hook collaborators.AgentSessionState.lastProviderRecoveryFingerprintis session-scoped and covered by same-array plus fresh-array advisory dedupe tests.advisory-injection.test.tscovers injection, dedupe, and false-positive prose behavior.TOOL_NAMES,AGENT_TOOL_MAP, command, or help surface changes.docs/releases/v7.20.1.md; release-please-owned version files were not edited manually.Test plan
bun run buildgit diff --exit-code -- distidentified tracked dist drift after build; dist artifacts are included in this PR.node --input-type=module -e "await import('./dist/index.js'); console.log('dist import OK')"bun run typecheckbunx biome ci .(passed with existing warnings insrc/turbo/lean/runner.ts)bun --smol test tests/unit/hooks/advisory-injection.test.ts tests/unit/hooks/guardrails-error-classification.test.ts tests/unit/hooks/guardrails-model-fallback.test.ts tests/unit/hooks/guardrails-fallback.test.ts tests/unit/hooks/guardrails-model-fallback.adversarial.test.ts tests/unit/hooks/guardrails.test.ts tests/unit/hooks/telemetry-guardrails-wiring.test.ts --timeout 30000(228 pass, 1 pre-existing failure also reproduced onorigin/main)git diff --checkPre-existing failures
node scripts/repro-704.mjstimed out after 304s on this branch and on a freshly built cleanorigin/mainworktree atC:\tmp\opencode-swarm-main-856.tests/unit/hooks/guardrails-model-fallback.adversarial.test.tshas a stale object-error assertion after current main added structured provider-error extraction; the same failure reproduces onorigin/main.tests/unit/tools/diagnostic-gating.test.ts,doc-scan*.test.ts, multiple Lean Turbophase-complete*tests, repo-graph symlink tests, registration/conformance tests,sast-and-cochanger-tools.test.ts,test-runner-scope-cap.test.ts, andupdate-task-status*.test.tsall had failures that reproduced on cleanorigin/main.tests/unit/services/skill-generator.test.tsfails on cleanorigin/maindue Windows path separator expectations.bun --smol test tests/unit/cli tests/unit/commands tests/unit/config --timeout 120000bucket is also red in this environment with mock/command isolation failures and childbunspawnENOENT; it was not treated as a pass signal.