feat(verification): DualLaneCrossEngine three-way verification [W3.A.2]#42
feat(verification): DualLaneCrossEngine three-way verification [W3.A.2]#42TimothyVang wants to merge 2 commits into
Conversation
…A.2] Failing tests for the dual-mode three-way verification strategy (ARCHITECTURE.md §1 quorum-dispatch table rows 6-8; CLAUDE.md §8): - cloud agrees with >=1 local AND locals agree with each other -> VETTED_DUAL - cloud disagrees with both locals -> CONTESTED - cloud agrees with 1 local but locals disagree -> CONTESTED (DUAL_REQUIRES_LOCALS_AGREE is the load-bearing second clause; without it dual-mode silently degrades to "cloud + cherry-picked local") - mitre-divergence variants (locals-divergent and cloud-divergent both -> CONTESTED) - empty-set rule carries: empty cloud blocks cloud-vs-local agreement; empty local blocks the locals-agree clause -> CONTESTED in both cases - DUAL_REQUIRES_LOCALS_AGREE flag MUST be consulted (no embedded True) - engine-family distinctness on the local pair: two Qwen3 locals collapses locals-agree to self-consistency and is rejected - the cloud slot must be a cloud-family output (defensive boundary refusal at the dispatch layer) - VerdictResult.notes records all three engines (vetted) and the clause-failure reason (contested) for ledger audit + replan_node routing The pure consensus surface is compute_verdict(cloud, qwen, glm) on EngineOutput records (shared with W3.A.1). verify(...) raises NotImplementedError until W2.B wires the cloud Claude client + two SGLang clients (CLAUDE.md §3.10 backend stub permission). Module under test does not yet exist; collection ERRORs with ModuleNotFoundError. GREEN follows in the next commit.
Implements the dual-mode quorum strategy
(ARCHITECTURE.md §1 quorum-dispatch table rows 6-8; CLAUDE.md §8):
- compute_verdict(cloud, qwen, glm) is the pure consensus surface.
- cloud agrees with >=1 local AND locals agree with each other ->
VETTED_DUAL.
- cloud disagrees with both locals -> CONTESTED.
- cloud agrees with 1 local but locals disagree -> CONTESTED.
- "Agree" predicate is identical to air-gap mode: identical
mitre_technique AND Jaccard(artifact_paths) >=
AIRGAP_JACCARD_THRESHOLD (read from strategy.py — never hard-coded).
Reusing the threshold means dual mode is "air-gap quorum on each
pair plus a cloud-anchor requirement" — one knob, one place.
- DUAL_REQUIRES_LOCALS_AGREE flag is read from strategy.py per call
(load-bearing — flipping it to False would silently weaken
dual-mode to "cloud + cherry-picked local" and discard the
locals-agree clause CLAUDE.md §8 names).
- Empty-set rule (ARCHITECTURE.md §1) carries: any participant with
empty artifact_paths is DISAGREEMENT for every pair involving it.
Empty cloud blocks cloud-anchor; empty local blocks locals-agree.
- Boundary refusals at the dispatch layer:
- cloud slot must be a cloud-family output (defensive — a
misrouted Qwen3 in the cloud slot would silently mislabel
VETTED_DUAL).
- local pair must be CROSS-engine (two Qwen3 collapses
locals-agree to self-consistency and breaks the air-gap
independence guarantee that dual mode inherits).
Both raise ValueError so programming errors surface immediately
rather than producing an untrustworthy verdict.
- VerdictResult.notes records all three engines on vetted, and the
clause-failure reason on contested, so the ledger has the full
audit handle and replan_node can route on disagreement type.
verify(...) raises NotImplementedError until W2.B wires the cloud
Claude client + the two SGLang clients + ledger plumbing
(CLAUDE.md §3.10 explicitly permits this backend-level stub: the
consensus logic is real and exercised via compute_verdict; the
forbidden pattern is mocking *internal* logic).
15/15 tests in tests/verification/test_dual_lane_cross_engine.py pass;
45/45 in the full verification suite pass; ruff clean.
Review — W3.A.2 DualLaneCrossEngine [automated reviewer, tier-1]CI result (local run on Consensus-logic correctnessAll spec invariants pass:
Spec compliance on the
Finding:
|
Summary
W3.A.2 implementation of
DualLaneCrossEngine— dual-mode quorum strategythat runs THREE engines in parallel (cloud Claude, Qwen3, GLM-4.5-Air)
and accepts the finding only under the conjunctive rule:
Both clauses are required (CLAUDE.md §8 / ARCHITECTURE.md §1 rows 6-8).
compute_verdict(cloud=, qwen=, glm=)is the pure consensus surface.VETTED_DUALCONTESTEDCONTESTEDmitre_techniqueANDJaccard(
artifact_paths) >=AIRGAP_JACCARD_THRESHOLD. Reuses thethreshold so dual mode = "air-gap quorum on each pair + cloud-anchor".
DUAL_REQUIRES_LOCALS_AGREEflag is read fromstrategy.pyper call.Load-bearing: flipping to False would silently weaken dual to
"cloud + cherry-picked local".
artifact_pathsfrom any participantis DISAGREEMENT for every pair involving it.
ValueError):qwenvsqwen)VerdictResult.notesrecords all three engines on vetted andclause-failure reason on contested for ledger audit + replan_node
disagreement-type routing.
verify(...)raisesNotImplementedErroruntil W2.B wires the cloudClaude client + two SGLang clients + ledger plumbing. CLAUDE.md §3.10
explicitly permits this backend-level stub: the consensus logic is
real and exercised by 15 unit tests against
EngineOutputrecords.Builds on PR #37 (
feat/W3.A.1-airgap-cross-engine) — base set tothat branch so this PR's diff shows only the W3.A.2 surface (one new
file:
verdict/verification/dual_lane_cross_engine.py+ its test).Test plan
tests/verification/test_dual_lane_cross_engine.py— 15 tests passruff check verdict/ tests/cleancommit subjects with
[W3.A.2]per CLAUDE.md §3.7NotImplementedErrorinverify()with realtransport that calls
compute_verdict(...)with three renderedEngineOutputrecords