feat(verification): DualLaneCrossEngine three-way verification [W3.A.2] by TimothyVang · Pull Request #42 · TimothyVang/Verdict

TimothyVang · 2026-05-02T13:20:59Z

Summary

W3.A.2 implementation of DualLaneCrossEngine — dual-mode quorum strategy
that runs THREE engines in parallel (cloud Claude, Qwen3, GLM-4.5-Air)
and accepts the finding only under the conjunctive rule:

cloud agrees with >=1 local  AND  locals agree with each other.

Both clauses are required (CLAUDE.md §8 / ARCHITECTURE.md §1 rows 6-8).

compute_verdict(cloud=, qwen=, glm=) is the pure consensus surface.
- cloud agrees with >=1 local AND locals agree -> VETTED_DUAL
- cloud disagrees with both locals -> CONTESTED
- cloud agrees with 1 local but locals disagree -> CONTESTED
"Agree" predicate is identical to air-gap: same mitre_technique AND
Jaccard(artifact_paths) >= AIRGAP_JACCARD_THRESHOLD. Reuses the
threshold so dual mode = "air-gap quorum on each pair + cloud-anchor".
DUAL_REQUIRES_LOCALS_AGREE flag is read from strategy.py per call.
Load-bearing: flipping to False would silently weaken dual to
"cloud + cherry-picked local".
Empty-set rule carries: empty artifact_paths from any participant
is DISAGREEMENT for every pair involving it.
Boundary refusals (raise ValueError):
- cloud slot must be a cloud-family output
- local pair must be CROSS-engine (refuses qwen vs qwen)
VerdictResult.notes records all three engines on vetted and
clause-failure reason on contested for ledger audit + replan_node
disagreement-type routing.

verify(...) raises NotImplementedError until W2.B wires the cloud
Claude client + two SGLang clients + ledger plumbing. CLAUDE.md §3.10
explicitly permits this backend-level stub: the consensus logic is
real and exercised by 15 unit tests against EngineOutput records.

Builds on PR #37 (feat/W3.A.1-airgap-cross-engine) — base set to
that branch so this PR's diff shows only the W3.A.2 surface (one new
file: verdict/verification/dual_lane_cross_engine.py + its test).

Test plan

tests/verification/test_dual_lane_cross_engine.py — 15 tests pass
Full verification suite green (45/45)
ruff check verdict/ tests/ clean
RED commit lands first, GREEN commit lands second; per-task-ID
commit subjects with [W3.A.2] per CLAUDE.md §3.7
After W2.B: replace NotImplementedError in verify() with real
transport that calls compute_verdict(...) with three rendered
EngineOutput records

…A.2] Failing tests for the dual-mode three-way verification strategy (ARCHITECTURE.md §1 quorum-dispatch table rows 6-8; CLAUDE.md §8): - cloud agrees with >=1 local AND locals agree with each other -> VETTED_DUAL - cloud disagrees with both locals -> CONTESTED - cloud agrees with 1 local but locals disagree -> CONTESTED (DUAL_REQUIRES_LOCALS_AGREE is the load-bearing second clause; without it dual-mode silently degrades to "cloud + cherry-picked local") - mitre-divergence variants (locals-divergent and cloud-divergent both -> CONTESTED) - empty-set rule carries: empty cloud blocks cloud-vs-local agreement; empty local blocks the locals-agree clause -> CONTESTED in both cases - DUAL_REQUIRES_LOCALS_AGREE flag MUST be consulted (no embedded True) - engine-family distinctness on the local pair: two Qwen3 locals collapses locals-agree to self-consistency and is rejected - the cloud slot must be a cloud-family output (defensive boundary refusal at the dispatch layer) - VerdictResult.notes records all three engines (vetted) and the clause-failure reason (contested) for ledger audit + replan_node routing The pure consensus surface is compute_verdict(cloud, qwen, glm) on EngineOutput records (shared with W3.A.1). verify(...) raises NotImplementedError until W2.B wires the cloud Claude client + two SGLang clients (CLAUDE.md §3.10 backend stub permission). Module under test does not yet exist; collection ERRORs with ModuleNotFoundError. GREEN follows in the next commit.

Implements the dual-mode quorum strategy (ARCHITECTURE.md §1 quorum-dispatch table rows 6-8; CLAUDE.md §8): - compute_verdict(cloud, qwen, glm) is the pure consensus surface. - cloud agrees with >=1 local AND locals agree with each other -> VETTED_DUAL. - cloud disagrees with both locals -> CONTESTED. - cloud agrees with 1 local but locals disagree -> CONTESTED. - "Agree" predicate is identical to air-gap mode: identical mitre_technique AND Jaccard(artifact_paths) >= AIRGAP_JACCARD_THRESHOLD (read from strategy.py — never hard-coded). Reusing the threshold means dual mode is "air-gap quorum on each pair plus a cloud-anchor requirement" — one knob, one place. - DUAL_REQUIRES_LOCALS_AGREE flag is read from strategy.py per call (load-bearing — flipping it to False would silently weaken dual-mode to "cloud + cherry-picked local" and discard the locals-agree clause CLAUDE.md §8 names). - Empty-set rule (ARCHITECTURE.md §1) carries: any participant with empty artifact_paths is DISAGREEMENT for every pair involving it. Empty cloud blocks cloud-anchor; empty local blocks locals-agree. - Boundary refusals at the dispatch layer: - cloud slot must be a cloud-family output (defensive — a misrouted Qwen3 in the cloud slot would silently mislabel VETTED_DUAL). - local pair must be CROSS-engine (two Qwen3 collapses locals-agree to self-consistency and breaks the air-gap independence guarantee that dual mode inherits). Both raise ValueError so programming errors surface immediately rather than producing an untrustworthy verdict. - VerdictResult.notes records all three engines on vetted, and the clause-failure reason on contested, so the ledger has the full audit handle and replan_node can route on disagreement type. verify(...) raises NotImplementedError until W2.B wires the cloud Claude client + the two SGLang clients + ledger plumbing (CLAUDE.md §3.10 explicitly permits this backend-level stub: the consensus logic is real and exercised via compute_verdict; the forbidden pattern is mocking *internal* logic). 15/15 tests in tests/verification/test_dual_lane_cross_engine.py pass; 45/45 in the full verification suite pass; ruff clean.

TimothyVang · 2026-05-02T13:33:11Z

Review — W3.A.2 DualLaneCrossEngine [automated reviewer, tier-1]

CI result (local run on feat/W3.A.2-dual-lane-cross-engine): 45/45 tests pass (30 inherited from #37 + 15 new). ruff check verdict/verification/ clean.

Consensus-logic correctness

All spec invariants pass:

Invariant	Location	Verdict
`VETTED_DUAL` iff cloud-anchor AND locals-agree	`compute_verdict` lines 107–126	PASS
cloud disagrees with both locals → `CONTESTED`	lines 128–148	PASS
cloud agrees with 1 local but locals disagree → `CONTESTED`	lines 128–148	PASS
`DUAL_REQUIRES_LOCALS_AGREE` read from `_strategy` per call, not embedded	line 110	PASS
Empty `artifact_paths` from any participant → DISAGREEMENT for every pair involving it	`_pair_agrees` lines 54–58	PASS
Cloud slot must be `claude`-family, raises `ValueError` otherwise	lines 86–95	PASS
Local pair must be cross-family, raises `ValueError` on same-family	lines 96–106	PASS
`VerdictResult.notes` records all three engines on VETTED, clause-failure reason on CONTESTED	lines 116–126, 130–148	PASS

Spec compliance on the DUAL_REQUIRES_LOCALS_AGREE gate: The implementation reads require_locals_agree = _strategy.DUAL_REQUIRES_LOCALS_AGREE at call time (line 110) and then uses it in if cloud_anchor and (locals_agree or not require_locals_agree):. This correctly gates the second clause. The test test_dual_requires_locals_agree_flag_is_consulted asserts the flag is True and that cloud-agrees-with-1-local-only still produces CONTESTED. This is the most security-load-bearing test in this suite — a silently-degraded dual mode is worse than a rejecting one.

_pair_agrees mitre-first short-circuit: The comment explains this is a cheap-path optimisation. Correct — it also happens to be the right semantic order (technique disagreement is the interpretation-level divergence; Jaccard is the corroboration-level check).

Finding: `_jaccard` duplication

_jaccard is copied verbatim from airgap_cross_engine.py. Both copies are correct and identical today. The comment in dual_lane_cross_engine.py explicitly acknowledges this: "Identical helper to airgap_cross_engine._jaccard; duplicated rather than imported across module boundaries to keep each strategy's consensus rule self-contained." This is an intentional design choice. Acceptable for W3.A.2. A follow-up refactor into a shared utility (e.g. verdict/verification/_consensus_utils.py) would eliminate drift risk at W4+ when more strategies may be added. Registering as a non-blocking observation.

Cloud-family boundary check

The cloud-slot guard checks cloud.family() != "claude". This is a hard-coded family prefix. It is the right check for the current model set (claude-opus-4-5, etc.) and matches the family() implementation. The same forward-compatibility note from #37 applies: a future Claude model named without a hyphen would return its full name from family() and pass. Low risk.

§3.7 commit discipline

Two commits, RED then GREEN, both carry [W3.A.2]. Subjects use test(verification): and feat(verification): — allowed. No --no-verify. No watermarks. The RED commit subject was truncated at the 72-char limit in the GitHub display (test(verification): DualLaneCrossEngine consensus invariants RED [W3.…), but the full message in the commit body is clean.

§3.8 dependencies

No new dependencies.

§3.10 no-mocks

No Mock / patch / replay library in the diff. verify() raises NotImplementedError at the network boundary.

Actionable items

(Non-blocking) Extract _jaccard to a shared module before W4 to prevent silent divergence if the threshold semantics change. A # NOTE: deliberately duplicated from airgap_cross_engine._jaccard comment (already present) is sufficient for now.
(Nit, non-blocking) The CONTESTED notes-builder has a potential empty-string edge: "; ".join(reason_parts) is "" if both branches somehow don't append — this is unreachable given the current logic (at least one of not cloud_anchor or cloud_anchor and require_locals_agree and not locals_agree must be true to reach the CONTESTED return), but a defensive assert reason_parts would make the invariant explicit and catch future edits that add new CONTESTED paths without reason messages.

Recommendation: APPROVE. Conjunctive rule is correctly implemented, DUAL_REQUIRES_LOCALS_AGREE is read live from the module-level constant, boundary refusals are correct, tests cover all spec rows. No mocks, clean commit hygiene.

TimothyVang added 2 commits May 2, 2026 08:18

This was referenced May 2, 2026

feat(verification): Universal Self-Consistency judge [W3.A.3] #47

Draft

feat(verification): AirGapCrossEngine [W3.A.1] #37

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(verification): DualLaneCrossEngine three-way verification [W3.A.2]#42

feat(verification): DualLaneCrossEngine three-way verification [W3.A.2]#42
TimothyVang wants to merge 2 commits into
feat/W3.A.1-airgap-cross-enginefrom
feat/W3.A.2-dual-lane-cross-engine

TimothyVang commented May 2, 2026

Uh oh!

TimothyVang commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

TimothyVang commented May 2, 2026

Summary

Test plan

Uh oh!

TimothyVang commented May 2, 2026

Review — W3.A.2 DualLaneCrossEngine [automated reviewer, tier-1]

Consensus-logic correctness

Finding: _jaccard duplication

Cloud-family boundary check

§3.7 commit discipline

§3.8 dependencies

§3.10 no-mocks

Actionable items

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Finding: `_jaccard` duplication