Skip to content

feat(verification): DualLaneCrossEngine three-way verification [W3.A.2]#42

Draft
TimothyVang wants to merge 2 commits into
feat/W3.A.1-airgap-cross-enginefrom
feat/W3.A.2-dual-lane-cross-engine
Draft

feat(verification): DualLaneCrossEngine three-way verification [W3.A.2]#42
TimothyVang wants to merge 2 commits into
feat/W3.A.1-airgap-cross-enginefrom
feat/W3.A.2-dual-lane-cross-engine

Conversation

@TimothyVang

Copy link
Copy Markdown
Owner

Summary

W3.A.2 implementation of DualLaneCrossEngine — dual-mode quorum strategy
that runs THREE engines in parallel (cloud Claude, Qwen3, GLM-4.5-Air)
and accepts the finding only under the conjunctive rule:

cloud agrees with >=1 local  AND  locals agree with each other.

Both clauses are required (CLAUDE.md §8 / ARCHITECTURE.md §1 rows 6-8).

  • compute_verdict(cloud=, qwen=, glm=) is the pure consensus surface.
    • cloud agrees with >=1 local AND locals agree -> VETTED_DUAL
    • cloud disagrees with both locals -> CONTESTED
    • cloud agrees with 1 local but locals disagree -> CONTESTED
  • "Agree" predicate is identical to air-gap: same mitre_technique AND
    Jaccard(artifact_paths) >= AIRGAP_JACCARD_THRESHOLD. Reuses the
    threshold so dual mode = "air-gap quorum on each pair + cloud-anchor".
  • DUAL_REQUIRES_LOCALS_AGREE flag is read from strategy.py per call.
    Load-bearing: flipping to False would silently weaken dual to
    "cloud + cherry-picked local".
  • Empty-set rule carries: empty artifact_paths from any participant
    is DISAGREEMENT for every pair involving it.
  • Boundary refusals (raise ValueError):
    • cloud slot must be a cloud-family output
    • local pair must be CROSS-engine (refuses qwen vs qwen)
  • VerdictResult.notes records all three engines on vetted and
    clause-failure reason on contested for ledger audit + replan_node
    disagreement-type routing.

verify(...) raises NotImplementedError until W2.B wires the cloud
Claude client + two SGLang clients + ledger plumbing. CLAUDE.md §3.10
explicitly permits this backend-level stub: the consensus logic is
real and exercised by 15 unit tests against EngineOutput records.

Builds on PR #37 (feat/W3.A.1-airgap-cross-engine) — base set to
that branch so this PR's diff shows only the W3.A.2 surface (one new
file: verdict/verification/dual_lane_cross_engine.py + its test).

Test plan

  • tests/verification/test_dual_lane_cross_engine.py — 15 tests pass
  • Full verification suite green (45/45)
  • ruff check verdict/ tests/ clean
  • RED commit lands first, GREEN commit lands second; per-task-ID
    commit subjects with [W3.A.2] per CLAUDE.md §3.7
  • After W2.B: replace NotImplementedError in verify() with real
    transport that calls compute_verdict(...) with three rendered
    EngineOutput records

TimothyVang added 2 commits May 2, 2026 08:18
…A.2]

Failing tests for the dual-mode three-way verification strategy
(ARCHITECTURE.md §1 quorum-dispatch table rows 6-8; CLAUDE.md §8):

- cloud agrees with >=1 local AND locals agree with each other ->
  VETTED_DUAL
- cloud disagrees with both locals -> CONTESTED
- cloud agrees with 1 local but locals disagree -> CONTESTED
  (DUAL_REQUIRES_LOCALS_AGREE is the load-bearing second clause; without
  it dual-mode silently degrades to "cloud + cherry-picked local")
- mitre-divergence variants (locals-divergent and cloud-divergent both
  -> CONTESTED)
- empty-set rule carries: empty cloud blocks cloud-vs-local agreement;
  empty local blocks the locals-agree clause -> CONTESTED in both cases
- DUAL_REQUIRES_LOCALS_AGREE flag MUST be consulted (no embedded True)
- engine-family distinctness on the local pair: two Qwen3 locals
  collapses locals-agree to self-consistency and is rejected
- the cloud slot must be a cloud-family output (defensive boundary
  refusal at the dispatch layer)
- VerdictResult.notes records all three engines (vetted) and the
  clause-failure reason (contested) for ledger audit + replan_node
  routing

The pure consensus surface is compute_verdict(cloud, qwen, glm) on
EngineOutput records (shared with W3.A.1). verify(...) raises
NotImplementedError until W2.B wires the cloud Claude client + two
SGLang clients (CLAUDE.md §3.10 backend stub permission).

Module under test does not yet exist; collection ERRORs with
ModuleNotFoundError. GREEN follows in the next commit.
Implements the dual-mode quorum strategy
(ARCHITECTURE.md §1 quorum-dispatch table rows 6-8; CLAUDE.md §8):

- compute_verdict(cloud, qwen, glm) is the pure consensus surface.
  - cloud agrees with >=1 local AND locals agree with each other ->
    VETTED_DUAL.
  - cloud disagrees with both locals -> CONTESTED.
  - cloud agrees with 1 local but locals disagree -> CONTESTED.
- "Agree" predicate is identical to air-gap mode: identical
  mitre_technique AND Jaccard(artifact_paths) >=
  AIRGAP_JACCARD_THRESHOLD (read from strategy.py — never hard-coded).
  Reusing the threshold means dual mode is "air-gap quorum on each
  pair plus a cloud-anchor requirement" — one knob, one place.
- DUAL_REQUIRES_LOCALS_AGREE flag is read from strategy.py per call
  (load-bearing — flipping it to False would silently weaken
  dual-mode to "cloud + cherry-picked local" and discard the
  locals-agree clause CLAUDE.md §8 names).
- Empty-set rule (ARCHITECTURE.md §1) carries: any participant with
  empty artifact_paths is DISAGREEMENT for every pair involving it.
  Empty cloud blocks cloud-anchor; empty local blocks locals-agree.
- Boundary refusals at the dispatch layer:
  - cloud slot must be a cloud-family output (defensive — a
    misrouted Qwen3 in the cloud slot would silently mislabel
    VETTED_DUAL).
  - local pair must be CROSS-engine (two Qwen3 collapses
    locals-agree to self-consistency and breaks the air-gap
    independence guarantee that dual mode inherits).
  Both raise ValueError so programming errors surface immediately
  rather than producing an untrustworthy verdict.
- VerdictResult.notes records all three engines on vetted, and the
  clause-failure reason on contested, so the ledger has the full
  audit handle and replan_node can route on disagreement type.

verify(...) raises NotImplementedError until W2.B wires the cloud
Claude client + the two SGLang clients + ledger plumbing
(CLAUDE.md §3.10 explicitly permits this backend-level stub: the
consensus logic is real and exercised via compute_verdict; the
forbidden pattern is mocking *internal* logic).

15/15 tests in tests/verification/test_dual_lane_cross_engine.py pass;
45/45 in the full verification suite pass; ruff clean.
@TimothyVang

Copy link
Copy Markdown
Owner Author

Review — W3.A.2 DualLaneCrossEngine [automated reviewer, tier-1]

CI result (local run on feat/W3.A.2-dual-lane-cross-engine): 45/45 tests pass (30 inherited from #37 + 15 new). ruff check verdict/verification/ clean.


Consensus-logic correctness

All spec invariants pass:

Invariant Location Verdict
VETTED_DUAL iff cloud-anchor AND locals-agree compute_verdict lines 107–126 PASS
cloud disagrees with both locals → CONTESTED lines 128–148 PASS
cloud agrees with 1 local but locals disagree → CONTESTED lines 128–148 PASS
DUAL_REQUIRES_LOCALS_AGREE read from _strategy per call, not embedded line 110 PASS
Empty artifact_paths from any participant → DISAGREEMENT for every pair involving it _pair_agrees lines 54–58 PASS
Cloud slot must be claude-family, raises ValueError otherwise lines 86–95 PASS
Local pair must be cross-family, raises ValueError on same-family lines 96–106 PASS
VerdictResult.notes records all three engines on VETTED, clause-failure reason on CONTESTED lines 116–126, 130–148 PASS

Spec compliance on the DUAL_REQUIRES_LOCALS_AGREE gate: The implementation reads require_locals_agree = _strategy.DUAL_REQUIRES_LOCALS_AGREE at call time (line 110) and then uses it in if cloud_anchor and (locals_agree or not require_locals_agree):. This correctly gates the second clause. The test test_dual_requires_locals_agree_flag_is_consulted asserts the flag is True and that cloud-agrees-with-1-local-only still produces CONTESTED. This is the most security-load-bearing test in this suite — a silently-degraded dual mode is worse than a rejecting one.

_pair_agrees mitre-first short-circuit: The comment explains this is a cheap-path optimisation. Correct — it also happens to be the right semantic order (technique disagreement is the interpretation-level divergence; Jaccard is the corroboration-level check).

Finding: _jaccard duplication

_jaccard is copied verbatim from airgap_cross_engine.py. Both copies are correct and identical today. The comment in dual_lane_cross_engine.py explicitly acknowledges this: "Identical helper to airgap_cross_engine._jaccard; duplicated rather than imported across module boundaries to keep each strategy's consensus rule self-contained." This is an intentional design choice. Acceptable for W3.A.2. A follow-up refactor into a shared utility (e.g. verdict/verification/_consensus_utils.py) would eliminate drift risk at W4+ when more strategies may be added. Registering as a non-blocking observation.

Cloud-family boundary check

The cloud-slot guard checks cloud.family() != "claude". This is a hard-coded family prefix. It is the right check for the current model set (claude-opus-4-5, etc.) and matches the family() implementation. The same forward-compatibility note from #37 applies: a future Claude model named without a hyphen would return its full name from family() and pass. Low risk.

§3.7 commit discipline

Two commits, RED then GREEN, both carry [W3.A.2]. Subjects use test(verification): and feat(verification): — allowed. No --no-verify. No watermarks. The RED commit subject was truncated at the 72-char limit in the GitHub display (test(verification): DualLaneCrossEngine consensus invariants RED [W3.…), but the full message in the commit body is clean.

§3.8 dependencies

No new dependencies.

§3.10 no-mocks

No Mock / patch / replay library in the diff. verify() raises NotImplementedError at the network boundary.

Actionable items

  1. (Non-blocking) Extract _jaccard to a shared module before W4 to prevent silent divergence if the threshold semantics change. A # NOTE: deliberately duplicated from airgap_cross_engine._jaccard comment (already present) is sufficient for now.
  2. (Nit, non-blocking) The CONTESTED notes-builder has a potential empty-string edge: "; ".join(reason_parts) is "" if both branches somehow don't append — this is unreachable given the current logic (at least one of not cloud_anchor or cloud_anchor and require_locals_agree and not locals_agree must be true to reach the CONTESTED return), but a defensive assert reason_parts would make the invariant explicit and catch future edits that add new CONTESTED paths without reason messages.

Recommendation: APPROVE. Conjunctive rule is correctly implemented, DUAL_REQUIRES_LOCALS_AGREE is read live from the module-level constant, boundary refusals are correct, tests cover all spec rows. No mocks, clean commit hygiene.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant