Skip to content

feat(graph): planner_critique_node wired + critique_verdict ledger event [W2.D.2]#45

Open
TimothyVang wants to merge 2 commits into
feat/W2.D.1-planner-critique-covefrom
feat/W2.D.2-critique-verdict-schema-ledger
Open

feat(graph): planner_critique_node wired + critique_verdict ledger event [W2.D.2]#45
TimothyVang wants to merge 2 commits into
feat/W2.D.1-planner-critique-covefrom
feat/W2.D.2-critique-verdict-schema-ledger

Conversation

@TimothyVang

Copy link
Copy Markdown
Owner

Summary

  • Adds PlannerCritiqueVerdict schema invariant: route="planner" + failed_questions=[] raises ValidationError (loopback with no reason = infinite loop, W2.D.2 load-bearing).
  • Adds LedgerEntry schema (verdict/schemas/ledger.py) with all 13 event types including "critique_verdict", three-tier ID hierarchy, NIST SP 800-86 exam-env metadata.
  • Adds InMemoryLedger (verdict/ledger/memory.py) — real Ledger Protocol implementation (not a mock, §3.10 compliant) backed by a list with blake3 derive_key HMAC chain.
  • Adds planner_critique_node(plan, ledger) (verdict/graph/nodes.py) — writes critique_verdict LedgerEntry with route + all_questions + failed_questions + hint in payload.

Test plan

  • pytest tests/planning/test_planner_critique_verdict.py — 8 tests GREEN
  • All 17 planning tests GREEN (W2.D.1 + W2.D.2)
  • ruff check verdict/ tests/planning/ — clean
  • RED commit precedes GREEN commit

TimothyVang added 2 commits May 2, 2026 08:20
…t RED [W2.D.2]

Failing tests per BUILD_PLAN W2.D.2.a:
- test_schema_rejects_loopback_with_empty_failed_questions: route="planner"
  + failed_questions=[] must raise ValidationError (W2.D.2 invariant)
- test_schema_accepts_loopback_with_non_empty_failed_questions: valid path
- test_schema_accepts_advance_with_empty_failed_questions: comprehension_gate OK
- test_schema_rejects_unknown_route: Literal enforcement
- test_planner_critique_node_emits_critique_verdict_event_on_pass: ledger
  entry event_type="critique_verdict" after node runs on valid plan
- test_planner_critique_node_emits_critique_verdict_event_on_fail: same for
  loopback case with failed_questions in payload
- test_planner_critique_node_sets_case_id_in_ledger_entry
- test_planner_critique_node_returns_plandner_critique_verdict

Currently RED: verdict.graph.nodes + verdict.ledger.memory missing.
…ion_gate [W2.D.2]

Adds:
- verdict/schemas/ledger.py: LedgerEntry with EventType (13 types including
  "critique_verdict"), three-tier ID hierarchy, HMAC chain fields,
  NIST SP 800-86 exam-env metadata (W1.B.11 from branch)
- verdict/ledger/memory.py: InMemoryLedger — real Ledger Protocol impl
  backed by a list; blake3 derive_key HMAC; full LedgerEntry construction
  (not a mock; §3.10 compliant)
- verdict/ledger/Ledger Protocol (structural, runtime_checkable)
- verdict/graph/nodes.py: planner_critique_node(plan, ledger) — runs
  critique_plan, writes critique_verdict LedgerEntry with route +
  all_questions + failed_questions + hint in payload

Schema invariant (W2.D.2): PlannerCritiqueVerdict(route="planner",
failed_questions=[]) raises ValidationError — loopback with no reason
is an infinite loop.

All 17 planning tests GREEN; ruff clean.
@TimothyVang

Copy link
Copy Markdown
Owner Author

W2.D.2 — PlannerCritiqueVerdict schema + critique_verdict ledger event

CI gate: 17/17 GREEN (9 from W2.D.1 + 8 new). Ruff: clean.

PASS — hard rules

Check Status
PlannerCritiqueVerdict(route="planner", failed_questions=[]) raises ValidationError PASS
PlannerCritiqueVerdict(route="comprehension_gate", failed_questions=[]) valid PASS
Unknown route value raises ValidationError PASS
LedgerEntry schema with all 13 EventType values PASS
13 event types confirmed: case_init, tool_call, finding, approval, rejection, mode_lock, comprehension_check, critique_verdict, pivot, exhausted_replan, evidence_hash_recheck, sandbox_failure, planner_cot PASS
planner_critique_node writes critique_verdict LedgerEntry with route, all_questions, failed_questions, hint in payload PASS
LedgerEntry.case_id matches plan.case_id PASS
InMemoryLedger uses blake3 derive_key_context for HMAC chain, not unittest.mock PASS
§3.7 RED 0428f0b precedes GREEN 71519dc, both carry [W2.D.2] PASS
§3.10 InMemoryLedger is a real Ledger Protocol implementation, not a mock PASS

ISSUE — planner_critique_node hardcodes mode="cloud" in ledger write

verdict/graph/nodes.py line:

active_ledger.write(
    ...
    mode="cloud",  # mode is locked at case_init; W3 wires the real mode
)

The comment acknowledges this is deferred. However, a ledger entry written with the wrong mode violates §3.4 mode lock: LedgerEntry.mode_at_case_init is supposed to be the mode from case_init, not hardcoded. When the W3 wiring lands, this will silently corrupt air-gap and dual ledger chains if the fixup isn't tracked. The planner_critique_node signature should accept a mode: Mode parameter now and propagate it, even if the caller passes the same hardcoded default:

def planner_critique_node(
    plan: InvestigationPlan,
    *,
    ledger: Ledger | None = None,
    config: CritiqueConfig | None = None,
    mode: Mode = "cloud",   # injected by graph state at case_init
) -> PlannerCritiqueVerdict:

This is non-blocking for W2.D.2 merge if a follow-up task is logged, but the signature change has zero risk and should be done now.

MINOR — _make_entry_id uses SHA-256 prefix (16 hex chars = 8 bytes) vs ULID

The docstring correctly notes "Production ledger uses ULID." The 16-hex-char SHA-256 prefix is collision-safe for test volumes but the comment should say "tests only; production uses ULID from python-ulid or ulid2" so no future contributor copies it into JournalLedger.

COSMETIC — Function name typo in test

test_planner_critique_node_returns_plandner_critique_verdictplandner should be planner. The test is collected and passes correctly; this is a readability issue only.

CARRY-OVER from #44EVTX_4624 and USNJRNL still absent from ArtifactClass

Not introduced by this PR, but not fixed either. Must be resolved before the caveat-validator PRs (W1.B.9/W1.B.10) merge.

Non-blocking overall if (a) mode parameter follow-up is logged and (b) the plandner typo is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant