Verifier FAIL verdicts don't block spec completion or trigger corrective action #517

@mickume

Problem

The verifier archetype produces PASS/FAIL verdicts against spec requirements, but the engine treats verifier session completion as unconditional success. A spec can have all its verifier verdicts set to FAIL and still be marked completed with no corrective action taken.

Verifier nodes are currently decorative: they produce a report stored in verification_results, but the engine never acts on its content.

Observed Impact

During a parking-fee-service orchestration run (9 specs, 83 plan nodes, $160 total spend):

04_cloud_gateway_client completed all 10 groups plus its verifier node (04_cloud_gateway_client:10). The verifier produced 6 FAIL and 18 PASS verdicts. All 6 failures share the same root cause: the Rust service writes logs to stdout via tracing_subscriber, but smoke/integration tests assert on stderr.

Despite these failures, the engine:

  • Marked the verifier node status as completed with reason "session completed successfully"
  • Marked the entire spec as done
  • Took no corrective action (no retry, no blocking, no issue filed)
  • Spent $23.40 on 11 sessions for this spec, only for the final verification to show that 25% of the checked requirements (6 of 24) are unmet

Similarly, 01_project_setup had 3 FAIL verdicts (phantom test files claimed in tasks.md but never created), also with no engine response.

Root Cause

In result_handler.py, blocking evaluation explicitly excludes the verifier archetype:

# result_handler.py ~line 123
elif archetype not in ("skeptic", "oracle"):
    return BlockDecision(should_block=False)

Only skeptic and oracle trigger evaluate_review_blocking(). The verifier is classified as a review archetype for knowledge extraction (it's in _REVIEW_ARCHETYPES in session_lifecycle.py), but is excluded from the blocking code path.

The _handle_success() method (~lines 448-487) checks session status (did the process run?) but never verdict content (did the requirements pass?). There is no bridge from verification_results FAIL rows back into the scheduling/blocking machinery.

Call chain on verifier completion

_handle_success()
  → mark_completed(node_id)        # unconditional
  → check_skeptic_blocking()       # only for skeptic/oracle
  → (nothing for verifier)
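
The gating behaviour can be reproduced in isolation. The following is a minimal sketch, not the actual result_handler.py code: BlockDecision and its should_block field come from the snippet above, while BLOCKING_ARCHETYPES, evaluate_blocking() and the has_blocking_findings parameter are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDecision:
    should_block: bool
    reason: str = ""


# Assumption: mirrors the tuple in the result_handler.py excerpt.
BLOCKING_ARCHETYPES = ("skeptic", "oracle")


def evaluate_blocking(archetype: str, has_blocking_findings: bool) -> BlockDecision:
    """Toy model of the gate: archetypes outside the tuple can never block."""
    if archetype not in BLOCKING_ARCHETYPES:
        # The verifier lands here, so its findings are never inspected.
        return BlockDecision(should_block=False)
    if has_blocking_findings:
        return BlockDecision(should_block=True, reason="blocking findings")
    return BlockDecision(should_block=False)


if __name__ == "__main__":
    # Same findings, different archetype, different outcome.
    print(evaluate_blocking("skeptic", True).should_block)
    print(evaluate_blocking("verifier", True).should_block)
```

A check of this shape could also serve as a regression test once the verifier is wired into the blocking path.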

Supporting evidence from the database

-- blocking_history only has reviewer entries, never verifier
SELECT DISTINCT archetype FROM blocking_history;
-- → 'reviewer'

-- All verifier nodes marked completed regardless of verdicts
SELECT id, status FROM plan_nodes WHERE archetype = 'verifier';
-- → all 'completed'

-- But verification_results has unaddressed FAILs
SELECT spec_name, verdict, count(*) FROM verification_results
WHERE superseded_by IS NULL GROUP BY spec_name, verdict;
-- 04_cloud_gateway_client: FAIL=6, PASS=18
-- 01_project_setup: FAIL=3, PASS=7
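
The last query is the one the engine would need to run itself. As a rough sketch, here it is wrapped in a helper and exercised against an in-memory database. sqlite3 is used purely as a stand-in (the issue does not say which database the engine uses), the schema is reduced to the three columns named in the query, and open_fail_counts() and build_demo_db() are hypothetical helpers.

```python
import sqlite3


def open_fail_counts(conn: sqlite3.Connection) -> dict:
    """Count FAIL verdicts per spec that no later result has superseded."""
    rows = conn.execute(
        """
        SELECT spec_name, COUNT(*)
        FROM verification_results
        WHERE superseded_by IS NULL AND verdict = 'FAIL'
        GROUP BY spec_name
        ORDER BY spec_name
        """
    ).fetchall()
    return dict(rows)


def build_demo_db() -> sqlite3.Connection:
    # Assumed minimal schema; the real table almost certainly has more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE verification_results ("
        "spec_name TEXT, verdict TEXT, superseded_by INTEGER)"
    )
    # Row counts taken from the query output quoted in this issue.
    rows = (
        [("04_cloud_gateway_client", "FAIL", None)] * 6
        + [("04_cloud_gateway_client", "PASS", None)] * 18
        + [("01_project_setup", "FAIL", None)] * 3
        + [("01_project_setup", "PASS", None)] * 7
    )
    conn.executemany("INSERT INTO verification_results VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(open_fail_counts(build_demo_db()))
```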

Unimplemented Spec Requirements (26-REQ-9)

| Requirement | Expected Behavior | Current State |
|---|---|---|
| 26-REQ-9.2 | FAIL verdict → file GitHub issue | Not implemented |
| 26-REQ-9.3 | FAIL verdict + retry_predecessor=true → reset coder node for retry | Only triggers on session crash, not on FAIL verdicts |
| 26-REQ-9.4 | Retry cycle limit enforcement | Depends on 9.3, inert |

Suggested Fix

After a verifier session completes in _handle_success():

  1. Query the freshly-persisted verification_results for FAIL verdicts
  2. If any FAIL verdicts exist and the predecessor coder node has retry_predecessor=true and retries remaining: reset the coder node to pending and mark the verifier back to pending
  3. If retries are exhausted or retry_predecessor=false: record a blocking decision in blocking_history and (per 26-REQ-9.2) file a GitHub issue summarizing the unmet requirements
  4. Only mark the verifier node as completed when all verdicts are PASS, or when the retry/issue-filing path has been executed

This keeps the existing blocking infrastructure intact and extends it to cover verifier verdicts alongside skeptic/oracle findings.
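
The decision logic in steps 1 to 4 might look roughly like the sketch below. It is a pure function so it can be tested without the engine. Everything here is hypothetical: VerifierAction, Verdict, decide_verifier_action() and the retries_remaining parameter are illustrative names, not existing agent_fox APIs. The caller in _handle_success() would still need to perform the node resets, the blocking_history write and the issue filing.

```python
from dataclasses import dataclass
from enum import Enum


class VerifierAction(Enum):
    COMPLETE = "complete"                  # all verdicts PASS
    RETRY_PREDECESSOR = "retry"            # reset coder and verifier to pending
    BLOCK_AND_FILE_ISSUE = "block"         # record blocking decision, file issue


@dataclass(frozen=True)
class Verdict:
    requirement_id: str
    verdict: str  # "PASS" or "FAIL"


def decide_verifier_action(verdicts, retry_predecessor, retries_remaining):
    """Map verifier verdicts to an action; returns (action, failed_verdicts)."""
    failed = [v for v in verdicts if v.verdict == "FAIL"]
    if not failed:
        return VerifierAction.COMPLETE, failed
    if retry_predecessor and retries_remaining > 0:
        return VerifierAction.RETRY_PREDECESSOR, failed
    # Retries exhausted or retry_predecessor is false.
    return VerifierAction.BLOCK_AND_FILE_ISSUE, failed


if __name__ == "__main__":
    # Requirement IDs below are made up for the example.
    sample = [Verdict("REQ-A", "PASS"), Verdict("REQ-B", "FAIL")]
    action, failed = decide_verifier_action(
        sample, retry_predecessor=True, retries_remaining=1
    )
    print(action.name, [v.requirement_id for v in failed])
```

Returning the failed verdicts alongside the action gives the issue-filing path the list of unmet requirements without a second query.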

Key Files

  • agent_fox/engine/result_handler.py — blocking exclusion at ~line 123, success handling at ~line 448
  • agent_fox/engine/review_persistence.py — verdict parsing and storage, audit logging without action
  • agent_fox/engine/blocking_history.py — only tracks skeptic/oracle decisions
  • agent_fox/session/session_lifecycle.py — verifier in _REVIEW_ARCHETYPES but not in blocking path
