fix(orchestration): PlanVerifier timeout arms increment consecutive_failures but skip the >= 3 escalation check #3868

@bug-ops

Description

PR #3860 added timeout arms to PlanVerifier::verify, verify_plan, and replan/replan_from_plan. The timeout arms correctly increment self.consecutive_failures (fail-open policy), but unlike the Ok(Err(e)) arms they skip the >= 3 consecutive-failure escalation check.

As a result, a misconfigured or persistently overloaded verify_provider that always times out will fail open indefinitely: the operator sees repeated warn! entries but never an error! escalation, even though the consecutive-failure counter exists precisely to surface this condition.
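A simplified model of the three match arms makes the gap concrete. This is a hypothetical sketch, not the zeph-orchestration source: it assumes the provider call is wrapped in a timeout that yields Result<Result<T, E>, Elapsed> (the shape implied by the Ok(Err(e)) arm), it models Elapsed and the warn!/error! macros with plain types, and it assumes a successful verification resets the counter.

```rust
/// Stand-in for the timeout wrapper's error (hypothetical; the real code
/// presumably uses an async timeout such as tokio::time::timeout).
#[derive(Debug)]
struct Elapsed;

/// Hypothetical, simplified model of PlanVerifier's failure bookkeeping.
struct PlanVerifier {
    consecutive_failures: u32,
    /// (level, message) pairs standing in for warn!/error! calls.
    logs: Vec<(&'static str, String)>,
}

impl PlanVerifier {
    /// Mirrors the three match arms described in the issue.
    fn handle(&mut self, result: Result<Result<String, String>, Elapsed>) -> Option<String> {
        match result {
            Ok(Ok(verdict)) => {
                // Assumption: a success resets the consecutive counter.
                self.consecutive_failures = 0;
                Some(verdict)
            }
            Ok(Err(e)) => {
                self.consecutive_failures += 1;
                self.logs.push(("warn", format!("verification failed: {e}")));
                // LLM-error arm: escalates once the threshold is reached.
                if self.consecutive_failures >= 3 {
                    self.logs.push((
                        "error",
                        format!("{} consecutive failures; check verify_provider", self.consecutive_failures),
                    ));
                }
                None // fail open
            }
            Err(Elapsed) => {
                // BUG: the counter is incremented, but the >= 3 check is missing.
                self.consecutive_failures += 1;
                self.logs.push(("warn", "verification timed out".to_string()));
                None // fail open
            }
        }
    }
}

fn main() {
    let mut verifier = PlanVerifier { consecutive_failures: 0, logs: Vec::new() };
    // Three consecutive timeouts, as in the reproduction.
    for _ in 0..3 {
        let _ = verifier.handle(Err(Elapsed));
    }
    println!("consecutive_failures = {}", verifier.consecutive_failures);
    for (level, msg) in &verifier.logs {
        println!("{level}: {msg}");
    }
}
```

After three timeouts the counter reads 3, yet every log entry is a warning; three LLM errors fed through the same model would produce an error entry on the third call.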

Reproduction Steps

  1. Configure verify_provider to a slow provider where every call exceeds verifier_timeout_secs.
  2. Execute a plan with 3+ tasks.
  3. Observe the logs: each task produces only a warn! entry, and no error! appears despite 3+ consecutive failures.
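The condition in step 1 (every call exceeding the deadline) can be reproduced deterministically without a real slow provider. The sketch below uses only the standard library: a thread that sleeps longer than the deadline stands in for verify_provider, and mpsc::Receiver::recv_timeout stands in for the async timeout. slow_provider_call and count_timeouts are hypothetical helpers, not part of zeph-orchestration.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a verify_provider call that always takes `delay`.
fn slow_provider_call(delay: Duration) -> mpsc::Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        // The receiver may already be dropped after a timeout; ignore the send error.
        let _ = tx.send("verdict".to_string());
    });
    rx
}

/// Runs `tasks` verifications against `timeout` and returns how many timed out.
fn count_timeouts(tasks: usize, delay: Duration, timeout: Duration) -> usize {
    let mut timeouts = 0;
    for _ in 0..tasks {
        let rx = slow_provider_call(delay);
        match rx.recv_timeout(timeout) {
            Ok(_) => {}
            Err(RecvTimeoutError::Timeout) => timeouts += 1,
            // The worker thread died without replying; not counted as a timeout here.
            Err(RecvTimeoutError::Disconnected) => {}
        }
    }
    timeouts
}

fn main() {
    // Every call takes 200 ms against a 20 ms deadline, as in step 1.
    let timed_out = count_timeouts(3, Duration::from_millis(200), Duration::from_millis(20));
    println!("{timed_out} of 3 verifications timed out");
}
```

The real verifier is async, so an actual regression test would presumably drive it through the project's runtime with a mock provider; this only illustrates the "every call times out" setup.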

Expected Behavior

After 3 or more consecutive failures, whether timeouts or LLM errors, the verifier should emit an error! log (as the existing Ok(Err(e)) path does) advising the operator to check the verify_provider configuration.
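One way to get this behavior is to route every failure arm through a single helper that both increments the counter and applies the threshold check, so the timeout and LLM-error paths cannot drift apart again. The sketch below is hypothetical: record_failure, ESCALATION_THRESHOLD, the Level enum and the in-memory log stand in for whatever the real verifier uses, and resetting the counter on success is an assumption about the existing semantics.

```rust
/// Hypothetical stand-in for the timeout wrapper's Elapsed error.
#[derive(Debug)]
struct Elapsed;

/// Threshold from the issue: escalate after 3 consecutive failures.
const ESCALATION_THRESHOLD: u32 = 3;

#[derive(Debug, PartialEq)]
enum Level {
    Warn,
    Error,
}

struct PlanVerifier {
    consecutive_failures: u32,
    /// (level, message) pairs standing in for warn!/error! calls.
    logs: Vec<(Level, String)>,
}

impl PlanVerifier {
    fn new() -> Self {
        PlanVerifier { consecutive_failures: 0, logs: Vec::new() }
    }

    /// Single place that counts a failure and applies the escalation check.
    fn record_failure(&mut self, reason: &str) {
        self.consecutive_failures += 1;
        self.logs.push((Level::Warn, format!("verification failed ({reason}); failing open")));
        if self.consecutive_failures >= ESCALATION_THRESHOLD {
            self.logs.push((
                Level::Error,
                format!(
                    "{} consecutive verification failures; check verify_provider configuration",
                    self.consecutive_failures
                ),
            ));
        }
    }

    fn handle(&mut self, result: Result<Result<String, String>, Elapsed>) -> Option<String> {
        match result {
            Ok(Ok(verdict)) => {
                // Assumption: success resets the consecutive counter.
                self.consecutive_failures = 0;
                Some(verdict)
            }
            Ok(Err(e)) => {
                self.record_failure(&format!("LLM error: {e}"));
                None // fail open
            }
            Err(Elapsed) => {
                // Timeouts now go through the same threshold check.
                self.record_failure("timed out");
                None // fail open
            }
        }
    }
}

fn main() {
    let mut verifier = PlanVerifier::new();
    // A mix of failure kinds counts toward the same threshold.
    let _ = verifier.handle(Err(Elapsed));
    let _ = verifier.handle(Ok(Err("rate limited".to_string())));
    let _ = verifier.handle(Err(Elapsed));
    for (level, msg) in &verifier.logs {
        println!("{level:?}: {msg}");
    }
}
```

Because timeouts and LLM errors share one counter and one check, any run of three consecutive failures escalates, regardless of which kinds of failure it contains.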

Actual Behavior

Timeout arms increment consecutive_failures but never branch on the >= 3 threshold; only LLM errors trigger the escalation.

Affected Code

crates/zeph-orchestration/src/verifier.rs lines ~181–188 (verify) and ~319–326 (verify_plan).

Environment

Metadata

Assignees

Labels

P3 (Research, medium-high complexity), bug (Something isn't working)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
