Note: This issue was created by an agent (Claude) during a codebase review against multi-agent system design literature.
Problem 1: Templates decompose by role, not by context
Built-in templates (e.g. `feature-dev`, `review-loop`) split work by job title: planner, developer, reviewer. The downstream agents don't carry the upstream agent's reasoning, only its output string.
In practice: the reviewer in `feature-dev` sees what was implemented but not why. If the implementer deviated from the plan, the reviewer cannot detect it. The tester has no idea what the plan said. Context is lost at every handoff.
This is the core failure mode described in multi-agent design literature: role-based decomposition creates invisible context gaps. The right decomposition is by what information a step actually needs, not by what role is executing it.
What fixing this enables: reviewers and downstream steps that can reason about intent vs. outcome, not just output vs. output.
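One way to close the gap is to decompose steps by the information they need rather than by role. A minimal sketch of that idea follows; the `needs` field and the pipeline runner are illustrative assumptions, not the project's actual `WorkflowStep` schema.

```typescript
// Hypothetical sketch: each step declares which upstream outputs it needs,
// so intent (the plan) survives the handoff to downstream steps.
interface Step {
  id: string;
  needs: string[]; // upstream step ids whose outputs this step must see
  run: (ctx: Record<string, string>) => string;
}

function runPipeline(steps: Step[]): Record<string, string> {
  const outputs: Record<string, string> = {};
  for (const step of steps) {
    // Build this step's context from exactly the outputs it declared.
    const ctx: Record<string, string> = {};
    for (const dep of step.needs) {
      if (!(dep in outputs)) throw new Error(`${step.id} needs missing output: ${dep}`);
      ctx[dep] = outputs[dep];
    }
    outputs[step.id] = step.run(ctx);
  }
  return outputs;
}

// Toy run: because the reviewer declares a dependency on the plan,
// it can compare intent vs. outcome instead of seeing only the diff.
const result = runPipeline([
  { id: "plan", needs: [], run: () => "use exponential backoff" },
  { id: "implement", needs: ["plan"], run: () => "retry loop added; backoff is constant" },
  {
    id: "review",
    needs: ["plan", "implement"],
    // Crude intent check for illustration: does the implementation
    // mention every term the plan asked for?
    run: (c) =>
      c["plan"].split(" ").every((w) => c["implement"].includes(w))
        ? "ok"
        : "deviation from plan",
  },
]);
```

In the role-based version, the reviewer's context would contain only the implementation string and the deviation would be undetectable.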
Problem 2: The loop in `review-loop` is a logical fiction
`address-feedback` has `retries: 2, maxIterations: 3`, but these settings only re-run `address-feedback` in isolation. The reviewers run exactly once. After `address-feedback` makes changes, no re-review happens.
The template is structured as an evaluator-optimizer loop but executes as a single-pass pipeline. There is no primitive in `WorkflowStep` to express "re-run this group of steps until a condition is met."
What fixing this enables: a genuine quality-convergence loop — implement, review, fix, re-review — that terminates when reviewers pass, not when an iteration count expires.
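The missing primitive can be sketched as a loop over a group of steps whose exit condition is the reviewers' verdict, with the iteration cap as a safety bound only. The `LoopGroup` shape and field names below are assumptions for illustration, not the existing schema.

```typescript
// Hypothetical loop-group primitive: re-run review + fix until reviewers
// pass or a safety cap is hit. Termination is quality-driven, not count-driven.
interface LoopGroup<S> {
  maxIterations: number; // safety bound, not the normal exit condition
  review: (state: S) => { passed: boolean; feedback: string };
  addressFeedback: (state: S, feedback: string) => S;
}

function runLoop<S>(
  group: LoopGroup<S>,
  initial: S
): { state: S; passed: boolean; iterations: number } {
  let state = initial;
  for (let i = 1; i <= group.maxIterations; i++) {
    // Re-review every iteration -- this is what the single-pass pipeline lacks.
    const verdict = group.review(state);
    if (verdict.passed) return { state, passed: true, iterations: i };
    state = group.addressFeedback(state, verdict.feedback);
  }
  // Cap exhausted: report the final review result honestly.
  return { state, passed: group.review(state).passed, iterations: group.maxIterations };
}

// Toy convergence: state counts open issues; each fix resolves one.
const out = runLoop<number>(
  {
    maxIterations: 5,
    review: (issues) => ({ passed: issues === 0, feedback: `${issues} issues left` }),
    addressFeedback: (issues, _feedback) => issues - 1,
  },
  2
);
```

Starting from 2 open issues, the loop reviews, fixes, and re-reviews, exiting on the third review when the count reaches zero, well before the iteration cap.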