feat: composer workflow restructure and factory hardening #126

Merged
FrkAk merged 47 commits into main from worktree-composer-workflow-restructure on Jun 19, 2026

@FrkAk FrkAk commented Jun 12, 2026

Summary

Task Reference: [MYMR-237]

Reworks the Piyaz composer (Claude-Code-only) into an end-to-end task factory built on the Claude Code Workflow harness: a Piyaz task goes research → plan → implement → CI gate → review → bounded fix loop → opened PR, and, when the operator authorizes it at run start, merge and continue to the next task. The orchestrator is a thin loop; each task's phase sequencing runs in a deterministic per-task workflow off the main-loop context. Crash-safe recovery, a GitHub-feedback rework round-trip, slim per-phase rule extracts, and per-phase model selection keep it reliable and token-frugal.

Per-task Workflow harness (skills/composer/workflows/compose-task.js, new)

  • The orchestrator (skills/composer/SKILL.md) now owns only the interactive seams: pick the task, launch the per-task workflow, resolve gates, run the merge gate, propagate. The token-heavy phase sequencing moved into deterministic JS, so orchestration stops growing the main-loop context.
  • The workflow dispatches the four phase agents by agentType with per-phase model / effort / schema and worktree isolation on the implementer, runs the bounded fix loop (≤2 rotations), and returns one validated structured object the orchestrator branches on (DONE / NEEDS_DECISION / BLOCKED).
  • A cheap CI-gate stage (haiku) watches gh pr checks so no opus agent idles; review is dispatched with a verdict schema (no edit to the shared review.md).
  • args tolerate object-or-string delivery; projectId is threaded into every dispatch (the Piyaz MCP is stateless).
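
The argument handling and bounded fix loop described above could look roughly like the sketch below. It is not the actual compose-task.js: `normalizeArgs`, `runFixLoop`, the `review` / `fix` callbacks, and the result shape are hypothetical names chosen for illustration. Only the object-or-string args, the two-rotation bound, the approve verdict, and the DONE / NEEDS_DECISION / BLOCKED vocabulary come from this description. Returning NEEDS_DECISION when the budget runs out is an assumption.

```javascript
// Hypothetical sketch; names and object shapes are assumptions, not the real compose-task.js.
const MAX_FIX_ROTATIONS = 2; // the description bounds the fix loop at two rotations

// Accept args delivered either as an object or as a JSON-encoded string.
function normalizeArgs(args) {
  const parsed = typeof args === "string" ? JSON.parse(args) : args;
  if (parsed === null || typeof parsed !== "object") {
    throw new TypeError("args must be an object or a JSON-encoded object");
  }
  return parsed;
}

// Review, then fix and re-review until approval or until the rotation budget is spent.
// `review` and `fix` stand in for phase-agent dispatches (kept synchronous here for brevity).
function runFixLoop(review, fix) {
  let rotations = 0;
  let verdict = review();
  while (verdict !== "approve" && rotations < MAX_FIX_ROTATIONS) {
    fix(verdict);
    rotations += 1;
    verdict = review();
  }
  // Assumption: an exhausted budget is escalated to the operator rather than retried.
  return verdict === "approve"
    ? { status: "DONE", rotations }
    : { status: "NEEDS_DECISION", rotations };
}
```

Keeping the bound as a single constant means the orchestrator can branch on the returned status alone, without tracking rotation counts itself.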

Merge gate (new authority)

  • A one-time run-start policy: never (default; HOTL owns the merge) / ask-each / auto-on-approve. Merge fires only on an approve verdict with green CI. On a clean merge, composer runs gh pr merge --squash --delete-branch and writes status='done'; this is the one case where the orchestrator writes a status transition, and it is authorized by the operator's run-start choice. The never policy preserves today's HOTL-owns-merge behavior.
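
As an illustration of the gate logic above, a minimal decision function might look like this. The function name `decideMerge`, its parameter names, and the "skip" / "ask" / "merge" return values are hypothetical. The three policy names, the default, and the approve-plus-green-CI precondition are taken from the description.

```javascript
// Hypothetical sketch of the merge-gate decision; names and return values are assumptions.
const MERGE_POLICIES = ["never", "ask-each", "auto-on-approve"];

function decideMerge({ policy = "never", verdict, ciGreen }) {
  if (!MERGE_POLICIES.includes(policy)) {
    throw new RangeError(`unknown merge policy: ${policy}`);
  }
  // Preconditions from the description: an approve verdict and green CI.
  if (verdict !== "approve" || !ciGreen) return "skip";
  if (policy === "never") return "skip"; // HOTL owns the merge
  if (policy === "ask-each") return "ask"; // prompt the operator for this PR
  return "merge"; // auto-on-approve
}
```

Checking the verdict and CI state before the policy means no policy value can authorize a merge on a failing or unapproved PR.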

Research hardening

  • Researcher self-verification pass (binary acceptance criteria, real file paths, grounded citations, confidence-gating); the planner acts as a foundation guard that routes a wrong or ungrounded task back to re-research instead of planning on it; the research model floor is raised to sonnet minimum (never haiku) — a mis-refined task wastes far more downstream opus tokens than a cheaper research model saves.

Skill, four phase agents, slim extracts

  • skills/composer/SKILL.md rewritten as a lean orchestrator: shared STATUS / result vocabulary, todo-anchored loop with digraph, structural stop conditions, red-flags table, run log, and recovery.
  • The four phase agents (composer-researcher, composer-planner, composer-implementer, and the shared review) load slim per-phase reference extracts under skills/composer/references/ instead of force-loading the full Piyaz specs (researcher spec context ~4,950 → ~2,650 words; reviewer ~6,500 → ~900). references/sources.json pins the canonical sources by hash.

Reliability, recovery, rework

  • Crash-safe append-only run log at .piyaz/composer-<project>.md; recovery after compaction uses the workflow runId (resumeFromRunId) plus Piyaz status. Piyaz is authoritative on status; the log is authoritative on history and merge policy.
  • Implementer runs worktree-isolated (isolation: worktree); default-branch derivation replaces hardcoded main; merge-forward before PR and each fix rotation; branch-collision and foreign-commit handling; claim ownership with branch-evidence fallback.
  • Rework mode (/piyaz:composer rework <taskRef|pr-url>): reviewer-led intake fetches unresolved GitHub review threads (GraphQL, outdated-anchor re-location), re-verifies against HEAD, and feeds the fix loop with a fresh 2-rotation budget.
  • Flag-gated research-ahead pipelining (--pipelined): lookahead 1, 7-row brief-invalidation table, kill switch after two consecutive invalidations.
  • Estimate-based model selection: planner and reviewer always opus; opus-forcing guardrails for security/safety/compliance tags, large or missing estimates, fix rotations, retries, urgent priority, and risk flags.
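
The model-selection guardrails in the last bullet can be sketched as a single pure function. This is an illustration under stated assumptions: `selectModel`, the task field names, the `LARGE_ESTIMATE` threshold, and the sonnet fallback are hypothetical (the description does not give a numeric threshold or name the fallback model). The always-opus phases and the list of opus-forcing conditions follow the description.

```javascript
// Hypothetical sketch; field names, threshold, and fallback model are assumptions.
const OPUS_TAGS = new Set(["security", "safety", "compliance"]);
const LARGE_ESTIMATE = 5; // assumed cutoff for a "large" estimate

function selectModel(phase, task = {}) {
  // Planner and reviewer always run on opus.
  if (phase === "planner" || phase === "reviewer") return "opus";

  const {
    tags = [],
    estimate,
    fixRotation = 0,
    retries = 0,
    priority,
    riskFlags = [],
  } = task;

  const forceOpus =
    tags.some((tag) => OPUS_TAGS.has(tag)) || // security / safety / compliance tags
    estimate == null || // missing estimate
    estimate >= LARGE_ESTIMATE || // large estimate
    fixRotation > 0 || // any fix rotation
    retries > 0 || // any retry
    priority === "urgent" ||
    riskFlags.length > 0;

  // Assumed fallback: sonnet, which also respects the research floor (never haiku).
  return forceOpus ? "opus" : "sonnet";
}
```

Treating a missing estimate the same as a large one keeps the selection conservative when the task is under-specified.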

App and tooling

  • App fix: lib/context/format.ts now renders acceptance-criterion ids, closing a bug where the researcher's documented by-id AC rewrite appended duplicates (TDD'd in tests/context/format.test.ts; 5 golden snapshots regenerated).
  • scripts/check-plugins.ts gains two CI gates: @-include target resolution across every plugin, and canonical hash pins for the composer extracts (any edit to a pinned Piyaz reference fails CI until the extracts are reviewed and the pin refreshed via bun run sync:plugins).
  • biome.jsonc excludes plugins/**/workflows/**: the harness script uses top-level return/await, legal only under the harness async wrapper, which biome cannot parse. Local-formatter-only; no packaging effect.
  • Platform mirrors (codex / cursor / antigravity) synced for the review skill, the reviewer-rules extract, and the Piyaz references.
  • Plugin version bumped to 0.1.1.
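
The hash-pin gate mentioned above can be pictured as follows. This is a sketch only, not the code in scripts/check-plugins.ts: the SHA-256 choice, the `{ path: digest }` pin shape, and the names `sha256` and `findStalePins` are assumptions, since the description says only that sources are pinned "by hash".

```javascript
// Hypothetical sketch of a hash-pin check; algorithm and pin format are assumptions.
const { createHash } = require("node:crypto");

function sha256(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// pins: { [path]: expectedHexDigest }
// readSource: (path) => string, injected so the check can run without touching disk
function findStalePins(pins, readSource) {
  return Object.entries(pins)
    .filter(([path, expected]) => sha256(readSource(path)) !== expected)
    .map(([path]) => path);
}
```

A CI gate built this way would fail whenever `findStalePins` returns a non-empty list, which matches the described behavior of blocking until the extracts are reviewed and the pins refreshed.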

Type of change

  • New feature
  • Bug fix
  • Refactor / cleanup

Testing

  • bun run check:plugins (mirror sync, @-include resolution, extract pins), bun run format:check, and bun run check:version pass.
  • Workflow script syntax validated under the harness async wrapper.
  • lib/context/format.ts AC-id fix is TDD-backed (tests/context/format.test.ts); 5 context snapshots regenerated (bun test tests/context).
  • Live dispatch test: ran compose-task.js via the Workflow tool against a throwaway draft task (plannableOnly, research → plan); confirmed DONE / outcome=planned and the draft → planned write landing in Piyaz. The runs surfaced and fixed three real defects (args-as-string, missing projectId, haiku research too weak for the role).

The 20-scenario regression suite (tests/plugins/composer-scenarios.md) was rewritten to the workflow architecture and cross-checked against the implementation (0 contradictions in a three-lens audit).

Notes for reviewer

  • The implement → CI → review → fix → merge stages are validated by static reasoning plus the shared agent() dispatch primitive (proven by the research → plan live run), not yet by a full end-to-end run — that is deferred to a real project. Review those stages from the code.
  • compose-task.js ships to users: the plugin installs via git-subdir (the whole plugins/claude-code subdir), the file is git-tracked, and plugin.json has no files allowlist; it resolves at ${CLAUDE_PLUGIN_ROOT}/skills/composer/workflows/compose-task.js.
  • The merge-marks-done authority is the only orchestrator status write; it is gated on an approve verdict, green CI, and an authorizing run-start policy.
  • Deliberate future work: decompose-* agents still force-load full specs; the rework GraphQL intake reads only the first 100 review threads (no pagination yet); a caller-id MCP surface for assigneeIds-based claims is still to be added.

@FrkAk FrkAk changed the title from "feat: restructure composer as workflow with slim agent extracts" to "feat: composer workflow restructure and factory hardening" Jun 12, 2026
Comment thread scripts/check-plugins.ts Fixed
@FrkAk FrkAk self-assigned this Jun 12, 2026
@FrkAk FrkAk merged commit 4f93fba into main Jun 19, 2026
4 checks passed
@FrkAk FrkAk deleted the worktree-composer-workflow-restructure branch June 19, 2026 12:09
3 participants