Merged
47 commits
63b7e91
feat: add researcher rules extract for composer
FrkAk Jun 12, 2026
726098c
feat: add planner rules extract for composer
FrkAk Jun 12, 2026
fe7d3cc
feat: add implementer rules extract for composer
FrkAk Jun 12, 2026
dc6edc6
feat: add reviewer rules extract for composer
FrkAk Jun 12, 2026
7fcf1a5
docs: point canonical mymir refs at composer extracts
FrkAk Jun 12, 2026
20d6e92
refactor: researcher loads slim extract and returns status
FrkAk Jun 12, 2026
ac18d89
refactor: planner loads slim extract and returns status
FrkAk Jun 12, 2026
045595e
feat: add fix mode and status line to composer implementer
FrkAk Jun 12, 2026
8208b41
refactor: review agent loads slim extract and returns status
FrkAk Jun 12, 2026
07b940d
feat: restructure composer skill as workflow loop
FrkAk Jun 12, 2026
a3c8017
docs: update mymir skill for composer structural stops
FrkAk Jun 12, 2026
4e3c51d
fix: close composer loophole found in pressure test
FrkAk Jun 12, 2026
9b5e173
fix: end composer iteration after planning plannable picks
FrkAk Jun 12, 2026
7f66d4e
fix: add transport stop and claimed-task entry rules to composer
FrkAk Jun 12, 2026
99333e6
fix: add recovery, retry, headless, and propagation rules
FrkAk Jun 12, 2026
0f946cc
fix: harden implementer against env failures and foreign edits
FrkAk Jun 12, 2026
345f8a7
fix: reviewer env failures and working-depth doc drift
FrkAk Jun 12, 2026
72a8849
feat: isolate composer implementer in a git worktree
FrkAk Jun 12, 2026
e839681
feat: derive default branch and handle branch collisions
FrkAk Jun 12, 2026
1e67696
feat: merge default branch forward before pr and fix rotations
FrkAk Jun 12, 2026
f65bebd
feat: add claim ownership semantics to composer implementer
FrkAk Jun 12, 2026
2b997fe
feat: gate composer review dispatch on pr checks
FrkAk Jun 12, 2026
22dc083
feat: add crash-safe run log to composer bootstrap
FrkAk Jun 12, 2026
7c0fe21
feat: recover composer state from the run log
FrkAk Jun 12, 2026
a56a76c
feat: add rework mode to composer skill
FrkAk Jun 12, 2026
7fd749c
feat: add rework intake mode to review agent
FrkAk Jun 12, 2026
b7efd3d
feat: extend implementer fix mode for rework dispatches
FrkAk Jun 12, 2026
bb2be52
refactor: single agent-depth fetch for composer researcher
FrkAk Jun 12, 2026
d51de0f
fix: render acceptance-criterion ids in context bundles
FrkAk Jun 12, 2026
0324c7e
feat: add estimate-based model selection to composer dispatches
FrkAk Jun 12, 2026
e231751
test: add composer regression scenario suite
FrkAk Jun 12, 2026
e89e023
feat: add flag-gated research-ahead pipelining to composer
FrkAk Jun 12, 2026
1acde66
fix: add plannable-exit red flag from scenario suite
FrkAk Jun 12, 2026
173414d
fix: guard implement step against plannable-only picks
FrkAk Jun 12, 2026
4426360
chore: sync platform plugin mirrors
FrkAk Jun 12, 2026
97c9c15
fix: address composer code-review findings in skill and agents
FrkAk Jun 12, 2026
ec47fa0
feat: gate plugin includes and composer extract pins in sync
FrkAk Jun 12, 2026
4bf9001
fix: format sync script and remove toctou race in pin check
FrkAk Jun 12, 2026
74eb69b
fix: address fresh-eyes composer review findings
FrkAk Jun 12, 2026
cf3fcb9
chore: sync platform plugin mirrors
FrkAk Jun 12, 2026
83d4386
chore: bump plugin version to 1.9.0
FrkAk Jun 12, 2026
412dfbe
fix: align composer plugin with hotl gate and slim rule extracts
FrkAk Jun 12, 2026
5d92076
Merge branch 'main' into worktree-composer-workflow-restructure
FrkAk Jun 12, 2026
bbcb046
Merge branch 'main' into worktree-composer-workflow-restructure
FrkAk Jun 19, 2026
a31bdb4
feat: restructure composer onto per-task workflow with merge gate
FrkAk Jun 19, 2026
cda705b
fix: thread projectId into composer dispatches and harden args
FrkAk Jun 19, 2026
cccb444
chore: bump plugin version to 0.1.1
FrkAk Jun 19, 2026
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -119,8 +119,8 @@ In Codex, Cursor, and Antigravity each workflow is a skill invoked by slash comm

| Component | What it does |
| --- | --- |
-| **`/piyaz:composer` skill** | End-to-end task orchestrator. Picks the highest-value ready task (or one named ref), drives it through research → plan → implement → propagate via three dispatched subagents per task in clean per-phase contexts, loops until queue empty or user stops. Requires `/goal` harness for backlog mode (composer emits it on first turn; user pastes). |
-| **Composer subagents** | `piyaz:composer-researcher` gathers grounded context and refines the task; `piyaz:composer-planner` writes the unabridged implementation plan; `piyaz:composer-implementer` ships the code, opens a PR, and marks the task done. |
+| **`/piyaz:composer` skill** | End-to-end task orchestrator. Picks the highest-value ready task (or one named ref), drives it through research → plan → implement → review → propagate via a per-task workflow that dispatches phase subagents in clean per-phase contexts, merges the PR and continues when the user authorizes it, and loops until queue empty or user stops. Requires `/goal` harness for backlog mode (composer emits it on first turn; user pastes). |
+| **Composer subagents** | `piyaz:composer-researcher` gathers grounded context and refines the task; `piyaz:composer-planner` writes the unabridged implementation plan; `piyaz:composer-implementer` ships the code, opens a PR, and marks the task `in_review`; `piyaz:review` returns the verdict that drives the bounded fix loop. |
| **`piyaz:decompose-task` agent** | Splits an existing oversize task in an active project into 2 to N children, rewires every dependency edge touching the parent, cancels the parent with rationale citing the children. Composer's oversize handler routes here. |
| **`piyaz:decompose-feature` agent** | Adds a new feature or capability cluster to an active project. Reuses existing categories and tag vocabulary; creates 5 to 20 tasks plus internal and integration edges. |

@@ -169,7 +169,7 @@ Piyaz ships as a Next.js web app plus vendor-native plugins for Claude Code, Cod
❯ Priority is urgent, draft ACs are enough, and monorepo detection should ask the user.
```

-**Drive end-to-end (Claude Code).** Once a project is active and tasks are ready, composer can take over. Pick the next task off the critical path, research it in context, plan it, implement it, open the PR, propagate the result, and loop:
+**Drive end-to-end (Claude Code).** Once a project is active and tasks are ready, composer can take over. Pick the next task off the critical path, research it in context, plan it, implement it, open the PR, review and fix until it is ready, propagate the result (and merge when you authorize it), and loop:

```text
❯ /piyaz:composer
@@ -181,7 +181,7 @@ Or take one specific task all the way to a PR:
❯ /piyaz:composer PYZ-101
```

-Composer dispatches three subagents per task in clean per-phase contexts (researcher → planner → implementer). The orchestrator stays out of the work itself and only picks tasks, hands off, and propagates.
+Composer runs a per-task workflow that dispatches phase subagents in clean per-phase contexts (researcher → planner → implementer → review), with a bounded fix loop until the PR is ready. The orchestrator stays out of the work itself: it picks tasks, resolves gates, merges when authorized, and propagates.

**Tune in the UI.** Inspect edges, read execution records, and edit descriptions, ACs, tags, or dependencies directly. The agent loop and the UI write to the same store, so edits land by the next tool call.

3 changes: 2 additions & 1 deletion biome.jsonc
@@ -23,7 +23,8 @@
"!cloudflare-env.d.ts",
"!bun.lock",
"!migrations/**",
-"!drizzle/**"
+"!drizzle/**",
+"!plugins/**/workflows/**"
]
},
"formatter": {
9 changes: 7 additions & 2 deletions lib/context/format.ts
@@ -88,6 +88,10 @@ export function formatDecisions(decisions: Decision[]): string {
* - all checked: "All criteria met:" label followed by the checked list
* - mixed: "Remaining:" section first (primacy for pending work), then "Done:"
*
+* Each line carries the criterion's backticked id so agents can target the
+* documented by-id rewrite (`acceptanceCriteria=[{id, text}]`) without
+* appending duplicates.
+*
* @param criteria - Array of acceptance criteria.
* @returns Formatted checklist string, possibly grouped by checked state.
*/
@@ -97,8 +101,9 @@ export function formatCriteria(criteria: AcceptanceCriterion[]): string {
const remaining = criteria.filter((c) => !c.checked);
const done = criteria.filter((c) => c.checked);
const renderRemaining = () =>
-remaining.map((c) => `- [ ] ${c.text}`).join("\n");
-const renderDone = () => done.map((c) => `- [x] ${c.text}`).join("\n");
+remaining.map((c) => `- [ ] \`${c.id}\` ${c.text}`).join("\n");
+const renderDone = () =>
+done.map((c) => `- [x] \`${c.id}\` ${c.text}`).join("\n");

if (done.length === 0) return renderRemaining();
if (remaining.length === 0) return `All criteria met:\n${renderDone()}`;
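
A minimal, self-contained sketch of the id-carrying output, for checking the new line format by hand. The per-line format is taken from the hunk above. The `AcceptanceCriterion` shape, the `ac_1`-style ids, and the exact `Remaining:` / `Done:` layout of the mixed case are assumptions, since the hunk does not show them.

```typescript
// Assumed shape; the real type lives elsewhere in the repo and may differ.
interface AcceptanceCriterion {
  id: string;
  text: string;
  checked: boolean;
}

// Per-line format from the diff: checkbox, backticked id, then text.
function renderLine(c: AcceptanceCriterion): string {
  const box = c.checked ? "[x]" : "[ ]";
  return `- ${box} \`${c.id}\` ${c.text}`;
}

// Sketch only: mirrors the three cases described in the doc comment.
function formatCriteriaSketch(criteria: AcceptanceCriterion[]): string {
  const remaining = criteria.filter((c) => !c.checked);
  const done = criteria.filter((c) => c.checked);
  const renderRemaining = () => remaining.map(renderLine).join("\n");
  const renderDone = () => done.map(renderLine).join("\n");

  if (done.length === 0) return renderRemaining();
  if (remaining.length === 0) return `All criteria met:\n${renderDone()}`;
  // Assumed layout for the mixed case: pending work first, then finished work.
  return `Remaining:\n${renderRemaining()}\n\nDone:\n${renderDone()}`;
}

// Hypothetical sample data.
console.log(
  formatCriteriaSketch([
    { id: "ac_1", text: "Returns 401 on expired token", checked: true },
    { id: "ac_2", text: "Logs the rejection", checked: false },
  ]),
);
```

With the id on every line, an agent can send `acceptanceCriteria=[{id, text}]` for the one criterion it means to rewrite instead of appending a near-duplicate.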
2 changes: 1 addition & 1 deletion lib/mcp/create-server.ts
@@ -686,7 +686,7 @@ export function createMcpServer(ctx: AuthContext): McpServer {
{
name: "piyaz",
title: "Piyaz",
-version: "0.1.0",
+version: "0.1.1",
websiteUrl: "https://www.piyaz.ai",
icons: [
{
2 changes: 1 addition & 1 deletion plugins/antigravity/plugin.json
@@ -1,5 +1,5 @@
{
"name": "piyaz",
-"version": "0.1.0",
+"version": "0.1.1",
"description": "Persistent context network for coding projects. Tracks tasks, dependencies, and decisions across sessions."
}
144 changes: 144 additions & 0 deletions plugins/antigravity/skills/composer/references/reviewer-rules.md
@@ -0,0 +1,144 @@
# Reviewer rules (composer Phase 4 extract)

Slim extract of the canonical piyaz references for the review agent.
Mirrors: `skills/piyaz/references/conventions.md` §1,
`skills/piyaz/references/lifecycle.md` §2.2, §2.3, §2.4, §3, and
`skills/piyaz/references/artifacts.md` §1 (`executionRecord`,
`decisions`), §6. Headings carry their canonical file and section number
so citations like `lifecycle §2.2` resolve unambiguously. When editing a
mirrored section, edit BOTH files.

The reviewer verifies the Completion Protocol was honored; it does not
execute it. §2.2 and §2.3 below are what the implementer was required to
do; §3 is what the orchestrator runs after your verdict, fed by your
downstream-impact list.

---

## conventions §1 — The Iron Law of grounding

```
Never write what you cannot cite or do not know.
```

Applies wherever an agent generates `executionRecord`, `decisions`, `description`, or `files`. For the reviewer it applies to the verdict: every finding cites a real file path and line, every AC evaluation cites the diff or the executionRecord. When uncertain, write less. A short, true verdict is more valuable than a rich, fabricated one.

---

## lifecycle §2.2 — Populate the required fields

`executionRecord`, `decisions`, `files`, `acceptanceCriteria`, plus `prUrl` when a PR was opened (backend upserts a `task_links` row with `kind='pull_request'` so the review subagent and detail UI can resolve the PR). The MCP server returns `_hints` if any are missing.

For pure spec-review / docs / decision-only / Piyaz-only refinement tasks that touched no repo files, `files=[]` is the correct positive answer to "what changed in the repo?", not the absence of an answer.

## lifecycle §2.3 — Open a PR if the work changed code (what the implementer owed)

If `files` is non-empty AND the work was a real code change (not research, not decision-only, not Piyaz-only refinement), the implementer must have opened a PR:

- PR body follows the repo's PR template when one exists (`.github/PULL_REQUEST_TEMPLATE.md` and variants), the canonical concise default otherwise.
- The `taskRef` appears in `[BRACKETS]` (e.g. `[MYMR-83]`) exactly once, for the ONE primary task the PR builds. Bracket form triggers Piyaz PR-status tracking. Related tasks are referenced as plain links, no brackets.
- Summary maps from `executionRecord` (2 to 3 sentences); test plan maps from checked `acceptanceCriteria`; notes-for-reviewer maps from `decisions`.
- Sections are concise; empty optional sections beat fabricated content.

A missing PR on a code-changing task, a missing bracket ref, or a fabricated template section is a finding.
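
The bracket rule above can be checked mechanically. The sketch below is one hedged way to do it; `countBracketedRefs` and `bracketRefFinding` are hypothetical helper names, not part of Piyaz, and the check covers only the primary task ref.

```typescript
// Counts how many times `[taskRef]` appears in a PR body.
function countBracketedRefs(prBody: string, taskRef: string): number {
  // Escape regex metacharacters so the ref is matched literally.
  const escaped = taskRef.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`\\[${escaped}\\]`, "g");
  return (prBody.match(pattern) ?? []).length;
}

// Returns a finding string, or null when the ref appears exactly once.
function bracketRefFinding(prBody: string, taskRef: string): string | null {
  const n = countBracketedRefs(prBody, taskRef);
  if (n === 1) return null;
  return n === 0
    ? `PR body is missing [${taskRef}]`
    : `[${taskRef}] appears ${n} times; expected exactly once`;
}
```

For example, `bracketRefFinding("Implements [MYMR-83].", "MYMR-83")` returns `null`, while a body with no bracketed ref returns the missing-ref message.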

## lifecycle §2.4 — Skip the PR for these task types

A missing PR is legitimate (not a finding) for:

- Research / investigation tasks (no code change).
- Decision-only tasks.
- Pure-Piyaz refinement tasks (no repo changes).
- Tasks the user explicitly said "no PR" on.
- Data and BA work without a code repo (dashboard tweaks, workbooks, metric sign-offs, ad-hoc SQL attached to a ticket). The deliverable lives outside git; the artifact link or path belongs in `executionRecord` and `files`. When the data work IS in a git repo (a dbt project, a versioned SQL or notebook repo), the standard PR rules apply.
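
Read together, §2.3 and §2.4 amount to a small decision rule. The sketch below encodes it under loud assumptions: the `TaskKind` labels and the `TaskFacts` shape are invented for illustration (the source does not say Piyaz stores a task kind), and data work that lives in a git repo is treated as `"code"`.

```typescript
// Hypothetical labels; the reviewer infers these from the task, not from a stored field.
type TaskKind =
  | "code"
  | "research"
  | "decision-only"
  | "piyaz-refinement"
  | "data-outside-git";

// Hypothetical summary of what the reviewer knows about the task.
interface TaskFacts {
  kind: TaskKind;
  files: string[];
  userSaidNoPr: boolean;
  prUrl?: string;
}

// True when the absence of a PR should be reported as a finding.
function missingPrIsFinding(task: TaskFacts): boolean {
  if (task.prUrl) return false; // a PR exists, nothing is missing
  if (task.userSaidNoPr) return false; // explicit user opt-out
  if (task.files.length === 0) return false; // no repo change
  return task.kind === "code"; // only real code changes owe a PR
}
```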

---

## lifecycle §3 — Propagate after every change (Iron Law)

```
A change that does not propagate did not happen.
```

The graph is Piyaz's value. Skip once and it lies: ready tasks that aren't ready, blockers pointing at shipped work, every future session picking the wrong next step.

After any status change or significant refinement:

1. `piyaz_query type='edges'` on the changed task. Current relationships.
2. `piyaz_analyze type='downstream'`. Who depends on this task.
3. For each downstream task, evaluate:
- Do edge notes need updating to reflect new decisions?
- Are there NEW relationships revealed by this change?
- Are there STALE relationships that no longer hold?
- Do downstream descriptions need updating based on the decisions made?
4. Create, update, or remove edges as needed.

The reviewer does not execute propagation. Your downstream-impact list names the edges that will need attention; the orchestrator (or the human) executes the rewires.
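
Steps 1 and 2 above are two tool calls whose results feed the per-task evaluation in step 3. A hedged sketch, assuming an injected `callTool` function: the tool names and `type` values come from the list above, while the `CallTool` signature and the `taskRef` argument name are assumptions, not the documented MCP schema.

```typescript
// Hypothetical tool-call signature; a real MCP client will differ.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface PropagationInputs {
  edges: unknown; // current relationships of the changed task
  downstream: unknown; // tasks that depend on the changed task
}

// Gathers the raw inputs for propagation; evaluating them stays with the caller.
async function gatherPropagationInputs(
  callTool: CallTool,
  taskRef: string,
): Promise<PropagationInputs> {
  // Step 1: current edges. The `taskRef` argument name is assumed.
  const edges = await callTool("piyaz_query", { type: "edges", taskRef });
  // Step 2: downstream dependents.
  const downstream = await callTool("piyaz_analyze", { type: "downstream", taskRef });
  return { edges, downstream };
}
```

Injecting `callTool` keeps the sketch independent of any particular client and lets it run against a stub.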

---

## artifacts §1 — Task artifact quality

### `executionRecord` (only on `in_review`, `done`, and `cancelled`)

The implementer writes this field at the `in_review` transition; you verify it against the diff.

- **Length:** 3 to 5 sentences.
- **Distinct from `description`:** description = scope + role; executionRecord = HOW it was built (or WHY it was abandoned).
- **Include:** function names, file paths, endpoints, data formats.
- **Exclude:** debugging stories, false starts, filler.
- **For `cancelled`:** rationale (why abandoned), approaches tried, decisions learned. Same shape as a done record, just for non-shipping outcomes.
- **Draft tasks must NOT carry an `executionRecord`.** That field implies the task shipped.
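
The status and length rules above lend themselves to a quick pre-check before the reviewer reads the record against the diff. A rough sketch: `executionRecordFindings` is a hypothetical helper, and the sentence counter is a naive heuristic that miscounts abbreviations such as "e.g.".

```typescript
// Statuses on which an executionRecord is allowed, per the heading above.
const RECORD_STATUSES = new Set(["in_review", "done", "cancelled"]);

// Naive sentence count: split on whitespace that follows ., !, or ?.
function countSentences(text: string): number {
  return text
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.trim().length > 0).length;
}

// Returns human-readable findings; an empty array means the pre-check passed.
function executionRecordFindings(status: string, record: string | null): string[] {
  const text = (record ?? "").trim();
  if (!RECORD_STATUSES.has(status)) {
    return text === "" ? [] : [`status '${status}' must not carry an executionRecord`];
  }
  if (text === "") return ["executionRecord is missing"];
  const n = countSentences(text);
  return n >= 3 && n <= 5 ? [] : [`executionRecord has ${n} sentence(s); expected 3 to 5`];
}
```

The check covers form only. Whether the record matches the diff still needs a human or agent read.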

### `decisions`

One-liner per decision. Format: **CHOICE + WHY**.

```
GOOD (web): "Chose Redis for refresh tokens. Need fast revocation lookups."
GOOD (sim): "Use std::vector for the Queue backing storage. Cheap front() lookup, fast tail insert; spec is silent on container choice."

BAD: "Used Drizzle"
BAD: "We picked Redis because it's good"
BAD: "Decided to do it that way"
```

Never invent. An implementer `decisions` entry that is not grounded in the diff, the plan, or the conversation is a finding.

---

## artifacts §6 — Markdown formatting and tone

Applies to everything you write into the verdict.

### Structure

- Bullet lists (`-`) for 3 or more items. Never run-on prose.
- Backticks for code references: file paths, function names, endpoints, variables, package names.
- Paragraph breaks between distinct topics.

### Tone: never sound like AI

**Do not use:**

- Em dashes (the `—` character). Use periods, commas, parentheses, or colons.
- Hedging openers: "I think", "perhaps", "seems to", "might be", "arguably".
- Enthusiasm: "Great question", "Awesome", "Exciting", "Love this".
- Throat-clearing: "Let me dive into", "I hope this helps", "Here's the thing", "To be honest".
- Marketing words: "comprehensive", "robust", "powerful", "leverage", "utilize", "ensure", "facilitate", "seamless", "game-changer", "best-in-class".
- Adverb-heavy openers: "Importantly", "Crucially", "Notably", "Essentially", "Basically".
- Empty filler: "It's worth noting that", "It should be mentioned", "As a matter of fact".
- Performative summaries at the end: "I hope this helps!", "Let me know if you need anything else!"

**Do:**

- Subject, verb, object.
- Active voice.
- Concrete over abstract. "Adds 50ms p99" beats "improves performance".
- Specific over vague. "Stripe webhook handler" beats "payment integration".
- Cut adverbs.
- One idea per sentence.
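
Part of the "Do not use" list can be linted automatically. The sketch below checks for the em dash character and a small subset of the banned phrases. `toneFindings` is a hypothetical helper, and the phrase list is deliberately incomplete.

```typescript
const EM_DASH = "\u2014";

// Subset of the banned list above; extend as needed.
const BANNED_PHRASES = [
  "I think",
  "perhaps",
  "Great question",
  "Let me dive into",
  "robust",
  "leverage",
  "seamless",
  "It's worth noting that",
];

// Escape regex metacharacters so phrases are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns one finding per violation; an empty array means the text is clean.
function toneFindings(text: string): string[] {
  const findings: string[] = [];
  if (text.includes(EM_DASH)) findings.push("em dash");
  for (const phrase of BANNED_PHRASES) {
    // Word boundaries avoid false hits such as "API thinking" for "I think".
    const pattern = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "i");
    if (pattern.test(text)) findings.push(`banned phrase: "${phrase}"`);
  }
  return findings;
}
```

A lint like this catches the mechanical violations only. The positive rules (active voice, concrete over abstract) still need judgment.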

### Length

Concision over padding. No filler, no repetition. The rule is "no fluff", not "no length".
4 changes: 2 additions & 2 deletions plugins/antigravity/skills/piyaz/SKILL.md
@@ -150,7 +150,7 @@ You handle most Piyaz interactions inline. The four agents are escalations for h
| Decompose a project: large, multi-domain, or sensitive | Dispatch **`piyaz:decompose`** for the gated 4-phase pipeline |
| Split a single existing oversize task into children within an active project ("split this task", "decompose HGT-17", composer's oversize handler) | Dispatch **`piyaz:decompose-task`** for the gated split + edge-rewiring + parent-cancel pipeline |
| Add a new feature or capability cluster to an active project ("add a feature for X", "decompose this idea into tasks", "extend the project with Y") | Dispatch **`piyaz:decompose-feature`** for the gated feature-addition pipeline |
-| Drive tasks end-to-end through research + plan + implement + review + propagate ("ship the backlog", "run the next task", "compose through my queue", "loop through piyaz tasks", a named task ref to take all the way to a PR) | Suggest user invoke **`/piyaz:composer`** (backlog mode) or **`/piyaz:composer <taskRef>`** (single-task mode). Composer is a slash-command skill that orchestrates four dispatched subagents per task in clean per-phase contexts; the user has to type the slash command (and paste the `/goal` harness composer emits on first turn) for it to start. |
+| Drive tasks end-to-end through research + plan + implement + review + propagate ("ship the backlog", "run the next task", "compose through my queue", "loop through piyaz tasks", a named task ref to take all the way to a PR) | Suggest user invoke **`/piyaz:composer`** (backlog mode), **`/piyaz:composer <taskRef>`** (single-task mode), or **`/piyaz:composer rework <taskRef|pr-url>`** (round GitHub review feedback back through the fix loop). Composer is a slash-command skill that orchestrates four dispatched subagents per task in clean per-phase contexts; the user has to type the slash command for it to start; composer then runs continuously and stops on structural conditions (queue drained, failure budget, user stop). |
| Review an `in_review` task or a PR by URL ("review LNS-12", "review this PR", "review `<PR URL>`", "what does the review subagent think of LNS-12") | Dispatch **`piyaz:review`** for a five-lens structured verdict (`approve` / `request-changes` / `block`). The verdict is advisory; HOTL still owns the `in_review → done` transition on GitHub. |
| Status, next task, mark done, plan a draft, refine, dispatch, create or delete task | Handle inline. **Do not** dispatch `piyaz:manage` for these; they are day-to-day. |
| Strategic review, rebalance the graph, audit dependencies, prune orphans, connect missing edges, audit blockers, consolidate categories or tags, graph-health check, "is this project on track?" | Dispatch **`piyaz:manage`** for deep CTO mode |
@@ -189,7 +189,7 @@ Lead with slim tools.
- `piyaz_analyze type='plannable'`. Drafts ready to plan.
- Pick one on the critical path. **§ Plan a draft task**.

-**For end-to-end automation across the queue:** suggest `/piyaz:composer` (backlog mode). Composer picks the highest-value ready task each iteration, drives it through research + plan + implement + propagate via dispatched subagents in clean per-phase contexts, then loops until the queue is empty or the user stops. The user paces it via `/goal` (composer emits the harness on first turn; user pastes it). Use this when the user wants the queue shipped without picking each task manually; use the inline picker above when the user wants per-task agency.
+**For end-to-end automation across the queue:** suggest `/piyaz:composer` (backlog mode). Composer picks the highest-value ready task each iteration, drives it through research + plan + implement + review + propagate via dispatched subagents in clean per-phase contexts, then loops until the queue is empty or the user stops. When HOTL requests changes on a composer PR instead of merging, `/piyaz:composer rework <taskRef|pr-url>` rounds that feedback back through the fix loop. It runs continuously without per-task check-ins, gates only on genuine decisions (oversize tasks, proposed rewrites, open questions), runs a bounded review→fix loop per task, and stops structurally when the queue drains or the user says stop. Use this when the user wants the queue shipped without picking each task manually; use the inline picker above when the user wants per-task agency.

### Refine a task

29 changes: 4 additions & 25 deletions plugins/antigravity/skills/piyaz/references/artifacts.md
@@ -4,6 +4,8 @@ Quality bar for everything an agent writes into Piyaz: titles, descriptions, acc

Agents read this file when about to create, refine, or audit an artifact. The Iron Law of grounding (`conventions.md` §1) applies at every step.

+> Sections of this file are mirrored by the composer phase extracts in the claude-code plugin (`plugins/claude-code/skills/composer/references/`); when you edit a mirrored section, update those extracts and bump the pin in their `sources.json`.
+
## Contents

- §1 Task artifact quality: title, description, acceptanceCriteria, executionRecord, decisions, files
@@ -140,7 +142,7 @@ BAD:

Single-AC tasks are rejected. Tasks with vague ACs ("works correctly", "is complete", "performs well") are rejected.

-### `executionRecord` (only on `done` and `cancelled`)
+### `executionRecord` (only on `in_review`, `done`, and `cancelled`)

- **Length:** 3 to 5 sentences.
- **Distinct from `description`:** description = scope + role; executionRecord = HOW it was built (or WHY it was abandoned).
@@ -186,7 +188,7 @@ Never invent. If a decision is not grounded in conversation, code, or the artifa

## 2. Tag dimensions and first-class fields

-Every task, in every status, must carry tags across the three tag dimensions below. Reuse existing tags from `piyaz_query type='overview'` before coining new ones.
+Every task, in every status, must carry tags across the three tag dimensions below. Reuse existing tags from `piyaz_query type='meta'` before coining new ones.

| Dimension | Count | Vocabulary |
|---|---|---|
@@ -409,29 +411,6 @@ The text you write into Piyaz is read by other engineers. It must read like an e
- Cut adverbs.
- One idea per sentence.

-### Em-dash replacements
-
-```
-BAD (web): "Custom auth — months of work — is off the table."
-GOOD: "Custom auth is off the table. Months of work, easy to leak data."
-
-BAD (web): "The API uses Bearer tokens — validated against the users table."
-GOOD: "The API validates Bearer tokens against the users table."
-
-BAD (sim): "Rejected — see line 42 of the spec."
-GOOD: "Rejected. See line 42 of the spec."
-
-BAD (agentic): "The agent loop dispatches tools — validated against the
-registry — then streams the model output."
-GOOD: "The agent loop validates each tool against the registry
-before dispatching, then streams the model output."
-
-BAD (firmware):"BMP280 returns 0xFF — the i2c clock-stretch fix is not
-backported."
-GOOD: "BMP280 returns 0xFF. The i2c clock-stretch fix is not
-backported."
-```
-
### Length

Concision over padding. No filler, no AI throat-clearing, no repetition. But do not sacrifice clarity for brevity. If a task genuinely needs 6 to 8 sentences in its description because the architecture has multiple components, the bug has a complex cause, or the research question is multi-part, write them. The rule is "no fluff", not "no length". A 6-sentence description that helps a reader is better than a 2-sentence one that loses them.