danshapiro · mattleaverton · Apr 24, 2026 · Apr 1, 2026 · Apr 1, 2026 · Apr 1, 2026
diff --git a/docs/plans/2026-04-24-kilroy-fixes-from-feedback.md b/docs/plans/2026-04-24-kilroy-fixes-from-feedback.md
@@ -0,0 +1,277 @@
+---
+date: 2026-04-24
+status: open
+audience: kilroy engine + workflow contributors
+---
+
+# Kilroy fixes — from `quick-launch` and `pr-review` agent feedback
+
+Two agent sessions hit real bugs running `quick-launch` and `pr-review` against live work
+(the pr-review session reviewed PRs across `danshapiro/freshell` and other repos; the
+quick-launch session ran 10+ parallel research agents against `bitbucket/svg-compose`).
+This doc captures every actionable fix from both, ranked by impact, with the workflow-level
+mitigations that have already shipped called out so the engine work can be scoped cleanly.
+
+The `pr-review` and `quick-launch` skills + workflow packages now live in
+`gf-software-factory/{skills,workflows}` (the kilroy-side copies were removed in the same
+pass that produced this doc). Engine-level fixes still belong in this repo.
+
+---
+
+## Blockers — must fix before broader rollout
+
+These caused user-visible damage or silent success-on-failure in real runs.
+
+### 1. Agents leak out of the worktree when prompts contain absolute paths
+
+**What happened.** A user's prompt opened with "You are working on `/Users/matt/bitbucket/svg-compose`."
+Ten parallel agents followed the literal path: they `cd`'d to the user's main source tree
+and ran `git -C /Users/matt/bitbucket/svg-compose ...` and `Edit` against absolute paths,
+clobbering the shared tree while their isolated worktrees sat untouched. Their final commits
+on `attractor/run/...` branches contained only `.gitignore` updates. End result: 10 agents
+racing on the user's actual repo.
+
+**Why the docs didn't catch it.** The skill says "the engine auto-detects and creates an
+isolated run branch + worktree, so the user's source tree is never touched." That's true
+of `cwd`, not of the paths the LLM writes in `Bash`/`Edit` calls.
+
+**Fix path, in order of impact.**
+
+1. **Agent system-prompt prepend** (highest leverage, low effort): "You are running inside
+   an isolated Kilroy worktree at `{path}`. All work must happen relative to `cwd`; do not
+   `cd` elsewhere or pass `-C` to git with paths outside `cwd`." Owners: tmux agent
+   harness in `internal/agents/...` (or wherever the system-prompt assembly lives).
+2. **Bash-tool path guard** (heaviest, optional): wrap `Bash` so writes outside the worktree
+   require `--confirm-out-of-tree-writes` or fail loudly. Heavy-handed but appropriate for
+   fire-and-forget agents.
+3. **SKILL.md sharp-edge note** (already shipped in `gf-software-factory/skills/quick-launch/SKILL.md`
+   — see Step 0 preflight; consider also adding to `pr-review`).
+
+### 2. Detached runs don't register in the run DB until terminal
+
+**What happened.** `runs wait --latest --label`, `runs wait <id>`, `status --latest`, and
+`runs list` all fail to find an active detached run — they only see completed ones. The
+first version of the `pr-review` skill modeled `runs wait --label` on `quick-launch`'s docs;
+it broke on the first live run.
+
+**Workaround in shipped skills.** Both `pr-review` and (partially) `quick-launch` now
+capture `logs_root=...` from the launch stdout and poll `final.json`. That's mechanical
+and works, but it's a regression from the documented `runs wait` UX.
+
+**Fix.** Write the run row to the DB at launch with `status=running` so `wait`/`show`/`list`
+operate on active runs, not just terminated ones. Owners: `cmd/kilroy/attractor_run.go`
+and the run-DB writer in `internal/runs/...`.
+
+### 3. Reaching any terminal node = `status=success`
+
+**What happened.** PR #303 v1 hit 4× transient `git fetch` failures in setup. The original
+`pr-review` graph followed the unconditional `setup -> done` fallback edge, so the engine
+reached the `done` node and reported `status=success` at the run level — with no review
+produced. Every graph author hits this trap.
+
+**Workflow-level mitigation (already shipped).** `gf-software-factory/workflows/pr-review/graph.dot`
+now routes `setup -> fail_report` and `collect_diff -> fail_report` instead of `-> done`,
+matching `build_test -> fail_report`. `write-fail-report.sh` infers the failed phase from
+artifact presence and writes a phase-specific `review-report.md`.
+
+**Engine fix (still wanted).** The trap is graph-author-shaped, not workflow-shaped.
+Either:
+
+- **(a)** track "any stage failed?" in the engine and surface a distinct terminal state
+  (e.g. `status=completed_with_failures`) on `final.json`; or
+- **(b)** require an explicit `condition=` on every inbound edge to a terminal node, and
+  fail validation otherwise.
+
+(b) catches the bug at validate time, before runs happen. (a) is a runtime safety net.
+Both are useful; (b) is cheaper.
+
+### 4. Agents don't inherit MCP servers from the parent Claude Code session
+
+**What happened.** The user told their parent Claude session to verify work in Chrome
+via `mcp__claude-in-chrome__*`. When that conversation launched 10 detached `quick-launch`
+agents, those tools didn't exist in the children. The agents reached for `osascript`,
+`screencapture`, `chrome --headless`, and `curl` via Bash to compensate — macOS-level
+automation the user did not consent to. That's the "wild things happening with Chrome"
+the user reported.
+
+**Fix path.**
+
+1. **Document prominently** (cheapest): add a sentence to `quick-launch` SKILL.md and
+   `pr-review` SKILL.md: "Kilroy agents do not inherit the parent Claude Code session's
+   MCP servers. Only Bash, Edit, Read are available unless you forward MCP via run config."
+2. **Optional opt-in forwarding**: a `--forward-mcp <name>` flag (or `mcp_forward:` in
+   `run.yaml`) that copies a parent server's config into the child agent. Not every parent
+   server makes sense to forward; opt-in is the right default.
+
+### 5. `install-skills.sh` doesn't rebuild the binary
+
+**What happened.** The user re-ran `install-skills.sh` to relink the workflow symlink at
+their checkout. The `kilroy` binary on PATH was 17 days stale (pre-`--tmux`). The first
+sweep loop produced no launches because a `grep '^logs_root='` matched nothing on error
+output that no longer included that line.
+
+**Fix.** Either (a) the install script should run `go build ./cmd/kilroy` before linking,
+or (b) it should `stat` the binary against `find . -name '*.go' -newer ...` and bail
+loudly when stale. (a) is one extra line and handles every case. The kilroy-side
+`scripts/install-skills.sh` does not currently build; the new factory-side
+`gf-software-factory/scripts/install-kilroy-host.sh` does build.
+
+---
+
+## Important — fix before more graph authors arrive
+
+### 6. `--help` is missing half the CLI surface
+
+`kilroy attractor run --help` doesn't document `--tmux`, `--prompt-file`, `--label`,
+`--workspace`, `--input`, `--package`, `--detach`. `kilroy attractor runs --help` lists
+only `list` and `prune` — no `show`, no `wait`. Every flag the shipped skills use is
+load-bearing but undiscoverable.
+
+**Fix.** Regenerate help text from the actual flag parser in `cmd/kilroy/main.go`. If the
+parser is hand-rolled (likely), align the printed usage with the parsed flags as a unit
+test — drift here is the recurring class.
+
+### 7. Progress stream has no terminal event
+
+`progress.ndjson` emits `stage_attempt_*`, `edge_selected`, `input_materialization_*`,
+`tmux_session_*`. Nothing for run completion. The canonical terminal signal today is the
+appearance of `final.json`. That invariant isn't documented anywhere I could find.
+
+**Fix.** Emit a `run_completed` / `run_failed` event as the final line of `progress.ndjson`,
+plus document `final.json` as the public completion contract.
+
+### 8. Worktree cleanup isn't tied to run completion
+
+After a long working session, `git worktree list` showed 40+ orphaned worktrees from
+attractor runs going back weeks — all physically gone from disk, stale git admin refs.
+The user pruned manually with `git worktree prune`.
+
+**Fix.** `git worktree remove` the run's worktree on successful terminal state. Keep it
+on failure for forensics. Wire to whatever runs the existing per-run cleanup (or add it).
+
+### 9. `result.md` contract is brittle under interruption
+
+When the user `stop --force`'d 10 agents mid-flight, only 1 of 10 had written `result.md`.
+Several had done substantive committed work but hadn't reached the "write the summary"
+step. The `outputs=` contract surfaces this as missing `result.md` — a generic error that
+obscures what actually happened.
+
+**Fix options.**
+
+- Auto-flush a partial `result.md` on graceful termination, with a `__stopped_early__: true`
+  marker and the last assistant text block.
+- Or: make `result.md` optional and let `runs show --print result.md` fall back to a
+  generated one-line summary from the transcript.
+
+### 10. `pr-review` workflow has no GitHub-review publish step
+
+The workflow writes `review-report.md` to disk but never posts a GitHub review. Branch
+protection requires an approved review, so the workflow's verdict is invisible to the
+merge gate. We hit this on the first merge batch — all 5 PRs blocked by `REVIEW_REQUIRED`.
+
+**Fix.** Add a node `post_review` after `holistic_review` that runs
+`gh pr review <N> --approve|--request-changes --body "$(cat review-report.md)"`, gated by
+the holistic decision token: `MERGE` → approve; `MERGE-FIX`/`FIX-MERGE` → request-changes
+with checklist; `REJECT` → request-changes with rejection comment. Lives in the workflow,
+not the engine. Owner: `gf-software-factory/workflows/pr-review/`.
+
+### 11. `pr-review` setup uses HTTPS with no auth fallback
+
+PR #303 v1 failed 4× with `remote: Internal Server Error` (HTTP 500) on `git fetch upstream
+pull/N/head`. When the workspace is an existing clone, `setup-pr.sh` inherits whatever URL
+is on the upstream remote — plain HTTPS, no credential helper. Flaky under GitHub pressure.
+
+**Fix.** Use `gh pr checkout <N> --repo <owner/repo>` (handles auth via `gh`) or add an
+explicit retry-with-SSH-URL fallback. Workflow-level fix; lives in
+`gf-software-factory/workflows/pr-review/scripts/setup-pr.sh`.
+
+### 12. Cross-repo PRs aren't handled
+
+PR #297 was a fork PR from `Glowforge/freshell`. Nothing in the graph detects "is this
+same-repo?" — a follow-up fix agent tried to push and hit HTTP 403.
+
+**Fix.** `setup-pr.sh` should record `headRepositoryOwner` in `pr-meta.json`, and
+`fail_report` should surface a distinct "cross-repo, push blocked" case so downstream
+agents/humans know not to try. Workflow-level fix.
+
+---
+
+## Quality of life
+
+### 13. `KILROY_PREDECESSOR_NODE` env for failure handlers
+
+Today `write-fail-report.sh` infers the failed phase from filesystem state (`pr-meta.json`
+exists? `build-report.json` exists?). That's a workaround for `fail_report` not knowing
+its predecessor node. Adding `KILROY_PREDECESSOR_NODE` (and maybe `KILROY_PREDECESSOR_OUTCOME`)
+to the env when a node fires off a non-primary edge would let failure handlers be
+deterministic instead of probing.
+
+### 14. `runs show --worktree-path` accessor
+
+Today you `runs show <id>` and grep for `worktree_dir`. A scripting-friendly
+`runs show --print-worktree-path` would clean up the bash patterns in skills.
+
+### 15. Concurrency cap
+
+10 parallel `attractor run`s worked but generated a lot of simultaneous LLM usage.
+A `--max-concurrent N` flag (or a config setting) would help users self-pace without
+writing their own throttle.
+
+### 16. Long worktree paths
+
+`/Users/matt/.local/state/kilroy/attractor/runs/01K.../worktree/` is a mouthful.
+A `runs cd <id>` helper or a documented shell-function template ("here, source this
+into your zshrc") would smooth post-mortem inspection.
+
+### 17. `progress.ndjson` vs `agent_output.jsonl` documentation gap
+
+Progress shows lifecycle events (start, input-materialization, tmux-session-start,
+terminal); the actual tool-call log is in `agent/agent_output.jsonl`. Several minutes
+were wasted reading the wrong file when debugging. A short paragraph in
+`docs/runs-layout.md` (or wherever) would prevent that.
+
+### 18. `Write` tool for agents
+
+Agents have `Edit` + `Bash` heredoc only — no `Write` tool. Functionally fine, but
+unconventional vs Claude Code's standard tool set. Worth either (a) adding `Write`,
+or (b) noting in skill docs so prompt authors don't expect it.
+
+---
+
+## Things that worked — protect these
+
+The pr-review session reported the following, with no kilroy-side failures across 8
+parallel runs and 3 sequential batches:
+
+- **`--prompt-file`** — writing prose to a file beat JSON-escaping multi-paragraph
+  prompts. Keep it.
+- **Labeling + `--latest --label task=X` lookup** — made multi-run orchestration
+  tractable across 10+ concurrent runs.
+- **`runs wait --timeout`** behavior — streams status to stderr, sane exit codes.
+  (Note: the active-run lookup gap from item #2 is orthogonal.)
+- **`stop --force`** across many concurrent runs — reliable, no orphaned tmux sessions.
+- **CLI backend via Claude subscription** — no API keys, no rate-limit surprises.
+- **`agent_output.jsonl`** — complete transcript saved a post-mortem investigation.
+- **Git-worktree-per-run isolation** — the right design when it's actually used (see
+  item #1 for when it isn't).
+- **Auto-detect provider CLIs + auto-build default config when no `run.yaml` is supplied** —
+  removed a lot of setup friction for the skill author.
+
+---
+
+## Suggested rollout order
+
+If a single contributor picks this up:
+
+1. **Item #1** (worktree isolation prompt prepend) and **#5** (install-skills builds binary).
+   Both are one-line patches with high leverage. Do them first.
+2. **Item #2** (DB-write at launch) and **#6** (help-text alignment). These unlock the
+   documented UX.
+3. **Item #7** (`run_completed` event) and **#8** (worktree cleanup). Polish that
+   compounds across every future skill.
+4. **Item #3** (engine-side terminal-state guard). Pick (b) — validator-time. Less
+   runtime surface area to verify.
+5. **Items #10–#12** (workflow-level pr-review fixes). These live in
+   `gf-software-factory/workflows/pr-review/` now.
+6. **Item #4** (MCP forwarding). Nice but optional.
+7. **Items #13–#18**. Quality of life as time allows.
diff --git a/internal/attractor/agents/tmux_handler.go b/internal/attractor/agents/tmux_handler.go
@@ -60,6 +60,13 @@ func (h *TmuxAgentHandler) Execute(ctx context.Context, exec *engine.Execution,
 	if prompt == "" {
 		prompt = node.Label()
 	}
+	if wtPreamble := strings.TrimSpace(engine.BuildWorktreeContextPreamble(exec.WorktreeDir)); wtPreamble != "" {
+		if strings.TrimSpace(prompt) == "" {
+			prompt = wtPreamble
+		} else {
+			prompt = wtPreamble + "\n\n" + strings.TrimSpace(prompt)
+		}
+	}
 
 	// Session name: kilroy-{runID}-{nodeID} (unique per node execution).
 	runID := ""

diff --git a/internal/attractor/agents/tmux_handler_test.go b/internal/attractor/agents/tmux_handler_test.go
@@ -122,7 +122,8 @@ func TestTmuxAgentHandler_FakeAgent_SuccessfulExecution(t *testing.T) {
 		t.Fatalf("response = %q, want to contain FAKE_AGENT_OUTPUT_OK", string(resp))
 	}
 
-	// Verify prompt.md was written.
+	// Verify prompt.md was written — the user prompt survives, and the
+	// worktree-context preamble is prepended so the agent stays in cwd.
 	promptPath := filepath.Join(logsRoot, "test_node", "prompt.md")
 	prompt, err := os.ReadFile(promptPath)
 	if err != nil {
@@ -131,6 +132,12 @@ func TestTmuxAgentHandler_FakeAgent_SuccessfulExecution(t *testing.T) {
 	if !strings.Contains(string(prompt), "do something") {
 		t.Fatalf("prompt = %q, want 'do something'", string(prompt))
 	}
+	if !strings.Contains(string(prompt), "WORKTREE CONTEXT") {
+		t.Fatalf("prompt = %q, want worktree-context preamble", string(prompt))
+	}
+	if !strings.Contains(string(prompt), workDir) {
+		t.Fatalf("prompt = %q, want worktree path %q", string(prompt), workDir)
+	}
 }
 
 func TestTmuxAgentHandler_FakeAgent_FailedExecution(t *testing.T) {

diff --git a/internal/attractor/engine/handlers.go b/internal/attractor/engine/handlers.go
@@ -487,6 +487,24 @@ func BuildManualBoxFanInPromptPreamble(exec *Execution, node *model.Node) string
 	return strings.TrimSpace(b.String())
 }
 
+// BuildWorktreeContextPreamble returns a short preamble that pins the agent to
+// its isolated worktree. Without it, an agent following a prompt that mentions
+// absolute paths outside the worktree will happily `cd` out and clobber the
+// user's source tree. Returns "" if worktreeDir is empty.
+func BuildWorktreeContextPreamble(worktreeDir string) string {
+	worktreeDir = strings.TrimSpace(worktreeDir)
+	if worktreeDir == "" {
+		return ""
+	}
+	var b strings.Builder
+	b.WriteString("WORKTREE CONTEXT\n")
+	b.WriteString(fmt.Sprintf("- You are running inside an isolated Kilroy worktree at %s.\n", worktreeDir))
+	b.WriteString("- All work must happen relative to cwd. Do not `cd` elsewhere.\n")
+	b.WriteString("- Do not pass `-C <path>` to git with a path outside cwd.\n")
+	b.WriteString("- If the task description mentions absolute paths outside this worktree, treat them as informational only — do not read, write, or run commands against them.\n")
+	return b.String()
+}
+
 func (h *CodergenHandler) Execute(ctx context.Context, exec *Execution, node *model.Node) (runtime.Outcome, error) {
 	stageDir := filepath.Join(exec.LogsRoot, node.ID)
 	stageStatusPath := filepath.Join(stageDir, "status.json")
@@ -538,6 +556,15 @@ func (h *CodergenHandler) Execute(ctx context.Context, exec *Execution, node *mo
 			promptText = preamble + "\n\n" + strings.TrimSpace(promptText)
 		}
 	}
+	if exec != nil {
+		if wtPreamble := strings.TrimSpace(BuildWorktreeContextPreamble(exec.WorktreeDir)); wtPreamble != "" {
+			if strings.TrimSpace(promptText) == "" {
+				promptText = wtPreamble
+			} else {
+				promptText = wtPreamble + "\n\n" + strings.TrimSpace(promptText)
+			}
+		}
+	}
 	if env := BuildStageRuntimeEnv(exec, node.ID); len(env) > 0 {
 		if manifestPath := strings.TrimSpace(env[inputsManifestEnvKey]); manifestPath != "" {
 			preamble := strings.TrimSpace(MustRenderInputMaterializationPromptPreamble(manifestPath))

diff --git a/internal/attractor/engine/worktree_preamble_test.go b/internal/attractor/engine/worktree_preamble_test.go
@@ -0,0 +1,36 @@
+// Tests for BuildWorktreeContextPreamble — the per-run isolation notice that
+// pins agents to their worktree so prompts with stray absolute paths don't
+// pull them into the user's source tree.
+package engine
+
+import (
+	"strings"
+	"testing"
+)
+
+func TestBuildWorktreeContextPreamble_EmptyWhenNoWorktree(t *testing.T) {
+	if got := BuildWorktreeContextPreamble(""); got != "" {
+		t.Fatalf("empty worktreeDir: got %q, want empty", got)
+	}
+	if got := BuildWorktreeContextPreamble("   "); got != "" {
+		t.Fatalf("whitespace worktreeDir: got %q, want empty", got)
+	}
+}
+
+func TestBuildWorktreeContextPreamble_IncludesPathAndGuidance(t *testing.T) {
+	worktree := "/tmp/kilroy-run-abc/worktree"
+	got := BuildWorktreeContextPreamble(worktree)
+	if !strings.Contains(got, worktree) {
+		t.Fatalf("preamble missing worktree path %q: %s", worktree, got)
+	}
+	for _, want := range []string{
+		"WORKTREE CONTEXT",
+		"isolated Kilroy worktree",
+		"Do not `cd` elsewhere",
+		"informational only",
+	} {
+		if !strings.Contains(got, want) {
+			t.Fatalf("preamble missing %q: %s", want, got)
+		}
+	}
+}