From 20386f7d0fe7044ef0fbb07b6d64b6634d02518d Mon Sep 17 00:00:00 2001 From: Matt Leaverton Date: Wed, 1 Apr 2026 16:15:55 -0500 Subject: [PATCH 1/6] docs(plan): add Phase 0.9 first-run friction fixes, input/output contracts From first real PR review workflow run: - Phase 0.9: config/auto-detect conflict, require_clean default, missing env vars in tool nodes, CLI headless warning, error message hints, worktree file-not-found context - Phase 3.7-3.9: run input contract (--input), output contract, node data passing conventions - Future work: iteration patterns (dynamic loops over collections) Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/plans/2026-03-27-kilroy-stabilization.md | 94 +++++++++++++++++++ 1 file changed, 94 insertions(+) diff --git a/docs/plans/2026-03-27-kilroy-stabilization.md b/docs/plans/2026-03-27-kilroy-stabilization.md index f2a93872..8cdeb07a 100644 --- a/docs/plans/2026-03-27-kilroy-stabilization.md +++ b/docs/plans/2026-03-27-kilroy-stabilization.md @@ -190,6 +190,41 @@ architecture to even start. Most of this has one reasonable default. - Config file remains available for overriding any default - The test graph suite runs with zero config +### 0.9 First-Run Friction Fixes + +**What:** A batch of quick fixes for sharp edges discovered during the first real PR review +workflow run. These are all bugs, wrong defaults, or missing error context — not new features. + +**Context:** First attempt at running a real PR review graph exposed several issues that all +stem from "the tool assumes you already know how it works." Each fix is small but together +they dramatically improve the first-run experience. + +**Done when:** + +1. **Auto-detection fills gaps in partial configs** — if a config file is present but doesn't + declare providers, auto-detection should fill the gaps. Config file existence should not + disable auto-detection entirely. This is a fix to 0.7. + +2. **`require_clean` defaults to false** — the worktree isolates the run from the parent + repo state. Requiring a clean working directory is pointless friction. Default to false, + opt-in to true for strict setups. + +3. **KILROY_RUN_ID injected into tool nodes** — tool_command nodes don't receive the same + environment variables as agent nodes. This breaks the `.ai/runs/$KILROY_RUN_ID/` data + passing convention. Every node type must get the same core env vars. + +4. **CLI headless warning non-interactive** — the `claude` CLI's interactive Y/n prompt + about account suspension kills headless runs. Fix with one-time acknowledgment stored + in config, a `--accept-cli-risk` flag, or by using `yes |` in the invocation template. + +5. **Error messages suggest fixes** — "missing provider backend" should suggest "remove + --config to use auto-detection, or add llm.providers.X.backend". "File not found in + worktree" should suggest committing the file. Validation errors should name alternatives. + +6. **Worktree file-not-found context** — when a `tool_command` references a path that doesn't + exist in the worktree, the error should explain that worktrees only contain committed files, + and suggest `git add && git commit`. + --- ## Phase 0.8: CLI-Only Agent Backend (Deprecate API) @@ -693,6 +728,56 @@ keeps its context) and "fresh start" (new context with a summary of what happene - Thread management is handled by the agent handler, not the engine - Existing graphs with fidelity attributes still work (backward compat during transition) +### 3.7 Run Input Contract + +**What:** A `--input` flag that accepts a file or directory of structured startup context. +Contents are available to all nodes via variable expansion and the filesystem. + +**Context:** Currently the only way to pass runtime data into a graph is `$goal` or ambient +environment variables. A PR review needs: repo, PR number, branch, review criteria. An +exploration run needs: list of repos, search terms. The input contract is the bridge between +whatever launches kilroy and the graph's execution. + +**Done when:** +- `kilroy run --graph pr-review.dot --input run-input.yaml` loads structured context +- Input values are available via variable expansion in prompts: `$input.pr_number`, + `$input.repo`, `$input.task` +- Input files (in a directory) are available at `$INPUT_DIR` in tool_command nodes +- The graph can declare required inputs and validation rejects missing ones +- Works with both config-file and zero-config modes + +### 3.8 Run Output Contract + +**What:** A way for graphs to declare output artifacts and a known location for results. +After a run, `kilroy status` should tell you where the outputs are. + +**Context:** Currently, outputs are scattered through the worktree and stage logs. Finding +the PR review report requires spelunking through nested directories. The run should have a +declared output location or manifest. + +**Done when:** +- Nodes can declare `output_file="review-report.md"` or similar +- Declared outputs are collected to a known location after the run +- `kilroy status --run ` shows where outputs are +- Alternatively: a convention like `$OUTPUT_DIR` that nodes write to, and the engine copies + to a discoverable location post-run + +### 3.9 Node Data Passing Conventions + +**What:** Document and enforce clear conventions for how nodes pass data to each other. +Currently this is ad-hoc filesystem usage with no validation or documentation. + +**Context:** Nodes pass data by writing files to paths like `.ai/runs/$KILROY_RUN_ID/` and +hoping subsequent nodes find them. This broke during the PR review workflow because tool +nodes didn't get `KILROY_RUN_ID`. Even when it works, the convention is undiscoverable. + +**Done when:** +- A clear, documented convention exists for inter-node data: where to write, how to name + files, how subsequent nodes find them +- The engine validates that files referenced in prompts (via `$input_dir` or similar) exist +- Context updates in outcomes can pass small structured data between nodes +- The convention works for both tool_command nodes and agent nodes + --- ## Phase 4: Clean Handler Interface @@ -838,3 +923,12 @@ Writing DOT files by hand is error-prone. Future tooling could include: a graph that shows the execution topology, a graph editor with validation feedback, graph templates for common patterns (linear+verify, fan-out consensus), and prompt scaffolding that generates the boilerplate for new nodes. + +### Iteration Patterns (Dynamic Loops Over Collections) + +Some workflows need to process a dynamic list: "review each changed file," "test each PR," +"chew through this task list until empty." The engine supports loops (edge back to earlier +node) but not dynamic iteration over a collection where the loop count isn't known at graph +authoring time. This is probably a pattern built on context variables (set a list, loop node +pops items, exit condition checks if list is empty) rather than a new engine primitive. +Worth investigating what patterns emerge from real usage. From 6cd282449cf3757f87202e951f598c4c769c5dc2 Mon Sep 17 00:00:00 2001 From: Matt Leaverton Date: Wed, 1 Apr 2026 16:37:29 -0500 Subject: [PATCH 2/6] wip: PR review workflow graph (v2) Setup/build/test tool nodes + single review agent node. Experimental workflow for automated PR triage. Co-Authored-By: Claude Opus 4.6 (1M context) --- workflows/pr-review/pr-review.dot | 90 +++++++++++++++++++++++ workflows/pr-review/run.yaml | 17 +++++ workflows/pr-review/scripts/build-test.sh | 40 ++++++++++ workflows/pr-review/scripts/setup-pr.sh | 58 +++++++++++++++ 4 files changed, 205 insertions(+) create mode 100644 workflows/pr-review/pr-review.dot create mode 100644 workflows/pr-review/run.yaml create mode 100644 workflows/pr-review/scripts/build-test.sh create mode 100644 workflows/pr-review/scripts/setup-pr.sh diff --git a/workflows/pr-review/pr-review.dot b/workflows/pr-review/pr-review.dot new file mode 100644 index 00000000..02547514 --- /dev/null +++ b/workflows/pr-review/pr-review.dot @@ -0,0 +1,90 @@ +// Single-PR review workflow: setup repo, build/test, review, produce report. +// Usage: PR_REPO=owner/repo PR_NUMBER=123 kilroy attractor run --graph pr-review.dot --config run.yaml + +digraph pr_review { + graph [ + goal="Review GitHub pull request: checkout, build, test, and assess for merge readiness", + model_stylesheet=" + * { llm_model: claude-sonnet-4.6; llm_provider: anthropic; } + " + ] + + start [shape=Mdiamond, label="Start"] + exit [shape=Msquare, label="Exit"] + + // Checkout PR branch, merge main, fetch metadata and diff + setup_pr [ + shape=parallelogram, + label="Setup PR", + tool_command="sh workflows/pr-review/scripts/setup-pr.sh" + ] + + // Build, test, format check — always exit 0, results saved to files + build_test [ + shape=parallelogram, + label="Build & Test", + tool_command="sh workflows/pr-review/scripts/build-test.sh" + ] + + // Single agent: review code + produce final report + review [ + shape=box, + label="Review & Report", + auto_status=true, + prompt="You are a PR reviewer. The PR branch is already checked out in your working directory. Build and test results are available. + +READ THESE FILES FIRST: +- .ai/pr-data/pr-view.txt (PR description) +- .ai/pr-data/pr-meta.json (metadata: files, author, branches) +- .ai/pr-data/pr-diff.patch (the diff) +- .ai/pr-data/merge-result.txt (clean or conflict) +- .ai/pr-data/build-result.txt (pass or fail) +- .ai/pr-data/test-result.txt (pass or fail) +- .ai/pr-data/test-output.txt (full test output) +- .ai/pr-data/gofmt-result.txt (pass or fail) + +You also have the full codebase available. Use it. Read the actual source files that were changed — not just the diff. Understand the surrounding context. + +YOUR JOB: +Assess whether this PR should be merged. Be concise and actionable. + +Write .ai/pr-data/review-report.md in EXACTLY this format: + +--- +## Decision: [one of: MERGE | MERGE-FIX | FIX-MERGE | CHERRY-PICK | SPLIT | REJECT] + +## Rationale +[3-5 sentences max. Why this decision? What does the PR do and is it worth merging?] + +## Build & Test +- Build: [pass/fail] +- Tests: [pass/fail, note any failures and whether they are pre-existing or introduced by this PR] +- Format: [pass/fail] + +## Blockers +[List anything that MUST be fixed before merge. If none, write 'None'.] + +## Follow-up Tasks +[Discrete tasks that should happen after merge — each with enough context that another agent could implement it without re-reading this PR. If none, write 'None'.] + +Example follow-up format: +- FIX: Remove duplicate test entry {\"gpt-5.4\", false} in internal/attractor/engine/cli_only_models_test.go line 13-14. The same case appears twice. Remove one and consider replacing it with a test for gpt-5.4-spark. +--- + +RULES: +- Bias toward landing contributions. Prefer MERGE-FIX over FIX-MERGE for minor issues. +- Do NOT pad the report. No tables, no severity tags, no dimensions matrix. Just the sections above. +- Every follow-up task must specify: the file, what to change, and why. +- If build/test fails, check whether failures are pre-existing on main or introduced by this PR. +- Verify claims in the PR description against what the code actually does. + +On failure, write to $KILROY_STAGE_STATUS_PATH (fallback $KILROY_STAGE_STATUS_FALLBACK_PATH): +{\"status\":\"fail\",\"failure_reason\":\"...\",\"failure_class\":\"deterministic\"}" + ] + + start -> setup_pr + setup_pr -> build_test [condition="outcome=success"] + setup_pr -> exit + build_test -> review + review -> exit +} diff --git a/workflows/pr-review/run.yaml b/workflows/pr-review/run.yaml new file mode 100644 index 00000000..f51ab251 --- /dev/null +++ b/workflows/pr-review/run.yaml @@ -0,0 +1,17 @@ +version: 1 + +repo: + path: . + +llm: + providers: + anthropic: + backend: api + +git: + require_clean: false + commit_per_node: false + +runtime_policy: + stage_timeout_ms: 0 + stall_timeout_ms: 300000 diff --git a/workflows/pr-review/scripts/build-test.sh b/workflows/pr-review/scripts/build-test.sh new file mode 100644 index 00000000..f7720da5 --- /dev/null +++ b/workflows/pr-review/scripts/build-test.sh @@ -0,0 +1,40 @@ +#!/bin/sh +# Run build and tests, capture results. Failure is a finding, not a blocker. + +SCRATCH=".ai/pr-data" +mkdir -p "$SCRATCH" + +echo "=== Build ===" +if go build ./... 2>"$SCRATCH/build-stderr.txt"; then + echo "BUILD=pass" > "$SCRATCH/build-result.txt" + echo "Build: PASS" +else + echo "BUILD=fail" > "$SCRATCH/build-result.txt" + echo "Build: FAIL (see $SCRATCH/build-stderr.txt)" + cat "$SCRATCH/build-stderr.txt" +fi + +echo "" +echo "=== Tests ===" +if go test ./... 2>&1 | tee "$SCRATCH/test-output.txt"; then + echo "TESTS=pass" > "$SCRATCH/test-result.txt" + echo "Tests: PASS" +else + echo "TESTS=fail" > "$SCRATCH/test-result.txt" + echo "Tests: FAIL (see $SCRATCH/test-output.txt)" +fi + +echo "" +echo "=== Format Check ===" +GOFMT_OUT=$(gofmt -l . 2>/dev/null || true) +if [ -z "$GOFMT_OUT" ]; then + echo "GOFMT=pass" > "$SCRATCH/gofmt-result.txt" + echo "gofmt: PASS" +else + echo "GOFMT=fail" > "$SCRATCH/gofmt-result.txt" + echo "$GOFMT_OUT" > "$SCRATCH/gofmt-files.txt" + echo "gofmt: FAIL — $(echo "$GOFMT_OUT" | wc -l | tr -d ' ') files" +fi + +# Always exit 0 — build/test failure is a finding, not a graph failure +exit 0 diff --git a/workflows/pr-review/scripts/setup-pr.sh b/workflows/pr-review/scripts/setup-pr.sh new file mode 100644 index 00000000..bcffffa1 --- /dev/null +++ b/workflows/pr-review/scripts/setup-pr.sh @@ -0,0 +1,58 @@ +#!/bin/sh +# Checkout a PR branch into the current worktree and merge main. + +set -e + +: "${PR_REPO:?PR_REPO must be set (owner/repo format)}" +: "${PR_NUMBER:?PR_NUMBER must be set}" + +SCRATCH=".ai/pr-data" +mkdir -p "$SCRATCH" + +echo "=== Setting up PR #${PR_NUMBER} from ${PR_REPO} ===" + +# Add the PR's repo as a remote if needed +REMOTE_URL="https://github.com/${PR_REPO}.git" +if ! git remote get-url pr-origin >/dev/null 2>&1; then + git remote add pr-origin "$REMOTE_URL" +fi + +# Fetch PR metadata via gh +gh pr view "$PR_NUMBER" --repo "$PR_REPO" \ + --json title,body,author,labels,files,baseRefName,headRefName,additions,deletions,changedFiles \ + > "$SCRATCH/pr-meta.json" +gh pr view "$PR_NUMBER" --repo "$PR_REPO" > "$SCRATCH/pr-view.txt" +gh pr diff "$PR_NUMBER" --repo "$PR_REPO" > "$SCRATCH/pr-diff.patch" + +# Checkout the PR branch +gh pr checkout "$PR_NUMBER" --repo "$PR_REPO" --force + +# Record the PR branch state +PR_SHA=$(git rev-parse HEAD) +echo "PR branch HEAD: $PR_SHA" + +# Merge main into the PR branch (no rebase — preserve history) +BASE_BRANCH=$(cat "$SCRATCH/pr-meta.json" | python3 -c "import sys,json; print(json.load(sys.stdin)['baseRefName'])") +echo "Merging ${BASE_BRANCH} into PR branch..." +git fetch pr-origin "$BASE_BRANCH" +if git merge "pr-origin/${BASE_BRANCH}" --no-edit 2>"$SCRATCH/merge-stderr.txt"; then + echo "MERGE_STATUS=clean" > "$SCRATCH/merge-result.txt" + echo "Merge: clean" +else + echo "MERGE_STATUS=conflict" > "$SCRATCH/merge-result.txt" + git diff --name-only --diff-filter=U > "$SCRATCH/conflict-files.txt" 2>/dev/null || true + git merge --abort || true + echo "Merge: CONFLICTS detected (see $SCRATCH/conflict-files.txt)" +fi + +# Stats +TITLE=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['title'])") +FILES=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['changedFiles'])") +ADDS=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['additions'])") +DELS=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['deletions'])") + +echo "" +echo "PR #${PR_NUMBER}: ${TITLE}" +echo " Files changed: ${FILES} (+${ADDS} / -${DELS})" +echo " Diff: $(wc -l < "$SCRATCH/pr-diff.patch" | tr -d ' ') lines" +echo " Data: $SCRATCH/" From 159c0f495d010fdb5c66183ad1c9f581a69852b5 Mon Sep 17 00:00:00 2001 From: Matt Leaverton Date: Wed, 1 Apr 2026 16:43:02 -0500 Subject: [PATCH 3/6] wip: fix build-test script survival across branch switch Setup script now copies build-test.sh to .ai/ before gh pr checkout changes the branch and removes workflow files. Co-Authored-By: Claude Opus 4.6 (1M context) --- workflows/pr-review/pr-review.dot | 2 +- workflows/pr-review/scripts/setup-pr.sh | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/workflows/pr-review/pr-review.dot b/workflows/pr-review/pr-review.dot index 02547514..e007edca 100644 --- a/workflows/pr-review/pr-review.dot +++ b/workflows/pr-review/pr-review.dot @@ -23,7 +23,7 @@ digraph pr_review { build_test [ shape=parallelogram, label="Build & Test", - tool_command="sh workflows/pr-review/scripts/build-test.sh" + tool_command="sh .ai/pr-data/build-test.sh" ] // Single agent: review code + produce final report diff --git a/workflows/pr-review/scripts/setup-pr.sh b/workflows/pr-review/scripts/setup-pr.sh index bcffffa1..11a194cb 100644 --- a/workflows/pr-review/scripts/setup-pr.sh +++ b/workflows/pr-review/scripts/setup-pr.sh @@ -11,6 +11,10 @@ mkdir -p "$SCRATCH" echo "=== Setting up PR #${PR_NUMBER} from ${PR_REPO} ===" +# Preserve workflow scripts before branch switch (PR branch won't have them) +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +cp "$SCRIPT_DIR/build-test.sh" "$SCRATCH/build-test.sh" 2>/dev/null || true + # Add the PR's repo as a remote if needed REMOTE_URL="https://github.com/${PR_REPO}.git" if ! git remote get-url pr-origin >/dev/null 2>&1; then From 54a2b10b6eb3849b465818283654147f2804a52c Mon Sep 17 00:00:00 2001 From: Matt Leaverton Date: Thu, 2 Apr 2026 12:11:53 -0500 Subject: [PATCH 4/6] =?UTF-8?q?wip:=20PR=20review=20workflow=20v3=20?= =?UTF-8?q?=E2=80=94=20investigate/decide=20split?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Raw git checkout (no gh pr checkout) with unique branch names for parallel safety - Separate investigate (exploratory, full tool access) and decide (directive, no tools) agents - Tighter output contract: next actions instead of follow-up tasks - Setup script preserves all workflow scripts to .ai/ before branch switch Co-Authored-By: Claude Opus 4.6 (1M context) --- workflows/pr-review/pr-review.dot | 152 +++++++++++++++------- workflows/pr-review/scripts/build-test.sh | 12 +- workflows/pr-review/scripts/setup-pr.sh | 56 ++++---- 3 files changed, 145 insertions(+), 75 deletions(-) diff --git a/workflows/pr-review/pr-review.dot b/workflows/pr-review/pr-review.dot index e007edca..efc37b54 100644 --- a/workflows/pr-review/pr-review.dot +++ b/workflows/pr-review/pr-review.dot @@ -1,9 +1,9 @@ -// Single-PR review workflow: setup repo, build/test, review, produce report. +// Single-PR review workflow: setup, investigate, decide. // Usage: PR_REPO=owner/repo PR_NUMBER=123 kilroy attractor run --graph pr-review.dot --config run.yaml digraph pr_review { graph [ - goal="Review GitHub pull request: checkout, build, test, and assess for merge readiness", + goal="Review GitHub pull request for merge readiness", model_stylesheet=" * { llm_model: claude-sonnet-4.6; llm_provider: anthropic; } " @@ -12,79 +12,135 @@ digraph pr_review { start [shape=Mdiamond, label="Start"] exit [shape=Msquare, label="Exit"] - // Checkout PR branch, merge main, fetch metadata and diff - setup_pr [ + // Deterministic setup: fetch metadata, checkout PR branch, merge base, run build/test + setup [ shape=parallelogram, - label="Setup PR", + label="Setup", tool_command="sh workflows/pr-review/scripts/setup-pr.sh" ] - // Build, test, format check — always exit 0, results saved to files - build_test [ - shape=parallelogram, - label="Build & Test", - tool_command="sh .ai/pr-data/build-test.sh" + // Agentic investigation: deep review with full tool access + investigate [ + shape=box, + label="Investigate", + auto_status=true, + prompt="You are investigating a GitHub pull request. The PR branch has been checked out +and merged with the base branch. Build and test results are available. + +START by reading these files: +- .ai/pr-data/pr-view.txt (PR description and discussion) +- .ai/pr-data/pr-meta.json (structured metadata: files changed, author, branches) +- .ai/pr-data/pr-diff.patch (the full diff) +- .ai/pr-data/merge-result.txt (whether the merge was clean or had conflicts) +- .ai/pr-data/build-result.txt (build pass/fail) +- .ai/pr-data/build-stderr.txt (build errors if any) +- .ai/pr-data/test-result.txt (test pass/fail) +- .ai/pr-data/test-output.txt (full test output) +- .ai/pr-data/gofmt-result.txt (format check pass/fail) + +THEN investigate the actual codebase. You have the full repo checked out. Use it: +- Read the source files that were changed. Read the surrounding code for context. +- If tests failed, determine whether failures are pre-existing on the base branch or + introduced by this PR. Check git blame, read the failing test code, understand why. +- If the PR claims to fix something, verify the fix actually works. +- If the PR adds functionality, check if it has tests and if they cover the key cases. +- Look for: bugs, security issues, missing error handling, naming problems, scope creep. +- Run additional targeted commands if needed (specific test files, linters, etc). + +Write your findings to .ai/pr-data/investigation.md as a structured evidence document: + +## PR Overview +[1-2 sentences: what this PR does] + +## Verification Results +- Build: [pass/fail] +- Tests: [pass/fail — list any failures and whether pre-existing or introduced] +- Format: [pass/fail — list affected files] +- Claims verified: [did you confirm the PR does what it says?] + +## Code Review Findings +[List each finding as a bullet. For each: what you found, where (file:line), and why it matters. +Include positive findings too — things done well that should be preserved.] + +## Scope Assessment +[Is this PR well-scoped? Does every changed file serve the stated purpose? +Flag any unrelated changes that should be in a separate PR.] + +## Risk Factors +[What could go wrong if merged? Edge cases, migration concerns, behavioral changes. +If low risk, say so briefly and move on.] + +Do NOT make recommendations or decisions. Your job is to gather evidence. The next step +will use your findings to make the merge decision. + +On failure, write to $KILROY_STAGE_STATUS_PATH (fallback $KILROY_STAGE_STATUS_FALLBACK_PATH): +{\"status\":\"fail\",\"failure_reason\":\"...\",\"failure_class\":\"deterministic\"}" ] - // Single agent: review code + produce final report - review [ + // Decision agent: reads evidence, produces actionable verdict + decide [ shape=box, - label="Review & Report", + label="Decide", auto_status=true, - prompt="You are a PR reviewer. The PR branch is already checked out in your working directory. Build and test results are available. - -READ THESE FILES FIRST: -- .ai/pr-data/pr-view.txt (PR description) -- .ai/pr-data/pr-meta.json (metadata: files, author, branches) -- .ai/pr-data/pr-diff.patch (the diff) -- .ai/pr-data/merge-result.txt (clean or conflict) -- .ai/pr-data/build-result.txt (pass or fail) -- .ai/pr-data/test-result.txt (pass or fail) -- .ai/pr-data/test-output.txt (full test output) -- .ai/pr-data/gofmt-result.txt (pass or fail) + prompt="You are making a merge decision for a GitHub pull request based on evidence +gathered by the investigation step. -You also have the full codebase available. Use it. Read the actual source files that were changed — not just the diff. Understand the surrounding context. +Read these files: +- .ai/pr-data/investigation.md (detailed findings from the investigator) +- .ai/pr-data/pr-view.txt (PR description for context) +- .ai/pr-data/pr-meta.json (metadata) -YOUR JOB: -Assess whether this PR should be merged. Be concise and actionable. +Do NOT run commands or read source code. The investigation is complete. Your job is to +decide and direct. Write .ai/pr-data/review-report.md in EXACTLY this format: ---- ## Decision: [one of: MERGE | MERGE-FIX | FIX-MERGE | CHERRY-PICK | SPLIT | REJECT] ## Rationale -[3-5 sentences max. Why this decision? What does the PR do and is it worth merging?] +[3-5 sentences max. What does the PR do, is it worth merging, and why this decision.] -## Build & Test +## Verification Summary - Build: [pass/fail] -- Tests: [pass/fail, note any failures and whether they are pre-existing or introduced by this PR] +- Tests: [pass/fail — note pre-existing vs introduced failures] - Format: [pass/fail] ## Blockers -[List anything that MUST be fixed before merge. If none, write 'None'.] - -## Follow-up Tasks -[Discrete tasks that should happen after merge — each with enough context that another agent could implement it without re-reading this PR. If none, write 'None'.] - -Example follow-up format: -- FIX: Remove duplicate test entry {\"gpt-5.4\", false} in internal/attractor/engine/cli_only_models_test.go line 13-14. The same case appears twice. Remove one and consider replacing it with a test for gpt-5.4-spark. ---- +[Things that MUST be fixed before merge. If none, write 'None'.] + +## Next Actions +[Ordered list of what should happen now. Be specific and directive but not prescriptive — +the downstream agent knows git mechanics. Tell it WHAT to do and WHY, not HOW to type it. + +Examples of good next actions: +- Merge PR #74 into main. +- Comment on PR with: 'Reviewed — merging. Follow-up PR incoming for cleanup items below.' +- Create a follow-up PR from main that fixes gofmt drift in engine.go around line 806 + (pre-existing from PR #76, map literal alignment) and removes the duplicate test entry + at cli_only_models_test.go line 14. +- Reject PR #99. Comment: 'This PR bundles auth refactoring with unrelated UI changes. + Please split into separate PRs — happy to review the auth changes independently.' + +Examples of bad next actions: +- Run `git checkout -b fix/cleanup && gofmt -w file.go && git add . && git commit...` + (too prescriptive — the downstream knows how to make a branch and commit) +- Address the issues found in the review. + (too vague — which issues? what specifically should change?) +] RULES: - Bias toward landing contributions. Prefer MERGE-FIX over FIX-MERGE for minor issues. -- Do NOT pad the report. No tables, no severity tags, no dimensions matrix. Just the sections above. -- Every follow-up task must specify: the file, what to change, and why. -- If build/test fails, check whether failures are pre-existing on main or introduced by this PR. -- Verify claims in the PR description against what the code actually does. +- Every action must be specific enough that someone unfamiliar with the PR can execute it. +- Keep the whole report under 40 lines. No padding, no tables, no filler. +- If the investigation found nothing concerning, say so briefly and move on. On failure, write to $KILROY_STAGE_STATUS_PATH (fallback $KILROY_STAGE_STATUS_FALLBACK_PATH): {\"status\":\"fail\",\"failure_reason\":\"...\",\"failure_class\":\"deterministic\"}" ] - start -> setup_pr - setup_pr -> build_test [condition="outcome=success"] - setup_pr -> exit - build_test -> review - review -> exit + start -> setup + setup -> investigate [condition="outcome=success"] + setup -> exit + investigate -> decide + decide -> exit } diff --git a/workflows/pr-review/scripts/build-test.sh b/workflows/pr-review/scripts/build-test.sh index f7720da5..30471ba5 100644 --- a/workflows/pr-review/scripts/build-test.sh +++ b/workflows/pr-review/scripts/build-test.sh @@ -1,5 +1,6 @@ #!/bin/sh -# Run build and tests, capture results. Failure is a finding, not a blocker. +# Run build, tests, and format checks. Captures results for the review agent. +# Always exits 0 — failures are findings, not blockers. SCRATCH=".ai/pr-data" mkdir -p "$SCRATCH" @@ -10,18 +11,19 @@ if go build ./... 2>"$SCRATCH/build-stderr.txt"; then echo "Build: PASS" else echo "BUILD=fail" > "$SCRATCH/build-result.txt" - echo "Build: FAIL (see $SCRATCH/build-stderr.txt)" + echo "Build: FAIL" cat "$SCRATCH/build-stderr.txt" fi echo "" echo "=== Tests ===" -if go test ./... 2>&1 | tee "$SCRATCH/test-output.txt"; then +go test ./... 2>&1 | tee "$SCRATCH/test-output.txt" +if [ "${PIPESTATUS[0]:-${pipestatus[1]:-1}}" -eq 0 ]; then echo "TESTS=pass" > "$SCRATCH/test-result.txt" echo "Tests: PASS" else echo "TESTS=fail" > "$SCRATCH/test-result.txt" - echo "Tests: FAIL (see $SCRATCH/test-output.txt)" + echo "Tests: FAIL" fi echo "" @@ -34,7 +36,7 @@ else echo "GOFMT=fail" > "$SCRATCH/gofmt-result.txt" echo "$GOFMT_OUT" > "$SCRATCH/gofmt-files.txt" echo "gofmt: FAIL — $(echo "$GOFMT_OUT" | wc -l | tr -d ' ') files" + cat "$SCRATCH/gofmt-files.txt" fi -# Always exit 0 — build/test failure is a finding, not a graph failure exit 0 diff --git a/workflows/pr-review/scripts/setup-pr.sh b/workflows/pr-review/scripts/setup-pr.sh index 11a194cb..18b0b15b 100644 --- a/workflows/pr-review/scripts/setup-pr.sh +++ b/workflows/pr-review/scripts/setup-pr.sh @@ -1,5 +1,5 @@ #!/bin/sh -# Checkout a PR branch into the current worktree and merge main. +# Checkout a PR branch into the current worktree via raw git, merge base branch. set -e @@ -13,50 +13,62 @@ echo "=== Setting up PR #${PR_NUMBER} from ${PR_REPO} ===" # Preserve workflow scripts before branch switch (PR branch won't have them) SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" -cp "$SCRIPT_DIR/build-test.sh" "$SCRATCH/build-test.sh" 2>/dev/null || true +for f in "$SCRIPT_DIR"/*.sh; do + cp "$f" "$SCRATCH/" 2>/dev/null || true +done -# Add the PR's repo as a remote if needed -REMOTE_URL="https://github.com/${PR_REPO}.git" -if ! git remote get-url pr-origin >/dev/null 2>&1; then - git remote add pr-origin "$REMOTE_URL" -fi - -# Fetch PR metadata via gh +# Fetch PR metadata via gh (read-only, no state changes) gh pr view "$PR_NUMBER" --repo "$PR_REPO" \ --json title,body,author,labels,files,baseRefName,headRefName,additions,deletions,changedFiles \ > "$SCRATCH/pr-meta.json" gh pr view "$PR_NUMBER" --repo "$PR_REPO" > "$SCRATCH/pr-view.txt" gh pr diff "$PR_NUMBER" --repo "$PR_REPO" > "$SCRATCH/pr-diff.patch" -# Checkout the PR branch -gh pr checkout "$PR_NUMBER" --repo "$PR_REPO" --force +# Extract metadata +BASE_BRANCH=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['baseRefName'])") +HEAD_BRANCH=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['headRefName'])") +TITLE=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['title'])") + +# Add upstream remote for the PR's repo +REMOTE_URL="https://github.com/${PR_REPO}.git" +if ! git remote get-url upstream >/dev/null 2>&1; then + git remote add upstream "$REMOTE_URL" +fi + +# Fetch PR ref and base branch via raw git +echo "Fetching PR #${PR_NUMBER} head and base (${BASE_BRANCH})..." +git fetch upstream "pull/${PR_NUMBER}/head" --quiet +git fetch upstream "$BASE_BRANCH" --quiet + +# Checkout PR at FETCH_HEAD on a unique branch name for this run +# Unique name avoids git worktree branch-lock conflicts with parallel runs +RUN_TAG=$(echo "${KILROY_RUN_ID:-$$}" | tail -c 9) +LOCAL_BRANCH="pr-review/${PR_NUMBER}-${RUN_TAG}" +git checkout -b "$LOCAL_BRANCH" FETCH_HEAD --quiet -# Record the PR branch state PR_SHA=$(git rev-parse HEAD) -echo "PR branch HEAD: $PR_SHA" +echo "PR branch HEAD: $PR_SHA (local: $LOCAL_BRANCH)" -# Merge main into the PR branch (no rebase — preserve history) -BASE_BRANCH=$(cat "$SCRATCH/pr-meta.json" | python3 -c "import sys,json; print(json.load(sys.stdin)['baseRefName'])") -echo "Merging ${BASE_BRANCH} into PR branch..." -git fetch pr-origin "$BASE_BRANCH" -if git merge "pr-origin/${BASE_BRANCH}" --no-edit 2>"$SCRATCH/merge-stderr.txt"; then +# Merge base branch into PR branch (no rebase — preserve history) +echo "Merging upstream/${BASE_BRANCH} into PR branch..." +if git merge "upstream/${BASE_BRANCH}" --no-edit 2>"$SCRATCH/merge-stderr.txt"; then echo "MERGE_STATUS=clean" > "$SCRATCH/merge-result.txt" echo "Merge: clean" else echo "MERGE_STATUS=conflict" > "$SCRATCH/merge-result.txt" git diff --name-only --diff-filter=U > "$SCRATCH/conflict-files.txt" 2>/dev/null || true git merge --abort || true - echo "Merge: CONFLICTS detected (see $SCRATCH/conflict-files.txt)" + echo "Merge: CONFLICTS (see $SCRATCH/conflict-files.txt)" fi -# Stats -TITLE=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['title'])") +# Write setup summary FILES=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['changedFiles'])") ADDS=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['additions'])") DELS=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['deletions'])") echo "" echo "PR #${PR_NUMBER}: ${TITLE}" +echo " Author: $(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['author']['login'])")" +echo " Branch: ${HEAD_BRANCH} -> ${BASE_BRANCH}" echo " Files changed: ${FILES} (+${ADDS} / -${DELS})" echo " Diff: $(wc -l < "$SCRATCH/pr-diff.patch" | tr -d ' ') lines" -echo " Data: $SCRATCH/" From 1827e16ceed13c710c58053aa47c89b932695ee6 Mon Sep 17 00:00:00 2001 From: Matt Leaverton Date: Thu, 2 Apr 2026 12:54:40 -0500 Subject: [PATCH 5/6] wip: make PR review workflow repo-agnostic - Absolute path for setup script (works from any repo's worktree) - Remove Go-specific assumptions from investigate prompt - Agent discovers build system and runs appropriate checks - Add freshell run config Co-Authored-By: Claude Opus 4.6 (1M context) --- workflows/pr-review/pr-review.dot | 17 ++++++++--------- workflows/pr-review/run-freshell.yaml | 17 +++++++++++++++++ 2 files changed, 25 insertions(+), 9 deletions(-) create mode 100644 workflows/pr-review/run-freshell.yaml diff --git a/workflows/pr-review/pr-review.dot b/workflows/pr-review/pr-review.dot index efc37b54..ff3dccc8 100644 --- a/workflows/pr-review/pr-review.dot +++ b/workflows/pr-review/pr-review.dot @@ -16,7 +16,7 @@ digraph pr_review { setup [ shape=parallelogram, label="Setup", - tool_command="sh workflows/pr-review/scripts/setup-pr.sh" + tool_command="sh /Users/matt/sw/personal/kilroy/workflows/pr-review/scripts/setup-pr.sh" ] // Agentic investigation: deep review with full tool access @@ -32,20 +32,19 @@ START by reading these files: - .ai/pr-data/pr-meta.json (structured metadata: files changed, author, branches) - .ai/pr-data/pr-diff.patch (the full diff) - .ai/pr-data/merge-result.txt (whether the merge was clean or had conflicts) -- .ai/pr-data/build-result.txt (build pass/fail) -- .ai/pr-data/build-stderr.txt (build errors if any) -- .ai/pr-data/test-result.txt (test pass/fail) -- .ai/pr-data/test-output.txt (full test output) -- .ai/pr-data/gofmt-result.txt (format check pass/fail) -THEN investigate the actual codebase. You have the full repo checked out. Use it: +THEN investigate the actual codebase. You have the full repo checked out at the PR branch +merged with the latest base branch. Use it: - Read the source files that were changed. Read the surrounding code for context. -- If tests failed, determine whether failures are pre-existing on the base branch or +- Determine the project's build system and test framework (package.json, go.mod, Makefile, etc). +- Build the project. Run the test suite. If the full suite is too slow, run tests targeted + at the packages/modules touched by the PR. +- If tests fail, determine whether failures are pre-existing on the base branch or introduced by this PR. Check git blame, read the failing test code, understand why. - If the PR claims to fix something, verify the fix actually works. - If the PR adds functionality, check if it has tests and if they cover the key cases. - Look for: bugs, security issues, missing error handling, naming problems, scope creep. -- Run additional targeted commands if needed (specific test files, linters, etc). +- Run additional targeted commands if needed (linters, type checking, etc). Write your findings to .ai/pr-data/investigation.md as a structured evidence document: diff --git a/workflows/pr-review/run-freshell.yaml b/workflows/pr-review/run-freshell.yaml new file mode 100644 index 00000000..84170687 --- /dev/null +++ b/workflows/pr-review/run-freshell.yaml @@ -0,0 +1,17 @@ +version: 1 + +repo: + path: /Users/matt/sw/personal/freshell + +llm: + providers: + anthropic: + backend: api + +git: + require_clean: false + commit_per_node: false + +runtime_policy: + stage_timeout_ms: 0 + stall_timeout_ms: 600000 From 196d84cb2e4de1ba269f4dc20372bed2f54e2359 Mon Sep 17 00:00:00 2001 From: Matt Leaverton Date: Fri, 24 Apr 2026 17:15:35 -0500 Subject: [PATCH 6/6] Round 0 fixes: install-skills builds binary, agents pinned to worktree MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From docs/plans/2026-04-24-kilroy-fixes-from-feedback.md items #5 and #1 (engine half) — the two defaults bugs that must land before we can safely dogfood the remaining fixes through quick-launch. #5 install-skills.sh builds before linking. A stale binary on PATH was silently breaking skills that depend on newer flags (--tmux, --prompt-file, --label, --package). One `go build` at install time is cheap insurance. #1 Agent system prompts now include a worktree-context preamble: "You are running inside an isolated Kilroy worktree at . Do not cd elsewhere or pass -C to git with paths outside cwd." Applied in both the tmux agent handler (primary) and the CodergenHandler API/subprocess path. Without this, a prompt that casually mentions a user's source-tree path would pull agents out of their worktree and clobber the real checkout. Also folds in the pre-existing cleanup that moved quick-launch and pr-review skills + workflow packages to gf-software-factory, and adds the feedback plan doc that drives the rest of this initiative. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../2026-04-24-kilroy-fixes-from-feedback.md | 277 ++++++++++++++++++ internal/attractor/agents/tmux_handler.go | 7 + .../attractor/agents/tmux_handler_test.go | 9 +- internal/attractor/engine/handlers.go | 27 ++ .../engine/worktree_preamble_test.go | 36 +++ scripts/install-skills.sh | 36 ++- skills/quick-launch/SKILL.md | 199 ------------- skills/quick-launch/commands/kilroy-quick.md | 19 -- workflows/pr-review/graph.dot | 73 ----- workflows/pr-review/pr-review.dot | 145 --------- workflows/pr-review/prompts/code-review.md | 51 ---- .../pr-review/prompts/holistic-review.md | 65 ---- workflows/pr-review/run-freshell.yaml | 17 -- workflows/pr-review/run.yaml | 17 -- workflows/pr-review/scripts/build-test.sh | 83 ------ workflows/pr-review/scripts/collect-diff.sh | 33 --- workflows/pr-review/scripts/setup-pr.sh | 80 ----- .../pr-review/scripts/write-fail-report.sh | 59 ---- workflows/pr-review/workflow.toml | 21 -- workflows/quick-launch/graph.codex.dot | 27 -- workflows/quick-launch/graph.dot | 40 --- workflows/quick-launch/graph.gemini.dot | 27 -- workflows/quick-launch/workflow.toml | 26 -- 23 files changed, 372 insertions(+), 1002 deletions(-) create mode 100644 docs/plans/2026-04-24-kilroy-fixes-from-feedback.md create mode 100644 internal/attractor/engine/worktree_preamble_test.go delete mode 100644 skills/quick-launch/SKILL.md delete mode 100644 skills/quick-launch/commands/kilroy-quick.md delete mode 100644 workflows/pr-review/graph.dot delete mode 100644 workflows/pr-review/pr-review.dot delete mode 100644 workflows/pr-review/prompts/code-review.md delete mode 100644 workflows/pr-review/prompts/holistic-review.md delete mode 100644 workflows/pr-review/run-freshell.yaml delete mode 100644 workflows/pr-review/run.yaml delete mode 100644 workflows/pr-review/scripts/build-test.sh delete mode 100644 workflows/pr-review/scripts/collect-diff.sh delete mode 100644 workflows/pr-review/scripts/setup-pr.sh delete mode 100644 workflows/pr-review/scripts/write-fail-report.sh delete mode 100644 workflows/pr-review/workflow.toml delete mode 100644 workflows/quick-launch/graph.codex.dot delete mode 100644 workflows/quick-launch/graph.dot delete mode 100644 workflows/quick-launch/graph.gemini.dot delete mode 100644 workflows/quick-launch/workflow.toml diff --git a/docs/plans/2026-04-24-kilroy-fixes-from-feedback.md b/docs/plans/2026-04-24-kilroy-fixes-from-feedback.md new file mode 100644 index 00000000..e9b1aef7 --- /dev/null +++ b/docs/plans/2026-04-24-kilroy-fixes-from-feedback.md @@ -0,0 +1,277 @@ +--- +date: 2026-04-24 +status: open +audience: kilroy engine + workflow contributors +--- + +# Kilroy fixes — from `quick-launch` and `pr-review` agent feedback + +Two agent sessions hit real bugs running `quick-launch` and `pr-review` against live work +(the pr-review session reviewed PRs across `danshapiro/freshell` and other repos; the +quick-launch session ran 10+ parallel research agents against `bitbucket/svg-compose`). +This doc captures every actionable fix from both, ranked by impact, with the workflow-level +mitigations that have already shipped called out so the engine work can be scoped cleanly. + +The `pr-review` and `quick-launch` skills + workflow packages now live in +`gf-software-factory/{skills,workflows}` (the kilroy-side copies were removed in the same +pass that produced this doc). Engine-level fixes still belong in this repo. + +--- + +## Blockers — must fix before broader rollout + +These caused user-visible damage or silent success-on-failure in real runs. + +### 1. Agents leak out of the worktree when prompts contain absolute paths + +**What happened.** A user's prompt opened with "You are working on `/Users/matt/bitbucket/svg-compose`." +Ten parallel agents followed the literal path: they `cd`'d to the user's main source tree +and ran `git -C /Users/matt/bitbucket/svg-compose ...` and `Edit` against absolute paths, +clobbering the shared tree while their isolated worktrees sat untouched. Their final commits +on `attractor/run/...` branches contained only `.gitignore` updates. End result: 10 agents +racing on the user's actual repo. + +**Why the docs didn't catch it.** The skill says "the engine auto-detects and creates an +isolated run branch + worktree, so the user's source tree is never touched." That's true +of `cwd`, not of the paths the LLM writes in `Bash`/`Edit` calls. + +**Fix path, in order of impact.** + +1. **Agent system-prompt prepend** (highest leverage, low effort): "You are running inside + an isolated Kilroy worktree at `{path}`. All work must happen relative to `cwd`; do not + `cd` elsewhere or pass `-C` to git with paths outside `cwd`." Owners: tmux agent + harness in `internal/agents/...` (or wherever the system-prompt assembly lives). +2. **Bash-tool path guard** (heaviest, optional): wrap `Bash` so writes outside the worktree + require `--confirm-out-of-tree-writes` or fail loudly. Heavy-handed but appropriate for + fire-and-forget agents. +3. **SKILL.md sharp-edge note** (already shipped in `gf-software-factory/skills/quick-launch/SKILL.md` + — see Step 0 preflight; consider also adding to `pr-review`). + +### 2. Detached runs don't register in the run DB until terminal + +**What happened.** `runs wait --latest --label`, `runs wait `, `status --latest`, and +`runs list` all fail to find an active detached run — they only see completed ones. The +first version of the `pr-review` skill modeled `runs wait --label` on `quick-launch`'s docs; +it broke on the first live run. + +**Workaround in shipped skills.** Both `pr-review` and (partially) `quick-launch` now +capture `logs_root=...` from the launch stdout and poll `final.json`. That's mechanical +and works, but it's a regression from the documented `runs wait` UX. + +**Fix.** Write the run row to the DB at launch with `status=running` so `wait`/`show`/`list` +operate on active runs, not just terminated ones. Owners: `cmd/kilroy/attractor_run.go` +and the run-DB writer in `internal/runs/...`. + +### 3. Reaching any terminal node = `status=success` + +**What happened.** PR #303 v1 hit 4× transient `git fetch` failures in setup. The original +`pr-review` graph followed the unconditional `setup -> done` fallback edge, so the engine +reached the `done` node and reported `status=success` at the run level — with no review +produced. Every graph author hits this trap. + +**Workflow-level mitigation (already shipped).** `gf-software-factory/workflows/pr-review/graph.dot` +now routes `setup -> fail_report` and `collect_diff -> fail_report` instead of `-> done`, +matching `build_test -> fail_report`. `write-fail-report.sh` infers the failed phase from +artifact presence and writes a phase-specific `review-report.md`. + +**Engine fix (still wanted).** The trap is graph-author-shaped, not workflow-shaped. +Either: + +- **(a)** track "any stage failed?" in the engine and surface a distinct terminal state + (e.g. `status=completed_with_failures`) on `final.json`; or +- **(b)** require an explicit `condition=` on every inbound edge to a terminal node, and + fail validation otherwise. + +(b) catches the bug at validate time, before runs happen. (a) is a runtime safety net. +Both are useful; (b) is cheaper. + +### 4. Agents don't inherit MCP servers from the parent Claude Code session + +**What happened.** The user told their parent Claude session to verify work in Chrome +via `mcp__claude-in-chrome__*`. When that conversation launched 10 detached `quick-launch` +agents, those tools didn't exist in the children. The agents reached for `osascript`, +`screencapture`, `chrome --headless`, and `curl` via Bash to compensate — macOS-level +automation the user did not consent to. That's the "wild things happening with Chrome" +the user reported. + +**Fix path.** + +1. **Document prominently** (cheapest): add a sentence to `quick-launch` SKILL.md and + `pr-review` SKILL.md: "Kilroy agents do not inherit the parent Claude Code session's + MCP servers. Only Bash, Edit, Read are available unless you forward MCP via run config." +2. **Optional opt-in forwarding**: a `--forward-mcp ` flag (or `mcp_forward:` in + `run.yaml`) that copies a parent server's config into the child agent. Not every parent + server makes sense to forward; opt-in is the right default. + +### 5. `install-skills.sh` doesn't rebuild the binary + +**What happened.** The user re-ran `install-skills.sh` to relink the workflow symlink at +their checkout. The `kilroy` binary on PATH was 17 days stale (pre-`--tmux`). The first +sweep loop produced no launches because a `grep '^logs_root='` matched nothing on error +output that no longer included that line. + +**Fix.** Either (a) the install script should run `go build ./cmd/kilroy` before linking, +or (b) it should `stat` the binary against `find . -name '*.go' -newer ...` and bail +loudly when stale. (a) is one extra line and handles every case. The kilroy-side +`scripts/install-skills.sh` does not currently build; the new factory-side +`gf-software-factory/scripts/install-kilroy-host.sh` does build. + +--- + +## Important — fix before more graph authors arrive + +### 6. `--help` is missing half the CLI surface + +`kilroy attractor run --help` doesn't document `--tmux`, `--prompt-file`, `--label`, +`--workspace`, `--input`, `--package`, `--detach`. `kilroy attractor runs --help` lists +only `list` and `prune` — no `show`, no `wait`. Every flag the shipped skills use is +load-bearing but undiscoverable. + +**Fix.** Regenerate help text from the actual flag parser in `cmd/kilroy/main.go`. If the +parser is hand-rolled (likely), align the printed usage with the parsed flags as a unit +test — drift here is the recurring class. + +### 7. Progress stream has no terminal event + +`progress.ndjson` emits `stage_attempt_*`, `edge_selected`, `input_materialization_*`, +`tmux_session_*`. Nothing for run completion. The canonical terminal signal today is the +appearance of `final.json`. That invariant isn't documented anywhere I could find. + +**Fix.** Emit a `run_completed` / `run_failed` event as the final line of `progress.ndjson`, +plus document `final.json` as the public completion contract. + +### 8. Worktree cleanup isn't tied to run completion + +After a long working session, `git worktree list` showed 40+ orphaned worktrees from +attractor runs going back weeks — all physically gone from disk, stale git admin refs. +The user pruned manually with `git worktree prune`. + +**Fix.** `git worktree remove` the run's worktree on successful terminal state. Keep it +on failure for forensics. Wire to whatever runs the existing per-run cleanup (or add it). + +### 9. `result.md` contract is brittle under interruption + +When the user `stop --force`'d 10 agents mid-flight, only 1 of 10 had written `result.md`. +Several had done substantive committed work but hadn't reached the "write the summary" +step. The `outputs=` contract surfaces this as missing `result.md` — a generic error that +obscures what actually happened. + +**Fix options.** + +- Auto-flush a partial `result.md` on graceful termination, with a `__stopped_early__: true` + marker and the last assistant text block. +- Or: make `result.md` optional and let `runs show --print result.md` fall back to a + generated one-line summary from the transcript. + +### 10. `pr-review` workflow has no GitHub-review publish step + +The workflow writes `review-report.md` to disk but never posts a GitHub review. Branch +protection requires an approved review, so the workflow's verdict is invisible to the +merge gate. We hit this on the first merge batch — all 5 PRs blocked by `REVIEW_REQUIRED`. + +**Fix.** Add a node `post_review` after `holistic_review` that runs +`gh pr review --approve|--request-changes --body "$(cat review-report.md)"`, gated by +the holistic decision token: `MERGE` → approve; `MERGE-FIX`/`FIX-MERGE` → request-changes +with checklist; `REJECT` → request-changes with rejection comment. Lives in the workflow, +not the engine. Owner: `gf-software-factory/workflows/pr-review/`. + +### 11. `pr-review` setup uses HTTPS with no auth fallback + +PR #303 v1 failed 4× with `remote: Internal Server Error` (HTTP 500) on `git fetch upstream +pull/N/head`. When the workspace is an existing clone, `setup-pr.sh` inherits whatever URL +is on the upstream remote — plain HTTPS, no credential helper. Flaky under GitHub pressure. + +**Fix.** Use `gh pr checkout --repo ` (handles auth via `gh`) or add an +explicit retry-with-SSH-URL fallback. Workflow-level fix; lives in +`gf-software-factory/workflows/pr-review/scripts/setup-pr.sh`. + +### 12. Cross-repo PRs aren't handled + +PR #297 was a fork PR from `Glowforge/freshell`. Nothing in the graph detects "is this +same-repo?" — a follow-up fix agent tried to push and hit HTTP 403. + +**Fix.** `setup-pr.sh` should record `headRepositoryOwner` in `pr-meta.json`, and +`fail_report` should surface a distinct "cross-repo, push blocked" case so downstream +agents/humans know not to try. Workflow-level fix. + +--- + +## Quality of life + +### 13. `KILROY_PREDECESSOR_NODE` env for failure handlers + +Today `write-fail-report.sh` infers the failed phase from filesystem state (`pr-meta.json` +exists? `build-report.json` exists?). That's a workaround for `fail_report` not knowing +its predecessor node. Adding `KILROY_PREDECESSOR_NODE` (and maybe `KILROY_PREDECESSOR_OUTCOME`) +to the env when a node fires off a non-primary edge would let failure handlers be +deterministic instead of probing. + +### 14. `runs show --worktree-path` accessor + +Today you `runs show ` and grep for `worktree_dir`. A scripting-friendly +`runs show --print-worktree-path` would clean up the bash patterns in skills. + +### 15. Concurrency cap + +10 parallel `attractor run`s worked but generated a lot of simultaneous LLM usage. +A `--max-concurrent N` flag (or a config setting) would help users self-pace without +writing their own throttle. + +### 16. Long worktree paths + +`/Users/matt/.local/state/kilroy/attractor/runs/01K.../worktree/` is a mouthful. +A `runs cd ` helper or a documented shell-function template ("here, source this +into your zshrc") would smooth post-mortem inspection. + +### 17. `progress.ndjson` vs `agent_output.jsonl` documentation gap + +Progress shows lifecycle events (start, input-materialization, tmux-session-start, +terminal); the actual tool-call log is in `agent/agent_output.jsonl`. Several minutes +were wasted reading the wrong file when debugging. A short paragraph in +`docs/runs-layout.md` (or wherever) would prevent that. + +### 18. `Write` tool for agents + +Agents have `Edit` + `Bash` heredoc only — no `Write` tool. Functionally fine, but +unconventional vs Claude Code's standard tool set. Worth either (a) adding `Write`, +or (b) noting in skill docs so prompt authors don't expect it. + +--- + +## Things that worked — protect these + +The pr-review session reported the following, with no kilroy-side failures across 8 +parallel runs and 3 sequential batches: + +- **`--prompt-file`** — writing prose to a file beat JSON-escaping multi-paragraph + prompts. Keep it. +- **Labeling + `--latest --label task=X` lookup** — made multi-run orchestration + tractable across 10+ concurrent runs. +- **`runs wait --timeout`** behavior — streams status to stderr, sane exit codes. + (Note: the active-run lookup gap from item #2 is orthogonal.) +- **`stop --force`** across many concurrent runs — reliable, no orphaned tmux sessions. +- **CLI backend via Claude subscription** — no API keys, no rate-limit surprises. +- **`agent_output.jsonl`** — complete transcript saved a post-mortem investigation. +- **Git-worktree-per-run isolation** — the right design when it's actually used (see + item #1 for when it isn't). +- **Auto-detect provider CLIs + auto-build default config when no `run.yaml` is supplied** — + removed a lot of setup friction for the skill author. + +--- + +## Suggested rollout order + +If a single contributor picks this up: + +1. **Item #1** (worktree isolation prompt prepend) and **#5** (install-skills builds binary). + Both are one-line patches with high leverage. Do them first. +2. **Item #2** (DB-write at launch) and **#6** (help-text alignment). These unlock the + documented UX. +3. **Item #7** (`run_completed` event) and **#8** (worktree cleanup). Polish that + compounds across every future skill. +4. **Item #3** (engine-side terminal-state guard). Pick (b) — validator-time. Less + runtime surface area to verify. +5. **Items #10–#12** (workflow-level pr-review fixes). These live in + `gf-software-factory/workflows/pr-review/` now. +6. **Item #4** (MCP forwarding). Nice but optional. +7. **Items #13–#18**. Quality of life as time allows. diff --git a/internal/attractor/agents/tmux_handler.go b/internal/attractor/agents/tmux_handler.go index 30fd4a62..82f8ac59 100644 --- a/internal/attractor/agents/tmux_handler.go +++ b/internal/attractor/agents/tmux_handler.go @@ -60,6 +60,13 @@ func (h *TmuxAgentHandler) Execute(ctx context.Context, exec *engine.Execution, if prompt == "" { prompt = node.Label() } + if wtPreamble := strings.TrimSpace(engine.BuildWorktreeContextPreamble(exec.WorktreeDir)); wtPreamble != "" { + if strings.TrimSpace(prompt) == "" { + prompt = wtPreamble + } else { + prompt = wtPreamble + "\n\n" + strings.TrimSpace(prompt) + } + } // Session name: kilroy-{runID}-{nodeID} (unique per node execution). runID := "" diff --git a/internal/attractor/agents/tmux_handler_test.go b/internal/attractor/agents/tmux_handler_test.go index aa20a01c..ce5f7833 100644 --- a/internal/attractor/agents/tmux_handler_test.go +++ b/internal/attractor/agents/tmux_handler_test.go @@ -122,7 +122,8 @@ func TestTmuxAgentHandler_FakeAgent_SuccessfulExecution(t *testing.T) { t.Fatalf("response = %q, want to contain FAKE_AGENT_OUTPUT_OK", string(resp)) } - // Verify prompt.md was written. + // Verify prompt.md was written — the user prompt survives, and the + // worktree-context preamble is prepended so the agent stays in cwd. promptPath := filepath.Join(logsRoot, "test_node", "prompt.md") prompt, err := os.ReadFile(promptPath) if err != nil { @@ -131,6 +132,12 @@ func TestTmuxAgentHandler_FakeAgent_SuccessfulExecution(t *testing.T) { if !strings.Contains(string(prompt), "do something") { t.Fatalf("prompt = %q, want 'do something'", string(prompt)) } + if !strings.Contains(string(prompt), "WORKTREE CONTEXT") { + t.Fatalf("prompt = %q, want worktree-context preamble", string(prompt)) + } + if !strings.Contains(string(prompt), workDir) { + t.Fatalf("prompt = %q, want worktree path %q", string(prompt), workDir) + } } func TestTmuxAgentHandler_FakeAgent_FailedExecution(t *testing.T) { diff --git a/internal/attractor/engine/handlers.go b/internal/attractor/engine/handlers.go index 3687e711..daad382d 100644 --- a/internal/attractor/engine/handlers.go +++ b/internal/attractor/engine/handlers.go @@ -487,6 +487,24 @@ func BuildManualBoxFanInPromptPreamble(exec *Execution, node *model.Node) string return strings.TrimSpace(b.String()) } +// BuildWorktreeContextPreamble returns a short preamble that pins the agent to +// its isolated worktree. Without it, an agent following a prompt that mentions +// absolute paths outside the worktree will happily `cd` out and clobber the +// user's source tree. Returns "" if worktreeDir is empty. +func BuildWorktreeContextPreamble(worktreeDir string) string { + worktreeDir = strings.TrimSpace(worktreeDir) + if worktreeDir == "" { + return "" + } + var b strings.Builder + b.WriteString("WORKTREE CONTEXT\n") + b.WriteString(fmt.Sprintf("- You are running inside an isolated Kilroy worktree at %s.\n", worktreeDir)) + b.WriteString("- All work must happen relative to cwd. Do not `cd` elsewhere.\n") + b.WriteString("- Do not pass `-C ` to git with a path outside cwd.\n") + b.WriteString("- If the task description mentions absolute paths outside this worktree, treat them as informational only — do not read, write, or run commands against them.\n") + return b.String() +} + func (h *CodergenHandler) Execute(ctx context.Context, exec *Execution, node *model.Node) (runtime.Outcome, error) { stageDir := filepath.Join(exec.LogsRoot, node.ID) stageStatusPath := filepath.Join(stageDir, "status.json") @@ -538,6 +556,15 @@ func (h *CodergenHandler) Execute(ctx context.Context, exec *Execution, node *mo promptText = preamble + "\n\n" + strings.TrimSpace(promptText) } } + if exec != nil { + if wtPreamble := strings.TrimSpace(BuildWorktreeContextPreamble(exec.WorktreeDir)); wtPreamble != "" { + if strings.TrimSpace(promptText) == "" { + promptText = wtPreamble + } else { + promptText = wtPreamble + "\n\n" + strings.TrimSpace(promptText) + } + } + } if env := BuildStageRuntimeEnv(exec, node.ID); len(env) > 0 { if manifestPath := strings.TrimSpace(env[inputsManifestEnvKey]); manifestPath != "" { preamble := strings.TrimSpace(MustRenderInputMaterializationPromptPreamble(manifestPath)) diff --git a/internal/attractor/engine/worktree_preamble_test.go b/internal/attractor/engine/worktree_preamble_test.go new file mode 100644 index 00000000..5b40a604 --- /dev/null +++ b/internal/attractor/engine/worktree_preamble_test.go @@ -0,0 +1,36 @@ +// Tests for BuildWorktreeContextPreamble — the per-run isolation notice that +// pins agents to their worktree so prompts with stray absolute paths don't +// pull them into the user's source tree. +package engine + +import ( + "strings" + "testing" +) + +func TestBuildWorktreeContextPreamble_EmptyWhenNoWorktree(t *testing.T) { + if got := BuildWorktreeContextPreamble(""); got != "" { + t.Fatalf("empty worktreeDir: got %q, want empty", got) + } + if got := BuildWorktreeContextPreamble(" "); got != "" { + t.Fatalf("whitespace worktreeDir: got %q, want empty", got) + } +} + +func TestBuildWorktreeContextPreamble_IncludesPathAndGuidance(t *testing.T) { + worktree := "/tmp/kilroy-run-abc/worktree" + got := BuildWorktreeContextPreamble(worktree) + if !strings.Contains(got, worktree) { + t.Fatalf("preamble missing worktree path %q: %s", worktree, got) + } + for _, want := range []string{ + "WORKTREE CONTEXT", + "isolated Kilroy worktree", + "Do not `cd` elsewhere", + "informational only", + } { + if !strings.Contains(got, want) { + t.Fatalf("preamble missing %q: %s", want, got) + } + } +} diff --git a/scripts/install-skills.sh b/scripts/install-skills.sh index eea44a13..37ad359d 100755 --- a/scripts/install-skills.sh +++ b/scripts/install-skills.sh @@ -12,31 +12,35 @@ # Workflow package root (stable path for --package): # ~/.local/share/kilroy/workflows -> /workflows # -# Claude Code skills + slash command: -# ~/.claude/skills/quick-launch -> /skills/quick-launch +# Claude Code skills: # ~/.claude/skills/using-kilroy -> /skills/using-kilroy -# ~/.claude/commands/kilroy-quick.md -> /skills/quick-launch/commands/kilroy-quick.md # # Codex skills (codex discovers from ~/.agents/skills/, not ~/.codex/skills/): -# ~/.agents/skills/quick-launch -> /skills/quick-launch # ~/.agents/skills/using-kilroy -> /skills/using-kilroy # # Opencode skills (user-level opencode config dir): -# ~/.config/opencode/skills/quick-launch -> /skills/quick-launch # ~/.config/opencode/skills/using-kilroy -> /skills/using-kilroy # -# Codex and Opencode have no user-level slash-command system, so there is no -# `/kilroy-quick` in those agents — invoke by asking the agent to use the -# quick-launch skill by name. +# The `quick-launch` and `pr-review` skills (and their workflow packages) now +# live in the gf-software-factory repo at sibling location +# `gf-software-factory/skills/{quick-launch,pr-review}` and +# `gf-software-factory/workflows/{quick-launch,pr-review}`. Install those via +# `gf-software-factory/scripts/install-kilroy-host.sh`. set -euo pipefail REPO="$(cd "$(dirname "$0")/.." && pwd)" BINARY="$REPO/kilroy" +# Always build before installing so the binary on PATH reflects the current +# checkout. Stale binaries are a silent failure mode for skills that depend on +# newer flags (--tmux, --prompt-file, --label, ...). One extra compile at +# install time is cheap insurance. +echo "building kilroy binary..." +(cd "$REPO" && go build -o "$BINARY" ./cmd/kilroy) + if [ ! -x "$BINARY" ]; then - echo "error: kilroy binary not found or not executable at $BINARY" >&2 - echo "build it first: (cd $REPO && go build -o ./kilroy ./cmd/kilroy)" >&2 + echo "error: kilroy binary missing after build at $BINARY" >&2 exit 1 fi @@ -58,18 +62,14 @@ link "$REPO/workflows" "$HOME/.local/share/kilroy/workflows" echo echo "claude code" -link "$REPO/skills/quick-launch" "$HOME/.claude/skills/quick-launch" -link "$REPO/skills/using-kilroy" "$HOME/.claude/skills/using-kilroy" -link "$REPO/skills/quick-launch/commands/kilroy-quick.md" "$HOME/.claude/commands/kilroy-quick.md" +link "$REPO/skills/using-kilroy" "$HOME/.claude/skills/using-kilroy" echo echo "codex (~/.agents/skills/ is the native discovery path)" -link "$REPO/skills/quick-launch" "$HOME/.agents/skills/quick-launch" link "$REPO/skills/using-kilroy" "$HOME/.agents/skills/using-kilroy" echo echo "opencode (user-level)" -link "$REPO/skills/quick-launch" "$HOME/.config/opencode/skills/quick-launch" link "$REPO/skills/using-kilroy" "$HOME/.config/opencode/skills/using-kilroy" echo @@ -81,7 +81,5 @@ else fi echo -echo "done. invoke with:" -echo " claude: /kilroy-quick " -echo " codex: ask codex to use the quick-launch skill — mention it by name" -echo " opencode: ask opencode to use the quick-launch skill — mention it by name" +echo "done. for the quick-launch and pr-review skills, install them from" +echo "gf-software-factory: gf-software-factory/scripts/install-kilroy-host.sh" diff --git a/skills/quick-launch/SKILL.md b/skills/quick-launch/SKILL.md deleted file mode 100644 index 375f4359..00000000 --- a/skills/quick-launch/SKILL.md +++ /dev/null @@ -1,199 +0,0 @@ ---- -name: quick-launch -description: "Start a one-shot Kilroy agent run from within another agent's conversation. Tags the run for later lookup, backgrounds it in a tmux session, and gives you the exact commands to check status and retrieve result.md when it's done. Use this for delegating research, investigations, diffs against a repo snapshot, or any single-agent task you want to fire-and-forget." ---- - -# Quick-Launch a Kilroy Run - -Delegate a one-shot task to a Kilroy agent without blocking your own conversation. The run happens in its own git worktree, tagged with labels so you can find it again, and writes a single `result.md` you can read back. - -Use this when: -- You want an agent to investigate something in parallel while you keep working. -- The task fits in one prompt + optional context file and one output file. -- You'd rather come back in a few minutes than wait synchronously. - -Don't use this for: -- Multi-stage workflows that need more than "stage context → run agent → done". Author a custom graph instead (see `skills/create-dotfile`). -- Long-running build/test pipelines. Use a real workflow package. -- Tasks where you need live conversational back-and-forth with the agent. - -## Step 0: Preflight checklist - -Before invoking the skill, confirm: - -- `kilroy` is on PATH. (`which kilroy` — if missing, the skill is not installed; tell the user and stop.) -- `~/.local/share/kilroy/workflows/quick-launch/` exists. This is the canonical workflow package path; all invocations below reference it directly. -- Current working directory is a git repo. The engine auto-detects and creates an isolated run branch + worktree, so the user's source tree is never touched. If cwd is not a git repo, the run still works but isolation degrades to plain-directory mode — warn the user. - -## Step 1: Pick the agent - -The quick-launch package ships three graph variants: - -- `graph.dot` — claude (default) -- `graph.codex.dot` — OpenAI codex CLI -- `graph.gemini.dot` — gemini CLI - -The agent is encoded in the graph file. Pick one with `--graph` when launching; otherwise the package loader uses `graph.dot`. - -## Step 2: Prepare the task - -All inputs are written to `.kilroy/INPUT.md` in the worktree at run start, with one `## ` section per input. The agent reads that file to find the task and any context. - -Two ways to pass the task (pick one): - -**Short tasks — inline JSON:** -```bash ---input '{"prompt":"Investigate X and summarize findings"}' -``` - -**Longer or multi-line tasks — write the prompt to a file and use `--prompt-file`:** -```bash ---prompt-file /abs/path/to/request.md -``` -This reads the file contents verbatim into the `prompt` input. No JSON escaping, no quoting nightmares. **Strongly prefer this when the task is more than ~100 words or contains anything that would be painful to inline: multi-paragraph briefings, code blocks, markdown tables, lists with embedded quotes.** The extra step of writing a file is worth it the moment you reach for `\n` escapes. - -Optional extras (pass alongside either form via `--input '{"...":"..."}`): - -- `context_file` — absolute path to a file for background context. The agent reads it from its original location; no copy happens. Use when the context is a document that already exists on disk (briefing, design doc, log file). Example: `--input '{"context_file":"/abs/path/briefing.md"}'` combined with `--prompt-file`. -- `context` — an inline string for background context. Use when the context is short enough to inline but you want to keep it separate from the task description. - -If the task is short and self-contained (a few sentences), skip the context entirely. - -## Step 3: Launch - -From the user's current working directory (which should be a git repo — see Step 0): - -```bash -kilroy attractor run --detach --tmux \ - --package ~/.local/share/kilroy/workflows/quick-launch \ - --label task= \ - --prompt-file -``` - -Or the inline-JSON form for a short prompt: -```bash -kilroy attractor run --detach --tmux \ - --package ~/.local/share/kilroy/workflows/quick-launch \ - --label task= \ - --input '{"prompt":""}' -``` - -That is the whole invocation. No `--config` needed — when no run.yaml is supplied, kilroy auto-builds a default config for cwd and auto-detects installed provider CLIs. No `--no-cxdb` or `--skip-cli-headless-warning` needed either — those are applied automatically when there's no config and stdin isn't interactive. The `workflow=quick-launch` label is added automatically by the package's `workflow.toml`. - -Optional additions: -- `--graph ~/.local/share/kilroy/workflows/quick-launch/graph.codex.dot` to pick codex. Swap `.gemini.dot` for gemini. Omit for claude. -- Additional `--label KEY=VALUE` flags to tag owner, ticket id, run group, etc. The `task=` tag is the minimum — use a specific slug so `runs show --latest --label task=` finds exactly this run later. -- `--workspace ` to run against a different directory than cwd. Use this when the user wants the run to operate against a repo they're not currently in. - -On success the command prints `run_id=` and `logs_root=...` and returns immediately. The run continues in a detached tmux session. Print the `run_id` (or the `--latest --label task=` form) to the user so they can follow up. - -## Step 4: Wait for the run - -The fastest way — block until the run reaches a terminal state, then return: - -```bash -kilroy attractor runs wait --latest --label task= --timeout 10m -``` - -`runs wait` polls the run DB every ~2s (configurable with `--interval`), prints each status transition to stderr, and exits 0 on `success` / 1 on `fail`/`canceled` / 2 on `--timeout` expiry. Use this whenever you want synchronous behavior on top of the detached run. - -If you don't want to block, check status on demand instead: - -```bash -kilroy attractor runs show --latest --label task= -``` - -or list all runs matching a tag: - -```bash -kilroy attractor runs list --label task= -``` - -Show output accepts `--json` for machine-readable detail. Interesting fields for a caller: -- `status` — `running`, `success`, `fail`, `canceled` -- `worktree_dir` — where the agent worked; still on disk with the final git state -- `outputs[].path` — absolute path to each collected output file (e.g. `result.md`) - -## Step 5: Retrieve result.md - -Once status is `success`: - -```bash -kilroy attractor runs show --latest --label task= --print result.md -``` - -(or use a run id / prefix instead of `--latest`). That streams the file straight to stdout — pipe it, redirect it, or read it into your current conversation. - -If the agent wrote additional files you care about, list them: - -```bash -kilroy attractor runs show 01KP646Y --outputs -``` - -…and `--print ` any of them. Only files declared in `outputs=` on the graph or the node are collected into `logs_root/outputs/` — everything else stays in the worktree (`worktree_dir` from `runs show`). - -## Minimal run.yaml (only if you need it) - -You almost never need a run.yaml for quick-launch — the default behavior handles it. Supply one only when you need non-default settings: specific model selection beyond what the graph's stylesheet declares, a remote cxdb, custom runtime policy, or a non-cwd workspace via config instead of `--workspace`. A working minimum: - -```yaml -version: 1 - -repo: - path: /absolute/path/to/some/workspace - -llm: - cli_profile: real - providers: - anthropic: - backend: cli - openai: - backend: cli - google: - backend: cli - -git: - require_clean: false - run_branch_prefix: attractor/run - commit_per_node: true - -runtime_policy: - stall_timeout_ms: 600000 - stall_check_interval_ms: 5000 - max_llm_retries: 2 -``` - -## Inspecting a finished run - -Everything lives in two places: - -- `logs_root` — per-run operational state. `outputs/` holds collected files, `progress.ndjson` has the event stream, `/` dirs hold per-stage artifacts. -- `worktree_dir` — the git worktree the agent worked in. Still present after the run finishes; you can `cd` there and inspect the final state, or `git log` the run branch. - -Both paths are in `runs show` output. - -For a GUI view of the graph, per-node prompts/responses, tool-call log, and output files, open: - -``` -http://localhost:9700/ui/#/run/ -``` - -(Start the server with `kilroy attractor serve` if it isn't running.) - -## Failure handling - -If launch itself fails (bad flags, missing CLI binary, config error), you get an error on stderr and no run is registered. - -If launch succeeds but the run fails mid-flight, `runs show` reports `status=fail` with `failure_reason`. The worktree and logs are still on disk — go look. The per-node `Detail` and `Log` tabs in the UI are the fastest way to see what the agent actually tried. - -Common failures: -- **`provider_prompt_probe fail`** — the provider CLI couldn't complete its probe. Check the provider's own config (`~/.codex/config.toml`, `~/.claude/settings.json`, etc.) for stale or invalid model/reasoning settings. -- **`missing llm.providers.X.backend`** — the graph references a provider that isn't declared in your run.yaml. Add `llm.providers.X.backend: cli` (or `api`). -- **`output contract: missing result.md`** — the agent finished without writing `result.md`. The Detail tab shows why; rerun with a clearer prompt, or drop the `outputs=` attribute on the agent node if the task genuinely doesn't produce a file. -- **`preflight aborted: declined provider CLI headless-risk warning`** — you forgot `--skip-cli-headless-warning` on a detached run. - -## Do not - -- Modify files under `workflows/quick-launch/` to tweak a single run. The package is shared. If you need a different prompt template or a non-trivial prepare step, author a custom graph and point `--graph` at it. -- Relaunch a failed run "just to see if it works now" before reading the failure. The logs are on disk; diagnose first. -- Invent new flags. The invocation template above is the whole interface — every flag in it is load-bearing. If something isn't covered, ask the user rather than guessing. diff --git a/skills/quick-launch/commands/kilroy-quick.md b/skills/quick-launch/commands/kilroy-quick.md deleted file mode 100644 index 578d3d51..00000000 --- a/skills/quick-launch/commands/kilroy-quick.md +++ /dev/null @@ -1,19 +0,0 @@ ---- -description: Delegate a one-shot task to a Kilroy agent (fire-and-forget, tagged, tmux-backgrounded). Task description follows. -disable-model-invocation: true ---- - -Invoke the `quick-launch` skill at `skills/quick-launch/SKILL.md` and follow it exactly as written. - -The task to delegate: $ARGUMENTS - -If the user did not supply a task description, ask them what they want the Kilroy agent to do before launching anything. Do not invent a task. - -Key reminders from the skill (do not skip): - -- Use the invocation template in Step 3 verbatim. Every flag is load-bearing. -- The workflow package lives at `~/.local/share/kilroy/workflows/quick-launch`. -- Tag the run with at least one specific `--label task=` so you can find it later with `--latest --label task=`. -- For any task longer than ~100 words or anything that would need `\n` escaping in JSON, write the prompt to a file first and use `--prompt-file ` instead of `--input '{"prompt":"..."}'`. This is the default, not the exception. -- After launching, print the `run_id` (and the `runs wait --latest --label` command) so the user can follow up. -- If the user asked you to wait for the result, use `kilroy attractor runs wait --latest --label task= --timeout ` — don't poll by hand. diff --git a/workflows/pr-review/graph.dot b/workflows/pr-review/graph.dot deleted file mode 100644 index ff5aa0f2..00000000 --- a/workflows/pr-review/graph.dot +++ /dev/null @@ -1,73 +0,0 @@ -// PR review workflow: setup, build/test, code review, holistic review, report. -// Inputs: pr_repo (owner/repo), pr_number. Output: review-report.md. - -digraph pr_review { - graph [ - inputs="pr_repo,pr_number", - outputs="review-report.md", - model_stylesheet=" - * { llm_model: claude-sonnet-4.6; llm_provider: anthropic; } - " - ] - - start [shape=Mdiamond, label="Start"] - done [shape=Msquare, label="Done"] - - // Tool: checkout PR branch, fetch metadata, merge base - setup [ - shape=parallelogram, - label="Setup PR", - tool_command="sh .kilroy/package/scripts/setup-pr.sh" - ] - - // Tool: detect build system, run build and tests - build_test [ - shape=parallelogram, - label="Build & Test", - tool_command="sh .kilroy/package/scripts/build-test.sh", - timeout="600s" - ] - - // Tool: write a failure report when build/test failed - fail_report [ - shape=parallelogram, - label="Write Failure Report", - tool_command="sh .kilroy/package/scripts/write-fail-report.sh" - ] - - // Tool: collect the PR diff for review agents - collect_diff [ - shape=parallelogram, - label="Collect Diff", - tool_command="sh .kilroy/package/scripts/collect-diff.sh" - ] - - // Agent: review the diff for code quality, bugs, test coverage - code_review [ - shape=box, - label="Code Review", - agent_tool="claude", - prompt="You are reviewing a GitHub PR. Read .ai/pr-data/pr-view.txt, pr-meta.json, pr-diff.patch, changed-files.txt, and build-report.json. Read the actual changed source files for context. Assess correctness, security, error handling, test coverage, naming, and scope. Write findings to .ai/pr-data/code-review-findings.md with sections: Critical Issues, Warnings, Positive Observations, File-by-File Notes. Be specific with file:line citations. Then write {\"status\":\"success\"} to $KILROY_STAGE_STATUS_PATH or $KILROY_STAGE_STATUS_FALLBACK_PATH if that fails.", - outputs=".ai/pr-data/code-review-findings.md" - ] - - // Agent: holistic assessment and final report - holistic_review [ - shape=box, - label="Holistic Review", - agent_tool="claude", - prompt="You are making a final PR assessment. Read .ai/pr-data/code-review-findings.md, pr-view.txt, pr-meta.json, build-report.json, and skim pr-diff.patch. Assess net value, completeness, risk, and make a decision: MERGE, MERGE-FIX, FIX-MERGE, or REJECT. Write review-report.md in the workspace root with: Decision, Rationale, Build & Test Results, Critical Issues, Warnings, Positive Observations, Next Actions. Keep under 50 lines. Bias toward landing contributions. Then write {\"status\":\"success\"} to $KILROY_STAGE_STATUS_PATH or $KILROY_STAGE_STATUS_FALLBACK_PATH if that fails.", - outputs="review-report.md" - ] - - start -> setup - setup -> build_test [condition="outcome=success"] - setup -> done - build_test -> collect_diff [condition="outcome=success"] - build_test -> fail_report - fail_report -> done - collect_diff -> code_review [condition="outcome=success"] - collect_diff -> done - code_review -> holistic_review - holistic_review -> done -} diff --git a/workflows/pr-review/pr-review.dot b/workflows/pr-review/pr-review.dot deleted file mode 100644 index ff3dccc8..00000000 --- a/workflows/pr-review/pr-review.dot +++ /dev/null @@ -1,145 +0,0 @@ -// Single-PR review workflow: setup, investigate, decide. -// Usage: PR_REPO=owner/repo PR_NUMBER=123 kilroy attractor run --graph pr-review.dot --config run.yaml - -digraph pr_review { - graph [ - goal="Review GitHub pull request for merge readiness", - model_stylesheet=" - * { llm_model: claude-sonnet-4.6; llm_provider: anthropic; } - " - ] - - start [shape=Mdiamond, label="Start"] - exit [shape=Msquare, label="Exit"] - - // Deterministic setup: fetch metadata, checkout PR branch, merge base, run build/test - setup [ - shape=parallelogram, - label="Setup", - tool_command="sh /Users/matt/sw/personal/kilroy/workflows/pr-review/scripts/setup-pr.sh" - ] - - // Agentic investigation: deep review with full tool access - investigate [ - shape=box, - label="Investigate", - auto_status=true, - prompt="You are investigating a GitHub pull request. The PR branch has been checked out -and merged with the base branch. Build and test results are available. - -START by reading these files: -- .ai/pr-data/pr-view.txt (PR description and discussion) -- .ai/pr-data/pr-meta.json (structured metadata: files changed, author, branches) -- .ai/pr-data/pr-diff.patch (the full diff) -- .ai/pr-data/merge-result.txt (whether the merge was clean or had conflicts) - -THEN investigate the actual codebase. You have the full repo checked out at the PR branch -merged with the latest base branch. Use it: -- Read the source files that were changed. Read the surrounding code for context. -- Determine the project's build system and test framework (package.json, go.mod, Makefile, etc). -- Build the project. Run the test suite. If the full suite is too slow, run tests targeted - at the packages/modules touched by the PR. -- If tests fail, determine whether failures are pre-existing on the base branch or - introduced by this PR. Check git blame, read the failing test code, understand why. -- If the PR claims to fix something, verify the fix actually works. -- If the PR adds functionality, check if it has tests and if they cover the key cases. -- Look for: bugs, security issues, missing error handling, naming problems, scope creep. -- Run additional targeted commands if needed (linters, type checking, etc). - -Write your findings to .ai/pr-data/investigation.md as a structured evidence document: - -## PR Overview -[1-2 sentences: what this PR does] - -## Verification Results -- Build: [pass/fail] -- Tests: [pass/fail — list any failures and whether pre-existing or introduced] -- Format: [pass/fail — list affected files] -- Claims verified: [did you confirm the PR does what it says?] - -## Code Review Findings -[List each finding as a bullet. For each: what you found, where (file:line), and why it matters. -Include positive findings too — things done well that should be preserved.] - -## Scope Assessment -[Is this PR well-scoped? Does every changed file serve the stated purpose? -Flag any unrelated changes that should be in a separate PR.] - -## Risk Factors -[What could go wrong if merged? Edge cases, migration concerns, behavioral changes. -If low risk, say so briefly and move on.] - -Do NOT make recommendations or decisions. Your job is to gather evidence. The next step -will use your findings to make the merge decision. - -On failure, write to $KILROY_STAGE_STATUS_PATH (fallback $KILROY_STAGE_STATUS_FALLBACK_PATH): -{\"status\":\"fail\",\"failure_reason\":\"...\",\"failure_class\":\"deterministic\"}" - ] - - // Decision agent: reads evidence, produces actionable verdict - decide [ - shape=box, - label="Decide", - auto_status=true, - prompt="You are making a merge decision for a GitHub pull request based on evidence -gathered by the investigation step. - -Read these files: -- .ai/pr-data/investigation.md (detailed findings from the investigator) -- .ai/pr-data/pr-view.txt (PR description for context) -- .ai/pr-data/pr-meta.json (metadata) - -Do NOT run commands or read source code. The investigation is complete. Your job is to -decide and direct. - -Write .ai/pr-data/review-report.md in EXACTLY this format: - -## Decision: [one of: MERGE | MERGE-FIX | FIX-MERGE | CHERRY-PICK | SPLIT | REJECT] - -## Rationale -[3-5 sentences max. What does the PR do, is it worth merging, and why this decision.] - -## Verification Summary -- Build: [pass/fail] -- Tests: [pass/fail — note pre-existing vs introduced failures] -- Format: [pass/fail] - -## Blockers -[Things that MUST be fixed before merge. If none, write 'None'.] - -## Next Actions -[Ordered list of what should happen now. Be specific and directive but not prescriptive — -the downstream agent knows git mechanics. Tell it WHAT to do and WHY, not HOW to type it. - -Examples of good next actions: -- Merge PR #74 into main. -- Comment on PR with: 'Reviewed — merging. Follow-up PR incoming for cleanup items below.' -- Create a follow-up PR from main that fixes gofmt drift in engine.go around line 806 - (pre-existing from PR #76, map literal alignment) and removes the duplicate test entry - at cli_only_models_test.go line 14. -- Reject PR #99. Comment: 'This PR bundles auth refactoring with unrelated UI changes. - Please split into separate PRs — happy to review the auth changes independently.' - -Examples of bad next actions: -- Run `git checkout -b fix/cleanup && gofmt -w file.go && git add . && git commit...` - (too prescriptive — the downstream knows how to make a branch and commit) -- Address the issues found in the review. - (too vague — which issues? what specifically should change?) -] - -RULES: -- Bias toward landing contributions. Prefer MERGE-FIX over FIX-MERGE for minor issues. -- Every action must be specific enough that someone unfamiliar with the PR can execute it. -- Keep the whole report under 40 lines. No padding, no tables, no filler. -- If the investigation found nothing concerning, say so briefly and move on. - -On failure, write to $KILROY_STAGE_STATUS_PATH (fallback $KILROY_STAGE_STATUS_FALLBACK_PATH): -{\"status\":\"fail\",\"failure_reason\":\"...\",\"failure_class\":\"deterministic\"}" - ] - - start -> setup - setup -> investigate [condition="outcome=success"] - setup -> exit - investigate -> decide - decide -> exit -} diff --git a/workflows/pr-review/prompts/code-review.md b/workflows/pr-review/prompts/code-review.md deleted file mode 100644 index 3cac012c..00000000 --- a/workflows/pr-review/prompts/code-review.md +++ /dev/null @@ -1,51 +0,0 @@ -You are reviewing a GitHub pull request for code quality, bugs, and test coverage. - -The PR has already been checked out, built, and tested. Build/test results and the -diff are available in `.ai/pr-data/`. - -## Your inputs - -Read these files first: -- `.ai/pr-data/pr-view.txt` — PR description and discussion -- `.ai/pr-data/pr-meta.json` — structured metadata (files, author, branches) -- `.ai/pr-data/pr-diff.patch` — the full diff -- `.ai/pr-data/changed-files.txt` — list of changed files -- `.ai/pr-data/build-report.json` — build/test results -- `.ai/pr-data/merge-result.txt` — merge status - -## What to do - -Read the actual source files that were changed. Read surrounding code for context. -For each changed file, assess: - -1. **Correctness**: Does the code do what the PR claims? Are there logic errors, off-by-one - bugs, race conditions, nil pointer risks? -2. **Security**: SQL injection, XSS, command injection, hardcoded secrets, unsafe deserialization? -3. **Error handling**: Are errors checked? Are failure modes handled gracefully? -4. **Test coverage**: Do tests exist for the changed code? Do they test the right things? - Are edge cases covered? Do tests test real behavior (not mocked behavior)? -5. **Naming and clarity**: Are names descriptive? Is the code readable without comments? -6. **Scope**: Does every change serve the PR's stated purpose? Flag unrelated changes. - -## Output - -Write your findings to `.ai/pr-data/code-review-findings.md`: - -```markdown -## Code Review Findings - -### Critical Issues -[Issues that must be fixed before merge. Empty section if none.] - -### Warnings -[Issues worth noting but not blocking. Empty section if none.] - -### Positive Observations -[Things done well that should be preserved.] - -### File-by-File Notes -[For each changed file with findings, list file:line and what you found.] -``` - -Be specific. Cite file paths and line numbers. Focus on substance, not style nitpicks. -If the code is clean, say so briefly — don't manufacture findings. diff --git a/workflows/pr-review/prompts/holistic-review.md b/workflows/pr-review/prompts/holistic-review.md deleted file mode 100644 index c5d107f4..00000000 --- a/workflows/pr-review/prompts/holistic-review.md +++ /dev/null @@ -1,65 +0,0 @@ -You are making a final assessment of a GitHub pull request. The code review is complete. - -## Your inputs - -Read these files: -- `.ai/pr-data/code-review-findings.md` — detailed code review from the previous step -- `.ai/pr-data/pr-view.txt` — PR description and discussion -- `.ai/pr-data/pr-meta.json` — structured metadata -- `.ai/pr-data/build-report.json` — build/test results -- `.ai/pr-data/pr-diff.patch` — the full diff (skim for overall scope) - -## What to assess - -1. **Net value**: Is this PR a net-positive change? Does it improve the codebase more than - it costs in complexity, risk, or maintenance burden? -2. **Completeness**: Does the PR achieve what it claims? Are there missing pieces? -3. **Risk**: What could go wrong if merged? Migration risks, behavioral changes, edge cases? -4. **Decision**: Should this be merged, merged with fixes, or sent back? - -## Output - -Write `review-report.md` in the workspace root (NOT in .ai/pr-data/): - -```markdown -# PR Review Report - -**PR**: # -**Repo**: <owner/repo> -**Author**: <author> - -## Decision: [MERGE | MERGE-FIX | FIX-MERGE | REJECT] - -- MERGE: Ship it as-is -- MERGE-FIX: Merge now, fix minor issues in a follow-up -- FIX-MERGE: Issues must be fixed before merging -- REJECT: Fundamental problems, needs rethinking - -## Rationale -[3-5 sentences. What does the PR do, is it worth merging, why this decision.] - -## Build & Test Results - -| Check | Status | -|-------|--------| -| Build | pass/fail | -| Tests | pass/fail | - -## Critical Issues -[From code review. Must be fixed. "None" if clean.] - -## Warnings -[Non-blocking concerns. "None" if clean.] - -## Positive Observations -[What the PR does well.] - -## Next Actions -[Specific, ordered list of what should happen. Be directive but not prescriptive.] -``` - -Rules: -- Bias toward landing contributions. Prefer MERGE-FIX over FIX-MERGE for minor issues. -- Keep the report under 50 lines. No padding or filler. -- Every action item must be specific enough for someone unfamiliar with the PR to execute. -- If the code review found nothing concerning, say so briefly. diff --git a/workflows/pr-review/run-freshell.yaml b/workflows/pr-review/run-freshell.yaml deleted file mode 100644 index 84170687..00000000 --- a/workflows/pr-review/run-freshell.yaml +++ /dev/null @@ -1,17 +0,0 @@ -version: 1 - -repo: - path: /Users/matt/sw/personal/freshell - -llm: - providers: - anthropic: - backend: api - -git: - require_clean: false - commit_per_node: false - -runtime_policy: - stage_timeout_ms: 0 - stall_timeout_ms: 600000 diff --git a/workflows/pr-review/run.yaml b/workflows/pr-review/run.yaml deleted file mode 100644 index f51ab251..00000000 --- a/workflows/pr-review/run.yaml +++ /dev/null @@ -1,17 +0,0 @@ -version: 1 - -repo: - path: . - -llm: - providers: - anthropic: - backend: api - -git: - require_clean: false - commit_per_node: false - -runtime_policy: - stage_timeout_ms: 0 - stall_timeout_ms: 300000 diff --git a/workflows/pr-review/scripts/build-test.sh b/workflows/pr-review/scripts/build-test.sh deleted file mode 100644 index 5b2de932..00000000 --- a/workflows/pr-review/scripts/build-test.sh +++ /dev/null @@ -1,83 +0,0 @@ -#!/bin/sh -# Detect build system and run build + tests. -# Exits 0 on success, non-zero if build or tests fail. - -set -e - -SCRATCH=".ai/pr-data" -mkdir -p "$SCRATCH" - -# Detect build system -BUILD_CMD="" -TEST_CMD="" - -if [ -f "go.mod" ]; then - BUILD_CMD="go build ./..." - TEST_CMD="go test ./..." -elif [ -f "Cargo.toml" ]; then - BUILD_CMD="cargo build" - TEST_CMD="cargo test" -elif [ -f "package.json" ]; then - if [ -f "yarn.lock" ]; then - BUILD_CMD="yarn install && yarn build" - TEST_CMD="yarn test" - else - BUILD_CMD="npm install && npm run build" - TEST_CMD="npm test" - fi -elif [ -f "Makefile" ] || [ -f "makefile" ]; then - BUILD_CMD="make" - TEST_CMD="make test" -elif [ -f "CMakeLists.txt" ]; then - BUILD_CMD="cmake -B build && cmake --build build" - TEST_CMD="cd build && ctest" -fi - -if [ -z "$BUILD_CMD" ]; then - echo "No recognized build system found" - echo '{"build": "skip", "test": "skip", "reason": "no build system detected"}' > "$SCRATCH/build-report.json" - # Not a failure — some repos don't have a build step - exit 0 -fi - -echo "=== Detected build system ===" -echo " Build: $BUILD_CMD" -echo " Test: $TEST_CMD" - -# Run build -echo "" -echo "=== Build ===" -BUILD_STATUS="pass" -if eval "$BUILD_CMD" 2>"$SCRATCH/build-stderr.txt"; then - echo "Build: PASS" -else - BUILD_STATUS="fail" - echo "Build: FAIL" - cat "$SCRATCH/build-stderr.txt" -fi - -# Run tests (only if build passed) -TEST_STATUS="skip" -if [ "$BUILD_STATUS" = "pass" ] && [ -n "$TEST_CMD" ]; then - echo "" - echo "=== Tests ===" - if eval "$TEST_CMD" 2>&1 | tee "$SCRATCH/test-output.txt"; then - TEST_STATUS="pass" - echo "Tests: PASS" - else - TEST_STATUS="fail" - echo "Tests: FAIL" - fi -fi - -# Write structured report -cat > "$SCRATCH/build-report.json" <<EOF -{"build": "$BUILD_STATUS", "test": "$TEST_STATUS", "build_cmd": "$BUILD_CMD", "test_cmd": "$TEST_CMD"} -EOF - -# Exit non-zero if anything failed -if [ "$BUILD_STATUS" = "fail" ] || [ "$TEST_STATUS" = "fail" ]; then - exit 1 -fi - -exit 0 diff --git a/workflows/pr-review/scripts/collect-diff.sh b/workflows/pr-review/scripts/collect-diff.sh deleted file mode 100644 index 75328898..00000000 --- a/workflows/pr-review/scripts/collect-diff.sh +++ /dev/null @@ -1,33 +0,0 @@ -#!/bin/sh -# Collect the PR diff and file list for review agents. -# Reads: KILROY_INPUT_PR_REPO, KILROY_INPUT_PR_NUMBER. Requires: gh. - -set -e - -: "${KILROY_INPUT_PR_REPO:?KILROY_INPUT_PR_REPO must be set (owner/repo)}" -: "${KILROY_INPUT_PR_NUMBER:?KILROY_INPUT_PR_NUMBER must be set}" - -PR_REPO="$KILROY_INPUT_PR_REPO" -PR_NUMBER="$KILROY_INPUT_PR_NUMBER" -SCRATCH=".ai/pr-data" -mkdir -p "$SCRATCH" - -echo "=== Collecting PR diff ===" - -# Get the diff -gh pr diff "$PR_NUMBER" --repo "$PR_REPO" > "$SCRATCH/pr-diff.patch" -DIFF_LINES=$(wc -l < "$SCRATCH/pr-diff.patch" | tr -d ' ') -echo "Diff: ${DIFF_LINES} lines" - -# Get changed file list -gh pr view "$PR_NUMBER" --repo "$PR_REPO" --json files \ - | python3 -c " -import json, sys -data = json.load(sys.stdin) -for f in data.get('files', []): - print(f['path']) -" > "$SCRATCH/changed-files.txt" - -FILE_COUNT=$(wc -l < "$SCRATCH/changed-files.txt" | tr -d ' ') -echo "Changed files: ${FILE_COUNT}" -cat "$SCRATCH/changed-files.txt" diff --git a/workflows/pr-review/scripts/setup-pr.sh b/workflows/pr-review/scripts/setup-pr.sh deleted file mode 100644 index fefae5f1..00000000 --- a/workflows/pr-review/scripts/setup-pr.sh +++ /dev/null @@ -1,80 +0,0 @@ -#!/bin/sh -# Checkout a PR branch, fetch metadata, merge base branch into it. -# Reads: KILROY_INPUT_PR_REPO, KILROY_INPUT_PR_NUMBER. Requires: gh, git. - -set -e - -: "${KILROY_INPUT_PR_REPO:?KILROY_INPUT_PR_REPO must be set (owner/repo)}" -: "${KILROY_INPUT_PR_NUMBER:?KILROY_INPUT_PR_NUMBER must be set}" - -PR_REPO="$KILROY_INPUT_PR_REPO" -PR_NUMBER="$KILROY_INPUT_PR_NUMBER" -SCRATCH=".ai/pr-data" -mkdir -p "$SCRATCH" - -echo "=== Setting up PR #${PR_NUMBER} from ${PR_REPO} ===" - -# Fetch PR metadata via gh -gh pr view "$PR_NUMBER" --repo "$PR_REPO" \ - --json title,body,author,labels,files,baseRefName,headRefName,additions,deletions,changedFiles \ - > "$SCRATCH/pr-meta.json" -gh pr view "$PR_NUMBER" --repo "$PR_REPO" > "$SCRATCH/pr-view.txt" - -# Extract metadata -BASE_BRANCH=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['baseRefName'])") -HEAD_BRANCH=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['headRefName'])") -TITLE=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['title'])") - -# Clone the repo if we're in an empty workspace -if [ ! -e ".git" ]; then - echo "No git repo found — cloning ${PR_REPO}..." - git clone "https://github.com/${PR_REPO}.git" _repo - # Move contents up (clone into subdir, then move to workspace root) - mv _repo/.git . - mv _repo/* . 2>/dev/null || true - mv _repo/.* . 2>/dev/null || true - rmdir _repo 2>/dev/null || true -fi - -# Add upstream remote for the PR's repo -REMOTE_URL="https://github.com/${PR_REPO}.git" -if ! git remote get-url upstream >/dev/null 2>&1; then - git remote add upstream "$REMOTE_URL" -fi - -# Fetch PR ref and base branch -echo "Fetching PR #${PR_NUMBER} head and base (${BASE_BRANCH})..." -git fetch upstream "pull/${PR_NUMBER}/head" --quiet -git fetch upstream "$BASE_BRANCH" --quiet - -# Checkout PR at FETCH_HEAD -RUN_TAG=$(echo "${KILROY_RUN_ID:-$$}" | tail -c 9) -LOCAL_BRANCH="pr-review/${PR_NUMBER}-${RUN_TAG}" -git checkout -b "$LOCAL_BRANCH" FETCH_HEAD --quiet - -PR_SHA=$(git rev-parse HEAD) -echo "PR branch HEAD: $PR_SHA (local: $LOCAL_BRANCH)" - -# Merge base branch into PR branch -echo "Merging upstream/${BASE_BRANCH} into PR branch..." -if git merge "upstream/${BASE_BRANCH}" --no-edit 2>"$SCRATCH/merge-stderr.txt"; then - echo "MERGE_STATUS=clean" > "$SCRATCH/merge-result.txt" - echo "Merge: clean" -else - echo "MERGE_STATUS=conflict" > "$SCRATCH/merge-result.txt" - git diff --name-only --diff-filter=U > "$SCRATCH/conflict-files.txt" 2>/dev/null || true - git merge --abort || true - echo "Merge: CONFLICTS (see $SCRATCH/conflict-files.txt)" -fi - -# Summary -FILES=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['changedFiles'])") -ADDS=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['additions'])") -DELS=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['deletions'])") -AUTHOR=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['author']['login'])") - -echo "" -echo "PR #${PR_NUMBER}: ${TITLE}" -echo " Author: ${AUTHOR}" -echo " Branch: ${HEAD_BRANCH} -> ${BASE_BRANCH}" -echo " Files changed: ${FILES} (+${ADDS} / -${DELS})" diff --git a/workflows/pr-review/scripts/write-fail-report.sh b/workflows/pr-review/scripts/write-fail-report.sh deleted file mode 100644 index 909b3571..00000000 --- a/workflows/pr-review/scripts/write-fail-report.sh +++ /dev/null @@ -1,59 +0,0 @@ -#!/bin/sh -# Generate review-report.md from build/test failure data. -# Called when build_test exits non-zero. - -SCRATCH=".ai/pr-data" - -# Read build report if it exists -BUILD_STATUS="unknown" -TEST_STATUS="unknown" -if [ -f "$SCRATCH/build-report.json" ]; then - BUILD_STATUS=$(python3 -c "import json; print(json.load(open('$SCRATCH/build-report.json'))['build'])" 2>/dev/null || echo "unknown") - TEST_STATUS=$(python3 -c "import json; print(json.load(open('$SCRATCH/build-report.json'))['test'])" 2>/dev/null || echo "unknown") -fi - -# Read PR title -TITLE="(unknown)" -if [ -f "$SCRATCH/pr-meta.json" ]; then - TITLE=$(python3 -c "import json; print(json.load(open('$SCRATCH/pr-meta.json'))['title'])" 2>/dev/null || echo "(unknown)") -fi - -# Build the failure report -cat > review-report.md <<EOF -# PR Review Report - -**PR**: #${KILROY_INPUT_PR_NUMBER} — ${TITLE} -**Repo**: ${KILROY_INPUT_PR_REPO} -**Result**: BUILD/TEST FAILURE — code review skipped - -## Build & Test Results - -| Check | Status | -|---------|--------| -| Build | ${BUILD_STATUS} | -| Tests | ${TEST_STATUS} | - -## Details - -The PR failed build or test checks. Code review was skipped because the code -does not compile or pass tests in its current state. - -EOF - -# Append build stderr if available -if [ -f "$SCRATCH/build-stderr.txt" ] && [ -s "$SCRATCH/build-stderr.txt" ]; then - echo "### Build Errors" >> review-report.md - echo '```' >> review-report.md - head -100 "$SCRATCH/build-stderr.txt" >> review-report.md - echo '```' >> review-report.md -fi - -# Append test output if available -if [ -f "$SCRATCH/test-output.txt" ] && [ -s "$SCRATCH/test-output.txt" ]; then - echo "### Test Output" >> review-report.md - echo '```' >> review-report.md - tail -50 "$SCRATCH/test-output.txt" >> review-report.md - echo '```' >> review-report.md -fi - -echo "Wrote review-report.md (build/test failure)" diff --git a/workflows/pr-review/workflow.toml b/workflows/pr-review/workflow.toml deleted file mode 100644 index 6019470a..00000000 --- a/workflows/pr-review/workflow.toml +++ /dev/null @@ -1,21 +0,0 @@ -# PR review workflow: checkout, build/test, LLM code review, holistic review, report. -# Requires: gh CLI, git. Agent nodes use claude via --tmux. - -name = "pr-review" -description = "Review a GitHub pull request: build, test, code review, holistic assessment." -version = "1.0.0" - -[[inputs]] -name = "pr_repo" -description = "GitHub repository in owner/repo format" -required = true - -[[inputs]] -name = "pr_number" -description = "Pull request number" -required = true - -[defaults] -labels = { workflow = "pr-review" } - -outputs = ["review-report.md"] diff --git a/workflows/quick-launch/graph.codex.dot b/workflows/quick-launch/graph.codex.dot deleted file mode 100644 index 5cd002cc..00000000 --- a/workflows/quick-launch/graph.codex.dot +++ /dev/null @@ -1,27 +0,0 @@ -// Quick-launch variant: same workflow, codex agent. -// See graph.dot for the canonical version and docs. - -digraph quick_launch_codex { - graph [ - inputs="prompt", - outputs="result.md", - model_stylesheet=" - * { llm_provider: openai; llm_model: gpt-5-codex; } - " - ] - - start [shape=Mdiamond, label="Start"] - done [shape=Msquare, label="Done"] - - agent [ - shape=box, - label="Run agent", - agent_tool="codex", - llm_provider="openai", - prompt="Read .kilroy/INPUT.md for the user's request. The '## prompt' section is your assignment. If a '## context_file' section exists, read the file at that path for background context. If a '## context' section exists instead, use it as inline context. Complete the task and write your findings to result.md at the workspace root. Keep result.md self-contained — the caller will read it directly. When finished, write {\"status\":\"success\"} to $KILROY_STAGE_STATUS_PATH or $KILROY_STAGE_STATUS_FALLBACK_PATH if that fails.", - outputs="result.md" - ] - - start -> agent - agent -> done -} diff --git a/workflows/quick-launch/graph.dot b/workflows/quick-launch/graph.dot deleted file mode 100644 index c8d2074b..00000000 --- a/workflows/quick-launch/graph.dot +++ /dev/null @@ -1,40 +0,0 @@ -// Quick-launch: run a single agent against an inline prompt (plus optional -// context file) and collect result.md. -// -// Inputs: prompt (required), context_file or context (optional). -// Output: result.md at the worktree root. -// -// All inputs land in .kilroy/INPUT.md which the engine writes once at run -// start from the structured --input map. The agent reads that file directly; -// no prepare/stage step is needed because --input values are already sectioned -// as "## prompt", "## context_file", etc. -// -// Default agent: claude. For codex/gemini, use --graph graph.codex.dot or -// graph.gemini.dot. - -digraph quick_launch { - graph [ - inputs="prompt", - outputs="result.md", - model_stylesheet=" - * { llm_provider: anthropic; llm_model: claude-sonnet-4.6; } - " - ] - - start [shape=Mdiamond, label="Start"] - done [shape=Msquare, label="Done"] - - // Agent: read .kilroy/INPUT.md (which holds the user's prompt + any - // context_file path or inline context), do the work, write result.md - // to the worktree root. - agent [ - shape=box, - label="Run agent", - agent_tool="claude", - prompt="Read .kilroy/INPUT.md for the user's request. The '## prompt' section is your assignment. If a '## context_file' section exists, read the file at that path for background context. If a '## context' section exists instead, use it as inline context. Complete the task and write your findings to result.md at the workspace root. Keep result.md self-contained — the caller will read it directly. When finished, write {\"status\":\"success\"} to $KILROY_STAGE_STATUS_PATH or $KILROY_STAGE_STATUS_FALLBACK_PATH if that fails.", - outputs="result.md" - ] - - start -> agent - agent -> done -} diff --git a/workflows/quick-launch/graph.gemini.dot b/workflows/quick-launch/graph.gemini.dot deleted file mode 100644 index 12b60e4d..00000000 --- a/workflows/quick-launch/graph.gemini.dot +++ /dev/null @@ -1,27 +0,0 @@ -// Quick-launch variant: same workflow, gemini agent. -// See graph.dot for the canonical version and docs. - -digraph quick_launch_gemini { - graph [ - inputs="prompt", - outputs="result.md", - model_stylesheet=" - * { llm_provider: google; llm_model: gemini-2.5-pro; } - " - ] - - start [shape=Mdiamond, label="Start"] - done [shape=Msquare, label="Done"] - - agent [ - shape=box, - label="Run agent", - agent_tool="gemini", - llm_provider="google", - prompt="Read .kilroy/INPUT.md for the user's request. The '## prompt' section is your assignment. If a '## context_file' section exists, read the file at that path for background context. If a '## context' section exists instead, use it as inline context. Complete the task and write your findings to result.md at the workspace root. Keep result.md self-contained — the caller will read it directly. When finished, write {\"status\":\"success\"} to $KILROY_STAGE_STATUS_PATH or $KILROY_STAGE_STATUS_FALLBACK_PATH if that fails.", - outputs="result.md" - ] - - start -> agent - agent -> done -} diff --git a/workflows/quick-launch/workflow.toml b/workflows/quick-launch/workflow.toml deleted file mode 100644 index 92384a37..00000000 --- a/workflows/quick-launch/workflow.toml +++ /dev/null @@ -1,26 +0,0 @@ -# Quick-launch workflow: stage a task + optional context, then run one agent. -# Intended for investigation/research tasks kicked off from an agentic chat. - -name = "quick-launch" -description = "Single-agent task runner: stage a prompt (and optional context file), run the agent, collect result.md" -version = "0.1.0" - -outputs = ["result.md"] - -[[inputs]] -name = "prompt" -description = "Task description for the agent (short form). Required." -required = true - -[[inputs]] -name = "context_file" -description = "Absolute path to a file to stage as .kilroy/CONTEXT.md for the agent to read. Optional." -required = false - -[[inputs]] -name = "context" -description = "Inline context string to stage as .kilroy/CONTEXT.md. Ignored if context_file is set. Optional." -required = false - -[defaults] -labels = { workflow = "quick-launch" }