diff --git a/self-development/README.md b/self-development/README.md
index a042f6ff..3610f454 100644
--- a/self-development/README.md
+++ b/self-development/README.md
@@ -185,6 +185,29 @@ Creates GitHub issues for actionable improvements found.
 kubectl apply -f self-development/kelos-self-update.yaml
 ```
 
+### kelos-retrospective.yaml
+
+Runs weekly to analyze PR outcomes and identify evidence-backed prompt improvements.
+
+| | |
+|---|---|
+| **Trigger** | Cron `0 0 * * 1` (weekly on Monday at midnight UTC) |
+| **Model** | Opus |
+| **Concurrency** | 1 |
+
+Each run performs a structured analysis:
+1. **Collect PR data** — fetches all kelos-generated PRs from the last 7 days
+2. **Classify outcomes** — categorizes each PR as merged or closed, with rejection reasons
+3. **Compute metrics** — calculates merge rate, rejection rate by failure mode, and week-over-week trends
+4. **Identify patterns** — proposes specific prompt changes backed by statistical evidence
+
+Creates GitHub issues only when actionable improvements are found (new failure patterns or evidence that previous changes didn't help). Skips output when merge rate is above 70% and all failure modes are already addressed.
+
+**Deploy:**
+```bash
+kubectl apply -f self-development/kelos-retrospective.yaml
+```
+
 ## Customizing for Your Repository
 
 To adapt these examples for your own repository:
diff --git a/self-development/kelos-retrospective.yaml b/self-development/kelos-retrospective.yaml
new file mode 100644
index 00000000..1965667d
--- /dev/null
+++ b/self-development/kelos-retrospective.yaml
@@ -0,0 +1,103 @@
+apiVersion: kelos.dev/v1alpha1
+kind: TaskSpawner
+metadata:
+  name: kelos-retrospective
+spec:
+  when:
+    cron:
+      schedule: "0 0 * * 1" # Weekly on Monday at midnight UTC
+  maxConcurrency: 1
+  taskTemplate:
+    workspaceRef:
+      name: kelos-agent
+    model: opus
+    type: claude-code
+    ttlSecondsAfterFinished: 864000
+    credentials:
+      type: oauth
+      secretRef:
+        name: kelos-credentials
+    podOverrides:
+      resources:
+        requests:
+          cpu: "250m"
+          memory: "512Mi"
+          ephemeral-storage: "2Gi"
+        limits:
+          cpu: "1"
+          memory: "2Gi"
+          ephemeral-storage: "2Gi"
+    agentConfigRef:
+      name: kelos-dev-agent
+    promptTemplate: |
+      You are a retrospective analyst for the Kelos self-development loop.
+      Your job is to measure the effectiveness of agent-generated PRs and
+      identify evidence-backed improvements to the worker prompt.
+
+      ## Step 1: Collect PR outcome data (last 7 days)
+
+      Fetch all recent kelos-generated PRs (filter to the last 7 days):
+      ```
+      gh pr list --state all --label generated-by-kelos --limit 50 --search "created:>=$(date -d '7 days ago' +%Y-%m-%d)" --json number,title,state,mergedAt,closedAt,body,labels
+      ```
+
+      For each PR, classify it:
+      - **Merged**: Successfully contributed to the project
+      - **Closed without merge**: Rejected — investigate why
+
+      For each closed PR, read its review comments to categorize the rejection:
+      ```
+      gh pr view {number} --comments
+      gh pr view {number} --json reviews
+      gh api repos/{owner}/{repo}/pulls/{number}/comments
+      ```
+
+      Categorize rejections into failure modes:
+      - **Scope creep**: Agent added unrequested features
+      - **Design disagreement**: Maintainer wanted a different approach
+      - **Duplicate/existing code**: Agent created something that already existed
+      - **Format/convention violation**: PR description, commit format, etc.
+      - **Quality issue**: Tests missing, code incorrect, review feedback ignored
+      - **Not actionable**: Issue was not suitable for autonomous agent work
+      - **Stale/superseded**: Another PR addressed the same issue first
+
+      ## Step 2: Compute metrics
+
+      Calculate:
+      - Total PRs created this week
+      - Merge rate (merged / total)
+      - Rejection rate by failure mode
+      - Compare to previous weeks if data is available from prior retrospective issues
+
+      ## Step 3: Identify actionable patterns
+
+      For each failure mode with 2+ occurrences:
+      - Read the current worker prompt in `self-development/kelos-workers.yaml`
+      - Check if the prompt already addresses this failure mode
+      - If not, propose a specific prompt addition with exact wording
+
+      For merged PRs that required multiple resets:
+      - What did the agent miss on the first attempt?
+      - Could the prompt be clearer about that scenario?
+
+      ## Step 4: Output
+
+      If you find actionable improvements (new failure patterns not yet addressed
+      in the prompt, or evidence that previous prompt changes didn't help):
+      Create a GitHub issue with:
+      - Title: "Retrospective: [week date range] — [key finding]"
+      - Body with: metrics summary, failure mode breakdown, specific prompt changes
+      ```
+      gh issue create --title "..." --body "..." --label generated-by-kelos
+      ```
+
+      If all failure modes are already addressed in the prompt and merge rate
+      is above 70%: exit without creating an issue. Not every run needs output.
+
+      ## Constraints
+      - Only analyze PRs from the last 7 days (avoid re-analyzing old data)
+      - Do NOT create PRs — only create issues with proposals
+      - Check existing issues first to avoid duplicates: `gh issue list --label generated-by-kelos --limit 20`
+      - Be specific: cite PR numbers, quote review comments, propose exact prompt wording
+      - Do not create vague "we should improve X" issues — every proposal must include concrete text changes
+  pollInterval: 1m
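
Review note: the metrics and skip rule described in Steps 2 to 4 of the prompt can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the patch and not how the agent is implemented. The function names (`summarize`, `can_skip_issue`), the `failure_modes` mapping, and the `"Uncategorized"` label are hypothetical. The sketch assumes each PR record carries the `number` and `state` fields requested via `gh pr list --json`, with `state` taking the values `OPEN`, `CLOSED`, or `MERGED`, and it reads "all failure modes" as the ones with 2+ occurrences from Step 3.

```python
from collections import Counter

MERGE_RATE_THRESHOLD = 0.70  # "above 70%" in Step 4 of the prompt
MIN_OCCURRENCES = 2          # "2+ occurrences" in Step 3 of the prompt


def summarize(prs, failure_modes):
    """Compute the Step 2 metrics.

    prs: list of dicts with "number" and "state" keys (assumed shape).
    failure_modes: hypothetical mapping {pr_number: failure-mode label}
                   assigned while reading review comments in Step 1.
    """
    total = len(prs)
    merged = sum(1 for pr in prs if pr["state"] == "MERGED")
    closed = [pr["number"] for pr in prs if pr["state"] == "CLOSED"]
    by_mode = Counter(failure_modes.get(n, "Uncategorized") for n in closed)
    return {
        "total": total,
        # Merge rate is merged / total, as written in the prompt.
        "merge_rate": merged / total if total else 0.0,
        "by_mode": dict(by_mode),
        # Failure modes that recur often enough to warrant a prompt change.
        "recurring": sorted(m for m, c in by_mode.items() if c >= MIN_OCCURRENCES),
    }


def can_skip_issue(summary, addressed_modes):
    """Apply the explicit skip rule from Step 4.

    addressed_modes: set of failure-mode labels the worker prompt already
                     covers (hypothetical input, judged by the agent).
    """
    unaddressed = [m for m in summary["recurring"] if m not in addressed_modes]
    return not unaddressed and summary["merge_rate"] > MERGE_RATE_THRESHOLD


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample_prs = [
        {"number": 1, "state": "MERGED"},
        {"number": 2, "state": "MERGED"},
        {"number": 3, "state": "CLOSED"},
        {"number": 4, "state": "CLOSED"},
        {"number": 5, "state": "MERGED"},
    ]
    sample_modes = {3: "Scope creep", 4: "Scope creep"}
    report = summarize(sample_prs, sample_modes)
    print(report)
    print("skip issue:", can_skip_issue(report, addressed_modes=set()))
```

Under these assumptions a week with an empty PR list yields a merge rate of 0.0 and therefore never qualifies for the skip rule. Whether that is the desired behavior is not specified by the prompt.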