From 00195de61054694a3d1ec8a78ec19c6b18e06005 Mon Sep 17 00:00:00 2001
From: Gunju Kim
Date: Sun, 1 Mar 2026 14:10:23 +0000
Subject: [PATCH 1/3] Add kelos-retrospective TaskSpawner for weekly PR outcome analysis

Add a new cron-based TaskSpawner that runs weekly to systematically
analyze agent-generated PR outcomes, compute merge rate metrics,
categorize rejection reasons, and propose evidence-backed prompt
improvements.

This closes the feedback gap in the self-development pipeline where no
agent currently tracks whether PRs are getting better or worse over time.

Closes #513

Co-Authored-By: Claude Opus 4.6
---
 self-development/README.md                |  23 +++++
 self-development/kelos-retrospective.yaml | 103 ++++++++++++++++++++++
 2 files changed, 126 insertions(+)
 create mode 100644 self-development/kelos-retrospective.yaml

diff --git a/self-development/README.md b/self-development/README.md
index a042f6ff..3610f454 100644
--- a/self-development/README.md
+++ b/self-development/README.md
@@ -185,6 +185,29 @@ Creates GitHub issues for actionable improvements found.
 kubectl apply -f self-development/kelos-self-update.yaml
 ```
 
+### kelos-retrospective.yaml
+
+Runs weekly to analyze PR outcomes and identify evidence-backed prompt improvements.
+
+| | |
+|---|---|
+| **Trigger** | Cron `0 0 * * 1` (weekly on Monday at midnight UTC) |
+| **Model** | Opus |
+| **Concurrency** | 1 |
+
+Each run performs a structured analysis:
+1. **Collect PR data** — fetches all kelos-generated PRs from the last 7 days
+2. **Classify outcomes** — categorizes each PR as merged or closed, with rejection reasons
+3. **Compute metrics** — calculates merge rate, rejection rate by failure mode, and week-over-week trends
+4. **Identify patterns** — proposes specific prompt changes backed by statistical evidence
+
+Creates GitHub issues only when actionable improvements are found (new failure patterns or evidence that previous changes didn't help). Skips output when merge rate is above 70% and all failure modes are already addressed.
+
+**Deploy:**
+```bash
+kubectl apply -f self-development/kelos-retrospective.yaml
+```
+
 ## Customizing for Your Repository
 
 To adapt these examples for your own repository:
diff --git a/self-development/kelos-retrospective.yaml b/self-development/kelos-retrospective.yaml
new file mode 100644
index 00000000..ddec273d
--- /dev/null
+++ b/self-development/kelos-retrospective.yaml
@@ -0,0 +1,103 @@
+apiVersion: kelos.dev/v1alpha1
+kind: TaskSpawner
+metadata:
+  name: kelos-retrospective
+spec:
+  when:
+    cron:
+      schedule: "0 0 * * 1" # Weekly on Monday at midnight UTC
+  maxConcurrency: 1
+  taskTemplate:
+    workspaceRef:
+      name: kelos-agent
+    model: opus
+    type: claude-code
+    ttlSecondsAfterFinished: 864000
+    credentials:
+      type: oauth
+      secretRef:
+        name: kelos-credentials
+    podOverrides:
+      resources:
+        requests:
+          cpu: "250m"
+          memory: "512Mi"
+          ephemeral-storage: "2Gi"
+        limits:
+          cpu: "1"
+          memory: "2Gi"
+          ephemeral-storage: "2Gi"
+    agentConfigRef:
+      name: kelos-dev-agent
+    promptTemplate: |
+      You are a retrospective analyst for the Kelos self-development loop.
+      Your job is to measure the effectiveness of agent-generated PRs and
+      identify evidence-backed improvements to the worker prompt.
+
+      ## Step 1: Collect PR outcome data (last 7 days)
+
+      Fetch all recent kelos-generated PRs:
+      ```
+      gh pr list --state all --label generated-by-kelos --limit 50 --json number,title,state,mergedAt,closedAt,body,labels
+      ```
+
+      For each PR, classify it:
+      - **Merged**: Successfully contributed to the project
+      - **Closed without merge**: Rejected — investigate why
+
+      For each closed PR, read its review comments to categorize the rejection:
+      ```
+      gh api repos/{owner}/{repo}/pulls/{number}/reviews
+      gh api repos/{owner}/{repo}/pulls/{number}/comments
+      gh pr view {number} --comments
+      ```
+
+      Categorize rejections into failure modes:
+      - **Scope creep**: Agent added unrequested features
+      - **Design disagreement**: Maintainer wanted a different approach
+      - **Duplicate/existing code**: Agent created something that already existed
+      - **Format/convention violation**: PR description, commit format, etc.
+      - **Quality issue**: Tests missing, code incorrect, review feedback ignored
+      - **Not actionable**: Issue was not suitable for autonomous agent work
+      - **Stale/superseded**: Another PR addressed the same issue first
+
+      ## Step 2: Compute metrics
+
+      Calculate:
+      - Total PRs created this week
+      - Merge rate (merged / total)
+      - Rejection rate by failure mode
+      - Compare to previous weeks if data is available from prior retrospective issues
+
+      ## Step 3: Identify actionable patterns
+
+      For each failure mode with 2+ occurrences:
+      - Read the current worker prompt in `self-development/kelos-workers.yaml`
+      - Check if the prompt already addresses this failure mode
+      - If not, propose a specific prompt addition with exact wording
+
+      For merged PRs that required multiple resets:
+      - What did the agent miss on the first attempt?
+      - Could the prompt be clearer about that scenario?
+
+      ## Step 4: Output
+
+      If you find actionable improvements (new failure patterns not yet addressed
+      in the prompt, or evidence that previous prompt changes didn't help):
+      Create a GitHub issue with:
+      - Title: "Retrospective: [week date range] — [key finding]"
+      - Body with: metrics summary, failure mode breakdown, specific prompt changes
+      ```
+      gh issue create --title "..." --body "..." --label generated-by-kelos
+      ```
+
+      If all failure modes are already addressed in the prompt and merge rate
+      is above 70%: exit without creating an issue. Not every run needs output.
+
+      ## Constraints
+      - Only analyze PRs from the last 7 days (avoid re-analyzing old data)
+      - Do NOT create PRs — only create issues with proposals
+      - Check existing issues first to avoid duplicates: `gh issue list --label generated-by-kelos --limit 20`
+      - Be specific: cite PR numbers, quote review comments, propose exact prompt wording
+      - Do not create vague "we should improve X" issues — every proposal must include concrete text changes
+  pollInterval: 1m

From 715fdf1c14ad330c7c078f057506c3cbe1c7289a Mon Sep 17 00:00:00 2001
From: Gunju Kim
Date: Sun, 1 Mar 2026 14:11:49 +0000
Subject: [PATCH 2/3] Fix prompt: add date filter and use gh pr view for reviews

Address review feedback:
- Add date filter to gh pr list command to enforce 7-day window
- Replace gh api repos/{owner}/{repo} commands with gh pr view which
  doesn't require owner/repo resolution

Co-Authored-By: Claude Opus 4.6
---
 self-development/kelos-retrospective.yaml | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/self-development/kelos-retrospective.yaml b/self-development/kelos-retrospective.yaml
index ddec273d..6459154d 100644
--- a/self-development/kelos-retrospective.yaml
+++ b/self-development/kelos-retrospective.yaml
@@ -36,9 +36,9 @@ spec:
 
       ## Step 1: Collect PR outcome data (last 7 days)
 
-      Fetch all recent kelos-generated PRs:
+      Fetch all recent kelos-generated PRs (filter to the last 7 days):
       ```
-      gh pr list --state all --label generated-by-kelos --limit 50 --json number,title,state,mergedAt,closedAt,body,labels
+      gh pr list --state all --label generated-by-kelos --limit 50 --search "created:>=$(date -d '7 days ago' +%Y-%m-%d)" --json number,title,state,mergedAt,closedAt,body,labels
       ```
 
       For each PR, classify it:
@@ -47,9 +47,8 @@ spec:
 
       For each closed PR, read its review comments to categorize the rejection:
       ```
-      gh api repos/{owner}/{repo}/pulls/{number}/reviews
-      gh api repos/{owner}/{repo}/pulls/{number}/comments
       gh pr view {number} --comments
+      gh pr view {number} --json reviews
       ```
 
       Categorize rejections into failure modes:

From 5e8c71a38cdcdc546c1f13c9ef01db6dba24ee08 Mon Sep 17 00:00:00 2001
From: Gunju Kim
Date: Sun, 1 Mar 2026 15:24:42 +0000
Subject: [PATCH 3/3] Restore inline review comments API call in retrospective prompt

The gh pr view --json reviews only returns review bodies and state, not
line-level code review feedback. Restore the gh api endpoint for
pulls/{number}/comments to ensure the retrospective agent has complete
rejection data when categorizing failure modes.

Co-Authored-By: Claude Opus 4.6
---
 self-development/kelos-retrospective.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/self-development/kelos-retrospective.yaml b/self-development/kelos-retrospective.yaml
index 6459154d..1965667d 100644
--- a/self-development/kelos-retrospective.yaml
+++ b/self-development/kelos-retrospective.yaml
@@ -49,6 +49,7 @@ spec:
       ```
       gh pr view {number} --comments
       gh pr view {number} --json reviews
+      gh api repos/{owner}/{repo}/pulls/{number}/comments
       ```
 
       Categorize rejections into failure modes:
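
Reviewer note (not part of the patch): the Step 2 arithmetic in the prompt, "Merge rate (merged / total)" and "Rejection rate by failure mode", can be sketched as below. This is a minimal illustration under stated assumptions. The input is a list of records shaped like the `number`, `mergedAt`, and `closedAt` fields requested by the `gh pr list --json` call, with unset timestamps assumed to arrive as null or empty. The `summarize` function, the `failure_modes` mapping, and the sample PR numbers are hypothetical names and data introduced only for this example.

```python
from collections import Counter


def summarize(prs, failure_modes):
    """Compute the Step 2 metrics for one week of PRs.

    prs: list of dicts with `number`, `mergedAt`, and `closedAt` keys.
    failure_modes: hypothetical mapping of closed PR number -> failure-mode
        label, filled in after reading the review comments.
    """
    total = len(prs)
    merged = [p for p in prs if p.get("mergedAt")]
    # A merged PR also carries closedAt, so exclude merged ones explicitly.
    rejected = [p for p in prs if p.get("closedAt") and not p.get("mergedAt")]
    by_mode = Counter(
        failure_modes.get(p["number"], "Uncategorized") for p in rejected
    )
    return {
        "total": total,
        # Denominator is all PRs created in the window, as the prompt states.
        "merge_rate": len(merged) / total if total else 0.0,
        "rejection_rate_by_mode": {
            mode: count / total for mode, count in by_mode.items()
        },
    }


# Invented sample data: PR numbers and timestamps are illustrative only.
sample_prs = [
    {"number": 1, "mergedAt": "2026-02-24T10:00:00Z", "closedAt": "2026-02-24T10:00:00Z"},
    {"number": 2, "mergedAt": "2026-02-25T09:30:00Z", "closedAt": "2026-02-25T09:30:00Z"},
    {"number": 3, "mergedAt": None, "closedAt": "2026-02-26T16:45:00Z"},
    {"number": 4, "mergedAt": None, "closedAt": None},  # still open
]
sample_modes = {3: "Scope creep"}

summary = summarize(sample_prs, sample_modes)
print(summary)
```

Because the prompt defines the denominator as all PRs created in the week, still-open PRs lower the merge rate in this sketch. Whether open PRs should count against the 70% threshold is a design choice the prompt leaves to the agent.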