23 changes: 23 additions & 0 deletions self-development/README.md
@@ -185,6 +185,29 @@ Creates GitHub issues for actionable improvements found.
kubectl apply -f self-development/kelos-self-update.yaml
```

### kelos-retrospective.yaml

Runs weekly to analyze PR outcomes and identify evidence-backed prompt improvements.

| | |
|---|---|
| **Trigger** | Cron `0 0 * * 1` (weekly on Monday at midnight UTC) |
| **Model** | Opus |
| **Concurrency** | 1 |

Each run performs a structured analysis:
1. **Collect PR data** — fetches all kelos-generated PRs from the last 7 days
2. **Classify outcomes** — categorizes each PR as merged or closed, with rejection reasons
3. **Compute metrics** — calculates merge rate, rejection rate by failure mode, and week-over-week trends
4. **Identify patterns** — proposes specific prompt changes backed by statistical evidence

Creates a GitHub issue only when actionable improvements are found (new failure patterns, or evidence that previous prompt changes didn't help). Exits without creating an issue when the merge rate is above 70% and the worker prompt already addresses every observed failure mode.
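
The skip rule above can be sketched in shell. This is a minimal illustration, not part of the spawner: the helper names `merge_rate` and `should_skip` are hypothetical, and the input is assumed to be one PR state per line, as `gh pr list --json state --jq '.[].state'` would print.

```bash
#!/bin/sh
# Hypothetical helper: reads one PR state per line (MERGED, CLOSED, OPEN)
# and prints merged / total as a truncated integer percentage.
merge_rate() {
  awk '
    NF { total++ }
    $1 == "MERGED" { merged++ }
    END {
      if (total == 0) { print 0; exit }
      printf "%d\n", (merged * 100) / total
    }
  '
}

# Hypothetical helper: succeeds (exit 0) only when the rate is above 70
# AND the count of failure modes not yet covered by the prompt is zero.
should_skip() {
  rate=$1
  unaddressed=$2
  [ "$rate" -gt 70 ] && [ "$unaddressed" -eq 0 ]
}

# Sample data standing in for real `gh` output: 3 merged, 1 closed.
rate=$(printf 'MERGED\nMERGED\nMERGED\nCLOSED\n' | merge_rate)
if should_skip "$rate" 0; then
  echo "merge rate ${rate}%: nothing to report"
else
  echo "merge rate ${rate}%: check for actionable improvements"
fi
```

In this sketch a rate of exactly 70% does not count as "above 70%", so the skip rule would not apply that week.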

**Deploy:**
```bash
kubectl apply -f self-development/kelos-retrospective.yaml
```

## Customizing for Your Repository

To adapt these examples for your own repository:
103 changes: 103 additions & 0 deletions self-development/kelos-retrospective.yaml
@@ -0,0 +1,103 @@
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: kelos-retrospective
spec:
  when:
    cron:
      schedule: "0 0 * * 1" # Weekly on Monday at midnight UTC
  maxConcurrency: 1
  taskTemplate:
    workspaceRef:
      name: kelos-agent
    model: opus
    type: claude-code
    ttlSecondsAfterFinished: 864000
    credentials:
      type: oauth
      secretRef:
        name: kelos-credentials
    podOverrides:
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
          ephemeral-storage: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
          ephemeral-storage: "2Gi"
    agentConfigRef:
      name: kelos-dev-agent
    promptTemplate: |
      You are a retrospective analyst for the Kelos self-development loop.
      Your job is to measure the effectiveness of agent-generated PRs and
      identify evidence-backed improvements to the worker prompt.

      ## Step 1: Collect PR outcome data (last 7 days)

      Fetch all recent kelos-generated PRs (filter to the last 7 days):
      ```
      gh pr list --state all --label generated-by-kelos --limit 50 --search "created:>=$(date -d '7 days ago' +%Y-%m-%d)" --json number,title,state,mergedAt,closedAt,body,labels
      ```

      For each PR, classify it:
      - **Merged**: Successfully contributed to the project
      - **Closed without merge**: Rejected — investigate why

      For each closed PR, read its review comments to categorize the rejection:
      ```
      gh pr view {number} --comments
      gh pr view {number} --json reviews
      gh api repos/{owner}/{repo}/pulls/{number}/comments
      ```

      Categorize rejections into failure modes:
      - **Scope creep**: Agent added unrequested features
      - **Design disagreement**: Maintainer wanted a different approach
      - **Duplicate/existing code**: Agent created something that already existed
      - **Format/convention violation**: PR description, commit format, etc.
      - **Quality issue**: Tests missing, code incorrect, review feedback ignored
      - **Not actionable**: Issue was not suitable for autonomous agent work
      - **Stale/superseded**: Another PR addressed the same issue first

      ## Step 2: Compute metrics

      Calculate:
      - Total PRs created this week
      - Merge rate (merged / total)
      - Rejection rate by failure mode
      - Compare to previous weeks if data is available from prior retrospective issues

      ## Step 3: Identify actionable patterns

      For each failure mode with 2+ occurrences:
      - Read the current worker prompt in `self-development/kelos-workers.yaml`
      - Check if the prompt already addresses this failure mode
      - If not, propose a specific prompt addition with exact wording

      For merged PRs that required multiple resets:
      - What did the agent miss on the first attempt?
      - Could the prompt be clearer about that scenario?

      ## Step 4: Output

      If you find actionable improvements (new failure patterns not yet addressed
      in the prompt, or evidence that previous prompt changes didn't help):
      Create a GitHub issue with:
      - Title: "Retrospective: [week date range] — [key finding]"
      - Body with: metrics summary, failure mode breakdown, specific prompt changes
      ```
      gh issue create --title "..." --body "..." --label generated-by-kelos
      ```

      If all failure modes are already addressed in the prompt and merge rate
      is above 70%: exit without creating an issue. Not every run needs output.

      ## Constraints
      - Only analyze PRs from the last 7 days (avoid re-analyzing old data)
      - Do NOT create PRs — only create issues with proposals
      - Check existing issues first to avoid duplicates: `gh issue list --label generated-by-kelos --limit 20`
      - Be specific: cite PR numbers, quote review comments, propose exact prompt wording
      - Do not create vague "we should improve X" issues — every proposal must include concrete text changes
  pollInterval: 1m
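
Step 3 of the prompt acts only on failure modes with 2+ occurrences. A minimal sketch of that filter follows, assuming the agent has already reduced each closed PR to a single label. The helper name `recurring_modes` and the hyphenated slugs are hypothetical stand-ins for the categories listed in Step 1.

```bash
#!/bin/sh
# Hypothetical helper: reads one failure-mode slug per line (one per closed PR)
# and prints only the slugs that appear at least twice.
recurring_modes() {
  sort | uniq -c | awk '$1 >= 2 { print $2 }'
}

# Sample labels standing in for a week of classified rejections.
printf 'scope-creep\nquality-issue\nscope-creep\nstale-superseded\n' | recurring_modes
```

Modes that appear only once are dropped here, which mirrors the prompt's intent of proposing changes only for repeated patterns rather than one-off rejections.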