Add kelos-retrospective TaskSpawner for weekly PR outcome analysis #514
Open: kelos-bot wants to merge 3 commits into main from kelos-task-513
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| apiVersion: kelos.dev/v1alpha1 | ||
| kind: TaskSpawner | ||
| metadata: | ||
| name: kelos-retrospective | ||
| spec: | ||
| when: | ||
| cron: | ||
| schedule: "0 0 * * 1" # Weekly on Monday at midnight UTC | ||
| maxConcurrency: 1 | ||
| taskTemplate: | ||
| workspaceRef: | ||
| name: kelos-agent | ||
| model: opus | ||
| type: claude-code | ||
| ttlSecondsAfterFinished: 864000 | ||
| credentials: | ||
| type: oauth | ||
| secretRef: | ||
| name: kelos-credentials | ||
| podOverrides: | ||
| resources: | ||
| requests: | ||
| cpu: "250m" | ||
| memory: "512Mi" | ||
| ephemeral-storage: "2Gi" | ||
| limits: | ||
| cpu: "1" | ||
| memory: "2Gi" | ||
| ephemeral-storage: "2Gi" | ||
| agentConfigRef: | ||
| name: kelos-dev-agent | ||
| promptTemplate: | | ||
| You are a retrospective analyst for the Kelos self-development loop. | ||
| Your job is to measure the effectiveness of agent-generated PRs and | ||
| identify evidence-backed improvements to the worker prompt. | ||
| | ||
| ## Step 1: Collect PR outcome data (last 7 days) | ||
| | ||
| Fetch all recent kelos-generated PRs (filter to the last 7 days): | ||
| ``` | ||
| gh pr list --state all --label generated-by-kelos --limit 50 --search "created:>=$(date -d '7 days ago' +%Y-%m-%d)" --json number,title,state,mergedAt,closedAt,body,labels | ||
| ``` | ||
| | ||
| For each PR, classify it: | ||
| - **Merged**: Successfully contributed to the project | ||
| - **Closed without merge**: Rejected — investigate why | ||
| | ||
| For each closed PR, read its review comments to categorize the rejection: | ||
| ``` | ||
| gh pr view {number} --comments | ||
| gh pr view {number} --json reviews | ||
| gh api repos/{owner}/{repo}/pulls/{number}/comments | ||
| ``` | ||
| | ||
| Categorize rejections into failure modes: | ||
| - **Scope creep**: Agent added unrequested features | ||
| - **Design disagreement**: Maintainer wanted a different approach | ||
| - **Duplicate/existing code**: Agent created something that already existed | ||
| - **Format/convention violation**: PR description, commit format, etc. | ||
| - **Quality issue**: Tests missing, code incorrect, review feedback ignored | ||
| - **Not actionable**: Issue was not suitable for autonomous agent work | ||
| - **Stale/superseded**: Another PR addressed the same issue first | ||
| | ||
| ## Step 2: Compute metrics | ||
| | ||
| Calculate: | ||
| - Total PRs created this week | ||
| - Merge rate (merged / total) | ||
| - Rejection rate by failure mode | ||
| - Compare to previous weeks if data is available from prior retrospective issues | ||
| | ||
| ## Step 3: Identify actionable patterns | ||
| | ||
| For each failure mode with 2+ occurrences: | ||
| - Read the current worker prompt in `self-development/kelos-workers.yaml` | ||
| - Check if the prompt already addresses this failure mode | ||
| - If not, propose a specific prompt addition with exact wording | ||
| | ||
| For merged PRs that required multiple resets: | ||
| - What did the agent miss on the first attempt? | ||
| - Could the prompt be clearer about that scenario? | ||
| | ||
| ## Step 4: Output | ||
| | ||
| If you find actionable improvements (new failure patterns not yet addressed | ||
| in the prompt, or evidence that previous prompt changes didn't help): | ||
| Create a GitHub issue with: | ||
| - Title: "Retrospective: [week date range] — [key finding]" | ||
| - Body with: metrics summary, failure mode breakdown, specific prompt changes | ||
| ``` | ||
| gh issue create --title "..." --body "..." --label generated-by-kelos | ||
| ``` | ||
| | ||
| If all failure modes are already addressed in the prompt and merge rate | ||
| is above 70%: exit without creating an issue. Not every run needs output. | ||
| | ||
| ## Constraints | ||
| - Only analyze PRs from the last 7 days (avoid re-analyzing old data) | ||
| - Do NOT create PRs — only create issues with proposals | ||
| - Check existing issues first to avoid duplicates: `gh issue list --label generated-by-kelos --limit 20` | ||
| - Be specific: cite PR numbers, quote review comments, propose exact prompt wording | ||
| - Do not create vague "we should improve X" issues — every proposal must include concrete text changes | ||
| pollInterval: 1m | ||
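
Review note: the arithmetic in Steps 2 to 4 of the prompt can be sketched in a few lines of Python. This is only an illustration, not part of the PR. It assumes the PR list has the shape returned by `gh pr list --json number,state,mergedAt` (uppercase `state` values, `mergedAt` empty for unmerged PRs). The `failure_modes` mapping, the `Uncategorized` fallback, and both function names are hypothetical. The prompt only states when to exit silently, so treating a merge rate at or below 70% as a reason to report, and treating an empty week as nothing to report, are assumptions as well.

```python
from collections import Counter

# Threshold taken from Step 4 of the prompt ("merge rate is above 70%").
MERGE_RATE_THRESHOLD = 0.70


def summarize(prs, failure_modes):
    """Compute the Step 2 metrics.

    prs: list of dicts with "number", "state", and "mergedAt" keys
         (assumed shape of `gh pr list --json number,state,mergedAt`).
    failure_modes: hypothetical mapping of PR number -> failure mode,
         filled in after reading the review comments.
    """
    total = len(prs)
    merged = [pr for pr in prs if pr.get("mergedAt")]
    rejected = [
        pr for pr in prs
        if pr.get("state") == "CLOSED" and not pr.get("mergedAt")
    ]
    by_mode = Counter(
        failure_modes.get(pr["number"], "Uncategorized") for pr in rejected
    )
    return {
        "total": total,
        "merged": len(merged),
        # Merge rate is merged / total, as defined in Step 2.
        "merge_rate": len(merged) / total if total else None,
        "rejections_by_mode": dict(by_mode),
        # Step 3 only looks at failure modes with 2+ occurrences.
        "recurring_modes": sorted(m for m, n in by_mode.items() if n >= 2),
    }


def should_create_issue(summary, modes_addressed_in_prompt):
    """Step 4 decision, under the assumptions stated in the note above."""
    if summary["merge_rate"] is None:  # assumption: empty week, nothing to report
        return False
    unaddressed = set(summary["recurring_modes"]) - set(modes_addressed_in_prompt)
    return bool(unaddressed) or summary["merge_rate"] <= MERGE_RATE_THRESHOLD


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"number": 1, "state": "MERGED", "mergedAt": "2024-01-02T00:00:00Z"},
        {"number": 2, "state": "CLOSED", "mergedAt": None},
        {"number": 3, "state": "CLOSED", "mergedAt": None},
    ]
    report = summarize(sample, {2: "Scope creep", 3: "Scope creep"})
    print(report)
    print(should_create_issue(report, modes_addressed_in_prompt=[]))
```

Open PRs count toward the total but not toward rejections here, which matches the "merged / total" definition in Step 2; whether still-open PRs should be excluded from the denominator is a question the prompt leaves unanswered.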