feat(devops): add auto-evolution loop (PR review + BMAD pipeline) #94
Merged
…ipeline)

Extends the DevOps Hand to periodically scan configured GitHub repos and:
- review open PRs via the existing code-reviewer sub-agent, posting a single COMMENT review back to GitHub (never auto-APPROVE)
- triage open issues via labels first, single-prompt LLM fallback
- dispatch actionable issues (bug-fix / feature) to a new implementer sub-agent which runs the BMAD pipeline (Brainstorm -> Architect -> PRD -> Implement) scaled by bmad_strictness and produces a DRAFT PR

Safety floor (always on):
- draft PRs only, never auto-ready, never merge
- never push to main/master/protected branches
- escalates to devops_queue.json when touching workspace Cargo.toml, migrations, secrets, or >30 changed files
- 70% per-turn token budget cap so subsequent ticks have headroom

New settings: auto_evolve, evolution_repos, evolution_check_interval, bmad_strictness.
New sub-agent: agents.implementer.
New SKILL.md sections: Issue Triage Playbook, PR Review Automation, Bug Fix Playbook, BMAD Feature Pipeline, Draft PR Creation.
Three new dashboard metrics: prs_reviewed, issues_processed, draft_prs_opened.
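The escalation rule in the safety floor (workspace Cargo.toml, migrations, secrets, or more than 30 changed files) could be sketched roughly as below. This is an illustration only: `needs_escalation`, `SENSITIVE_PATTERNS`, and the specific glob patterns are hypothetical, and the Hand itself expresses this rule in its prompt and SKILL.md rather than in Python.

```python
from fnmatch import fnmatchcase

MAX_CHANGED_FILES = 30  # threshold quoted in the safety floor

# Hypothetical patterns: the real Hand may classify paths differently.
SENSITIVE_PATTERNS = (
    "Cargo.toml",        # workspace (root) manifest only, not per-crate manifests
    "migrations/*",
    "*/migrations/*",
    ".env",
    "secrets/*",
)


def needs_escalation(changed_files):
    """Return the reasons to escalate; an empty list means the change may proceed."""
    reasons = []
    if len(changed_files) > MAX_CHANGED_FILES:
        reasons.append(
            f"{len(changed_files)} changed files exceeds {MAX_CHANGED_FILES}"
        )
    for path in changed_files:
        # fnmatchcase matches the whole path, so "Cargo.toml" only hits the root file.
        if any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS):
            reasons.append(f"sensitive path: {path}")
    return reasons
```

A non-empty result would correspond to writing an entry to devops_queue.json instead of opening the draft PR directly.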
Blocking (5):
- add max_changed_files setting (was referenced in implementer prompt but never defined)
- drop metering_query reference (tool isn't in tools = [...] list); agent self-paces against budget instead
- fix \n\n literal in jq --arg for issue cross-link comment; compose body in shell with printf so newlines survive
- resolve BASE_BRANCH via /repos/owner/repo .default_branch instead of relying on an undefined variable
- complete reviewer-verdict → GitHub review-event mapping (4 cases, not just request_changes); block routes through REQUEST_CHANGES with a blocking-prefix in the body, approve downgrades to COMMENT

Medium (5):
- correct Phase 6 → Phase 7 in the auto-evolution settings comment
- remove schedule_create busy-loop confusion; Phase 7 fires per-turn while the Hand is already frequency = "continuous", with cadence enforced via devops_evolution_cursor memory key
- generalize the forbid-main-worktree wording: discover and honor whatever pre-commit / pre-push / commit-msg hooks the upstream repo configures (was librefang-specific)
- clarify the AI-attribution rule: ban LLM-vendor attribution (Claude, GPT, 🤖, etc.) but allow process attribution (DevOps Hand → implementer) for traceability
- add USER_TYPE = "Bot" short-circuit that was extracted but never applied (bots get a token-cheap skip, not a deep review)

Style (2):
- document the four event_publish event names (devops_evolution_*) in a new SKILL.md table alongside the memory-keys table
- justify implementer's max_history_messages = 100 with a comment (BMAD 4 phases × cargo build/test chains needs headroom)
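The reviewer-verdict to review-event mapping described in the last Blocking item might look like the sketch below. The commit message names only three verdicts explicitly (block, approve, request_changes); the fourth, "comment", the `[BLOCKING] ` prefix text, and the fallback for unknown verdicts are all assumptions made for illustration. The event names COMMENT and REQUEST_CHANGES are the ones GitHub's pull-request review API accepts.

```python
BLOCKING_PREFIX = "[BLOCKING] "  # hypothetical marker; the real prefix is not shown


def review_event(verdict, summary):
    """Map a reviewer verdict to a (GitHub review event, review body) pair.

    APPROVE is never returned, matching the "never auto-APPROVE" rule.
    """
    verdict = verdict.strip().lower()
    if verdict == "block":
        # Blocking findings go through REQUEST_CHANGES with a marked body.
        return "REQUEST_CHANGES", BLOCKING_PREFIX + summary
    if verdict == "request_changes":
        return "REQUEST_CHANGES", summary
    # "approve" is downgraded to COMMENT; "comment" (assumed fourth verdict)
    # and any unrecognized verdict also fall back to COMMENT.
    return "COMMENT", summary
```

Falling back to COMMENT for unknown verdicts is a design choice of this sketch: it keeps the least consequential event as the default.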
D1 -- show SUMMARY_BODY (and VERDICT) assignment in PR review snippet: add explicit jq -r .summary / .verdict extraction from reviewer_output.json so the agent reading SKILL.md doesn't have to infer where these come from.

D2 -- reword strict-mode wait semantics in both HAND.toml and SKILL.md: 'Stop. Wait...' was misleading because the agent loop has no in-turn pause primitive. Now spells out: end the current turn after queueing, let the continuous tick re-read the queue, resume on approved / skip on pending / abandon on rejected. Explicitly forbids busy-wait and sleep loops.

D3 -- restructure bot / huge-diff short-circuit so agent-tool calls are expressed as numbered agent steps, not as '# memory_store ...' comments inside a bash block. The bash block now only extracts cheap signals; the decision and the tool calls are clearly agent-level.

D4 -- remove the misleading 'exit 0' from the short-circuit bash and add a one-liner noting that exit 0 inside shell_exec only ends one shell session, not the Phase 7 pass; the agent must choose to move on.
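The strict-mode semantics in D2 (resume on approved, skip on pending, abandon on rejected, with no waiting inside a turn) can be illustrated with a small sketch. The queue layout used here, a JSON array of objects with `id` and `status` fields, is an assumption; the actual schema of devops_queue.json is not shown in this PR.

```python
import json

# Status-to-action table taken from the D2 description.
ACTIONS = {"approved": "resume", "pending": "skip", "rejected": "abandon"}


def plan_tick(queue_json):
    """Decide, once per tick, what to do with each queued item.

    There is no loop or sleep here: a pending item is skipped and looked at
    again on the next continuous tick.
    """
    plan = {}
    for item in json.loads(queue_json):
        # Assumption: a missing or unknown status is treated like "pending".
        plan[item["id"]] = ACTIONS.get(item.get("status"), "skip")
    return plan
```

For example, a queue holding one approved and one pending item yields a plan that resumes the first and leaves the second untouched until a later tick.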
Summary
Extends the DevOps Hand to periodically scan configured GitHub repos and (a) review open PRs via the existing code-reviewer sub-agent and (b) triage open issues, dispatching actionable ones to a new implementer sub-agent that runs the Brainstorm → Architect → PRD → Implement pipeline scaled by bmad_strictness and produces a draft PR.

- … code-reviewer for a structured verdict, posts a single GitHub review (COMMENT or REQUEST_CHANGES; never auto-APPROVE)
- … bug-fix | feature | needs-info | skip); actionable ones get a draft PR via the BMAD pipeline
- … [agents.implementer] with strict guardrails (failing test first for bugs, BMAD.md committed with the change, no merging, no push to protected branches)

Safety floor (always on)

- … main / master / protected branches
- … --force / --no-verify / --amend against a remote branch
- … devops_queue.json on workspace Cargo.toml, migrations, secrets, or >30 changed files

Surface area

- … auto_evolve, evolution_repos, evolution_check_interval, bmad_strictness), Phase 7 -- Evolution Loop in main agent prompt, [agents.implementer] block, 3 new dashboard metrics

Test plan

- taplo lint hands/devops/HAND.toml -- passed
- taplo fmt --check hands/devops/HAND.toml -- passed
- python scripts/validate.py --type hands -- passed
- python scripts/validate.py (full registry) -- passed
- … auto_evolve = true, evolution_repos = "<your-test-repo>", observe one tick produces a COMMENT review on an open PR
- … .env

Out of scope (intentional)

- evolution-pilot Hand -- keeping everything inside DevOps Hand for now; see "is this too bloated?" discussion threads if we want to split later
- … evolution_check_interval only)