
feat(devops): add auto-evolution loop (PR review + BMAD pipeline) #94

Merged
houko merged 3 commits into main from feat/devops-evolution
May 14, 2026
houko commented on May 14, 2026

Summary

Extends the DevOps Hand to periodically scan configured GitHub repos and (a) review open PRs via the existing code-reviewer sub-agent and (b) triage open issues, dispatching actionable ones to a new implementer sub-agent that runs the Brainstorm → Architect → PRD → Implement pipeline scaled by bmad_strictness and produces a draft PR.

  • PR review path: pulls diff, asks code-reviewer for a structured verdict, posts a single GitHub review (COMMENT or REQUEST_CHANGES; never auto-APPROVE)
  • Issue path: label-first triage with single-prompt LLM fallback (bug-fix | feature | needs-info | skip); actionable ones get a draft PR via the BMAD pipeline
  • New sub-agent: [agents.implementer] with strict guardrails (failing test first for bugs, BMAD.md committed with the change, no merging, no push to protected branches)
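
The label-first triage with a single-prompt LLM fallback can be sketched roughly as follows. This is only an illustration: the label table `LABEL_MAP`, the function names, and the choice to treat an unrecognized LLM answer as `skip` are assumptions, not taken from SKILL.md.

```python
from typing import Callable, Iterable

# The four triage outcomes named in the PR description.
CATEGORIES = ("bug-fix", "feature", "needs-info", "skip")

# Hypothetical label-to-category table; the real mapping lives in SKILL.md.
LABEL_MAP = {
    "bug": "bug-fix",
    "enhancement": "feature",
    "question": "needs-info",
    "wontfix": "skip",
}


def triage_issue(labels: Iterable[str], llm_classify: Callable[[], str]) -> str:
    """Return a category from labels if possible, else ask the LLM once."""
    for label in labels:
        category = LABEL_MAP.get(label.lower())
        if category is not None:
            return category
    # No recognized label: fall back to a single LLM prompt.
    answer = llm_classify().strip().lower()
    # Assumption: anything outside the four categories is treated as "skip".
    return answer if answer in CATEGORIES else "skip"


def is_actionable(category: str) -> bool:
    """Only bug-fix and feature issues are dispatched to the implementer."""
    return category in ("bug-fix", "feature")
```

Passing the LLM call in as a callable keeps the label path free of any model cost, which matches the intent of checking labels first.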

Safety floor (always on)

  • Draft PRs only — Hand never marks PRs ready-for-review and never merges
  • Never pushes to main / master / protected branches
  • Never --force / --no-verify / --amend against a remote branch
  • Escalates to devops_queue.json on workspace Cargo.toml, migrations, secrets, or >30 changed files
  • Per-tick token budget capped at 70% so subsequent ticks have headroom
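
As a rough sketch of the escalation rule above, the check below flags a change set that should go to devops_queue.json instead of a draft PR. The path heuristics are assumptions for illustration (for example, treating only the repository-root `Cargo.toml` as the workspace manifest, and matching `.env` files or a `secrets` directory); the function name and reason strings are hypothetical.

```python
from pathlib import PurePosixPath


def escalation_reasons(changed_paths, max_changed_files=30):
    """Return a sorted list of reasons this change set must be escalated."""
    reasons = set()
    if len(changed_paths) > max_changed_files:
        reasons.add("too-many-files")
    for raw in changed_paths:
        path = PurePosixPath(raw)
        parts = [p.lower() for p in path.parts]
        # Assumption: the workspace manifest is the root-level Cargo.toml.
        if raw == "Cargo.toml":
            reasons.add("workspace-cargo-toml")
        # Any file that sits under a directory named "migrations".
        if "migrations" in parts[:-1]:
            reasons.add("migration")
        # Assumption: secrets are .env files or anything under "secrets".
        if path.name == ".env" or path.name.startswith(".env.") or "secrets" in parts:
            reasons.add("secrets")
    return sorted(reasons)
```

An empty list means the change may proceed to a draft PR; any non-empty result means the item is queued for a human instead.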

Surface area

  • HAND.toml: +246 lines — new routing aliases, 4 new settings (auto_evolve, evolution_repos, evolution_check_interval, bmad_strictness), Phase 7 — Evolution Loop in main agent prompt, [agents.implementer] block, 3 new dashboard metrics
  • SKILL.md: +365 lines — Issue Triage Playbook, PR Review Automation, Bug Fix Playbook, BMAD Feature Pipeline, Draft PR Creation
  • README.md: updated settings table + Auto-Evolution Mode section + required GitHub token scopes

Test plan

  • taplo lint hands/devops/HAND.toml — passed
  • taplo fmt --check hands/devops/HAND.toml — passed
  • python scripts/validate.py --type hands — passed
  • python scripts/validate.py (full registry) — passed
  • Smoke test in a sandbox librefang daemon with auto_evolve = true and evolution_repos = "<your-test-repo>"; confirm that one tick produces a COMMENT review on an open PR
  • Trigger one bug-fix issue through to draft PR creation end-to-end
  • Verify the safety floor blocks: try pointing at a protected branch, try a >30-file change, try a path containing .env

Out of scope (intentional)

  • i18n translations for the 4 new settings — added English only, deferring to language-aware contributors
  • Cross-Hand event wiring with evolution-pilot Hand — keeping everything inside DevOps Hand for now; see "is this too bloated?" discussion threads if we want to split later
  • Webhook-driven triggering (currently cron-driven via evolution_check_interval only)

…ipeline)

Extends the DevOps Hand to periodically scan configured GitHub repos and:
- review open PRs via the existing code-reviewer sub-agent, posting a
  single COMMENT review back to GitHub (never auto-APPROVE)
- triage open issues via labels first, single-prompt LLM fallback
- dispatch actionable issues (bug-fix / feature) to a new implementer
  sub-agent which runs the BMAD pipeline (Brainstorm -> Architect ->
  PRD -> Implement) scaled by bmad_strictness and produces a DRAFT PR

Safety floor (always on):
- draft PRs only, never auto-ready, never merge
- never push to main/master/protected branches
- escalates to devops_queue.json when touching workspace Cargo.toml,
  migrations, secrets, or >30 changed files
- 70% per-turn token budget cap so subsequent ticks have headroom

New settings: auto_evolve, evolution_repos, evolution_check_interval,
bmad_strictness. New sub-agent: agents.implementer. New SKILL.md
sections: Issue Triage Playbook, PR Review Automation, Bug Fix
Playbook, BMAD Feature Pipeline, Draft PR Creation. Three new
dashboard metrics: prs_reviewed, issues_processed, draft_prs_opened.

houko added 2 commits May 14, 2026 15:24
Blocking (5):
- add max_changed_files setting (was referenced in implementer prompt
  but never defined)
- drop metering_query reference (tool isn't in tools = [...] list);
  agent self-paces against budget instead
- fix \n\n literal in jq --arg for issue cross-link comment; compose
  body in shell with printf so newlines survive
- resolve BASE_BRANCH via /repos/owner/repo .default_branch instead
  of relying on an undefined variable
- complete reviewer-verdict → GitHub review-event mapping (4 cases,
  not just request_changes); block routes through REQUEST_CHANGES
  with a blocking-prefix in the body, approve downgrades to COMMENT
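
The four-case verdict mapping in the last item could look roughly like the sketch below. The verdict names `approve`, `comment`, `request_changes`, and `block` and the prefix wording are assumptions; only the `COMMENT` and `REQUEST_CHANGES` event values and the "never APPROVE" rule come from the PR itself.

```python
# Hypothetical wording; the actual blocking prefix is defined in SKILL.md.
BLOCKING_PREFIX = "[BLOCKING] "


def to_review(verdict: str, summary: str) -> dict:
    """Map a reviewer verdict to the event/body pair of a GitHub review."""
    verdict = verdict.strip().lower()
    if verdict == "block":
        # block routes through REQUEST_CHANGES with a prefix in the body.
        return {"event": "REQUEST_CHANGES", "body": BLOCKING_PREFIX + summary}
    if verdict == "request_changes":
        return {"event": "REQUEST_CHANGES", "body": summary}
    # approve is downgraded to COMMENT; unknown verdicts (an assumption
    # here) are also treated as COMMENT so APPROVE is never emitted.
    return {"event": "COMMENT", "body": summary}
```

The returned dictionary has the `event` and `body` fields that GitHub's create-review endpoint accepts, so it can be serialized as the request payload.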

Medium (5):
- correct Phase 6 → Phase 7 in the auto-evolution settings comment
- remove schedule_create busy-loop confusion; Phase 7 fires per-turn
  while the Hand is already frequency = "continuous", with cadence
  enforced via devops_evolution_cursor memory key
- generalize the forbid-main-worktree wording — discover and honor
  whatever pre-commit / pre-push / commit-msg hooks the upstream
  repo configures (was librefang-specific)
- clarify the AI-attribution rule: ban LLM-vendor attribution
  (Claude, GPT, 🤖, etc.) but allow process attribution
  (DevOps Hand → implementer) for traceability
- add USER_TYPE = "Bot" short-circuit that was extracted but never
  applied (bots get a token-cheap skip, not a deep review)
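
The cadence rule in the second item (Phase 7 is reached every turn, but only does work when the interval has elapsed) can be sketched as below. The memory store is modeled as a plain dict and the cursor as a Unix timestamp; both are assumptions, since the PR does not show the cursor's stored format.

```python
CURSOR_KEY = "devops_evolution_cursor"


def evolution_due(memory: dict, now: float, interval_s: float) -> bool:
    """True if no pass has run yet or the interval has elapsed."""
    last = memory.get(CURSOR_KEY)
    return last is None or now - last >= interval_s


def run_tick(memory: dict, now: float, interval_s: float, do_pass) -> bool:
    """Run one evolution pass if due, then advance the cursor."""
    if not evolution_due(memory, now, interval_s):
        return False  # Not due: return immediately, no sleeping or looping.
    do_pass()
    memory[CURSOR_KEY] = now
    return True
```

Because the check is a cheap comparison against a stored cursor, a continuous Hand can reach Phase 7 on every turn without needing a separate schedule.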

Style (2):
- document the four event_publish event names (devops_evolution_*)
  in a new SKILL.md table alongside the memory-keys table
- justify implementer's max_history_messages = 100 with a comment
  (BMAD 4 phases × cargo build/test chains needs headroom)
D1 -- show SUMMARY_BODY (and VERDICT) assignment in PR review snippet:
add explicit jq -r .summary / .verdict extraction from reviewer_output.json
so the agent reading SKILL.md doesn't have to infer where these come from.

D2 -- reword strict-mode wait semantics in both HAND.toml and SKILL.md:
'Stop. Wait...' was misleading because the agent loop has no in-turn
pause primitive. Now spells out: end the current turn after queueing,
let the continuous tick re-read the queue, resume on approved / skip
on pending / abandon on rejected. Explicitly forbids busy-wait and
sleep loops.
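
A minimal sketch of the resume semantics described in D2, assuming each queue entry carries an `id` and a `status` field (the real devops_queue.json schema is not shown in this PR, so the entry shape is hypothetical):

```python
def plan_tick(queue):
    """Decide, for one tick, what to do with each queued item."""
    actions = {
        "approved": "resume",   # continue the pipeline where it stopped
        "pending": "skip",      # leave it alone until a later tick
        "rejected": "abandon",  # drop the work item
    }
    # Assumption: an unknown status is treated like "pending".
    return {item["id"]: actions.get(item["status"], "skip") for item in queue}
```

The function is evaluated once per tick and returns immediately, which is the point of the rewording: waiting happens between turns, never inside one.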

D3 -- restructure bot / huge-diff short-circuit so agent-tool calls are
expressed as numbered agent steps, not as '# memory_store ...' comments
inside a bash block. The bash block now only extracts cheap signals;
the decision and the tool calls are clearly agent-level.
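
The agent-level decision that follows the cheap signal extraction might be sketched like this. The `"Bot"` value matches the `type` field GitHub returns for bot accounts; the diff-size threshold and the function name are hypothetical, since the PR does not state what counts as a huge diff.

```python
def should_short_circuit(user_type: str, diff_lines: int,
                         huge_diff_lines: int = 5000) -> bool:
    """Return True when a PR should get the token-cheap path, not a deep review."""
    if user_type == "Bot":
        return True  # Bot-authored PRs get a cheap skip.
    # Assumption: 5000 changed lines is only an illustrative threshold.
    return diff_lines > huge_diff_lines
```

Keeping this as a pure decision, separate from the shell step that extracts `USER_TYPE` and the diff size, mirrors the split D3 describes between signal extraction and agent-level tool calls.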

D4 -- remove the misleading 'exit 0' from the short-circuit bash and
add a one-liner noting that exit 0 inside shell_exec only ends one
shell session, not the Phase 7 pass; the agent must choose to move on.
houko merged commit d215388 into main on May 14, 2026
3 checks passed
houko deleted the feat/devops-evolution branch on May 14, 2026 06:54