Skip to content

Latest commit

Β 

History

History
129 lines (102 loc) Β· 5.5 KB

File metadata and controls

129 lines (102 loc) Β· 5.5 KB

ShellForge Roadmap

Completed

v0.1.0 β€” Foundation

  • Go binary with Ollama integration
  • 3 agents (QA, report, prototype)
  • agentguard.yaml governance (enforce/monitor)
  • Cron-based scheduling

v0.2.0 β€” Release Pipeline

  • Goreleaser + Homebrew tap (brew install shellforge)
  • GitHub Pages site
  • shellforge serve β€” daemon mode with memory-aware scheduling
  • Terminal Bench 2.0 Harbor adapter

v0.3.x β€” Multi-Driver

  • shellforge run <driver> β€” launch governed agents
  • Driver support: Claude Code, Copilot CLI, Codex, Gemini
  • Format-agnostic intent parser (extracts tool calls from any model output)
  • Normalizer (raw tool call β†’ Canonical Action Representation)
  • Correction engine (denial β†’ feedback β†’ retry)
  • Setup wizard (6-step interactive installer)

v0.4.x β€” Environment Awareness

  • Server mode (Linux, no GPU) β€” skips Ollama, shows API drivers
  • Mac mode β€” local models via Ollama
  • shellforge evaluate β€” JSON governance evaluation endpoint
  • shellforge swarm β€” starts Dagu orchestration dashboard

v0.5.x β€” Driver Iteration

  • Tested Crush (broken β€” OpenAI-compat shim loses tool calls)
  • Tested Aider (file editing only, no shell execution)
  • Evaluated Goose (Block) β€” native Ollama, actually executes tools

v0.6.0 β€” Goose + Governed Shell ← CURRENT

  • Goose as local model driver (shellforge run goose)
  • govern-shell.sh β€” shell wrapper that evaluates every command through AgentGuard
  • shellforge run goose sets SHELL to governed wrapper automatically
  • Fixed catch-all deny bug (bounded-execution policy was denying everything)
  • Dagu DAG templates (sdlc-swarm, studio-swarm, workspace-swarm, multi-driver)

In Progress

Phase 7 β€” Governed Multi-Agent Architecture

Foundation types exist (internal/action/, internal/orchestrator/, internal/scheduler/queue.go) but not wired into execution.

7.1 β€” Wire Orchestrator

  • Connect orchestrator state machine to shellforge run
  • Proposal β†’ Governance β†’ Result flow through kernel
  • Run-level audit trail (structured events, not just logs)

7.2 β€” Turn-Based Swarm

  • Planner agent β€” task decomposition via Ollama
  • Worker agent β€” Goose executes subtasks with governance
  • Evaluator agent β€” validates results
  • State machine: PLANNING β†’ WORKING β†’ EVALUATING β†’ COMPLETE

7.3 β€” Resilience

  • Anti-loop hash detection
  • Escalation thresholds (auto-fail after N denials)
  • Circuit breaker on Ollama failures

7.4 β€” Observability

  • Structured event emission to SQLite
  • Run summaries with governance stats
  • 24h soak test

Planned

Phase 8 β€” AgentGuard MCP Server

  • MCP server exposing governed tools
  • Goose β†’ MCP β†’ AgentGuard β†’ execute
  • Dual-layer: kernel enforces, MCP integrates

Phase 9 β€” Terminal Bench 2.0

  • Harbor adapter
  • Dry run on single task with Goose
  • Full 89-task evaluation
  • Leaderboard submission

Phase 10 β€” Production Hardening

  • AgentGuard Go kernel integration (in-process, not subprocess)
  • Publish Go module (github.com/AgentGuardHQ/agentguard/go/pkg/hook)
  • Move internal/ types to pkg/ for external import
  • Cloud telemetry opt-in (AgentGuard Cloud)

Phase 11 β€” Replace Workspace Bash Swarm

  • Dagu replaces server/deploy.sh + cron + queue.txt
  • Multi-driver DAGs: Claude Code + Copilot + Codex on Linux box
  • Same governance policy across all drivers
  • ShellForge as the runtime for agentguard-workspace swarm

Bug Backlog (Open Issues)

Bugs identified during v0.6.x development. Fix before v1.0.

Issue Package Severity Description
#69 agentguard.yaml High Governance gap: plain rm and rm -r bypass no-destructive-rm policy
#67 scripts/govern-shell.sh Medium Fragile sed-based JSON parsing β€” denial reason extraction can fail or corrupt
#65 internal/scheduler Medium os.WriteFile error silently ignored β€” audit log loss
#63 internal/normalizer Medium classifyShellRisk prefix match too broad β€” catalog_tool classified as read-only
#62 cmd/shellforge Medium cmdEvaluate ignores JSON unmarshal error β€” malformed input defaults to allow
#61 internal/intent Low Dead code in flattenParams β€” first assignment immediately overwritten
#60 all packages High Zero test coverage β€” critical for a governance runtime

Stack (as of v0.6.1)

Component Role Status
Goose (Block) Local model driver Working
Claude Code API driver (Linux) Working (via hooks)
Copilot CLI API driver (Linux) Working (via hooks)
Codex CLI API driver (Linux) Coming soon
Gemini CLI API driver (Linux) Coming soon
Ollama Local inference Working
AgentGuard Governance kernel Working (YAML eval + Go kernel)
Dagu Orchestration Working (DAGs + web UI)
RTK Token compression Optional
Docker Sandbox Optional