
feat(agent): ynh agent run — autonomous agent loop driver #154

Merged
eyelock merged 6 commits into develop from feat/yna-loop-agent on May 12, 2026


@eyelock eyelock commented May 12, 2026

Summary

Adds ynh agent run — an autonomous agent loop driver embedded in the ynh binary. The loop spawns a vendor agent subprocess, runs sensors after each turn, synthesises feedback, and enforces budgets until convergence or a halt condition.

Companion PR: eyelock/TermQ#248

Changes

Core loop (internal/agent/)

  • loop.go (RunLoop): plan phase, act loop, sensor execution, convergence check, stuckness watchdog, interactive approval, budget enforcement, NDJSON trajectory emission
  • worker.go (WorkerBackend / WorkerSession interfaces): wire-format details stay inside each implementation
  • budget.go (Budget): turn/token/wall-clock limits with typed exit codes and BudgetType enum
  • watchdog.go (Watchdog): edit-loop detection + no-progress detection (sensor hash unchanged for K turns)
  • sensor.go (RunSensor): wraps ynh sensors run; SensorHash for the watchdog; --sensor-overlay-json pass-through
  • control.go (ControlReader): JSON control messages for interactive approval/interrupt via stdin
  • trajectory.go (TrajectoryWriter): typed NDJSON event stream (type, timestamp, synthesized_feedback, budget enum, total_turns/total_tokens)
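
The no-progress check named in the watchdog.go entry ("sensor hash unchanged for K turns") can be sketched as below. This is only an illustration: the type noProgress, its Observe method, and the SHA-256 hashing in hashSensorOutput are assumptions, not the PR's actual Watchdog or SensorHash code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashSensorOutput stands in for the PR's SensorHash. The real hashing
// scheme is not shown in the PR; SHA-256 over the raw output is assumed.
func hashSensorOutput(output string) string {
	sum := sha256.Sum256([]byte(output))
	return hex.EncodeToString(sum[:])
}

// noProgress is a hypothetical detector: it reports "stuck" once the same
// sensor hash has been observed on k consecutive turns.
type noProgress struct {
	k        int
	lastHash string
	repeats  int
}

// Observe records one turn's hash and returns true when that hash has been
// unchanged for k consecutive turns.
func (w *noProgress) Observe(hash string) bool {
	if hash == w.lastHash {
		w.repeats++
	} else {
		w.lastHash = hash
		w.repeats = 1
	}
	return w.repeats >= w.k
}

func main() {
	w := &noProgress{k: 3}
	outputs := []string{"2 tests failing", "1 test failing", "1 test failing", "1 test failing"}
	for turn, out := range outputs {
		stuck := w.Observe(hashSensorOutput(out))
		fmt.Printf("turn %d stuck=%v\n", turn+1, stuck)
	}
}
```

With k set to 3, only the fourth turn reports stuck, because that is the third consecutive turn with identical sensor output.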

Backends

  • claude.go (ClaudeBackend): long-lived subprocess, stream-json mode
  • codex.go (CodexBackend): long-lived subprocess, codex exec --json
  • cursor.go (CursorBackend): per-turn subprocess with --resume <chatId>
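
To illustrate how a loop can stay independent of each vendor's wire format, here is a minimal sketch of a backend/session split. The interface names come from the PR, but every method signature, the TurnResult type, and the echoBackend fake are assumptions made for illustration; the real definitions live in worker.go and are not shown here.

```go
package main

import (
	"context"
	"fmt"
)

// TurnResult is a hypothetical, backend-neutral result of one agent turn.
type TurnResult struct {
	Content string
	Tokens  int
}

// WorkerSession reuses the PR's interface name; the method set is assumed.
type WorkerSession interface {
	Send(ctx context.Context, prompt string) (TurnResult, error)
	Close() error
}

// WorkerBackend reuses the PR's interface name; the method set is assumed.
type WorkerBackend interface {
	Name() string
	Start(ctx context.Context, workdir string) (WorkerSession, error)
}

// echoBackend is a fake backend used only to exercise the interfaces.
type echoBackend struct{}

func (echoBackend) Name() string { return "echo" }

func (echoBackend) Start(ctx context.Context, workdir string) (WorkerSession, error) {
	return &echoSession{}, nil
}

type echoSession struct{ turns int }

// Send echoes the prompt back and counts its bytes as "tokens".
func (s *echoSession) Send(ctx context.Context, prompt string) (TurnResult, error) {
	s.turns++
	return TurnResult{Content: "echo: " + prompt, Tokens: len(prompt)}, nil
}

func (s *echoSession) Close() error { return nil }

// runOneTurn depends only on the interfaces, never on a concrete backend.
func runOneTurn(ctx context.Context, b WorkerBackend, prompt string) (TurnResult, error) {
	sess, err := b.Start(ctx, ".")
	if err != nil {
		return TurnResult{}, fmt.Errorf("start %s: %w", b.Name(), err)
	}
	defer sess.Close()
	return sess.Send(ctx, prompt)
}

func main() {
	res, err := runOneTurn(context.Background(), echoBackend{}, "fix the build")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Content, res.Tokens)
}
```

Under this shape, a long-lived backend would keep its subprocess inside the session, while a per-turn backend would spawn a new process inside each Send call.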

CLI (cmd/ynh/)

  • agent.go: ynh agent run with flags --harness, --task, --backend, --sandbox, --model, --max-turns, --max-tokens, --max-wall, --convergence-sensor, --worktree, --emit-jsonl, --sensor-overlay, --interactive, --no-plan
  • sensors.go: --sensor-overlay-json flag; shallow JSON merge applied before execution
  • cliformat.go: disabled HTML escaping in the structured error envelope so <harness-name> renders literally

Exit codes

| Code | Meaning |
|------|---------|
| 0 | Converged |
| 10 | Turn cap |
| 11 | Token budget |
| 12 | Wall-clock limit |
| 13 | Stuck |
| 14 | Tamper detected |
| 20 | Worker error |
| 30 | User aborted |
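
As a sketch of how the exit codes listed above could be expressed as typed constants, the block below pairs them with the budget names that appear in the trajectory format ("turns", "tokens", "wall_clock"). The identifiers ExitCode, exitCodeForBudget, and the constant names are hypothetical; only the numeric values and the budget strings come from this PR.

```go
package main

import "fmt"

// ExitCode is a hypothetical typed wrapper around the process exit status.
type ExitCode int

// Numeric values mirror the exit-code table in this PR; names are assumed.
const (
	ExitConverged   ExitCode = 0
	ExitTurnCap     ExitCode = 10
	ExitTokenBudget ExitCode = 11
	ExitWallClock   ExitCode = 12
	ExitStuck       ExitCode = 13
	ExitTamper      ExitCode = 14
	ExitWorkerError ExitCode = 20
	ExitUserAborted ExitCode = 30
)

// BudgetType uses the PR's type name; the string values are the ones the
// PR lists for the trajectory event.
type BudgetType string

const (
	BudgetTurns     BudgetType = "turns"
	BudgetTokens    BudgetType = "tokens"
	BudgetWallClock BudgetType = "wall_clock"
)

// exitCodeForBudget maps an exceeded budget to its exit code. The second
// return value is false for an unrecognised budget type.
func exitCodeForBudget(b BudgetType) (ExitCode, bool) {
	switch b {
	case BudgetTurns:
		return ExitTurnCap, true
	case BudgetTokens:
		return ExitTokenBudget, true
	case BudgetWallClock:
		return ExitWallClock, true
	default:
		return 0, false
	}
}

func main() {
	for _, b := range []BudgetType{BudgetTurns, BudgetTokens, BudgetWallClock} {
		code, _ := exitCodeForBudget(b)
		fmt.Printf("%s -> exit %d\n", b, code)
	}
}
```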

Testing

  • make check passes — 0 lint issues, all tests green, both binaries build
  • make e2e passes — full E2E suite green
  • Unit tests for all new packages: budget, watchdog, trajectory, control, sensor, claude, codex, cursor, loop integration, cliformat

🤖 Generated with Claude Code

David Collie and others added 6 commits May 11, 2026 17:31
Implements the Phase 1 loop driver as `ynh agent run`, embedding
it in the ynh binary per the agent-loop plan. The loop driver is
the missing orchestration layer that sits above ynh's sensor
execution: it spawns a vendor agent subprocess, runs sensors
between turns, synthesises feedback, and enforces budgets and
stuckness limits until all sensors converge.

Key design decisions:
- WorkerBackend interface isolates all wire-format details inside
  each backend; the loop driver never sees stream-json specifics.
  Claude Code is the only v1 backend; the interface is ready for
  Codex (Phase 4) with ~200 incremental lines.
- Sensor execution shells out to `ynh sensors run` (already
  shipped in v0.3.1) so loop-driver policy (pass/fail thresholds)
  stays separate from ynh's mechanical execution.
- NDJSON trajectory writer emits one event per line to a JSONL
  file or stdout; TermQ's Inspector drives off this stream.
- Stdin control protocol (approve_plan, reject_plan, interrupt,
  approve_turn, replace_feedback) allows TermQ and CI to steer
  the loop without polling.
- Budget (turns/tokens/wall-clock) and stuckness watchdog
  (edit-loop + no-progress) are enforced in-process with typed
  exit codes (10-30) for CI integration.
- srt sandbox support via --sandbox srt|none.
- Plan/Act phase split: first turn writes plan.md, awaits
  approval, then enters the act loop.
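
The stdin control protocol in the list above could be consumed roughly as follows. The message names are the ones listed in this commit message, but the wire shape (a "type" field plus an optional "feedback" field, one JSON object per line) and the decision to skip unknown types are assumptions; the PR does not show the actual schema.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// controlMessage is a hypothetical wire shape for one control message.
type controlMessage struct {
	Type     string `json:"type"`
	Feedback string `json:"feedback,omitempty"`
}

// knownTypes holds the message names listed in the commit message.
var knownTypes = map[string]bool{
	"approve_plan":     true,
	"reject_plan":      true,
	"interrupt":        true,
	"approve_turn":     true,
	"replace_feedback": true,
}

// readControl decodes a stream of JSON objects until EOF and keeps the
// messages whose type is recognised. Unknown types are skipped (assumed).
func readControl(r io.Reader) ([]controlMessage, error) {
	dec := json.NewDecoder(r)
	var msgs []controlMessage
	for {
		var m controlMessage
		err := dec.Decode(&m)
		if errors.Is(err, io.EOF) {
			return msgs, nil
		}
		if err != nil {
			return msgs, fmt.Errorf("decode control message: %w", err)
		}
		if knownTypes[m.Type] {
			msgs = append(msgs, m)
		}
	}
}

// readControlString is a convenience wrapper for in-memory input.
func readControlString(s string) ([]controlMessage, error) {
	return readControl(strings.NewReader(s))
}

func main() {
	input := `{"type":"approve_plan"}
{"type":"replace_feedback","feedback":"focus on the lint errors"}
`
	msgs, err := readControlString(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range msgs {
		fmt.Printf("%s %q\n", m.Type, m.Feedback)
	}
}
```

In a real driver the reader would be os.Stdin and decoding would run in its own goroutine so that an interrupt can arrive mid-turn.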

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add CodexBackend (codex exec --json) and CursorBackend (per-turn
  subprocess with --resume) so all three vendor CLIs are supported
- Fix trajectory wire format to match TermQ consumer expectations:
  Event.Kind serialises as "type" (not "kind"), Event.Timestamp as
  "timestamp" (not "time")
- BudgetExceededData gains a typed Budget field ("turns"/"tokens"/"wall_clock")
- SessionEndData gains TotalTurns and TotalTokens on all exit paths
- TurnApprovalData field renamed to SynthesizedFeedback (JSON: synthesized_feedback)
- Budget.Exceeded() returns BudgetType as a third value; loop driver
  threads it through to the trajectory event
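
The wire-format fix above comes down to struct tags. A minimal sketch, assuming an Event with a generic Data payload and a "session_end" event name (both hypothetical; only the "type", "timestamp", "total_turns", and "total_tokens" keys are taken from this PR):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"time"
)

// Event keeps the Go field names Kind and Timestamp but serialises them
// as "type" and "timestamp". The Data field and its key are assumed.
type Event struct {
	Kind      string    `json:"type"`
	Timestamp time.Time `json:"timestamp"`
	Data      any       `json:"data,omitempty"`
}

// SessionEndData carries the totals the PR says are emitted on exit.
type SessionEndData struct {
	TotalTurns  int `json:"total_turns"`
	TotalTokens int `json:"total_tokens"`
}

// writeEvent emits one event per line: Encoder.Encode appends a newline
// after each value, which is what makes the stream NDJSON.
func writeEvent(w io.Writer, ev Event) error {
	return json.NewEncoder(w).Encode(ev)
}

// sessionEndLine renders a session-end event with a fixed timestamp so
// the output is reproducible. The event name "session_end" is assumed.
func sessionEndLine(turns, tokens int) (string, error) {
	var buf bytes.Buffer
	ev := Event{
		Kind:      "session_end",
		Timestamp: time.Date(2026, 5, 12, 5, 45, 0, 0, time.UTC),
		Data:      SessionEndData{TotalTurns: turns, TotalTokens: tokens},
	}
	if err := writeEvent(&buf, ev); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	line, err := sessionEndLine(3, 1200)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(line)
}
```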

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cover NDJSON parsing, content accumulation, usage tracking, EOF handling,
unknown-event skipping, Send wire format, and cursor session state
(pending queue, firstTurn flag, Close no-op).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tching

Loop driver accepts --sensor-overlay <json> (e.g. '{"build":{"source":
{"command":"make fast"}}}') and passes each sensor's overlay to
ynh sensors run via the new --sensor-overlay-json flag. ynh performs a
shallow JSON field-merge over the base harness declaration before
executing the sensor, keeping all execution logic inside ynh.
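
One way to read "shallow JSON field-merge" is that each top-level key in the overlay replaces the same key in the base declaration wholesale, with no recursion into nested objects. The sketch below implements that reading; the function name shallowMerge and the sample "category" and "timeout" fields are hypothetical, and ynh's actual merge may differ in detail.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shallowMerge overlays top-level keys from overlay onto base. Nested
// objects are replaced as a whole rather than merged field by field.
func shallowMerge(base, overlay []byte) ([]byte, error) {
	var b, o map[string]json.RawMessage
	if err := json.Unmarshal(base, &b); err != nil {
		return nil, fmt.Errorf("parse base: %w", err)
	}
	if err := json.Unmarshal(overlay, &o); err != nil {
		return nil, fmt.Errorf("parse overlay: %w", err)
	}
	if b == nil {
		// A JSON null base unmarshals to a nil map; start from empty.
		b = map[string]json.RawMessage{}
	}
	for k, v := range o {
		b[k] = v
	}
	// json.Marshal writes map keys in sorted order, so output is stable.
	return json.Marshal(b)
}

func main() {
	base := []byte(`{"category":"build","source":{"command":"make check","timeout":"5m"}}`)
	overlay := []byte(`{"source":{"command":"make fast"}}`)

	merged, err := shallowMerge(base, overlay)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(merged))
}
```

Note that under this reading the hypothetical "timeout" field disappears, because the overlay's "source" object replaces the base's "source" object entirely.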

TermQ uses this to let users tweak sensor declarations per-session in
the Inspector without modifying the installed harness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Go's json.Marshal escapes < and > as \u003c and \u003e by default.
Switch to json.NewEncoder + SetEscapeHTML(false) so that usage strings
like <harness-name> render literally in terminals and CI logs.
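
The difference can be reproduced with the standard library alone. In this sketch the envelope type and the usage string are hypothetical stand-ins for the structured error envelope in cliformat.go.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// envelope is a hypothetical stand-in for the structured error envelope.
type envelope struct {
	Error string `json:"error"`
}

// marshalDefault uses json.Marshal, which escapes <, > and & for safe
// embedding in HTML.
func marshalDefault(v any) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// marshalLiteral uses an Encoder with HTML escaping turned off. Encode
// appends a trailing newline to the output.
func marshalLiteral(v any) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	e := envelope{Error: "usage: ynh agent run --harness <harness-name>"}

	escaped, _ := marshalDefault(e)
	literal, _ := marshalLiteral(e)

	fmt.Println(escaped) // {"error":"usage: ynh agent run --harness \u003charness-name\u003e"}
	fmt.Print(literal)   // {"error":"usage: ynh agent run --harness <harness-name>"}
}
```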

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cover unstructured vs structured mode, JSON field values, no-HTML-escape
behaviour (verifies SetEscapeHTML(false) is effective), and trailing newline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eyelock eyelock changed the title from "feat(agent): ynh agent run — autonomous loop driver (Phase 1)" to "feat(agent): ynh agent run — autonomous agent loop driver" May 12, 2026
@eyelock eyelock marked this pull request as ready for review May 12, 2026 05:45
@eyelock eyelock merged commit bee32ac into develop May 12, 2026
6 checks passed
@eyelock eyelock deleted the feat/yna-loop-agent branch May 12, 2026 05:45
@eyelock eyelock mentioned this pull request May 13, 2026
eyelock added a commit that referenced this pull request May 13, 2026
Promotes release/v0.4.0 to main for stable release.

## What's in 0.4.0

**Features**
- `ynh agent run` — autonomous agent loop driver (#154)
- `--instructions` flag for per-invocation context injection on `ynh
run` (#152)
- Focus, profile, hook, and MCP editing commands — `ynh
focus/profile/hook/mcp add/remove/update` (#160)

**Fixes**
- Schema-3 pointer-form local installs: read/write symmetry (#158)
- `ynh ls`: derive load id from entry namespace, not hard-coded
`local/` (#159)
- Include/delegate edits route to source dir for local installs (#149)
- Dead-code cleanup in harness loader (#150)

**Docs**
- New `docs/focus.md` reference page
- Tutorial chain realigned to current sidebar ordering
- `--focus` flag and `YNH_FOCUS` env var documented across `ynh
run` and `ynd preview/diff/export` (#161)

**Tests**
- Unit coverage push: `internal/harness` 48.5% → 84.5%; agent and
vendor cheap wins (#162)
- E2E coverage for ynd-side focus handling, `ynd fmt` edge cases,
focus clear-profile round trip (#163)

**CI/release**
- `-trimpath` added to goreleaser build flags (#156)