feat(agent): long-running task loop with LLM-planned proposals#2264

Closed
robbie-c wants to merge 2 commits into main from posthog-code/long-running-task

@robbie-c
Member

Summary

Adds an auto-continuing turn loop the agent can enter for tasks with an objective, measurable success criterion (test pass count, bundle size, error count, etc.). Inspired by Anthropic's Ralph Wiggum plugin but redesigned around the desktop agent framework instead of a bash loop.

How it works

  1. User types /long-running-task <description> (or the agent self-initiates).
  2. A brief planning prompt goes to the agent, with planning instructions attached as hidden _meta context. The agent explores first: it reads relevant files, runs measurement commands, and asks clarifying questions via AskUserQuestion only when intent is genuinely ambiguous.
  3. The agent emits a proposal as a <long-running-task-config>{...}</long-running-task-config> JSON block (goal, successCriterion, marker, maxIterations, approach).
  4. The renderer parses the block on end_turn and shows an inline confirmation card above the prompt input — editable fields, Start / Edit / Cancel.
  5. On Start, the renderer calls a new tRPC mutation that sets server-side state, then sends a kickoff prompt.
  6. After each end_turn while the task is active, the loop in claude-agent.ts pushes a continuation user message into session.input and keeps the while-loop alive instead of returning. The loop exits when:
    • The agent emits the configured marker (default <TASK_COMPLETE>) in its assistant text
    • The agent hits the max-iteration cap (default 20); it gets one wrap-up turn before exiting
    • The user clicks Stop loop in the pill
    • The user cancels normally
  7. Mid-loop user messages pause auto-continuation for that turn so the user can steer.
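
The exit conditions in step 6 can be sketched as a pure decision function. This is a minimal illustration, not the PR's code: the `LoopState` shape, the `decideStep` name, and the prompt wording are all assumptions, and user-driven exits (Stop loop, cancel, mid-loop steering) are left to the caller.

```typescript
// Hypothetical state shape; the PR's real schemas live in long-running-task/utils.ts.
interface LoopState {
  goal: string;
  marker: string; // e.g. "<TASK_COMPLETE>"
  maxIterations: number; // e.g. 20
  iteration: number; // continuations injected so far
  wrapUpSent: boolean; // true once the single wrap-up turn has been requested
}

type Step =
  | { kind: "continue"; text: string; next: LoopState }
  | { kind: "wrap-up"; text: string; next: LoopState }
  | { kind: "exit"; reason: "marker" | "cap" };

// Called after each end_turn with the agent's latest assistant text.
function decideStep(state: LoopState, assistantText: string): Step {
  // 1. The agent signalled completion by emitting the marker (trust-based).
  if (assistantText.includes(state.marker)) {
    return { kind: "exit", reason: "marker" };
  }
  // 2. The wrap-up turn has already run, so the cap is final.
  if (state.wrapUpSent) {
    return { kind: "exit", reason: "cap" };
  }
  // 3. The cap was reached: grant exactly one wrap-up turn.
  if (state.iteration >= state.maxIterations) {
    return {
      kind: "wrap-up",
      text: "Iteration cap reached. Summarize progress and remaining work.",
      next: { ...state, wrapUpSent: true },
    };
  }
  // 4. Otherwise inject another continuation message.
  const next = { ...state, iteration: state.iteration + 1 };
  return {
    kind: "continue",
    text:
      `Continue working toward: ${state.goal} ` +
      `(iteration ${next.iteration}/${state.maxIterations}). ` +
      `Emit ${state.marker} only after verifying the success criterion.`,
    next,
  };
}
```

Returning the continuation text rather than pushing it keeps the function independent of how a turn is delivered, so the caller can route it through whichever mechanism the adapter uses.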

A pill above the prompt input shows iteration count + goal + Stop button while a task is active.

Files of note

  • packages/agent/src/adapters/claude/long-running-task/utils.ts — schemas, continuation builder, proposal parser, broadcast helpers
  • packages/agent/src/adapters/claude/claude-agent.ts — loop hook in the result.subtype === "success" branch + new START_LONG_RUNNING_TASK / STOP_LONG_RUNNING_TASK extMethods
  • packages/agent/src/adapters/claude/session/instructions.ts — appended system prompt teaching the proposal workflow
  • apps/code/src/renderer/components/long-running-task/ — pill + proposal card
  • apps/code/src/renderer/stores/longRunningTaskStore.ts — Zustand store keyed by taskRunId
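
As a rough idea of what the proposal parser in utils.ts has to do, the sketch below extracts and validates the `<long-running-task-config>` block from assistant text. The `Proposal` interface, the `parseProposal` name, and the choice to fall back to the default marker and cap when fields are missing are assumptions for illustration; only the tag name, the field names, and the default values come from the description above.

```typescript
// Field names follow the PR description; the interface itself is hypothetical.
interface Proposal {
  goal: string;
  successCriterion: string;
  marker: string;
  maxIterations: number;
  approach: string;
}

const DEFAULT_MARKER = "<TASK_COMPLETE>";
const DEFAULT_MAX_ITERATIONS = 20;
const CONFIG_BLOCK =
  /<long-running-task-config>([\s\S]*?)<\/long-running-task-config>/;

// Returns the first well-formed proposal in the text, or null if none is found.
function parseProposal(assistantText: string): Proposal | null {
  const match = assistantText.match(CONFIG_BLOCK);
  if (!match) return null;

  let raw: unknown;
  try {
    raw = JSON.parse(match[1] ?? "");
  } catch {
    return null; // malformed JSON: treat as no proposal
  }
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;

  // goal and successCriterion are treated as required (an assumption).
  if (typeof r.goal !== "string" || typeof r.successCriterion !== "string") {
    return null;
  }
  const max = r.maxIterations;
  return {
    goal: r.goal,
    successCriterion: r.successCriterion,
    marker:
      typeof r.marker === "string" && r.marker.length > 0
        ? r.marker
        : DEFAULT_MARKER,
    maxIterations:
      typeof max === "number" && Number.isInteger(max) && max > 0
        ? max
        : DEFAULT_MAX_ITERATIONS,
    approach: typeof r.approach === "string" ? r.approach : "",
  };
}
```

Returning null for anything malformed lets the renderer simply skip the confirmation card rather than surface a half-parsed proposal.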

Known limitations

  • Local sessions only. The slash command early-returns with a toast on cloud sessions — extMethod doesn't work over the cloud path.
  • Sentinel marker is trust-based. The agent could in theory emit the marker without actually verifying. The system prompt addendum is explicit about this; the iteration cap is the hard safety net. We can revisit with structural verification later if the trust-based version misfires.

Test plan

  • /long-running-task with an iterative task (e.g. fix N failing tests). Agent explores, asks if needed, emits proposal block, card renders.
  • Approve the proposal card → pill appears with 0/20, loop kicks in.
  • Counter increments on each end_turn until marker is emitted or cap is hit.
  • Edit the proposal in the card before approving — edited values persist when starting.
  • Click Stop loop mid-flight — current turn finishes, no more continuations injected.
  • Send a steering message mid-loop — agent responds to user, loop stays active and continues after.
  • Hit the iteration cap deliberately (set max=2 in a clearly-impossible task) — wrap-up turn runs.
  • /long-running-task on a cloud task → toast says not supported.
  • Cancel the entire prompt mid-loop → loop exits cleanly.

Created with PostHog Code

robbie-c added 2 commits May 20, 2026 22:40
Adds an auto-continuing turn loop that the agent enters when the user
approves a proposal block. While active, the harness pushes a continuation
user message after each end_turn until the agent emits a configured marker,
hits a max-iteration cap, or the user steers/stops.

The /long-running-task slash command sends a brief planning prompt with
hidden context — the agent explores the codebase, asks clarifying questions
via AskUserQuestion when intent is ambiguous, then emits the config as a
<long-running-task-config> JSON block. The renderer parses that block,
shows an inline confirmation card (editable goal/criterion/marker/max), and
on Start sets the server-side state via a new tRPC mutation + sends the
kickoff prompt.

While a task is active, a pill above the prompt input shows iterations
N/M + goal + a Stop button. Mid-loop user messages pause auto-continuation
for that turn so the user can steer.

Generated-By: PostHog Code
Task-Id: c1333c34-d768-47e2-9dda-d2e456ab701a

Refactors the long-running task helpers to be adapter-agnostic and adds
the same loop primitive to the codex adapter.

- decideLongRunningTaskStep now returns the continuation text instead of
  pushing it; caller routes it through the right turn mechanism
  (session.input push for claude, new PromptRequest for codex)
- LRT system-prompt instructions extracted to a shared module that both
  adapters append to their system prompt
- codex runPrompt wraps codexConnection.prompt() in a while loop —
  each iteration is a fresh ACP turn, no broadcast on continuations so
  the user only sees the original prompt in chat
- codex-client forwards every session notification to the session's
  notificationHistory so getLatestAssistantText finds the agent's text
  for marker + proposal detection
- codex extMethod handles START / STOP and emits the same update
  notifications the renderer already listens for
- codex spawn instructions get LRT instructions appended, mirroring the
  Claude APPENDED_INSTRUCTIONS path

Generated-By: PostHog Code
Task-Id: c1333c34-d768-47e2-9dda-d2e456ab701a
@robbie-c robbie-c force-pushed the posthog-code/long-running-task branch from 9246890 to fa85fa1 Compare May 20, 2026 21:47
@robbie-c robbie-c closed this May 26, 2026