Zellij to tmux + ConPTY runtime, session save/restore, crash-proof reconcile (port #404) #2183
Merged
harshitsinghbhandari merged 4 commits intoJun 25, 2026
* feat(runtime): add tmux adapter package
Adds backend/internal/adapters/runtime/tmux implementing ports.Runtime via
the tmux CLI. Drop-in replacement for the zellij adapter on Darwin/Linux.
Key design points:
- Handle is a plain session id string (no pane-id split needed for tmux).
- Exact-match session targeting via = prefix for kill-session and has-session.
- Keep-alive shell appended to launch command so sessions survive agent exit.
- send-keys -l chunked for literal text delivery (no key-name interpretation).
- IsAlive distinguishes definitive-dead (missing/no-server output) from probe
errors so the reaper never kills a session on a transient tmux failure.
- 34 tests pass: 32 unit tests via fakeRunner seam, 2 integration tests on
real tmux 3.6b (TestRuntimeIntegration, TestRuntimeIntegrationExactSessionParsing).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tmux): address four code-review findings in tmux runtime adapter
- Remove em dash from tmux_test.go:462 (project hard rule); replace with semicolon
- Derive integration test session IDs from t.Name() so concurrent runs do not collide on the same tmux session
- Remove dead scaffolding variables (r/fr, r2/fr2) in TestCreateDestroysAndReturnsErrorWhenNotAlive
- Quote ${SHELL:-/bin/sh} in buildLaunchCommand and update all asserting tests
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(runtime): wire tmux on Darwin/Linux via runtimeselect, keep zellij on Windows
- New package runtimeselect: Runtime union interface (ports.Runtime +
SendMessage/GetOutput/AttachCommand) with compile-time assertions for
both adapters. New(log) returns tmux on non-Windows, zellij on Windows
(replicating the old daemon socket-dir setup).
- daemon.go: replace zellij-specific socket-dir block with
runtimeselect.New(log); update comment to be runtime-neutral.
- lifecycle_wiring.go: startSession param changed from *zellij.Runtime
to runtimeselect.Runtime.
- cli/doctor.go: runtime-aware checkTerminalRuntime (tmux on Darwin/Linux,
zellij on Windows); added checkTmux.
- cli/spawn.go: attach hint prints tmux attach -t <name> on non-Windows,
keeps zellij attach hint on Windows.
- wiring_test.go: startSession test uses runtimeselect.New(nil); zellij
direct tests retained for zellij-specific coverage.
- doctor_test.go: replaced three zellij tool tests with tmux equivalents.
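The union-interface-plus-assertions pattern behind runtimeselect can be sketched as follows. All interfaces and adapter types here are hypothetical, trimmed stand-ins. `windowsRuntime` represents whichever adapter the Windows branch returns (zellij at this commit).

```go
package main

import "fmt"

// Runtime and Messenger are hypothetical, trimmed stand-ins for
// ports.Runtime and the extra methods the union adds.
type Runtime interface {
	Create(id, cmd string) error
	IsAlive(id string) (bool, error)
}

type Messenger interface {
	SendMessage(id, text string) error
}

// Union plays the role of runtimeselect.Runtime: one interface that every
// selectable adapter must satisfy in full.
type Union interface {
	Runtime
	Messenger
}

type tmuxRuntime struct{}

func (tmuxRuntime) Create(id, cmd string) error      { return nil }
func (tmuxRuntime) IsAlive(id string) (bool, error)  { return true, nil }
func (tmuxRuntime) SendMessage(id, text string) error { return nil }

// windowsRuntime stands in for the Windows adapter.
type windowsRuntime struct{}

func (windowsRuntime) Create(id, cmd string) error      { return nil }
func (windowsRuntime) IsAlive(id string) (bool, error)  { return true, nil }
func (windowsRuntime) SendMessage(id, text string) error { return nil }

// Compile-time assertions: the build fails on every GOOS if either adapter
// stops satisfying the union, not just on the OS that selects it.
var (
	_ Union = tmuxRuntime{}
	_ Union = windowsRuntime{}
)

// New selects by OS name; a real constructor would branch on runtime.GOOS.
func New(goos string) Union {
	if goos == "windows" {
		return windowsRuntime{}
	}
	return tmuxRuntime{}
}

func main() {
	fmt.Printf("%T %T\n", New("darwin"), New("windows"))
}
```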
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore: tidy runtime-neutral comments and doctor import grouping
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(tmux): drop unused runner.Start seam
tmux creates sessions detached via new-session -d, so the Start method
(carried over from the zellij runner shape, where it backs the Windows
fire-and-forget spawn) is never called. Remove it from the interface and
its implementations to shrink the seam.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(conpty): add protocol codec and output ring buffer (pure Go, OS-agnostic)
Ports the ConPTY named-pipe binary framing protocol and rolling output
buffer from pty-host.ts to Go. Implements EncodeMessage, MessageParser
(handles arbitrary chunk boundaries, payload copy guarantee), and Ring
(MaxOutputLines=1000, ANSI-safe, concurrent Append+Snapshot). All 15
unit tests pass on Darwin; GOOS=windows build is also clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(conpty): harden copy-safety and add concurrent ring test
Strengthen TestParserPayloadIsCopy to catch internal-buffer aliasing:
feed frame1, capture its payload, feed frame2 of the same length so the
parser's buffer overwrites the frame1 region, then assert frame1's bytes
are unchanged. The prior test only mutated the input slice post-Feed and
did not exercise the real aliasing risk.
Add TestRingConcurrent: 10 writer goroutines (Append) and 10 reader
goroutines (Snapshot + Tail) running concurrently with a WaitGroup. The
test is meaningful only under the race detector and catches any missing
mu coverage on Ring's exported methods.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(ptyregistry): port Windows pty-host sideband registry to Go
Adds package ptyregistry under backend/internal/adapters/runtime/conpty/ptyregistry.
Ports windows-pty-registry.ts: defensive read, atomic temp+rename write,
delete-on-empty, register-replaces-same-ID, and auto-pruning List.
PID liveness isolated behind build tags (syscall.Kill on Unix,
OpenProcess on Windows). 10 tests all green on Darwin.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(sdd): phase B briefs and progress for B1-B3
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(conpty): add pty-host serve engine with loopback TCP transport (B3)
Ports pty-host.ts behavior to Go: ptyConn interface seam, Serve engine
with ring replay, fan-out broadcast, MSG_* handlers, PTY-exit keep-alive,
and graceful shutdown (ConPTY dispose first, 50ms grace, then clients and
listener). Real conptyConn is Windows-only via build tag; non-Windows stub
keeps the package importable on Darwin/Linux. Tests use a fake ptyConn
with real loopback sockets and the B1 MessageParser, passing with -race.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(conpty): deliver scrollback snapshot and register client atomically
Review of Task B3 found one Important bug and two minors.
Important: in handleConn the ring Snapshot and the client registration
ran under two separate h.mu acquisitions. A PTY chunk arriving in that
gap was in neither the snapshot nor that client's broadcast, so it was
silently dropped (a hole in the client's stream). Now take the snapshot,
write it to the conn, and add the conn to the clients set all under a
single h.mu hold; broadcast also takes h.mu so it cannot interleave.
Added TestScrollbackLiveOrdering_NoDrop, which emits a contiguous
numbered stream while a client connects and asserts the client's stream
has no internal gap. It reliably fails against the old two-step code and
passes under -race -count=20.
Minor (faithfulness): conptyConn.Close() now also best-effort
Process.Kill() (nil-guarded) so a child that ignores ConPTY EOF still
exits and Done() fires, mirroring pty.kill() in pty-host.ts.
Minor (simplify): use os.Environ() instead of
exec.Command(shellCmd).Environ() for the child env.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(sdd): B4 brief
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(conpty): add runtime adapter with loopback pty-client and session management (B4)
Implements the conpty Runtime adapter: injectable spawn seam, loopback
TCP client helpers (SendMessage/GetOutput/IsAlive/Kill), and Runtime
methods (Create/Destroy/IsAlive/SendMessage/GetOutput). Session resolution
uses an in-memory map with B2 registry fallback for daemon-restart
recovery. Windows-only detached spawn in spawn_windows.go; stub errors
on other OSes. All adapter methods are unit-tested on Darwin against an
in-process B3 Serve and fakePTY. 48 tests pass, all three GOOS builds
succeed, vet clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(conpty): split IsAlive dead-vs-transient for reaper safety
clientIsAlive collapsed every probe failure (dial timeout, read-deadline
expiry, write error, connection-refused) to false, which the reaper turns
into ProbeDead and the LCM can promote to a permanent reap. A single
transient 2s loopback timeout would spuriously kill a live idle session.
Now clientIsAlive returns (alive bool, transientErr error): a refused dial
is definitively gone (false, nil); a timeout or any connected-then-failed
I/O error is transient (false, err) so the reaper records ProbeFailed and
retries. Wire IsAlive to propagate it. Add regression test covering both
the refused-is-gone and timeout-is-transient paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(sdd): B5 brief + ledger
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(terminal): stream-based Attach for tmux/zellij/conpty
Evolve the terminal layer from argv-based attach (PTYSource.AttachCommand
+ injected spawnFunc) to stream-based attach (Source embedding
ports.Attacher). tmux/zellij keep spawning their attach CLI on a local
PTY via the new shared ptyexec.Spawn; conpty attaches by dialing its
loopback pty-host directly with a loopbackStream over the B1 framing
protocol. Reattach/backoff/size/SIGWINCH/detach semantics are unchanged.
- ports: add Stream + Attacher.
- ptyexec: new shared package holding the creack/pty (unix) and ConPTY
(windows) spawn, moved verbatim from terminal with its tests.
- terminal: PTYSource -> Source, drop spawnFunc/WithSpawn, run loop calls
src.Attach and uses ports.Stream.
- tmux/zellij: add Attach (argv via ptyexec.Spawn); conpty: add Attach
(loopbackStream); ports.Attacher assertions on all three.
- runtimeselect: union embeds ports.Attacher in place of AttachCommand.
- tests migrated; new conpty attach_test against in-process Serve+fakePTY.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* style(ptyexec): replace em dashes carried from moved pty files
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(sdd): B6 brief + B5 ledger
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(runtime): select conpty on Windows, register pty-host subcommand, delete zellij
- runtimeselect.New: Windows branch now returns conpty.New(conpty.Options{}) instead
of zellij; compile-time assertion updated to conpty.Runtime.
- cli/ptyhost.go: new hidden "ao pty-host" subcommand (DisableFlagParsing so agent
shell args with leading dashes survive); calls conpty.RunHost and exits with its code.
- cli/root.go: wires newPtyHostCommand alongside newLaunchCommand.
- cli/doctor.go: Windows terminal-runtime check replaced with a static ConPTY
built-in pass; zellij import and checkZellij function removed.
- cli/spawn.go: Windows attach hint updated to dashboard message (ConPTY has no
CLI attach); zellij import removed.
- daemon/lifecycle_wiring.go: stale zellij comment updated to tmux/conpty.
- daemon/wiring_test.go: zellij import and TestDaemonZellijSocketDir test removed;
TestWiring_StartLifecycleThreadsMessengerIntoLCM now uses tmux.New.
- terminal/attachment_integration_test.go: re-pointed at real tmux
(TestAttachmentStreamsRealTmuxPane + TestAttachmentReattachAdoptsNewSize);
sessions cleaned up in t.Cleanup.
- internal/adapters/runtime/zellij: deleted entirely.
All three GOOS builds pass; go test -race ./... 1607 passed; go vet clean;
grep -rn "runtime/zellij" returns nothing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(daemon): correct terminal-runtime comment to conpty on Windows
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(ptyexec): drop stale zellij reference in Windows spawn comment
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(sdd): final phase B ledger
* build(desktop): support local keychain signing for macOS builds
Bridge forge.config.ts to accept the local keychain flow (APPLE_SIGNING_IDENTITY
identity + AO_NOTARY_PROFILE notarytool profile) in addition to the existing CI
secrets path (CSC_LINK + APPLE_ID/app-specific-password). Enables a signed +
notarized macOS build from a developer Mac without exporting a .p12 or the Apple
ID app-specific password.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(daemon): default TERM so Finder-launched tmux attach works
A Finder/Dock launch starts the supervisor under launchd with no
controlling tty, so TERM is unset. The daemon inherits that, and its
tmux attach client (spawned with env=nil, inheriting the daemon env)
dies immediately with "open terminal failed: terminal does not support
clear", so the orchestrator terminal pane never opens.
Seed TERM=xterm-256color (what the renderer's xterm.js emulates) as the
base of buildDaemonEnv, the same place PATH is reconstructed for the same
class of "Finder launch lacks a terminal's env" bug. A real TERM from the
shell/process env still wins.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(lifecycle): plan for save-on-close/restore-on-open sessions
Captures the intended daemon lifecycle: on shutdown save every running
session (worker and orchestrator) plus its gitignore-respecting uncommitted
work to refs/ao/preserved/<id>, then force-remove worktrees; on boot recreate
worktrees, replay the preserved work, and restore all sessions. Reuses
existing SQLite state, session_worktrees.preserved_ref, manager.Restore, and
the /shutdown endpoint (no new file, migration, or route).
Also gitignore the built daemon binary copied into frontend/daemon/.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore(frontend): sync regenerated pnpm-lock and routeTree
Working-tree regeneration of the pnpm lockfile and TanStack Router generated
route tree. No hand edits; generated output only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(workspace): add ForceDestroy for shutdown-path worktree removal
Adds ForceDestroy(ctx, info) to ports.Workspace and the gitworktree
adapter. It runs `git worktree remove --force`, then prune, then
os.RemoveAll as a backstop. A new worktreeForceRemoveArgs builder in
commands.go emits --force; the existing worktreeRemoveArgs is untouched
so Destroy still refuses dirty worktrees via ErrWorkspaceDirty.
TDD: test first creates a dirty worktree, confirms Destroy refuses with
ErrWorkspaceDirty, then confirms ForceDestroy succeeds and the path is
gone and deregistered. All 1609 backend tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(workspace): add StashUncommitted and ApplyPreserved for session lifecycle
Implements the correctness-critical save-on-close / restore-on-open pair
in the gitworktree adapter:
- StashUncommitted: captures uncommitted work (tracked edits and new
non-ignored files) via a temp GIT_INDEX_FILE into a real commit stored
at refs/ao/preserved/<session-id>. Never touches the real index or
stash stack. Returns empty string for clean worktrees. Logs the count
of .gitignore-skipped paths.
- ApplyPreserved: replays the preserve commit onto a freshly re-added
worktree via "git checkout <SHA> -- .". Deletes the ref on clean
success; keeps it and returns ErrPreservedConflict (wrapped) on
content conflicts.
- Adds both methods to ports.Workspace interface and stubs them in
integration and session_manager test doubles.
TDD: wrote two failing tests first (RED confirmed via build failure on
undefined methods), then implemented to GREEN. All 39 adapter tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(workspace): replace path-checkout with cherry-pick in ApplyPreserved
git checkout <sha> -- . is a path-checkout that always exits 0 for
content divergence, making ErrPreservedConflict unreachable. Replace
with git cherry-pick --no-commit which performs a true three-way merge,
leaves textual conflict markers on conflict, and exits non-zero so the
sentinel is correctly returned. Conflict detection now uses exit code
only (locale-independent). Add TestWorkspaceIntegrationApplyPreservedConflict
to assert: error is ErrPreservedConflict, preserve ref is kept, conflict
markers appear in the file. All 40 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(session-manager): add SaveAndTeardownAll and RestoreAll for shutdown lifecycle
Implements Task 3: capture-then-destroy on shutdown and restore-all on startup.
- Adds ErrPreservedConflict to ports as a named sentinel; gitworktree aliases it
(following the same pattern as ErrBranchCheckedOutElsewhere).
- Extends the Store interface with UpsertSessionWorktree and ListSessionWorktrees
so the session manager can write the shutdown-saved marker and read it back.
- SaveAndTeardownAll: for every live session with a workspace path, stash
uncommitted work, write the session_worktrees row (DB commit before worktree
removal, crash-safety invariant), mark terminated, destroy runtime, force-remove
the worktree. Best-effort per session; no kind filter.
- RestoreAll: for every terminated session that has a session_worktrees row (the
marker written by SaveAndTeardownAll), re-create the worktree, apply any
preserved ref (conflict logs and continues), then relaunch via the existing
single-session Restore. Sessions killed by the user before shutdown (no row)
are skipped. Best-effort per session; no kind filter.
- TDD: 9 new tests (RED confirmed via build failure, GREEN confirmed 63 pass).
Full suite: 1621 tests across 77 packages.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(terminal): enable tmux mouse scroll and fix link clicking
On macOS the runtime is tmux, but two mouse interactions were broken in
the embedded terminal while copy/paste kept working:
- Scroll: the renderer drives scrolling by writing SGR mouse-wheel
reports into the pane (the zellij `--mouse-mode true` model), but tmux
ignores those reports unless mouse mode is on. Create only set `status
off`, never `mouse on`, so wheel scrolling silently no-opped. Enable
`set-option -t <id> mouse on`, mirroring the existing status-off step.
- Link clicking: the default WebLinksAddon handler calls window.open()
with an empty URL and then assigns location.href. Electron's
setWindowOpenHandler denies every window.open and only forwards the URL
passed to it, so the empty open is dropped and clicks no-op. Pass the
matched URL to window.open directly so the main process routes it to
shell.openExternal (the OS browser).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(session-manager): assert UpsertSessionWorktree precedes ForceDestroy
Add a shared ordered call log (sharedLog *[]string) to both fakeStore
and fakeWorkspace. TestSaveAndTeardownAll_CaptureOrderAndMarker now
wires both fakes to the same slice and asserts upsertIdx < forceIdx,
enforcing the crash-safety invariant that the DB write is committed
before the worktree is force-destroyed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(daemon): wire RestoreAll/SaveAndTeardownAll into boot/shutdown sequence
Exposes session manager through a minimal sessionLifecycle interface
(RestoreAll, SaveAndTeardownAll) returned from startSession, then calls
RestoreAll (best-effort) before srv.Run and SaveAndTeardownAll with a
fresh 30s-bounded context after srv.Run returns. Both SIGTERM and POST
/shutdown funnel through srv.Run returning, so the single save call site
covers both paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(daemon): fix seam-test tautology and lifecycle variable shadow
Finding 1: dispatch both sessionLifecycle methods through an interface
variable (var sl sessionLifecycle = fake) so the runtime body exercises
interface dispatch, not just direct struct method calls.
Finding 2: rename local variable 'lifecycle' to 'lc' in
TestWiring_StartSessionBuildsSessionService to remove the shadow of the
imported lifecycle package.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(frontend): call POST /shutdown before killing daemon on quit
In before-quit, POST /shutdown (8s AbortSignal.timeout) so the daemon
saves sessions gracefully before the SIGTERM kill. Adds a re-entrancy
guard (quitting flag) so a concurrent app.quit() cannot double-preventDefault.
Falls back to killDaemon on fetch failure or timeout: quit is never blocked.
Keeps the process.on('exit') SIGTERM fallback intact.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(storage): guard session_worktrees.state against empty-string CHECK violation; add ponytail comments
The save path (saveAndTeardownOne) never sets domain.SessionWorktreeRecord.State,
so it arrives at UpsertSessionWorktree as "". The generated upsert includes state
in the INSERT column list, so the DB default ('active') is never applied and the
CHECK constraint (state IN ('active', ...)) would fire at the first real shutdown.
Fix: default to 'active' in the store adapter when row.State is "". No schema
change, no migration, no gen edit.
Also add ponytail: comments on the State field (domain type), the write path, and
the read path, documenting that state is unused multi-repo scaffolding and that the
upgrade path is to wire a real value when multi-repo worktree lifecycle states ship.
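The guard itself is small. Here is a sketch with a trimmed, hypothetical record type. It supplies in Go the value the column DEFAULT would have given, because the generated INSERT always names the `state` column and the DEFAULT therefore never applies.

```go
package main

import "fmt"

// SessionWorktreeRecord is trimmed to the fields this sketch needs.
type SessionWorktreeRecord struct {
	SessionID string
	State     string // unused multi-repo scaffolding; the save path leaves it ""
}

// stateForUpsert maps the zero value to 'active' so the row satisfies
// CHECK (state IN ('active', ...)); any explicit value passes through.
func stateForUpsert(r SessionWorktreeRecord) string {
	if r.State == "" {
		return "active"
	}
	return r.State
}

func main() {
	fmt.Println(stateForUpsert(SessionWorktreeRecord{SessionID: "s1"}))
}
```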
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(storage): add real-SQLite test for empty-State guard in UpsertSessionWorktree
Adds TestUpsertSessionWorktreeEmptyStateDefaultsToActive to the store
test file. It inserts a SessionWorktreeRecord with State at zero value
"" via UpsertSessionWorktree against a real SQLite DB, then reads the
row back and asserts State == "active". This directly exercises the
guard added in the prior commit and would fail if the guard were
removed (the CHECK constraint rejects ""). Mirrors the helpers and
setup pattern of TestSessionWorktreesRoundTrip exactly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(comments): correct shutdown-mechanism and task-ref inaccuracies
Fix 1: daemon.go comment near SaveAndTeardownAll now correctly states
that POST /shutdown closes the shutdownRequested channel (not cancel ctx).
Also tighten the RestoreAll comment to remove the inaccurate claim.
Fix 2: remove "Task 2's" phrasing from ForceDestroy ponytail comment in
workspace.go; condition still references StashUncommitted by name.
Fix 3: add note in main.ts that the 8s fetch timeout is shorter than the
daemon's 30s save bound, so a SIGTERM after fetch abort does not cut the
in-flight save short.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore: remove .superpowers workflow scratch from repo
These SDD workflow artifacts (task briefs, agent reports, progress ledger,
review packages) were committed by accident in prior work, against the
.superpowers/sdd/.gitignore intent. Remove them from the repo; they remain
local-only scratch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(spec): graceful restore + post-failure orchestrator recreate
Fix the opaque 500 when restoring an un-resumable session (typed 409
SESSION_NOT_RESUMABLE), and add a post-failure popup that offers to recreate a
fresh orchestrator on the same branch (cleaning the worktree, preserving
committed history). Orchestrators only; recreate fires only after a restore
attempt confirms the session cannot be resumed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(plan): restore-recreate orchestrator; reuse existing /orchestrators clean=true
Planning discovery: the recreate capability already ships via POST /orchestrators
(clean=true), which kills the dead orchestrator and re-spawns on the canonical
branch (addWorktree reattaches an existing branch). So the feature collapses to a
typed-error fix plus a frontend popup. Spec updated to match.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(session): return typed SESSION_NOT_RESUMABLE instead of 500 on un-resumable restore
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(renderer): offer recreate-orchestrator popup when a session cannot be restored
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(spec): drop stale OpenAPI-regen note (feature adds no route)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(ci): gofmt/goimports, golangci-lint hygiene, and Windows-aware doctor tests
Formatting: ran gofmt and goimports (with local-prefixes) on the 8 listed
files plus ptyexec/spawn_unix.go which the linter also flagged.
Lint (25 issues fixed):
- gosec G115: EncodeMessage now returns ([]byte, error) with an explicit
bounds check before the int->uint32 conversion; all callers updated.
- govet nilness: removed dead `if lastErr == nil` branch in clientIsAlive;
lastErr is provably non-nil at that point (real bug).
- nilerr: extracted runAcceptLoop helper so Accept-error-on-close is not
flagged; listener close is normal shutdown, not a caller error.
- staticcheck SA4010: removed dead `full = append(...)` loop in host_test.
- revive var-declaration: `var prev int = -1` -> `prev := -1`.
- revive redefines-builtin-id: deleted local `min` helper; builtin covers it.
- unparam (2): dropped always-nil env return from attachCommand; dropped
unused shellPath param from buildLaunchCommand; updated callers.
- errcheck (8): deferred Close/Remove calls wrapped in func(){_ = ...}();
type assertion in host_main.go uses ok-form; fmt.Fprintf to stdout uses
_, _ = pattern; workspace.go tmpIdx.Close() uses _ =.
- gocritic nestingReduce: inverted if+continue in runtime.go resolve loop.
Windows E2E: skip TestDoctorChecksTmuxVersion,
TestDoctorChecksTmuxVersionFailsOnError, TestDoctorWarnsWhenTmuxMissing on
windows (ao doctor emits a conpty check there, not tmux).
Verified: gofmt -l . clean, golangci-lint 0 issues, go build ok,
go test -race 1624/1624 pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(ci): set git identity in worktree clone fixture; loosen tmux reattach timeouts
The preserve round-trip/conflict tests commit inside a worktree of the cloned
repo, which had no git identity; CI runners cannot auto-derive one, failing with
"empty ident name". Set user.email/user.name on the clone in setupOriginClone so
its worktrees inherit it.
The tmux reattach test drives a real shell and parses stty output, which is slow
under -race on CI; raise its echo-write and SIZE-output waits.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(terminal): resend size probe on tmux reattach until the shell answers
Bumping timeouts was the wrong fix: a 30s wait still failed, so the probe output
deterministically never appeared, not slowness. onOpen signals the stream accepts
input, not that the reattached sh -i is at a prompt, so the first echo keystroke
can be dropped. Resend the probe each poll until SIZE output lands, and on timeout
dump the captured pane buffer so a remaining failure is self-explaining.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(terminal): set TERM for real-tmux attach tests so they run in CI
Root cause (from the buffer dump the prior commit added): with TERM unset on CI
runners, tmux refuses to attach a client and prints "open terminal failed:
terminal does not support clear", so the pane never runs the size probe. The
daemon defaults TERM in production; the tests bypass it. Set TERM=xterm-256color
in both real-tmux tests. Reproduced locally with `env -u TERM` (fails the same
way) and verified the fix passes under it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(spec): crash-proof session reconcile design
Boot-time reconcile makes live tmux + worktree state match the DB on every
daemon start, so a SIGKILL/crash/force-quit that skips SaveAndTeardownAll no
longer leaks an orphaned daemon, tmux sessions, or worktrees. Adopt
crash-surviving tmux sessions, preserve-and-terminate dead ones, reap
in-namespace orphans, and add a frontend kill+replace branch for a wedged
orphan daemon.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(spec): simplify reconcile to per-session IsAlive, drop ListSessions
Every leak in the incident maps to a DB row, so orphan-reap is a per-session
IsAlive+Destroy over terminated rows; no runtime enumeration, no ports/conpty/
runtimeselect changes. Reaping a tmux session with no DB row is deferred (YAGNI).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(plan): crash-proof session reconcile implementation plan
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(session): reconcile live pass (adopt alive, stash+terminate dead)
* feat(session): reconcile reap pass and Reconcile entry point
* feat(daemon): run Reconcile on boot in place of bare RestoreAll
* test(integration): reconcile terminates dead-live sessions and reaps leaked tmux
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(integration): correct misleading CreateSession comment in reconcile test
* feat(frontend): kill+replace a wedged orphan daemon on launch
When both inspectExistingDaemon and resolveDaemonFromPort return null but
a process still holds the daemon port (a crashed/orphaned daemon), spawning
a new Go child would collide on the port and exit 1. Detect this case, SIGTERM
the holder (via the run-file PID, falling back to the probe PID), poll until the
port is free (up to 8s), clear the stale run-file, then proceed to spawn fresh.
The healthy-daemon reuse path is unchanged.
Pure helper: src/shared/daemon-takeover.ts (planDaemonTakeover)
Unit tests: src/shared/daemon-takeover.test.ts (3 tests, TDD red-green)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(frontend): fire orphan-daemon takeover when a holder actually exists
Replace planDaemonTakeover (inverted logic: ran kill block only when probe
was null) with shouldReplacePortHolder(probe, holderPidAlive) which returns
true when a real holder exists: non-null probe (rejected responder) OR a
run-file PID that is still alive (hung holder). Update main.ts call site to
compute PID liveness before gating the kill block. Update tests to cover all
three distinct outcomes non-vacuously.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs+test: accurate takeover comments, reconcileLive probe-error test, Reconcile doc
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(session): restore promptless orchestrators and crash-orphaned sessions
The orchestrator was abandoned on every app open: a fresh orchestrator
spawned each launch and the prior conversation appeared lost (it was not;
the transcript stays in ~/.claude, resumable by the deterministic
--session-id AO pins). Two defects combined:
1. Restore's guard rejected any session with no agentSessionId AND no
prompt as ErrNotResumable. But Claude resumes via a deterministic
session id regardless of those fields, so promptless orchestrators
were perfectly resumable yet always rejected. Workers slipped through
only because they carry a prompt. Move the resumability decision to the
adapter: restoreArgv returns ErrNotResumable only when GetRestoreCommand
reports it cannot resume AND there is no prompt to fresh-launch from.
2. reconcileLive marked a crash-orphaned (dead-runtime) session terminated
without a restore marker, so RestoreAll skipped it and it stayed dead.
It now saves-and-tears-down to the same end state a graceful shutdown
produces (capture work, write the session_worktrees marker, terminate,
remove the worktree), so RestoreAll relaunches it on the same boot,
resuming history. Crash recovery now matches graceful restart. If work
capture fails it terminates without a marker rather than risk losing
un-preserved work.
Tests: promptless orchestrator restores via adapter resume; promptless
session with a non-resuming adapter still returns ErrNotResumable;
reconcileLive writes the marker + tears down the worktree. Full backend
suite green (1632), gofmt/vet clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 9ae05735d6f06ac989857534bae2766392772c71)
…tWrapper#409)
PR AgentWrapper#404 migrated the runtime adapter from Zellij to tmux (Darwin/Linux) plus conpty (Windows), selected via runtimeselect, but ~30 stale zellij references lingered in comments and docs describing zellij as the current runtime. This is a comments/docs-only cleanup with no behavioral change: comments now say tmux (or tmux/conpty when both platforms are relevant), terminal/doc.go and docs/backend-code-structure.md are rewritten to reflect the tmux + conpty + runtimeselect attach model, and the daemon environment, STATUS, stack, architecture, and CLI docs are updated.
Also gitignore the local .codegraph/ and .cursor/ tooling dirs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Ports ReverbCode PR #404 (`9ae0573`) into agent-orchestrator: migrate the terminal runtime from Zellij to tmux (Darwin/Linux) and ConPTY (Windows), plus the session save/restore lifecycle and crash-proof reconcile work that shipped with it.
This was a clean cherry-pick (`-x`) onto `main` with zero conflicts; the parent commit (e970f72d, PR #403) is already in our history and the pre-state (the `zellij/` adapter) matched exactly.
Highlights
- `runtime/tmux`: implements `ports.Runtime` via the tmux CLI; drop-in replacement for zellij on Darwin/Linux.
- `runtime/conpty`: pure-Go named-pipe framing protocol + output ring buffer, pty-host serve engine over loopback TCP, sideband pty registry, and a `runtime.Runtime` adapter. New hidden `ao pty-host` subcommand.
- `runtimeselect`: picks tmux on non-Windows, conpty on Windows. `zellij/` deleted entirely.
- New shared `ptyexec` package (PTY spawn moved out of `terminal`).
- Session save/restore: `ForceDestroy`, `StashUncommitted`/`ApplyPreserved` (preserve uncommitted work to `refs/ao/preserved/<id>`), `SaveAndTeardownAll`/`RestoreAll`, wired into daemon boot/shutdown and the frontend `POST /shutdown` on quit.
- Default `TERM` for Finder-launched attach, typed `SESSION_NOT_RESUMABLE` (no more opaque 500), local-keychain macOS signing in `forge.config.ts`.
Verification
- `go build ./...` clean on darwin, linux, and windows (cross-compiled).
- `go vet ./...` clean.
- `go test -race ./...` green across the full backend suite.
Frontend changes are a verbatim carry of the merged commit (including regenerated `pnpm-lock.yaml` and `routeTree.gen.ts`); not separately rebuilt.
Upstream: aoagents/ReverbCode@9ae0573
🤖 Generated with Claude Code