Skip to content

Comments

feat: implement state persistence#177

Open
35C4n0r wants to merge 30 commits intomainfrom
35C4n0r/agentapi-state-persistence
Open

feat: implement state persistence#177
35C4n0r wants to merge 30 commits intomainfrom
35C4n0r/agentapi-state-persistence

Conversation

@35C4n0r
Copy link
Collaborator

@35C4n0r 35C4n0r commented Jan 31, 2026

Closes: coder/internal#1256

MergeAfter: #172

@35C4n0r 35C4n0r self-assigned this Feb 1, 2026
@35C4n0r 35C4n0r marked this pull request as ready for review February 1, 2026 15:52
@35C4n0r 35C4n0r marked this pull request as draft February 1, 2026 15:52
@35C4n0r 35C4n0r changed the base branch from main to cj/refactor-conversation-orig February 3, 2026 08:51
@35C4n0r 35C4n0r marked this pull request as ready for review February 3, 2026 08:54
@35C4n0r 35C4n0r marked this pull request as draft February 3, 2026 08:54
@35C4n0r 35C4n0r closed this Feb 3, 2026
@35C4n0r 35C4n0r reopened this Feb 3, 2026
@github-actions
Copy link

github-actions bot commented Feb 3, 2026

✅ Preview binaries are ready!

To test with modules: agentapi_version = "agentapi_177" or download from: https://github.com/coder/agentapi/releases/tag/agentapi_177

@mafredri mafredri self-requested a review February 3, 2026 13:09
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work so far! I left a few comments and suggestions. Mainly about moving some concerns from httpapi to cmd/server. I think it would be helpful to have some tests based on real agent session restoration output to evaluate the screentracker changes.

func (s *Server) HandleSignals(ctx context.Context, process *termexec.Process) {
// Handle shutdown signals (SIGTERM, SIGINT only on Windows)
shutdownCh := make(chan os.Signal, 1)
signal.Notify(shutdownCh, os.Interrupt, syscall.SIGTERM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this compile on Windows? IIRC we can only support os.Interrupt there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compiles - yes (since PR Preview Build / Build Release Binaries (pull_request) passes), but haven't tested it.

@mafredri mafredri requested a review from SasSwart February 10, 2026 13:38
# Conflicts:
#	cmd/server/server.go
#	lib/httpapi/server.go
#	lib/screentracker/conversation.go
#	lib/screentracker/pty_conversation.go
#	lib/screentracker/pty_conversation_test.go
# Conflicts:
#	lib/httpapi/server.go
#	lib/screentracker/pty_conversation.go
@35C4n0r 35C4n0r marked this pull request as ready for review February 17, 2026 09:43
@35C4n0r 35C4n0r requested a review from mafredri February 17, 2026 10:48
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage seems good, but I would really like to see real testdata created for e.g. Claude where all output from the initial conversation, and then again for the restoration part. And testing that AgentAPI does in fact handle this correctly.

I'm not sure how feasible it is, but ideally it'd be nice to capture everything, including control characters, that Claude outputs. Think asciinema recording.

Also, I think we need to really consider everything that can affect the AI agent output. For instance, given the nature of adjustScreenAfterStateLoad, what if --term-height and --term-width have been adjusted between invocations? If the state is being restored, we should probably do so much earlier (in runServer) and print warnings that these options are being overridden by the state restoration and forcing the previous options. Wdyt?

// Store the first stable snapshot for filtering later
snapshots := c.snapshotBuffer.GetAll()
if len(snapshots) > 0 {
c.firstStableSnapshot = c.cfg.FormatMessage(strings.TrimSpace(snapshots[len(snapshots)-1].screen), "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we need a FormatMessage nil check here?

Also, what if FormatMessage function has changed between session save and restore (different AgentAPI versions). Potential issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what if FormatMessage function has changed between session save and restore (different AgentAPI versions). Potential issue?

This doesn't pose an issue, c.firstStableSnapshot is not stored in the state, it is used to strip the conversation history loaded by the agent in case of --continue/--resume so that we can load our saved state messages.

@mafredri mafredri requested a review from johnstcn February 18, 2026 09:45
@35C4n0r
Copy link
Collaborator Author

35C4n0r commented Feb 19, 2026

Also, I think we need to really consider everything that can affect the AI agent output.

Agreed

For instance, given the nature of adjustScreenAfterStateLoad, what if --term-height and --term-width have been adjusted between invocations? If the state is being restored, we should probably do so much earlier (in runServer) and print warnings that these options are being overridden by the state restoration and forcing the previous options. Wdyt?

imo, this is not an issue; nothing in adjustScreenAfterStateLoad will be affected by height/width. Its only role rn is to strip the conversation history loaded by the agent when using --continue/--resume.

@mafredri
Copy link
Member

imo, this is not an issue; nothing in adjustScreenAfterStateLoad will be affected by height/width. Its only role rn is to strip the conversation history loaded by the agent when using --continue/--resume.

So we don't need to think about how a terminal size affects line-splitting/ASCII formatting?


if c.initialPromptReady && !c.loadStateSuccessful && c.cfg.StatePersistenceConfig.LoadState {
_ = c.loadState()
c.loadStateSuccessful = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outright failure will cause the session to be essentially "bricked" until you fix, rename, or move the state file. We should definitely surface the error, but complete failure is not ideal.

@35C4n0r
Copy link
Collaborator Author

35C4n0r commented Feb 19, 2026

@johnstcn @mafredri Now the user will see an error like this:
image
Agentapi will work as usual; for the previous corrupted state, everything will become AgentMessage (i.e. previous states would lose the differentiation b/w agent and user message).

Ref:

@35C4n0r 35C4n0r requested review from johnstcn and mafredri February 20, 2026 07:44
Copy link
Member

Deep Code Review

Overall risk: Medium. The state file format decisions are one-way doors, and there are a few logic bugs that could produce incorrect behavior in edge cases.

The architecture is sound and the reviewer feedback from mafredri and johnstcn has been largely addressed (PID file moved to cmd/server, signal handlers split per-platform, messagesLocked extracted, file permissions tightened, json.NewDecoder adopted). Several issues remain.


P2-1: ConversationMessage lacks JSON struct tags — fragile state file format

File: lib/screentracker/conversation.go:86-91

Without json tags, encoding/json uses PascalCase ("Id", "Message", etc.). The HTTP API's MessageUpdateBody uses lowercase (json:"id", json:"message"). This creates an inconsistency between the state file and the API.

More critically, this is a one-way door: once state files are written with PascalCase keys, adding json:"id" tags later (to align with API conventions) silently breaks deserialization of existing state files. Add explicit json tags now before any state files are created in production.


P2-2: No version validation on state load

File: lib/screentracker/pty_conversation.go:641-649

AgentState.Version is written as 1 during save but never checked during load. If the format changes in a future version, loading an incompatible state file would silently misinterpret data. Add a check:

if agentState.Version != 1 {
    return xerrors.Errorf("unsupported state file version %d (expected 1)", agentState.Version)
}

P2-3: initialPromptSent logic silently drops prompt for same-prompt restore with unsent state

File: lib/screentracker/pty_conversation.go:655-658

c.initialPromptSent = agentState.InitialPromptSent  // line 655
if len(c.cfg.InitialPrompt) > 0 {
    isDifferent := buildStringFromMessageParts(c.cfg.InitialPrompt) != agentState.InitialPrompt
    c.initialPromptSent = !isDifferent  // line 658: unconditionally overwrites line 655
}

When the same prompt is provided and the saved state has InitialPromptSent: false (saved during init before prompt delivery), line 658 forces initialPromptSent = true, silently preventing the prompt from ever being sent. The correct logic:

if isDifferent {
    c.initialPromptSent = false
} else {
    c.initialPromptSent = agentState.InitialPromptSent  // preserve saved status
}

P2-4: adjustScreenAfterStateLoad echoes user message as agent response

File: lib/screentracker/pty_conversation.go:692-693

if !c.userSentMessageAfterLoadState && len(c.messages) > 0 {
    newScreen = "\n" + c.messages[len(c.messages)-1].Message
}

This doesn't check the role of the last message. If the saved state ends with a user message (user sent a message right before shutdown, agent hadn't responded yet), the user's text becomes the agent message via updateLastAgentMessageLocked at line 302. Fix:

if !c.userSentMessageAfterLoadState && len(c.messages) > 0 &&
    c.messages[len(c.messages)-1].Role == ConversationRoleAgent {
    newScreen = "\n" + c.messages[len(c.messages)-1].Message
}

P2-5: EmitError uses untyped string for level, inconsistent with codebase enum pattern

Files: lib/screentracker/conversation.go:83, lib/httpapi/events.go:56-60

Every other categorical field in the API (ConversationStatus, ConversationRole, AgentStatus, EventType) uses typed string enums with OpenAPI schema registration. ErrorBody.Level is just string, weakening the API contract. Define type ErrorLevel string with ErrorLevelWarning / ErrorLevelError constants and a Schema() method to match the established pattern.


P3-1: Empty restored prompt can trigger sending empty content to agent

File: lib/screentracker/pty_conversation.go:659-664

When saved state has InitialPrompt: "" and no new prompt is provided, a []MessagePart{MessagePartText{Content: ""}} is created. At line 219, len(c.cfg.InitialPrompt) > 0 is true (1 element), so this empty message is enqueued and a carriage return is sent to the agent. Guard with:

if agentState.InitialPrompt != "" {
    c.cfg.InitialPrompt = []MessagePart{MessagePartText{Content: agentState.InitialPrompt}}
}

P3-2: errors slice in EventEmitter grows unboundedly

File: lib/httpapi/events.go:76,215-216,239-245

Every EmitError call appends to e.errors, which is replayed to all new SSE subscribers. Combined with the frontend's duration: Infinity toasts, reconnecting clients see all historical errors as new toast notifications. Cap the slice or don't replay stale errors to new subscribers.


P3-3: ErrorBody uses time.Now() instead of quartz clock

File: lib/httpapi/events.go:212

The codebase consistently uses quartz.Clock for testability. EmitError calls time.Now() directly. Add a quartz.Clock to EventEmitter (via WithClock option, matching the existing option pattern).


P3-4: ctx variable shadowing in NewServer disconnects conversation from caller context

File: lib/httpapi/server.go:280

ctx, cancel := context.WithCancel(context.Background()) shadows the function parameter ctx. NewPTY at line 259 receives the original ctx, but conversation.Start at line 308 receives the shadowed ctx. Rename to shutdownCtx, shutdownCancel for clarity.


P3-5: PID file has no stale-PID detection on startup

File: cmd/server/server.go:140-143,271-287

writePIDFile() silently overwrites any existing PID file without checking if the PID references a running process. After a crash (which skips defer cleanupPIDFile), scripts relying on the PID file could target the wrong process. Consider checking if the old PID is alive on startup and logging a warning.


P3-6: Restored prompt loses terminal formatting (hidden parts)

File: lib/screentracker/pty_conversation.go:576,660-664

The saved InitialPrompt is built from buildStringFromMessageParts, which calls .String() on each part. For MessagePartText with Hidden: true, .String() returns "". So the saved prompt loses hidden escape sequences (bracketed paste mode, echo prevention). On restore, the prompt is sent as plain text without terminal formatting. This only matters in the rare case where state is saved before the initial prompt was sent, but it's worth documenting as a known limitation.


P4-1: SaveState() holds lock during all filesystem I/O

File: lib/screentracker/pty_conversation.go:553-612

Acknowledged by previous reviewer as acceptable for now. The lock scope could be narrowed: copy data under lock, release, then do I/O.

P4-2: Windows syscall.SIGTERM is effectively dead code

File: cmd/server/signals_windows.go:20

Only os.Interrupt is meaningful on Windows. syscall.SIGTERM is harmless but misleading.


Questions

  1. State file format stability: Is this the right time to nail down the wire format (add JSON tags to ConversationMessage, validate version)? Or is the format expected to change frequently during development?

  2. adjustScreenAfterStateLoad approach: The strings.Replace + "return last saved message" strategy is fragile. Has real-agent testing (actual Claude/Aider output, not echo) validated this approach works reliably?

  3. Empty initial prompt edge case: Is there a valid scenario where InitialPrompt is empty in the saved state? If so, should the restore path be a no-op?


Test Coverage Observations

  • adjustScreenAfterStateLoad() has no direct unit tests. It's only exercised indirectly, and with assert.Contains rather than assert.Equal, which masks bugs in screen adjustment.
  • EmitError / EventTypeError has zero unit test coverage (no tests for broadcasting, replay to new subscribers, or accumulation).
  • The testEmitter in unit tests silently discards errors via EmitError(_ string, _ string) {}, so error emissions from loadStateLocked are never verified.
  • E2E tests use time.Sleep for synchronization (lines 142, 158, 206, 225, 272), which is fragile on slow CI. Polling for expected state would be more robust.
  • Runtime CLI validation (--load-state requires --state-file) is not tested because runServer calls os.Exit(1).

Suggested Validation Plan

go test ./...
go test -race ./lib/screentracker/... ./lib/httpapi/... ./cmd/server/...

Manual:

  1. Start with --state-file /tmp/test-state.json, send messages, send SIGUSR1, verify state file
  2. Restart with same flags, verify conversation restored
  3. Restart with different --initial-prompt, verify new prompt sent
  4. Kill with SIGKILL, verify previous save is intact
  5. Inspect state file JSON: check field casing
  6. Connect second SSE client after state load failure, verify error received

Comment on lines +239 to +244
// Stop the HTTP server
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := srv.Stop(shutdownCtx); err != nil {
logger.Error("Failed to stop HTTP server", "error", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider moving this into case <-gracefulCtx.Done(): above? I'm guessing we don't have to call srv.Stop if we receive on serverErrCh.

Does stop error if the server already closed? If yes, we'll end up printing a misleading error here.

EDIT: I looked at Stop and the once does guard against multiple Stops, but not against Stop producing an error if the server closed before?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We return early if we receive anything(non nil) on serverErrCh

	select {
	case err := <-serverErrCh:
		if err != nil {
			return xerrors.Errorf("failed to start server: %w", err)

I looked at Stop and the once does guard against multiple Stops, but not against Stop producing an error if the server closed before?

True, I had an alternative in mind, but I decided to proceed with once here for the above-mentioned reason. I'll add a check in Stop
I'll check for this ErrServerClosed, and return nil in that case.

From docs
Once Shutdown has been called on a server, it may not be reused; future calls to methods such as Serve will return ErrServerClosed.

}

// Close the process
if err := process.Close(logger, 5*time.Second); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the result of Close if the process already exited? Should we check processExitCh first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to the select-case block below.

	select {
	case err := <-processExitCh:
		if err != nil {
			return xerrors.Errorf("agent exited with error: %w", err)
		}
	default:
		// Close the process
		if err := process.Close(logger, 5*time.Second); err != nil {
			logger.Error("Failed to close process cleanly", "error", err)
		}
	}


// Enqueue initial prompt once after agent is ready (and after state is potentially loaded)
if c.initialPromptReady && len(c.cfg.InitialPrompt) > 0 && !c.initialPromptSent {
c.outboundQueue <- outboundMessage{parts: c.cfg.InitialPrompt, errCh: nil}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this play nicely with the logic around userSentMessageAfterLoadState? If possible, I'd like to see that state flag removed as it adds a bit of confusion about how the whole pipeline interacts.

The scenario I have in mind: Resume with a new "initial prompt", what happens? Will the new prompt be sent? Will it bypass userSentMessageAfterLoadStat?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resume with a new "initial prompt", what happens? Will the new prompt be sent? Will it bypass userSentMessageAfterLoadStat

  • Yes, it will be sent (given that it's not the same as the initialPrompt in the stateFile).
  • No, it won't bypass userSentMessageAfterLoadState. The userSentMessageAfterLoadState is set in sendMessage, and sendMessage is called for everything that comes in via outboundQueue channel (this is done in a go func in Start).

s.logger.Error("Failed to send event", "subscriberId", subscriberId, "error", err)
return
}
case <-s.shutdownCtx.Done():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential future enhancement, if we set the http server base context to shutdownCtx, we only need to handle the request context as it will inherit from the base context. (Needs verification if there's a benefit here though.)

@35C4n0r 35C4n0r requested a review from mafredri February 20, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AgentAPI: State persistence

3 participants