Merged
68 changes: 34 additions & 34 deletions dist/main.js

Large diffs are not rendered by default.

188 changes: 188 additions & 0 deletions docs/audits/2026-03-28-session-continuity-post-ship-audit.md
@@ -0,0 +1,188 @@
---
title: "Session Continuity Post-Ship Audit"
date: 2026-03-28
scope: v0.32.0+ (PR #376)
runs_sampled: 4 Fro Bot schedule runs (Mar 24-27, 2026)
---

## Session Continuity Post-Ship Audit

## Executive Summary

The session continuity feature (PR #376, v0.32.0) is **partially functioning**. Prompt restructuring (Non-Negotiable Rules, Constraint Reminder) works correctly. Session titles are set correctly at creation. But **session continuation never succeeds** — every run creates a fresh session because OpenCode auto-renames the title after the first prompt, and the title re-set after prompt was never implemented.

**Impact**: Zero session continuity across runs. The agent cannot recall its prior work on the same entity. The "Thread Identity" section is never populated in prompts.

## Runs Sampled

| Run ID | Date | Event | Session ID | Title at Creation |
| ----------- | ---------- | -------- | ------------------- | ---------------------------- |
| 23500894440 | 2026-03-24 | schedule | ses_2df4c0cafffe... | `fro-bot: schedule-c757a308` |
| 23552690701 | 2026-03-25 | schedule | ses_2da20617dffe... | `fro-bot: schedule-c757a308` |
| 23606276260 | 2026-03-26 | schedule | ses_2d4fc0340ffe... | `fro-bot: schedule-c757a308` |
| 23655941186 | 2026-03-27 | schedule | ses_2cfed642fffe... | `fro-bot: schedule-c757a308` |

All 4 runs: different session IDs despite identical logical key `schedule-c757a308`.

## Findings

### 1. CRITICAL: Session Title Auto-Overwrite (Root Cause)

**Severity**: Critical — blocks all session continuity

OpenCode auto-renames session titles based on the first message content. The title `fro-bot: schedule-c757a308` set at creation is overwritten (likely to something like "Daily maintenance report") before the cache is saved. The next run's resolver calls `findSessionByTitle()` looking for `fro-bot: schedule-c757a308` and finds no match.

**Evidence**:

- All 4 runs log: `Session continuity: no existing session found`
- All 4 runs log: `continueSessionId: null`
- All 4 runs: `Created new OpenCode session` with the correct title
- Cache IS restored from previous run (confirmed: `Cache hit for restore-key: opencode-storage-github-fro-bot-agent-main-Linux-23606276260`)
- 10 sessions exist at resolver time, none match the title

**Root cause**: The plan specified "After successful prompt, **re-set title** via `session.update()` to guard against OpenCode's auto-title behavior" — but this was **never implemented** in `execution.ts`. There is no `session.update()` call anywhere in the codebase.

**Fix**: Add `session.update()` call after each prompt to restore the logical key title. The OpenCode PATCH endpoint (`/session/{id}`) accepts `{ title }` — confirmed in `routes/session.ts:265`.
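
To make the failure mode concrete, here is a minimal sketch of an exact-title lookup. The `Session` shape, the in-memory lists, and this body of `findSessionByTitle` are assumptions for illustration; only the function name and the title strings come from the audit.

```typescript
// Hypothetical session shape; the real OpenCode session record has more fields.
interface Session {
  id: string
  title: string
}

// Assumed behavior: the resolver matches on the exact title string.
function findSessionByTitle(sessions: readonly Session[], title: string): Session | undefined {
  return sessions.find(session => session.title === title)
}

const logicalTitle = 'fro-bot: schedule-c757a308'

// State after OpenCode auto-renames the session (the new title is illustrative).
const renamed: Session[] = [{id: 'ses_a', title: 'Daily maintenance report'}]

// State if the logical title is re-set before the cache is saved.
const reasserted: Session[] = [{id: 'ses_a', title: logicalTitle}]

const missed = findSessionByTitle(renamed, logicalTitle)
const found = findSessionByTitle(reasserted, logicalTitle)
console.log(missed?.id ?? 'no existing session found', found?.id)
```

With the renamed title the lookup misses even though the session itself was restored from cache, which matches the "no existing session found" log line in every sampled run.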

### 2. WORKING: Prompt Restructuring (Instruction Sandwich)

**Severity**: N/A — functioning correctly

The prompt shows:

- **Position 1**: `## Critical Rules (NON-NEGOTIABLE)` — 5-line hard constraints ✅
- **Last position**: `## Reminder: Critical Rules` — 1-line recency reminder ✅
- Task and Trigger Comment positioned near the top ✅
- Agent Context demoted below task content ✅

### 3. WORKING: Logical Key Computation

**Severity**: N/A — functioning correctly

The logical key `schedule-c757a308` is:

- Deterministic (same hash across all 4 runs) ✅
- Correctly derived from the schedule event ✅
- Used for session title and search fallback ✅
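
The audit does not say how the 8-character suffix is derived, so the following is only a sketch of what "deterministic" means here. It assumes the key is a hash of some stable input (a cron expression in this example) and uses FNV-1a as a stand-in hash; `scheduleLogicalKey` and `sessionTitleFor` are hypothetical names, and the real derivation may differ.

```typescript
// FNV-1a over UTF-16 code units; a stand-in hash, not necessarily the one the project uses.
function fnv1a32Hex(input: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  // Convert to unsigned before formatting so the result is always 8 hex digits.
  return (hash >>> 0).toString(16).padStart(8, '0')
}

// Hypothetical key builder: the same stable input always yields the same key.
function scheduleLogicalKey(cron: string): string {
  return `schedule-${fnv1a32Hex(cron)}`
}

// Assumed title format, mirroring the `fro-bot: <key>` titles in the runs table.
function sessionTitleFor(logicalKey: string): string {
  return `fro-bot: ${logicalKey}`
}

const keyA = scheduleLogicalKey('0 6 * * *')
const keyB = scheduleLogicalKey('0 6 * * *')
console.log(keyA === keyB, sessionTitleFor(keyA))
```

Because the key depends only on the input string, separate runs on separate days compute the same title, which is the property the session lookup relies on.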

### 4. WORKING: Observability (Artifact Upload)

**Severity**: N/A — functioning correctly

All 4 Fro Bot schedule runs produced downloadable `opencode-logs` artifacts containing:

- OpenCode server log (4,000-8,700 lines each)
- Prompt artifact file (197 lines each)
- Unique artifact names (`opencode-logs-{runId}-1`) ✅

### 5. ISSUE: Thread Identity Section Missing from Prompts

**Severity**: High — dependent on session continuity

The prompt shows NO "Thread Identity" section. This is expected given continuity never succeeds — the section is only populated when `logicalKey` and `isContinuation` are provided to the prompt builder. When `continueSessionId` is null, no thread identity is injected.

**Fix**: Will resolve automatically once session continuity works. However, consider showing thread identity even on fresh sessions (e.g., "Fresh conversation — no prior thread found for schedule-c757a308") to aid debugging.
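
The suggested change amounts to gating the section on the presence of a logical key rather than on continuation. A minimal sketch, assuming a simplified string key (the real `logicalKey` appears to be a richer object) and hypothetical helper names; the status labels are placeholders, not the prompt builder's actual wording:

```typescript
type ThreadStatus = 'continuing' | 'fresh'

// Hypothetical inputs mirroring the `logicalKey` / `isContinuation` options named above.
interface ThreadIdentityInput {
  logicalKey: string | null
  isContinuation: boolean
}

// Returns null only when there is no logical key, so fresh sessions still get a status.
function threadStatus(input: ThreadIdentityInput): ThreadStatus | null {
  if (input.logicalKey == null) return null
  return input.isContinuation ? 'continuing' : 'fresh'
}

// Renders a header plus a status label, or null when no section should be emitted.
function renderThreadIdentity(input: ThreadIdentityInput): string | null {
  const status = threadStatus(input)
  if (status == null) return null
  return ['## Thread Identity', `**Logical Thread**: \`${input.logicalKey}\``, `**Status**: ${status}`].join('\n')
}

console.log(renderThreadIdentity({logicalKey: 'schedule-c757a308', isContinuation: false}))
```

Under this gating, a fresh run still records its logical key in the prompt artifact, which is the debugging aid the recommendation asks for.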

### 6. ISSUE: Task Content Duplicated in Prompt

**Severity**: Low — token waste

The schedule task description appears verbatim in both:

- `## Task` section (lines 8-37)
- `## Trigger Comment` section (lines 38-70)

This is ~30 lines / ~400 tokens of pure duplication. For schedule events, the Task and Trigger Comment are identical because the `prompt` input IS the task.

**Fix**: In `buildAgentPrompt()`, skip the Trigger Comment section when its content is identical to the Task section.

### 7. NOISE: `tool.registry ... invalid` Entries

**Severity**: Low — cosmetic noise

Every run logs 14-24 instances of:

```
INFO service=tool.registry status=started invalid
INFO service=tool.registry status=completed duration=N invalid
```

The `invalid` here is NOT an error — it's the tool name. OpenCode's tool registry initializes a tool called `invalid` (likely a placeholder/sentinel). This is benign startup chatter from OpenCode internals.

**Recommendation**: No action needed. Document as expected noise.

### 8. NOISE: `Blocked 3 postinstalls`

**Severity**: Low — cosmetic noise

Every run logs exactly once:

```
Blocked 3 postinstalls. Run `bun pm untrusted` for details.
```

This is Bun's security policy for oMo's dependencies. Expected behavior.

**Recommendation**: No action needed. Document as expected noise.

### 9. OBSERVATION: Prior Session Context Shows Stale Sessions

**Severity**: Medium — reduced context quality

The "Prior Session Context" table shows sessions from March 3-22 (pre-v0.32.0), none of which are schedule-related maintenance sessions. The recent schedule sessions (Mar 24-26) do not appear, most likely because OpenCode renamed them and their new titles no longer match the search query.

**Fix**: Will improve automatically once session continuity works. The "Current Thread Context" section will show the actual prior work from the same schedule thread.

## Prioritized Action Items

### P0: Fix session title persistence (blocks all continuity)

Add a `session.update()` call in `execution.ts` after a successful prompt to re-set the logical key title. Without this, session continuity cannot work.

```typescript
// After successful prompt, re-set title to guard against auto-rename
if (sessionTitle != null) {
  try {
    await client.session.update({
      path: {id: sessionId},
      body: {title: sessionTitle} as Record<string, unknown>,
    })
  } catch {
    logger.warning("Failed to re-set session title", {sessionId, sessionTitle})
  }
}
```
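
The snippet above re-sets the title only after a successful prompt. The `execution.ts` change in this PR instead calls a `reassertSessionTitle` helper from a `finally` block, so the title is restored even when the prompt fails. The helper's body is not shown in this diff, so the following is a hypothetical sketch of that pattern. It assumes the `{path, body}` payload shape asserted in the tests and uses minimal stand-in types for the client and logger.

```typescript
// Minimal structural types; the real SDK client and logger are richer.
interface SessionClient {
  session: {
    update(args: {path: {id: string}; body: {title: string}}): Promise<unknown>
  }
}

interface Logger {
  debug(message: string, meta?: Record<string, unknown>): void
}

// Best-effort: never throws, so a failed title update cannot mask the prompt result.
async function reassertTitle(
  client: SessionClient,
  sessionId: string,
  title: string | null | undefined,
  logger: Logger,
): Promise<boolean> {
  if (title == null) return false
  try {
    await client.session.update({path: {id: sessionId}, body: {title}})
    return true
  } catch (error) {
    logger.debug('Best-effort session title re-assertion failed', {sessionId, error: String(error)})
    return false
  }
}

// Runs `send` and re-asserts the title afterwards, whether `send` resolves or rejects.
async function withTitleReassert<T>(send: () => Promise<T>, reassert: () => Promise<unknown>): Promise<T> {
  try {
    return await send()
  } finally {
    await reassert()
  }
}
```

Keeping the update non-throwing matters in the `finally` position: an exception raised there would replace the original prompt error or result.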

### P1: Show Thread Identity on fresh sessions too

Currently Thread Identity only shows when `isContinuation` is true. Show it on fresh starts too so the logical key is visible in the prompt artifact for debugging:

```
## Thread Identity
**Logical Thread**: `schedule-c757a308` (schedule)
**Status**: Fresh conversation — no prior thread found for this entity.
```

### P2: Deduplicate Task / Trigger Comment for schedule events

Skip the Trigger Comment section when it's identical to the Task section. Saves ~400 tokens per schedule run.

### P3: Add debug logging for session title state

Log the titles of sessions returned by `listSessionsForProject()` during resolution so we can see what titles the sessions actually have (vs. what we're searching for).
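
A minimal sketch of what such a debug payload could look like. `describeTitleLookup` and the `SessionSummary` shape are hypothetical, and the example titles are illustrative; in practice the input would be whatever `listSessionsForProject()` returns.

```typescript
// Hypothetical summary shape; only `title` is needed for this diagnostic.
interface SessionSummary {
  id: string
  title: string
}

// Builds a log payload contrasting the searched title with the titles that actually exist.
function describeTitleLookup(sessions: readonly SessionSummary[], wantedTitle: string) {
  const titles = sessions.map(session => session.title)
  return {
    wantedTitle,
    sessionCount: sessions.length,
    matched: titles.includes(wantedTitle),
    titles,
  }
}

const payload = describeTitleLookup(
  [
    {id: 'ses_a', title: 'Daily maintenance report'},
    {id: 'ses_b', title: 'Fix flaky test'},
  ],
  'fro-bot: schedule-c757a308',
)
console.log('Session continuity: title lookup', JSON.stringify(payload))
```

Logging the searched title next to the existing ones would have made the auto-rename visible in the first post-ship run.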

## Metrics

| Metric | Value |
| --------------------------------------- | ------------------------------------------------------------ |
| Runs sampled | 4 |
| Session continuity success rate | **0%** (0/4 runs) |
| Prompt restructuring working | **Yes** (Non-Negotiable Rules + Constraint Reminder present) |
| Artifact upload working | **Yes** (all 4 runs have artifacts) |
| Logical key computation working | **Yes** (deterministic `schedule-c757a308` across all runs) |
| OpenCode errors in logs | **0** |
| OpenCode warnings in logs | **0** |
| tool.registry "invalid" entries per run | 14-24 (benign noise) |
| Run duration (range) | ~4-10 minutes |
| OpenCode log size (range) | 4,000-8,700 lines |
| Prompt size | 197 lines (~2,500 words) |
23 changes: 9 additions & 14 deletions src/features/agent/execution.ts
@@ -7,6 +7,7 @@ import * as crypto from 'node:crypto'
  import * as fs from 'node:fs/promises'
  import * as path from 'node:path'
  import {createOpencode} from '@opencode-ai/sdk'
+ import {reassertSessionTitle} from '../../services/session/title-reassert.js'
  import {sleep} from '../../shared/async.js'
  import {DEFAULT_AGENT, DEFAULT_TIMEOUT_MS} from '../../shared/constants.js'
  import {getGitHubWorkspace, getOpenCodeLogPath, isOpenCodePromptArtifactEnabled} from '../../shared/env.js'
@@ -116,23 +117,17 @@ export async function executeOpenCode(

  const prompt = attempt === 1 ? initialPrompt : CONTINUATION_PROMPT
  const files = attempt === 1 ? promptOptions.fileParts : undefined
- const result = await sendPromptToSession(client, sessionId, prompt, files, directory, config, logger)
+ const result = await (async () => {
+   try {
+     return await sendPromptToSession(client, sessionId, prompt, files, directory, config, logger)
+   } finally {
+     await reassertSessionTitle(client, sessionId, config?.sessionTitle, logger)
+   }
+ })()

  if (result.success) {
    final = result.eventStreamResult

-   // Best-effort title re-assertion: OpenCode may auto-overwrite session titles
-   // based on first message content. Re-set to preserve deterministic lookup.
-   if (config?.sessionTitle != null) {
-     try {
-       await (client.session as unknown as {update: (args: Record<string, unknown>) => Promise<unknown>}).update({
-         sessionID: sessionId,
-         title: config.sessionTitle,
-       })
-     } catch {
-       logger.debug('Best-effort session title re-assertion failed', {sessionId})
-     }
-   }

    return {
      success: true,
      exitCode: 0,
65 changes: 65 additions & 0 deletions src/features/agent/opencode.test.ts
@@ -110,6 +110,7 @@ function createMockClient(options: {
  create: options.throwOnCreate
    ? vi.fn().mockRejectedValue(new Error('Session creation failed'))
    : vi.fn().mockResolvedValue({data: {id: 'ses_123', title: 'Test', version: '1'}}),
+ update: vi.fn().mockResolvedValue({data: {id: 'ses_123', title: 'Test', version: '1'}}),
  promptAsync: options.throwOnPrompt
    ? vi.fn().mockRejectedValue(new Error('Prompt failed'))
    : vi.fn().mockResolvedValue({data: options.promptResponse}),
@@ -444,6 +445,70 @@ describe('executeOpenCode', () => {
    expect(result.duration).toBeGreaterThanOrEqual(0)
  })

+ it('re-asserts session title with SDK update payload after prompt attempts', async () => {
+   // #given
+   const mockClient = createMockClient({
+     promptResponse: {parts: [{type: 'text', text: 'Agent response'}]},
+   })
+   const mockOpencode = createMockOpencode({client: mockClient})
+   vi.mocked(createOpencode).mockResolvedValue(mockOpencode as unknown as Awaited<ReturnType<typeof createOpencode>>)
+   const config: ExecutionConfig = {
+     agent: 'sisyphus',
+     model: null,
+     timeoutMs: 1800000,
+     omoProviders: {
+       claude: 'no',
+       copilot: 'no',
+       gemini: 'no',
+       openai: 'no',
+       opencodeZen: 'no',
+       zaiCodingPlan: 'no',
+       kimiForCoding: 'no',
+     },
+     sessionTitle: 'fro-bot: schedule-c757a308',
+   }
+
+   // #when
+   await executeOpenCode(createMockPromptOptions(), mockLogger, config)
+
+   // #then
+   expect(mockClient.session.update).toHaveBeenCalledWith({
+     path: {id: 'ses_123'},
+     body: {title: 'fro-bot: schedule-c757a308'},
+   })
+ })
+
+ it('re-asserts session title even when prompt attempt fails', async () => {
+   // #given
+   const mockClient = createMockClient({throwOnPrompt: true})
+   const mockOpencode = createMockOpencode({client: mockClient})
+   vi.mocked(createOpencode).mockResolvedValue(mockOpencode as unknown as Awaited<ReturnType<typeof createOpencode>>)
+   const config: ExecutionConfig = {
+     agent: 'sisyphus',
+     model: null,
+     timeoutMs: 1800000,
+     omoProviders: {
+       claude: 'no',
+       copilot: 'no',
+       gemini: 'no',
+       openai: 'no',
+       opencodeZen: 'no',
+       zaiCodingPlan: 'no',
+       kimiForCoding: 'no',
+     },
+     sessionTitle: 'fro-bot: schedule-c757a308',
+   }
+
+   // #when
+   await executeOpenCode(createMockPromptOptions(), mockLogger, config)
+
+   // #then
+   expect(mockClient.session.update).toHaveBeenCalledWith({
+     path: {id: 'ses_123'},
+     body: {title: 'fro-bot: schedule-c757a308'},
+   })
+ })
+
  it('returns failure result when prompt fails', async () => {
    // #given
    const mockClient = createMockClient({throwOnPrompt: true})
50 changes: 50 additions & 0 deletions src/features/agent/prompt.test.ts
@@ -136,6 +136,25 @@ describe('buildAgentPrompt', () => {
    expect(prompt).toContain('**Status**: Continuing previous conversation thread.')
  })

+ it('shows fresh thread identity status when logical key exists without continuation', () => {
+   // #given
+   const options: PromptOptions = {
+     context: createMockContext(),
+     customPrompt: null,
+     cacheStatus: 'hit',
+     logicalKey: createMockLogicalKey(),
+     isContinuation: false,
+   }
+
+   // #when
+   const prompt = buildAgentPrompt(options, mockLogger)
+
+   // #then
+   expect(prompt).toContain('## Thread Identity')
+   expect(prompt).toContain('**Logical Thread**: `pr-42` (pr #42)')
+   expect(prompt).toContain('**Status**: Fresh conversation — no prior thread found for this entity.')
+ })
+
  it('places current thread context above environment and historical context for continuation runs', () => {
    // #given
    const sessionContext: SessionContext = {
@@ -570,6 +589,37 @@ describe('buildAgentPrompt', () => {
    expect(prompt).toContain('## GitHub Operations (Use gh CLI)')
  })

+ it('skips Trigger Comment section when schedule prompt text matches trigger comment', () => {
+   // #given
+   const duplicatedTask = 'Run daily maintenance tasks and update the report issue.'
+   const context = createMockContext({
+     eventName: 'schedule',
+     issueNumber: null,
+     issueTitle: null,
+     issueType: null,
+     commentBody: duplicatedTask,
+   })
+   const triggerContext = createMockTriggerContext({
+     eventType: 'schedule',
+     commentBody: duplicatedTask,
+     target: undefined,
+   })
+   const options: PromptOptions = {
+     context,
+     customPrompt: ` ${duplicatedTask} `,
+     cacheStatus: 'hit',
+     triggerContext,
+   }
+
+   // #when
+   const prompt = buildAgentPrompt(options, mockLogger)
+
+   // #then
+   expect(prompt).toContain('## Task')
+   expect(prompt).toContain(duplicatedTask)
+   expect(prompt).not.toContain('## Trigger Comment')
+ })
+
  describe('session context', () => {
    it('includes session context section when sessionContext is provided', () => {
      // #given
11 changes: 10 additions & 1 deletion src/features/agent/prompt.ts
@@ -156,7 +156,16 @@ Respond to the trigger comment above. Follow all instructions and requirements l
 `)
  }

- if (context.commentBody != null) {
+ const trimmedCustomPrompt = customPrompt?.trim() ?? null
+ const trimmedCommentBody = context.commentBody?.trim() ?? null
+ const triggerCommentDuplicatesTask =
+   trimmedCustomPrompt != null &&
+   trimmedCustomPrompt.length > 0 &&
+   trimmedCommentBody != null &&
+   trimmedCommentBody.length > 0 &&
+   trimmedCustomPrompt === trimmedCommentBody
+
+ if (context.commentBody != null && !triggerCommentDuplicatesTask) {
    parts.push(`## Trigger Comment
 **Author:** ${context.commentAuthor ?? 'unknown'}

3 changes: 3 additions & 0 deletions src/harness/phases/execute.ts
@@ -43,6 +43,9 @@ export async function runExecute(
      recentSessions: sessionPrep.recentSessions,
      priorWorkContext: sessionPrep.priorWorkContext,
    },
+   logicalKey: sessionPrep.logicalKey ?? null,
+   isContinuation: sessionPrep.isContinuation,
+   currentThreadSessionId: sessionPrep.continueSessionId ?? null,
    triggerContext: routing.triggerResult.context,
    fileParts: sessionPrep.attachmentResult?.fileParts,
  }