diff --git a/CHANGELOG.md b/CHANGELOG.md index 88e3db5c..9f68194f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,94 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Enhanced + +- **Dashboard segment-level progress indicators (TP-197, #464):** Multi-segment + task rows now show a horizontal pill row of per-segment status badges — + one pill per segment with a status icon (✅ succeeded · ⏳ running · ⬚ + pending · ❌ failed · ⏸ stalled · ↷ skipped) plus the segment’s repo + ID. The currently-executing segment is visually emphasized. This closes + the operator-visibility gap introduced by TP-145’s `.DONE` suppression for + non-final segments: previously, multi-segment lanes sat “running” with + no segment-level signal during the suppression window, which made wave 2+ + batches where all tasks were mid-segment appear stuck. With the pill row + in place, operators can see at a glance which segments have finished, + which is running, and which remain. The progress bar itself is unchanged + — TP-174 already made it segment-scoped via the V2 lane snapshot’s + per-segment counts; the new pill row provides the missing context that + makes the existing bar legible as “current segment’s progress.” + + Backwards-compatibility: single-segment tasks render an empty pill row + (auto-collapsed grid sub-row), so the DOM and visual layout for + non-segmented batches are identical to before. The pill row lives in a + new grid row 3 of `.task-row` (cols 3–7), mirroring the + `task-title-subtitle` pattern from TP-485, and is intentionally placed + *outside* the `.task-step` cell so the existing `@media (max-width: 900px)` + rule that hides `.task-step` does not hide segment context on narrow + viewports. No `dashboard/server.cjs` change was required — the existing + API response already exposed `batch.segments[]`, `task.segmentIds`, and + `runtimeLaneSnapshots[*].segmentId`. + +### Fixed + +- **Multi-segment engine hardening (TP-196, #462 + #502 + #503 + #508):** + closes four follow-up issues from the multi-repo task execution rollout + with a single coherent hardening pass against the multi-segment engine. + + - **`.DONE` authority guards (#462)** — three defense-in-depth checks now + refuse to honor a stale or premature `.DONE` in multi-segment tasks: + (a) `resolveTaskMonitorState` (`execution.ts`) accepts an optional + `multiSegmentContext: { isFinalSegment, segmentId }` parameter; when + `isFinalSegment === false` and `.DONE` is present, Priority 1 is + skipped and a WARN is logged via `execLog`; `monitorLanes` populates + this context from `task.segmentIds` + `task.activeSegmentId`. (b) + `collectDoneTaskIdsForResume` (`resume.ts`) now refuses to add a + taskId to the done set when persisted segment records exist AND any + segment is not `succeeded`/`skipped` — the task re-reconciles instead + of silently being marked complete. (c) A new exported + `checkDoneAuthoritySafeguard` helper (`discovery.ts`) emits a + doctor-style `console.warn` when `.DONE` coexists with unchecked + STATUS.md checkboxes during area scans. The pre-existing TP-135 + "keeps .DONE authoritative even when segment frontier is incomplete" + test was updated to assert the inverted (post-#462) contract. + + - **SegmentScopeMode unification (#502 + #503)** — promotes the + FULL_TASK / SEGMENT_SCOPED decision to a first-class + `SegmentScopeMode = "FULL_TASK" | "SEGMENT_SCOPED"` type in `types.ts` + plus a `computeSegmentScopeMode(stepSegmentMap, repoStepNumbers, + currentRepoId, currentStepNumber)` helper in `lane-runner.ts`. The + iteration loop now derives both the authoritative `segmentScopeMode` + and the legacy `isSegmentScoped` boolean alias from one call, and + the segment-prompt injection block is gated on `isSegmentScoped` + instead of the previous scattered `stepSegmentMap && currentRepoId + && repoStepNumbers && remainingSteps.length > 0` composite. New + behavioural regression suite + (`extensions/tests/segment-scope-mode-prompt.test.ts`, 9 tests + across 4 describe blocks) mocks `spawnAgent` to capture the worker + prompt + env + system prompt and verifies the FULL_TASK, + SEGMENT_SCOPED, polyrepo single-segment, and legacy/partial-marker + contracts end-to-end. + + - **Wasted-iteration elimination (#508)** — lane-runner now performs + an explicit pre-spawn segment-completion check between the existing + `remainingSteps.length === 0` guard and the `totalIterations++` + increment, delegating to a new pure helper + `shouldSkipSpawnForCompleteSegment(statusContent, repoStepNumbers, + currentRepoId)`. When every segment-scoped step for the active repo + is already complete, the loop logs `"Pre-spawn segment-completion + check"` and breaks before incurring a worker spawn. Behavioural + test (`extensions/tests/early-exit-segment-spawn-skip.test.ts`) + mocks `agent-host.spawnAgent` via `mock.module` and asserts + `spawnAgentCallCount === 0` for a fixture worktree whose checkboxes + are pre-checked. + + - **Validation:** typecheck / lint / format:check all exit 0. Fast + test suite passes at 3678 / 0 fail / 1 skip — net +51 new tests + spread across 3 new test files plus targeted updates to + `segment-scoped-lane-runner.test.ts`, `resume-segment-frontier.test.ts`, + and `engine-runtime-v2-routing.test.ts` (slice-window widening for + the longer `resolveTaskMonitorState` body). + ## [0.30.0] - 2026-05-10 ### Fixed diff --git a/dashboard/public/app.js b/dashboard/public/app.js index 18db48c1..dd45d12d 100644 --- a/dashboard/public/app.js +++ b/dashboard/public/app.js @@ -382,6 +382,53 @@ function taskSegmentProgress(task, segmentStatusMap, forcedActiveSegmentId) { }; } +// TP-197 (#464): Render a horizontal pill row of per-segment status badges for a +// multi-segment task. Each pill shows an icon + repoId for one segment. The icon +// reflects the segment's status (succeeded / running / pending / failed / stalled / +// skipped). The current segment (the one actively executing on its lane) gets an +// emphasis class. Returns "" for single-segment tasks so the rendered DOM is +// byte-identical to today for the non-segmented common case (no regression). +// +// Consumes: +// - task.segmentIds: string[] (ordered, from PersistedTaskRecord) +// - segmentStatusMap: Map built by +// buildSegmentStatusMap() from batch.segments[] +// - activeSegmentId: string|null — current executing segment (from V2 lane +// snapshot's segmentId, or the task's activeSegmentId field) +function taskSegmentPillRow(task, segmentStatusMap, activeSegmentId) { + const segmentIds = Array.isArray(task?.segmentIds) + ? task.segmentIds.filter(id => typeof id === "string") + : []; + if (segmentIds.length <= 1) return ""; + + // Status -> { icon, className } table. Keep emoji simple/monospace-friendly. + // ✅ succeeded, ⏳ running, ⬚ pending, ❌ failed, ⏸ stalled, ↷ skipped. + const styles = { + succeeded: { icon: "\u2705", cls: "seg-succeeded" }, + running: { icon: "\u23F3", cls: "seg-running" }, + pending: { icon: "\u2B1A", cls: "seg-pending" }, + failed: { icon: "\u274C", cls: "seg-failed" }, + stalled: { icon: "\u23F8", cls: "seg-stalled" }, + skipped: { icon: "\u21B7", cls: "seg-skipped" }, + }; + + const pills = segmentIds.map((segId) => { + const status = segmentStatusMap.get(segId) || "pending"; + const style = styles[status] || styles.pending; + const parsed = parseSegmentId(segId); + const repoLabel = parsed?.repoId || segId; + const isCurrent = activeSegmentId && segId === activeSegmentId; + const currentCls = isCurrent ? " seg-pill-current" : ""; + const title = `${segId} \u00b7 ${status}`; + return `` + + `${style.icon}` + + `${escapeHtml(repoLabel)}` + + ``; + }).join(""); + + return `
${pills}
`; +} + function laneActiveSegmentInfo(v2snap, laneTasks, segmentStatusMap) { if (!v2snap || !v2snap.segmentId) return null; const parsed = parseSegmentId(v2snap.segmentId); @@ -620,7 +667,13 @@ function renderSummary(batch) { // their assigned lane: tasks on the same lane render with `→` (serial), // tasks on different lanes render with ` | ` (parallel). Tooltip shows // the expanded lane breakdown. - const { compact, tooltip } = formatWaveLaneBreakdown(taskIds, batch.lanes || [], i + 1); + // TP-197 post-merge fold: pass `batch.tasks` as the task→lane source. + // The previous arg `batch.lanes` only carries live Runtime V2 lane + // state for the *currently active* wave — past/future wave chips + // would fall back to comma-separated. `batch.tasks[].laneNumber` is + // persisted for the entire batch lifecycle, so all waves render with + // the correct parallelization separator regardless of active state. + const { compact, tooltip } = formatWaveLaneBreakdown(taskIds, batch.lanes || [], batch.tasks || [], i + 1); const titleAttr = tooltip ? ` title="${escapeHtml(tooltip)}"` : ""; wavesHtml += `W${i + 1} [${compact}]`; }); @@ -648,16 +701,46 @@ function renderSummary(batch) { * are shown with the previous flat formatting and no tooltip is generated * — this preserves backward compatibility with future-wave display. */ -function formatWaveLaneBreakdown(taskIds, lanes, waveNumber) { +function formatWaveLaneBreakdown(taskIds, lanes, tasks, waveNumber) { if (!Array.isArray(taskIds) || taskIds.length === 0) { return { compact: "", tooltip: "" }; } - // Build taskId → laneNumber map for the lanes that have any of these tasks. + // Build taskId → laneNumber map. Prefer the persisted-per-task + // `tasks[i].laneNumber` (covers all waves, lifecycle-stable). Fall back + // to live `lanes[]` only when tasks data is missing or doesn't carry + // laneNumber for a given task. + // + // TP-197 post-merge fold: the previous implementation read ONLY from + // `lanes`, which is Runtime V2 live state and only populated for the + // currently active wave. That caused inactive waves' chips to fall back + // to comma-separated display (no parallelization indicator), giving the + // impression that the separator changed as the batch progressed. Using + // the persisted `tasks[].laneNumber` makes the indicator stable across + // all waves regardless of active state. const taskToLane = new Map(); + if (Array.isArray(tasks)) { + for (const t of tasks) { + // Persistence assigns `laneNumber: 0` as a sentinel meaning + // "unallocated" (see persistence.ts:1378 — `lane?.laneNumber ?? + // outcome?.laneNumber ?? 0`). Real lane numbers start at 1. We must + // skip 0 here so future-wave tasks (which all have the 0 sentinel + // until their wave starts) don't get falsely grouped under a fake + // "lane 0" and rendered as serial. + if ( + t && + t.taskId && + typeof t.laneNumber === "number" && + t.laneNumber >= 1 && + !taskToLane.has(t.taskId) + ) { + taskToLane.set(t.taskId, t.laneNumber); + } + } + } + // Fallback: anything `tasks` didn't cover, try `lanes` (live state). for (const lane of lanes) { if (!lane || !Array.isArray(lane.taskIds)) continue; for (const tid of lane.taskIds) { - // First lane to claim a task wins (lanes shouldn't overlap, but be defensive). if (!taskToLane.has(tid)) taskToLane.set(tid, lane.laneNumber); } } @@ -859,8 +942,24 @@ function renderLanesTasks(batch, sessions) { stepHtml = `${escapeHtml(task.exitReason || "—")}`; } + // TP-197 (#464): Compute the per-segment pill row for multi-segment tasks. + // Returns "" for single-segment tasks (no DOM regression for the common case). + // For multi-segment tasks we render the pill row in the task-row's grid row 3 + // (via .task-segment-row CSS) and suppress the inline "Segment N/T: repo" text + // in detailBits to avoid duplicating signal — the pill row already shows the + // current segment (via seg-pill-current) and total count (via pill count). + const segmentPillRowHtml = taskSegmentPillRow( + task, + segmentStatusMap, + v2snap && v2snap.taskId === task.taskId ? v2snap.segmentId : (segmentInfo?.segmentId || null), + ); + const hasSegmentPillRow = segmentPillRowHtml !== ""; + const detailBits = []; - if (segmentInfo) { + if (segmentInfo && !hasSegmentPillRow) { + // Single-segment + non-segmented tasks: existing inline text (unchanged). + // Multi-segment tasks: suppressed because the new pill row carries the same + // information more legibly. detailBits.push(`${escapeHtml(segmentProgressText(segmentInfo))}`); } if (showPacketHome) { @@ -950,8 +1049,17 @@ function renderLanesTasks(batch, sessions) { const titleHtml = task.taskTitle ? `
${escapeHtml(task.taskTitle)}
` : ""; + // TP-197 (#464): segmentPillRowHtml is empty for single-segment tasks so + // the rendered DOM is byte-identical to today for non-segmented tasks. + // For multi-segment tasks it renders as grid-row 3 of .task-row. + // Sage post-merge fold: the .has-segments class opts the .task-row + // grid into a 3-row template only when we actually have a pill row; + // otherwise the default 2-row template preserves single-segment task + // spacing exactly (an unconditional 3-row template would add an 8px + // row-gap even when row 3 is empty, breaking the no-regression contract). + const taskRowClass = hasSegmentPillRow ? "task-row has-segments" : "task-row"; html += ` -
+
${eyeHtml} ${escapeHtml(task.taskId)}${showRepos ? repoBadgeHtml(tRepo, "repo-badge-task") : ""} @@ -960,6 +1068,7 @@ function renderLanesTasks(batch, sessions) { ${progressHtml} ${stepHtml}${workerHtml} ${titleHtml} + ${segmentPillRowHtml}
`; html += reviewerRowHtml; } @@ -1054,7 +1163,15 @@ function renderMergeAgents(batch, sessions) { } let html = ''; - html += ''; + // TP-197 post-merge fold: removed 'Session ID' and 'Details' columns. + // SESSION ID was hardcoded to '—' in every row — dead weight. + // DETAILS only populated for `mr.failureReason` (rare failure cases); + // for the common all-merges-succeeded case it's always '—' too. + // When a real failure happens, the operator sees status='failed' in + // the Status column and can dig into engine logs for the reason — + // we'll re-add a focused DETAILS column if/when we have meaningful + // structured failure-reason data to surface in the dashboard table. + html += ''; html += ''; // Track sessions shown in wave result rows so we don't duplicate them below @@ -1135,10 +1252,6 @@ function renderMergeAgents(batch, sessions) { html += ``; // Full telemetry cell html += ``; - html += ``; - html += ``; html += ``; // Per-repo sub-rows: show when workspace mode has repo results @@ -1159,8 +1272,6 @@ function renderMergeAgents(batch, sessions) { html += ``; html += ``; html += ``; /* telemetry placeholder */ - html += ``; /* attach placeholder */ - html += ``; html += ``; } } @@ -1177,8 +1288,6 @@ function renderMergeAgents(batch, sessions) { html += ``; // Full telemetry cell for active merge session html += ``; - html += ``; - html += ``; html += ``; } diff --git a/dashboard/public/style.css b/dashboard/public/style.css index 2b2603c1..60855c08 100644 --- a/dashboard/public/style.css +++ b/dashboard/public/style.css @@ -608,7 +608,13 @@ body { display: grid; grid-template-columns: 36px 24px 100px 90px 80px 200px 1fr; /* #485 (revised): row 1 holds the primary cells; row 2 (auto, collapses to - * 0 when empty) holds the optional task-title-subtitle spanning cols 3–6. */ + * 0 when empty) holds the optional task-title-subtitle spanning cols 3–6. + * Default: 2 rows. TP-197 (#464) adds the optional per-segment pill row in + * grid-row 3 ONLY when the .has-segments class is set (multi-segment tasks). + * Non-segmented tasks keep the pre-TP-197 2-row layout exactly — 'auto' + * row tracks don't fully collapse with row-gap declared, so adding a third + * row track unconditionally would introduce an 8px visible gap for every + * single-segment task (sage post-merge fold). */ grid-template-rows: auto auto; align-items: center; gap: 8px 8px; @@ -617,6 +623,13 @@ body { transition: background 0.15s; } +/* TP-197 (#464): multi-segment tasks opt-in to a 3-row grid for the segment- + * pill row. JS adds .has-segments to .task-row only when the pill row is + * non-empty (see app.js taskSegmentPillRow + task-row template). */ +.task-row.has-segments { + grid-template-rows: auto auto auto; +} + .task-row:last-child { border-bottom: none; } .task-row:hover { background: var(--bg-surface-hover); } @@ -652,6 +665,71 @@ body { margin-top: -2px; } +/* ─── TP-197 (#464): Per-segment pill row for multi-segment tasks ─────── */ + +.task-segment-row { + /* Sub-row beneath the primary task row, mirroring the title-subtitle pattern + * (TP-485). Spans cols 3 → 7 so it shares the title area's horizontal space. + * Placed at grid row 3 so it sits *below* the optional title-subtitle. + * Container collapses to 0 height when not rendered (single-segment tasks), + * so non-segmented tasks render with identical row height to today. + * Lives OUTSIDE .task-step intentionally so the @media (max-width: 900px) + * rule that hides .task-step does NOT hide the pill row — keeps segment + * context visible at narrow viewports. */ + grid-column: 3 / 7; + grid-row: 3; + display: flex; + flex-wrap: wrap; + align-items: center; + gap: 4px; + margin-top: 2px; + min-width: 0; +} + +.seg-pill { + display: inline-flex; + align-items: center; + gap: 3px; + font-family: var(--font-mono); + font-size: 0.68rem; + font-weight: 500; + line-height: 1.4; + padding: 1px 7px; + border-radius: 8px; + border: 1px solid transparent; + max-width: 140px; + white-space: nowrap; +} + +.seg-pill .seg-pill-icon { + font-size: 0.72rem; + line-height: 1; + flex-shrink: 0; +} + +.seg-pill .seg-pill-label { + overflow: hidden; + text-overflow: ellipsis; + min-width: 0; +} + +/* Status variants — reuse the existing status-badge color tokens for + * consistency with .status-badge.status-{succeeded,running,failed,…}. */ +.seg-pill.seg-succeeded { background: var(--badge-succeeded-bg); color: var(--green); } +.seg-pill.seg-running { background: var(--badge-running-bg); color: var(--accent); } +.seg-pill.seg-pending { background: var(--bg-surface); color: var(--text-muted); border-color: var(--border-default, var(--border-subtle)); } +.seg-pill.seg-failed { background: var(--badge-failed-bg); color: var(--red); } +.seg-pill.seg-stalled { background: var(--badge-failed-bg); color: var(--yellow, var(--red)); opacity: 0.85; } +.seg-pill.seg-skipped { background: var(--bg-surface); color: var(--text-muted); opacity: 0.7; } + +/* Current-segment emphasis: brighter border + slight weight bump so operator + * can spot "we're here right now" at a glance independent of icon. */ +.seg-pill.seg-pill-current { + border-color: var(--accent); + font-weight: 600; + box-shadow: 0 0 0 1px var(--accent-dim, transparent); +} + .task-duration { font-family: var(--font-mono); font-size: 0.8rem; @@ -783,7 +861,10 @@ body { font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; - color: var(--text-faint); + /* TP-197 post-merge fold: bumped --text-faint → --text-muted for + * readability. Matches other dashboard section headers (lines 224, 248, + * 458 of this file already use --text-muted for the same role). */ + color: var(--text-muted); border-bottom: 1px solid var(--border); background: var(--bg-surface); } diff --git a/docs/tutorials/use-the-dashboard.md b/docs/tutorials/use-the-dashboard.md index b6287372..3e448ee4 100644 --- a/docs/tutorials/use-the-dashboard.md +++ b/docs/tutorials/use-the-dashboard.md @@ -101,6 +101,12 @@ dashboard automatically shows repo-aware features: - **Repo badges** appear on lanes and tasks, showing which repo each belongs to - **Repo filter dropdown** lets you focus on a single repository - **Merge outcomes** are grouped per repo, showing individual branch/status details +- **Per-segment progress pills** appear under each multi-segment task row, + showing one pill per segment with a status icon (✅ succeeded · ⏳ running · + ⬚ pending · ❌ failed · ⏸ stalled · ↷ skipped) and the segment’s repo ID. + The currently-executing segment is highlighted so you can see which segment + the lane is working on right now. Single-segment tasks render no pill row, + so non-segmented batches look identical to before. These features activate when the batch is in workspace mode and involves 2+ distinct repositories. For single-repo batches, the dashboard looks and diff --git a/extensions/taskplane/discovery.ts b/extensions/taskplane/discovery.ts index eab1301f..87f13b48 100644 --- a/extensions/taskplane/discovery.ts +++ b/extensions/taskplane/discovery.ts @@ -809,6 +809,41 @@ export function parsePromptForOrchestrator( // ── Area Scanning ──────────────────────────────────────────────────── +/** + * TP-196 / #462 — Discovery safeguard for `.DONE` authority drift. + * + * Discovery has no access to persisted segment state, so it cannot make a + * hard `.DONE` vs. segment-frontier authority decision (that lives in the + * monitor/resume guards). What it CAN do cheaply is detect the most common + * symptom of a stale or premature `.DONE`: a `.DONE` file exists alongside + * a STATUS.md that still has unchecked checkboxes. When that pattern is + * found, emit a one-line `console.warn` so operators see the inconsistency + * during scan. Behaviour of `scanAreaForTasks` is unchanged — the task is + * still skipped — this is a doctor-style warning only. + * + * Returns `true` when the safeguard issued a warning (used by tests). + */ +export function checkDoneAuthoritySafeguard( + taskFolder: string, + logger: (msg: string) => void = console.warn, +): boolean { + const statusPath = join(taskFolder, "STATUS.md"); + if (!existsSync(statusPath)) return false; + let content: string; + try { + content = readFileSync(statusPath, "utf-8"); + } catch { + return false; + } + // Look for any unchecked checkbox `- [ ]` on its own line. + const hasUnchecked = /^\s*-\s*\[\s\]\s+/m.test(content); + if (!hasUnchecked) return false; + logger( + `[discovery] WARN: .DONE present in ${taskFolder} but STATUS.md contains unchecked checkboxes — possible stale/premature .DONE (#462 safeguard).`, + ); + return true; +} + /** * Scan an area path for pending tasks. * @@ -858,8 +893,14 @@ export function scanAreaForTasks( continue; } - // Skip if .DONE exists (already complete) - if (existsSync(join(entryPath, ".DONE"))) continue; + // Skip if .DONE exists (already complete). + // TP-196 / #462: doctor-style safeguard — if .DONE coexists with + // unchecked checkboxes in STATUS.md, warn so operators can investigate + // before the task is silently treated as complete. + if (existsSync(join(entryPath, ".DONE"))) { + checkDoneAuthoritySafeguard(entryPath); + continue; + } // Skip if no PROMPT.md const promptPath = join(entryPath, "PROMPT.md"); diff --git a/extensions/taskplane/execution.ts b/extensions/taskplane/execution.ts index c1fe0707..bf305515 100644 --- a/extensions/taskplane/execution.ts +++ b/extensions/taskplane/execution.ts @@ -885,6 +885,13 @@ async function parseStatusMdContent( * @param tracker - Mtime tracker for stall detection * @param stallTimeoutMs - Stall timeout in milliseconds * @param now - Current timestamp (epoch ms) for deterministic testing + * @param multiSegmentContext - Optional segment-authority context (TP-196 / #462). + * When provided AND `isFinalSegment === false`, + * `.DONE` is treated as a non-authoritative signal + * (Priority 1 is skipped). This guards against a + * stale or premature `.DONE` from a non-final + * segment short-circuiting the task to succeeded + * before the remaining segments have run. */ export async function resolveTaskMonitorState( taskId: string, @@ -896,6 +903,7 @@ export async function resolveTaskMonitorState( now: number, runtimeBackend?: RuntimeBackend, v2Context?: { stateRoot: string; batchId: string; laneNumber: number }, + multiSegmentContext?: { isFinalSegment: boolean; segmentId: string }, ): Promise { // TP-115/TP-127: Backend-aware liveness check. // V2: read the lane snapshot file written by lane-runner every second. @@ -1035,7 +1043,27 @@ export async function resolveTaskMonitorState( } // ── Priority 1: .DONE file found → succeeded ──────────────── - if (doneFileFound) { + // TP-196 / #462: Monitor guard for multi-segment tasks. When the caller + // has provided a segment-authority context AND tells us the active segment + // is NOT the final segment in the task plan, `.DONE` MUST NOT be accepted + // as authoritative — a non-final segment's worker should never have + // produced one. We log a WARN and fall through to the lower priorities + // (which keep the task in a non-terminal state so the engine can recover). + const doneAcceptedAsAuthority = + doneFileFound && !(multiSegmentContext && multiSegmentContext.isFinalSegment === false); + if (doneFileFound && !doneAcceptedAsAuthority) { + execLog( + "monitor", + taskId, + `WARN: .DONE present for non-final segment '${multiSegmentContext?.segmentId}' — ignoring (#462 guard)`, + { + session: sessionName, + segmentId: multiSegmentContext?.segmentId, + donePath, + }, + ); + } + if (doneAcceptedAsAuthority) { return { taskId, status: "succeeded", @@ -1315,6 +1343,19 @@ export async function monitorLanes( const statusPath = unit.packet.statusPath; const statusResult = await parseStatusMdAtPath(statusPath); + // TP-196 / #462: Build multi-segment authority context so + // `.DONE` from a non-final segment is not accepted as terminal. + const taskSegmentIds = task.task.segmentIds ?? []; + const taskActiveSegmentId = task.task.activeSegmentId ?? null; + let multiSegmentContext: { isFinalSegment: boolean; segmentId: string } | undefined; + if (taskSegmentIds.length > 1 && taskActiveSegmentId) { + const finalSegmentId = taskSegmentIds[taskSegmentIds.length - 1]; + multiSegmentContext = { + isFinalSegment: taskActiveSegmentId === finalSegmentId, + segmentId: taskActiveSegmentId, + }; + } + const snapshot = await resolveTaskMonitorState( task.taskId, donePath, @@ -1331,6 +1372,7 @@ export async function monitorLanes( laneNumber: lane.laneNumber, } : undefined, + multiSegmentContext, ); currentTaskSnapshot = snapshot; diff --git a/extensions/taskplane/lane-runner.ts b/extensions/taskplane/lane-runner.ts index 551cd52f..a031a2cb 100644 --- a/extensions/taskplane/lane-runner.ts +++ b/extensions/taskplane/lane-runner.ts @@ -68,6 +68,7 @@ import { type LaneTaskStatus, type SupervisorAlertCallback, type StepSegmentMapping, + type SegmentScopeMode, } from "./types.ts"; const LANE_RUNNER_DIR = dirname(fileURLToPath(import.meta.url)); @@ -178,6 +179,75 @@ export function isSegmentComplete( return result.unchecked === 0; } +/** + * Compute the authoritative `SegmentScopeMode` for one worker iteration. + * + * This is the single source of truth for the FULL_TASK vs SEGMENT_SCOPED + * decision (TP-196 / #502). All segment-related side-effects (env vars, + * system-prompt overlay, prompt content, tool registration) should derive + * their behaviour from this mode rather than re-evaluating the underlying + * boolean conditions in isolation, which is what created the drift risk + * documented in #502. + * + * Returns `SEGMENT_SCOPED` iff ALL of the following hold: + * - The task has a non-empty `stepSegmentMap` (parsed from PROMPT.md markers). + * - The lane has an associated `currentRepoId` (segmentId set, so we know + * which repo this lane is iterating). + * - The (legacy-fallback-filtered) `repoStepNumbers` set is non-null (the + * repo has at least one step with explicit segment markers). + * - A `currentStepNumber` is provided (there is a step to evaluate). + * - The current step's segment mapping contains an entry for `currentRepoId` + * (the worker actually has segment-scoped work in the current step). + * + * In any other case the mode is `FULL_TASK`. + * + * @since TP-196 + */ +export function computeSegmentScopeMode( + stepSegmentMap: StepSegmentMapping[] | undefined | null, + repoStepNumbers: Set | null, + currentRepoId: string | null, + currentStepNumber: number | null, +): SegmentScopeMode { + if (!stepSegmentMap || !currentRepoId || !repoStepNumbers) return "FULL_TASK"; + if (currentStepNumber === null) return "FULL_TASK"; + const currentStepMapping = stepSegmentMap.find((s) => s.stepNumber === currentStepNumber); + if (!currentStepMapping) return "FULL_TASK"; + const mySegment = currentStepMapping.segments.find((seg) => seg.repoId === currentRepoId); + return mySegment ? "SEGMENT_SCOPED" : "FULL_TASK"; +} + +/** + * Pre-spawn segment-completion check (TP-196 / #508). + * + * Returns `true` when the lane-runner iteration loop should SKIP spawning + * a worker because all of the segment's checkboxes for this repo are + * already complete. The lane should `break` out of its iteration loop and + * fall through to post-loop completion handling. + * + * Contract: + * - Returns `false` for FULL_TASK iterations (`currentRepoId === null` or + * `repoStepNumbers === null` or empty). Those rely on the existing + * `remainingSteps.length === 0` exit, not this check. + * - Returns `true` iff EVERY step in `repoStepNumbers` is + * `isSegmentComplete(statusContent, stepNum, currentRepoId)`. + * + * Pure function: no filesystem access, no global state. The caller reads + * the STATUS.md content once per iteration and passes it in. + * + * @since TP-196 + */ +export function shouldSkipSpawnForCompleteSegment( + statusContent: string, + repoStepNumbers: Set | null, + currentRepoId: string | null, +): boolean { + if (!repoStepNumbers || !currentRepoId || repoStepNumbers.size === 0) return false; + return [...repoStepNumbers].every((stepNum) => + isSegmentComplete(statusContent, stepNum, currentRepoId), + ); +} + // ── Types ──────────────────────────────────────────────────────────── /** @@ -418,6 +488,27 @@ export async function executeTaskV2( if (remainingSteps.length === 0) break; // All done + // TP-196 / #508: Pre-spawn segment-completion check. + // + // When the lane is iterating a segment-scoped task, verify that NOT ALL + // `repoStepNumbers` are segment-complete before incurring the cost of + // spawning a worker. The `remainingSteps` filter above already enforces + // this implicitly (via `isSegmentComplete`), but expressing the check + // explicitly at the spawn boundary: + // 1. Makes the wasted-iteration prevention contract visible. + // 2. Provides a defensive backstop for cases where `parsed.steps` and + // `repoStepNumbers` diverge (e.g., legacy/partial-marker tasks). + // 3. Gives behavioural tests a clean assertion target (via the pure + // helper `shouldSkipSpawnForCompleteSegment`). + if (shouldSkipSpawnForCompleteSegment(iterStatusContent, repoStepNumbers, currentRepoId)) { + logExecution( + statusPath, + "Pre-spawn segment-completion check", + `all segment checkboxes already complete for repo '${currentRepoId}' — skipping worker spawn (#508)`, + ); + break; + } + totalIterations++; updateStatusField( statusPath, @@ -454,16 +545,16 @@ export async function executeTaskV2( /* ignore */ } - // TP-174/TP-501: Compute segment scope mode BEFORE building prompt. - const isSegmentScoped = !!( - stepSegmentMap && - currentRepoId && - repoStepNumbers && - remainingSteps.length > 0 && - stepSegmentMap - .find((s) => s.stepNumber === remainingSteps[0].number) - ?.segments.find((seg) => seg.repoId === currentRepoId) + // TP-174/TP-501/TP-196: Compute segment scope mode BEFORE building prompt. + // `segmentScopeMode` is the authoritative TP-196 flag; `isSegmentScoped` is + // preserved as a boolean alias for ergonomics at the many existing call sites. + const segmentScopeMode: SegmentScopeMode = computeSegmentScopeMode( + stepSegmentMap, + repoStepNumbers, + currentRepoId, + remainingSteps.length > 0 ? remainingSteps[0].number : null, ); + const isSegmentScoped = segmentScopeMode === "SEGMENT_SCOPED"; const promptLines = [ `Read your task instructions at: ${promptPath}`, @@ -513,16 +604,27 @@ export async function executeTaskV2( // Segment scope mode is determined by which system prompt was loaded. // No SegmentScopeMode line needed — the prompt IS the mode. - // TP-174: Segment-scoped prompt — show only this segment's checkboxes - if (stepSegmentMap && currentRepoId && repoStepNumbers && remainingSteps.length > 0) { + // TP-174/TP-196: Segment-scoped prompt — show only this segment's checkboxes. + // Gated on the authoritative `isSegmentScoped` (derived from `segmentScopeMode`) + // rather than the raw composite condition, so the prompt branch can't drift + // from the mode decision (TP-196 / #502). + if (isSegmentScoped) { const currentStepNum = remainingSteps[0].number; - const currentStepMapping = stepSegmentMap.find((s) => s.stepNumber === currentStepNum); + // Defensive guards: when `isSegmentScoped === true`, `computeSegmentScopeMode` + // has already verified `stepSegmentMap`, `currentRepoId`, and that the + // current step's mapping contains an entry for the active repo. We re-fetch + // the structures here for clarity. If any are missing we log and skip the + // segment block (defense-in-depth — should never trip in practice). + const currentStepMapping = stepSegmentMap?.find((s) => s.stepNumber === currentStepNum); const mySegment = currentStepMapping?.segments.find((seg) => seg.repoId === currentRepoId); - // Only inject segment-scoped prompt when the current step has an explicit - // segment for this repoId. If mySegment is missing (legacy task without - // markers, or step has no work for this repo), skip and preserve legacy behavior. - if (currentStepMapping && mySegment) { + if (!currentStepMapping || !mySegment) { + logExecution( + statusPath, + "WARN", + `segmentScopeMode === SEGMENT_SCOPED but current step mapping missing — skipping segment prompt block (currentRepoId=${currentRepoId}, stepNum=${currentStepNum})`, + ); + } else { const otherSegments = currentStepMapping.segments.filter((seg) => seg.repoId !== currentRepoId); // Count total segments for this repo across all steps diff --git a/extensions/taskplane/resume.ts b/extensions/taskplane/resume.ts index 747f45e0..1d1406e0 100644 --- a/extensions/taskplane/resume.ts +++ b/extensions/taskplane/resume.ts @@ -295,11 +295,40 @@ export function collectAllRepoRoots( // ── Resume Pure Functions ──────────────────────────────────────────── +/** + * Determine whether a multi-segment task's persisted segment frontier is + * complete — i.e., every segment for the task reached a terminal-success + * status ("succeeded" or "skipped"). + * + * Returns: + * - `true` when the task has segments AND all of them are terminal-success. + * - `true` when the task has no segments recorded (single-segment / legacy + * tasks — the guard does not apply and `.DONE` is authoritative). + * - `false` when at least one segment is pending/running/failed/stalled. + * + * Used by `collectDoneTaskIdsForResume` (TP-196 / #462) to refuse a stale or + * premature `.DONE` from suppressing re-execution of remaining segments. + */ +function isSegmentFrontierCompleteForResume( + persistedState: PersistedBatchState, + taskId: string, +): boolean { + const segments = (persistedState.segments ?? []).filter((s) => s.taskId === taskId); + if (segments.length === 0) return true; // No segments recorded — guard does not apply. + return segments.every((s) => s.status === "succeeded" || s.status === "skipped"); +} + /** * Collect task IDs with authoritative .DONE markers. * - * Segment frontier state does not suppress .DONE authority. If a marker exists, - * resume reconciliation will mark the task complete regardless of segment state. + * Segment frontier state does not suppress .DONE authority for tasks WITHOUT + * persisted segment records (single-segment / legacy). For tasks WITH segment + * records (multi-segment), TP-196 / #462 adds a resume guard: when `.DONE` + * exists but the segment frontier is incomplete (at least one segment is not + * yet succeeded/skipped), we DO NOT add the taskId to the done set — the + * task will be re-reconciled instead of silently marked complete. A WARN is + * logged so operators can spot the inconsistency. The on-disk `.DONE` marker + * is left alone; the engine will re-establish authoritative state. */ export function collectDoneTaskIdsForResume( persistedState: PersistedBatchState, @@ -308,22 +337,38 @@ export function collectDoneTaskIdsForResume( ): Set { const doneTaskIds = new Set(); for (const task of persistedState.tasks) { + let markerFound = false; + let markerLocation: string | null = null; if (task.taskFolder && hasTaskDoneMarker(task.taskFolder)) { - doneTaskIds.add(task.taskId); - continue; + markerFound = true; + markerLocation = task.taskFolder; + } + if (!markerFound) { + const laneRec = persistedState.lanes.find((l) => l.taskIds.includes(task.taskId)); + if (laneRec?.worktreePath && task.taskFolder) { + const resolved = resolveCanonicalTaskPaths( + task.taskFolder, + laneRec.worktreePath, + repoRoot, + !!workspaceConfig, + ); + if (existsSync(resolved.donePath)) { + markerFound = true; + markerLocation = resolved.donePath; + } + } } - const laneRec = persistedState.lanes.find((l) => l.taskIds.includes(task.taskId)); - if (laneRec?.worktreePath && task.taskFolder) { - const resolved = resolveCanonicalTaskPaths( - task.taskFolder, - laneRec.worktreePath, - repoRoot, - !!workspaceConfig, + if (!markerFound) continue; + + // TP-196 / #462: Resume guard — refuse `.DONE` authority for multi-segment + // tasks with an incomplete segment frontier. + if (!isSegmentFrontierCompleteForResume(persistedState, task.taskId)) { + console.warn( + `[resume] WARN: .DONE present for task ${task.taskId} at ${markerLocation} but segment frontier is incomplete — not marking complete (#462 guard). Task will re-reconcile.`, ); - if (existsSync(resolved.donePath)) { - doneTaskIds.add(task.taskId); - } + continue; } + doneTaskIds.add(task.taskId); } return doneTaskIds; } diff --git a/extensions/taskplane/types.ts b/extensions/taskplane/types.ts index 7518961b..cc92a04b 100644 --- a/extensions/taskplane/types.ts +++ b/extensions/taskplane/types.ts @@ -180,6 +180,29 @@ export function buildExpansionRequestId(timestamp = Date.now()): string { // ── Step-Segment Mapping (Phase A: segment-scoped worker visibility) ──── +/** + * Authoritative segment-scope mode for a single worker iteration. + * + * - `FULL_TASK`: the worker sees the entire PROMPT.md, all steps, all checkboxes. + * No `Active segment ID` / `Your checkboxes for this step` prose is injected. + * Segment-related environment variables (`TASKPLANE_ACTIVE_SEGMENT_ID`, + * `TASKPLANE_SEGMENT_ID`) are hard-cleared so that runtime tools keyed on + * them (e.g., `request_segment_expansion`) cannot accidentally register. + * + * - `SEGMENT_SCOPED`: the worker is iterating a specific segment of a + * multi-segment task. Only that segment's steps and checkboxes are shown; + * `Active segment ID` is announced; segment-related env vars carry the + * active `segmentId`; the segment-overlay system prompt is appended. + * + * This is the single authoritative flag for the segment-scope decision + * (TP-196 / #502). Call sites should derive their behaviour from this mode + * rather than re-evaluating the underlying boolean conditions, which prevents + * the multiple branches drifting out of sync. + * + * @since TP-196 + */ +export type SegmentScopeMode = "FULL_TASK" | "SEGMENT_SCOPED"; + /** A group of checkboxes scoped to a single repo within a step. */ export interface SegmentCheckboxGroup { repoId: string; diff --git a/extensions/tests/dashboard-segment-pill-row.test.ts b/extensions/tests/dashboard-segment-pill-row.test.ts new file mode 100644 index 00000000..1ea2e4c9 --- /dev/null +++ b/extensions/tests/dashboard-segment-pill-row.test.ts @@ -0,0 +1,274 @@ +/** + * TP-197 (#464) — Dashboard segment-level progress indicators + * + * Verifies the `taskSegmentPillRow` renderer helper in `dashboard/public/app.js`. + * + * The dashboard frontend is a vanilla-JS browser script (no ESM exports), so + * we test the helper by extracting its source from `app.js` and evaluating it + * in an isolated sandbox with a minimal `escapeHtml` polyfill. This catches + * regressions in the segment pill rendering without spinning up a browser. + * + * Background: TP-145 introduced `.DONE` suppression for non-final segments, + * which left a visibility gap on the dashboard during multi-segment task + * execution. TP-197 closes that gap by rendering a per-segment status pill + * row. The helper MUST return an empty string for single-segment tasks so + * the rendered DOM for non-segmented tasks is byte-identical to today. + */ + +import { describe, it } from "node:test"; +import { readFileSync } from "node:fs"; +import { dirname, resolve } from "node:path"; +import { fileURLToPath } from "node:url"; +import { expect } from "./expect.ts"; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const APP_JS = resolve(__dirname, "../../dashboard/public/app.js"); + +function extractFn(source: string, name: string): string { + const needle = `function ${name}`; + const start = source.indexOf(needle); + if (start < 0) throw new Error(`fn ${name} not found in app.js`); + const braceStart = source.indexOf("{", start); + if (braceStart < 0) throw new Error(`no opening brace for ${name}`); + let depth = 1; + let i = braceStart + 1; + while (i < source.length && depth > 0) { + const ch = source[i]; + if (ch === "{") depth++; + else if (ch === "}") depth--; + i++; + } + if (depth !== 0) throw new Error(`unbalanced braces for ${name}`); + return source.slice(start, i); +} + +// Minimal escapeHtml polyfill matching app.js's DOM-based version semantically. +function escapeHtml(s: unknown): string { + const map: Record = { + "&": "&", + "<": "<", + ">": ">", + '"': """, + "'": "'", + }; + return String(s ?? "").replace(/[&<>"']/g, (c) => map[c] || c); +} + +interface Helpers { + taskSegmentPillRow: ( + task: + | { taskId?: string; segmentIds?: string[]; activeSegmentId?: string | null } + | null + | undefined, + segmentStatusMap: Map, + activeSegmentId: string | null, + ) => string; + parseSegmentId: (id: string) => { taskId: string; repoId: string } | null; +} + +function loadHelpers(): Helpers { + const src = readFileSync(APP_JS, "utf8"); + const sandbox = [ + extractFn(src, "parseSegmentId"), + extractFn(src, "segmentProgressText"), + extractFn(src, "taskSegmentPillRow"), + "return { parseSegmentId, segmentProgressText, taskSegmentPillRow };", + ].join("\n"); + // biome-ignore lint/security/noGlobalEval: Test-only sandbox for browser-script extraction. + return new Function("escapeHtml", sandbox)(escapeHtml) as Helpers; +} + +describe("TP-197: taskSegmentPillRow renderer", () => { + it("renders a pill row for a 3-segment task with the running segment marked current", () => { + const helpers = loadHelpers(); + const segMap = new Map([ + ["TP-X::shared-libs", "succeeded"], + ["TP-X::web-client", "running"], + ["TP-X::admin", "pending"], + ]); + const out = helpers.taskSegmentPillRow( + { taskId: "TP-X", segmentIds: ["TP-X::shared-libs", "TP-X::web-client", "TP-X::admin"] }, + segMap, + "TP-X::web-client", + ); + expect(out.startsWith('
')).toBe(true); + expect(out.endsWith("
")).toBe(true); + expect((out.match(/class="seg-pill /g) || []).length).toBe(3); + expect((out.match(/seg-pill-current/g) || []).length).toBe(1); + expect(out).toContain("seg-succeeded"); + expect(out).toContain("seg-running seg-pill-current"); + expect(out).toContain("seg-pending"); + expect(out).toContain(">shared-libs<"); + expect(out).toContain(">web-client<"); + expect(out).toContain(">admin<"); + }); + + it("returns empty string for single-segment tasks (no regression for non-segmented tasks)", () => { + const helpers = loadHelpers(); + const out = helpers.taskSegmentPillRow( + { taskId: "TP-Y", segmentIds: ["TP-Y::default"] }, + new Map([["TP-Y::default", "running"]]), + "TP-Y::default", + ); + expect(out).toBe(""); + }); + + it("returns empty string for tasks with no segmentIds", () => { + const helpers = loadHelpers(); + expect(helpers.taskSegmentPillRow({ taskId: "TP-Z" }, new Map(), null)).toBe(""); + expect(helpers.taskSegmentPillRow({ taskId: "TP-Z", segmentIds: [] }, new Map(), null)).toBe(""); + }); + + it("returns empty string for null/undefined task input", () => { + const helpers = loadHelpers(); + expect(helpers.taskSegmentPillRow(null, new Map(), null)).toBe(""); + expect(helpers.taskSegmentPillRow(undefined, new Map(), null)).toBe(""); + }); + + it("emits no current-segment emphasis when activeSegmentId is null", () => { + const helpers = loadHelpers(); + const segMap = new Map([ + ["TP-W::a", "succeeded"], + ["TP-W::b", "pending"], + ]); + const out = helpers.taskSegmentPillRow( + { taskId: "TP-W", segmentIds: ["TP-W::a", "TP-W::b"] }, + segMap, + null, + ); + expect(out.includes("seg-pill-current")).toBe(false); + expect((out.match(/class="seg-pill /g) || []).length).toBe(2); + }); + + it("renders failed / skipped / stalled status pills with correct classes", () => { + const helpers = loadHelpers(); + const segMap = new Map([ + ["TP-Q::a", "failed"], + ["TP-Q::b", "skipped"], + ["TP-Q::c", "stalled"], + ]); + const out = helpers.taskSegmentPillRow( + { taskId: "TP-Q", segmentIds: ["TP-Q::a", "TP-Q::b", "TP-Q::c"] }, + segMap, + null, + ); + expect(out).toContain("seg-failed"); + expect(out).toContain("seg-skipped"); + expect(out).toContain("seg-stalled"); + }); + + it("falls back to pending styling for unknown or missing statuses", () => { + const helpers = loadHelpers(); + const unknownMap = new Map([["TP-U::a", "weird-status"]]); + const out1 = helpers.taskSegmentPillRow( + { taskId: "TP-U", segmentIds: ["TP-U::a", "TP-U::b"] }, + unknownMap, + null, + ); + // Both 'weird-status' (unknown) and missing TP-U::b fall back to pending styling. + expect((out1.match(/seg-pending/g) || []).length).toBe(2); + }); + + it("includes a status-bearing tooltip on each pill", () => { + const helpers = loadHelpers(); + const segMap = new Map([ + ["TP-T::a", "succeeded"], + ["TP-T::b", "running"], + ]); + const out = helpers.taskSegmentPillRow( + { taskId: "TP-T", segmentIds: ["TP-T::a", "TP-T::b"] }, + segMap, + null, + ); + expect(out).toContain('title="TP-T::a · succeeded"'); + expect(out).toContain('title="TP-T::b · running"'); + }); + + it("falls back to the raw segmentId as the label when the id is unparseable", () => { + const helpers = loadHelpers(); + const segMap = new Map([ + ["malformed-id", "running"], + ["TP-M::ok", "pending"], + ]); + const out = helpers.taskSegmentPillRow( + { taskId: "TP-M", segmentIds: ["malformed-id", "TP-M::ok"] }, + segMap, + null, + ); + // malformed-id has no `::` separator → parseSegmentId returns null; + // renderer falls back to the raw id as the displayed label. + expect(out).toContain(">malformed-id<"); + expect(out).toContain(">ok<"); + }); + + it("escapes HTML in repoIds and segmentIds (XSS guard)", () => { + const helpers = loadHelpers(); + const evilId = "TP-E::"; + const segMap = new Map([ + [evilId, "running"], + ["TP-E::safe", "pending"], + ]); + const out = helpers.taskSegmentPillRow( + { taskId: "TP-E", segmentIds: [evilId, "TP-E::safe"] }, + segMap, + null, + ); + expect(out.includes("
WaveStatusSessionTelemetrySession IDDetailsWaveStatusSessionTelemetry
${effectiveAlive ? escapeHtml(effectiveSession) : "—"}${mergeTelemetryHtml(mergeTel, effectiveAlive)}`; - html += ''; - html += `${mr.failureReason ? escapeHtml(mr.failureReason) : "—"}
${rr.status}${rrLanes}${rrDetail}
${escapeHtml(sess)}${mergeTelemetryHtml(sessTel, true)}