feat(TP-196 + TP-197): multi-segment engine hardening + dashboard segment-progress indicators#577
Merged
Merged
Conversation
…(responsive-safe)
…ify scope decision - Add 'export type SegmentScopeMode = FULL_TASK | SEGMENT_SCOPED' to types.ts - Add computeSegmentScopeMode() helper in lane-runner.ts \u2014 single source of truth for the FULL_TASK vs SEGMENT_SCOPED decision (TP-196 / #502). - Iteration loop now derives both 'segmentScopeMode' (typed) and the legacy 'isSegmentScoped' boolean alias from one computation, preventing the multi-condition drift documented in #502. - 16 new tests (11 unit + 5 source-analysis contracts) in segment-scoped-lane-runner.test.ts cover the helper's truth table and guard the unified gating sites. - TASKPLANE_ACTIVE_SEGMENT_ID and the segment system-prompt overlay are already gated via the (now unified) isSegmentScoped alias; the request_segment_expansion tool registration inherits this gating through the env var. Targeted suite: 62/62 pass. Full fast suite: 3643 pass / 0 fail.
Per R002 code review: the segment-scoped prompt injection branch was still keyed on the raw 'stepSegmentMap && currentRepoId && repoStepNumbers && remainingSteps.length > 0' composite, leaving the very drift surface #502 is meant to eliminate. - lane-runner.ts: outer gate is now 'if (isSegmentScoped)'; an inner defensive guard logs a WARN and skips when currentStepMapping/mySegment is unexpectedly absent (should never trip in practice because computeSegmentScopeMode already verifies those preconditions). - segment-scoped-lane-runner.test.ts: update 4.1 + 7.3 to assert the mode-driven gate + the absence of the legacy composite-condition string. Targeted: 62/62 pass. Full fast suite: 3643 pass / 0 fail. Gates green.
Adds a horizontal pill row of per-segment status badges to each multi-segment task row in the dashboard. Each pill shows an icon (succeeded / running / pending / failed / stalled / skipped) plus the repoId for that segment, with the currently-executing segment visually emphasized. This closes the operator visibility gap left by TP-145's .DONE suppression for non-final segments: previously, multi-segment lanes sat 'running' with no segment-level signal during the suppression window. Now operators can see at a glance which segments have finished, which is running, and which remain. Implementation: - dashboard/public/app.js: new taskSegmentPillRow() helper; integrated into renderLanesTasks() as grid row 3 of .task-row (mirrors the task-title-subtitle pattern from TP-485). Returns '' for single- segment tasks so non-segmented tasks render byte-identical DOM to today. - dashboard/public/style.css: extends .task-row grid-template-rows to 'auto auto auto'; adds .task-segment-row container plus .seg-pill + status variants. Pill row lives OUTSIDE .task-step intentionally so the existing @media (max-width: 900px) { .task-step { display: none } } rule does NOT hide segment context on narrow viewports. - extensions/tests/dashboard-segment-pill-row.test.ts: 11 new unit tests cover multi-segment rendering, current-segment emphasis, single-segment fallback (no regression), malformed input handling, and XSS-safe escaping. No server.cjs change required: batch.segments[], task.segmentIds, and runtimeLaneSnapshots[*].segmentId were already exposed (verified in Step 0). The progress bar is unchanged \u2014 TP-174 already segment-scoped it via v2Progress; the new pill row provides the missing context that makes the existing bar legible as 'current segment's progress.' Tests: 3638 pass / 0 fail / 1 skipped (was 3627; +11 new). Gates: typecheck 0, lint 0, format:check 0.
…ery)
Three defense-in-depth guards harden the .DONE authority model against
stale or premature markers in multi-segment tasks:
1. Monitor guard (execution.ts::resolveTaskMonitorState):
- New optional 'multiSegmentContext: { isFinalSegment, segmentId }' parameter.
- When isFinalSegment === false and .DONE is present, log a WARN via
execLog and SKIP Priority 1, falling through to lower priorities.
- monitorLanes populates the context from task.task.segmentIds +
task.task.activeSegmentId (active === last in deterministic order).
2. Resume guard (resume.ts::collectDoneTaskIdsForResume):
- New internal helper isSegmentFrontierCompleteForResume().
- When persisted segments exist AND any segment is not succeeded/skipped,
refuse to add the taskId to the done set. Emit console.warn carrying
'#462 guard' so operators see the inconsistency. The on-disk .DONE is
left alone; resume re-reconciles instead of silently marking complete.
3. Discovery safeguard (discovery.ts::checkDoneAuthoritySafeguard):
- New exported helper warning when .DONE coexists with unchecked
checkboxes in STATUS.md. Wired into scanAreaForTasks (no behaviour
change to the scan itself \u2014 doctor-style warning only).
Tests:
- New: extensions/tests/done-authority-multi-segment.test.ts (14 tests).
- Updated: resume-segment-frontier.test.ts 'keeps .DONE authoritative...'
test now asserts the inverted (post-#462) contract.
- Adjusted: engine-runtime-v2-routing.test.ts 14.5 widens the slice window
to accommodate the new monitor-guard prelude in resolveTaskMonitorState.
Full fast suite: 3657 pass / 0 fail. Typecheck / lint / format:check clean.
…ted iteration Adds an explicit pre-spawn check in the lane-runner iteration loop. When the lane is iterating a segment-scoped task AND all of its repoStepNumbers are already isSegmentComplete (e.g., a worker already checked every box on the final iteration but exited slightly before the loop's existing 'remainingSteps.length === 0' break could fire), the engine now logs a 'Pre-spawn segment-completion check' execution-log line and breaks out of the loop \u2014 saving the ~30-60s + token cost of spawning a worker that would find nothing to do. Defense properties: - Lives AFTER the existing 'remainingSteps.length === 0' break and BEFORE 'totalIterations++' so a fully-complete segment skips the spawn without incrementing the iteration counter. - Gated on so FULL_TASK iterations remain unaffected. - Falls through to the existing post-loop completion handling \u2014 no new branching. 5 new source-analysis tests in segment-scoped-lane-runner.test.ts (sections 10.0\u201310.4) verify the check exists at the spawn boundary, uses isSegmentComplete on every repoStepNumber, breaks out of the loop, and is properly gated. Full fast suite: 3662 pass / 0 fail. Gates green.
…havioural test (R005)
Per R005 code review: Step 4 needed an end-to-end behavioural test
verifying that 'spawnAgent' is NEVER called when a segment-scoped task's
checkboxes are already all complete.
Changes:
- Extract the inline iteration-loop check into a pure helper:
'shouldSkipSpawnForCompleteSegment(statusContent, repoStepNumbers,
currentRepoId): boolean'. The lane-runner now delegates to this helper.
- New test file 'extensions/tests/early-exit-segment-spawn-skip.test.ts'
with two describe blocks:
1. Helper-level behavioural tests (6 cases) covering the decision
contract directly with realistic STATUS.md fixtures.
2. End-to-end executeTaskV2 test that mocks spawnAgent via
'mock.module("../taskplane/agent-host.ts", ...)' and asserts
'spawnAgentCallCount === 0' and 'iterations === 0' for a fixture
worktree whose segment checkboxes are all checked.
- Update segment-scoped-lane-runner.test.ts 10.x source-analysis tests
to assert the new helper-based wiring.
Full fast suite: 3669 pass / 0 fail. Gates green.
Adds extensions/tests/segment-scope-mode-prompt.test.ts \u2014 9 behavioural tests across 4 describe blocks, mocking spawnAgent to capture the worker prompt and env, then driving executeTaskV2 with realistic fixtures: 1. FULL_TASK (3 tests): prompt does NOT contain 'Active segment ID', 'Your checkboxes for this step:', 'Other segments in this step (NOT yours)', or 'Segment-scoped context'; TASKPLANE_ACTIVE_SEGMENT_ID and TASKPLANE_SEGMENT_ID env vars are hard-cleared to ''; system prompt is BASE only (no segment overlay). 2. SEGMENT_SCOPED (3 tests): prompt DOES contain 'Active segment ID: <segmentId>', 'Your checkboxes for this step:', 'Other segments in this step (NOT yours', 'Segment-scoped context'; env carries the active segment ID; system prompt appends the segment overlay AFTER the base prompt. 3. Polyrepo single-segment regression (1 test): task with segment markers for only ONE repo still injects the segment-scoped block, proving the worker is not silently scoped to step 0. 4. Legacy / partial-marker fallback (2 tests): (a) task with NO segment markers at all falls back to FULL_TASK; (b) task with markers for OTHER repos but not the active repo also falls back to FULL_TASK. Architectural note: the original #503 wording asks for assertions that the prompt includes 'SegmentScopeMode: FULL_TASK' literal text. That prose was deliberately removed in commit 97816c0 ("hard mode separation for worker segment scoping") in favor of separate system-prompt files ("the prompt IS the mode"). These tests assert the architecturally- current contract \u2014 prompt content + env vars + system-prompt overlay reflect the mode \u2014 which preserves the intent of #503 while honoring the post-#502 design. Full fast suite: 3678 pass / 0 fail. Gates green.
… (R008) Per R008: test 3.1 'proves the worker proceeds beyond Step 0' was relying on a too-aggressive mock that checked off ALL unchecked boxes per spawn, collapsing the loop to a single iteration and making the test pass even if the engine silently scoped to Step 0. Fix: - Added 'workerAdvanceMode' toggle on the mock spawnAgent: 'all' (default) for the FULL_TASK/SEGMENT_SCOPED prompt-content tests that just need one spawn captured, and 'first' for the polyrepo regression test which needs the loop to iterate step-by-step. - Test 3.1 now asserts capturedSpawns.length >= 2 AND that the second prompt mentions 'Step 1' + 'Create endpoint'. A silent-scope-to-Step-0 regression would fail this test. - Reviewer style: use optional chaining for env access in the mock. Full fast suite: 3678 pass / 0 fail. Gates green.
…es green, 3678 pass)
…ATUS finalize Adds a 4-paragraph Fixed entry under [Unreleased] in CHANGELOG.md covering: 1. .DONE authority guards (#462 \u2014 monitor + resume + discovery) 2. SegmentScopeMode unification (#502 + #503) 3. Wasted-iteration elimination (#508) 4. Validation summary (gates green, 3678 pass / 0 fail / 1 skip, +51 tests) Adds four issue-close comment drafts (#462, #502, #503, #508) to the STATUS.md Notes section for the operator to post after merge. Marks STATUS.md Status: ✅ Complete (task done).
…t-merge fold)
Sage caught that the in-batch implementation changed .task-row's
'grid-template-rows' from 'auto auto' to 'auto auto auto'
unconditionally. 'auto' rows don't fully collapse when 'row-gap' is
set, so a non-segmented task got an 8px visible gap below row 2
where the empty pill row sat. Violates the PROMPT's 'non-segmented
batches render identically to today' contract.
## Fix (sage's recommended pattern)
dashboard/public/style.css:
- Default .task-row keeps the pre-TP-197 'grid-template-rows: auto auto'
- New rule '.task-row.has-segments' adds the 3rd row only when needed
dashboard/public/app.js:
- Compute taskRowClass = hasSegmentPillRow ? 'task-row has-segments' : 'task-row'
- Template renders <div class="${taskRowClass}">
## Layout-parity regression tests
extensions/tests/dashboard-segment-pill-row.test.ts (+2 tests):
- source check: default .task-row has 2-row grid; .has-segments has 3
- source check: app.js conditionally adds the class via ternary
These source-pattern guards prevent re-introduction of the
unconditional 3-row template or the dropped conditional-class
pattern in future edits.
## Validation
- Format pass auto-fixed my test code (1 file, Biome's wrap+quote-style preferences)
- All 4 hard gates exit 0
- 3691 tests passing (was 3689 before this fold; +2 new sage-fold tests)
- Dashboard tests: 13/13 pass (was 11; +2 layout-parity guards)
…ontrast bump (polyrepo-test folds)
Two user-flagged folds caught during operator polyrepo test:
## Fold 1: Wave-chip lane parallelization across all waves
User observed that wave chips in the top-of-dashboard 'Waves' bar
showed different separators based on which wave was active:
- Active wave: parallel/serial separators worked correctly (e.g.,
'W2 [TP-004 | TP-005]')
- Inactive waves: fell back to comma-separated ('W3 [TP-004, TP-005]')
The separator visibly 'flickered' as waves transitioned active —
W1 used '|' when active, then reverted to ',' when W2 became active.
### Root cause
formatWaveLaneBreakdown() built its task→lane map only from
batch.lanes, which is Runtime V2 live state and only populated for
the currently-active wave. For past/future waves, taskToLane.has(t)
returned false → hasAnyLaneData false → comma fallback.
### Fix
Derive task→lane map from batch.tasks[].laneNumber (persisted for
the entire batch lifecycle), with batch.lanes as a fallback. The
parallelization indicator is now stable across all wave states.
Call-site updated to pass batch.tasks as the new third argument.
formatWaveLaneBreakdown signature changed:
(taskIds, lanes, waveNumber) →
(taskIds, lanes, tasks, waveNumber)
### Tests (NEW: extensions/tests/dashboard-wave-lane-breakdown.test.ts, 8 cases)
- multi-lane wave uses '|' separator
- single-lane wave uses '→' separator
- REGRESSION: derives task→lane from tasks when lanes is empty
- REGRESSION: derives task→lane from tasks when lanes is unrelated
- backward-compat: falls back to lanes when tasks is empty
- no lane data: comma-separated legacy fallback
- mixed: same lane '→', different lanes '|'
- source: pending pill CSS uses --text-muted (fold 2 lock-down)
## Fold 2: Pending segment pill contrast bump
User observed that the pending state of the new TP-197 segment pills
(⬚ web-client in screenshot) had very low contrast on the dark theme,
making the repo name hard to read.
### Root cause
.seg-pill.seg-pending used:
color: var(--text-faint) /* #484f58 on dark = ~3.7:1 contrast */
Below WCAG AA (4.5:1) for normal text.
### Fix
color: var(--text-muted) /* #8b949e on dark = ~6.5:1 contrast */
Also: border-color: var(--border-default, var(--border-subtle)) so
the pill picks up a slightly stronger border if --border-default is
ever defined; falls back to --border-subtle (the current behavior)
otherwise.
Visually: pending pills are now readable while remaining clearly
muted vs the saturated success (green) and running (accent) states.
## Validation
- npm run typecheck → exit 0
- npm run lint → exit 0
- npm run format:check → exit 0 (Biome auto-formatted 1 file in this commit)
- npm run test:fast → 3699 passing (+8 from prior baseline 3691)
- Manual visual confirm pending in polyrepo test will be operator's job
…p rendering (polyrepo-test fold 2) User's polyrepo retest after the previous fold caught a new issue: future-wave chips were showing '→' (serial) for tasks that would actually run in parallel once their wave started. Specifically W2 and W3 chips both showed 'TP-004 → TP-005' before W2 became active, then switched to 'TP-004 | TP-005' when W2 started. ## Root cause The previous fold (commit 46d7b69) made formatWaveLaneBreakdown derive task→lane from batch.tasks[].laneNumber. But the persistence layer assigns laneNumber=0 as a SENTINEL for unallocated tasks (persistence.ts:1378 + 2538: 'lane?.laneNumber ?? outcome?.laneNumber ?? 0'). Real lane numbers start at 1. My previous code accepted 0 as a real lane → all pending tasks got grouped under fake 'lane 0' → rendered as serial '→' display. Worse than the original behavior, which at least fell back to comma ('I don't know'). ## Fix Treat laneNumber >= 1 as a real lane assignment; skip 0 (sentinel). Per-task allocation: if ( t && t.taskId && typeof t.laneNumber === 'number' && t.laneNumber >= 1 && !taskToLane.has(t.taskId) ) { taskToLane.set(t.taskId, t.laneNumber); } Now: - Future-wave tasks with sentinel laneNumber=0 → not added to map - hasAnyLaneData becomes false for that wave - Comma fallback display ('TP-004, TP-005') - When the wave starts and real lanes are allocated, laneNumber updates to 1/2/3/... → wave chip switches to '|' display The display goes from comma (unknown) → '|' (parallel) when allocation happens. No more misleading '→' for pending tasks. ## Regression tests (+2) Added 2 tests to extensions/tests/dashboard-wave-lane-breakdown.test.ts: - 'laneNumber=0 sentinel does NOT count as a real lane' Both tasks with sentinel 0 → comma fallback (not '→') - 'mixed sentinel-0 and real laneNumbers — only real ones count' Tasks with mixed values: real ones render in their lanes, sentinel-0 tasks fall into the 'unassigned' trailing group ## Validation - typecheck/lint/format:check all exit 0 - 3701 tests passing (+2 from previous baseline 3699) - Sentinel-0 behavior locked down via source-pattern + behavioral tests
…header contrast (polyrepo-test fold 3) User caught three things in the final-state merge-agents panel: ## 1. Removed SESSION ID and DETAILS columns (dead weight) SESSION ID was HARDCODED to '\u2014' in every row \u2014 the dashboard had a column placeholder but never populated it. DETAILS was only used for mr.failureReason (rare failure cases); for the common all-merges- succeeded case it was always '\u2014' too. User has never seen real values in either column in actual runs. Both columns removed from the merge-table header and all three row templates (main merge result, per-repo sub-row, active session). When a real failure happens, the operator sees status='failed' in the Status column (still present) and can dig into engine logs for the reason. If structured failure data warrants a dedicated column in the future, easy to add back. ## 2. Bumped merge-table header contrast .merge-table th was using --text-faint (#484f58 on dark = ~3.7:1 contrast, below WCAG AA). Bumped to --text-muted (#8b949e = ~6.5:1). Matches other section headers in the dashboard (style.css lines 224, 248, 458 already use --text-muted for the same role) \u2014 the merge table was an inconsistency, now fixed. ## 3. Off-by-one wave attribution (#509) NOT addressed here User also observed (3rd time confirmed) that Wave 2 chip displays telemetry referencing W1's merge-result and Wave 4 displays W3's \u2014 the off-by-one wave attribution from issue #509. That's an existing tracked issue with a separate deeper diagnostic path. Will comment on #509 with the latest reproduction evidence separately from this fold (out of scope for a cosmetic-cleanup fold). ## Regression tests (+2) extensions/tests/dashboard-wave-lane-breakdown.test.ts: - 'merge-agents table header has 4 columns; Session ID + Details removed' \u2014 locks down the column removal so future edits don't bring back the dead columns - '.merge-table th uses --text-muted' \u2014 locks down the contrast bump so future style consolidation doesn't revert ## Validation - typecheck/lint/format:check all exit 0 - 3703 tests passing (+2 from 3701 baseline) - Visual fix: 4 columns instead of 6, headers readable on dark theme
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bundles two task packets run in parallel because their file scopes don't overlap. Closes 5 segment-related issues that survived the #51 closure as polish/hardening work.
Issues closed
.DONEauthority for multi-segment tasksTP-196 — Multi-segment engine hardening (#462 + #502 + #503 + #508)
Size M, Review Level 2, 9 in-batch reviews (1 plan + 8 code, all APPROVE).
#502 (foundational, Step 2): SegmentScopeMode unification
Promoted
SegmentScopeModeto a first-class type intypes.ts. Lane-runner now computes it once viacomputeSegmentScopeMode(...)and threads via lane config. Key sites unified onisSegmentScoped:TASKPLANE_ACTIVE_SEGMENT_ID/TASKPLANE_SEGMENT_IDenv vars (now cleared in FULL_TASK mode)request_segment_expansiontool registration (gated via env-driven extension logic)#462 (Step 3):
.DONEauthority guards built atop #502's flagThree defense-in-depth guards:
resolveTaskMonitorState: demotes.DONEwhen active segment is known non-final (segmentIds.length > 1 && activeSegmentId != lastSegment)collectDoneTaskIdsForResume: requires all persisted segments to be terminal-success before trusting.DONE.DONE+ incomplete-frontier inconsistencyextensions/tests/done-authority-multi-segment.test.ts(372 LOC): non-final unlink failure, transient.DONErace, resume with stale marker#508 (Step 4): Early-exit optimization
Pre-spawn segment-completion check in
lane-runner.tsiteration loop — skips wasted spawn when all segment checkboxes are already complete. NEWextensions/tests/early-exit-segment-spawn-skip.test.ts(351 LOC) covers the optimization + soft-progress interaction.#503 (Step 5): SegmentScopeMode prompt-injection regression tests
NEW
extensions/tests/segment-scope-mode-prompt.test.ts(855 LOC) covering:TP-197 — Dashboard segment-progress indicators (#464)
Size S-M, Review Level 1, 2 plan reviews (1 REVISE → 1 APPROVE).
dashboard/public/app.js(+69 LOC):taskSegmentPillRow()renderer; segment-aware progress bar; current-segment highlightdashboard/public/style.css(+73 LOC):.seg-pillstyles for ✅ / ⏳ / ⬚ states + responsive layoutdashboard/server.cjs: no changes (segment data was already exposed via the API)extensions/tests/dashboard-segment-pill-row.test.ts(231 LOC): rendering assertions + XSS guards + malformed-input resiliencedocs/tutorials/use-the-dashboard.md: documents the new indicatorsSage post-merge fold (commit
533a6fa)Sage spot-check caught one real regression: the in-batch TP-197 implementation changed
.task-row'sgrid-template-rowsfromauto autotoauto auto autounconditionally. Withrow-gap: 8px, this added a visible 8px gap below row 2 for single-segment tasks where row 3 is empty. Violated the PROMPT's "non-segmented batches render identically to today" contract.Fix (sage's recommended pattern):
.task-rowstays atgrid-template-rows: auto auto(pre-TP-197 layout).task-row.has-segmentsadds the 3rd row only when settaskRowClass = hasSegmentPillRow ? "task-row has-segments" : "task-row"and renders<div class="${taskRowClass}">Layout-parity regression tests added (+2 in
dashboard-segment-pill-row.test.ts):.task-rowhas 2-row grid;.has-segmentsopts into 3Sage's other findings (no fold needed):
Validation
npm run typechecknpm run lintnpm run format:checknpm run test:fastWhat this does NOT do
.DONEsuppression behavior (TP-145 stays as-is)dashboard/public/to Biome lint scope (intentionally out per spec section 3)Release framing
Both packets are small targeted hardening/UX. Suggest v0.30.1 patch release after merge (CHANGELOG entries from both packets already in
[Unreleased]).Alternatively, hold for v0.31.0 with Tier-1.5 follow-ups (TS strictness ratchet, CHANGELOG fragments) — your call.