2 changes: 1 addition & 1 deletion taskplane-tasks/CONTEXT.md
@@ -2,7 +2,7 @@

**Last Updated:** 2026-05-10
**Status:** Active
-**Next Task ID:** TP-196
+**Next Task ID:** TP-198

---

220 changes: 220 additions & 0 deletions taskplane-tasks/TP-196-multi-segment-engine-hardening/PROMPT.md
@@ -0,0 +1,220 @@
# Task: TP-196 - Multi-segment engine hardening: `.DONE` authority + scope-mode unification + early-exit optimization + test hardening

**Created:** 2026-05-10
**Size:** M

## Review Level: 2 (Plan and Code)

**Assessment:** Bundles 4 segment-engine follow-up issues (#462, #502, #503, #508) into one task. All four touch overlapping files (`lane-runner.ts`, `execution.ts`, `resume.ts`, `discovery.ts`) and share a conceptual theme: hardening multi-segment execution against edge cases and drift. Plan review evaluates the unification design (single authoritative `SegmentScopeMode` flag, defense-in-depth `.DONE` guards). Code review evaluates the per-fix correctness and test adequacy. Per-step reviews fit naturally — each issue's work is independent enough that a step boundary maps to an issue.

**Score:** 4/8 — Blast radius: 2, Pattern novelty: 1, Security: 0, Reversibility: 1

## Canonical Task Folder

```
taskplane-tasks/TP-196-multi-segment-engine-hardening/
├── PROMPT.md ← This file (immutable above --- divider)
├── STATUS.md ← Execution state (worker updates this)
├── .reviews/ ← Reviewer output (created by the orchestrator runtime)
└── .DONE ← Created when complete
```

## Mission

Close out four segment-engine polish/hardening issues that survived the closure of #51 (multi-repo task execution) because they're defense-in-depth, not core feature work:

- **#462** — Harden `.DONE` authority for multi-segment tasks (monitor, resume, discovery guards)
- **#502** — Segment scope mode should be a single enum gating all segment signals
- **#503** — Add regression tests for SegmentScopeMode prompt injection
- **#508** — Lane-runner should check segment completion before spawning next iteration

These four are conceptually cohesive and overlap heavily in file scope. Bundling lets the worker reuse the segment-engine context once and ship a single coherent hardening pass.

By the end of TP-196:
- `.DONE` cannot prematurely terminate a multi-segment task in monitor/resume/discovery edge cases (#462)
- `SegmentScopeMode` is the single authoritative flag — env vars, tool registration, and execution branches all gate on it (#502)
- Regression tests verify both `FULL_TASK` and `SEGMENT_SCOPED` prompt content + the polyrepo single-segment case + the legacy/partial-marker fallback (#503)
- Lane-runner skips wasted iteration when all segment checkboxes are already complete (#508)
- All existing tests still pass; new behavioral tests cover the four fixes

## Dependencies

**None** — all referenced predecessor work is merged. The following are informational cross-references:

- TP-081 / TP-133 / TP-134 / TP-135 (multi-repo task execution Phase B-E, shipped): the foundational segment infrastructure these guards harden.
- TP-145 (already shipped): the four-layer `.DONE` defense that #462 builds atop.
- TP-501 (already shipped, predecessor for #502/#503): the SegmentScopeMode prompt-injection fix that #502 unifies and #503 regression-tests.
- TP-194 (gates flip, shipped v0.30.0): means `typecheck` / `lint` / `format:check` are now hard gates — any new code in this task must keep them green.

## Context to Read First

> Only list docs the worker actually needs. Less is better.

**Tier 2 (area context):**
- `taskplane-tasks/CONTEXT.md`

**Tier 3 (load only if needed):**
- `docs/specifications/taskplane/multi-repo-task-execution.md` — operative spec for the segment subsystem (the broader context for all 4 issues)
- Each issue's body, fetched via `gh issue view <num>`:
  - `gh issue view 462`
  - `gh issue view 502`
  - `gh issue view 503`
  - `gh issue view 508`
- `extensions/taskplane/lane-runner.ts` — segment-scope-mode computation; iteration loop (pre-spawn check site for #508)
- `extensions/taskplane/execution.ts` — `resolveTaskMonitorState` (monitor guard for #462); segment env var + tool registration (#502)
- `extensions/taskplane/resume.ts` — `collectDoneTaskIdsForResume` / reconciliation (resume guard for #462)
- `extensions/taskplane/discovery.ts` — `.DONE` skip logic in segmented contexts (discovery safeguard for #462)
- `extensions/tests/segment-scoped-lane-runner.test.ts` — existing segment scope test file (add #503 assertions here)
- `extensions/tests/lane-runner-v2.test.ts` — update existing segment contracts if changed by #502 work

## Environment

- **Workspace:** `extensions/` (engine + tests)
- **Services required:** None

## File Scope

> The orchestrator uses this to avoid merge conflicts. Worker hydrates Steps 2-5
> with the specific files touched per issue based on the plan in Step 1.

- `extensions/taskplane/lane-runner.ts` (segment-scope computation, iteration loop)
- `extensions/taskplane/execution.ts` (monitor guard, env var, tool registration)
- `extensions/taskplane/resume.ts` (resume guard)
- `extensions/taskplane/discovery.ts` (discovery safeguard)
- `extensions/taskplane/types.ts` (if `SegmentScopeMode` enum needs to be promoted to a first-class type — likely)
- `extensions/tests/segment-scoped-lane-runner.test.ts` (#503 assertions)
- `extensions/tests/lane-runner-v2.test.ts` (segment contract updates)
- New test files as needed (e.g., `extensions/tests/done-authority-multi-segment.test.ts` for #462 edge cases)
- `CHANGELOG.md` — `[Unreleased]` entry under `Fixed` (or `Internal` if framed as hardening)

## Steps

> **Hydration:** STATUS.md tracks outcomes per-issue. Worker expands Steps 2-5
> with concrete checkboxes after Step 1 plan-review APPROVE.

### Step 0: Preflight

- [ ] On `main` (lane worktree, fresh from v0.30.0 release)
- [ ] All four gates pass on baseline: `npm run typecheck` exit 0, `npm run lint` exit 0, `npm run format:check` exit 0, `npm run test:fast` passes with 3627+ tests
- [ ] All four issue bodies read: #462, #502, #503, #508
- [ ] Tier 3 context files read
- [ ] Live grep verification: confirm `stepSegmentMap && currentRepoId && repoStepNumbers` is still the condition pattern referenced in #502 (or document the post-TP-194 equivalent)
- [ ] Decision: introduce a `SegmentScopeMode` enum/type in `types.ts` (vs. inline string union)? Recommendation in Discoveries.

### Step 1: Plan all four fixes

> ⚠️ Plan-review checkpoint. Reviewer evaluates architectural cohesion across the 4 issues.

- [ ] #462 design: monitor guard (suppress `.DONE` as success signal for known non-final active segments), resume guard (don't accept `.DONE` for incomplete frontier), discovery safeguard (sanity check or doctor warning). Document each guard's exact check + the "fail-loud vs auto-recover" stance per guard.
- [ ] #502 design: `SegmentScopeMode` promotion to a first-class type; gate env var, tool registration, and execution branches on it. List every site that currently checks `stepSegmentMap && currentRepoId` and the unified-condition replacement.
- [ ] #503 design: test file structure (extend existing `segment-scoped-lane-runner.test.ts` vs new dedicated file). Per-case checklist matches the 4 scenarios in the issue body.
- [ ] #508 design: pre-spawn segment-completion check site in `lane-runner.ts` iteration loop. Document exit-condition semantics (skip to segment-completion handling vs. break to next-task).
- [ ] Cross-issue coordination: any interaction between #462's monitor guard and #508's pre-spawn check? Document.
- [ ] Drafts in Discoveries.

### Step 2: Implement #502 first (foundational refactor)

> ⚠️ Code-review fires after this step.

> Rationale: promoting `SegmentScopeMode` to a first-class type creates the
> authoritative flag that #462 and #508 can also reference. Doing this first
> avoids retrofitting after the other work lands.

- [ ] `SegmentScopeMode` promoted (likely as enum in `types.ts`)
- [ ] `lane-runner.ts` computes it once + threads via lane config
- [ ] `execution.ts` `TASKPLANE_ACTIVE_SEGMENT_ID` env var gated on it
- [ ] `execution.ts` `request_segment_expansion` tool registration gated on it
- [ ] Scattered `stepSegmentMap && currentRepoId` checks replaced with single-flag reference
- [ ] Targeted tests pass; full fast suite passes
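
A minimal sketch of the single-flag idea in Step 2, under stated assumptions: the string-literal union (rather than an enum), the `ScopeInputs` shape, and the helper names `computeSegmentScopeMode`, `buildWorkerEnv`, and `toolsForMode` are hypothetical and not taken from the codebase. Only `SegmentScopeMode`, `FULL_TASK`, `SEGMENT_SCOPED`, `TASKPLANE_ACTIVE_SEGMENT_ID`, `request_segment_expansion`, and the `stepSegmentMap && currentRepoId && repoStepNumbers` condition come from this task.

```typescript
// Hypothetical sketch: compute the mode once, then gate every segment signal on it.
type SegmentScopeMode = "FULL_TASK" | "SEGMENT_SCOPED";

// Assumed input shape; the real lane config may differ.
interface ScopeInputs {
  stepSegmentMap?: Record<string, string[]>;
  currentRepoId?: string;
  repoStepNumbers?: number[];
}

// Mirrors the scattered truthiness check referenced in #502, evaluated in one place.
function computeSegmentScopeMode(inputs: ScopeInputs): SegmentScopeMode {
  const scoped = Boolean(
    inputs.stepSegmentMap && inputs.currentRepoId && inputs.repoStepNumbers,
  );
  return scoped ? "SEGMENT_SCOPED" : "FULL_TASK";
}

// The env var is emitted only in segment-scoped mode.
function buildWorkerEnv(
  mode: SegmentScopeMode,
  activeSegmentId: string | undefined,
): Record<string, string> {
  const env: Record<string, string> = {};
  if (mode === "SEGMENT_SCOPED" && activeSegmentId) {
    env.TASKPLANE_ACTIVE_SEGMENT_ID = activeSegmentId;
  }
  return env;
}

// Tool registration is gated on the same flag, not on a separate condition.
function toolsForMode(mode: SegmentScopeMode): string[] {
  return mode === "SEGMENT_SCOPED" ? ["request_segment_expansion"] : [];
}
```

The point of the sketch is that downstream sites receive `mode` and never re-derive it, so the env var, the tool list, and the execution branches cannot drift apart.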

### Step 3: Implement #462 guards

> ⚠️ Code-review fires after this step.

- [ ] Monitor guard in `resolveTaskMonitorState` (`execution.ts`)
- [ ] Resume guard in `collectDoneTaskIdsForResume` (`resume.ts`)
- [ ] Discovery safeguard in `discovery.ts` (sanity check or doctor warning per plan decision)
- [ ] 3-4 behavioral tests covering: non-final unlink failure, transient `.DONE` monitor race, resume with `.DONE` + incomplete frontier
- [ ] Full fast suite passes
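
One possible shape for the monitor guard, sketched as a pure function so it can be unit-tested without the filesystem. The `SegmentPlan` type, the `classifyDoneMarker` name, and the three result labels are assumptions for illustration; the actual check inside `resolveTaskMonitorState` is decided in Step 1.

```typescript
// Hypothetical sketch of the "suppress .DONE for known non-final active segments" rule.
interface SegmentPlan {
  orderedSegmentIds: string[]; // assumed: segments in execution order
  activeSegmentId?: string;    // assumed: segment currently running in this lane
}

type DoneVerdict = "absent" | "authoritative" | "suppressed";

function classifyDoneMarker(
  doneFileExists: boolean,
  plan: SegmentPlan | undefined,
): DoneVerdict {
  if (!doneFileExists) return "absent";
  // Single-segment or legacy tasks: .DONE keeps its existing meaning.
  if (!plan || plan.orderedSegmentIds.length <= 1 || !plan.activeSegmentId) {
    return "authoritative";
  }
  const index = plan.orderedSegmentIds.indexOf(plan.activeSegmentId);
  // Unknown active segment: do not guess, leave current behavior untouched.
  if (index === -1) return "authoritative";
  const isFinalSegment = index === plan.orderedSegmentIds.length - 1;
  return isFinalSegment ? "authoritative" : "suppressed";
}
```

Whether a `suppressed` verdict should fail loudly or auto-recover is the per-guard stance that Step 1 asks the plan to document; the sketch only classifies.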

### Step 4: Implement #508 early-exit optimization

> ⚠️ Code-review fires after this step.

- [ ] Pre-spawn segment-completion check in `lane-runner.ts` iteration loop
- [ ] Exit-condition wiring per plan decision (skip to segment-completion handling)
- [ ] Behavioral test asserting wasted iteration is skipped when all segment checkboxes are pre-complete
- [ ] Full fast suite passes
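
The pre-spawn check can be sketched as follows, assuming segment progress is tracked with markdown checkboxes as in STATUS.md. The function names, the callback-based loop, and the choice to treat a block with no checkboxes as incomplete are illustrative assumptions, not the existing `lane-runner.ts` API.

```typescript
// Hypothetical sketch: skip spawning a worker when the segment is already complete.

// True only when the block has at least one checkbox and all are ticked.
function segmentCheckboxesComplete(segmentBlock: string): boolean {
  let total = 0;
  let checked = 0;
  for (const line of segmentBlock.split("\n")) {
    const match = /^\s*- \[([ xX])\]/.exec(line);
    if (!match) continue;
    total += 1;
    if (match[1] !== " ") checked += 1;
  }
  return total > 0 && checked === total;
}

// Returns how many workers were spawned before the segment was complete.
function runIterations(
  readSegmentBlock: () => string,
  spawnWorker: () => void,
  maxIterations: number,
): number {
  let spawned = 0;
  for (let i = 0; i < maxIterations; i += 1) {
    // Pre-spawn check: fall through to segment-completion handling instead of iterating.
    if (segmentCheckboxesComplete(readSegmentBlock())) break;
    spawnWorker();
    spawned += 1;
  }
  return spawned;
}
```

A behavioral test for #508 could then assert that `runIterations` spawns zero workers when the block it reads is already fully checked.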

### Step 5: Implement #503 prompt-injection regression tests

> ⚠️ Code-review fires after this step.

- [ ] `FULL_TASK` prompt assertions: includes `SegmentScopeMode: FULL_TASK`, NOT `Active segment ID`, NOT segment-scoped checkbox block
- [ ] `SEGMENT_SCOPED` prompt assertions: includes `SegmentScopeMode: SEGMENT_SCOPED`, `Active segment ID`, segment-scoped checkbox block, "Other segments in this step (NOT yours)"
- [ ] Polyrepo single-segment regression: worker proceeds beyond Step 0 (does not exit after one step)
- [ ] Legacy/partial-marker case: fallback behavior does not silently one-step scope
- [ ] Tests pass in isolation + full fast suite
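
The prompt assertions above could be expressed as a small checker that returns a list of violations, which a test then expects to be empty. This is a sketch under assumptions: `findPromptViolations` is a hypothetical helper, and it checks only the literal strings quoted in this step. The segment-scoped checkbox block is not checked because its exact text is not given here.

```typescript
// Hypothetical sketch of the #503 prompt-content checks.
type SegmentScopeMode = "FULL_TASK" | "SEGMENT_SCOPED";

function findPromptViolations(prompt: string, mode: SegmentScopeMode): string[] {
  const violations: string[] = [];
  const modeLine = `SegmentScopeMode: ${mode}`;
  if (!prompt.includes(modeLine)) violations.push(`missing "${modeLine}"`);

  const hasActiveId = prompt.includes("Active segment ID");
  const hasOthersNote = prompt.includes("Other segments in this step (NOT yours)");

  if (mode === "FULL_TASK") {
    // Full-task prompts must not leak segment-scoped signals.
    if (hasActiveId) violations.push('unexpected "Active segment ID"');
  } else {
    if (!hasActiveId) violations.push('missing "Active segment ID"');
    if (!hasOthersNote) {
      violations.push('missing "Other segments in this step (NOT yours)"');
    }
  }
  return violations;
}
```

Returning violations rather than a boolean keeps failure messages specific, which helps when a regression affects only one of the signals.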

### Step 6: Testing & Verification

> ZERO test failures allowed. ALL FOUR GATES must remain green (post-TP-194: typecheck, lint, format:check, tests).

- [ ] `npm run typecheck` exits 0
- [ ] `npm run lint` exits 0
- [ ] `npm run format:check` exits 0
- [ ] `npm run test:fast` passes (target: 3627+ baseline + new tests from this task; record final count)
- [ ] Full integration suite passes
- [ ] CLI smoke clean

### Step 7: Documentation & Delivery

- [ ] CHANGELOG entry under `[Unreleased]` → `Fixed` (or `Internal` if framed as hardening):
  - Title: `**Multi-segment engine hardening (TP-196, #462 + #502 + #503 + #508)**`
  - Body: 2-3 paragraph summary covering: (1) `.DONE` authority guards, (2) SegmentScopeMode unification + regression tests, (3) wasted-iteration elimination, (4) validation (tests + gates green)
- [ ] Discoveries logged: per-issue final fix summary; any latent bugs uncovered during hardening
- [ ] Step boundaries committed with `feat(TP-196): ...` / `fix(TP-196): ...` / `test(TP-196): ...` prefixes
- [ ] Issue-close comments drafted in Discoveries for #462, #502, #503, #508 — to be posted by operator after PR merges

## Documentation Requirements

**Must Update:**
- `CHANGELOG.md` — Fixed/Internal entry per Step 7

**Check If Affected:**
- `docs/specifications/taskplane/multi-repo-task-execution.md` — if any of these fixes change a contract documented there, update; otherwise leave alone

## Completion Criteria

- [ ] All four issues' acceptance criteria met (per their issue bodies)
- [ ] All four CI gates pass (`typecheck`, `lint`, `format:check`, `test:fast`)
- [ ] Per-step plan + code reviews APPROVE'd
- [ ] CHANGELOG entry added
- [ ] Issue-close comment drafts ready for operator

## Git Commit Convention

Commits happen at **step boundaries** AND at issue boundaries within combined steps. All commits MUST include the task ID:

- **Step completion:** `chore(TP-196): complete Step N — description`
- **Per-issue fix:** `fix(TP-196, #<issue>): description`
- **Test addition:** `test(TP-196, #<issue>): description`
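
As an illustration of the convention, a commit subject can be checked with one regular expression. This is only a sketch: `isValidCommitSubject` is a hypothetical helper, and the accepted types (`chore`, `feat`, `fix`, `test`) are the ones that appear in this document rather than an enforced list.

```typescript
// Hypothetical sketch: accepts "type(TP-196): text" and "type(TP-196, #123): text".
const COMMIT_SUBJECT = /^(chore|feat|fix|test)\(TP-196(?:, #\d+)?\): \S.*$/;

function isValidCommitSubject(subject: string): boolean {
  return COMMIT_SUBJECT.test(subject);
}
```

For example, `fix(TP-196, #462): guard monitor against transient .DONE` matches, while `fix: guard monitor` does not because the task ID is missing.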

## Do NOT

- **Don't split into separate PRs unless plan-review reveals a clear architectural split.** The 4 issues are bundled deliberately because they share files and the segment-engine mental model.
- **Don't break the post-TP-194 hard gates.** Every change must keep `typecheck`, `lint`, and `format:check` exiting 0. The reviewer agent now downgrades APPROVE → REVISE on any failing gate, so plan accordingly.
- **Don't change behavior beyond what each issue specifies.** Hardening = guards + drift prevention + tests, not new feature work.
- **Don't address the dashboard segment-progress visibility issue (#464)** — that's TP-197's scope, separate file domain (`dashboard/public/`).
- **Don't load docs not listed in "Context to Read First."**
- **Don't commit without the `TP-196` prefix.**

---

## Amendments (Added During Execution)

<!-- Workers add amendments here if issues discovered during execution.
Format:
### Amendment N — YYYY-MM-DD HH:MM
**Issue:** [what was wrong]
**Resolution:** [what was changed] -->
164 changes: 164 additions & 0 deletions taskplane-tasks/TP-196-multi-segment-engine-hardening/STATUS.md
@@ -0,0 +1,164 @@
# TP-196: Multi-segment engine hardening — Status

**Current Step:** Not Started
**Status:** 🔵 Ready for Execution
**Last Updated:** 2026-05-10
**Review Level:** 2
**Review Counter:** 0
**Iteration:** 0
**Size:** M

> **Hydration:** Worker expands Steps 2-5 with concrete per-file checkboxes
> after Step 1 plan-review APPROVE. Each step maps to one of the 4 absorbed
> issues (#462, #502, #503, #508).

> **⚠️ Post-TP-194 hard-gate environment.** All four code-quality gates
> (typecheck, lint, format:check, tests) are now required at PR time. The
> reviewer agent downgrades APPROVE → REVISE on any failure. Plan accordingly.

---

### Step 0: Preflight
**Status:** ⬜ Not Started

- [ ] On `main` (fresh from v0.30.0)
- [ ] All four gates pass on baseline (typecheck 0, lint 0, format:check 0, tests 3627+)
- [ ] All four issue bodies read: #462, #502, #503, #508
- [ ] Tier 3 context files read (lane-runner.ts segment scope, execution.ts monitor + tool registration, resume.ts reconciliation, discovery.ts skip logic, segment-scoped-lane-runner test file)
- [ ] Live grep verification of `#502` condition pattern
- [ ] Decision: SegmentScopeMode promotion to first-class enum/type (recommendation in Discoveries)

---

### Step 1: Plan all four fixes
**Status:** ⬜ Not Started

> ⚠️ Plan-review checkpoint.

- [ ] #462 design (3 guards + edge-case tests)
- [ ] #502 design (SegmentScopeMode promotion + gate sites)
- [ ] #503 design (test file structure + 4 scenarios)
- [ ] #508 design (pre-spawn check site + exit-condition semantics)
- [ ] Cross-issue coordination documented
- [ ] Drafts in Discoveries

---

### Step 2: Implement #502 first (foundational refactor)
**Status:** ⬜ Not Started

> ⚠️ Code-review fires after this step.

- [ ] `SegmentScopeMode` promoted to first-class type
- [ ] `lane-runner.ts` threads via lane config
- [ ] `execution.ts` env var + tool registration gated
- [ ] Scattered `stepSegmentMap && currentRepoId` checks unified
- [ ] Targeted + full fast suite pass

---

### Step 3: Implement #462 guards
**Status:** ⬜ Not Started

> ⚠️ Code-review fires after this step.

- [ ] Monitor guard in `resolveTaskMonitorState`
- [ ] Resume guard in `collectDoneTaskIdsForResume`
- [ ] Discovery safeguard
- [ ] 3-4 behavioral tests for edge cases
- [ ] Full fast suite passes

---

### Step 4: Implement #508 early-exit optimization
**Status:** ⬜ Not Started

> ⚠️ Code-review fires after this step.

- [ ] Pre-spawn segment-completion check
- [ ] Exit-condition wiring
- [ ] Behavioral test asserting wasted iteration skipped
- [ ] Full fast suite passes

---

### Step 5: Implement #503 prompt-injection regression tests
**Status:** ⬜ Not Started

> ⚠️ Code-review fires after this step.

- [ ] FULL_TASK assertions
- [ ] SEGMENT_SCOPED assertions
- [ ] Polyrepo single-segment regression
- [ ] Legacy/partial-marker fallback case
- [ ] Tests pass in isolation + full suite

---

### Step 6: Testing & Verification
**Status:** ⬜ Not Started

> ZERO test failures allowed. ALL FOUR GATES green.

- [ ] `npm run typecheck` exit 0
- [ ] `npm run lint` exit 0
- [ ] `npm run format:check` exit 0
- [ ] `npm run test:fast` passes (target: 3627+ + new tests; record final count)
- [ ] Full integration suite passes
- [ ] CLI smoke clean

---

### Step 7: Documentation & Delivery
**Status:** ⬜ Not Started

- [ ] CHANGELOG entry under [Unreleased] → Fixed (or Internal)
- [ ] Discoveries logged: per-issue final fix summary
- [ ] Issue-close comment drafts for #462, #502, #503, #508 in Discoveries
- [ ] All commits include `TP-196` prefix

---

## Reviews

| # | Type | Step | Verdict | File |
|---|------|------|---------|------|

---

## Discoveries

| Discovery | Disposition | Location |
|-----------|-------------|----------|

---

## Execution Log

| Timestamp | Action | Outcome |
|-----------|--------|---------|
| 2026-05-10 | Task staged | PROMPT.md and STATUS.md created (bundles #462/#502/#503/#508) |

---

## Blockers

*None*

---

## Notes

**Why bundle 4 issues into one task:**

All 4 touch overlapping files (`lane-runner.ts`, `execution.ts`, `resume.ts`, `discovery.ts`, `segment-scoped-lane-runner.test.ts`). The segment-engine mental model is consistent across all of them — `.DONE` authority guards (#462), scope-mode unification (#502), regression tests for scope mode (#503), and early-exit optimization (#508). Bundling lets the worker reuse the context once and ship a coherent hardening pass.

If plan-review reveals a clear architectural split during Step 1, splitting is allowed but should be explicit (and the spec should document why).

**Sequencing within the task:**

#502 is implemented FIRST because it promotes `SegmentScopeMode` to a first-class type that #462 and #508 can also reference, which avoids retrofitting the others later. #503 (tests for #502) is the last implementation step, so its assertions are written against the most stable surface.

**Hard-gate compliance:**

Post-TP-194, the reviewer agent downgrades APPROVE → REVISE on any failing `typecheck` / `lint` / `format:check`. This is the first task to run entirely under hard gates; the worker should expect that gate failures will be surfaced in code reviews and cannot be ignored. Plan accordingly: don't break gates anywhere mid-step.