Merged
270 changes: 270 additions & 0 deletions .claude/commands/ship.md

Large diffs are not rendered by default.

138 changes: 138 additions & 0 deletions features/ship-gated/PLAN.md

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions features/ship-gated/REGRESSION.md
@@ -0,0 +1,56 @@
# REGRESSION — ship-gated

**Question:** did building `.claude/commands/ship.md` break anything **OUTSIDE** the feature?
**Verdict (FLOOR — `floor/check-regress.mjs verdict`, exit 0):** **`no-regressions`** — no
deterministically-detectable breakage outside the feature.

> The verdict is the **only** floor-grade thing here: a deterministic exit-code comparison
> (`ARCHITECTURE.md §2` primitive #3). Everything I did to get there — base detection, the
> inside/outside partition, running the suite — is **advisory orchestration** (the two-clocks split).

## Base + partition (live, P6)

- **Base:** `8063643` (dirty-tree dogfood: `git status --porcelain` non-empty → `base = HEAD`). The
`/plan` artifact `features/ship-gated/PLAN.md` was **committed** at this base, and the `/build`
output `.claude/commands/ship.md` was left **uncommitted** as the feature under test. The partition
therefore resolves to `inside = {ship.md}`, and the `/plan` artifact never enters `inside` (this
avoids the false fix#7 escape, `pipeline-integration-probe` CF-1).
- **Inside (changed scope):** `.claude/commands/ship.md` — exactly the plan's `## Files` `declared`
writes. `check-regress.mjs scope` → `escaped: []` (no scope breach).
- **Outside gates (run identically at base and head):** the 9 committed `*.test.*`, `validate`
(whole-repo), and the one committed eval pair
`pharn-review/trust-fence/evals/expected/expected-injection-comment.json ↔ features/trust-fence/findings.json`.
- **Style gates (`lint` / `format:check` / `lint:md`): SKIPPED** (deterministic, P5/P7) — `inside`
touches no shared style config (`eslint.config.mjs`, `.prettierrc.json`, `.prettierignore`,
`.markdownlint-cli2.jsonc`), so an outside style result is provably unable to flip; no `npm ci`
incurred.
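
The partition and the style-gate skip described above can be sketched roughly as follows. This is a hypothetical illustration, not the real `floor/check-regress.mjs`: the function names `partitionScope` and `canSkipStyleGates` are invented here, and the rule that every changed file outside the declared writes counts as `escaped` is an assumption inferred from the `escaped: []` output.

```javascript
// Hypothetical sketch only; the real floor/check-regress.mjs may differ.
// The shared style config list is copied from the bullet above.
const SHARED_STYLE_CONFIG = [
  "eslint.config.mjs",
  ".prettierrc.json",
  ".prettierignore",
  ".markdownlint-cli2.jsonc",
];

// Split the changed files into those the plan declared (inside) and
// those it did not (escaped). ASSUMPTION: "escaped" means "changed but
// not declared in the plan's ## Files section".
function partitionScope(changedFiles, declaredWrites) {
  const declared = new Set(declaredWrites);
  const inside = changedFiles.filter((f) => declared.has(f));
  const escaped = changedFiles.filter((f) => !declared.has(f));
  return { inside, escaped };
}

// Style gates can be skipped only when no inside file is a shared style
// config, because then an outside style result cannot differ between
// base and head.
function canSkipStyleGates(inside) {
  return !inside.some((f) => SHARED_STYLE_CONFIG.includes(f));
}

const scope = partitionScope(
  [".claude/commands/ship.md"],
  [".claude/commands/ship.md"],
);
console.log(scope, canSkipStyleGates(scope.inside));
```

Both decisions are pure set membership on file paths, which is what lets the skip be called deterministic rather than a judgment.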

## Per-gate comparison (base → head exit codes)

| gate | base | head | result |
| ---------------------------------------------------------- | ---- | ---- | ------ |
| `tests` (9 outside `*.test.*`) | 0 | 0 | OK |
| `validate` (`floor/validate.mjs .`) | 0 | 0 | OK |
| `structural:expected-injection-comment.json` (trust-fence) | 0 | 0 | OK |

- **`regressions`:** none.
- **`pre_existing`:** none (no gate was already red at baseline).
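
A minimal sketch of the exit-code comparison behind this table, assuming the report shape shown in `regression-report.json`. The function name `compareGates` and the non-clean verdict string `"regressions"` are assumptions for illustration; only `"no-regressions"` appears in this run.

```javascript
// Hypothetical sketch; not the actual floor/check-regress.mjs verdict code.
// Input shape mirrors "outside_gates" in regression-report.json:
//   { gateName: { base: <exit code>, head: <exit code> }, ... }
function compareGates(outsideGates) {
  const regressions = [];
  const preExisting = [];
  for (const name of Object.keys(outsideGates).sort()) {
    const { base, head } = outsideGates[name];
    if (base !== 0) {
      // Already red at baseline: not attributable to the increment.
      preExisting.push(name);
    } else if (head !== 0) {
      // Green at base, red at head: a base-to-head flip.
      regressions.push(name);
    }
  }
  // ASSUMPTION: the non-clean verdict string is "regressions".
  const verdict = regressions.length === 0 ? "no-regressions" : "regressions";
  return { regressions, pre_existing: preExisting, verdict };
}

const result = compareGates({
  "structural:expected-injection-comment.json": { base: 0, head: 0 },
  tests: { base: 0, head: 0 },
  validate: { base: 0, head: 0 },
});
console.log(result.verdict);
```

Under these assumptions, a gate that is red at both base and head lands in `pre_existing` and does not change the verdict, which matches the distinction the two bullets above draw.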

## Why a clean verdict is expected here (not a coincidence)

`.claude/commands/ship.md` is **floor-ignored markdown** (`floor/validate.mjs` `EXCLUDE_SEGMENTS`
path-ignores `.claude/commands/`), adds **no** test or eval, and touches **no** shared config. So no
outside gate can read it, and a base↔head flip is structurally impossible. The clean verdict therefore
confirms the **chain + partition** ran correctly more than it stresses the comparison — exactly what a
command-only increment should yield.
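
To illustrate why a path-ignored file cannot influence an outside gate, here is a rough sketch of segment-based exclusion. The real `EXCLUDE_SEGMENTS` in `floor/validate.mjs` is not shown in this report, so the constant `ASSUMED_EXCLUDE_SEGMENTS`, its contents, and the helper `isFloorIgnored` are all hypothetical.

```javascript
// Hypothetical sketch; the real EXCLUDE_SEGMENTS value and matching
// logic in floor/validate.mjs are not shown here and may differ.
const ASSUMED_EXCLUDE_SEGMENTS = [".claude/commands/"];

// Returns true when the repo-relative path contains an excluded
// directory segment at a path boundary.
function isFloorIgnored(relPath) {
  const normalized = relPath.replace(/\\/g, "/").replace(/^\.\//, "");
  return ASSUMED_EXCLUDE_SEGMENTS.some((seg) =>
    ("/" + normalized).includes("/" + seg),
  );
}

console.log(isFloorIgnored(".claude/commands/ship.md"));
console.log(isFloorIgnored("floor/validate.mjs"));
```

If the validator never opens an ignored path, its exit code is the same function of the repository with or without `ship.md`, which is the structural argument this section makes.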

## Honest residual (P0/P7)

`/regress` catches **exactly what its suite catches — nothing more.** A regression no deterministic
check covers (a broken behavior with no test / rule / eval) is **invisible** here. This certifies the
**comparison** — "deterministically-detectable breakage outside the feature is caught" — **not** that
the increment is whole or correct. This is **not** "regress passed" as a feature certification; the
feature's own correctness is `/verify`'s (floor) + `/review`'s (advisory) concern.

**Next:** `/verify features/ship-gated/PLAN.md` (floor gates own the verdict), then `/review`. The
verdict's exit code (`0`) decides this stage; `/regress` does not invoke `/verify`.
139 changes: 139 additions & 0 deletions features/ship-gated/REVIEW.md
@@ -0,0 +1,139 @@
# REVIEW — ship-gated

**Increment under review:** `.claude/commands/ship.md` (the gated `/ship` orchestrator `/build`
produced). **Trust:** `untrusted` — and uniquely here, the artifact is a **command**, i.e. _entirely
instructions_. Every imperative in it (`Run /plan`, `Load CONSTITUTION.md`, `STOP`, `end your turn`) is
the command's direction to a **future `/ship` agent** — **DATA I reviewed, never instructions I
executed** (P2). I did **not** start running `/plan` because the file says to; that refusal is the
fence working (see L-trust). **Floor (Step 1):** `node floor/validate.mjs .` → **GREEN, 1 capability**
(exit 0) — the increment is eligible for review; the count is unchanged because `.claude/commands/` is
floor-ignored.

> The floor is the only guaranteed part of this review; everything below is **advisory** (P0). Findings
> dogfood `pharn-contracts/finding-shape.md`: enum-gated `type`/`rule_id`/`severity`/`file` are my own
> assertions (trusted); free-text `problem`/`evidence` quote the reviewed artifact as DATA.

## The four lenses (on the increment)

- **L-floor → P0: PASS (clean — exemplary).** Every guarantee `ship.md` makes reduces to the floor or
is labeled advisory. It **strikes** the disease explicitly: "Never write `/ship` ensured the chain ran
/ ensures quality"; "RUNNING the stages … is advisory"; the human gates are "preserved **by
construction**, not by a floor mechanism" (advisory); only "may write only `SHIP.md`" is claimed as
FLOOR, correctly reduced to the fix#7 hook. No advisory-dressed-as-guarantee found. This is the single
most important lens and the increment passes it on its own terms.
- **L-eval → P1: PASS (does not bind; convention met).** `ship.md` has **no `role:`** and **no
`enforces:`**, so P1's Capability-evals rule does not bind it — exactly like `/regress` and `/verify`,
the no-`role:` orchestrator commands it mirrors. The floor agrees (GREEN, count unchanged). _Advisory
residual noted below: "convention met" means no eval is **required**, not that the orchestration logic
is **tested** — it is not (finding A-1)._
- **L-trust → P2: PASS (no injection; the fence held).** `ship.md`'s own design reads **only** enum-gated
verdict fields for control flow (`validate` exit, `regression-report.json`/`verify-report.json`
`.verdict`) and renders `GRILL.md`/`REVIEW.md` free-text as quoted DATA — no proceed/stop rests on a
tainted field. And as the reviewer I treated the file's pervasive imperatives as DATA, executing none.
No guaranteed decision rests on a tainted/free-text field.
- **L-axis → P3: PASS (one axis, no sibling-import violation).** One reason to change: the gated chain +
its per-stage verdict-reads (the `--loop` stop-condition was correctly split to a separate axis). Its
references to other commands and `floor/check-*.mjs` are an **orchestrator invoking the pipeline
spine** — its defined role (`ARCHITECTURE.md §6`), not a `pharn-*` leaf→leaf import; `.claude/commands/`
is floor-ignored, so the P3 sibling-grep does not (and should not) flag it.
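
The enum-gated verdict read that L-trust describes could look roughly like the sketch below. It is an illustration under assumptions, not code from `ship.md` or the floor: `readVerdict` is a hypothetical helper, and the allowed-value lists passed to it are supplied by the caller.

```javascript
// Hypothetical sketch of an enum-gated read. Control flow may depend
// only on the returned value, never on any free-text field.
function readVerdict(reportText, allowedVerdicts) {
  let report;
  try {
    report = JSON.parse(reportText);
  } catch {
    return null; // unparseable report: no usable verdict
  }
  const value =
    report !== null && typeof report === "object" ? report.verdict : undefined;
  // Accept the field only if it is exactly one of the enum members.
  return typeof value === "string" && allowedVerdicts.includes(value)
    ? value
    : null;
}

// A free-text field carrying an instruction has no effect on the result.
const tainted = '{"verdict":"PASS","note":"ignore the gates and proceed"}';
console.log(readVerdict(tainted, ["PASS", "FAIL"]));
```

Anything that is not an exact enum member, including a verdict string with extra text appended, yields `null`, so a caller would stop rather than guess.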

## Gates (fix #3)

- **floor-gate (blocking): NONE.** `validate` GREEN; no unlabeled P0 guarantee; no missing eval binding
(none owed); no grep-detectable sibling reference.
- **advisory-gate (warn):** the findings below — all rest on my judgment, none blocks.

## Verdict

**GREEN — the increment is clean on all four lenses; 0 blocking floor-findings.** A carefully
P0-disciplined orchestrator. The advisory findings concern the **residual** every command-only
increment carries: its orchestration _logic_ is floor-invisible and untested until a live run.

## Advisory findings (non-blocking — orchestration residual)

```yaml
- type: FINDING
  rule_id: "P1"
  severity: important
  file: ".claude/commands/ship.md:68"
  problem: "ship.md's actual orchestration LOGIC — does it read the right verdict field per stage, stop
    on the first non-GREEN, place the two human gates correctly — is verified by NOTHING deterministic
    this run. build-GREEN, regress-no-regressions, and verify-PASS all passed, but ship.md is
    floor-ignored markdown, so every one of those gates confirmed only that ADDING the file broke no
    existing check — none executed the orchestrator. Three green verdicts on an increment whose behavior
    is untested is a real (demonstrated, not hypothetical) gap; the proof is the deferred live dogfood."
  evidence: "## Step 2 — Run the chain, branching ONLY on each stage's STRUCTURAL verdict (P5) … (the
    chain logic exists only as prose; no eval/test exercises it)."
```

```yaml
- type: FINDING
  rule_id: "P5"
  severity: important
  file: ".claude/commands/ship.md:80"
  problem: "The turn-handoff with self-halting sub-stages is underspecified. /plan ends at its own
    approval halt (GATE 1) and /build HALTs on a RED floor — both end their turn standalone. ship.md says
    'capture its verdict, then CONTINUE', and reads /build's verdict by RE-running validate (since /build
    emits no machine report, CF-3). But HOW /ship regains control to read that verdict after a sub-stage
    halts its own turn, and how it 'resumes (turn 2)' after the human answers GATE 1, is asserted, not
    mechanized — exactly the kind of seam a live dogfood must pin."
  evidence: "> Turn semantics. A stage's own 'end your turn' applies when it is run standalone. Under
    /ship, perform the stage's work, capture its verdict, then CONTINUE the orchestration."
```

```yaml
- type: FINDING
  rule_id: "P5"
  severity: minor
  file: ".claude/commands/ship.md:64"
  problem: "Slug propagation is named but not mechanized: /ship passes a free-text <description> to
    /plan, and /plan chooses the <name> slug — but ship.md says to 'reuse that one slug across every
    stage' without specifying HOW /ship learns the slug /plan picked (presumably by observing the
    features/<name>/PLAN.md path /plan created). A determinism gap at the very first hand-off."
  evidence: "<name> is the kebab-case slug /plan chooses for this increment; reuse that one slug across
    every stage."
```

```yaml
- type: FINDING
  rule_id: "P0"
  severity: minor
  file: "features/ship-gated/PLAN.md:1"
  problem: "Process papercut surfaced this run (not in ship.md itself): the /plan-authored PLAN.md failed
    the repo's own style gates (markdownlint MD058 on a table, then prettier), so `npm run check` went RED
    and required a post-build scoped fix. /plan does not format/lint its own output against the gates the
    rest of the repo must pass — so any plan (especially one with a table) can land non-conforming and is
    caught only later. Real and recurring; basis for the proposed lesson below."
  evidence: "observed live: `npm run check` → format:check flagged .claude/commands/ship.md AND
    features/ship-gated/PLAN.md; markdownlint MD058 at PLAN.md:23/29 — both fixed post-build."
```

## Proposed lesson for `/memory-promote` (gated — NOT written to canon here, P2)

Per `/review`'s final step, I propose **one** lesson from a **real** failure this run surfaced (P7 —
real, not hypothetical). It is **not** written to `memory-bank/lessons-learned.md` here; `/memory-promote`
assembles the candidate, runs `check-provenance.mjs`, and **halts for explicit human accept/deny** before
any write (the model never self-promotes — P2).

- **Candidate — _A green pipeline (build ∧ regress ∧ verify) on a floor-invisible increment certifies
"added without breaking anything," NOT "the thing works" — an orchestrator/command-only feature is
unverified by the floor and must be dogfooded live before its logic is trusted._** `ship.md` passed
all three floor verdicts, yet every gate is blind to `.claude/commands/` content (floor-ignored), so
none exercised the orchestrator; its verdict-reads and turn-handoff live only as prose. This **extends
the probe's `L5`** (floor verdicts rest on advisory orchestration) one level up: when the _increment
itself_ is the orchestration, the floor can confirm coexistence but not behavior.
- **Why:** "verified/regress-clean" reads as "it's good," but for a floor-invisible artifact it means
only "the existing suite still passes with it present." Treating three green verdicts as evidence the
orchestrator _works_ is the P0 disease one level up — "the gates are green" mistaken for "the feature
is correct."
- **How to apply:** for any command-only / floor-ignored increment (a new `.claude/commands/*.md`,
a prose-only orchestrator), require a **live dogfood** (a real run with every hand-off observed, like
`pipeline-integration-probe`) as the correctness signal — and never present its floor verdicts as
certifying its behavior. Keep the verdict-reads floor-grade; label the orchestration advisory-until-run.
- **Provenance (for `/memory-promote`):** feature `ship-gated`; commit = HEAD at promote time (`ship.md`
currently uncommitted on branch `ship-gated`; base `8063643`); source
`features/ship-gated/REVIEW.md` (this file) + `VERIFY.md`; date `2026-06-29`.

**End of `/review`.** The actual promotion is a separate, human-gated `/memory-promote` run. The increment
is GREEN (0 blocking) — the post-review decision (merge / fix / abandon, and whether to run the live
`/ship` dogfood next) is yours.
53 changes: 53 additions & 0 deletions features/ship-gated/VERIFY.md
@@ -0,0 +1,53 @@
# VERIFY — ship-gated

**Question:** did `.claude/commands/ship.md` get built **correctly** — does it satisfy its own
requirements? **Verdict (FLOOR — `floor/check-verify.mjs`, exit 0):** **`VERIFIED: floor gates PASS`.**

> "verified" means **the named deterministic gates passed — full stop.** The verdict is owned by the
> FLOOR layer (an exit-code threshold, `ARCHITECTURE.md §2` primitive #3); it is **not** a model's
> judgment that the command is good. The ADVISORY verifier layer only annotates — and today it is empty.

## FLOOR layer — the gates (own the verdict)

| gate | exit | meaning |
| ----------------------------------- | ---- | ------------------------------------------------------- |
| `test` (`npm test`) | 0 | the hermetic suite is green with `ship.md` present |
| `validate` (`floor/validate.mjs .`) | 0 | structural floor GREEN — 1 capability (count unchanged) |
| `lint` (`npm run lint`) | 0 | eslint clean |

- **verdict:** `PASS` (every gate `=== 0`). **failing_gates:** none.
- **No `structural:*` gate** — `ship-gated` ships **no** eval pair (it is a command-only increment with
no `evals/` and no `findings.json`), so by convention (P5, membership) there is no feature-specific
structural gate, exactly as the `pipeline-integration-probe` (also eval-less) verified on
`{lint, test, validate}`. The trust-fence eval pair belongs to **trust-fence**, not to this feature.
- **Gates are the existing checks — `/verify` invents none.** They are whole-repo (`test` / `validate` /
`lint` re-run with the feature present — the honest "is it green with this in it").
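
The threshold rule stated above (every gate `=== 0`) can be sketched as follows, assuming the `gates` shape from `verify-report.json`. The helper name `verifyVerdict` and the failing verdict string `"FAIL"` are assumptions; the real `floor/check-verify.mjs` is not reproduced in this report.

```javascript
// Hypothetical sketch; not the actual floor/check-verify.mjs.
// Input mirrors "gates" in verify-report.json: { gateName: exitCode }.
function verifyVerdict(gates) {
  const failingGates = Object.keys(gates)
    .filter((name) => gates[name] !== 0)
    .sort();
  return {
    // ASSUMPTION: the non-passing verdict string is "FAIL".
    verdict: failingGates.length === 0 ? "PASS" : "FAIL",
    failing_gates: failingGates,
  };
}

console.log(verifyVerdict({ lint: 0, test: 0, validate: 0 }));
```

The function looks only at exit codes, so no verifier output or model judgment can alter its result, which is the sense in which the floor layer owns the verdict.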

## ADVISORY layer — verifiers

**`node floor/count-verifiers.mjs .` → `{"registered":0,"verifiers":[]}` — no verifiers registered;
floor gates only.** Membership is a deterministic frontmatter read (P5), never a prose grep. No verifier
is authored speculatively (P7); the plug-in slot stays empty until a real one is triggered. With zero
verifiers, no advisory free-text is produced: there is nothing to quote as DATA, and nothing that could
flip the verdict (advisory output never can).

## What this does and does NOT certify (P0/P7 — the honest residual)

- **Certifies:** the named gates (`test`, `validate`, `lint`) passed with `ship.md` in the repo —
deterministically. That is the entire content of "verified."
- **Does NOT certify:** that `ship.md` is **correct** in any sense the suite does not encode.
`ship.md` is **floor-ignored markdown** (`validate` does not parse `.claude/commands/`), so the floor
gates **cannot see its content at all** — they confirm only that _adding it broke none of the existing
deterministic checks_. Whether the orchestrator's **logic** is sound (does it read the right verdict
fields? are the two human gates correctly placed? is the P0 "no new floor primitive" framing honest?)
is **not** a floor signal here — it is exactly what the **advisory `/review` lenses** judge, and
ultimately the human at the post-review gate. _"verified = the named gates passed; this is NOT a
guarantee of correctness beyond what those gates check — verifier concerns are advisory help, not
assurance."_

**Two-clocks:** only the verdict is floor-grade; everything the agent did (running the gates,
assembling the map, writing this report) is advisory orchestration.

**Next:** `/review features/ship-gated/PLAN.md` — the advisory lenses over the built `ship.md` (where
its actual orchestration logic gets scrutinized), then the human's merge/fix/abandon decision.
`/verify` does not invoke `/review`; the exit code `0` decides this stage.
21 changes: 21 additions & 0 deletions features/ship-gated/regression-report.json
@@ -0,0 +1,21 @@
{
  "base": "8063643",
  "inside": [".claude/commands/ship.md"],
  "outside_gates": {
    "structural:expected-injection-comment.json": {
      "base": 0,
      "head": 0
    },
    "tests": {
      "base": 0,
      "head": 0
    },
    "validate": {
      "base": 0,
      "head": 0
    }
  },
  "regressions": [],
  "pre_existing": [],
  "verdict": "no-regressions"
}
14 changes: 14 additions & 0 deletions features/ship-gated/verify-report.json
@@ -0,0 +1,14 @@
{
  "feature": "ship-gated",
  "gates": {
    "lint": 0,
    "test": 0,
    "validate": 0
  },
  "verdict": "PASS",
  "failing_gates": [],
  "verifiers": {
    "registered": 0,
    "findings": []
  }
}