
feat(knowledge): Phase 8 — release process #257

Open
leeovery wants to merge 48 commits into feat/knowledge-base-phase-7 from feat/knowledge-base-phase-8

Conversation

@leeovery
Owner

Summary

Formalises the release pipeline so every tagged release ships a fresh knowledge bundle and runs the full knowledge-base test suite in CI.

  • ./release rebuilds skills/workflow-knowledge/scripts/knowledge.cjs between the dirty-tree gate and the tag. npm install + npm run build run in all version strategies (including none); a bundle commit lands only when the rebuild produces a diff. Build failure aborts before any tag is created. Dry-run path is unchanged — no npm invocation in preview.
  • .github/workflows/release.yml gains npm install (dev deps for unit tests) plus explicit test-workflow-manifest.sh, globbed test-knowledge-*.cjs, and globbed test-knowledge-*.sh runs. Existing migration and discovery tests continue to run. CI does not rebuild the bundle — the committed artifact is authoritative.
  • tests/scripts/test-release-build.sh (new) exercises the release-script build block via a throwaway temp repo with npm/git tag/git push stubs. 23/23 tests green locally.

Stacking

Base: feat/knowledge-base-phase-7. Do not merge to main until the knowledge-base rollout reaches go-live. The 4-PR stack (#254 → #255 → #256 → this) stays intact.

Test plan

  • bash tests/scripts/test-release-build.sh — 23 passed
  • node --test for test-knowledge-{embeddings,store,chunker,config,retry,openai,integration}.cjs — all pass
  • test-knowledge-openai-integration.cjs — skips without OPENAI_API_KEY (expected)
  • ./release -d --no-ai dry-run — npm install / build not invoked, preview output unchanged
  • Tag-push validation — happens after merge via the updated CI workflow
  • CI validates committed bundle without rebuilding — verified at first tag push

Notes: migration tests fail locally only due to a sandbox /tmp write restriction (pre-existing, CI-clean). The test-knowledge-cli.sh Test 11+ failures and the test-knowledge-build.sh bundle-size threshold failure are pre-existing on phase-7 — not caused by this PR.

leeovery added 30 commits April 20, 2026 19:59
Inserts npm install + npm run build + conditional bundle commit into
./release between the dirty-tree gate and version-file update. Runs in
all VERSION_STRATEGY branches, including "none". Build failure aborts
before tagging so no stale bundle ships. Dry-run path unaffected.

Refs tick-7f4725.
Adds npm install before the test step (unit tests need dev deps from
node_modules). Extends the test job with explicit test-workflow-manifest.sh
plus globbed test-knowledge-*.cjs and test-knowledge-*.sh suites.

Does not run npm run build — the committed bundle is authoritative; the
release script rebuilds and commits it pre-tag.

Refs tick-de9b1a.
npm install mutates package-lock.json on any resolvable version change,
bypassing the dirty-tree gate and leaving uncommitted state after the
tag is pushed. npm ci refuses to modify the lockfile and errors if drift
exists — the correct primitive for a release flow that has already
gated on a clean working tree.
Triggered on tag push, so tests can't gate a release — the tag is
already live by the time the job starts. AGNTC installs from git tags
directly, so the GitHub Release page the workflow produces is never
consumed. Both jobs were pure cost with no return. Phase 8 Task 8-2 was
layering more tests onto this dead pipeline; reverting that along with
the pre-existing cruft.
…nits)

cmdList previously used the project manifest registry when populated,
falling back to a filesystem scan only when the registry was empty.
Work units created before the registry existed are therefore invisible
to 'manifest list' — and to every knowledge-base consumer (bulk index,
status, compact). Real-data test: 9 dirs on disk, 1 registered, only
2 files indexed.

Union both sources so downstream tools see every unit that actually
exists. Deduped via Set. Dot-prefix and malformed-manifest handling
unchanged.

Closes deferred-issues #17.
cmdIndex silently stayed in keyword-only mode when the user configured
a provider after an initial stub-mode index — only 'query' and 'status'
warned. On the first index call after the upgrade, emit a one-shot
note to stderr telling the user to run 'knowledge rebuild' to switch to
vector search.

Deduplicated via a module-level flag so bulk indexing of N files does
not spam N identical warnings.

Also fix test isolation: the CLI test suite leaked the developer's real
~/.config/workflows/config.json into tempdir-based tests, breaking
keyword-only scenarios (Test 11 onward) on any machine with a
real knowledge-base setup. Pin HOME/XDG_CONFIG_HOME to a tempdir.

Closes deferred-issues #11.
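A minimal sketch of the one-shot dedup, assuming a module-level flag and helper named here for illustration (the real code's names may differ):

```javascript
// Hypothetical sketch: a module-level flag ensures the stderr note
// fires once per process, however many files a bulk index touches.
let warnedStubUpgrade = false;

function maybeWarnStubUpgrade(stderr = process.stderr) {
  if (warnedStubUpgrade) return false;
  warnedStubUpgrade = true;
  stderr.write(
    "note: a provider is now configured but the index was built in " +
    "keyword-only mode — run 'knowledge rebuild' to switch to vector search\n"
  );
  return true;
}
```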
knowledge remove previously had no retry — a locked store or transient
I/O failure left stale chunks in the index forever and told the user to
retry manually. Indexing already has a pending queue; removals now
mirror it.

Changes:
- metadata.pending_removals tracks failed removals with workUnit/phase/
  topic, attempts counter (cap 10, then evict with stderr warning).
- cmdRemove drains the queue first, attempts the new removal, queues on
  failure.
- cmdCompact drains the queue on entry.
- status surfaces pending removals alongside pending indexes.
- Skill callsites (cancellation, supersession, promotion) updated to
  describe auto-retry instead of 'run knowledge remove manually later'.

Closes deferred-issues #18.
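A sketch of the queue mechanics described above — entry shape, attempts cap, and drain order are taken from the commit message; function and field names are assumptions:

```javascript
// Hypothetical sketch of the pending-removal queue: failed removals are
// queued with an attempts counter and retried on the next drain; entries
// that exhaust the cap are evicted with a warning instead of looping forever.
const PENDING_MAX_ATTEMPTS = 10;

function queueRemoval(metadata, entry) {
  metadata.pending_removals = metadata.pending_removals || [];
  metadata.pending_removals.push({ ...entry, attempts: 0 });
}

function drainPendingRemovals(metadata, performRemoval, warn = () => {}) {
  const remaining = [];
  for (const entry of metadata.pending_removals || []) {
    if (entry.attempts >= PENDING_MAX_ATTEMPTS) {
      warn(`evicting removal for ${entry.workUnit} after ${entry.attempts} attempts`);
      continue; // cap reached: drop with a warning
    }
    try {
      performRemoval(entry);
    } catch (err) {
      remaining.push({ ...entry, attempts: entry.attempts + 1 }); // re-queue, bump counter
    }
  }
  metadata.pending_removals = remaining;
}
```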
…missing path

Two related silent-failure modes rolled into one commit:

MANIFEST_JS resolution (deferred #5): when neither candidate path
existed, the const silently held a non-existent path. First manifest
call ENOENT'd inside a silent try/catch and bulk operations returned
empty. Now throws at module load with both candidate paths — an
installation problem is surfaced immediately, not disguised as 'no
artifacts to index'.

Error swallowing (deferred #4): discoverArtifacts, getWorkUnitMeta, and
the status unindexed-artifact block caught every execFileSync failure
identically. 'Work unit not found' is expected (orphans), but corrupt
JSON / bad work-unit name / missing manifest file were indistinguishable
and silently skipped. Added isManifestKeyNotFound() to recognise
expected misses, reportUnexpectedManifestError() to write everything
else to stderr with the CLI's own message. Behaviour unchanged for the
expected-miss path; real breakage is now visible.

Closes deferred-issues #4, #5.
cmdRebuild previously deleted store.msp + metadata.json before running
bulk index. If bulk threw (network outage, OpenAI down, Ctrl-C), the
user was left with no store and no metadata — a transient failure
turned into total data loss.

Rename to .bak before the wipe, run bulk index, delete .bak on success,
restore .bak on failure. Leftover .bak from prior aborts is cleaned at
rebuild entry. If rollback itself fails, stderr prints both errors and
the .bak paths so the user can recover manually.

Closes deferred-issues #3.
Items in metadata.pending that fail every catch-up retry stayed in the
queue forever — a permanent failure (renamed work unit, malformed file)
would cost 3 OpenAI calls per item per bulk-index run, indefinitely.

Each entry now carries an attempts counter. processPendingQueue checks
the counter at entry and evicts with a stderr warning once attempts
reach PENDING_MAX_ATTEMPTS (10). The catch block that previously
swallowed the retry failure now writes the failure back to the queue,
bumping the counter so eviction eventually fires.

status output updated to show attempt/max so users can see an item on
the verge of eviction.

Closes deferred-issues #2.
Design doc specifies a 30s lock timeout; implementation had 10s — minor
correctness concern under sustained concurrent phase-completion
indexing (multiple Claude sessions writing artefacts at once). Align
with design. LOCK_STALE_MS was already 30s.
…OCTOU)

indexSingleFile computes embeddings with effectiveProvider.dimensions()
BEFORE acquiring the store lock. If a concurrent rebuild rewrote the
store schema with different dimensions between embed and insert,
insertDocument would fail (wrong vector width) or corrupt the index.

Rare in single-developer use, but Phase 5+ skill orchestration can
run multiple index/rebuild flows back to back.

Inside the lock, after reloading the store, compare the reloaded
metadata's dimensions to what we embedded with. On mismatch, throw —
the withRetry wrapper at the CLI entry re-runs from the top with
fresh metadata and a fresh embedBatch against the new schema.

Closes deferred-issues #1.
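The post-lock guard reduces to a single comparison — sketched here with assumed names; the real check reads the reloaded metadata inside the lock:

```javascript
// Hypothetical sketch of the TOCTOU guard: the embedding was computed
// before the lock, so after reloading the store under the lock we
// compare dimensions and throw on mismatch. The CLI-level retry wrapper
// then re-runs from the top against the new schema.
function assertDimensionsUnchanged(embeddedDims, reloadedMetadata) {
  if (reloadedMetadata.dimensions !== embeddedDims) {
    throw new Error(
      `store dimensions changed from ${embeddedDims} to ` +
      `${reloadedMetadata.dimensions} between embed and insert; retrying`
    );
  }
}
```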
…fig null unset

Three small correctness/style fixes bundled:

- OpenAIProvider.embed() now throws a descriptive error if OpenAI
  returns an empty data array instead of a cryptic TypeError reading
  res.data[0].embedding. Unlikely in practice but costs nothing to
  guard. (deferred #13)

- OpenAIProvider.embedBatch() no longer mutates res.data in place.
  Spread before sort. (deferred #12)

- config.loadConfig() now treats explicit null in system/project config
  as an unset sentinel, letting a project config clear a system-level
  default. (deferred #14)

Closes deferred-issues #12, #13, #14.
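The null-as-unset rule from deferred #14 can be sketched as a layer merge — config shape and helper name are assumptions:

```javascript
// Hypothetical sketch: an explicit null in a later config layer clears
// the value a lower layer set, instead of being carried through as a
// literal null. Project config can thus unset a system-level default.
function mergeConfigLayers(...layers) {
  const merged = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer || {})) {
      if (value === null) delete merged[key]; // explicit null = unset sentinel
      else merged[key] = value;
    }
  }
  return merged;
}
```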
…g-queue fail, pause stdin after rebuild confirm

Three targeted internal cleanups:

- withRetry now lets TypeError / ReferenceError / SyntaxError through
  immediately instead of waiting 7s of backoff. A typo shouldn't burn
  retry budget before the real trace reaches the developer.
  (deferred #9)

- cmdIndexBulk catch now writes err.stack to stderr after the summary
  line. Debug info is no longer lost just because the user was piping
  stdout. (deferred #10)

- cmdRebuild's readStdinLine now pauses stdin after consuming the
  confirmation line. Irrelevant for the CLI but prevents a library
  consumer from keeping the event loop alive on an idle stream.
  (deferred #16)

Closes deferred-issues #9, #10, #16.
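The retry-bypass rule from deferred #9 amounts to a class check before re-queuing — a simplified sketch, with the real withRetry's backoff and signature elided:

```javascript
// Hypothetical sketch: programmer errors propagate immediately instead
// of consuming retry budget; everything else retries up to the limit.
const NON_RETRYABLE = [TypeError, ReferenceError, SyntaxError];

async function withRetry(fn, attempts = 3) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (NON_RETRYABLE.some((cls) => err instanceof cls)) throw err; // bail now
      lastErr = err; // transient: retry (real code backs off here)
    }
  }
  throw lastErr;
}
```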
…ack to text

Two retrieval-layer fixes:

- cmdStatus used to shell out to 'manifest get' per spec topic for the
  superseded-spec consistency check — N processes for N specs, ~5s on
  50-spec repos. Now loads the full manifest tree once via 'manifest
  list' and resolves topic statuses in memory. Same behaviour, O(1)
  processes. (deferred #6)

- searchHybrid previously returned zero hits when Orama's similarity
  post-filter masked a query with no strong vector matches, even when
  BM25 would have found good text matches. Fall back to a fulltext
  search before returning empty. Search results can only improve.
  (deferred #15)

Closes deferred-issues #6, #15.
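The #15 fallback shape, sketched with stand-ins for the real Orama-backed search helpers:

```javascript
// Hypothetical sketch: if the hybrid path comes back empty — e.g. the
// similarity post-filter masked every vector hit — fall back to a plain
// fulltext search before returning nothing. Results can only improve.
async function searchWithFallback(hybridSearch, fulltextSearch, query) {
  const hits = await hybridSearch(query);
  if (hits.length > 0) return hits;
  return fulltextSearch(query);
}
```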
…ng changes

report_update was called unconditionally after the node block, even
when zero manifests were modified. The orchestrator's counter (and the
'review changes' prompt it gates) now fires only on real updates. The
node pass exits 2 on no-op; the bash wrapper routes that to
report_skip instead.

Also made the rc capture robust under the orchestrator's 'set -eo
pipefail' — previously a non-zero node exit aborted the orchestrator
at the rc=$? line.

Closes deferred-issues #8.
…oc-only

Closes the deferred-issues ledger for the knowledge-base pre-merge
cleanup pass. 16 of 18 items fixed in code across earlier commits on
this branch; #7 ('--work-unit' as boost not filter) stays open as a
deliberate design choice already documented in SKILL.md.
… withdrawn)

The Tick data that motivated deferred-issues #17 (9 dirs on disk, 1 in
registry) was a one-off project-manifest corruption, not a systemic
issue. Migration 031 already populates the registry from the filesystem
on first run, and work units created via 'manifest init' register
atomically. The registry is the authoritative source — a filesystem
scan papering over a corrupted registry masks real problems.

Reverts the cmdList union added in bbe53a4. Test covering the union
case is dropped. deferred-issues.md entry updated to reflect the
misdiagnosis and withdraw the fix.
The knowledge CLI resolves the system config path via os.homedir(),
which only honours $HOME. XDG_CONFIG_HOME was dead weight.
…ferred #8

withRetry now bails on RangeError alongside TypeError/ReferenceError/
SyntaxError. Bad Math.min args, invalid array sizes, and similar
programming errors shouldn't burn 7s of retry budget.

Also withdraws deferred-issues #8 (migration 036/035 counter). Fixing
migrations retroactively only helps users who haven't yet run them —
the migration id is recorded in .workflows/.state/migrations after
first run and never re-executed. Snapshot principle > one-time
counter-reporting accuracy.
Eliminates the stderr-regex fragility in the knowledge-base helpers that
classify manifest CLI failures. manifest.cjs now uses:

  exit 0 — success
  exit 1 — real error (corrupt JSON, validation failure, bad args, I/O)
  exit 2 — expected miss (work unit not found, path not found, value
           not found in list, project manifest not found)

die() takes an optional code (default 1 preserves existing behaviour).
10 'not found' call sites opt in to code 2.

Knowledge helpers (discoverArtifacts, getWorkUnitMeta, cmdStatus
unindexed-artifact block) check err.status === 2 instead of matching
/not found|does not exist/i in stderr. Any future reword of a die()
message no longer risks flipping expected↔unexpected classification.

Test suite: 2 existing assertions updated to expect exit 2 on
not-found paths; new test exercises the 1-vs-2 contract across
missing work unit, missing path, invalid work-type, and corrupt JSON.

Closes deferred-issues #4 (properly this time).
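Both sides of the contract fit in a few lines — a sketch, with the caller-side check relying on execFileSync surfacing the child's exit code as err.status:

```javascript
// Hypothetical sketch of the exit-code contract. Exit 1 stays the
// default for real errors; 'not found' sites opt in to exit 2.
function die(message, code = 1) {
  process.stderr.write(message + '\n');
  process.exit(code);
}

// Caller side: classify by status instead of grepping stderr text, so a
// reworded die() message can never flip expected<->unexpected.
function isExpectedMiss(err) {
  return Boolean(err) && err.status === 2;
}
```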
The flag --work-unit meant different things on different commands:

  knowledge query  --work-unit X → re-rank BOOST (results from other
                                    work units still appear)
  knowledge remove --work-unit X → FILTER (only X's chunks are removed)

Same spelling, opposite semantics. Docs explained it, but the flag
name itself pattern-matched --phase / --topic / --work-type — all
hard filters — so users naturally assumed query's --work-unit was a
filter too.

Split:
- query now takes --prefer-work-unit (re-rank boost). The --prefer-
  prefix makes boost semantics obvious at the call site and cannot be
  confused with a filter.
- remove keeps --work-unit (filter). Consistent with the other filter
  flags, which is how remove uses them anyway.

No command has an overloaded flag. Skill files (contextual-query.md,
knowledge-usage.md, SKILL.md) updated to use --prefer-work-unit.
Test updated. Ledger #7 resolved properly this time.
Makes the query-command CLI fully orthogonal:

  Filters (hard; non-matching chunks excluded):
    --work-unit  --work-type  --phase  --topic

  Re-ranking (additive, +0.1 per match, repeat for multi-dim):
    --boost:<field> <value>
      valid fields: work-unit, work-type, phase, topic, confidence

Replaces the prior split where --work-unit was a boost on query but a
filter on remove, and the short-lived --prefer-work-unit from this
stack's last attempt.

Skill templates can now compose multi-dimensional bias in a single
invocation that wasn't expressible before, e.g.

  knowledge query "foo" --boost:work-unit {wu} --boost:phase research

which surfaces prior research across work units while nudging results
toward the current one.

Parser: anything matching --boost:<field> is collected into an ordered
list; validation runs at command entry and fails fast on unknown field
or missing value. Skill templates don't silently no-op on typos.

Schema mapping: CLI uses kebab-case (--boost:work-unit) consistent with
the existing filter flags; mapped to the snake_case schema field
(work_unit) internally.

Callers updated: SKILL.md (flag table + guidance), contextual-query.md,
knowledge-usage.md. Four new tests cover: boost ordering vs cross-wu
context, --work-unit as filter, unknown-field error, missing-value
error.
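The parser described above can be sketched as follows — validation details and the helper name are assumptions, but the collect-in-order, fail-fast, kebab-to-snake behaviour is as the commit states:

```javascript
// Hypothetical sketch of the --boost:<field> parser: boosts are
// collected in order, unknown fields and missing values fail fast so
// skill templates can't silently no-op on typos, and kebab-case CLI
// names map to the snake_case schema fields.
const BOOST_FIELDS = ['work-unit', 'work-type', 'phase', 'topic', 'confidence'];

function parseBoosts(argv) {
  const boosts = [];
  for (let i = 0; i < argv.length; i++) {
    const match = /^--boost:(.+)$/.exec(argv[i]);
    if (!match) continue;
    const field = match[1];
    if (!BOOST_FIELDS.includes(field)) {
      throw new Error(`unknown boost field: ${field}`);
    }
    const value = argv[i + 1];
    if (value === undefined || value.startsWith('--')) {
      throw new Error(`--boost:${field} requires a value`);
    }
    boosts.push({ field: field.replace(/-/g, '_'), value });
    i++; // consume the value
  }
  return boosts;
}
```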
Orama and @msgpack/msgpack both publish ESM dists with
sideEffects: false. Previously our esbuild config defaulted to the
'require' condition (because we emit CJS output), pulling the CJS
barrel files that prevent meaningful tree-shaking.

Adding:
  conditions: ['node', 'import']
  mainFields: ['module', 'main']

forces ESM resolution for dependencies while keeping our own source
(which still uses require()) and the output format (CJS) unchanged.
esbuild handles the ESM→CJS conversion internally.

Measured: 175.9 KB → 154.1 KB. 141 CLI tests + 198 Node tests green
against the new bundle.
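In esbuild-config terms, the change is two resolution options — a sketch with the entry point and outfile as assumptions, other options elided:

```javascript
// Hypothetical sketch of the build options. 'conditions' and
// 'mainFields' steer dependency resolution toward ESM dists (enabling
// tree-shaking of sideEffects:false packages) while the output format
// stays CJS; esbuild converts ESM->CJS internally.
const buildOptions = {
  entryPoints: ['src/knowledge/index.js'],   // assumed path
  bundle: true,
  platform: 'node',
  format: 'cjs',                             // output unchanged
  conditions: ['node', 'import'],            // prefer ESM export conditions
  mainFields: ['module', 'main'],            // prefer 'module' entry points
  outfile: 'skills/workflow-knowledge/scripts/knowledge.cjs',
};
```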
Added while auditing the ESM-resolution build change. Imports the built
bundle directly and exercises every Orama API path we use:

- create store with vector schema
- insert (100 docs across 3 work units × 3 phases)
- searchFulltext with where filter
- searchVector with similarity threshold
- searchHybrid normal path
- searchHybrid → fulltext fallback (the #15 fix)
- removeByFilter
- saveStore / loadStore (MsgPack round-trip)
- metadata write/read
- all search modes on the loaded store

Not wired into the automated suite — run ad hoc when verifying
bundle-level changes. Caught one bogus assertion in my own earlier
smoke pass (Orama does not echo embedding fields on search hits by
design) and confirmed no behaviour change between CJS- and
ESM-resolved bundles.
ESM-resolution build landed the bundle at 154 KB — still over the
previous 150 KB test threshold (which was already failing: it was set in
phase 1 when the bundle was smaller, and a dependency drift tripped it).
Bump to 175 KB with ~20 KB headroom to catch real regressions without
chasing a threshold that had no principled source.

Looked at post-minification terser, esbuild mangleProps/legalComments/
drop, and a self-extracting brotli wrapper. Terser: 2 KB (not worth a
dev dep). esbuild knobs: 0.3 KB (noise). Brotli wrapper: 97 KB saved,
byte-identical, +2–7ms startup tax — elegant but solves a problem we
do not have. Keeping the plain ESM bundle.
The pending-removal queue (deferred-issue #18 fix, commit b5c9430)
shipped broken. writeMetadata whitelists a fixed 5-field schema and
pending_removals was never added — every write silently stripped the
queue. addPendingRemoval wrote the entry, next readMetadata+writeMetadata
cycle dropped it, processPendingRemovals found nothing to drain.

Automatic retry on failed remove (cancellation / supersession /
promotion paths) therefore never happened. Stale chunks from failed
removes persisted forever, contradicting the user-facing promise in
manage-work-unit.md, spec-completion.md, promote-to-cross-cutting.md.

Changes:
- Add pending_removals to writeMetadata's whitelist (store.js:443-444).
- Update the whitelist comment with a loud warning that missing a field
  silently breaks every downstream feature using it.
- METADATA_FIELDS constant updated for consistency.
- Test 81 rewritten: Part A is now a true regression guard — seeds a
  pending_removals entry, runs an index (which triggers writeMetadata),
  asserts the entry survives. This assertion fails on the pre-fix code,
  confirmed by temporarily reverting the whitelist. Part B retains the
  drain-the-queue check.

All existing tests still pass (142/142 CLI, integration, smoke).
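The failure mode in miniature — field names mirror the commit message, not necessarily the real METADATA_FIELDS:

```javascript
// Hypothetical sketch of the whitelist: any field missing from this
// list is silently dropped on every write — which is exactly how
// pending_removals was stripped before the fix.
const METADATA_FIELDS = [
  'provider', 'model', 'dimensions', 'pending',
  'pending_removals', // the field this fix adds; omitting it silently
                      // breaks every downstream feature using the queue
];

function writeMetadata(metadata) {
  const out = {};
  for (const field of METADATA_FIELDS) {
    if (field in metadata) out[field] = metadata[field];
  }
  return out; // the real implementation serialises this to metadata.json
}
```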
knowledge remove --dry-run accepted the flag but performed a real
deletion. Help text and buildOptions advertised the preview semantics;
only cmdCompact implemented them. Running 'knowledge remove --work-unit X
--dry-run' silently dropped chunks as if the flag were absent.

Changes:
- src/knowledge/store.js: added countByFilter(db, where) — same query
  shape as removeByFilter so dry-run count matches what a real run
  would remove. Exported.
- src/knowledge/index.js cmdRemove: observe dryRun at the top, count-
  only path writes 'Would remove N chunks for <desc>' and exits. Does
  NOT drain the pending-removal queue either — dry-run is strictly
  observational.
- Usage line updated to mention --dry-run.
- Regression test (Test 38b): seeds chunks, runs dry-run, asserts
  output says 'Would remove' not 'Removed', chunk count unchanged;
  then follows with a real remove proving the non-dry path still
  works. Confirmed the three dry-run assertions fail on the pre-fix
  code by reverting the branch and re-running.

146/146 CLI tests pass. Smoke + integration green.
leeovery added 18 commits April 21, 2026 21:40
embedBatch assumed res.data.length === requested length. If OpenAI
returned fewer rows (rare but not impossible — partial availability,
certain rate-limit interactions), results[offset + i] for missing
indices stayed undefined. Downstream: doc.embedding = undefined,
Orama inserted the chunk with no embedding field, chunk silently
degraded to keyword-only with no warning.

embed() had this guard already (deferred-issue #13); embedBatch was
missed.

Added length check after each _fetch in both branches of embedBatch:
- Single-batch (texts.length <= MAX_BATCH_SIZE): validate
  res.data.length === texts.length.
- Chunked (>MAX_BATCH_SIZE): validate per slice.
Error message includes requested vs received counts and chunk offset
when applicable.

Tests: two new cases in test-knowledge-openai.cjs — short response
(2 rows for 3 inputs) and missing data array both assert rejection
with /response length mismatch/ pattern. Confirmed both fail on
pre-fix code by reverting the guards.
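The guard itself is a small check after each _fetch — sketched here with an assumed helper name; the response shape ({ data: [{ embedding: [...] }] }) follows the OpenAI embeddings API:

```javascript
// Hypothetical sketch of the response-length guard: a short or missing
// data array now throws with requested vs received counts (and chunk
// offset where applicable) instead of leaving undefined embeddings that
// silently degrade chunks to keyword-only.
function assertBatchLength(res, requested, offset = 0) {
  const received = Array.isArray(res && res.data) ? res.data.length : 0;
  if (received !== requested) {
    throw new Error(
      `embedding response length mismatch: requested ${requested}, ` +
      `received ${received} (chunk offset ${offset})`
    );
  }
}
```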
…exists

cmdCheck only verified config.json's existence — a corrupted file
passed the check and the user only saw a cryptic JSON parse error
later on index or query. Worse, skills parse check output to gate
their flow; 'ready' on corrupt config meant the skill then invoked a
downstream command that crashed.

Added Condition 2b between the existence check and the store check:
attempt config.readConfigFile() (the authoritative load path — it
already throws with descriptive errors for JSON parse failures,
missing top-level 'knowledge' key, wrong type). On throw: write the
specific diagnostic to stderr, output 'not-ready' on stdout, return.

Tests: three new cases.
- Test 37b: invalid JSON → not-ready + 'config error' on stderr.
- Test 37c: missing 'knowledge' key → not-ready + diagnostic mentions
  the key.
- Test 37d: corrupted config over an otherwise-healthy KB flips check
  from 'ready' to 'not-ready' — the strongest guard since it isolates
  the config path from 'fails for other reasons'.

Confirmed all three fail on pre-fix code. 153/153 CLI pass. Smoke +
integration green.
User-visible validation failures previously threw plain Error, so the
top-level main().catch printed the full stack trace to stderr. Ten
lines of Node internal frames followed every 'bad path' or 'provider
mismatch' message — user-hostile for what are normal validation errors.

New UserError class sits alongside existing helpers in index.js. Three
contracts:
1. Thrown at input-validation sites only (bad path, unindexed phase,
   provider/model mismatch, missing chunking config, missing work unit
   manifest, empty-file refusal).
2. withRetry does not retry UserError (same treatment as TypeError et
   al — retrying a permanent user-input failure is waste).
3. main().catch prints 'Error: <message>' alone for UserError; full
   stack for anything else (likely a real bug).

Converted throw sites:
- deriveIdentity: 7 throws (path mismatch, invalid wu/topic, phase not
  indexed, per-phase subpath validation).
- readWorkType: 2 throws (manifest missing, work_type field missing).
- resolveProviderState: 2 throws (index-side provider mismatch).
- resolveQueryMode: 2 throws (query-side provider mismatch).
- indexSingleFile: 2 throws (missing chunking config, empty file).

Kept as plain Error:
- Store schema TOCTOU throw in indexSingleFile (intentionally retryable
  via withRetry).

UserError exported from module.exports for downstream consumers.

Tests: Test 14 extended to assert no stack frames on stderr for an
unindexed-phase index invocation. New Test 14b covers the non-.workflows
path case. Confirmed both stack-suppression assertions fail on pre-fix
code (reverted the catch-handler branch, re-ran, saw FAIL on both).

157/157 CLI tests, smoke, retry, integration, store all green.
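The class and the top-level contract together are a few lines — a sketch, with the reporting helper named here for illustration:

```javascript
// Hypothetical sketch of the UserError contract: a marker class for
// input-validation failures, printed without a stack trace at the top
// level; anything else keeps its full stack (likely a real bug).
class UserError extends Error {}

function reportError(err, stderr = process.stderr) {
  if (err instanceof UserError) {
    stderr.write(`Error: ${err.message}\n`);          // message only
  } else {
    stderr.write((err.stack || String(err)) + '\n');  // full stack
  }
}
```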
…card

Orama treats an empty query term as 'match everything' and returns up
to --limit arbitrary chunks. Previously 'knowledge query ""' silently
succeeded with 10 unrelated hits — almost always a caller bug
(unsubstituted template variable, accidental empty positional) rather
than an intentional 'give me anything' request.

cmdQuery now rejects any positional term that is empty or whitespace-
only with a UserError. Error message suggests 'knowledge status' for
users who actually wanted to list what's indexed.

Tests:
- Test 23c: empty, whitespace-only, and valid positional terms.
  Asserts non-zero exit + error message surfaces for empty/ws,
  zero exit for valid.
- Test 38b updated: replaced query "" chunk-counting calls with
  query "content" (term appears in the fixture).

Three regression guards confirmed to fail on pre-fix code. 161/161
CLI tests, smoke, integration all green.

Note: other 'exit 0 on error' cases Agent 4 flagged (query no-args,
unknown command, rebuild abort, unknown --boost:<field>) were verified
to already exit 1 correctly — they were either fixed incidentally by
earlier commits in this cleanup pass or were stale claims.
knowledge remove --work-unit <typo> silently succeeded with 'Removed 0
chunks' and exit 0 — a fat-finger was indistinguishable from a
legitimate no-op. Skills that pass a template-substituted name had no
way to detect that the variable resolved wrong.

cmdRemove now performs a project-registry lookup (manifest project get
<wu>) at entry, before the store-existence check. Registry miss
(manifest exits 2) → UserError with the wu name and a hint to run
knowledge status. Any other manifest error is rethrown with stack.

The check lives in cmdRemove only, NOT performRemoval. The pending-
removal queue's drain path calls performRemoval directly for entries
whose wu may have been deleted from the registry after the failure was
queued (e.g. absorption). Those still need to clean up chunks that
remain in the store.

Verified every skill flow that calls knowledge remove (cancel, supersede,
promote, absorb-into-epic, epic-topic-cancel) invokes it while the wu is
still in the registry — so no caller breaks.

Tests: Test 41 flipped from 'reports 0 removed' for nonexistent wu to
'exits non-zero + error names wu + mentions project manifest'. Test 44
updated to create the WU first before asserting empty-store no-op.
Three guard assertions confirmed to fail on pre-fix code.

165/165 CLI, smoke, integration all green.
Changing the config (e.g. 128-dim stub → 1536-dim OpenAI) and running
'knowledge index' silently 'succeeded' with 'N already indexed.' The
preflight isIndexed() check short-circuited per file before anything
could notice the mismatch, so the existing store kept its old dims
while the config drifted. 'query' surfaced the mismatch cryptically
later; 'index' fooled the user into thinking nothing needed doing.

Added a preflight resolveProviderState() call at the top of
cmdIndexBulk, right after ensuring the knowledge dir. If metadata
exists and diverges from current cfg/provider, the existing UserError
('Run knowledge rebuild to reindex') is thrown — same message as
query, no stack noise thanks to the centralised UserError handling.
First-ever bulk index (no metadata yet) skips the check naturally.

Single-file cmdIndex already surfaces the mismatch through
indexSingleFile → resolveProviderState, unaffected by this change.
performRemoval and the pending-removal drain path are unaffected.

Tests: Test 22b added after the query-mismatch test (22), same
scenario flipped to bulk index. Three assertions (exits non-zero,
error mentions rebuild, no 'already indexed' in output) all confirmed
to fail on pre-fix code by reverting the preflight block.

168/168 CLI tests, smoke, integration, store all green.
…usage

Both SKILL.md files had a signpost blockquote at their knowledge-usage
load step saying 'proactive querying is available' — directly
contradicting knowledge-usage.md Section E, which says 'do not query
during planning' and 'rare in practice' for implementation.

Signposts render first; a model reads the invitation before loading
the reference that restricts it. Rewrote both to match the guidance:

- Planning: 'Planning operates from the spec as the golden source —
  the guide documents the narrow cases where a KB query is warranted,
  and those where it is not.'
- Implementation: 'Implementation reads the code as the source of
  truth for *what* exists — the guide documents the rare cases where
  the KB is useful for the *why* behind an existing pattern.'

Prose-only change. No code, no tests touched. 168/168 CLI + smoke
green (sanity).
CLAUDE.md defines exactly one form for terminal stops:
  **STOP.** Do not proceed — terminal condition.

Nine sites used the non-canonical 'This is terminal — do not return to
the caller/backbone.' or 'the invoked skill takes over.' These risk a
model continuing past the handoff because the sentence doesn't parse
as a stop gate by convention.

Agent 1's audit flagged 2 sites (manage-work-unit.md, absorb-into-
epic.md); grep across skills/ found 7 more with the identical pattern.
All 9 replaced with 'Invoke … <skill>.\n\n**STOP.** Do not proceed —
terminal condition.'

Files changed:
- continue-bugfix/SKILL.md
- continue-cross-cutting/SKILL.md
- continue-feature/SKILL.md
- continue-quickfix/SKILL.md
- workflow-discussion-entry/references/display-options.md
- workflow-scoping-process/references/complexity-check.md (2 sites)
- workflow-start/references/absorb-into-epic.md
- workflow-start/references/manage-work-unit.md

Prose-only. 168/168 CLI + smoke green. Final grep confirms zero
non-canonical phrases remain.
cfg.similarity_threshold || 0.8 silently overrode an explicit 0 with
the default — classic JS footgun where 0 is legitimate data but gets
coerced as falsy. Setting similarity_threshold=0 means 'accept all
vector matches, no filtering', a valid config the user was blocked
from using.

Changed to ?? (nullish coalescing). Only null/undefined fall back to
0.8; explicit 0 is now respected.

Checked adjacent || fallbacks — dimensions, limit, provider, model —
all have string/positive-integer semantics where 0 is meaningless.
Kept them as-is. Only similarity_threshold had the real zero-is-valid
case.

168/168 CLI tests, smoke, store, integration, config all green.
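The footgun in miniature (function names are illustrative):

```javascript
// || treats 0 as absent; ?? only falls back on null/undefined.
function thresholdWithOr(cfg)      { return cfg.similarity_threshold || 0.8; } // the bug
function thresholdWithNullish(cfg) { return cfg.similarity_threshold ?? 0.8; } // the fix
```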
Migration 037 is on phase-5+ only — never deployed to main, never
present in any project's migration log. So editing it now actually
helps the first real user who runs the post-merge AGNTC update.
Contrast with migrations 031-036 which are deployed and off-limits.

Two issues fixed:

1. report_update was called unconditionally on node-exit-0, even when
   zero work units needed backfilling. Now: the node block counts
   modifications and exits 2 when nothing ran; the bash wrapper
   routes exit 2 to report_skip. Keeps the orchestrator's
   FILES_UPDATED counter honest and prevents the false 'review
   changes' prompt on a no-op run.

2. node stderr was being swallowed via 2>/dev/null. A future bug
   inside the node script would leave the migration silently
   skipping. Removed the redirect — any stderr now passes through
   to the orchestrator. Verified manually by injecting a typo:
   ReferenceError stack is now visible.

Wrapper pattern 'node … && rc=0 || rc=$?' chosen so the non-zero
exit-2 doesn't trip 'set -eo pipefail' in the orchestrator (same
pattern I built for 036 before that fix was withdrawn on the
deployed-migration grounds).

Tests: test-migration-037.sh harness updated — report_update /
report_skip stubs now record which was called via $REPORT_CALLED.
Three new tests:
- test_counter_reports_update_when_modified
- test_counter_reports_skip_when_nothing_to_do
- test_counter_reports_skip_on_empty_workflows

Confirmed the two 'should skip' assertions fail on pre-fix code by
reverting the wrapper temporarily. 16/16 migration-037 tests pass
on the fix. 168/168 CLI + smoke unaffected.

Every other entry-point skill's Step 0.2 (start-bugfix, start-feature,
start-epic, start-cross-cutting, continue-*, workflow-migrate-entry)
wraps the migration invocation in a gate:

  #### If the /workflow-migrate skill has already been invoked in
      this conversation
  → Proceed to Step 0.3.
  #### Otherwise
  [invoke migrations]

workflow-start ran migrations unconditionally. In normal flow it is
always the first entry-point so the gate never fires either way, but
the drift is real: a future maintainer reading workflow-start vs any
other entry-point sees inconsistent patterns and has to guess which
is canonical. Aligned with the convention.

Prose only. 168/168 CLI + smoke green.

When a completed work unit was reactivated via view-completed.md, the
status flipped to in-progress but completed_at was left behind — the
manifest then carried a completion date for an in-progress work unit.

Cosmetic today (nothing reads completed_at except compact, which
gates on status first), but the pair is self-contradictory data and
breaks if any future feature keys on completed_at.

Added an exists-check + conditional delete in the 'was completed'
branch. completed_at is backfilled by migration 036/037 for normal
flows, but a completed work unit with no artifact files skips backfill,
so check before deleting rather than accepting an error.
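The branch reduces to a few lines — a sketch with a hypothetical helper name and manifest shape, not the actual skill-file wording:

```javascript
// Reactivate a completed work unit: flip the status back and drop the
// now-contradictory completion date. The exists-check matters because
// a completed unit with no artifact files never had completed_at
// backfilled, and deleting a missing key must not be treated as an error.
function reactivate(unit) {
  unit.status = 'in-progress';
  if ('completed_at' in unit) {
    delete unit.completed_at;
  }
  return unit;
}
```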

Prose-only change in a skill file.

knowledge --help previously wrote USAGE to stderr and exited 1. Scripts
probing the CLI treat that as a failure. Informational help should
exit 0 by convention.

Handled before arg parsing so the help paths are cheap and independent
of other flag logic:

  if (rawArgs.includes('--help') || rawArgs.includes('-h') || rawArgs[0] === 'help') {
    process.stdout.write(USAGE + '\n');
    process.exit(0);
  }

No-args (user forgot a command) stays stderr + exit 1 — that is still
a usage error.

USAGE string updated with a '--help, -h' line. SKILL.md Invocation
section notes both the help forms (stdout/0) and the no-args form
(stderr/1).

Verified all four paths empirically: --help, -h, help → stdout=1073B,
stderr=0, exit 0. No-args → stdout=0, stderr=1073B, exit 1.
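The full stream/exit contract can be captured as a tiny pure function — a sketch with a hypothetical name, mirroring the guard above plus the no-args path, not the real CLI code:

```javascript
// Informational help → stdout + exit 0; no args → usage error on
// stderr + exit 1; anything else falls through to normal arg parsing.
function classifyInvocation(rawArgs) {
  if (rawArgs.includes('--help') || rawArgs.includes('-h') || rawArgs[0] === 'help') {
    return { stream: 'stdout', exitCode: 0 };
  }
  if (rawArgs.length === 0) {
    return { stream: 'stderr', exitCode: 1 };
  }
  return null; // a real command
}
```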

cmdRebuild's abort message wrote 'Aborted.\n' with no leading newline,
so on an EOF-without-newline at the prompt the message collided with
whatever the user had typed. Added a leading newline. Exit code stays
1 (already correct). Purely cosmetic.

…uting

Scoping was the only exploratory phase missing a structured Contextual
Query step. knowledge-usage.md Section E explicitly invites scoping to
'query throughout' (same posture as discussion/investigation), and the
existing Step 2 nudge says 'query the knowledge base while gathering
context' — but with no structural query step, the prompt was harder
for Claude to action. Added Step 3 between Gather Context and
Complexity Check, modelled on workflow-investigation-process Step 4.
Subsequent steps renumbered 3→4, 4→5, 5→6, 6→7, 7→8.

Also fixed a pre-existing bug in Step 0's resume routing: when a spec
existed but the plan was incomplete, the comment said 'resume from
format selection' but the link routed to Step 1 (Knowledge Usage),
not the format-selection step (was Step 5, now Step 6). Updated the
link to actually go to format selection.

No external skill references the scoping step numbers — confirmed via
grep across skills/. Markdown-only change.

withRetry skips retry for TypeError, ReferenceError, SyntaxError,
RangeError, and UserError — the guard added incrementally across
deferred-issue #9 (commit f984476), Important #7's UserError
introduction, and the RangeError addition. Behaviour was untested:
a future refactor reversing the guard would burn 7s of retry budget
on permanent failures with no test catching it.
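The guard under test looks roughly like this — a self-contained sketch, not the real withRetry (the real signature, retry budget, and backoff differ, and the project-specific UserError class is omitted here):

```javascript
// Programmer errors are permanent, so rethrow immediately instead of
// burning the retry budget on them.
const NON_RETRYABLE = [TypeError, ReferenceError, SyntaxError, RangeError];

async function withRetrySketch(fn, attempts = 3) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (NON_RETRYABLE.some((C) => err instanceof C)) throw err;
      lastErr = err;
    }
  }
  throw lastErr;
}
```

The tests pin exactly this shape: a non-retryable class produces one call and the original error; anything else is retried up to the budget.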

Five new cases in test-knowledge-retry.cjs, one per error class.
Each throws the specific error type, then asserts exactly one call was
made and that the original class survives. Confirmed all five fail on
pre-guard code (observed 'actual: 3, expected: 1' — 3 attempts
instead of 1).

11/11 retry tests now pass.

Empirical probing of Orama's hybrid mode shows the deferred-issue #15
concern doesn't manifest. With a flat or poor-quality vector and a
tight similarity threshold (e.g. 0.99), hybrid still returns BM25-
driven hits — text matches come through regardless. The only way to
get zero hybrid hits is when the term itself doesn't match anything,
at which point a fulltext fallback also returns zero.

The defensive fallback added in commit 33303da was therefore dead
code: in every realistic scenario the original hybrid call already
returns the right thing. Removing it.

deferred-issues.md #15 reclassified RESOLVED → WITHDRAWN with the
empirical evidence and a pointer to the brief commit history.

kb-smoke.cjs: relabelled the previous '#15 fallback' check. The
assertion still holds — hybrid surfaces BM25 hits even with weak
vector matches — but it now describes what's actually being proven
(Orama hybrid handles this natively) rather than the now-removed
fallback path.

168/168 CLI + smoke + store + integration green.

…how batch example

The reference contradicted itself. Opening prose said 'a single
targeted query' / 'One query, one interpretation step', then later
prose pointed at the batch idiom. The example showed only the single
form. Skill files learn the shape from the example, so Claude
defaulted to single-term queries even when the calling phase had
multiple distinct angles to query.

- Opening reframed: 'a focused query' / 'One invocation, one
  interpretation step' (covers both single and batch).
- New paragraph naming when batch is appropriate (investigation has
  symptoms + subsystem; research/discussion/scoping usually have
  thinner starting context, single is fine).
- Example block now shows both forms — single first, batch second
  with three positional placeholders.

Prose only, single file.