
feat(knowledge): Phase 8 — release process #257

Open
leeovery wants to merge 48 commits into feat/knowledge-base-phase-7 from feat/knowledge-base-phase-8

Conversation

@leeovery
Owner

Summary

Formalises the release pipeline so every tagged release ships a fresh knowledge bundle and runs the full knowledge-base test suite in CI.

  • ./release rebuilds skills/workflow-knowledge/scripts/knowledge.cjs between the dirty-tree gate and the tag. npm install + npm run build run in all version strategies (including none); a bundle commit lands only when the rebuild produces a diff. Build failure aborts before any tag is created. Dry-run path is unchanged — no npm invocation in preview.
  • .github/workflows/release.yml gains npm install (dev deps for unit tests) plus explicit test-workflow-manifest.sh, globbed test-knowledge-*.cjs, and globbed test-knowledge-*.sh runs. Existing migration and discovery tests continue to run. CI does not rebuild the bundle — the committed artifact is authoritative.
  • tests/scripts/test-release-build.sh (new) exercises the release-script build block via a throwaway temp repo with npm/git tag/git push stubs. 23/23 tests green locally.

Stacking

Base: feat/knowledge-base-phase-7. Do not merge to main until the knowledge-base rollout reaches go-live. The 4-PR stack (#254 → #255 → #256 → this) stays intact.

Test plan

  • bash tests/scripts/test-release-build.sh — 23 passed
  • node --test for test-knowledge-{embeddings,store,chunker,config,retry,openai,integration}.cjs — all pass
  • test-knowledge-openai-integration.cjs — skips without OPENAI_API_KEY (expected)
  • ./release -d --no-ai dry-run — npm install / build not invoked, preview output unchanged
  • Tag-push validation — happens after merge via the updated CI workflow
  • CI validates committed bundle without rebuilding — verified at first tag push

Notes: migration tests fail locally only due to a sandbox /tmp write restriction (pre-existing, CI-clean). The test-knowledge-cli.sh Test 11+ failures and the test-knowledge-build.sh bundle-size threshold failure are pre-existing on phase-7 — not caused by this PR.

leeovery added 30 commits April 20, 2026 19:59
Inserts npm install + npm run build + conditional bundle commit into
./release between the dirty-tree gate and version-file update. Runs in
all VERSION_STRATEGY branches, including "none". Build failure aborts
before tagging so no stale bundle ships. Dry-run path unaffected.

Refs tick-7f4725.
Adds npm install before the test step (unit tests need dev deps from
node_modules). Extends the test job with explicit test-workflow-manifest.sh
plus globbed test-knowledge-*.cjs and test-knowledge-*.sh suites.

Does not run npm run build — the committed bundle is authoritative; the
release script rebuilds and commits it pre-tag.

Refs tick-de9b1a.
npm install mutates package-lock.json on any resolvable version change,
bypassing the dirty-tree gate and leaving uncommitted state after the
tag is pushed. npm ci refuses to modify the lockfile and errors if drift
exists — the correct primitive for a release flow that has already
gated on a clean working tree.
Triggered on tag push, so tests can't gate a release — the tag is
already live by the time the job starts. AGNTC installs from git tags
directly, so the GitHub Release page the workflow produces is never
consumed. Both jobs were pure cost with no return. Phase 8 Task 8-2 was
layering more tests onto this dead pipeline; reverting that along with
the pre-existing cruft.
…nits)

cmdList previously used the project manifest registry when populated,
falling back to a filesystem scan only when the registry was empty.
Work units created before the registry existed are therefore invisible
to 'manifest list' — and to every knowledge-base consumer (bulk index,
status, compact). Real-data test: 9 dirs on disk, 1 registered, only
2 files indexed.

Union both sources so downstream tools see every unit that actually
exists. Deduped via Set. Dot-prefix and malformed-manifest handling
unchanged.

Closes deferred-issues #17.
cmdIndex silently stayed in keyword-only mode when the user configured
a provider after an initial stub-mode index — only 'query' and 'status'
warned. On the first index call after the upgrade, emit a one-shot
note to stderr telling the user to run 'knowledge rebuild' to switch to
vector search.

Deduplicated via a module-level flag so bulk indexing of N files does
not spam N identical warnings.

Also fix test isolation: the CLI test suite leaked the developer's real
~/.config/workflows/config.json into tempdir-based tests, breaking
keyword-only scenarios (Test 11 onward) on any machine with a
real knowledge-base setup. Pin HOME/XDG_CONFIG_HOME to a tempdir.

Closes deferred-issues #11.
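A minimal sketch of the one-shot dedup, assuming a module-level flag and helper named here for illustration (the real code's names may differ):

```javascript
// Hypothetical sketch: a module-level flag ensures the stderr note
// fires once per process, however many files a bulk index touches.
let warnedStubUpgrade = false;

function maybeWarnStubUpgrade(stderr = process.stderr) {
  if (warnedStubUpgrade) return false;
  warnedStubUpgrade = true;
  stderr.write(
    "note: a provider is now configured but the index was built in " +
    "keyword-only mode — run 'knowledge rebuild' to switch to vector search\n"
  );
  return true;
}
```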
knowledge remove previously had no retry — a locked store or transient
I/O failure left stale chunks in the index forever and told the user to
retry manually. Indexing already has a pending queue; removals now
mirror it.

Changes:
- metadata.pending_removals tracks failed removals with workUnit/phase/
  topic, attempts counter (cap 10, then evict with stderr warning).
- cmdRemove drains the queue first, attempts the new removal, queues on
  failure.
- cmdCompact drains the queue on entry.
- status surfaces pending removals alongside pending indexes.
- Skill callsites (cancellation, supersession, promotion) updated to
  describe auto-retry instead of 'run knowledge remove manually later'.

Closes deferred-issues #18.
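A sketch of the queue mechanics described above — entry shape, attempts cap, and drain order are taken from the commit message; function and field names are assumptions:

```javascript
// Hypothetical sketch of the pending-removal queue: failed removals are
// queued with an attempts counter and retried on the next drain; entries
// that exhaust the cap are evicted with a warning instead of looping forever.
const PENDING_MAX_ATTEMPTS = 10;

function queueRemoval(metadata, entry) {
  metadata.pending_removals = metadata.pending_removals || [];
  metadata.pending_removals.push({ ...entry, attempts: 0 });
}

function drainPendingRemovals(metadata, performRemoval, warn = () => {}) {
  const remaining = [];
  for (const entry of metadata.pending_removals || []) {
    if (entry.attempts >= PENDING_MAX_ATTEMPTS) {
      warn(`evicting removal for ${entry.workUnit} after ${entry.attempts} attempts`);
      continue; // cap reached: drop with a warning
    }
    try {
      performRemoval(entry);
    } catch (err) {
      remaining.push({ ...entry, attempts: entry.attempts + 1 }); // re-queue, bump counter
    }
  }
  metadata.pending_removals = remaining;
}
```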
…missing path

Two related silent-failure modes rolled into one commit:

MANIFEST_JS resolution (deferred #5): when neither candidate path
existed, the const silently held a non-existent path. First manifest
call ENOENT'd inside a silent try/catch and bulk operations returned
empty. Now throws at module load with both candidate paths — an
installation problem is surfaced immediately, not disguised as 'no
artifacts to index'.

Error swallowing (deferred #4): discoverArtifacts, getWorkUnitMeta, and
the status unindexed-artifact block caught every execFileSync failure
identically. 'Work unit not found' is expected (orphans), but corrupt
JSON / bad work-unit name / missing manifest file were indistinguishable
and silently skipped. Added isManifestKeyNotFound() to recognise
expected misses, reportUnexpectedManifestError() to write everything
else to stderr with the CLI's own message. Behaviour unchanged for the
expected-miss path; real breakage is now visible.

Closes deferred-issues #4, #5.
cmdRebuild previously deleted store.msp + metadata.json before running
bulk index. If bulk threw (network outage, OpenAI down, Ctrl-C), the
user was left with no store and no metadata — a transient failure
turned into total data loss.

Rename to .bak before the wipe, run bulk index, delete .bak on success,
restore .bak on failure. Leftover .bak from prior aborts is cleaned at
rebuild entry. If rollback itself fails, stderr prints both errors and
the .bak paths so the user can recover manually.

Closes deferred-issues #3.
Items in metadata.pending that fail every catch-up retry stayed in the
queue forever — a permanent failure (renamed work unit, malformed file)
would cost 3 OpenAI calls per item per bulk-index run, indefinitely.

Each entry now carries an attempts counter. processPendingQueue checks
the counter at entry and evicts with a stderr warning once attempts
reach PENDING_MAX_ATTEMPTS (10). The catch block that previously
swallowed the retry failure now writes the failure back to the queue,
bumping the counter so eviction eventually fires.

status output updated to show attempt/max so users can see an item on
the verge of eviction.

Closes deferred-issues #2.
Design doc specifies a 30s lock timeout; implementation had 10s — minor
correctness concern under sustained concurrent phase-completion
indexing (multiple Claude sessions writing artefacts at once). Align
with design. LOCK_STALE_MS was already 30s.
…OCTOU)

indexSingleFile computes embeddings with effectiveProvider.dimensions()
BEFORE acquiring the store lock. If a concurrent rebuild rewrote the
store schema with different dimensions between embed and insert,
insertDocument would fail (wrong vector width) or corrupt the index.

Rare in single-developer use, but Phase 5+ skill orchestration can
run multiple index/rebuild flows back to back.

Inside the lock, after reloading the store, compare the reloaded
metadata's dimensions to what we embedded with. On mismatch, throw —
the withRetry wrapper at the CLI entry re-runs from the top with
fresh metadata and a fresh embedBatch against the new schema.

Closes deferred-issues #1.
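The post-lock guard reduces to a single comparison — sketched here with assumed names; the real check reads the reloaded metadata inside the lock:

```javascript
// Hypothetical sketch of the TOCTOU guard: the embedding was computed
// before the lock, so after reloading the store under the lock we
// compare dimensions and throw on mismatch. The CLI-level retry wrapper
// then re-runs from the top against the new schema.
function assertDimensionsUnchanged(embeddedDims, reloadedMetadata) {
  if (reloadedMetadata.dimensions !== embeddedDims) {
    throw new Error(
      `store dimensions changed from ${embeddedDims} to ` +
      `${reloadedMetadata.dimensions} between embed and insert; retrying`
    );
  }
}
```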
…fig null unset

Three small correctness/style fixes bundled:

- OpenAIProvider.embed() now throws a descriptive error if OpenAI
  returns an empty data array instead of a cryptic TypeError reading
  res.data[0].embedding. Unlikely in practice but costs nothing to
  guard. (deferred #13)

- OpenAIProvider.embedBatch() no longer mutates res.data in place.
  Spread before sort. (deferred #12)

- config.loadConfig() now treats explicit null in system/project config
  as an unset sentinel, letting a project config clear a system-level
  default. (deferred #14)

Closes deferred-issues #12, #13, #14.
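The null-as-unset rule from deferred #14 can be sketched as a layer merge — config shape and helper name are assumptions:

```javascript
// Hypothetical sketch: an explicit null in a later config layer clears
// the value a lower layer set, instead of being carried through as a
// literal null. Project config can thus unset a system-level default.
function mergeConfigLayers(...layers) {
  const merged = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer || {})) {
      if (value === null) delete merged[key]; // explicit null = unset sentinel
      else merged[key] = value;
    }
  }
  return merged;
}
```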
…g-queue fail, pause stdin after rebuild confirm

Three targeted internal cleanups:

- withRetry now lets TypeError / ReferenceError / SyntaxError through
  immediately instead of waiting 7s of backoff. A typo shouldn't burn
  retry budget before the real trace reaches the developer.
  (deferred #9)

- cmdIndexBulk catch now writes err.stack to stderr after the summary
  line. Debug info is no longer lost just because the user was piping
  stdout. (deferred #10)

- cmdRebuild's readStdinLine now pauses stdin after consuming the
  confirmation line. Irrelevant for the CLI but prevents a library
  consumer from keeping the event loop alive on an idle stream.
  (deferred #16)

Closes deferred-issues #9, #10, #16.
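The retry-bypass rule from deferred #9 amounts to a class check before re-queuing — a simplified sketch, with the real withRetry's backoff and signature elided:

```javascript
// Hypothetical sketch: programmer errors propagate immediately instead
// of consuming retry budget; everything else retries up to the limit.
const NON_RETRYABLE = [TypeError, ReferenceError, SyntaxError];

async function withRetry(fn, attempts = 3) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (NON_RETRYABLE.some((cls) => err instanceof cls)) throw err; // bail now
      lastErr = err; // transient: retry (real code backs off here)
    }
  }
  throw lastErr;
}
```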
…ack to text

Two retrieval-layer fixes:

- cmdStatus used to shell out to 'manifest get' per spec topic for the
  superseded-spec consistency check — N processes for N specs, ~5s on
  50-spec repos. Now loads the full manifest tree once via 'manifest
  list' and resolves topic statuses in memory. Same behaviour, O(1)
  processes. (deferred #6)

- searchHybrid previously returned zero hits when Orama's similarity
  post-filter masked a query with no strong vector matches, even when
  BM25 would have found good text matches. Fall back to a fulltext
  search before returning empty. Search results can only improve.
  (deferred #15)

Closes deferred-issues #6, #15.
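The #15 fallback shape, sketched with stand-ins for the real Orama-backed search helpers:

```javascript
// Hypothetical sketch: if the hybrid path comes back empty — e.g. the
// similarity post-filter masked every vector hit — fall back to a plain
// fulltext search before returning nothing. Results can only improve.
async function searchWithFallback(hybridSearch, fulltextSearch, query) {
  const hits = await hybridSearch(query);
  if (hits.length > 0) return hits;
  return fulltextSearch(query);
}
```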
…ng changes

report_update was called unconditionally after the node block, even
when zero manifests were modified. The orchestrator's counter (and the
'review changes' prompt it gates) now fires only on real updates. The
node pass exits 2 on no-op; the bash wrapper routes that to
report_skip instead.

Also made the rc capture robust under the orchestrator's 'set -eo
pipefail' — previously a non-zero node exit aborted the orchestrator
at the rc=$? line.

Closes deferred-issues #8.
…oc-only

Closes the deferred-issues ledger for the knowledge-base pre-merge
cleanup pass. 16 of 18 items fixed in code across earlier commits on
this branch; #7 ('--work-unit' as boost not filter) stays open as a
deliberate design choice already documented in SKILL.md.
… withdrawn)

The Tick data that motivated deferred-issues #17 (9 dirs on disk, 1 in
registry) was a one-off project-manifest corruption, not a systemic
issue. Migration 031 already populates the registry from the filesystem
on first run, and work units created via 'manifest init' register
atomically. The registry is the authoritative source — a filesystem
scan papering over a corrupted registry masks real problems.

Reverts the cmdList union added in bbe53a4. Test covering the union
case is dropped. deferred-issues.md entry updated to reflect the
misdiagnosis and withdraw the fix.
The knowledge CLI resolves the system config path via os.homedir(),
which only honours $HOME. XDG_CONFIG_HOME was dead weight.
…ferred #8

withRetry now bails on RangeError alongside TypeError/ReferenceError/
SyntaxError. Bad Math.min args, invalid array sizes, and similar
programming errors shouldn't burn 7s of retry budget.

Also withdraws deferred-issues #8 (migration 036/035 counter). Fixing
migrations retroactively only helps users who haven't yet run them —
the migration id is recorded in .workflows/.state/migrations after
first run and never re-executed. Snapshot principle > one-time
counter-reporting accuracy.
Eliminates the stderr-regex fragility in the knowledge-base helpers that
classify manifest CLI failures. manifest.cjs now uses:

  exit 0 — success
  exit 1 — real error (corrupt JSON, validation failure, bad args, I/O)
  exit 2 — expected miss (work unit not found, path not found, value
           not found in list, project manifest not found)

die() takes an optional code (default 1 preserves existing behaviour).
10 'not found' call sites opt in to code 2.

Knowledge helpers (discoverArtifacts, getWorkUnitMeta, cmdStatus
unindexed-artifact block) check err.status === 2 instead of matching
/not found|does not exist/i in stderr. Any future reword of a die()
message no longer risks flipping expected↔unexpected classification.

Test suite: 2 existing assertions updated to expect exit 2 on
not-found paths; new test exercises the 1-vs-2 contract across
missing work unit, missing path, invalid work-type, and corrupt JSON.

Closes deferred-issues #4 (properly this time).
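Both sides of the contract fit in a few lines — a sketch, with the caller-side check relying on execFileSync surfacing the child's exit code as err.status:

```javascript
// Hypothetical sketch of the exit-code contract. Exit 1 stays the
// default for real errors; 'not found' sites opt in to exit 2.
function die(message, code = 1) {
  process.stderr.write(message + '\n');
  process.exit(code);
}

// Caller side: classify by status instead of grepping stderr text, so a
// reworded die() message can never flip expected<->unexpected.
function isExpectedMiss(err) {
  return Boolean(err) && err.status === 2;
}
```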
The flag --work-unit meant different things on different commands:

  knowledge query  --work-unit X → re-rank BOOST (results from other
                                    work units still appear)
  knowledge remove --work-unit X → FILTER (only X's chunks are removed)

Same spelling, opposite semantics. Docs explained it, but the flag
name itself pattern-matched --phase / --topic / --work-type — all
hard filters — so users naturally assumed query's --work-unit was a
filter too.

Split:
- query now takes --prefer-work-unit (re-rank boost). The --prefer-
  prefix makes boost semantics obvious at the call site and cannot be
  confused with a filter.
- remove keeps --work-unit (filter). Consistent with the other filter
  flags, which is how remove uses them anyway.

No command has an overloaded flag. Skill files (contextual-query.md,
knowledge-usage.md, SKILL.md) updated to use --prefer-work-unit.
Test updated. Ledger #7 resolved properly this time.
Makes the query-command CLI fully orthogonal:

  Filters (hard; non-matching chunks excluded):
    --work-unit  --work-type  --phase  --topic

  Re-ranking (additive, +0.1 per match, repeat for multi-dim):
    --boost:<field> <value>
      valid fields: work-unit, work-type, phase, topic, confidence

Replaces the prior split where --work-unit was a boost on query but a
filter on remove, and the short-lived --prefer-work-unit from this
stack's last attempt.

Skill templates can now compose multi-dimensional bias in a single
invocation that wasn't expressible before, e.g.

  knowledge query "foo" --boost:work-unit {wu} --boost:phase research

which surfaces prior research across work units while nudging results
toward the current one.

Parser: anything matching --boost:<field> is collected into an ordered
list; validation runs at command entry and fails fast on unknown field
or missing value. Skill templates don't silently no-op on typos.

Schema mapping: CLI uses kebab-case (--boost:work-unit) consistent with
the existing filter flags; mapped to the snake_case schema field
(work_unit) internally.

Callers updated: SKILL.md (flag table + guidance), contextual-query.md,
knowledge-usage.md. Four new tests cover: boost ordering vs cross-wu
context, --work-unit as filter, unknown-field error, missing-value
error.
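The parser described above can be sketched as follows — validation details and the helper name are assumptions, but the collect-in-order, fail-fast, kebab-to-snake behaviour is as the commit states:

```javascript
// Hypothetical sketch of the --boost:<field> parser: boosts are
// collected in order, unknown fields and missing values fail fast so
// skill templates can't silently no-op on typos, and kebab-case CLI
// names map to the snake_case schema fields.
const BOOST_FIELDS = ['work-unit', 'work-type', 'phase', 'topic', 'confidence'];

function parseBoosts(argv) {
  const boosts = [];
  for (let i = 0; i < argv.length; i++) {
    const match = /^--boost:(.+)$/.exec(argv[i]);
    if (!match) continue;
    const field = match[1];
    if (!BOOST_FIELDS.includes(field)) {
      throw new Error(`unknown boost field: ${field}`);
    }
    const value = argv[i + 1];
    if (value === undefined || value.startsWith('--')) {
      throw new Error(`--boost:${field} requires a value`);
    }
    boosts.push({ field: field.replace(/-/g, '_'), value });
    i++; // consume the value
  }
  return boosts;
}
```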
Orama and @msgpack/msgpack both publish ESM dists with
sideEffects: false. Previously our esbuild config defaulted to the
'require' condition (because we emit CJS output), pulling the CJS
barrel files that prevent meaningful tree-shaking.

Adding:
  conditions: ['node', 'import']
  mainFields: ['module', 'main']

forces ESM resolution for dependencies while keeping our own source
(which still uses require()) and the output format (CJS) unchanged.
esbuild handles the ESM→CJS conversion internally.

Measured: 175.9 KB → 154.1 KB. 141 CLI tests + 198 Node tests green
against the new bundle.
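In esbuild-config terms, the change is two resolution options — a sketch with the entry point and outfile as assumptions, other options elided:

```javascript
// Hypothetical sketch of the build options. 'conditions' and
// 'mainFields' steer dependency resolution toward ESM dists (enabling
// tree-shaking of sideEffects:false packages) while the output format
// stays CJS; esbuild converts ESM->CJS internally.
const buildOptions = {
  entryPoints: ['src/knowledge/index.js'],   // assumed path
  bundle: true,
  platform: 'node',
  format: 'cjs',                             // output unchanged
  conditions: ['node', 'import'],            // prefer ESM export conditions
  mainFields: ['module', 'main'],            // prefer 'module' entry points
  outfile: 'skills/workflow-knowledge/scripts/knowledge.cjs',
};
```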
Added while auditing the ESM-resolution build change. Imports the built
bundle directly and exercises every Orama API path we use:

- create store with vector schema
- insert (100 docs across 3 work units × 3 phases)
- searchFulltext with where filter
- searchVector with similarity threshold
- searchHybrid normal path
- searchHybrid → fulltext fallback (the #15 fix)
- removeByFilter
- saveStore / loadStore (MsgPack round-trip)
- metadata write/read
- all search modes on the loaded store

Not wired into the automated suite — run ad hoc when verifying
bundle-level changes. Caught one bogus assertion in my own earlier
smoke pass (Orama does not echo embedding fields on search hits by
design) and confirmed no behaviour change between CJS- and
ESM-resolved bundles.
ESM-resolution build landed the bundle at 154 KB — still over the
previous 150 KB test threshold (which was already failing: it was set in
phase 1 when the bundle was smaller, and a dependency drift tripped it).
Bump to 175 KB with ~20 KB headroom to catch real regressions without
chasing a threshold that had no principled source.

Looked at post-minification terser, esbuild mangleProps/legalComments/
drop, and a self-extracting brotli wrapper. Terser: 2 KB (not worth a
dev dep). esbuild knobs: 0.3 KB (noise). Brotli wrapper: 97 KB saved,
byte-identical, +2–7ms startup tax — elegant but solves a problem we
do not have. Keeping the plain ESM bundle.
The pending-removal queue (deferred-issue #18 fix, commit b5c9430)
shipped broken. writeMetadata whitelists a fixed 5-field schema and
pending_removals was never added — every write silently stripped the
queue. addPendingRemoval wrote the entry, next readMetadata+writeMetadata
cycle dropped it, processPendingRemovals found nothing to drain.

Automatic retry on failed remove (cancellation / supersession /
promotion paths) therefore never happened. Stale chunks from failed
removes persisted forever, contradicting the user-facing promise in
manage-work-unit.md, spec-completion.md, promote-to-cross-cutting.md.

Changes:
- Add pending_removals to writeMetadata's whitelist (store.js:443-444).
- Update the whitelist comment with a loud warning that missing a field
  silently breaks every downstream feature using it.
- METADATA_FIELDS constant updated for consistency.
- Test 81 rewritten: Part A is now a true regression guard — seeds a
  pending_removals entry, runs an index (which triggers writeMetadata),
  asserts the entry survives. This assertion fails on the pre-fix code,
  confirmed by temporarily reverting the whitelist. Part B retains the
  drain-the-queue check.

All existing tests still pass (142/142 CLI, integration, smoke).
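The failure mode in miniature — field names mirror the commit message, not necessarily the real METADATA_FIELDS:

```javascript
// Hypothetical sketch of the whitelist: any field missing from this
// list is silently dropped on every write — which is exactly how
// pending_removals was stripped before the fix.
const METADATA_FIELDS = [
  'provider', 'model', 'dimensions', 'pending',
  'pending_removals', // the field this fix adds; omitting it silently
                      // breaks every downstream feature using the queue
];

function writeMetadata(metadata) {
  const out = {};
  for (const field of METADATA_FIELDS) {
    if (field in metadata) out[field] = metadata[field];
  }
  return out; // the real implementation serialises this to metadata.json
}
```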
knowledge remove --dry-run accepted the flag but performed a real
deletion. Help text and buildOptions advertised the preview semantics;
only cmdCompact implemented them. Running 'knowledge remove --work-unit X
--dry-run' silently dropped chunks as if the flag were absent.

Changes:
- src/knowledge/store.js: added countByFilter(db, where) — same query
  shape as removeByFilter so dry-run count matches what a real run
  would remove. Exported.
- src/knowledge/index.js cmdRemove: observe dryRun at the top, count-
  only path writes 'Would remove N chunks for <desc>' and exits. Does
  NOT drain the pending-removal queue either — dry-run is strictly
  observational.
- Usage line updated to mention --dry-run.
- Regression test (Test 38b): seeds chunks, runs dry-run, asserts
  output says 'Would remove' not 'Removed', chunk count unchanged;
  then follows with a real remove proving the non-dry path still
  works. Confirmed the three dry-run assertions fail on the pre-fix
  code by reverting the branch and re-running.

146/146 CLI tests pass. Smoke + integration green.
leeovery added 18 commits April 21, 2026 21:40
embedBatch assumed res.data.length === requested length. If OpenAI
returned fewer rows (rare but not impossible — partial availability,
certain rate-limit interactions), results[offset + i] for missing
indices stayed undefined. Downstream: doc.embedding = undefined,
Orama inserted the chunk with no embedding field, chunk silently
degraded to keyword-only with no warning.

embed() had this guard already (deferred-issue #13); embedBatch was
missed.

Added length check after each _fetch in both branches of embedBatch:
- Single-batch (texts.length <= MAX_BATCH_SIZE): validate
  res.data.length === texts.length.
- Chunked (>MAX_BATCH_SIZE): validate per slice.
Error message includes requested vs received counts and chunk offset
when applicable.

Tests: two new cases in test-knowledge-openai.cjs — short response
(2 rows for 3 inputs) and missing data array both assert rejection
with /response length mismatch/ pattern. Confirmed both fail on
pre-fix code by reverting the guards.
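The guard itself is a small check after each _fetch — sketched here with an assumed helper name; the response shape ({ data: [{ embedding: [...] }] }) follows the OpenAI embeddings API:

```javascript
// Hypothetical sketch of the response-length guard: a short or missing
// data array now throws with requested vs received counts (and chunk
// offset where applicable) instead of leaving undefined embeddings that
// silently degrade chunks to keyword-only.
function assertBatchLength(res, requested, offset = 0) {
  const received = Array.isArray(res && res.data) ? res.data.length : 0;
  if (received !== requested) {
    throw new Error(
      `embedding response length mismatch: requested ${requested}, ` +
      `received ${received} (chunk offset ${offset})`
    );
  }
}
```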
…exists

cmdCheck only verified config.json's existence — a corrupted file
passed the check and the user only saw a cryptic JSON parse error
later on index or query. Worse, skills parse check output to gate
their flow; 'ready' on corrupt config meant the skill then invoked a
downstream command that crashed.

Added Condition 2b between the existence check and the store check:
attempt config.readConfigFile() (the authoritative load path — it
already throws with descriptive errors for JSON parse failures,
missing top-level 'knowledge' key, wrong type). On throw: write the
specific diagnostic to stderr, output 'not-ready' on stdout, return.

Tests: three new cases.
- Test 37b: invalid JSON → not-ready + 'config error' on stderr.
- Test 37c: missing 'knowledge' key → not-ready + diagnostic mentions
  the key.
- Test 37d: corrupted config over an otherwise-healthy KB flips check
  from 'ready' to 'not-ready' — the strongest guard since it isolates
  the config path from 'fails for other reasons'.

Confirmed all three fail on pre-fix code. 153/153 CLI pass. Smoke +
integration green.
User-visible validation failures previously threw plain Error, so the
top-level main().catch printed the full stack trace to stderr. Ten
lines of Node internal frames followed every 'bad path' or 'provider
mismatch' message — user-hostile for what are normal validation errors.

New UserError class sits alongside existing helpers in index.js. Three
contracts:
1. Thrown at input-validation sites only (bad path, unindexed phase,
   provider/model mismatch, missing chunking config, missing work unit
   manifest, empty-file refusal).
2. withRetry does not retry UserError (same treatment as TypeError et
   al — retrying a permanent user-input failure is waste).
3. main().catch prints 'Error: <message>' alone for UserError; full
   stack for anything else (likely a real bug).

Converted throw sites:
- deriveIdentity: 7 throws (path mismatch, invalid wu/topic, phase not
  indexed, per-phase subpath validation).
- readWorkType: 2 throws (manifest missing, work_type field missing).
- resolveProviderState: 2 throws (index-side provider mismatch).
- resolveQueryMode: 2 throws (query-side provider mismatch).
- indexSingleFile: 2 throws (missing chunking config, empty file).

Kept as plain Error:
- Store schema TOCTOU throw in indexSingleFile (intentionally retryable
  via withRetry).

UserError exported from module.exports for downstream consumers.

Tests: Test 14 extended to assert no stack frames on stderr for an
unindexed-phase index invocation. New Test 14b covers the non-.workflows
path case. Confirmed both stack-suppression assertions fail on pre-fix
code (reverted the catch-handler branch, re-ran, saw FAIL on both).

157/157 CLI tests, smoke, retry, integration, store all green.
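The class and the top-level contract together are a few lines — a sketch, with the reporting helper named here for illustration:

```javascript
// Hypothetical sketch of the UserError contract: a marker class for
// input-validation failures, printed without a stack trace at the top
// level; anything else keeps its full stack (likely a real bug).
class UserError extends Error {}

function reportError(err, stderr = process.stderr) {
  if (err instanceof UserError) {
    stderr.write(`Error: ${err.message}\n`);          // message only
  } else {
    stderr.write((err.stack || String(err)) + '\n');  // full stack
  }
}
```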
…card

Orama treats an empty query term as 'match everything' and returns up
to --limit arbitrary chunks. Previously 'knowledge query ""' silently
succeeded with 10 unrelated hits — almost always a caller bug
(unsubstituted template variable, accidental empty positional) rather
than an intentional 'give me anything' request.

cmdQuery now rejects any positional term that is empty or whitespace-
only with a UserError. Error message suggests 'knowledge status' for
users who actually wanted to list what's indexed.

Tests:
- Test 23c: empty, whitespace-only, and valid positional terms.
  Asserts non-zero exit + error message surfaces for empty/ws,
  zero exit for valid.
- Test 38b updated: replaced query "" chunk-counting calls with
  query "content" (term appears in the fixture).

Three regression guards confirmed to fail on pre-fix code. 161/161
CLI tests, smoke, integration all green.

Note: other 'exit 0 on error' cases Agent 4 flagged (query no-args,
unknown command, rebuild abort, unknown --boost:<field>) were verified
to already exit 1 correctly — they were either fixed incidentally by
earlier commits in this cleanup pass or were stale claims.
knowledge remove --work-unit <typo> silently succeeded with 'Removed 0
chunks' and exit 0 — a fat-finger was indistinguishable from a
legitimate no-op. Skills that pass a template-substituted name had no
way to detect that the variable resolved wrong.

cmdRemove now performs a project-registry lookup (manifest project get
<wu>) at entry, before the store-existence check. Registry miss
(manifest exits 2) → UserError with the wu name and a hint to run
knowledge status. Any other manifest error is rethrown with stack.

The check lives in cmdRemove only, NOT performRemoval. The pending-
removal queue's drain path calls performRemoval directly for entries
whose wu may have been deleted from the registry after the failure was
queued (e.g. absorption). Those still need to clean up chunks that
remain in the store.

Verified every skill flow that calls knowledge remove (cancel, supersede,
promote, absorb-into-epic, epic-topic-cancel) invokes it while the wu is
still in the registry — so no caller breaks.

Tests: Test 41 flipped from 'reports 0 removed' for nonexistent wu to
'exits non-zero + error names wu + mentions project manifest'. Test 44
updated to create the WU first before asserting empty-store no-op.
Three guard assertions confirmed to fail on pre-fix code.

165/165 CLI, smoke, integration all green.
Changing the config (e.g. 128-dim stub → 1536-dim OpenAI) and running
'knowledge index' silently 'succeeded' with 'N already indexed.' The
preflight isIndexed() check short-circuited per file before anything
could notice the mismatch, so the existing store kept its old dims
while the config drifted. 'query' surfaced the mismatch cryptically
later; 'index' fooled the user into thinking nothing needed doing.

Added a preflight resolveProviderState() call at the top of
cmdIndexBulk, right after ensuring the knowledge dir. If metadata
exists and diverges from current cfg/provider, the existing UserError
('Run knowledge rebuild to reindex') is thrown — same message as
query, no stack noise thanks to the centralised UserError handling.
First-ever bulk index (no metadata yet) skips the check naturally.

Single-file cmdIndex already surfaces the mismatch through
indexSingleFile → resolveProviderState, unaffected by this change.
performRemoval and the pending-removal drain path are unaffected.

Tests: Test 22b added after the query-mismatch test (22), same
scenario flipped to bulk index. Three assertions (exits non-zero,
error mentions rebuild, no 'already indexed' in output) all confirmed
to fail on pre-fix code by reverting the preflight block.

168/168 CLI tests, smoke, integration, store all green.
…usage

Both SKILL.md files had a signpost blockquote at their knowledge-usage
load step saying 'proactive querying is available' — directly
contradicting knowledge-usage.md Section E, which says 'do not query
during planning' and 'rare in practice' for implementation.

Signposts render first; a model reads the invitation before loading
the reference that restricts it. Rewrote both to match the guidance:

- Planning: 'Planning operates from the spec as the golden source —
  the guide documents the narrow cases where a KB query is warranted,
  and those where it is not.'
- Implementation: 'Implementation reads the code as the source of
  truth for *what* exists — the guide documents the rare cases where
  the KB is useful for the *why* behind an existing pattern.'

Prose-only change. No code, no tests touched. 168/168 CLI + smoke
green (sanity).
CLAUDE.md defines exactly one form for terminal stops:
  **STOP.** Do not proceed — terminal condition.

Nine sites used the non-canonical 'This is terminal — do not return to
the caller/backbone.' or 'the invoked skill takes over.' These risk a
model continuing past the handoff because the sentence doesn't parse
as a stop gate by convention.

Agent 1's audit flagged 2 sites (manage-work-unit.md, absorb-into-
epic.md); grep across skills/ found 7 more with the identical pattern.
All 9 replaced with 'Invoke … <skill>.\n\n**STOP.** Do not proceed —
terminal condition.'

Files changed:
- continue-bugfix/SKILL.md
- continue-cross-cutting/SKILL.md
- continue-feature/SKILL.md
- continue-quickfix/SKILL.md
- workflow-discussion-entry/references/display-options.md
- workflow-scoping-process/references/complexity-check.md (2 sites)
- workflow-start/references/absorb-into-epic.md
- workflow-start/references/manage-work-unit.md

Prose-only. 168/168 CLI + smoke green. Final grep confirms zero
non-canonical phrases remain.
cfg.similarity_threshold || 0.8 silently overrode an explicit 0 with
the default — classic JS footgun where 0 is legitimate data but gets
coerced as falsy. Setting similarity_threshold=0 means 'accept all
vector matches, no filtering', a valid config the user was blocked
from using.

Changed to ?? (nullish coalescing). Only null/undefined fall back to
0.8; explicit 0 is now respected.

Checked adjacent || fallbacks — dimensions, limit, provider, model —
all have string/positive-integer semantics where 0 is meaningless.
Kept them as-is. Only similarity_threshold had the real zero-is-valid
case.

168/168 CLI tests, smoke, store, integration, config all green.
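The footgun in miniature (function names are illustrative):

```javascript
// || treats 0 as absent; ?? only falls back on null/undefined.
function thresholdWithOr(cfg)      { return cfg.similarity_threshold || 0.8; } // the bug
function thresholdWithNullish(cfg) { return cfg.similarity_threshold ?? 0.8; } // the fix
```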
Migration 037 is on phase-5+ only — never deployed to main, never
present in any project's migration log. So editing it now actually
helps the first real user who runs the post-merge AGNTC update.
Contrast with migrations 031-036 which are deployed and off-limits.

Two issues fixed:

1. report_update was called unconditionally on node-exit-0, even when
   zero work units needed backfilling. Now: the node block counts
   modifications and exits 2 when nothing ran; the bash wrapper
   routes exit 2 to report_skip. Keeps the orchestrator's
   FILES_UPDATED counter honest and prevents the false 'review
   changes' prompt on a no-op run.

2. node stderr was being swallowed via 2>/dev/null. A future bug
   inside the node script would leave the migration silently
   skipping. Removed the redirect — any stderr now passes through
   to the orchestrator. Verified manually by injecting a typo:
   ReferenceError stack is now visible.

Wrapper pattern 'node … && rc=0 || rc=$?' chosen so the non-zero
exit-2 doesn't trip 'set -eo pipefail' in the orchestrator (same
pattern I built for 036 before that fix was withdrawn on the
deployed-migration grounds).

Tests: test-migration-037.sh harness updated — report_update /
report_skip stubs now record which was called via $REPORT_CALLED.
Three new tests:
- test_counter_reports_update_when_modified
- test_counter_reports_skip_when_nothing_to_do
- test_counter_reports_skip_on_empty_workflows

Confirmed the two 'should skip' assertions fail on pre-fix code by
reverting the wrapper temporarily. 16/16 migration-037 tests pass
on the fix. 168/168 CLI + smoke unaffected.

Every other entry-point skill's Step 0.2 (start-bugfix, start-feature,
start-epic, start-cross-cutting, continue-*, workflow-migrate-entry)
wraps the migration invocation in a gate:

  #### If the /workflow-migrate skill has already been invoked in
      this conversation
  → Proceed to Step 0.3.
  #### Otherwise
  [invoke migrations]

workflow-start ran migrations unconditionally. In normal flow it is
always the first entry-point so the gate never fires either way, but
the drift is real: a future maintainer reading workflow-start vs any
other entry-point sees inconsistent patterns and has to guess which
is canonical. Aligned with the convention.

Prose only. 168/168 CLI + smoke green.

When a completed work unit was reactivated via view-completed.md, the
status flipped to in-progress but completed_at was left behind — the
manifest then carried a completion date for an in-progress work unit.

Cosmetic today (nothing reads completed_at except compact, which
gates on status first), but the pair is self-contradictory data and
breaks if any future feature keys on completed_at.

Added an exists-check + conditional delete in the 'was completed'
branch. completed_at is backfilled by migration 036/037 for normal
flows, but a completed work unit with no artifact files skips backfill,
so check before deleting rather than accepting an error.
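The branch reduces to a few lines — a sketch with a hypothetical helper name and manifest shape, not the actual skill-file wording:

```javascript
// Reactivate a completed work unit: flip the status back and drop the
// now-contradictory completion date. The exists-check matters because
// a completed unit with no artifact files never had completed_at
// backfilled, and deleting a missing key must not be treated as an error.
function reactivate(unit) {
  unit.status = 'in-progress';
  if ('completed_at' in unit) {
    delete unit.completed_at;
  }
  return unit;
}
```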

Prose-only change in a skill file.

knowledge --help previously wrote USAGE to stderr and exited 1. Scripts
probing the CLI treat that as a failure. Informational help should
exit 0 by convention.

Handled before arg parsing so the help paths are cheap and independent
of other flag logic:

  if (rawArgs.includes('--help') || rawArgs.includes('-h') || rawArgs[0] === 'help') {
    process.stdout.write(USAGE + '\n');
    process.exit(0);
  }

No-args (user forgot a command) stays stderr + exit 1 — that is still
a usage error.

USAGE string updated with a '--help, -h' line. SKILL.md Invocation
section notes both the help forms (stdout/0) and the no-args form
(stderr/1).

Verified all four paths empirically: --help, -h, help → stdout=1073B,
stderr=0, exit 0. No-args → stdout=0, stderr=1073B, exit 1.
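The full stream/exit contract can be captured as a tiny pure function — a sketch with a hypothetical name, mirroring the guard above plus the no-args path, not the real CLI code:

```javascript
// Informational help → stdout + exit 0; no args → usage error on
// stderr + exit 1; anything else falls through to normal arg parsing.
function classifyInvocation(rawArgs) {
  if (rawArgs.includes('--help') || rawArgs.includes('-h') || rawArgs[0] === 'help') {
    return { stream: 'stdout', exitCode: 0 };
  }
  if (rawArgs.length === 0) {
    return { stream: 'stderr', exitCode: 1 };
  }
  return null; // a real command
}
```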

cmdRebuild's abort message wrote 'Aborted.\n' with no leading newline,
so on an EOF-without-newline at the prompt the message collided with
whatever the user had typed. Added a leading newline. Exit code stays
1 (already correct). Purely cosmetic.

…uting

Scoping was the only exploratory phase missing a structured Contextual
Query step. knowledge-usage.md Section E explicitly invites scoping to
'query throughout' (same posture as discussion/investigation), and the
existing Step 2 nudge says 'query the knowledge base while gathering
context' — but with no structural query step, the prompt was harder
for Claude to action. Added Step 3 between Gather Context and
Complexity Check, modelled on workflow-investigation-process Step 4.
Subsequent steps renumbered 3→4, 4→5, 5→6, 6→7, 7→8.

Also fixed a pre-existing bug in Step 0's resume routing: when a spec
existed but the plan was incomplete, the comment said 'resume from
format selection' but the link routed to Step 1 (Knowledge Usage),
not the format-selection step (was Step 5, now Step 6). Updated the
link to actually go to format selection.

No external skill references the scoping step numbers — confirmed via
grep across skills/. Markdown-only change.

withRetry skips retry for TypeError, ReferenceError, SyntaxError,
RangeError, and UserError — the guard added incrementally across
deferred-issue #9 (commit f984476), Important #7's UserError
introduction, and the RangeError addition. Behaviour was untested:
a future refactor reversing the guard would burn 7s of retry budget
on permanent failures with no test catching it.
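The guard under test looks roughly like this — a self-contained sketch, not the real withRetry (the real signature, retry budget, and backoff differ, and the project-specific UserError class is omitted here):

```javascript
// Programmer errors are permanent, so rethrow immediately instead of
// burning the retry budget on them.
const NON_RETRYABLE = [TypeError, ReferenceError, SyntaxError, RangeError];

async function withRetrySketch(fn, attempts = 3) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (NON_RETRYABLE.some((C) => err instanceof C)) throw err;
      lastErr = err;
    }
  }
  throw lastErr;
}
```

The tests pin exactly this shape: a non-retryable class produces one call and the original error; anything else is retried up to the budget.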

Five new cases in test-knowledge-retry.cjs, one per error class.
Each throws the specific error type, then asserts exactly one call was
made and that the original class survives. Confirmed all five fail on
pre-guard code (observed 'actual: 3, expected: 1' — 3 attempts
instead of 1).

11/11 retry tests now pass.

Empirical probing of Orama's hybrid mode shows the deferred-issue #15
concern doesn't manifest. With a flat or poor-quality vector and a
tight similarity threshold (e.g. 0.99), hybrid still returns BM25-
driven hits — text matches come through regardless. The only way to
get zero hybrid hits is when the term itself doesn't match anything,
at which point a fulltext fallback also returns zero.

The defensive fallback added in commit 33303da was therefore dead
code: in every realistic scenario the original hybrid call already
returns the right thing. Removing it.

deferred-issues.md #15 reclassified RESOLVED → WITHDRAWN with the
empirical evidence and a pointer to the brief commit history.

kb-smoke.cjs: relabelled the previous '#15 fallback' check. The
assertion still holds — hybrid surfaces BM25 hits even with weak
vector matches — but it now describes what's actually being proven
(Orama hybrid handles this natively) rather than the now-removed
fallback path.

168/168 CLI + smoke + store + integration green.

…how batch example

The reference contradicted itself. Opening prose said 'a single
targeted query' / 'One query, one interpretation step', then later
prose pointed at the batch idiom. The example showed only the single
form. Skill files learn the shape from the example, so Claude
defaulted to single-term queries even when the calling phase had
multiple distinct angles to query.

- Opening reframed: 'a focused query' / 'One invocation, one
  interpretation step' (covers both single and batch).
- New paragraph naming when batch is appropriate (investigation has
  symptoms + subsystem; research/discussion/scoping usually have
  thinner starting context, single is fine).
- Example block now shows both forms — single first, batch second
  with three positional placeholders.

Prose only, single file.