Skip to content

test(watcher): close the spec 13.1 verification gap (read-cache handoff via the watcher's own flush)#103

Merged
clay-good merged 1 commit into
mainfrom
test/spec-13.1-watch-handoff-regression
May 31, 2026
Merged

test(watcher): close the spec 13.1 verification gap (read-cache handoff via the watcher's own flush)#103
clay-good merged 1 commit into
mainfrom
test/spec-13.1-watch-handoff-regression

Conversation

@clay-good
Copy link
Copy Markdown
Owner

What this is

Follow-up verification for #102 (spec 13.1, watch-mode performance). I re-read the merged work, re-ran the full test/bench/build, and audited each of the five documented root causes against the code. The mechanism is sound and the existing tests are genuine. This PR adds two regression tests that close the one real proof gap I found. No source changes.

Verification performed (all real, all re-run on this machine)

Check Result
tsc --noEmit clean
npm run lint 0 errors (3 pre-existing warnings in an untouched file)
CI-mirror vitest run src examples 2936 passed / 2 skipped / 0 failures
npm run bench:watch next-call read after save 0.02 ms (cache HIT) vs 4.0 ms cold (169×); 50-file burst → 1 flush
watcher integration suite (real chokidar) 5/5 green

Root-cause → fix → test, audited:

  1. Full llm-context.json rewrite per save → coalesced to one write per batch (Step 1). ✔ covered (G2 burst→1-flush test).
  2. Forced cold re-parse on next tool call (mtime cache-bust)primeContextCache handoff (Step 2). ✔ covered in isolation — the gap this PR closes (see below).
  3. Full vector read + createTable(overwrite) → row-level updateFiles (delete(\filePath` IN …) + add`) (Step 3). ✔ covered, incl. the backtick-quoting trap.
  4. stderr line per change → one summary line per batch (Step 6). ✔ covered (G5).
  5. No cross-file coalescing → single pending Set + debounce + max-batch ceiling (Step 1). ✔ covered (G2/G3).

The gap I found

The merged PR proved root-cause #2 in two disconnected halves: a unit test proves primeContextCache→hit when called directly, and the integration tests prove freshness lands on disk through a real save. But nothing proved the two connect — that the watcher's own flush path (enqueue → flush → handleBatch → persistContext → primeContextCache) actually makes the next tool-call read a cache HIT instead of the cold ~2 MB re-parse that was the single biggest per-save cost. Freshness alone can't prove it (a cold re-parse also returns fresh data); the discriminator is whether a parse happened.

What this PR adds

  • mcp-watcher-incremental.test.ts (runs in CI): drives a flush via enqueue, spies on primeContextCache to capture the exact object the flush hands to the read cache, then asserts the next readCachedContext returns that same object reference — reference identity ⇒ no disk re-parse. Also asserts the flush primes exactly once.
  • mcp-watcher.integration.test.ts (real chokidar; npm run test:integration): the same proof through an actual file save + real FSWatcher, for end-to-end fidelity to the field scenario.

Finding for the owner (not changed here)

*.integration.test.ts is excluded from the default vitest config, so CI (npm run test:run) runs zero real-chokidar watcher tests — the integration suite only runs via npm run test:integration. Wiring the hermetic watcher integration file into CI would be worthwhile, but the full integration suite pulls in network/embedding-dependent files, so I left that as an owner decision rather than bundling it. The CI-covered incremental test added here gives regression protection in the meantime.

Honest limits (unchanged from #102)

This does not reproduce the original multi-second field symptom — that needs a real MCP client under load / a denser corpus than this box has. Confidence still rests on the root-cause analysis + the now-complete mechanism tests + the microbenchmark. This PR strengthens the second leg; it doesn't claim the field magnitude.

🤖 Generated with Claude Code

…cache handoff through the watcher's own flush path

Follow-up verification for PR #102 (spec 13.1). The merged PR proved
primeContextCache→hit in ISOLATION (the watcher calls a setter; the next read
returns it), and proved freshness lands on disk through a real chokidar save —
but nothing proved the two halves connect: that the watcher's OWN flush path
(enqueue → flush → handleBatch → persistContext → primeContextCache) makes the
next tool-call read a cache HIT rather than the cold ~2 MB re-parse that was
root-cause #2 of the field regression. These two regression tests close that gap.

- mcp-watcher-incremental.test.ts (runs in CI): drive a flush via enqueue, spy
  on primeContextCache to capture the exact object the flush hands to the read
  cache, then assert the next readCachedContext returns that SAME object
  (reference identity ⇒ no disk re-parse). Asserts the flush primes exactly once.
- mcp-watcher.integration.test.ts (real chokidar, local/`test:integration`):
  the same proof through an actual file save and a real FSWatcher, for end-to-end
  fidelity to the field scenario.

No source changes — the implementation was already correct; this only adds the
missing proof. Verification: tsc --noEmit clean; lint 0 errors; CI-mirror
(vitest run src examples) 2936 passed / 2 skipped / 0 failures; watcher
integration suite 5/5 green.

Note for the owner (not changed here): *.integration.test.ts is excluded from
the default vitest config, so CI (`npm run test:run`) runs zero real-chokidar
watcher tests — the integration suite only runs via `npm run test:integration`.
Wiring the hermetic watcher integration file into CI would be worthwhile, but the
full integration suite includes network/embedding-dependent files, so that's left
as an owner decision rather than bundled here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@clay-good clay-good merged commit fdc4cd8 into main May 31, 2026
4 checks passed
@clay-good clay-good deleted the test/spec-13.1-watch-handoff-regression branch June 2, 2026 21:52
laurentftech pushed a commit to laurentftech/OpenLore that referenced this pull request Jun 3, 2026
Bump version for the v2.0.6 release (spec 13.1 — O(change) watch-mode
freshness). The release workflow publishes package.json's version on the v*
tag, so this must match the tag.

**Why this commit exists:** the first v2.0.6 publish (run 71639223540) failed
with `npm error You cannot publish over the previously published versions:
2.0.5` — the v2.0.6 tag was pushed at a commit where package.json still said
2.0.5 (the bump was never made), so `npm publish` tried to republish 2.0.5.

Two changes:
- Bump package.json + package-lock.json 2.0.5 → 2.0.6 (the missing bump).
- Add a "Verify tag matches package.json version" guard to the release
  workflow's validate job. A mismatched tag now fails fast and loud in
  `validate` with an actionable message, instead of sailing through and dying
  late in `npm publish` with the cryptic over-publish error.

After this merges, the v2.0.6 tag must be re-created at the merge commit (it
currently points at the pre-bump clay-good#103 merge) before re-running publish.

Verification: tsc --noEmit clean; lint 0 errors; build clean; CI-mirror
(vitest run src examples) 2936 passed / 2 skipped / 0 failures; guard logic
sanity-checked (v2.0.6→pass, v2.0.5→fail).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant