Skip to content

perf(watcher): read changed files once on the live-update path#638

Closed
justrach wants to merge 1 commit into
refactor-watcher-read-gatefrom
optimize-watcher-single-read
Closed

perf(watcher): read changed files once on the live-update path#638
justrach wants to merge 1 commit into
refactor-watcher-read-gatefrom
optimize-watcher-single-read

Conversation

@justrach

Copy link
Copy Markdown
Owner

Why

incrementalDiff and drainNotifyFile each read a changed file twice: once to hash it (change detection), then again to index it. After #635 widened the cap to 2 MB, that's up to 2× max_indexed_file_bytes of IO per change. drainNotifyFile (the muonry instant re-index, fired on every edit) double-read unconditionally.

Stacks on refactor-watcher-read-gate (#637) — uses its readIndexableFile / max_trigram_file_bytes. Retarget to release/0.2.5826 once the stack merges.

What

  • hashAndIndexFile — reads a file once, hashes it, and indexes from the same buffer. Returns the hash to store (0 = skipped/over-cap, maxInt = IO error).
  • indexContentBuffer — the shared "index a text buffer" primitive (binary check + trigram decision), reused by indexFileContent and hashAndIndexFile so the rules stay in one place.
  • incrementalDiff reads the changed file once. When there's no prior hash (first edit since startup — the common case, since the in-memory known map starts at hash=0) it skips the separate hash pass entirely, and now stores a real content hash for size-changed files too (previously 0), so a later revert-to-same-size is detected.
  • drainNotifyFile drops its index-then-hash double read.
  • hashFile deleted — its only two callers are replaced.

Read counts

case before after
first edit since startup (common) 2 reads 1
same-size content change 2 reads 1
identical content (touch / git checkout) 1 read 1
muonry notify (every edit) 2 reads 1

Behavior

Change detection preserved: same identical-content short-circuit, same "binary files are hashed but not indexed" split, same skip/IO-error sentinels. std.hash.Wyhash.hash(0, buf) over the whole buffer equals the prior 16 KB-chunked stream, so stored hashes stay comparable. The one intentional change is storing a real hash for size-changed files (strictly more correct).

Tests

Full suite green (zig build test); zig fmt clean. No new unit test: incrementalDiff/drainNotifyFile are private and only drivable through real mtime changes, which makes a unit test inherently flaky (sub-ms edits don't bump mtime). This is an internal, behavior-preserving change with no observable API surface — the existing index/snapshot suites are the safety net (same rationale as #637).

🤖 Generated with Claude Code

incrementalDiff and drainNotifyFile each hashed a changed file (full read) and
then indexed it (a second full read) — two reads of the same bytes per change,
up to 2× the max_indexed_file_bytes cap of IO after #635 widened it to 2 MB.

Read once and reuse the buffer for both the content hash and indexing:

- New `hashAndIndexFile` reads a file once, hashes it, and indexes it from the
  same buffer; returns the hash to store (0 = skipped, maxInt = IO error).
- New `indexContentBuffer` is the shared "index a text buffer" primitive
  (binary check + trigram decision), reused by indexFileContent and
  hashAndIndexFile so the rules live in one place.
- incrementalDiff reads the changed file once: when there is no prior hash
  (first edit since startup — the common case) it no longer does a separate
  hash pass, and it now stores a real content hash for size-changed files too,
  so a later revert-to-same-size is detected. The streaming `hashFile` helper
  is gone (its only callers are replaced).
- drainNotifyFile (muonry instant re-index, fired on every edit) drops its
  unconditional index-then-hash double read.

Behavior-preserving for change detection: same identical-content short-circuit
(touch / git checkout), same binary-hashed-but-not-indexed split, same skip and
IO-error sentinels. Wyhash over the full buffer equals the prior chunked stream.
Full suite green; zig fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 48292 46903 -2.88% -1389 OK
codedb_changes 6099 6648 +9.00% +549 OK
codedb_context 632307 642001 +1.53% +9694 OK
codedb_deps 308 286 -7.14% -22 OK
codedb_edit 33571 33513 -0.17% -58 OK
codedb_find 1665 1665 +0.00% +0 OK
codedb_hot 14037 20020 +42.62% +5983 NOISE
codedb_outline 10366 10350 -0.15% -16 OK
codedb_read 9700 10629 +9.58% +929 OK
codedb_search 41843 43112 +3.03% +1269 OK
codedb_snapshot 72683 71813 -1.20% -870 OK
codedb_status 4734 4912 +3.76% +178 OK
codedb_symbol 44766 45056 +0.65% +290 OK
codedb_tree 15362 15357 -0.03% -5 OK
codedb_word 6908 8000 +15.81% +1092 NOISE

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16e5fe91a9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/watcher.zig
defer content_arena.deinit();
const content = dir.readFileAlloc(io, path, content_arena.allocator(), .limited(max_indexed_file_bytes)) catch return std.math.maxInt(u64);
const hash = std.hash.Wyhash.hash(0, content);
indexContentBuffer(explorer, path, content, false) catch {};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate indexing failures before marking notifications handled

When indexContentBuffer fails in the muonry notification path (for example from a parser/index allocation error), this catch still lets hashAndIndexFile return a normal hash; drainNotifyFile then updates known and emits an event, so the subsequent poll sees the same mtime/size and will not retry, leaving the old index contents in place. The previous indexFileContent(...) catch continue behavior avoided marking the notification as processed on indexing failure, so please return an error/sentinel for indexing failures or avoid updating known unless indexing succeeds.

Useful? React with 👍 / 👎.

@justrach

Copy link
Copy Markdown
Owner Author

Landed on release/0.2.5826 via fast-forward alongside #636/#637 (commit 16e5fe9, now the release tip). Closing — base branch now contains these commits.

@justrach justrach closed this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant