perf(watcher): read changed files once on the live-update path#638
perf(watcher): read changed files once on the live-update path#638justrach wants to merge 1 commit into
Conversation
incrementalDiff and drainNotifyFile each hashed a changed file (full read) and then indexed it (a second full read) — two reads of the same bytes per change, up to 2× the max_indexed_file_bytes cap of IO after #635 widened it to 2 MB. Read once and reuse the buffer for both the content hash and indexing: - New `hashAndIndexFile` reads a file once, hashes it, and indexes it from the same buffer; returns the hash to store (0 = skipped, maxInt = IO error). - New `indexContentBuffer` is the shared "index a text buffer" primitive (binary check + trigram decision), reused by indexFileContent and hashAndIndexFile so the rules live in one place. - incrementalDiff reads the changed file once: when there is no prior hash (first edit since startup — the common case) it no longer does a separate hash pass, and it now stores a real content hash for size-changed files too, so a later revert-to-same-size is detected. The streaming `hashFile` helper is gone (its only callers are replaced). - drainNotifyFile (muonry instant re-index, fired on every edit) drops its unconditional index-then-hash double read. Behavior-preserving for change detection: same identical-content short-circuit (touch / git checkout), same binary-hashed-but-not-indexed split, same skip and IO-error sentinels. Wyhash over the full buffer equals the prior chunked stream. Full suite green; zig fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Benchmark Regression ReportThresholds: 10.00% and 50,000 ns absolute delta
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16e5fe91a9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| defer content_arena.deinit(); | ||
| const content = dir.readFileAlloc(io, path, content_arena.allocator(), .limited(max_indexed_file_bytes)) catch return std.math.maxInt(u64); | ||
| const hash = std.hash.Wyhash.hash(0, content); | ||
| indexContentBuffer(explorer, path, content, false) catch {}; |
There was a problem hiding this comment.
Propagate indexing failures before marking notifications handled
When indexContentBuffer fails in the muonry notification path (for example from a parser/index allocation error), this catch still lets hashAndIndexFile return a normal hash; drainNotifyFile then updates known and emits an event, so the subsequent poll sees the same mtime/size and will not retry, leaving the old index contents in place. The previous indexFileContent(...) catch continue behavior avoided marking the notification as processed on indexing failure, so please return an error/sentinel for indexing failures or avoid updating known unless indexing succeeds.
Useful? React with 👍 / 👎.
Why
incrementalDiffanddrainNotifyFileeach read a changed file twice: once to hash it (change detection), then again to index it. After #635 widened the cap to 2 MB, that's up to 2× max_indexed_file_bytes of IO per change.drainNotifyFile(the muonry instant re-index, fired on every edit) double-read unconditionally.Stacks on
refactor-watcher-read-gate(#637) — uses itsreadIndexableFile/max_trigram_file_bytes. Retarget torelease/0.2.5826once the stack merges.What
hashAndIndexFile— reads a file once, hashes it, and indexes from the same buffer. Returns the hash to store (0= skipped/over-cap,maxInt= IO error).indexContentBuffer— the shared "index a text buffer" primitive (binary check + trigram decision), reused byindexFileContentandhashAndIndexFileso the rules stay in one place.incrementalDiffreads the changed file once. When there's no prior hash (first edit since startup — the common case, since the in-memoryknownmap starts athash=0) it skips the separate hash pass entirely, and now stores a real content hash for size-changed files too (previously0), so a later revert-to-same-size is detected.drainNotifyFiledrops its index-then-hash double read.hashFiledeleted — its only two callers are replaced.Read counts
git checkout)Behavior
Change detection preserved: same identical-content short-circuit, same "binary files are hashed but not indexed" split, same skip/IO-error sentinels.
std.hash.Wyhash.hash(0, buf)over the whole buffer equals the prior 16 KB-chunked stream, so stored hashes stay comparable. The one intentional change is storing a real hash for size-changed files (strictly more correct).Tests
Full suite green (
zig build test);zig fmtclean. No new unit test:incrementalDiff/drainNotifyFileare private and only drivable through real mtime changes, which makes a unit test inherently flaky (sub-ms edits don't bump mtime). This is an internal, behavior-preserving change with no observable API surface — the existing index/snapshot suites are the safety net (same rationale as #637).🤖 Generated with Claude Code