perf(watcher): read changed files once on the live-update path by justrach · Pull Request #638 · justrach/codedb

justrach · 2026-06-22T04:50:31Z

Why

incrementalDiff and drainNotifyFile each read a changed file twice: once to hash it (change detection), then again to index it. After #635 widened the cap to 2 MB, that's up to 2× max_indexed_file_bytes of IO per change. drainNotifyFile (the muonry instant re-index, fired on every edit) double-read unconditionally.

Stacks on refactor-watcher-read-gate (#637) — uses its readIndexableFile / max_trigram_file_bytes. Retarget to release/0.2.5826 once the stack merges.

What

hashAndIndexFile — reads a file once, hashes it, and indexes from the same buffer. Returns the hash to store (0 = skipped/over-cap, maxInt = IO error).
indexContentBuffer — the shared "index a text buffer" primitive (binary check + trigram decision), reused by indexFileContent and hashAndIndexFile so the rules stay in one place.
incrementalDiff reads the changed file once. When there's no prior hash (first edit since startup — the common case, since the in-memory known map starts at hash=0) it skips the separate hash pass entirely, and now stores a real content hash for size-changed files too (previously 0), so a later revert-to-same-size is detected.
drainNotifyFile drops its index-then-hash double read.
hashFile deleted — its only two callers are replaced.

Read counts

case	before	after
first edit since startup (common)	2 reads	1
same-size content change	2 reads	1
identical content (touch / `git checkout`)	1 read	1
muonry notify (every edit)	2 reads	1

Behavior

Change detection preserved: same identical-content short-circuit, same "binary files are hashed but not indexed" split, same skip/IO-error sentinels. std.hash.Wyhash.hash(0, buf) over the whole buffer equals the prior 16 KB-chunked stream, so stored hashes stay comparable. The one intentional change is storing a real hash for size-changed files (strictly more correct).

Tests

Full suite green (zig build test); zig fmt clean. No new unit test: incrementalDiff/drainNotifyFile are private and only drivable through real mtime changes, which makes a unit test inherently flaky (sub-ms edits don't bump mtime). This is an internal, behavior-preserving change with no observable API surface — the existing index/snapshot suites are the safety net (same rationale as #637).

🤖 Generated with Claude Code

incrementalDiff and drainNotifyFile each hashed a changed file (full read) and then indexed it (a second full read) — two reads of the same bytes per change, up to 2× the max_indexed_file_bytes cap of IO after #635 widened it to 2 MB. Read once and reuse the buffer for both the content hash and indexing: - New `hashAndIndexFile` reads a file once, hashes it, and indexes it from the same buffer; returns the hash to store (0 = skipped, maxInt = IO error). - New `indexContentBuffer` is the shared "index a text buffer" primitive (binary check + trigram decision), reused by indexFileContent and hashAndIndexFile so the rules live in one place. - incrementalDiff reads the changed file once: when there is no prior hash (first edit since startup — the common case) it no longer does a separate hash pass, and it now stores a real content hash for size-changed files too, so a later revert-to-same-size is detected. The streaming `hashFile` helper is gone (its only callers are replaced). - drainNotifyFile (muonry instant re-index, fired on every edit) drops its unconditional index-then-hash double read. Behavior-preserving for change detection: same identical-content short-circuit (touch / git checkout), same binary-hashed-but-not-indexed split, same skip and IO-error sentinels. Wyhash over the full buffer equals the prior chunked stream. Full suite green; zig fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-22T04:53:11Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	48292	46903	-2.88%	-1389	OK
`codedb_changes`	6099	6648	+9.00%	+549	OK
`codedb_context`	632307	642001	+1.53%	+9694	OK
`codedb_deps`	308	286	-7.14%	-22	OK
`codedb_edit`	33571	33513	-0.17%	-58	OK
`codedb_find`	1665	1665	+0.00%	+0	OK
`codedb_hot`	14037	20020	+42.62%	+5983	NOISE
`codedb_outline`	10366	10350	-0.15%	-16	OK
`codedb_read`	9700	10629	+9.58%	+929	OK
`codedb_search`	41843	43112	+3.03%	+1269	OK
`codedb_snapshot`	72683	71813	-1.20%	-870	OK
`codedb_status`	4734	4912	+3.76%	+178	OK
`codedb_symbol`	44766	45056	+0.65%	+290	OK
`codedb_tree`	15362	15357	-0.03%	-5	OK
`codedb_word`	6908	8000	+15.81%	+1092	NOISE

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16e5fe91a9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T04:53:27Z

+    defer content_arena.deinit();
+    const content = dir.readFileAlloc(io, path, content_arena.allocator(), .limited(max_indexed_file_bytes)) catch return std.math.maxInt(u64);
+    const hash = std.hash.Wyhash.hash(0, content);
+    indexContentBuffer(explorer, path, content, false) catch {};


Propagate indexing failures before marking notifications handled

When indexContentBuffer fails in the muonry notification path (for example from a parser/index allocation error), this catch still lets hashAndIndexFile return a normal hash; drainNotifyFile then updates known and emits an event, so the subsequent poll sees the same mtime/size and will not retry, leaving the old index contents in place. The previous indexFileContent(...) catch continue behavior avoided marking the notification as processed on indexing failure, so please return an error/sentinel for indexing failures or avoid updating known unless indexing succeeds.

Useful? React with 👍 / 👎.

justrach · 2026-06-22T07:40:46Z

Landed on release/0.2.5826 via fast-forward alongside #636/#637 (commit 16e5fe9, now the release tip). Closing — base branch now contains these commits.

chatgpt-codex-connector Bot reviewed Jun 22, 2026

View reviewed changes

justrach closed this Jun 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(watcher): read changed files once on the live-update path#638

perf(watcher): read changed files once on the live-update path#638
justrach wants to merge 1 commit into
refactor-watcher-read-gatefrom
optimize-watcher-single-read

justrach commented Jun 22, 2026

Uh oh!

github-actions Bot commented Jun 22, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Uh oh!

justrach commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

justrach commented Jun 22, 2026

Why

What

Read counts

Behavior

Tests

Uh oh!

github-actions Bot commented Jun 22, 2026

Benchmark Regression Report

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

justrach commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant