refactor(watcher): single read gate for the index path; name the trigram cap #637

Closed
justrach wants to merge 1 commit into issue-635-large-file-skip from refactor-watcher-read-gate

@justrach

Owner

Why

#635 (PR #636) had to bump the file-size cap in five separate copies of the same stat → cap → read → null-byte check block in watcher.zig — and one of those copies disagreed with the trigram intent, which was the #635 bug. The duplication is the root cause, not the symptom. This consolidates it so the next cap change is a one-line edit.

Stacks on issue-635-large-file-skip (needs the max_indexed_file_bytes constant). Retarget to release/0.2.5826 once #636 merges.

What

  • One read gate. New readIndexableFile(io, dir, path, alloc, size, warn_oversize) owns the cap, the read, and the binary check. The five sites (parseInitialScanEntry, readFileEntry, indexFileOutline, indexFileContent, plus the warn path) now call it. hashFile keeps its streaming reader (no full-buffer alloc) and is untouched.
  • Name the trigram cap. The bare 1024 * 1024 trigram-byte threshold was duplicated across seven sites — the same magic-number smell. Hoisted to max_trigram_file_bytes, so the index cap and trigram cap are visibly paired.
  • Vectorize the binary check. Per-site scalar for loop → std.mem.indexOfScalar. The buffer is still freed on the binary-skip path, matching the prior defer-based behavior.
  • Fix stale comments left by #635, "Files 512KB–1MB silently dropped from the index (contradicts the 1MB trigram intent)" (512KB → 2 MB cap; 64KB → trigram cap).
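
The gate described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from watcher.zig: the file read is omitted, the constant values are assumed from the PR text (2 MB index cap, 1024 * 1024 trigram cap), and `Gate`, `sizeGate`, `binaryGate`, and `wantsTrigrams` are hypothetical names. Whether the real comparisons are strict or inclusive is also an assumption.

```zig
const std = @import("std");

// Assumed values taken from the PR text; the real constants live in watcher.zig.
const max_indexed_file_bytes: usize = 2 * 1024 * 1024;
const max_trigram_file_bytes: usize = 1024 * 1024;

/// Hypothetical outcome type; the real helper may simply return an optional slice.
const Gate = enum { indexable, oversize };

/// Size gate, meant to run on the stat size before any bytes are read.
/// Assumption: a file exactly at the cap is still indexable.
fn sizeGate(size: usize) Gate {
    if (size > max_indexed_file_bytes) return .oversize;
    return .indexable;
}

/// Binary gate: takes ownership of `buf`. A NUL byte marks the file as
/// binary; in that case the buffer is freed here and null is returned,
/// so callers never have to clean up on the skip path.
fn binaryGate(alloc: std.mem.Allocator, buf: []u8) ?[]u8 {
    if (std.mem.indexOfScalar(u8, buf, 0) != null) {
        alloc.free(buf);
        return null;
    }
    return buf;
}

/// Assumption: files above the trigram cap are still indexed, just without trigrams.
fn wantsTrigrams(size: usize) bool {
    return size <= max_trigram_file_bytes;
}
```

With both thresholds declared next to each other, a future cap change touches one constant rather than each call site.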

Behavior

Behavior-preserving: same cap, same warn-on-oversize (only the initial-scan path warns), same binary/oversize skips. Net +9 lines, but −5 duplicated blocks and −6 magic numbers.

Tests

Full suite green (zig build test); zig fmt clean. No behavior change, so the existing #635 + index tests are the safety net.

Not in scope (separate issues)

  • Double full-file read on same-size edits (hashFile then indexFileContent) — real perf win, but touches change-detection sentinels and wants its own characterization test.

🤖 Generated with Claude Code

…ram cap

#635 had to edit the file-size cap in five separate copies of the
stat→cap→read→null-byte-check block, and one copy disagreed with the trigram
intent — that duplication was the bug. Extract one `readIndexableFile` helper
that owns the cap, the read, and the binary check, so the threshold lives in a
single place and can't drift across call sites again.

Also name the trigram byte threshold: the bare `1024 * 1024` literal was
duplicated across seven sites (the same magic-number smell). Hoist it to
`max_trigram_file_bytes` so the two thresholds (index cap vs trigram cap) are
visibly paired and editable in one spot.

Behavior-preserving: same cap, same warn-on-oversize (only the initial-scan
path warns), same binary skip. Folds the per-site scalar null-byte loop into a
vectorized `std.mem.indexOfScalar`. Fixes the stale "512KB"/"64KB" comments
left by #635. Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions


Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | --- | --- | --- | --- | --- |
| codedb_bundle | 75786 | 70792 | -6.59% | -4994 | OK |
| codedb_changes | 11143 | 11330 | +1.68% | +187 | OK |
| codedb_context | 770803 | 782875 | +1.57% | +12072 | OK |
| codedb_deps | 351 | 337 | -3.99% | -14 | OK |
| codedb_edit | 41763 | 46778 | +12.01% | +5015 | NOISE |
| codedb_find | 2959 | 2816 | -4.83% | -143 | OK |
| codedb_hot | 26550 | 29992 | +12.96% | +3442 | NOISE |
| codedb_outline | 16339 | 16466 | +0.78% | +127 | OK |
| codedb_read | 13950 | 13346 | -4.33% | -604 | OK |
| codedb_search | 69251 | 67963 | -1.86% | -1288 | OK |
| codedb_snapshot | 72970 | 79409 | +8.82% | +6439 | OK |
| codedb_status | 9589 | 10065 | +4.96% | +476 | OK |
| codedb_symbol | 55699 | 54292 | -2.53% | -1407 | OK |
| codedb_tree | 24096 | 26324 | +9.25% | +2228 | OK |
| codedb_word | 14321 | 12111 | -15.43% | -2210 | OK |
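
The Status column is consistent with a two-threshold rule like the one sketched below. This is a reconstruction from the report text, not the CI script itself: `Status`, `classifyDelta`, and the `fail` label are hypothetical, and it assumes that speedups always count as OK (as the codedb_word row suggests) and that the threshold comparisons are strict.

```zig
const std = @import("std");

// Hypothetical status type; `fail` is an assumed name for the CI-failing case.
const Status = enum { ok, noise, fail };

// Thresholds as stated in the report header.
const pct_threshold: f64 = 10.0;
const abs_threshold_ns: i64 = 50_000;

/// Classify one benchmark row from its base and head timings (nanoseconds).
fn classifyDelta(base_ns: i64, head_ns: i64) Status {
    const delta = head_ns - base_ns;
    // Assumption: a speedup or no change never counts as a regression.
    if (delta <= 0) return .ok;

    const pct = @as(f64, @floatFromInt(delta)) /
        @as(f64, @floatFromInt(base_ns)) * 100.0;
    if (pct <= pct_threshold) return .ok;

    // Percentage threshold exceeded, but the absolute slowdown is too small to fail CI.
    if (delta < abs_threshold_ns) return .noise;
    return .fail;
}
```

For example, codedb_edit (41763 → 46778) exceeds 10% but slows down by only 5015 ns, so it is reported as NOISE rather than failing the run.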

@justrach

Owner Author

Landed on release/0.2.5826 via fast-forward alongside #636 (commit d4247ee). Closing — base branch now contains these commits.

@justrach closed this Jun 22, 2026
@justrach deleted the refactor-watcher-read-gate branch June 22, 2026 12:18