refactor(watcher): single read gate for the index path; name the trigram cap#637
Closed
justrach wants to merge 1 commit into
Closed
refactor(watcher): single read gate for the index path; name the trigram cap#637justrach wants to merge 1 commit into
justrach wants to merge 1 commit into
Conversation
…ram cap #635 had to edit the file-size cap in five separate copies of the stat→cap→read→null-byte-check block, and one copy disagreed with the trigram intent — that duplication was the bug. Extract one `readIndexableFile` helper that owns the cap, the read, and the binary check, so the threshold lives in a single place and can't drift across call sites again. Also name the trigram byte threshold: the bare `1024 * 1024` literal was duplicated across seven sites (the same magic-number smell). Hoist it to `max_trigram_file_bytes` so the two thresholds (index cap vs trigram cap) are visibly paired and editable in one spot. Behavior-preserving: same cap, same warn-on-oversize (only the initial-scan path warns), same binary skip. Folds the per-site scalar null-byte loop into a vectorized `std.mem.indexOfScalar`. Fixes the stale "512KB"/"64KB" comments left by #635. Full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Benchmark Regression ReportThresholds: 10.00% and 50,000 ns absolute delta
|
Owner
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
#635 (PR #636) had to bump the file-size cap in five separate copies of the same
stat → cap → read → null-byte checkblock inwatcher.zig— and one of those copies disagreed with the trigram intent, which was the #635 bug. The duplication is the root cause, not the symptom. This consolidates it so the next cap change is a one-line edit.Stacks on
issue-635-large-file-skip(needs themax_indexed_file_bytesconstant). Retarget torelease/0.2.5826once #636 merges.What
readIndexableFile(io, dir, path, alloc, size, warn_oversize)owns the cap, the read, and the binary check. The five sites (parseInitialScanEntry,readFileEntry,indexFileOutline,indexFileContent, plus the warn path) now call it.hashFilekeeps its streaming reader (no full-buffer alloc) and is untouched.1024 * 1024trigram-byte threshold was duplicated across seven sites — the same magic-number smell. Hoisted tomax_trigram_file_bytes, so the index cap and trigram cap are visibly paired.forloop →std.mem.indexOfScalar(which frees on the binary-skip path, matching priordefer-based behavior).512KB→ 2 MB cap;64KB→ trigram cap).Behavior
Preserving — same cap, same warn-on-oversize (only the initial-scan path warns), same binary/oversize skips. Net +9 lines but −5 duplicated blocks and −6 magic numbers.
Tests
Full suite green (
zig build test);zig fmtclean. No behavior change, so the existing #635 + index tests are the safety net.Not in scope (separate issues)
hashFilethenindexFileContent) — real perf win, but touches change-detection sentinels and wants its own characterization test.🤖 Generated with Claude Code