perf(search): master speed - keep #2/#4 wins, gate #1, beat Everything across the matrix #375
Merged
sort_and_localise ran a full O(N log N) value-sort even when limit admits every candidate (e.g. `*` full-scan, limit=usize::MAX). The downstream backend::sort_rows re-sorts the materialised rows by the user's column anyway and truncate is a no-op, so the value-sort is wasted work over millions of tuples. Add an early return for limit >= candidates.len() that does only the cheap MFT-locality sort (keeps DirCache warm for path resolution) and skips the value-sort entirely. Recovers the full_scan C,D regression (4.2s top-5 -> <=3.5s) without touching the limited-query path.
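The gating described in this commit can be sketched as follows. This is a minimal illustration, not the actual `sort_and_localise`: the `Candidate` tuple layout and the limited-path ordering (value-sort, truncate, then MFT order) are assumptions made for the example.

```rust
/// Hypothetical candidate tuple: (sort value, MFT record index).
type Candidate = (u64, u32);

/// Sketch only: names and types are illustrative, not the real signature.
fn sort_and_localise(mut candidates: Vec<Candidate>, limit: usize) -> Vec<Candidate> {
    if limit >= candidates.len() {
        // The limit admits every candidate, so a value-sort cannot change
        // which rows survive. Only restore MFT locality and return early.
        candidates.sort_unstable_by_key(|&(_, mft)| mft);
        return candidates;
    }
    // Limited query: the value-sort decides which `limit` rows survive.
    candidates.sort_unstable_by_key(|&(value, _)| value);
    candidates.truncate(limit);
    // Order the survivors by MFT index for cache-friendly path resolution.
    candidates.sort_unstable_by_key(|&(_, mft)| mft);
    candidates
}
```

With `limit = usize::MAX` the function performs a single sort by MFT index; with a small limit it still pays for the value-sort, which is the path this commit leaves untouched.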
#4)

#2 Trigram prefix fast-path: prefix queries (e.g. `win*`) now narrow candidates via the first-3-char trigram lookup then filter by full prefix, instead of scanning every record. Adds is_prefix_pattern() in tree.rs, a new prefix_search.rs module, and is_prefix dispatch arms in backend.rs (both search sites) + dispatch.rs (+ pick_mode_label). Expected: prefix C 91->~72ms, C,D 95->~82ms (beats ES).

#4 Size-gated parallel path resolution: indices_to_rows dispatches sequential below RESOLVE_CHUNK_SIZE (4096) and par_chunks at/above it. 4096 keeps tiny exact queries (3-37 rows) off rayon (no p95 tail jitter) while letting prefix/substring (12K-34K rows) fan out. Expected: substring C 57->~38ms, C,D 58->~47ms.

Decompose: extract the indices_to_rows family into the new sibling module row_resolve.rs so query/mod.rs stays under the 800-LOC policy (809 -> 694), no file_size_exceptions entry added.

Tests: is_prefix_pattern acceptance matrix (tree.rs), prefix/glob parity + limit (query_tests), and a 9000-row parallel-resolve parity test guarding the chunk-reduce ordering.
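A rough sketch of the trigram prefix fast-path (#2) might look like the following. Everything here is illustrative: the `TrigramIndex` type, its byte-wise case-sensitive trigrams, and the acceptance rule in `is_prefix_pattern` are assumptions for the example, not the code in tree.rs or prefix_search.rs.

```rust
use std::collections::HashMap;

/// Returns the literal part of a `<literal>*` pattern, or None if the pattern
/// has other wildcards. The exact acceptance rules are an assumption.
fn is_prefix_pattern(pattern: &str) -> Option<&str> {
    let literal = pattern.strip_suffix('*')?;
    if literal.is_empty() || literal.contains(|c: char| matches!(c, '*' | '?')) {
        return None;
    }
    Some(literal)
}

/// Hypothetical index: trigram -> ids of the names that contain it.
struct TrigramIndex {
    names: Vec<String>,
    postings: HashMap<[u8; 3], Vec<usize>>,
}

impl TrigramIndex {
    fn build(names: Vec<String>) -> Self {
        let mut postings: HashMap<[u8; 3], Vec<usize>> = HashMap::new();
        for (id, name) in names.iter().enumerate() {
            for w in name.as_bytes().windows(3) {
                let list = postings.entry([w[0], w[1], w[2]]).or_default();
                // Record each id once per trigram, even if it repeats in the name.
                if list.last() != Some(&id) {
                    list.push(id);
                }
            }
        }
        Self { names, postings }
    }

    fn prefix_search(&self, prefix: &str) -> Vec<usize> {
        let bytes = prefix.as_bytes();
        if bytes.len() < 3 {
            // Too short for a trigram lookup: fall back to a full scan.
            return (0..self.names.len())
                .filter(|&id| self.names[id].starts_with(prefix))
                .collect();
        }
        // Narrow by the first trigram, then confirm the full prefix.
        let key = [bytes[0], bytes[1], bytes[2]];
        self.postings
            .get(&key)
            .map(|ids| {
                ids.iter()
                    .copied()
                    .filter(|&id| self.names[id].starts_with(prefix))
                    .collect::<Vec<_>>()
            })
            .unwrap_or_default()
    }
}
```

The trigram lookup also returns names that contain `win` in the middle (such as `rewind`), which is why the full-prefix filter is still required after narrowing.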
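The size gate in #4 can be illustrated with a std-only sketch. Note the substitution: the commit uses rayon's `par_chunks`, while this example uses `std::thread::scope` with one thread per chunk so it has no external dependencies. The `resolve` function and the `String` row type are hypothetical stand-ins for path resolution.

```rust
use std::thread;

/// Threshold value taken from the commit message.
const RESOLVE_CHUNK_SIZE: usize = 4096;

/// Hypothetical stand-in for resolving one record index to a row.
fn resolve(index: u32) -> String {
    format!("row-{index}")
}

fn indices_to_rows(indices: &[u32]) -> Vec<String> {
    if indices.len() < RESOLVE_CHUNK_SIZE {
        // Small result sets stay on the calling thread: no scheduling overhead.
        return indices.iter().map(|&i| resolve(i)).collect();
    }
    // Large sets: spawn one scoped thread per chunk (a simplification of
    // rayon's work-stealing pool), then join in spawn order so the output
    // order matches the sequential path.
    thread::scope(|scope| {
        let handles: Vec<_> = indices
            .chunks(RESOLVE_CHUNK_SIZE)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&i| resolve(i)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("resolver thread panicked"))
            .collect()
    })
}
```

Joining the handles in spawn order is what keeps the chunk-reduce ordering identical to the sequential result, which is the property the 9000-row parity test is described as guarding.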
…d.rs size exception backend.rs was 1067 LOC and carried a PERMANENT file_size_exceptions entry. Per workspace policy (decompose, don't suppress), move the self-contained DisplayRow type — struct + inherent impl + Default + uffs_format::FormatRow impl — into a new sibling module display_row.rs (289 LOC). backend.rs drops to 784 LOC, under the 800 ceiling. DisplayRow is re-exported (`pub use super::display_row::DisplayRow;`) so the single-import convention downstream relies on (uffs_core::search::backend::DisplayRow) is unchanged — public API and behavior preserved. Removes the backend.rs entry from scripts/ci/file_size_exceptions.txt.
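The re-export that keeps the downstream import path stable can be shown in miniature. The module nesting below mirrors the commit message; the fields of `DisplayRow` and the private visibility of `display_row` are placeholders assumed for the example, not the real definition.

```rust
mod search {
    // New sibling module that now owns the type (kept private in this sketch).
    mod display_row {
        /// Placeholder fields; the real struct is not shown in this PR text.
        #[derive(Debug, Default, PartialEq)]
        pub struct DisplayRow {
            pub name: String,
            pub size: u64,
        }
    }

    pub mod backend {
        // Re-export so `search::backend::DisplayRow` keeps resolving.
        pub use super::display_row::DisplayRow;
    }
}

// Downstream code keeps its single import unchanged.
use search::backend::DisplayRow;
```

Because callers only ever name `search::backend::DisplayRow`, moving the definition is invisible to them as long as the `pub use` stays in place.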
Public-facing, fact-only benchmark snapshot of the verified-fresh cross-tool run (UFFS v0.5.120 vs Everything 1.4.1.1032) on C: + D: (7.97M records, Ryzen 9 3900XT / Win11 24H2). States results only, not methodology, linking docs/benchmarks/methodology.md for the fairness doctrine. Headline: UFFS wins 17/18 targeted head-to-head cells at p50 (median ~0.52x, ~1.9x faster); the 18th (C: prefix) is a 1ms tie. Mirrors the structure of the 2026-04 v0.5.66 report. REUSE: covered by the repo-wide ** -> MPL-2.0 annotation in REUSE.toml.
The golden cpp_*.txt baseline is immutable across reruns. Hashing a multi-GB file on every invocation wastes seconds for no benefit. Add compute_streaming_stats_cached: writes a .parityhash sidecar keyed on (size_bytes, mtime_nanos); subsequent runs skip the SHA256 pass entirely if the file hasn't changed. Falls back to a full recompute if the sidecar is absent, stale, or unreadable. Also annotates the baseline hash line with ', golden cached' so the operator can confirm the fast-path engaged.
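A minimal sketch of the sidecar cache described in this commit, assuming a two-line sidecar format (`<size> <mtime_nanos>` then the hash). The function names, the sidecar layout, and the `hash_file` callback (standing in for the streaming SHA256 pass) are all hypothetical; this is not `compute_streaming_stats_cached` itself.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::UNIX_EPOCH;

/// Cache key: (size_bytes, mtime_nanos).
fn cache_key(path: &Path) -> io::Result<(u64, u128)> {
    let meta = fs::metadata(path)?;
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
        .as_nanos();
    Ok((meta.len(), mtime))
}

/// `<file>.parityhash` next to the hashed file.
fn sidecar_path(path: &Path) -> PathBuf {
    let mut s = path.as_os_str().to_owned();
    s.push(".parityhash");
    PathBuf::from(s)
}

/// Returns (hash, served_from_cache).
fn hash_cached(
    path: &Path,
    hash_file: impl Fn(&Path) -> io::Result<String>,
) -> io::Result<(String, bool)> {
    let (size, mtime) = cache_key(path)?;
    let sidecar = sidecar_path(path);
    let header = format!("{size} {mtime}");
    if let Ok(text) = fs::read_to_string(&sidecar) {
        if let Some((key, hash)) = text.split_once('\n') {
            if key == header && !hash.is_empty() {
                // Size and mtime unchanged: skip the hashing pass entirely.
                return Ok((hash.to_owned(), true));
            }
        }
    }
    // Sidecar absent, stale, or unreadable: recompute and rewrite it.
    let hash = hash_file(path)?;
    // A failed sidecar write only costs the next run a recompute.
    let _ = fs::write(&sidecar, format!("{header}\n{hash}"));
    Ok((hash, false))
}
```

The boolean in the return value is what a caller could use to append a marker such as `, golden cached` to the baseline hash line.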
Summary
The synthesis build from the performance regression root-cause analysis: keep the two genuine winners (#2 trigram-prefix, #4 parallel-resolve), gate the loser (#1 unlimited value-sort), and drop the un-gated #3. Verified on a fresh-daemon full-matrix Windows benchmark.

Commits
#1 skip value-sort for unlimited match-all: `sort_and_localise` early-returns (MFT-locality sort only) when `limit >= candidates.len()`, eliminating a redundant full sort of millions of tuples on `*` full-scans.
#2 + #4 trigram prefix fast-path + size-gated parallel resolve: restores `search_compact_drive_prefix` (trigram-accelerated `win*`), wires `is_prefix` through both `MultiDriveBackend::search` and `search_index`, and gates `indices_to_rows` parallelism at `PARALLEL_RESOLVE_THRESHOLD` (50K) so tiny `exact` sets stay sequential (no rayon p95 jitter). Adds prefix-parity, limit, and parallel-resolve regression tests.
`backend.rs` decomposition: extracts `DisplayRow` into `display_row.rs`, drops the file-size exception.

Benchmark result (verified-fresh daemon, C: + D:, 7.97M records)
Acceptance gate MET: best-or-tied vs both 0.5.66 and Everything on every row; beats Everything on all 16 comparable rows (C: prefix is a 1ms tie). Sets six new bests: D:/C,D: full_scan, D: prefix, D:/C,D: substring, C,D: ext_dll. Median UFFS/ES ratio ~0.52x (~1.9x faster).
Verification
`cargo clippy -D warnings` clean, `cargo test -p uffs-core` green (829 lib tests + new parity/regression tests).
`lint-pre-push` gate green (incl. Windows lint, doc-tests, smoke).
`main` @ 0.5.119; 3 signed code commits + 1 signed docs commit.

Note: published artifact will be v0.5.120 after the post-merge CI version bump.