fix(daemon): WARM guarded fast-load + HOT per-write icacls spawn removal by githubrobbi · Pull Request #373 · skyllc-ai/UltraFastFileSearch

githubrobbi · 2026-06-09T04:13:37Z

Summary

Fixes two daemon performance regressions surfaced by the interleaved cross-tool
benchmark, plus the bench-harness changes that made them measurable.

1. WARM exact-query regression (v0.5.80 → v0.5.81) — `fix(daemon): guarded warm-load …`

Every warm load synchronously replayed the USN journal and rebuilt the compact
index, adding the full rebuild cost to the first query after start.

Guarded warm-load (cache::guarded_load): startup fast-serves the on-disk
compact cache when the persisted USN cursor lies inside the live journal
window (first_usn ≤ cursor ≤ next_usn) and lets the background per-shard
journal loop converge the bounded [cursor, live) delta. Falls back to a
synchronous full rebuild only when the cursor is absent / predates / postdates
the journal.
Cursor↔body lockstep: the loop previously persisted the cursor on every
save-threshold crossing, but handle_journal_save skips the body save for a
parked shard — so a demoted drive advanced its cursor while its cache stayed
frozen, letting the guard fast-load a stale body and strand the [demote, now)
delta after restart. Cursor persistence now lives in the applier and only
fires when a body was actually saved. PatchSink::trigger_save /
ApplyMsg::Save carry the cursor; full_rebuild seeds it in lockstep with its
own body write. Invariant: persisted cursor ≤ persisted body state (or 0).
Re-promote (body_loader) deliberately stays on the synchronous
USN-refresh path — it runs live, where the loop's cursor has already advanced
past the demote point, so deferring would be unsafe.

2. HOT per-write `icacls.exe` spawn (v0.5.110 → v0.5.111) — `fix(security): native Win32 ACL …`

create_new_secure_file shelled out to icacls.exe on every secured write — a
full process spawn (tens of ms) on a hot path, and it resolved the principal
from %USERNAME% (can diverge from the effective token owner under elevation).

Replaced the spawn with native Win32 ACL calls granting to the process token's
owner SID (always the correct principal).
Added create_new_file_exclusive (exclusive create_new, no owner-only ACL)
for non-secret outputs, and wired the daemon's --out=<path> result export to
it — the export must adopt the directory's normal permissions and must not pay
the per-query spawn. The randomised tmp name + create_new still closes the
symlink/TOCTOU window.

Bench harness (supporting commits)

Per-drive ES + daemon scoping, private Everything instance lifecycle, prime/wipe
of the ES bench DB, subset-invariant-gated diffing (ES ⊆ CPP ⊆ UFFS), and
readable per-round HOT content diffs.

Tests

journal_sink::tests::no_warm_shard_save_does_not_persist_cursor pins the
lockstep safety property.
Loop-level cursor-persistence tests updated to the new loop→sink contract;
RecordingSink captures the cursor argument.
decide_strategy boundary tests for the guard.
uffs-security tests cover create_new_file_exclusive + a regression guard
asserting the secure-file path spawns no subprocess.

File-size policy

Two files crossed the 800-LOC ceiling and were decomposed via sibling modules
(canonical pattern, no exception entries): journal_sink.rs → journal_sink/tests.rs,
and search.rs → search_file_sink.rs.

Verification

Full local pre-push gate green, including lint-ci-windows (cargo xwin clippy
against x86_64-pc-windows-msvc).

Loop shape: for pattern → for round → run(uffs,cpp,es) → compare → next round Within each round: - Tools run in a freshly-shuffled order (LCG seed from wall-clock ns) so no tool consistently benefits from OS FS cache. - After all three tools complete, line counts are shown inline on the [round N/M] progress line. - Output files are read, normalised (lowercase, headers stripped, sorted, deduped), and compared pairwise: uffs vs cpp — should be identical (same MFT, same flags) uffs vs es — Everything skips NTFS system files / ADS cpp vs es — same reasoning - Up to 10 examples per side are printed with '-' / '+' sigils so the operator can immediately spot naming/path-format patterns in diffs. - Output files are deleted after each round; no leftover CSVs. New helpers (no external deps): - normalise_paths(file) → sorted Vec<String> (UTF-8 + UTF-16LE) - diff_paths(a, b) → DiffResult (merge-walk, O(n)) - print_diff(...) → human-readable diff with example lines - DiffResult::is_identical()

COLD/WARM wall time is dominated by the MFT read + index (de)serialisation — a fixed per-drive cost independent of the query pattern. The pattern only changes the output-write cost, which scales with result-set size. Running all 7 patterns in COLD/WARM just re-measured the same index-load floor 7 times. Restrict COLD/WARM to two patterns that bracket the cost space: - exact (34 rows) → pure index-load floor - full_scan (3M+ rows) → max output-write cost All 7 patterns still run in HOT, where pure query execution dominates and the trie/prefix/regex/full-scan engines genuinely diverge. Saves ~5 destructive cold-starts per drive (~35s/drive) with no loss of signal.

The bench iterates one drive at a time, but uffs_start ran `uffs daemon start` with NO --drive flag, so the daemon discovered and loaded EVERY NTFS volume on the host. WARM and HOT load timings were therefore polluted with the load cost of drives that were never requested (e.g. --drives C,D,G still loaded E, F, ...). COLD was already correct: its query auto-spawns the daemon and the CLI forwards --drive <X> via extract_spawn_args, scoping the load. Fix: uffs_start now takes a &[String] of drives and forwards each as a separate --drive <X> to 'daemon start' (parse_daemon_start in uffs-cli accumulates them). Call sites: - skip-cold global warmup → all requested drives, probe each - WARM per-pattern restart → single current drive - HOT per-drive warmup → single current drive Now every phase loads only the drive(s) under test, matching COLD.

Ports the sandboxed-Everything pattern from crates/uffs-bench/src/run/es_instance.rs into the rust-script bench. ES not running (root cause of every es.exe exit=8 fast-fail): - At bench start, when ES is requested, KILL all running Everything.exe and launch a private instance: Everything.exe -config <tempini> -instance uffs-bench -startup The temp ini is generated from the permanent Everything.ini with ntfs_volume_includes/monitors=1 for ONLY the requested drives (0 for the rest) and all auto_include_*/auto_remove_* forced to 0. The permanent ini is never modified. - Poll es.exe -instance uffs-bench <D>: -get-result-count until every drive reports >0 (5 min budget), then run the bench with es_instance=uffs-bench so all es queries hit the sandbox. - Tear down with -instance uffs-bench -exit and remove the temp ini. - Skipped if --es-instance was passed or Everything.exe is missing. - New helpers ported verbatim: parse_ini_array, rebuild_ini, write_bench_ini, es_kill_existing, es_launch, es_wait_until_loaded, es_stop, find_everything. Output readability: - C++ File-sink run sends stderr to NUL (kept stdout inherited for freopen). The decorative 'Drives? .. / Finished in N s' footer no longer spams the console between CMD lines. - Round header on its own line; per-round counts labelled 'rows: uffs=.. cpp=.. es=..' and indented. - Fixed double 'exit=exit=Some(8)' (dropped redundant prefix). Diff output: - print_diff names the full absolute source file each side was read from plus its row count, and states lists are sorted+normalised.

Loop is DRIVE / PATTERN / ROUNDS. On every drive switch BOTH the daemon and the ES sandbox are re-scoped to EXACTLY the drive(s) under test, so all three tools share the same working set (fair vs ES, always pre-indexed, and the C++ tool, which re-reads every MFT). Per-drive step (scope_tools): - ES: relaunch the uffs-bench sandbox scoped to [drive] (kills any other instance), wait_until_loaded. - daemon: kill + start scoped to [drive], then prime_daemon() runs PRIME_ROUNDS (3) warm '*' searches with --no-output. Verified: the daemon still executes the full query (warms hot tier + OS FS cache); --no-output only skips row materialisation/IPC/stdout (see uffs-daemon/src/index/search.rs include_rows gate). - COLD/WARM daemon load-floor demo stays single-drive as before. All-drives aggregate step (when >1 drive): - ES reloaded with the full drive set, daemon restarted + primed across the full set, head-to-head with queries spanning every drive: uffs --drives=C,D,G, es with no path filter, uffs.com --drives=C,D,G. Plumbing: - run_uffs_to: --drives=CSV when the drive arg contains a comma, else --drive <d>. - run_es_to: omit the <D>:\ path filter for the CSV aggregate (the sandbox is already scoped to those drives). - Extracted the HOT head-to-head into run_hot_compare(cfg, drive, all_rows), reused for per-drive and aggregate. - Summary HOT table now includes the aggregate drive-spec row. - Removed the redundant once-at-start all-drives ES launch and skip_cold warmup; scope_tools supersedes both.

Cosmetic-only spacing so the run log is easier to scan: - blank line before the WARM section (separates it from COLD) - blank line before the ES sandbox (re)launch - blank line before the daemon prime - blank line before each HOT pattern block No behavior change.

Previously every round printed a diff for every active tool pair, producing noisy 'identical ✓' lines on clean runs. New rule: compute the diff but only call print_diff when the expected subset is broken — i.e. when the smaller tool returns rows not present in the larger: - cpp ⊆ uffs: print only if cpp has rows absent from uffs - es ⊆ uffs: print only if es has rows absent from uffs - es ⊆ cpp: print only if es has rows absent from cpp Identical sets and clean supersets (the expected steady state) are silent.

… mask Root cause: rebuild_ini only rewrote ntfs_volume_includes and ntfs_volume_monitors. ntfs_volume_load_recent_changes fell through the catch-all arm and inherited the permanent Everything.ini value verbatim (e.g. 1,1,0,0,0,0,0 — C and D), so the sandbox loaded recent-change journals for drives outside the bench set. Fix: - rebuild_ini now also rewrites ntfs_volume_load_recent_changes with the same per-drive bitmask as includes/monitors (1 only for bench drives). - write_bench_ini logs the exact bit->volume mapping it writes (e.g. 'C:=0 D:=0 E:=1 ...') so a wrong bitpattern or an empty/garbled ntfs_volume_paths is visible in the run log.

Root cause (diagnosed from a live run): write_bench_ini correctly wrote ntfs_volume_includes=0,0,1,... (E) — the new diagnostic confirmed 'C:=0 D:=0 E:=1 ...' — but the launched uffs-bench instance still loaded C and rewrote the temp ini to includes=1,0,0,... The instance was reusing the database from the PREVIOUS per-drive launch (the C run earlier in the same benchmark): Everything loads its persisted volume state, ignores the freshly-written includes mask, and rewrites the ini to match. es_wait then polls E and sees 0 forever. Fix: - Pin db_location to a known temp path (bench_db_path) in the rebuilt ini instead of the per-instance default. - Delete that db in es_launch after killing existing instances and before spawning, forcing a fresh index scoped to the current includes mask every launch. - es_stop now also removes the db (after a grace period) so no stale index is left for the next benchmark process. No behavior change for single-drive or first-launch runs; fixes the wrong-drive index on every relaunch with a different drive set.

… race) The previous order wrote the bench ini (correct includes mask, e.g. E=1) and THEN sent -exit to the still-running uffs-bench instance. That instance was launched with -config <same temp ini>, so on shutdown Everything wrote its current (wrong-drive, e.g. C) volume state back to the ini, clobbering our fresh mask. The newly spawned instance then read the clobbered ini and indexed the wrong drive — es polling saw E:0 forever even though our diagnostic logged the correct E=1 we wrote. This also explains why a manual launch with the identical command line worked: no stale instance was alive to overwrite the ini. Fix: reorder es_launch to kill existing instances -> grace -> wipe db -> write fresh ini -> spawn. With no instance alive to write back, the ini we spawn against is exactly the one we wrote.

…h body save Warm exact-query latency regressed (v0.5.80->v0.5.81) because every warm load synchronously replayed the USN journal and rebuilt the compact index. Startup warm load now uses a guard (cache::guarded_load) that fast-serves the on-disk compact cache when the persisted USN cursor lies inside the live journal window (first_usn <= cursor <= next_usn) and lets the background per-shard journal loop converge the bounded [cursor, live) delta; it falls back to a synchronous full rebuild only when the cursor is absent, predates, or postdates the journal. That fast path is only sound if the on-disk cursor never outruns the on-disk compact body. It did: the journal loop persisted the cursor on every save-threshold crossing, but handle_journal_save skips the body save for a parked shard, so a demoted drive kept advancing its cursor while its cache stayed frozen -> after restart the guard would fast-load a stale body and permanently strand the [demote, now] delta. Fix: move cursor persistence into the applier so it writes the cursor only when handle_journal_save actually saved a body. PatchSink::trigger_save and ApplyMsg::Save now carry the cursor; the loop no longer self-persists on save (it still seeds from the store and resets to 0 on wrap). full_rebuild seeds the cursor in lockstep with its own body write. Net invariant: persisted cursor <= persisted body state (or 0). Re-promote (body_loader) is left on the synchronous USN-refresh path on purpose: it runs live, where the loop's cursor has already advanced past the demote point, so deferring to the loop would be unsafe. Removed the orphaned guarded_load::load_body left over from the abandoned re-promote-via-guard approach. Tests: RecordingSink captures the cursor; loop-level persistence tests pinned to the new loop->sink contract; new journal_sink test no_warm_shard_save_does_not_persist_cursor pins the lockstep safety property; decide_strategy boundary tests for the guard. File-size policy: journal_sink.rs grew past the 800-LOC ceiling, so its inline #[cfg(test)] mod tests was extracted to a journal_sink/tests.rs sibling (canonical decomposition; use super::* keeps every private item in scope).

…e icacls spawn create_new_secure_file historically shelled out to icacls.exe to stamp an owner-only ACL on each newly created file. That is a full process spawn (tens of ms) on a path the daemon hits on every secured write, and it resolved the principal from %USERNAME%, which can diverge from the effective token owner under elevation (granting the wrong SID). Replace the icacls spawn with native Win32 ACL calls that grant to the process token's owner SID (always the correct principal), and add create_new_file_exclusive: an exclusive create_new open with no owner-only ACL, for outputs that must adopt the directory's normal permissions. Wire the daemon's `--out=<path>` result export to create_new_file_exclusive: the export is a user-chosen, non-secret file, so it should NOT inherit the daemon owner's restrictive ACL, and it must not pay the per-query spawn the old secure path imposed. The exclusive create still closes the symlink/TOCTOU window via the randomised tmp name + create_new. Tests: cover create_new_file_exclusive and add a regression guard asserting the secure-file path spawns no subprocess. File-size policy: the export-writer change pushed search.rs past the 800-LOC ceiling, so write_rows_to_file (a Self-less associated fn) was relocated to a search_file_sink.rs sibling and called as a free function.

githubrobbi added 13 commits June 8, 2026 12:23

Merge branch 'main' into feat/bench-hot-interleaved-diff

e81634b

githubrobbi enabled auto-merge (squash) June 9, 2026 04:14

githubrobbi merged commit 09eeab0 into main Jun 9, 2026
27 checks passed

githubrobbi deleted the feat/bench-hot-interleaved-diff branch June 9, 2026 04:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(daemon): WARM guarded fast-load + HOT per-write icacls spawn removal#373

fix(daemon): WARM guarded fast-load + HOT per-write icacls spawn removal#373
githubrobbi merged 13 commits into
mainfrom
feat/bench-hot-interleaved-diff

githubrobbi commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

githubrobbi commented Jun 9, 2026

Summary

1. WARM exact-query regression (v0.5.80 → v0.5.81) — fix(daemon): guarded warm-load …

2. HOT per-write icacls.exe spawn (v0.5.110 → v0.5.111) — fix(security): native Win32 ACL …

Bench harness (supporting commits)

Tests

File-size policy

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. WARM exact-query regression (v0.5.80 → v0.5.81) — `fix(daemon): guarded warm-load …`

2. HOT per-write `icacls.exe` spawn (v0.5.110 → v0.5.111) — `fix(security): native Win32 ACL …`