perf(fts): debounce corpus stats recompute during sync #1548

Open
MrDirkelz wants to merge 1 commit into main from 1547-shared-improve-syncing-performance-by-debouncing-corpus-scan

@MrDirkelz (Collaborator) commented Apr 21, 2026

`db.bulkPut` was calling `recomputeCorpusStats()` synchronously after every batch of Content docs, scanning the full Content doc set each time, which is O(n²) over a fresh sync. Switch to the existing `scheduleCorpusStatsRecompute()` debounce (10 s after the last call) so stats are computed once after sync settles. Measured ~3× faster initial sync; no change to search behavior beyond brief staleness of BM25 `N`/`avgdl` during active sync.
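For readers unfamiliar with the pattern, here is a minimal sketch of a trailing-edge debounce like the one described above. It is not the project's `scheduleCorpusStatsRecompute()` implementation; `createDebouncedRecompute`, `flush`, and `pending` are hypothetical names used only for illustration, and the 10 s default simply mirrors the delay mentioned in this PR.

```typescript
// Hypothetical sketch of a trailing-edge debounce; not the project's actual code.
type Recompute = () => void | Promise<void>;

interface DebouncedRecompute {
  schedule(): void;   // (re)start the quiet-period timer
  flush(): void;      // run a pending recompute immediately, if any
  pending(): boolean; // true while a recompute is waiting to run
}

function createDebouncedRecompute(
  recompute: Recompute,
  delayMs: number = 10000, // assumed default, matching the 10 s in the description
): DebouncedRecompute {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const run = (): void => {
    timer = undefined;
    void recompute(); // fire and forget; callers do not wait for the scan
  };

  return {
    schedule(): void {
      // Every call cancels the previous timer, so the scan only runs
      // once no new batch has arrived for delayMs.
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(run, delayMs);
    },
    flush(): void {
      if (timer === undefined) return;
      clearTimeout(timer);
      run();
    },
    pending: (): boolean => timer !== undefined,
  };
}
```

With this shape, a bulk write path would call `schedule()` after each batch instead of awaiting the full scan, so any number of batches arriving within the quiet period collapse into a single recompute.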

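| Batch | Before (`await recomputeCorpusStats()`) | After (`scheduleCorpusStatsRecompute()`) |
|---|---|---|
| 1 doc, sum=0 | 831ms | 33ms |
| 2 docs, sum=0 | 750ms | 41ms |
| 100 docs, sum≈55k | 980-1067ms | 213-672ms |
| Total (~700 docs) | ~7.9s | ~2.4s |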

These timings compare `await recomputeCorpusStats()` (first timing column) with `scheduleCorpusStatsRecompute()` (second timing column). The scheduled version is faster because it defers the full-corpus scan until 10 seconds after sync goes quiet, instead of running it after every batch. I collected the numbers by wrapping the corpus stats step in `bulkPut` with `performance.now()`.
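A measurement like this can be reproduced with a small stopwatch helper around the corpus stats step. This is only a sketch: `startTimer` is a hypothetical helper, not code from this PR, and the injectable clock exists just so the helper can be checked deterministically.

```typescript
// Hypothetical stopwatch helper; by default it reads performance.now().
function startTimer(now: () => number = () => performance.now()): () => number {
  const start = now();
  // The returned function reports milliseconds elapsed since startTimer() was called.
  return () => now() - start;
}

// Assumed usage inside a bulk write path (illustrative only):
//   const elapsed = startTimer();
//   await recomputeCorpusStats();            // or scheduleCorpusStatsRecompute()
//   console.log(`corpus stats: ${elapsed().toFixed(0)}ms`);
```

Summing the per-batch values over a full sync gives a total comparable to the last row of the timings.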

@MrDirkelz linked an issue that may be closed by this pull request (Apr 21, 2026)
@MrDirkelz self-assigned this (Apr 21, 2026)
Development

Successfully merging this pull request may close these issues.

Shared: Improve syncing performance by debouncing corpus scan
