perf(fts): debounce corpus stats recompute during sync #1548
Open
MrDirkelz wants to merge 1 commit into
`db.bulkPut` was calling `recomputeCorpusStats()` synchronously after every batch of Content docs, scanning the full Content doc set each time — O(n²) over a fresh sync. Switch to the existing `scheduleCorpusStatsRecompute()` debounce (10 s after last call) so stats are computed once after sync settles. Measured ~3× faster initial sync; no change to search behavior beyond brief staleness of BM25 `N`/`avgdl` during active sync.
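For readers unfamiliar with the pattern, the debounce described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual `scheduleCorpusStatsRecompute()`: the names `createDebouncedRecompute`, `flush`, and `isPending` are hypothetical, and only the 10 s quiet period comes from the description.

```typescript
// Hypothetical sketch: illustrative names, not the project's real code.
type Recompute = () => void | Promise<void>;

interface DebouncedRecompute {
  schedule(): void;     // (re)start the quiet-period timer
  flush(): void;        // run right away if a recompute is pending
  isPending(): boolean; // true while a timer is armed
}

function createDebouncedRecompute(
  recompute: Recompute,
  delayMs: number = 10000, // 10 s after the last call, per the description
): DebouncedRecompute {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const run = (): void => {
    timer = undefined;
    // Fire and forget; a real implementation would also handle rejections.
    void recompute();
  };

  return {
    schedule(): void {
      // Each call cancels the previous timer, so a burst of batches
      // results in a single recompute once the burst ends.
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(run, delayMs);
    },
    flush(): void {
      if (timer === undefined) return;
      clearTimeout(timer);
      run();
    },
    isPending: (): boolean => timer !== undefined,
  };
}
```

With this shape, every batch calls `schedule()` instead of awaiting the full scan, so the expensive recompute runs once per sync burst rather than once per batch.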
| Batch | Before (`await recomputeCorpusStats()`) | After (`scheduleCorpusStatsRecompute()`) |
| --- | --- | --- |
| 1 doc, sum=0 | 831ms | 33ms |
| 2 docs, sum=0 | 750ms | 41ms |
| 100 docs, sum≈55k | 980-1067ms | 213-672ms |
| Total (~700 docs) | ~7.9s | ~2.4s |
These timings compare `scheduleCorpusStatsRecompute()` against the previous `await recomputeCorpusStats()`. The speedup comes from deferring the full-corpus scan until 10 seconds after sync goes quiet, instead of running it after every batch. I collected the numbers by timing the corpus stats step inside `bulkPut` with `performance.now()`.
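A measurement like this could be reproduced with something along these lines. This is only a sketch of the approach: `startTimer`, `TimingLog`, and `bulkPutTimed` are hypothetical helpers for illustration, and the real `bulkPut` signature is not shown in this PR.

```typescript
// Hypothetical helpers for illustration; not part of the project.

// Returns a function that reports milliseconds elapsed since startTimer() was called.
function startTimer(): () => number {
  const t0 = performance.now();
  return () => performance.now() - t0;
}

// Collects per-batch timings so a total can be reported at the end of a sync.
class TimingLog {
  private readonly entries: Array<{ label: string; ms: number }> = [];

  record(label: string, ms: number): void {
    this.entries.push({ label, ms });
  }

  totalMs(): number {
    return this.entries.reduce((sum, e) => sum + e.ms, 0);
  }
}

// Assumed shape of the instrumented step: only the stats update is timed.
async function bulkPutTimed(
  docs: unknown[],
  updateStats: () => void | Promise<void>,
  log: TimingLog,
): Promise<void> {
  // ... write docs to the store here (omitted) ...
  const stop = startTimer();
  await updateStats(); // either the full recompute or the debounced schedule call
  log.record(`${docs.length} docs`, stop());
}
```

Passing the stats step in as `updateStats` lets the same harness time both variants without changing the surrounding code.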