Skip to content

perf(db): batch node lookups, fix insertNode cache, auto-ANALYZE after writes#28

Open
mschreib28 wants to merge 1 commit into
mainfrom
upstream/feat/db-perf-and-cache-fix
Open

perf(db): batch node lookups, fix insertNode cache, auto-ANALYZE after writes#28
mschreib28 wants to merge 1 commit into
mainfrom
upstream/feat/db-perf-and-cache-fix

Conversation

@mschreib28
Copy link
Copy Markdown
Owner

Summary\n\nThree DB-layer improvements bundled in one PR. All low-risk, high-leverage.\n\n### 1. Batch getNodesByIds — fix N+1 in graph traversal\n\nQueryBuilder.getNodesByIds(ids[]) returns Map<id, Node> in one SQL round-trip (chunked at 500 for SQLite param-limit safety; cache-aware so already-cached entries are served from memory and only the misses hit SQL).\n\nReplaces 9 N+1 loops in src/graph/traversal.ts:\n\n- getCallersRecursive / getCalleesRecursive\n- getTypeAncestors / getTypeDescendants\n- findUsages\n- getImpactRecursive (both inner loops: contains-children + dependents)\n- findPath BFS frontier\n- traverseBFS / traverseDFS neighbor expansion\n- getChildren\n\nEach previously did getNodeById per edge inside a loop; for a function with N callers at depth D, that was N^D point reads. Now: one IN-list query per traversal step. Expected 10–50× speedup on deep / fan-out-heavy traversals (impact analysis on popular utilities, call graphs of central modules).\n\n### 2. insertNode cache invalidation — fix correctness bug\n\nQueryBuilder.insertNode uses INSERT OR REPLACE INTO nodes. The LRU nodeCache was invalidated by updateNode and deleteNode but NOT by insertNode, so a re-indexed node (replacing a cached row) would still serve the pre-replace version on next getNodeById until LRU eviction pushed it out.\n\nNow invalidates nodeCache.delete(node.id) at the top of insertNode (after validation, before SQL — so failed-validation early-returns don't churn the cache).\n\n### 3. Auto-maintenance after bulk writes\n\nDatabaseConnection.runMaintenance() runs:\n\n- PRAGMA optimize — incremental ANALYZE; only re-analyzes tables whose row counts changed materially since the last ANALYZE. Without it, the SQLite query planner has zero statistics on freshly-bulk-loaded tables and can pick wrong indexes.\n- PRAGMA wal_checkpoint(PASSIVE) — fold WAL pages back into the main DB file so the WAL doesn't grow unboundedly between automatic checkpoints (which fire at 1000 pages).\n\nBoth non-blocking and silently swallowed on failure — best-effort, never load-bearing. Wired into indexAll (when files were indexed) and sync (when files changed). No-op when nothing happened.\n\n## Files changed\n\n| File | Change |\n|---|---|\n| src/db/queries.ts | Add getNodesByIds; invalidate cache in insertNode |\n| src/db/index.ts | Add runMaintenance() helper |\n| src/graph/traversal.ts | Replace 9 N+1 loops with batch lookups |\n| src/index.ts | Call runMaintenance after indexAll / sync |\n| __tests__/db-perf.test.ts (NEW) | 9 regression tests covering batch lookup, chunking, cache hits, cache invalidation, maintenance no-throw on closed DB |\n\n## Test plan\n\n- [x] npm test: 388/389 pass on macOS (one pre-existing fs.watch flake)\n- [x] npx tsc --noEmit clean\n- [x] Independent reviewer pass before pushing — APPROVE; one info finding addressed (getChildren was the 9th N+1 site, now batched too)\n\n🤖 Generated with Claude Code\n


Copied from colbymchenry/codegraph#108

…r writes

Three DB-layer improvements bundled in one PR.

## 1. Batch getNodesByIds — fix N+1 in graph traversal

QueryBuilder.getNodesByIds(ids[]) returns Map<id, Node> in one
round-trip (chunked at 500 for SQLite param-limit safety, cache-aware
so already-cached entries are served from memory and only the misses
hit SQL).

Replaces 9 N+1 loops in src/graph/traversal.ts:
- getCallersRecursive / getCalleesRecursive
- getTypeAncestors / getTypeDescendants
- findUsages
- getImpactRecursive (both inner loops: contains-children + dependents)
- findPath BFS frontier
- traverseBFS / traverseDFS neighbor expansion
- getChildren

Each previously did `getNodeById` per edge inside a loop; for a function
with N callers at depth D, that was N^D point reads per traversal.
Now: one IN-list query per traversal step. Expected 10-50x speedup on
deep / fan-out-heavy traversals (impact analysis on popular utilities,
call graphs of central modules).

## 2. insertNode cache invalidation — fix correctness bug

QueryBuilder.insertNode uses INSERT OR REPLACE INTO nodes. The LRU
nodeCache was invalidated by updateNode and deleteNode but NOT by
insertNode, so a re-indexed node (replacing a cached row) would still
serve the pre-replace version on next getNodeById until LRU eviction
pushed it out.

Now invalidates `nodeCache.delete(node.id)` at the top of insertNode
(after validation, before SQL — so failed-validation early-returns
don't churn the cache).

## 3. Auto-maintenance after bulk writes

DatabaseConnection.runMaintenance() runs:
  - PRAGMA optimize    (incremental ANALYZE; only re-analyzes tables
                        whose row counts changed materially since last
                        ANALYZE — without it, the SQLite query planner
                        has zero statistics on freshly-bulk-loaded
                        tables and can pick wrong indexes)
  - PRAGMA wal_checkpoint(PASSIVE)
                       (fold WAL pages back into the main DB file so
                        the WAL doesn't grow unboundedly between
                        automatic checkpoints which fire at 1000 pages)

Both are non-blocking and silently swallowed on failure — best-effort,
never load-bearing for correctness. Wired into indexAll (when files
were indexed) and sync (when files changed). No-op when nothing
happened.

## Files changed

| File | Change |
|---|---|
| src/db/queries.ts | Add getNodesByIds; invalidate cache in insertNode |
| src/db/index.ts | Add runMaintenance() helper |
| src/graph/traversal.ts | Replace 9 N+1 loops with batch lookups |
| src/index.ts | Call runMaintenance after indexAll/sync |
| __tests__/db-perf.test.ts (NEW) | 9 regression tests |

## Test plan

- [x] npm test: 388/389 pass on macOS (one pre-existing fs.watch flake)
- [x] npx tsc --noEmit clean
- [x] Independent reviewer pass before pushing — APPROVE; one info
      finding addressed (getChildren was the 9th N+1 site, now batched
      too)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants