fix: defense-in-depth hardening (path validation, atomicity, ReDoS, worker lifecycle)#37
Open
mschreib28 wants to merge 1 commit into
Open
fix: defense-in-depth hardening (path validation, atomicity, ReDoS, worker lifecycle)#37mschreib28 wants to merge 1 commit into
mschreib28 wants to merge 1 commit into
Conversation
Bundles seven smaller hardening fixes from the audit pass: 1. Symlink-resistant path validation in sync paths src/extraction/index.ts:1148, 1206, 1293, 1347 src/utils.ts (new validatePathWithinRootReal) getChangedFiles() and sync() now resolve symlinks via realpath before reading, so a regular file swapped for a symlink to outside the project (between scan and read) is rejected rather than followed. The lexical-only validatePathWithinRoot stays for paths that don't yet exist on disk. 2. storeExtractionResult is now atomic src/extraction/index.ts:1018-1098, src/db/queries.ts (new QueryBuilder.transaction) delete + insert + upsert run in a single SQLite transaction, so a process kill mid-write can no longer leave the file's old data wiped while the new data is missing. Nested transactions inside insertNodes/insertEdges/insertUnresolvedRefsBatch/deleteFile work correctly because better-sqlite3 collapses them via SAVEPOINT. 3. recycleWorker is now actually async src/extraction/index.ts:573 Previously declared sync but called via `await`. Now properly awaits worker.terminate() with a 1s timeout (so a wedged WASM worker can't block the caller indefinitely) and force-fires terminate after the timeout to avoid zombie threads. 4. Parse-worker handles unknown message types src/extraction/parse-worker.ts:60-78 When a malformed message arrives with an `id`, the worker sends back a structured error instead of silently dropping it (which previously caused the main thread to block until the per-file timeout). Messages without an id are still ignored — no pending promise to unblock. 5. ReDoS-safe glob matching src/utils.ts (new globToSafeRegex), src/bin/codegraph.ts:1163, src/graph/queries.ts:196 Coalesces runs of `*` so hostile inputs like `*****` map to a single `.*` rather than five chained quantifiers (which would catastrophically backtrack on long inputs). Caps input at 1024 chars. Implementation walks character-by-character with no sentinels — avoids any chance of marker-string collisions with user input. 6. fileUriToPath fallback hardening src/mcp/index.ts:36-40 Catch-block fallback for malformed file:// URIs now runs through path.resolve() so a `file:///../etc/passwd` style input is normalized rather than passed raw to downstream filesystem code. 7. Reference dedup at extraction time + dead-code removal src/extraction/tree-sitter.ts (new dedupeReferences), src/db/queries.ts (deleteUnresolvedByNode removed) A function that calls `foo()` 100 times now produces 1 unresolved reference instead of 100; the resolver collapses duplicates anyway but extraction was wasteful. Removed the never-called deleteUnresolvedByNode helper (FK cascade handles cleanup). 3 new tests for globToSafeRegex; full suite 383/383 passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary\n\nSeven defense-in-depth findings from the audit pass, bundled into one PR for review tractability. Companion to colbymchenry#98 (which covers the user-visible correctness bugs).\n\n### 1. Symlink-resistant path validation in sync paths\n\n
src/extraction/index.ts— sync() and getChangedFiles() previously didpath.join(rootDir, filePath)and read the result without checking that the file actually stays within the project (the lexicalvalidatePathWithinRootis used for the indexAll path, but the sync paths skipped it). Both paths now use a newvalidatePathWithinRootReal(insrc/utils.ts) that callsrealpathSyncso a regular file swapped for a symlink between scan and read is rejected rather than followed.\n\n### 2.storeExtractionResultis now atomic\n\nsrc/extraction/index.ts:1018-1098,src/db/queries.ts(newQueryBuilder.transaction)\n\nThe delete + insert + upsert sequence now runs inside a single SQLite transaction. A process kill mid-write can no longer leave the file's old data wiped while the new data is missing. Nested transactions insideinsertNodes/insertEdges/insertUnresolvedRefsBatch/deleteFilecollapse correctly via better-sqlite3's SAVEPOINT support (verified with a quick standalone repro before committing).\n\n### 3.recycleWorkeris now actually async\n\nsrc/extraction/index.ts:573\n\nPreviously declaredvoidbut called viaawait— theawaitwas onundefined, so the new worker spawned before the old one finished terminating. Now properly awaitsworker.terminate()with a 1s timeout, and fires a fire-and-forget terminate after the timeout to avoid zombie threads if WASM is wedged.\n\n### 4. Parse-worker handles unknown messages\n\nsrc/extraction/parse-worker.ts:60-78\n\nIf a message arrives with an unknowntypeand anid, the worker now sends back a structured error instead of silently dropping it (which previously caused the main thread to block until the per-file timeout). Messages without anidare still ignored — no pending promise to unblock.\n\n### 5. ReDoS-safe glob matching\n\nsrc/utils.ts(newglobToSafeRegex),src/bin/codegraph.ts:1163,src/graph/queries.ts:196\n\nThe--filterflag andfindByQualifiedNamepreviously built regexes by simple substitution (*→[^/]*,**→.+). A hostile input like*****produced[^/]*[^/]*[^/]*[^/]*[^/]*— five chained quantifiers that catastrophically backtrack on long inputs.\n\nThe new helper walks character-by-character and coalesces runs of*: any run of 2+ stars maps to.*, a single*to[^/]*. Caps input at 1024 chars. No sentinel strings (an earlier draft with a sentinel was rewritten after a reviewer flagged the collision risk if user input happened to contain the sentinel).\n\n### 6.fileUriToPathfallback hardening\n\nsrc/mcp/index.ts:36-40\n\nThe catch-block fallback for malformed file:// URIs now runs throughpath.resolve()so afile:///../etc/passwd-style input is normalized rather than handed raw to downstream filesystem code.\n\n### 7. Reference dedup at extraction + dead-code removal\n\nsrc/extraction/tree-sitter.ts(newdedupeReferences),src/db/queries.ts(deleteUnresolvedByNoderemoved)\n\nA function that callsfoo()100 times now produces 1 unresolved reference instead of 100. The resolver collapsed duplicates eventually, but extraction was wasteful — this is mostly a perf win on large files. Also removed the never-calleddeleteUnresolvedByNodehelper (FK cascade handles cleanup automatically).\n\n## Files changed\n\n| File | Change |\n|---|---|\n|src/utils.ts| AddvalidatePathWithinRootReal, replaceglobToSafeRegexwith sentinel-free implementation |\n|src/extraction/index.ts| UsevalidatePathWithinRootRealin 4 sync sites, wrapstoreExtractionResultin a transaction, makerecycleWorkerasync |\n|src/extraction/tree-sitter.ts| AdddedupeReferencesapplied at end ofextract()|\n|src/extraction/parse-worker.ts| Handle unknown message types |\n|src/db/queries.ts| AddQueryBuilder.transaction(), remove deaddeleteUnresolvedByNode|\n|src/bin/codegraph.ts| UseglobToSafeRegexfor--filter|\n|src/graph/queries.ts| UseglobToSafeRegexforfindByQualifiedName|\n|src/mcp/index.ts| Run fallbackfileUriToPaththroughpath.resolve()|\n|__tests__/security.test.ts| 3 new tests forglobToSafeRegex|\n\n## Test plan\n\n- [x]npm test(serialized): 383/383 pass (was 380, added 3 ReDoS tests)\n- [x]npx tsc --noEmitclean\n- [x] Standalone repro confirmed nested-transaction safety in better-sqlite3\n- [x] Independent reviewer pass — caught a sentinel-collision risk in an earlier draft ofglobToSafeRegex(rewritten without sentinels), a missed force-kill after the recycleWorker timeout (added), and the unusedvalidatePathWithinRootReal(now wired into the sync sites)\n\n🤖 Generated with Claude Code\nCopied from colbymchenry/codegraph#99