Skip to content

fix: defense-in-depth hardening (path validation, atomicity, ReDoS, worker lifecycle)#37

Open
mschreib28 wants to merge 1 commit into
mainfrom
upstream/fix/audit-hardening
Open

fix: defense-in-depth hardening (path validation, atomicity, ReDoS, worker lifecycle)#37
mschreib28 wants to merge 1 commit into
mainfrom
upstream/fix/audit-hardening

Conversation

@mschreib28
Copy link
Copy Markdown
Owner

Summary\n\nSeven defense-in-depth findings from the audit pass, bundled into one PR for review tractability. Companion to colbymchenry#98 (which covers the user-visible correctness bugs).\n\n### 1. Symlink-resistant path validation in sync paths\n\nsrc/extraction/index.ts — sync() and getChangedFiles() previously did path.join(rootDir, filePath) and read the result without checking that the file actually stays within the project (the lexical validatePathWithinRoot is used for the indexAll path, but the sync paths skipped it). Both paths now use a new validatePathWithinRootReal (in src/utils.ts) that calls realpathSync so a regular file swapped for a symlink between scan and read is rejected rather than followed.\n\n### 2. storeExtractionResult is now atomic\n\nsrc/extraction/index.ts:1018-1098, src/db/queries.ts (new QueryBuilder.transaction)\n\nThe delete + insert + upsert sequence now runs inside a single SQLite transaction. A process kill mid-write can no longer leave the file's old data wiped while the new data is missing. Nested transactions inside insertNodes / insertEdges / insertUnresolvedRefsBatch / deleteFile collapse correctly via better-sqlite3's SAVEPOINT support (verified with a quick standalone repro before committing).\n\n### 3. recycleWorker is now actually async\n\nsrc/extraction/index.ts:573\n\nPreviously declared void but called via await — the await was on undefined, so the new worker spawned before the old one finished terminating. Now properly awaits worker.terminate() with a 1s timeout, and fires a fire-and-forget terminate after the timeout to avoid zombie threads if WASM is wedged.\n\n### 4. Parse-worker handles unknown messages\n\nsrc/extraction/parse-worker.ts:60-78\n\nIf a message arrives with an unknown type and an id, the worker now sends back a structured error instead of silently dropping it (which previously caused the main thread to block until the per-file timeout). Messages without an id are still ignored — no pending promise to unblock.\n\n### 5. ReDoS-safe glob matching\n\nsrc/utils.ts (new globToSafeRegex), src/bin/codegraph.ts:1163, src/graph/queries.ts:196\n\nThe --filter flag and findByQualifiedName previously built regexes by simple substitution (*[^/]*, **.+). A hostile input like ***** produced [^/]*[^/]*[^/]*[^/]*[^/]* — five chained quantifiers that catastrophically backtrack on long inputs.\n\nThe new helper walks character-by-character and coalesces runs of *: any run of 2+ stars maps to .*, a single * to [^/]*. Caps input at 1024 chars. No sentinel strings (an earlier draft with a sentinel was rewritten after a reviewer flagged the collision risk if user input happened to contain the sentinel).\n\n### 6. fileUriToPath fallback hardening\n\nsrc/mcp/index.ts:36-40\n\nThe catch-block fallback for malformed file:// URIs now runs through path.resolve() so a file:///../etc/passwd-style input is normalized rather than handed raw to downstream filesystem code.\n\n### 7. Reference dedup at extraction + dead-code removal\n\nsrc/extraction/tree-sitter.ts (new dedupeReferences), src/db/queries.ts (deleteUnresolvedByNode removed)\n\nA function that calls foo() 100 times now produces 1 unresolved reference instead of 100. The resolver collapsed duplicates eventually, but extraction was wasteful — this is mostly a perf win on large files. Also removed the never-called deleteUnresolvedByNode helper (FK cascade handles cleanup automatically).\n\n## Files changed\n\n| File | Change |\n|---|---|\n| src/utils.ts | Add validatePathWithinRootReal, replace globToSafeRegex with sentinel-free implementation |\n| src/extraction/index.ts | Use validatePathWithinRootReal in 4 sync sites, wrap storeExtractionResult in a transaction, make recycleWorker async |\n| src/extraction/tree-sitter.ts | Add dedupeReferences applied at end of extract() |\n| src/extraction/parse-worker.ts | Handle unknown message types |\n| src/db/queries.ts | Add QueryBuilder.transaction(), remove dead deleteUnresolvedByNode |\n| src/bin/codegraph.ts | Use globToSafeRegex for --filter |\n| src/graph/queries.ts | Use globToSafeRegex for findByQualifiedName |\n| src/mcp/index.ts | Run fallback fileUriToPath through path.resolve() |\n| __tests__/security.test.ts | 3 new tests for globToSafeRegex |\n\n## Test plan\n\n- [x] npm test (serialized): 383/383 pass (was 380, added 3 ReDoS tests)\n- [x] npx tsc --noEmit clean\n- [x] Standalone repro confirmed nested-transaction safety in better-sqlite3\n- [x] Independent reviewer pass — caught a sentinel-collision risk in an earlier draft of globToSafeRegex (rewritten without sentinels), a missed force-kill after the recycleWorker timeout (added), and the unused validatePathWithinRootReal (now wired into the sync sites)\n\n🤖 Generated with Claude Code\n


Copied from colbymchenry/codegraph#99

Bundles seven smaller hardening fixes from the audit pass:

1. Symlink-resistant path validation in sync paths
   src/extraction/index.ts:1148, 1206, 1293, 1347
   src/utils.ts (new validatePathWithinRootReal)

   getChangedFiles() and sync() now resolve symlinks via realpath before
   reading, so a regular file swapped for a symlink to outside the
   project (between scan and read) is rejected rather than followed.
   The lexical-only validatePathWithinRoot stays for paths that don't
   yet exist on disk.

2. storeExtractionResult is now atomic
   src/extraction/index.ts:1018-1098, src/db/queries.ts (new
   QueryBuilder.transaction)

   delete + insert + upsert run in a single SQLite transaction, so a
   process kill mid-write can no longer leave the file's old data
   wiped while the new data is missing. Nested transactions inside
   insertNodes/insertEdges/insertUnresolvedRefsBatch/deleteFile work
   correctly because better-sqlite3 collapses them via SAVEPOINT.

3. recycleWorker is now actually async
   src/extraction/index.ts:573

   Previously declared sync but called via `await`. Now properly
   awaits worker.terminate() with a 1s timeout (so a wedged WASM
   worker can't block the caller indefinitely) and force-fires
   terminate after the timeout to avoid zombie threads.

4. Parse-worker handles unknown message types
   src/extraction/parse-worker.ts:60-78

   When a malformed message arrives with an `id`, the worker sends
   back a structured error instead of silently dropping it (which
   previously caused the main thread to block until the per-file
   timeout). Messages without an id are still ignored — no pending
   promise to unblock.

5. ReDoS-safe glob matching
   src/utils.ts (new globToSafeRegex), src/bin/codegraph.ts:1163,
   src/graph/queries.ts:196

   Coalesces runs of `*` so hostile inputs like `*****` map to a
   single `.*` rather than five chained quantifiers (which would
   catastrophically backtrack on long inputs). Caps input at 1024
   chars. Implementation walks character-by-character with no
   sentinels — avoids any chance of marker-string collisions with
   user input.

6. fileUriToPath fallback hardening
   src/mcp/index.ts:36-40

   Catch-block fallback for malformed file:// URIs now runs through
   path.resolve() so a `file:///../etc/passwd` style input is
   normalized rather than passed raw to downstream filesystem code.

7. Reference dedup at extraction time + dead-code removal
   src/extraction/tree-sitter.ts (new dedupeReferences),
   src/db/queries.ts (deleteUnresolvedByNode removed)

   A function that calls `foo()` 100 times now produces 1 unresolved
   reference instead of 100; the resolver collapses duplicates anyway
   but extraction was wasteful. Removed the never-called
   deleteUnresolvedByNode helper (FK cascade handles cleanup).

3 new tests for globToSafeRegex; full suite 383/383 passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants