Skip to content

fix: defense-in-depth hardening (path validation, atomicity, ReDoS, worker lifecycle)#99

Open
andreinknv wants to merge 1 commit into
colbymchenry:mainfrom
andreinknv:fix/audit-hardening
Open

fix: defense-in-depth hardening (path validation, atomicity, ReDoS, worker lifecycle)#99
andreinknv wants to merge 1 commit into
colbymchenry:mainfrom
andreinknv:fix/audit-hardening

Conversation

@andreinknv
Copy link
Copy Markdown
Contributor

Summary

Seven defense-in-depth findings from the audit pass, bundled into one PR for review tractability. Companion to #98 (which covers the user-visible correctness bugs).

1. Symlink-resistant path validation in sync paths

src/extraction/index.ts — sync() and getChangedFiles() previously did path.join(rootDir, filePath) and read the result without checking that the file actually stays within the project (the lexical validatePathWithinRoot is used for the indexAll path, but the sync paths skipped it). Both paths now use a new validatePathWithinRootReal (in src/utils.ts) that calls realpathSync so a regular file swapped for a symlink between scan and read is rejected rather than followed.

2. storeExtractionResult is now atomic

src/extraction/index.ts:1018-1098, src/db/queries.ts (new QueryBuilder.transaction)

The delete + insert + upsert sequence now runs inside a single SQLite transaction. A process kill mid-write can no longer leave the file's old data wiped while the new data is missing. Nested transactions inside insertNodes / insertEdges / insertUnresolvedRefsBatch / deleteFile collapse correctly via better-sqlite3's SAVEPOINT support (verified with a quick standalone repro before committing).

3. recycleWorker is now actually async

src/extraction/index.ts:573

Previously declared void but called via await — the await was on undefined, so the new worker spawned before the old one finished terminating. Now properly awaits worker.terminate() with a 1s timeout, and fires a fire-and-forget terminate after the timeout to avoid zombie threads if WASM is wedged.

4. Parse-worker handles unknown messages

src/extraction/parse-worker.ts:60-78

If a message arrives with an unknown type and an id, the worker now sends back a structured error instead of silently dropping it (which previously caused the main thread to block until the per-file timeout). Messages without an id are still ignored — no pending promise to unblock.

5. ReDoS-safe glob matching

src/utils.ts (new globToSafeRegex), src/bin/codegraph.ts:1163, src/graph/queries.ts:196

The --filter flag and findByQualifiedName previously built regexes by simple substitution (*[^/]*, **.+). A hostile input like ***** produced [^/]*[^/]*[^/]*[^/]*[^/]* — five chained quantifiers that catastrophically backtrack on long inputs.

The new helper walks character-by-character and coalesces runs of *: any run of 2+ stars maps to .*, a single * to [^/]*. Caps input at 1024 chars. No sentinel strings (an earlier draft with a sentinel was rewritten after a reviewer flagged the collision risk if user input happened to contain the sentinel).

6. fileUriToPath fallback hardening

src/mcp/index.ts:36-40

The catch-block fallback for malformed file:// URIs now runs through path.resolve() so a file:///../etc/passwd-style input is normalized rather than handed raw to downstream filesystem code.

7. Reference dedup at extraction + dead-code removal

src/extraction/tree-sitter.ts (new dedupeReferences), src/db/queries.ts (deleteUnresolvedByNode removed)

A function that calls foo() 100 times now produces 1 unresolved reference instead of 100. The resolver collapsed duplicates eventually, but extraction was wasteful — this is mostly a perf win on large files. Also removed the never-called deleteUnresolvedByNode helper (FK cascade handles cleanup automatically).

Files changed

File Change
src/utils.ts Add validatePathWithinRootReal, replace globToSafeRegex with sentinel-free implementation
src/extraction/index.ts Use validatePathWithinRootReal in 4 sync sites, wrap storeExtractionResult in a transaction, make recycleWorker async
src/extraction/tree-sitter.ts Add dedupeReferences applied at end of extract()
src/extraction/parse-worker.ts Handle unknown message types
src/db/queries.ts Add QueryBuilder.transaction(), remove dead deleteUnresolvedByNode
src/bin/codegraph.ts Use globToSafeRegex for --filter
src/graph/queries.ts Use globToSafeRegex for findByQualifiedName
src/mcp/index.ts Run fallback fileUriToPath through path.resolve()
__tests__/security.test.ts 3 new tests for globToSafeRegex

Test plan

  • npm test (serialized): 383/383 pass (was 380, added 3 ReDoS tests)
  • npx tsc --noEmit clean
  • Standalone repro confirmed nested-transaction safety in better-sqlite3
  • Independent reviewer pass — caught a sentinel-collision risk in an earlier draft of globToSafeRegex (rewritten without sentinels), a missed force-kill after the recycleWorker timeout (added), and the unused validatePathWithinRootReal (now wired into the sync sites)

🤖 Generated with Claude Code

Bundles seven smaller hardening fixes from the audit pass:

1. Symlink-resistant path validation in sync paths
   src/extraction/index.ts:1148, 1206, 1293, 1347
   src/utils.ts (new validatePathWithinRootReal)

   getChangedFiles() and sync() now resolve symlinks via realpath before
   reading, so a regular file swapped for a symlink to outside the
   project (between scan and read) is rejected rather than followed.
   The lexical-only validatePathWithinRoot stays for paths that don't
   yet exist on disk.

2. storeExtractionResult is now atomic
   src/extraction/index.ts:1018-1098, src/db/queries.ts (new
   QueryBuilder.transaction)

   delete + insert + upsert run in a single SQLite transaction, so a
   process kill mid-write can no longer leave the file's old data
   wiped while the new data is missing. Nested transactions inside
   insertNodes/insertEdges/insertUnresolvedRefsBatch/deleteFile work
   correctly because better-sqlite3 collapses them via SAVEPOINT.

3. recycleWorker is now actually async
   src/extraction/index.ts:573

   Previously declared sync but called via `await`. Now properly
   awaits worker.terminate() with a 1s timeout (so a wedged WASM
   worker can't block the caller indefinitely) and force-fires
   terminate after the timeout to avoid zombie threads.

4. Parse-worker handles unknown message types
   src/extraction/parse-worker.ts:60-78

   When a malformed message arrives with an `id`, the worker sends
   back a structured error instead of silently dropping it (which
   previously caused the main thread to block until the per-file
   timeout). Messages without an id are still ignored — no pending
   promise to unblock.

5. ReDoS-safe glob matching
   src/utils.ts (new globToSafeRegex), src/bin/codegraph.ts:1163,
   src/graph/queries.ts:196

   Coalesces runs of `*` so hostile inputs like `*****` map to a
   single `.*` rather than five chained quantifiers (which would
   catastrophically backtrack on long inputs). Caps input at 1024
   chars. Implementation walks character-by-character with no
   sentinels — avoids any chance of marker-string collisions with
   user input.

6. fileUriToPath fallback hardening
   src/mcp/index.ts:36-40

   Catch-block fallback for malformed file:// URIs now runs through
   path.resolve() so a `file:///../etc/passwd` style input is
   normalized rather than passed raw to downstream filesystem code.

7. Reference dedup at extraction time + dead-code removal
   src/extraction/tree-sitter.ts (new dedupeReferences),
   src/db/queries.ts (deleteUnresolvedByNode removed)

   A function that calls `foo()` 100 times now produces 1 unresolved
   reference instead of 100; the resolver collapses duplicates anyway
   but extraction was wasteful. Removed the never-called
   deleteUnresolvedByNode helper (FK cascade handles cleanup).

3 new tests for globToSafeRegex; full suite 383/383 passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant