fix(scan): honor .codegraphignore on the git fast path #33

Open
mschreib28 wants to merge 1 commit into main from upstream/fix/codegraphignore-git-fast-path

@mschreib28
Owner

## Summary

The `.codegraphignore` marker (per-directory opt-out from indexing) was respected by `scanDirectoryWalk` (the filesystem-walk fallback) but silently ignored by `getGitVisibleFiles` (the git fast path) and `getGitChangedFiles` (sync's git path). Same project gave different file sets depending on whether `.git` existed: typically the marker "worked" only on non-git scratch projects and was a no-op everywhere else, the opposite of how most users encounter it.

## What changed

Two helpers in `src/extraction/index.ts`:

- `findCodegraphIgnoredDirs(rootDir, files)`: walks parent directories of the given file list, returns the set of dirs that contain a `.codegraphignore` marker. Walks once per unique parent directory with an early-out on shared ancestors.
- `isUnderCodegraphIgnoredDir(filePath, ignoredDirs)`: true if `filePath` lives under any of those dirs.

Applied at three sites:

- `scanDirectory` and `scanDirectoryAsync`: between the git file list and the include-pattern filter
- `getGitChangedFiles`: refactored to a two-pass collect-then-bucketize so the ignored-dir set is built once from the candidate paths

The marker file itself does not need to be tracked by git; `fs.existsSync` catches it whether it was committed or added as a local override.

## Files changed

| File | Change |
|---|---|
| src/extraction/index.ts | Add findCodegraphIgnoredDirs + isUnderCodegraphIgnoredDir; apply in scanDirectory, scanDirectoryAsync, getGitChangedFiles |
| __tests__/codegraphignore.test.ts | 6 regression tests |

## Test coverage

- `scanDirectory` honors marker in a subdir on the git fast path
- Marker at project root excludes everything
- Marker in one subdir does not affect siblings
- Marker added as local override (untracked) is still respected
- Non-git fallback parity (sanity, pre-existing behavior unchanged)
- Sync (`getGitChangedFiles`) ignores changes inside marker dirs

## Known limitation (pre-existing, out of scope)

If a `.codegraphignore` marker is added after files in that directory have already been indexed, the next sync via the git fast path won't proactively delete those stale rows, because `git status` doesn't report unchanged files. The next full `indexAll` (or a sync that falls into the filesystem-walk path) will clean them up. This is an existing characteristic of the git fast path; documenting here for transparency rather than fixing in this PR.

## Test plan

- [x] npm test: 386/386 pass on macOS (one pre-existing fs.watch flake under parallel load, passes in isolation)
- [x] npx tsc --noEmit clean
- [x] Independent reviewer pass before pushing: verdict APPROVE; addressed two info-level cleanups (JSDoc accuracy, removed dead try/catch around fs.existsSync which never throws)

🤖 Generated with Claude Code


Copied from colbymchenry/codegraph#103

The .codegraphignore marker (per-directory opt-out from indexing) was
respected by `scanDirectoryWalk` (the filesystem-walk fallback) but
silently ignored by `getGitVisibleFiles` (the git fast path) and
`getGitChangedFiles` (sync's git path). The same project gave different
file sets depending on whether `.git` existed: typically the marker
"worked" only on non-git scratch projects and was a no-op everywhere
else, even though git repositories are where most users encounter it.

This change adds two helpers in `src/extraction/index.ts`:

  - `findCodegraphIgnoredDirs(rootDir, files)` — walks parent directories
    of the given file list, returns the set of directories that contain
    a `.codegraphignore` marker. Walks once per unique parent directory,
    with an early-out on shared ancestors.

  - `isUnderCodegraphIgnoredDir(filePath, ignoredDirs)` — true if filePath
    lives under any of those dirs.
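
For readers without the diff at hand, the two helpers described above could look roughly like the sketch below. The bodies are not taken from `src/extraction/index.ts`; they are assumptions. In particular, this sketch assumes `files` are root-relative paths as git reports them, and it represents the project root as the empty string. The `MARKER` constant is a name introduced here for illustration.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical constant name; only the marker file name comes from the PR text.
const MARKER = ".codegraphignore";

/**
 * Sketch: return every directory (relative to rootDir, "/" separators,
 * "" for the root itself) that contains a marker file.
 * Assumes `files` are paths relative to rootDir.
 */
function findCodegraphIgnoredDirs(rootDir: string, files: string[]): Set<string> {
  const ignored = new Set<string>();
  const visited = new Set<string>();

  for (const file of files) {
    let dir = path.posix.dirname(file.replace(/\\/g, "/"));
    // Walk from the file's parent directory up to the project root.
    while (true) {
      if (dir === ".") dir = "";
      // Early-out: if this directory was already checked, so were its ancestors.
      if (visited.has(dir)) break;
      visited.add(dir);
      // existsSync sees the marker whether or not git tracks it.
      if (fs.existsSync(path.join(rootDir, dir, MARKER))) ignored.add(dir);
      if (dir === "") break;
      dir = path.posix.dirname(dir);
    }
  }
  return ignored;
}

/** Sketch: true if filePath sits under any directory in ignoredDirs. */
function isUnderCodegraphIgnoredDir(filePath: string, ignoredDirs: Set<string>): boolean {
  if (ignoredDirs.size === 0) return false;
  if (ignoredDirs.has("")) return true; // marker at the project root
  const normalized = filePath.replace(/\\/g, "/");
  for (const dir of ignoredDirs) {
    // The trailing "/" keeps "vendor" from matching "vendor2/x.ts".
    if (normalized.startsWith(dir + "/")) return true;
  }
  return false;
}
```

Because each directory is recorded in `visited` the first time it is seen, the number of `fs.existsSync` calls is bounded by the number of distinct ancestor directories rather than by the number of files.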

Applied in:
  - `scanDirectory` and `scanDirectoryAsync` — between the git file list
    and the include-pattern filter.
  - `getGitChangedFiles` — refactored to a two-pass collect-then-bucketize
    so the ignored-dir set is built once from the candidate paths.
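
The two-pass collect-then-bucketize shape can be illustrated as follows. This is a sketch under stated assumptions, not the code in this PR: the `bucketizeChanges` and `parsePorcelainLine` names, the `added`/`modified`/`removed` bucket names, and the simplified parsing of `git status --porcelain` lines (no rename or quoted-path handling) are all hypothetical. The ignored-dir lookup is passed in as a function so the example stays self-contained.

```typescript
type ChangeKind = "added" | "modified" | "removed";

interface Candidate {
  kind: ChangeKind;
  file: string;
}

interface ChangedFiles {
  added: string[];
  modified: string[];
  removed: string[];
}

// Simplified parser for porcelain v1 lines: two status characters, a space,
// then the path. Renames and quoted paths are deliberately not handled here.
function parsePorcelainLine(line: string): Candidate | null {
  if (line.length < 4) return null;
  const code = line.slice(0, 2);
  const file = line.slice(3);
  if (code === "??" || code.includes("A")) return { kind: "added", file };
  if (code.includes("D")) return { kind: "removed", file };
  return { kind: "modified", file };
}

function bucketizeChanges(
  porcelainLines: string[],
  findIgnoredDirs: (files: string[]) => Set<string>,
): ChangedFiles {
  // Pass 1: collect every candidate path without filtering.
  const candidates: Candidate[] = [];
  for (const line of porcelainLines) {
    const candidate = parsePorcelainLine(line);
    if (candidate) candidates.push(candidate);
  }

  // Build the ignored-dir set exactly once, from all candidate paths.
  const ignoredDirs = findIgnoredDirs(candidates.map((c) => c.file));
  const dirs = Array.from(ignoredDirs);
  const isIgnored = (file: string): boolean =>
    ignoredDirs.has("") || dirs.some((dir) => file.startsWith(dir + "/"));

  // Pass 2: drop ignored paths and sort the rest into buckets.
  const result: ChangedFiles = { added: [], modified: [], removed: [] };
  for (const { kind, file } of candidates) {
    if (isIgnored(file)) continue;
    result[kind].push(file);
  }
  return result;
}
```

Collecting first means the marker lookup runs once over the full candidate list instead of once per status line, which is what lets the shared-ancestor early-out pay off.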

The marker file itself does not need to be tracked by git: `fs.existsSync`
finds it whether it was committed or added as an untracked local override.

## Files changed

| File | Change |
|---|---|
| src/extraction/index.ts | Add findCodegraphIgnoredDirs + isUnderCodegraphIgnoredDir; apply in scanDirectory, scanDirectoryAsync, getGitChangedFiles |
| __tests__/codegraphignore.test.ts | 6 regression tests |

## Test plan

- [x] npm test: 386/386 pass on macOS (one pre-existing fs.watch flake under parallel load, passes in isolation)
- [x] npx tsc --noEmit clean
- [x] Independent reviewer pass before pushing — APPROVE; addressed two info-level cleanups (JSDoc accuracy, removed dead try/catch around fs.existsSync)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>