feat: index files inside git submodules #93
Open
andreinknv wants to merge 1 commit into
`git ls-files` (used for both the initial scan and incremental sync)
does not enter submodules — they appear as gitlink entries with their
contents invisible. As a result, source files inside submodules were
silently skipped during indexing.
Both file-discovery paths now recurse into active submodules:
- getGitVisibleFiles (full index) enumerates active submodules via
`git submodule foreach --recursive --quiet 'echo "$displaypath"'`
and runs `git ls-files -co --exclude-standard` inside each, prefixing
the submodule path so files are reported relative to the parent root.
- getGitChangedFiles (sync) was refactored to share its status-parsing
logic between the parent repo and each submodule. Submodule directory
entries that the parent's status emits when a submodule pointer moves
(e.g., " m vendor/sub") are filtered out so we don't try to read a
directory as a file.
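
The full-index path described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RunGit` is a hypothetical callback standing in for however the project shells out to git, the function names are invented for the sketch, and the flags used for the parent listing plus the filtering of gitlink entries out of it are assumptions.

```typescript
// Hypothetical stand-in for however the project invokes git:
// takes git arguments and a working directory, returns stdout.
type RunGit = (args: string[], cwd: string) => string;

// Split command output into non-empty lines.
function outputLines(output: string): string[] {
  return output
    .split("\n")
    .map((line) => line.replace(/\r$/, ""))
    .filter((line) => line.length > 0);
}

// List active submodule paths relative to the parent root.
// $displaypath is set by `git submodule foreach` for each submodule.
function listSubmodulesSketch(root: string, runGit: RunGit): string[] {
  try {
    const args = ["submodule", "foreach", "--recursive", "--quiet", 'echo "$displaypath"'];
    return outputLines(runGit(args, root));
  } catch {
    return []; // best effort: treat a failure as "no submodules"
  }
}

// List parent files, then files inside each submodule, prefixed with the
// submodule path so everything is relative to the parent root.
function listFilesWithSubmodulesSketch(root: string, runGit: RunGit): string[] {
  const lsFiles = ["ls-files", "-co", "--exclude-standard"];
  const submodules = listSubmodulesSketch(root, runGit);
  // Assumption of this sketch: drop gitlink entries (the submodule
  // directories themselves) from the parent listing.
  const files = outputLines(runGit(lsFiles, root)).filter(
    (file) => submodules.indexOf(file) === -1,
  );
  for (const sub of submodules) {
    try {
      // Real code would likely use path.join; plain concatenation keeps the sketch dependency-free.
      const inner = outputLines(runGit(lsFiles, `${root}/${sub}`));
      files.push(...inner.map((file) => `${sub}/${file}`));
    } catch {
      // Uninitialized or missing submodule: skip silently.
    }
  }
  return files;
}
```

Passing the git runner in as a callback keeps the path-prefixing logic testable without a real repository.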
Submodule indexing is on by default and can be disabled via
`indexSubmodules: false` in CodeGraphConfig — useful for repos with
large vendor submodules that should remain unindexed without having to
add a path-based exclude. Uninitialized / missing submodules are
silently skipped (best-effort enhancement on top of the existing scan).
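
A minimal sketch of how the opt-out could be resolved. Only the `indexSubmodules` option and its on-by-default behavior come from this PR; the `...Sketch` type, constant, and helper below are hypothetical and show just this one field, not the real `CodeGraphConfig`.

```typescript
// Illustrative subset only: the real config type has other fields not shown here.
interface CodeGraphConfigSketch {
  indexSubmodules?: boolean;
}

// Hypothetical defaults object for the sketch; submodule indexing is on by default.
const DEFAULT_CONFIG_SKETCH: Required<CodeGraphConfigSketch> = {
  indexSubmodules: true,
};

// Resolve the effective setting: an explicit value wins, otherwise the default applies.
function shouldIndexSubmodules(config: CodeGraphConfigSketch): boolean {
  return config.indexSubmodules ?? DEFAULT_CONFIG_SKETCH.indexSubmodules;
}

// A repo with large vendor submodules opts out explicitly.
const vendorHeavyRepo: CodeGraphConfigSketch = { indexSubmodules: false };
```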
Status output paths are now C-style-unquoted before being used or
compared against the submodule directory set, so submodule paths
containing spaces or non-ASCII bytes are handled correctly. The parent
status command failing still falls back to the full filesystem scan via
a null return, preserving the prior contract; only submodule-internal
status failures are absorbed silently.
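
To illustrate the unquoting step, here is a rough sketch of decoding git's C-style quoted paths and then filtering submodule directories out of status output. It is not the `unquoteGitPath` from this PR: the function names are hypothetical, rename lines (`old -> new`) are not handled, and an array stands in for the submodule directory set.

```typescript
// Single-character escapes used in C-style quoted paths, mapped to byte values.
const SIMPLE_ESCAPES: Record<string, number> = {
  a: 0x07, b: 0x08, f: 0x0c, n: 0x0a, r: 0x0d, t: 0x09, v: 0x0b,
  '"': 0x22, "\\": 0x5c,
};

// Percent-encode one byte so decodeURIComponent can reassemble UTF-8 sequences.
function percentByte(byte: number): string {
  return "%" + ("0" + byte.toString(16)).slice(-2);
}

// Decode a quoted path such as "caf\303\251.ts"; unquoted input is returned as-is.
function unquoteGitPathSketch(raw: string): string {
  if (raw.length < 2 || raw.charAt(0) !== '"' || raw.charAt(raw.length - 1) !== '"') {
    return raw;
  }
  const body = raw.slice(1, -1);
  try {
    let encoded = "";
    let i = 0;
    while (i < body.length) {
      if (body.charAt(i) !== "\\") {
        // Copy a run of literal characters up to the next backslash.
        let end = body.indexOf("\\", i);
        if (end === -1) end = body.length;
        encoded += encodeURIComponent(body.slice(i, end));
        i = end;
        continue;
      }
      const digits = body.slice(i + 1, i + 4);
      if (/^[0-3][0-7]{2}$/.test(digits)) {
        encoded += percentByte(parseInt(digits, 8)); // octal byte escape
        i += 4;
        continue;
      }
      const code = SIMPLE_ESCAPES[body.charAt(i + 1)];
      if (code === undefined) return raw; // unknown escape: leave input untouched
      encoded += percentByte(code);
      i += 2;
    }
    return decodeURIComponent(encoded);
  } catch {
    return raw; // malformed input or invalid UTF-8: fall back to the raw text
  }
}

// Extract paths from short-format status lines ("XY path"), unquoting each
// before comparing it against the known submodule directories.
function changedPathsSketch(statusOutput: string, submoduleDirs: string[]): string[] {
  const result: string[] = [];
  for (const line of statusOutput.split("\n")) {
    if (line.length < 4) continue; // blank or truncated line
    const path = unquoteGitPathSketch(line.slice(3));
    if (submoduleDirs.indexOf(path) !== -1) continue; // a directory, not a file
    result.push(path);
  }
  return result;
}
```

Unquoting before the comparison matters because a quoted entry would otherwise never match the plain submodule path it refers to.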
Closes colbymchenry#86.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
71c3cd2 to 3ca2f51
Stumbled upon this issue - cool to see there is a PR opened for it!
Summary
`git ls-files` (used for both the initial scan and incremental sync) doesn't enter submodules: they appear as gitlink entries with their contents invisible. Both file-discovery paths now recurse into active submodules. Closes #86.
- `getGitVisibleFiles` (full index) enumerates active submodules via `git submodule foreach --recursive --quiet` and runs `git ls-files -co --exclude-standard` inside each, prefixing paths so files are reported relative to the parent root.
- `getGitChangedFiles` (sync) was refactored to share its status-parsing logic between the parent repo and each submodule. Submodule directory entries the parent's status emits (e.g. ` m vendor/sub` when the pointer moved) are filtered out so we don't try to read a directory as a file.

Opt-out
Submodule indexing is on by default. For repos with very large vendor submodules, set `indexSubmodules: false` in `CodeGraphConfig` to skip them. Path-based excludes (e.g. `'**/vendor/**'`) also still work.

Behavior on failure
- Parent `git status` failure → falls back to the full filesystem scan (preserves the prior `null` contract).
- Submodule `git status` / `ls-files` failure → silently absorbed (e.g. uninitialized or partially fetched submodules).

Files changed
- `src/extraction/index.ts`: `getGitSubmodules` / `getSubmoduleFiles`; recurse into submodules from `getGitVisibleFiles`; refactor `getGitChangedFiles` to share `readGitStatus`; add `unquoteGitPath` for C-style-quoted porcelain paths
- `src/types.ts`: add `indexSubmodules?: boolean` to `CodeGraphConfig`, default `true` in `DEFAULT_CONFIG`
- `__tests__/sync.test.ts`: `indexSubmodules: false` skips submodule contents

Test plan
- `npm test` (serialized): 385/385 pass (was 379, added 6: 5 submodule + 1 internal helper test indirectly)
- `npx tsc --noEmit` clean
- `npm run build` clean

🤖 Generated with Claude Code