fix(shell): use git ls-files for @ file mention in large repos#1588
fix(shell): use git ls-files for @ file mention in large repos#1588n-WN wants to merge 15 commits intoMoonshotAI:mainfrom
Conversation
7c0ff57 to
2da1698
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2da1698ad2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| except Exception: | ||
| return None | ||
|
|
||
| paths = _parse_ls_files_output(result.stdout) |
There was a problem hiding this comment.
Reapply ignore filters to git-discovered completion paths
list_files_git now returns parsed git ls-files output directly, so deep completion mode (@ queries with length ≥3 or any scoped path) can surface paths under ignored buckets like node_modules/, vendor/, and target/ whenever those directories are tracked. The previous walker filtered these names everywhere, and losing that filter can flood candidates and make file mentions much less usable in repos that commit dependencies or build artifacts.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 976c71225d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/kimi_cli/utils/file_filter.py
Outdated
| paths: list[str] = [] | ||
| seen_dirs: set[str] = set() | ||
| for line in stdout.splitlines(): | ||
| line = line.strip() |
There was a problem hiding this comment.
Preserve filename whitespace when parsing git output
_parse_ls_files_output currently does line.strip(), which removes legitimate leading/trailing spaces from tracked filenames. In repos that contain files like " leading.txt" or "name ".py, completion entries are rewritten to a different path, so selecting the suggestion points to a non-existent file. This regression comes from the new git-based path source; the previous os.walk path handling preserved names exactly.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 409c8c5996
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| except Exception: | ||
| return None | ||
|
|
||
| paths = _parse_ls_files_output(result.stdout) |
There was a problem hiding this comment.
Reapply ignore rules to git-discovered paths
list_files_git returns parsed git ls-files output without running is_ignored, so deep @ completion now includes tracked artifact/dependency trees like node_modules/, vendor/, and target/. This regresses the previous walker behavior and can flood completions in repos that commit generated assets. Fresh evidence: calling list_files_git on a temp repo with tracked node_modules/a.js returns that path, confirming no post-parse ignore filtering is applied.
Useful? React with 👍 / 👎.
src/kimi_cli/utils/file_filter.py
Outdated
| if dir_path not in seen_dirs: | ||
| seen_dirs.add(dir_path) | ||
| paths.append(dir_path) | ||
| paths.append(line) |
There was a problem hiding this comment.
Drop deleted tracked files from git completion list
git ls-files includes index entries for files deleted from the working tree but not yet staged, and _parse_ls_files_output appends each line directly, so completions can suggest paths that no longer exist on disk. The old os.walk implementation only returned existing files, so this is a correctness regression that can produce invalid @ suggestions until users run git add -u/commit.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 378c27c93a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if deleted: | ||
| paths = [p for p in paths if p.endswith("/") or p not in deleted] |
There was a problem hiding this comment.
Drop synthesized dirs for deleted tracked paths
list_files_git synthesizes directory entries in _parse_ls_files_output before removing git ls-files --deleted entries, but the deletion filter only removes file paths (p not in deleted). In a git repo where the last tracked file in a directory is deleted (and the directory removed on disk) without staging, completion still returns that non-existent directory (e.g., foo/), which regresses correctness versus the previous os.walk behavior that only surfaced existing paths.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8945e7def0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/kimi_cli/utils/file_filter.py
Outdated
| for line in stdout.splitlines(): | ||
| if not line: |
There was a problem hiding this comment.
Parse
git ls-files output in NUL-delimited mode
list_files_git currently relies on newline-delimited git ls-files output and _parse_ls_files_output treats each raw line as a literal path, but Git emits C-style quoted paths for special characters (tabs/newlines/quotes/backslashes), so candidates can become escaped strings (e.g. "tab\\tname.py") that do not exist on disk. I validated this with a temp repo: list_files_git returned quoted/escaped names while list_files_walk returned the real filename. git ls-files -h documents -z (“separate paths with the NUL character”); using -z parsing would prevent this path corruption.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e89cd99e67
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if self._is_git: | ||
| paths = list_files_git(self._root, scope) | ||
| self._git_index_mtime = git_index_mtime(self._root) |
There was a problem hiding this comment.
Reapply completion cap for git-backed deep searches
When a repo is detected, deep @ completion now calls list_files_git without applying self._limit, so the candidate set becomes unbounded while the os.walk fallback still enforces the limit. In large repos this can feed tens of thousands of paths into FuzzyCompleter on each completion request, causing noticeable UI latency and negating the guardrail that the limit constructor argument previously provided.
Useful? React with 👍 / 👎.
e89cd99 to
68db846
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 68db846c12
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/kimi_cli/utils/file_filter.py
Outdated
| "--recurse-submodules", | ||
| ] | ||
| if scope: | ||
| cmd.append(scope + "/") |
There was a problem hiding this comment.
Insert
-- before scoped pathspec in git calls
When scope is appended directly (for example -docs/), git ls-files treats it as an option instead of a pathspec and exits with an error, so deep @ completion for top-level directories/files starting with - silently falls back to list_files_walk and can miss candidates once the walk limit is hit. I verified this with git ls-files (which expects options before positional <file> args): git ls-files ... -docs/ returns an unknown-switch error, while git ls-files ... -- -docs/ works. The same option-termination issue applies to the scoped --deleted/--others calls too.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f7d394cfe8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/kimi_cli/utils/file_filter.py
Outdated
| """ | ||
| deleted = _git_ls_files_z(root, "--deleted", *_scope_args(scope)) | ||
| for sub in _git_submodule_paths(root): | ||
| if scope and not sub.startswith(scope + "/") and scope != sub: |
There was a problem hiding this comment.
Match scope against submodule prefix direction
The scope guard is reversed here: with a scoped query inside a submodule (for example scope="sub/pkg" and sub="sub"), not sub.startswith(scope + "/") evaluates true, so the submodule is skipped entirely. That means _git_deleted_files() and the analogous loop in list_files_git() do not apply --deleted/--others to that submodule, so scoped @sub/pkg/ completion can show stale deleted files and miss new untracked files. I reproduced this with a temp repo + submodule where untracked sub/pkg/b.py was present in unscoped results but missing in scoped results.
Useful? React with 👍 / 👎.
src/kimi_cli/utils/file_filter.py
Outdated
| parts = line.strip().split() | ||
| if len(parts) >= 2: | ||
| paths.append(parts[1]) |
There was a problem hiding this comment.
Parse submodule paths without whitespace splitting
Using line.strip().split() to parse git submodule status breaks submodule paths that contain spaces (e.g. sub mod), because only the first token after the SHA is kept. When that happens, later deleted/untracked scans run against a non-existent path (root/sub) and dirty-state reconciliation for the real submodule is skipped, so completions can include deleted files and miss untracked ones under spaced submodule directories.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45dd8df406
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "core.quotepath=false", | ||
| "ls-files", | ||
| "-z", | ||
| "--recurse-submodules", |
There was a problem hiding this comment.
Reconcile dirty submodules in git file listing
list_files_git enables git ls-files --recurse-submodules for tracked paths, but the follow-up deleted/untracked reconciliation only runs in the superproject (_git_deleted_files and --others --exclude-standard without submodule handling). In practice this returns stale candidates for dirty submodules: deleting sub/a.txt and creating sub/new.txt inside a submodule still yields sub/a.txt and misses sub/new.txt. I verified the CLI semantics here: git ls-files --deleted --recurse-submodules and git ls-files --others --exclude-standard --recurse-submodules both fail with fatal: ls-files --recurse-submodules unsupported mode, so the current approach cannot keep submodule results in sync with the working tree.
Useful? React with 👍 / 👎.
| "core.quotepath=false", | ||
| "ls-files", | ||
| "-z", | ||
| "--recurse-submodules", |
There was a problem hiding this comment.
🟡 --recurse-submodules in main listing without matching submodule handling in --deleted causes stale submodule files in completions
The main git ls-files command at src/kimi_cli/utils/file_filter.py:230 uses --recurse-submodules, so files inside submodules are included in the listing. However, _git_deleted_files at line 189 does NOT recurse into submodules (because git ls-files --deleted --recurse-submodules is unsupported by git — it errors with "unsupported mode"). Similarly, the --others command at line 252 doesn't recurse into submodules.
This asymmetry means that if a file is deleted (or renamed) inside a submodule's working tree but still exists in its index, it will appear in the main listing but won't be caught by _git_deleted_files, resulting in stale completion candidates for files that no longer exist on disk. The previous commit (f7d394c) attempted to fix this by manually recursing into submodules, but was reverted (45dd8df), re-introducing the inconsistency.
Prompt for agents
In src/kimi_cli/utils/file_filter.py, the main git ls-files command at line 230 uses --recurse-submodules, but _git_deleted_files (line 189) and the --others command (line 252) do not recurse into submodules because git doesn't support combining those flags with --recurse-submodules. This causes stale submodule files to appear in completions.
Two options to fix:
1. Remove --recurse-submodules from the main listing (line 230) to make behavior consistent — submodule files simply won't appear in completions, which is an acceptable trade-off for correctness.
2. Re-implement submodule recursion for --deleted and --others by enumerating submodule paths via git submodule status and running separate git ls-files commands inside each submodule directory (this was attempted in commit f7d394cf but reverted — investigate why and fix the approach).
Was this helpful? React with 👍 or 👎 to provide feedback.
…#1375) Replace os.walk with git ls-files as the primary file discovery mechanism for the @ file mention completer. This fixes large repos (e.g. apache/superset with 65k+ files) where the 1000-file limit caused late-alphabetical directories to be unreachable. Changes: - Add _get_git_paths(): runs `git ls-files --recurse-submodules` with 5s timeout, extracts directory entries from file paths - Add _walk_paths(): refactored os.walk fallback for non-git repos - Add scoped search: when fragment contains "/" (e.g. @src/utils/), both git and walk paths restrict to that subtree - Lazy git detection with _detect_git() (cached) - Scope-aware caching: cache invalidates when search scope changes Fixes MoonshotAI#1375, helps MoonshotAI#1276
…odule Extract ignore rules, git ls-files integration, and directory listing into kimi_cli/utils/file_filter.py. Both the shell completer and the web backend API now use the same filtering logic. Changes: - New kimi_cli/utils/file_filter.py: shared is_ignored(), detect_git(), list_files_git(), list_files_walk(), list_directory_filtered() - Shell prompt.py: delegate to file_filter instead of inline rules - Web sessions.py: use list_directory_filtered() which filters .git, node_modules, etc. (fixes MoonshotAI#1339) - New e2e tests (4): @ trigger, scoped search, git ls-files deep files, gitignore filtering
- Remove section separator banners and module docstring (match stdrc style) - Make all constants private (_IGNORED_NAMES, _IGNORED_PATTERNS) - Add untracked file discovery via git ls-files --others --exclude-standard - Add .git/index mtime-based cache invalidation (vs pure TTL) - Extract _parse_ls_files_output helper to reduce duplication - Add git_index_mtime() utility for cache staleness detection
Reject scope values containing '..' to prevent @../ from escaping the workspace root in both git ls-files and os.walk code paths.
The get_session_file endpoint is a general-purpose file browser, not an @ mention completer. Revert to listing all directory entries without is_ignored() filtering so web UI users can still see node_modules, build, dist, etc. when browsing their project.
Remove unnecessary strip() from _parse_ls_files_output — splitlines() already handles line endings, and strip() would silently mangle filenames with leading/trailing spaces.
Two regressions fixed: 1. Tracked ignored directories (node_modules/, vendor/, etc.) were bypassing is_ignored() when discovered via git ls-files, making completion inconsistent with top-level and os.walk fallback paths. Now _parse_ls_files_output applies is_ignored() to each path segment. 2. Files renamed/deleted on disk but not staged (e.g. mv old.py new.py without git add) remained as stale candidates because git ls-files reads the index. Now git ls-files --deleted is used to exclude working-tree deletions. Add regression tests for both scenarios.
When all files under a directory are deleted from the working tree (but not staged), the synthesized directory entry was left as a stale candidate. Now directory entries are pruned if no surviving file children exist after deleted-file exclusion.
43 tests covering: - Git vs walk parity on clean repos (flat, nested, scoped, gitignored) - Tracked ignored dir filtering (node_modules, vendor, __pycache__, etc.) - Deleted/renamed file exclusion and empty dir pruning - Untracked file discovery and gitignore respect - Path traversal prevention - is_ignored unit tests - Non-git fallback behaviour
Filenames containing tab, quotes, or backslash were C-style quoted by
git even with core.quotepath=false, producing escaped strings like
"tab\there.py" that don't match real paths on disk. Switch all three
git ls-files invocations to -z (NUL-delimited) mode and parse with
split('\0') instead of splitlines().
Add regression tests for tab, quote, and backslash in filenames.
Directory names starting with `-` (e.g. `-docs/`) are misinterpreted as git options without `--`. Extract `_scope_args()` helper to ensure all three git ls-files invocations use the end-of-options marker.
git ls-files --deleted and --others do not support --recurse-submodules (fatal: unsupported mode). Enumerate submodules via `git submodule status` and run per-submodule queries, prefixing results back to the root repo namespace. Also extract _git_ls_files_z() helper to reduce subprocess boilerplate.
…iscovery" This reverts commit f7d394c.
45dd8df to
aaaa0ca
Compare
Related Issue
Fixes #1375, helps #1276, #1339
Description
Root Cause
In large repositories (e.g. apache/superset with 65k+ files), the
@file mention completer fails to discover files in directories that sort late alphabetically. The root cause isos.walk()depth-first traversal with a hardcoded 1000-file limit — early directories exhaust the budget before later ones are visited.Analysis
Analyzed how professional tools solve this:
git ls-filesprimary → vendoredrg --filesfallbackignorecrate (ripgrep internals), parallel walkergit ls-files/fd), streamingfuzzyScoreos.walk()+ 1000 limitCommon pattern: all professional tools use
git ls-filesor equivalent (reads git index, not filesystem). None useos.walkwith hard truncation.Fix
New shared module
kimi_cli/utils/file_filter.py— unifies file discovery for Shell and Web:git ls-files -zas primary file source: Reads git's pre-built index with NUL-delimited output (-z) — instant for 100k+ file repos, handles special characters (tab, quotes, backslash) correctly.--recurse-submodules: Includes submodule files.--others --exclude-standard: Discovers untracked files while respecting.gitignore.--deletedexclusion: Filters stale renamed/deleted files (e.g. aftermv old.py new.pywithout staging).is_ignored()post-filtering: Trackednode_modules/,vendor/, etc. are filtered from results for consistency with the walk fallback...segments in scope for git paths; usesresolve()+is_relative_to()for walk paths./(e.g.@src/utils/), both git and walk restrict to that subtree.os.walkfallback: Preserved for non-git repos with identical scoping and filtering..git/indexmtime cache: 5s TTL + mtime change detection (borrowed from Claude Code) avoids redundant subprocess calls.Verification
Key files changed
src/kimi_cli/utils/file_filter.py— new: shared file discovery modulesrc/kimi_cli/ui/shell/prompt.py— delegates tofile_filter, adds mtime cachesrc/kimi_cli/web/api/sessions.py— restored unfilterediterdir()for general-purpose file browser APItests/utils/test_file_filter.py— new: 46 tests (git/walk parity, ignored dirs, deleted files, special chars, path traversal)tests/ui_and_conv/test_file_completer.py— added git-mode integration teststests/e2e/test_file_mention_e2e.py— new: 4 PTY E2E testsChecklist
make gen-changelogto update the changelog.make gen-docsto update the user documentation.