
perf(scanners): parallel walk, extension filter, text redaction #6

Merged
wgordon17 merged 2 commits into gordon-code:main from wgordon17:feat/parallel-walk
Mar 16, 2026

Conversation

@wgordon17
Member

Summary

  • Replace serial os.walk() with parallel_walk_dirs() — ThreadPoolExecutor per-directory fan-out with ≤2-dir serial bypass
  • Expand WALK_SKIP_DIRS from 36 to 58 entries (Python packaging, Electron/Chromium, macOS metadata, developer tools)
  • Add NON_CONFIG_EXTENSIONS blacklist (117 extensions) — skip stat()/hash/content-read for non-config files
  • Add WALK_SKIP_SUFFIXES (.noindex, .lproj) for suffix-based directory pruning
  • Remove lossy file caps — safety via pruning and filtering
  • Upgrade nix_state adjacent-project walk: depth 2 to 5, shared pruning, parallelism, symlink check
  • Add text content redaction for captured config files in uncovered directories
  • Test suite grows from 811 to 854 tests; ruff clean, pyright clean

Replace serial os.walk() with parallel_walk_dirs() (ThreadPoolExecutor
per-directory fan-out). Expand WALK_SKIP_DIRS from 36 to 58 entries.
Add NON_CONFIG_EXTENSIONS blacklist (117 extensions) and
WALK_SKIP_SUFFIXES (.noindex, .lproj). Remove lossy file caps —
safety now comes from directory pruning and extension filtering.
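
A rough sketch of the fan-out described above, for readers who want the shape of it. Only the names `parallel_walk_dirs`, `WALK_SKIP_DIRS`, and `WALK_SKIP_SUFFIXES` come from this PR; the signature, the `_walk_one` helper, the `max_workers` default, and the tiny skip set are assumptions made for illustration, not the merged code.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subset; the PR's real WALK_SKIP_DIRS holds 58 entries.
WALK_SKIP_DIRS = {".git", "node_modules", "__pycache__"}
WALK_SKIP_SUFFIXES = (".noindex", ".lproj")


def _walk_one(root):
    """Serially walk one subtree, pruning skipped directories in place."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] stops os.walk from descending into pruned dirs.
        dirnames[:] = [
            d for d in dirnames
            if d not in WALK_SKIP_DIRS and not d.endswith(WALK_SKIP_SUFFIXES)
        ]
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return found


def parallel_walk_dirs(roots, max_workers=8):
    """Walk each root in its own worker; stay serial for two or fewer roots."""
    roots = list(roots)
    if len(roots) <= 2:
        # Thread start-up costs more than it saves for a tiny fan-out.
        results = [_walk_one(root) for root in roots]
    else:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(_walk_one, roots))
    # Sort so output does not depend on which worker walked which subtree.
    return sorted(path for chunk in results for path in chunk)
```

Threads fit here because directory walking is dominated by filesystem calls rather than Python bytecode, so workers can overlap their waits. The final sort mirrors the PR's note about deterministic output for parallel-aggregated lists.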

Upgrade nix_state adjacent-project walk from depth 2 to 5 with shared
pruning infrastructure and parallelism. Add symlink check in
_walk_recursive to prevent traversal outside $HOME.
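
One plausible form of such a check is sketched below. The PR text does not say how `_walk_recursive` tests symlinks, so the helper name `is_within_home` and the resolve-then-compare approach are assumptions; the real code may simply skip symlinked directories.

```python
import os


def is_within_home(path, home=None):
    """Return True if path, with symlinks resolved, stays under the home directory."""
    home = os.path.realpath(home or os.path.expanduser("~"))
    resolved = os.path.realpath(path)
    try:
        # commonpath compares whole path components, so /home/ab is not
        # mistaken for a child of /home/a the way a prefix test would be.
        return os.path.commonpath([home, resolved]) == home
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

A recursive walker could call this on each directory entry before descending and skip any entry for which it returns False.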

Add text content redaction for sensitive values (password, token, key,
secret) in captured config files, matching the existing plist redaction.
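
As an illustration of the redaction described above, here is a minimal line-based sketch. The four keywords come from this PR; the regular expression, the `redact_text` name, and the `[REDACTED]` placeholder are assumptions and may differ from what the scanner actually does.

```python
import re

# Matches "name = value" or "name: value" lines whose name contains a
# sensitive keyword. The pattern is a hypothetical one, not the PR's.
_SENSITIVE = re.compile(
    r"""^(?P<prefix>\s*["']?[\w.-]*(?:password|token|key|secret)[\w.-]*["']?\s*[:=]\s*)(?P<value>.+)$""",
    re.IGNORECASE | re.MULTILINE,
)


def redact_text(text):
    """Replace the value part of sensitive-looking assignments with a placeholder."""
    return _SENSITIVE.sub(lambda m: m.group("prefix") + "[REDACTED]", text)
```

For example, `redact_text("api_token = abc123\nname = demo\n")` returns `"api_token = [REDACTED]\nname = demo\n"`. Substring matching on `key` errs toward over-redaction (a setting named `keyboard_layout` would also be masked), which is usually the safer failure mode for captured config.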

- _classify_file returns None for non-config extensions (no phantom stubs)
- Removes _ADJACENT_CAP, _PACKAGE_CAP, _MAX_SCRIPTS (silent data loss)
- Single results declaration in parallel_walk_dirs (if/else, not duplicate)
- Sorts all parallel-aggregated lists for deterministic output (5 call sites)
- Inlines _get_hm_packages return after cap removal (ruff RET504)
wgordon17 merged commit 51ce43a into gordon-code:main on Mar 16, 2026
1 check passed