fix: scope cleanup-legacy.py skill discovery and preserve config.yaml#2205
fix: scope cleanup-legacy.py skill discovery and preserve config.yaml#2205don-petry wants to merge 2 commits intobmad-code-org:mainfrom
Conversation
…ig.yaml
The cleanup-legacy.py script used an overly broad rglob("SKILL.md") that
matched template and asset files nested deep in the directory tree (e.g.
bmad-module-builder/assets/setup-skill-template/SKILL.md). This caused
cleanup to abort when it couldn't verify non-installable templates at the
skills directory.
Scopes find_skill_dirs() to only match SKILL.md at recognized installable
positions: direct children ({name}/SKILL.md) and skills subfolder
(skills/{name}/SKILL.md). Also adds config.yaml backup/restore around
shutil.rmtree() so per-module configs needed by bmad-init are preserved.
Fixes bmad-code-org#2175
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a shared cleanup-legacy.py utility under tools/scripts/ to safely remove legacy _bmad/ module directories after config/help migrations, with optional verification that skills are already installed elsewhere and preservation of per-module config.yaml.
Changes:
- Introduces scoped skill discovery to only treat
SKILL.mdin installable locations as skills. - Adds an optional safety check (
--skills-dir) to validate discovered skills exist at the installed location before deletion. - Preserves
config.yamlacross directory removal by backing it up beforeshutil.rmtree()and restoring it afterward.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Preserve config.yaml if present (bmad-init needs per-module configs) | ||
| config_path = target / "config.yaml" | ||
| config_backup = None | ||
| if config_path.exists(): | ||
| config_backup = config_path.read_bytes() | ||
| if verbose: | ||
| print(f"Preserving config.yaml in {dirname}/", file=sys.stderr) | ||
|
|
||
| file_count = count_files(target) | ||
| if config_backup: | ||
| file_count -= 1 # Don't count the preserved file | ||
| if verbose: |
There was a problem hiding this comment.
config_backup uses truthiness checks (if config_backup:). If config.yaml exists but is empty, read_bytes() returns b'' which is falsy, so the script will not restore the file and will miscount removed files. Track preservation with an explicit boolean (e.g., config_exists = config_path.is_file()) or check config_backup is not None instead of truthiness.
| for dirname in dirs_to_remove: | ||
| target = Path(bmad_dir) / dirname | ||
| if not target.exists(): |
There was a problem hiding this comment.
dirname values come from CLI (--module-code / --also-remove) and are directly appended to bmad_dir before shutil.rmtree(). A crafted value like ../ could delete directories outside _bmad/. Consider validating names (reject path separators / ..) or resolving target and asserting it is within Path(bmad_dir).resolve() before removal.
| for skill_md in root.rglob("SKILL.md"): | ||
| rel = skill_md.parent.relative_to(root) | ||
| parts = rel.parts | ||
| # Direct child: {name}/SKILL.md | ||
| if len(parts) == 1: | ||
| skills.append(parts[0]) | ||
| # Skills subfolder: skills/{name}/SKILL.md | ||
| elif len(parts) == 2 and parts[0] == "skills": | ||
| skills.append(parts[1]) |
There was a problem hiding this comment.
find_skill_dirs() still walks the entire tree via root.rglob('SKILL.md') and then filters by depth. To avoid unnecessary deep traversal (and speed up large modules), consider using targeted globs for the only supported installable locations (e.g., root.glob('*/SKILL.md') and root.glob('skills/*/SKILL.md')) instead of scanning everything.
| for skill_md in root.rglob("SKILL.md"): | |
| rel = skill_md.parent.relative_to(root) | |
| parts = rel.parts | |
| # Direct child: {name}/SKILL.md | |
| if len(parts) == 1: | |
| skills.append(parts[0]) | |
| # Skills subfolder: skills/{name}/SKILL.md | |
| elif len(parts) == 2 and parts[0] == "skills": | |
| skills.append(parts[1]) | |
| # Direct child: {name}/SKILL.md | |
| for skill_md in root.glob("*/SKILL.md"): | |
| skills.append(skill_md.parent.name) | |
| # Skills subfolder: skills/{name}/SKILL.md | |
| skills_root = root / "skills" | |
| if skills_root.exists(): | |
| for skill_md in skills_root.glob("*/SKILL.md"): | |
| skills.append(skill_md.parent.name) |
| file_count = count_files(target) | ||
| if config_backup: | ||
| file_count -= 1 # Don't count the preserved file | ||
| if verbose: | ||
| print( | ||
| f"Removing {target} ({file_count} files)", | ||
| file=sys.stderr, | ||
| ) |
There was a problem hiding this comment.
count_files() is called unconditionally for every target directory. For large legacy trees this adds a full extra traversal even when --verbose is off. If files_removed_count is not strictly required, consider only counting when verbose (or make it opt-in) to keep cleanup fast.
📝 WalkthroughWalkthroughAdds a new executable Python script that removes legacy module directories from a BMad installation path. The script scans for installable skills at recognized locations, verifies they exist in the target skills directory, removes legacy directories while preserving Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tools/scripts/cleanup-legacy.py (1)
61-87: Prefer a single source of truth for skill discovery.
tools/installer/core/manifest-generator.js:collectSkills()still discovers installableSKILL.mdentrypoints recursively, while this helper hardcodes only direct children andskills/{name}. That makes the pre-delete safety check easy to drift from install behavior and can silently under-verify nested skill layouts if they ever appear in a legacy_bmadtree. Reusing the installer's discovery rules here, or explicitly failing on deeperSKILL.mdpaths, would make this guard safer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/scripts/cleanup-legacy.py` around lines 61 - 87, The current find_skill_dirs function only accepts direct children and skills/{name} layouts, which diverges from the installer's recursive discovery in collectSkills; update find_skill_dirs to use the same discovery rules as the installer by either calling the installer function collectSkills (from tools/installer/core/manifest-generator.js or its Python wrapper) or by making the function reject/deem unsafe any SKILL.md nested deeper than the installer's allowed entrypoints (i.e., explicitly fail or raise an error on deeper paths), ensuring the guard's behavior exactly matches collectSkills' logic so pre-delete checks can't drift from install behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/scripts/cleanup-legacy.py`:
- Around line 188-220: The code incorrectly treats zero-byte preserved configs
as absent and allows restoration errors to escape handling; update the guards
around config_backup in the cleanup routine to use "if config_backup is not
None:" (affecting the file_count adjustment and the restore condition) so empty
b"" files are preserved and restored, and move the restoration steps
(target.mkdir(...) and config_path.write_bytes(...)) into the same try/except
that wraps shutil.rmtree(target) so any errors during restore are caught and
reported via the existing error_result (which references removed and dirname)
and trigger sys.exit(2).
---
Nitpick comments:
In `@tools/scripts/cleanup-legacy.py`:
- Around line 61-87: The current find_skill_dirs function only accepts direct
children and skills/{name} layouts, which diverges from the installer's
recursive discovery in collectSkills; update find_skill_dirs to use the same
discovery rules as the installer by either calling the installer function
collectSkills (from tools/installer/core/manifest-generator.js or its Python
wrapper) or by making the function reject/deem unsafe any SKILL.md nested deeper
than the installer's allowed entrypoints (i.e., explicitly fail or raise an
error on deeper paths), ensuring the guard's behavior exactly matches
collectSkills' logic so pre-delete checks can't drift from install behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 41b08d6c-cbc6-49db-95c7-6c48ddbfece1
📒 Files selected for processing (1)
tools/scripts/cleanup-legacy.py
| # Preserve config.yaml if present (bmad-init needs per-module configs) | ||
| config_path = target / "config.yaml" | ||
| config_backup = None | ||
| if config_path.exists(): | ||
| config_backup = config_path.read_bytes() | ||
| if verbose: | ||
| print(f"Preserving config.yaml in {dirname}/", file=sys.stderr) | ||
|
|
||
| file_count = count_files(target) | ||
| if config_backup: | ||
| file_count -= 1 # Don't count the preserved file | ||
| if verbose: | ||
| print( | ||
| f"Removing {target} ({file_count} files)", | ||
| file=sys.stderr, | ||
| ) | ||
|
|
||
| try: | ||
| shutil.rmtree(target) | ||
| except OSError as e: | ||
| error_result = { | ||
| "status": "error", | ||
| "error": f"Failed to remove {target}: {e}", | ||
| "directories_removed": removed, | ||
| "directories_failed": dirname, | ||
| } | ||
| print(json.dumps(error_result, indent=2)) | ||
| sys.exit(2) | ||
|
|
||
| # Restore preserved config.yaml | ||
| if config_backup: | ||
| target.mkdir(parents=True, exist_ok=True) | ||
| config_path.write_bytes(config_backup) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf 'Truthiness of empty bytes in Python:\n'
python - <<'PY'
print(bool(b""))
PY
printf '\nCurrent config preservation block:\n'
sed -n '188,220p' tools/scripts/cleanup-legacy.pyRepository: bmad-code-org/BMAD-METHOD
Length of output: 1325
🏁 Script executed:
# Get the full function containing these lines to understand complete control flow
grep -n "def " tools/scripts/cleanup-legacy.py | tail -5Repository: bmad-code-org/BMAD-METHOD
Length of output: 233
🏁 Script executed:
# View the complete function from the beginning to lines 220 plus context
sed -n '150,225p' tools/scripts/cleanup-legacy.pyRepository: bmad-code-org/BMAD-METHOD
Length of output: 2450
🏁 Script executed:
# Check if there's any error handling or try-except wrapper around the function call
grep -B5 -A10 "for dirname in" tools/scripts/cleanup-legacy.py | head -40Repository: bmad-code-org/BMAD-METHOD
Length of output: 1041
Fix empty-byte config handling and restore error safety.
read_bytes() returns b"" for zero-byte files, which is falsy, so both if config_backup: guards (lines 196 and 207) skip empty configs during file counting and restoration. Additionally, the restore operations (lines 208–209) run outside the try block wrapping rmtree(), so mkdir() or write_bytes() errors escape error handling and produce a traceback instead of the documented JSON error response and sys.exit(2).
Fix: Move preservation and restoration into the try block and replace if config_backup: with if config_backup is not None: to safely handle empty files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/scripts/cleanup-legacy.py` around lines 188 - 220, The code incorrectly
treats zero-byte preserved configs as absent and allows restoration errors to
escape handling; update the guards around config_backup in the cleanup routine
to use "if config_backup is not None:" (affecting the file_count adjustment and
the restore condition) so empty b"" files are preserved and restored, and move
the restoration steps (target.mkdir(...) and config_path.write_bytes(...)) into
the same try/except that wraps shutil.rmtree(target) so any errors during
restore are caught and reported via the existing error_result (which references
removed and dirname) and trigger sys.exit(2).
- Fix config restoration for zero-byte files by using `is not None`
instead of truthiness checks on `config_backup` (empty bytes `b""` is
falsy, causing silent loss of empty config.yaml files)
- Move config restore into the try/except block so mkdir/write_bytes
errors are caught and reported as structured JSON instead of tracebacks
- Add logging.error() call on failure for observability
- Replace rglob("SKILL.md") with targeted glob() calls to avoid
unnecessary deep traversal — only the two canonical installable
layouts are checked
- Add docstring note explaining why find_skill_dirs() is intentionally
stricter than the installer's recursive collectSkills()
- Add path traversal validation rejecting "..", "/", "\\" in dir names
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
All review feedback from CodeRabbit and Copilot has been addressed in the latest push (
All reviews above were submitted against commit |
don-petry
left a comment
There was a problem hiding this comment.
Automated review — NEEDS HUMAN REVIEW
Risk: MEDIUM
Reviewed commit: 9924dc6344f1262a27e5f5f6c146c42955e6da66
Council vote: security=MEDIUM · correctness=MEDIUM · maintainability=MEDIUM
Summary
The PR introduces a well-structured shared utility script (tools/scripts/cleanup-legacy.py) with clear documentation, implementing two core fixes from issue #2175: scoped find_skill_dirs() discovery and config.yaml backup/restore with correct is not None guards. However, no CI checks have run against the current head commit (9924dc6), making the author's quality-run claim unverifiable. More critically, the new shared script has no call site anywhere in the repository, leaving open the question of whether the end-user bug in /bmad-bmb-setup is actually resolved by this changeset alone.
Linked issue analysis
Issue #2175 describes a bug in the BMB module's cleanup-legacy.py invoked during /bmad-bmb-setup. The diff addresses both acceptance criteria (scoped skill discovery and config.yaml preservation), but tools/scripts/cleanup-legacy.py has no wiring to the existing BMB setup skill or task — the integration path from the new script to the reported bug is unclear and unverified.
Findings
Major
- [major]
tools/scripts/cleanup-legacy.py— No call site found: the new shared script is not referenced anywhere in the repository. Without a companion change updating the BMB setup skill/task to invoke this path, the end-user bug from issue #2175 may remain unresolved. (correctness) - [major] CI missing:
statusCheckRollupis empty — no checks have run against head9924dc6. The PR body claimsnpm run qualitypasses (204 tests) but this is unverifiable from the GitHub status API. (correctness + security)
Minor
- [minor]
tools/scripts/cleanup-legacy.py:195-196— Path traversal guard (..///\check) runs aftertarget.exists()andtarget.is_dir()filesystem calls. The validation should precede all path operations to make the safety intent clear. (correctness + security + maintainability) - [minor]
tools/scripts/cleanup-legacy.py— Docstring coverage at 66.67% (below 80% threshold flagged by CodeRabbit):parse_args(),count_files(), andmain()are missing docstrings. (correctness + maintainability) - [minor] No test files: PR body describes unit tests for
find_skill_dirs()andconfig.yamlbackup/restore, but no test file additions are present in the diff. (correctness)
Info
- [info]
tools/scripts/cleanup-legacy.py:214—shutil.rmtree()called without checking whether the top-level target itself is a symlink; safe in Python 3 by default, but worth noting. (security) - [info]
tools/scripts/cleanup-legacy.py:228—logging.error()used for the OSError case while all other diagnostics useprint(..., file=sys.stderr); logging level is never configured inmain(). Unify on one mechanism. (maintainability) - [info]
tools/scripts/cleanup-legacy.py:264—'core'directory is unconditionally added todirs_to_remove; if this script is later used in contexts beyond BMB,coremay not always be a legacy directory. (correctness) - [info]
tools/scripts/cleanup-legacy.py:264— Deduplication loop can be replaced withlist(dict.fromkeys(dirs_to_remove))(idiomatic Python). (maintainability) - [info]
tools/scripts/cleanup-legacy.py:268— TOCTOU gap betweenverify_skills_installed()andcleanup_directories(); acceptable for a local CLI tool. (security)
CI status
No CI checks recorded against head 9924dc6 — statusCheckRollup is empty. The PR body claims npm run quality passes (204 tests, lint, format) but this is unverifiable.
Reviewed automatically by the don-petry PR-review council (security: opus 4.6 · correctness: sonnet 4.6 · maintainability: sonnet 4.6 · synthesis: sonnet 4.6). The marker on line 1 lets the agent detect new commits and re-review. Reply with @don-petry if you need a human.
|
Hi @don-petry is this supposed to be called by a skill? Having a bit of trouble understanding what this is resolving. |
Summary
cleanup-legacy.pyscript with two fixes from bmad-suno-band-manager commiteb226a7:find_skill_dirs()to only match SKILL.md at installable positions: direct children ({name}/SKILL.md) and skills subfolder (skills/{name}/SKILL.md). Deeply nested SKILL.md files (e.g.bmad-module-builder/assets/setup-skill-template/SKILL.md) are no longer incorrectly matched.config.yamlbeforeshutil.rmtree()and restores it after, so per-module configs needed bybmad-initsurvive cleanup.Fixes #2175
Test plan
find_skill_dirs()with direct child, skills/ subfolder, and deeply nested SKILL.md -- only installable positions matchednpm run qualitypasses (204 tests, lint, format)🤖 Generated with Claude Code