Problem
The current Crane migration framework can report migration_score: 1.0, but that score is not yet deletion-grade proof that the Go CLI is a complete replacement for the Python CLI.
The current gate is useful as a progress signal, but it can still pass while:
- Go command bodies are simplified or only print success messages.
- Help/output differences are logged as approved exceptions.
- Some tests pass vacuously if the Python reference binary is missing.
- The score counts `TestParity*` pass events instead of proving command-surface, output, filesystem, lockfile, config, cache, and generated-file equivalence.
Before we remove the Python CLI, the Crane program needs to enforce true black-box Python-vs-Go replacement proof.
Goal
Update the Crane migration program so migration_score: 1.0 means:
- The Python CLI can be removed.
- The Go CLI has 100% supported command-surface parity.
- Every supported command behavior is verified against Python through black-box contract tests.
- There are zero approved parity exceptions remaining at cutover.
- Benchmarks still run on every Crane PR commit and are posted back to the PR.
Required Framework Change
Replace the current progress-oriented scoring gate with a deletion-grade cutover gate.
The final score must be composed from explicit gates:
- `python_reference_required = true`
- `surface_parity = 100%`
- `help_parity = 100%`
- `functional_contracts = 100%`
- `state_diff_contracts = 100%`
- `known_exceptions = 0`
- `go_tests = pass`
- `python_tests = pass`
- `benchmarks = pass`
If any gate is false, migration_score must be less than 1.0.
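One way to make that rule mechanical is to keep readiness and progress as separate values. The Go sketch below is a minimal illustration, not the existing score script: the `CutoverGates` struct, its field names, the averaging of the four parity fractions, and the 0.99 ceiling are all hypothetical choices. It can only return 1.0 when every gate is green.

```go
package main

import "fmt"

// CutoverGates mirrors the explicit gates listed above. The field names are
// hypothetical; parity values are fractions in [0, 1].
type CutoverGates struct {
	PythonReferencePresent bool
	SurfaceParity          float64
	HelpParity             float64
	FunctionalContracts    float64
	StateDiffContracts     float64
	KnownExceptions        int
	GoTestsPass            bool
	PythonTestsPass        bool
	BenchmarksPass         bool
}

// Ready reports cutover readiness: true only when every gate is green.
func (g CutoverGates) Ready() bool {
	return g.PythonReferencePresent &&
		g.SurfaceParity == 1 &&
		g.HelpParity == 1 &&
		g.FunctionalContracts == 1 &&
		g.StateDiffContracts == 1 &&
		g.KnownExceptions == 0 &&
		g.GoTestsPass && g.PythonTestsPass && g.BenchmarksPass
}

// MigrationScore averages the four parity fractions as a progress signal,
// then caps the result below 1.0 whenever any gate is not green.
func (g CutoverGates) MigrationScore() float64 {
	if g.Ready() {
		return 1.0
	}
	progress := (g.SurfaceParity + g.HelpParity + g.FunctionalContracts + g.StateDiffContracts) / 4
	const ceiling = 0.99 // anything short of full cutover stays visibly below 1.0
	if progress > ceiling {
		return ceiling
	}
	return progress
}

// Report prints progress and readiness separately so they cannot be confused.
func (g CutoverGates) Report() string {
	return fmt.Sprintf("migration_score: %.2f cutover_ready: %t", g.MigrationScore(), g.Ready())
}
```

With this shape, 100% parity combined with one remaining approved exception, or with a missing Python binary, reports 0.99 and `cutover_ready: false`.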
Implementation Plan
1. Require the Python reference binary
Update the parity harness so missing Python is a hard failure, not a log message or vacuous pass.
Required behavior:
- CI must fail if `APM_PYTHON_BIN` is unset.
- CI must fail if `APM_PYTHON_BIN` does not exist or is not executable.
- Tests must not silently pass when Python comparison is unavailable.
- The score script must fail if the Go test event stream is empty or incomplete.
2. Generate and diff the CLI surface
Add a surface inventory test that extracts command metadata from both implementations.
Python source:
- Use Click introspection from `src/apm_cli/cli.py`.
- Capture every command, subcommand, alias, positional argument, option, default, required flag, repeatability, hidden status, and help availability.
Go source:
- Add an equivalent inventory emitter for `cmd/apm`.
- Capture the same data model.
Fail on:
- Missing command.
- Missing subcommand.
- Missing option.
- Mismatched required positional argument.
- Mismatched default.
- Mismatched exit behavior for invalid usage.
- Hidden Python aliases that are neither represented in Go nor explicitly documented.
3. Enforce golden help and usage parity
For every command and subcommand, run both CLIs and compare normalized results for:
- `apm --help`
- `apm <command> --help`
- `apm <command> <subcommand> --help`
- invalid option
- missing required argument
- unknown subcommand
Compare:
The final cutover gate must have no "simplified help" or "approved truncation" exceptions.
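Normalization should remove only cosmetic noise, never content, or it becomes a hidden "simplified help" exception. A minimal Go sketch, assuming the cosmetic differences are ANSI styling, CRLF line endings, trailing whitespace, trailing blank lines, and the per-run temp directory (`NormalizeHelp` and `FirstDiff` are hypothetical names):

```go
package main

import (
	"regexp"
	"strings"
)

// ansiEscape matches SGR-style terminal escape sequences such as "\x1b[1m".
var ansiEscape = regexp.MustCompile(`\x1b\[[0-9;]*[A-Za-z]`)

// NormalizeHelp strips cosmetic differences only; wording, option order, and
// wrapping are left intact so real help drift still fails the comparison.
func NormalizeHelp(out, tempRoot string) string {
	out = ansiEscape.ReplaceAllString(out, "")
	out = strings.ReplaceAll(out, "\r\n", "\n")
	if tempRoot != "" {
		out = strings.ReplaceAll(out, tempRoot, "<TMP>")
	}
	lines := strings.Split(out, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " \t")
	}
	return strings.TrimRight(strings.Join(lines, "\n"), "\n") + "\n"
}

// FirstDiff returns the 1-based line of the first mismatch, or 0 if equal.
func FirstDiff(a, b string) int {
	al, bl := strings.Split(a, "\n"), strings.Split(b, "\n")
	for i := 0; i < len(al) || i < len(bl); i++ {
		if i >= len(al) || i >= len(bl) || al[i] != bl[i] {
			return i + 1
		}
	}
	return 0
}
```

Reporting the first differing line keeps golden-test failures readable when a long help page drifts.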
4. Add state-diff functional parity tests
Create a reusable black-box contract harness:
- Create two identical temp homes and temp repos.
- Run Python in one environment.
- Run Go in the other environment.
- Compare command results and post-run state.
Compare:
- exit code
- stdout/stderr after normalization
- files created, removed, and modified
- file contents
- generated `AGENTS.md`, `CLAUDE.md`, Copilot, Codex, Gemini, Cursor, Windsurf, and OpenCode outputs
- `apm.yml`
- `apm.lock.yaml`
- `.apm/` package directories
- marketplace config files
- user config files
- cache layout where deterministic
- audit output artifacts such as JSON, SARIF, and markdown
5. Build fixture-first command contracts
Crane should port one Python behavior at a time by adding one black-box contract test first, then implementing Go until that exact contract passes.
Do not add broad "parity-ish" tests that only check exit code or absence of WIP strings.
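The difference between a real contract and a "parity-ish" test is what gets compared. In this Go sketch, the hypothetical `Observation` type holds everything one run exposes, and `Mismatches` compares all of it, so matching exit codes alone can never make a contract green. `Files` is assumed to map relative paths to post-run contents or digests.

```go
package main

import (
	"fmt"
	"sort"
)

// Observation is everything one CLI run exposes to a black-box contract
// (hypothetical shape). Files maps relative path to contents or digest.
type Observation struct {
	ExitCode int
	Stdout   string
	Stderr   string
	Files    map[string]string
}

// Mismatches compares every observable field; a contract passes only when
// the returned slice is empty.
func Mismatches(py, goObs Observation) []string {
	var out []string
	if py.ExitCode != goObs.ExitCode {
		out = append(out, fmt.Sprintf("exit code: python=%d go=%d", py.ExitCode, goObs.ExitCode))
	}
	if py.Stdout != goObs.Stdout {
		out = append(out, "stdout differs")
	}
	if py.Stderr != goObs.Stderr {
		out = append(out, "stderr differs")
	}
	paths := map[string]bool{}
	for p := range py.Files {
		paths[p] = true
	}
	for p := range goObs.Files {
		paths[p] = true
	}
	sorted := make([]string, 0, len(paths))
	for p := range paths {
		sorted = append(sorted, p)
	}
	sort.Strings(sorted)
	for _, p := range sorted {
		pv, inPy := py.Files[p]
		gv, inGo := goObs.Files[p]
		switch {
		case !inPy:
			out = append(out, "unexpected file from go: "+p)
		case !inGo:
			out = append(out, "missing file in go: "+p)
		case pv != gv:
			out = append(out, "file differs: "+p)
		}
	}
	return out
}
```

In the fixture-first loop, a new contract is expected to return a non-empty slice for Go first and an empty one once the behavior is ported.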
Required command families:
- `init`
  - defaults
  - explicit project name
  - explicit targets
  - plugin mode
  - marketplace mode
  - existing `apm.yml`
  - non-interactive behavior
- `compile`
  - all canonical targets
  - `--target`
  - `--all`
  - `--clean`
  - `--validate`
  - `--dry-run`
  - generated file content parity
  - orphan cleanup parity
- `install`
  - local bundle
  - local plugin directory
  - local skill bundle
  - fixture git repository
  - transitive dependencies
  - skill subset
  - MCP dependency
  - global scope
  - `--frozen`
  - policy enforcement
  - collision handling
  - `--force`
  - lockfile writes
  - managed-file manifest writes
  - no live network
- `uninstall`
  - direct dependency
  - transitive orphan cleanup
  - global scope
  - dry-run
  - stale integrated file cleanup
  - manifest mutation parity
- `update` and `deps update`
  - no-op
  - changed refs
  - selected packages
  - target-specific update
  - lockfile mutation
  - dry-run
- `deps`
  - list
  - tree
  - info
  - clean
  - global scope
  - insecure-only views
- `pack` and `unpack`
  - bundle artifact contents
  - marketplace artifact contents
  - manifest fields
  - hidden-character audit behavior
  - JSON output
  - dry-run
  - `--force`
  - `--skip-verify`
- `marketplace`
  - add/list/remove/update/browse/validate
  - init/check/outdated/doctor/publish dry-run/migrate
  - package add/set/remove
  - config mutation parity
  - fixture refs only
- `audit`
  - package scan
  - arbitrary file scan
  - hidden Unicode findings
  - `--strip`
  - `--dry-run`
  - text/json/sarif/markdown outputs
  - `--ci`
  - policy/no-policy/no-cache/no-drift combinations
- `policy`
  - status
  - check
  - local policy
  - inherited policy
  - no-cache
  - output format parity
- `mcp`
  - list
  - search
  - inspect/show
  - install
  - fixture registry
  - client config mutation parity
- `run`, `preview`, and `list`
  - scripts with params
  - missing script
  - default script
  - prompt output parity
  - verbose output
- `config`
  - get
  - set
  - unset
  - config file path handling
  - missing config
  - invalid key/value
- `runtime`
  - list
  - setup
  - remove
  - status
  - fixture runtime adapters
- `targets`
  - auto-detection
  - explicit `apm.yml` targets
  - `--json`
  - `--all`
  - ambiguous/missing target behavior
- `view`
  - installed package metadata
  - missing package
  - versions using fixture refs
  - global scope
- `cache`
  - info
  - clean
  - prune
  - cache path/env behavior
- `prune`
  - no-op
  - removes undeclared packages
  - dry-run
  - integration cleanup
- `outdated`
  - missing lockfile behavior
  - no outdated dependencies
  - outdated dependency fixture
  - prerelease behavior where applicable
- `self-update`
  - `--check`
  - platform-specific messaging
  - no actual network/install in parity tests
6. Replace live network with fixtures
Parity tests must not rely on GitHub, registries, or external services.
Use:
- local git repos
- local tarballs
- local fixture registries
- local HTTP fixture servers when HTTP behavior is needed
- fake home directories
- fake cache roots
- fake config paths
Real network smoke tests may exist, but they must not be required for cutover parity.
7. Keep original Python tests, but do not confuse them with Go parity
Continue running the Python unit test suite while Python exists.
However, cutover readiness must come from black-box contracts that both Python and Go satisfy. Internal Python unit tests passing is not evidence that Go implements the same behavior.
8. Update PR reporting
Every Crane PR commit should report:
- surface parity percentage
- help parity percentage
- functional contract pass count
- state-diff contract pass count
- remaining known exceptions
- benchmark table
- top failing command families
- next recommended command behavior to port
Post this as a PR comment per commit SHA, updating the same comment on reruns.
Acceptance Criteria
- `migration_score: 1.0` is impossible if Python is missing.
- `migration_score: 1.0` is impossible if any approved exception remains.
- `migration_score: 1.0` is impossible if only help/exit-code tests pass.
- `migration_score: 1.0` requires file-state parity for mutating commands.
- CI runs the parity and benchmark gates on every Crane PR commit.
- The benchmark results are posted to the PR for every Crane PR commit.
- The score output clearly distinguishes progress from cutover readiness.
- The Python CLI is not removed until this issue's gates pass.
Suggested Crane Operating Rule
For every iteration:
- Pick one Python CLI behavior.
- Add a black-box Python-vs-Go contract test for it.
- Run the test and see it fail for Go.
- Implement the Go behavior.
- Confirm the contract passes.
- Update the migration report with the exact behavior now covered.
No final cutover exceptions. No vacuous parity. No deleting Python until the deletion-grade gate is green.