Update Crane migration program for deletion-grade Python/Go parity

## Problem

The current Crane migration framework can report `migration_score: 1.0`, but that score is not yet deletion-grade proof that the Go CLI is a complete replacement for the Python CLI.

The current gate is useful as a progress signal, but it can still pass while:

- Go command bodies are simplified or only print success messages.
- Help/output differences are logged as approved exceptions.
- Some tests pass vacuously if the Python reference binary is missing.
- The score counts `TestParity*` pass events instead of proving command-surface, output, filesystem, lockfile, config, cache, and generated-file equivalence.

Before we remove the Python CLI, the Crane program needs to enforce true black-box Python-vs-Go replacement proof.

## Goal

Update the Crane migration program so `migration_score: 1.0` means:

- The Python CLI can be removed.
- The Go CLI has 100% supported command-surface parity.
- Every supported command behavior is verified against Python through black-box contract tests.
- There are zero approved parity exceptions remaining at cutover.
- Benchmarks still run on every Crane PR commit and are posted back to the PR.

## Required Framework Change

Replace the current progress-oriented scoring gate with a deletion-grade cutover gate.

The final score must be composed from explicit gates:

- `python_reference_required = true`
- `surface_parity = 100%`
- `help_parity = 100%`
- `functional_contracts = 100%`
- `state_diff_contracts = 100%`
- `known_exceptions = 0`
- `go_tests = pass`
- `python_tests = pass`
- `benchmarks = pass`

If any gate is not met, `migration_score` must be less than `1.0`.
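
As a rough illustration of how the score could be composed from these gates, here is a minimal sketch. The `CutoverGates` type, the `failed_gates` helper, and the `0.99` cap on partial progress are assumptions made for this example, not part of the existing score script; percentages are represented as fractions in `[0, 1]`.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CutoverGates:
    """Hypothetical container for the gates listed above."""
    python_reference_required: bool
    surface_parity: float          # fraction in [0, 1]
    help_parity: float             # fraction in [0, 1]
    functional_contracts: float    # fraction in [0, 1]
    state_diff_contracts: float    # fraction in [0, 1]
    known_exceptions: int
    go_tests: bool
    python_tests: bool
    benchmarks: bool


def failed_gates(gates: CutoverGates) -> list[str]:
    """Return the names of all gates that are not met."""
    failed = []
    for field in fields(gates):
        value = getattr(gates, field.name)
        if field.name == "known_exceptions":
            ok = value == 0
        elif isinstance(value, bool):
            ok = value
        else:
            ok = value >= 1.0
        if not ok:
            failed.append(field.name)
    return failed


def migration_score(gates: CutoverGates) -> float:
    """Return 1.0 only when every gate is met; otherwise stay below 1.0."""
    if not failed_gates(gates):
        return 1.0
    ratios = [
        gates.surface_parity,
        gates.help_parity,
        gates.functional_contracts,
        gates.state_diff_contracts,
    ]
    # Assumed cap: progress can be reported, but never as a full 1.0.
    return min(sum(ratios) / len(ratios), 0.99)
```

Returning the failing gate names alongside the number is one way to keep the score output explicit about why cutover is blocked.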

## Implementation Plan

### 1. Require the Python reference binary

Update the parity harness so missing Python is a hard failure, not a log message or vacuous pass.

Required behavior:

- CI must fail if `APM_PYTHON_BIN` is unset.
- CI must fail if `APM_PYTHON_BIN` does not exist or is not executable.
- Tests must not silently pass when Python comparison is unavailable.
- The score script must fail if the Go test event stream is empty or incomplete.
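
The checks above might look roughly like the sketch below. `HarnessError`, `require_python_bin`, `require_parity_events`, and the `expected_tests` list are hypothetical names for this example. The event parsing assumes the stream is the line-delimited JSON emitted by `go test -json`, and treating skipped tests as failures is an assumption based on the "must not silently pass" requirement.

```python
import json
import os


class HarnessError(RuntimeError):
    """Raised when the parity harness cannot produce a trustworthy result."""


def require_python_bin(env=os.environ) -> str:
    """Return the Python reference binary path, or fail hard."""
    path = env.get("APM_PYTHON_BIN", "")
    if not path:
        raise HarnessError("APM_PYTHON_BIN is unset")
    if not os.path.isfile(path):
        raise HarnessError(f"APM_PYTHON_BIN does not exist: {path}")
    if not os.access(path, os.X_OK):
        raise HarnessError(f"APM_PYTHON_BIN is not executable: {path}")
    return path


def require_parity_events(lines, expected_tests):
    """Parse `go test -json` lines; fail on empty, incomplete, or skipped runs."""
    outcome = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError as exc:
            raise HarnessError(f"unparseable test event: {line!r}") from exc
        # Only terminal per-test actions decide the outcome.
        if event.get("Test") and event.get("Action") in ("pass", "fail", "skip"):
            outcome[event["Test"]] = event["Action"]
    if not outcome:
        raise HarnessError("Go test event stream is empty")
    missing = sorted(set(expected_tests) - set(outcome))
    if missing:
        raise HarnessError(f"tests missing from event stream: {missing}")
    skipped = sorted(t for t in expected_tests if outcome[t] == "skip")
    if skipped:
        raise HarnessError(f"tests skipped instead of compared: {skipped}")
    return outcome
```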

### 2. Generate and diff the CLI surface

Add a surface inventory test that extracts command metadata from both implementations.

Python source:

- Use Click introspection from `src/apm_cli/cli.py`.
- Capture every command, subcommand, alias, positional argument, option, default, required status, repeatability, hidden status, and help availability.

Go source:

- Add an equivalent inventory emitter for `cmd/apm`.
- Capture the same data model.

Fail on:

- Missing command.
- Missing subcommand.
- Missing option.
- Mismatched required positional argument.
- Mismatched default.
- Mismatched exit behavior for invalid usage.
- Hidden Python alias that is neither represented in Go nor explicitly documented.
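
Once both emitters produce the same data model, the comparison itself can be small. The sketch below assumes a hypothetical inventory shape: a dict keyed by command path (for example `"deps update"`), each entry holding a `params` dict keyed by parameter name. The field names `kind`, `required`, `default`, `multiple`, and `hidden` are illustrative, not the real schema.

```python
# Assumed per-parameter fields that must match between the two inventories.
COMPARED_FIELDS = ("kind", "required", "default", "multiple", "hidden")


def diff_surface(python_inv: dict, go_inv: dict) -> list[str]:
    """List every way the Go surface falls short of the Python surface."""
    problems = []
    for path, py_cmd in sorted(python_inv.items()):
        go_cmd = go_inv.get(path)
        if go_cmd is None:
            problems.append(f"missing command: {path}")
            continue
        go_params = go_cmd.get("params", {})
        for name, py_param in sorted(py_cmd.get("params", {}).items()):
            go_param = go_params.get(name)
            if go_param is None:
                problems.append(f"{path}: missing parameter {name}")
                continue
            for key in COMPARED_FIELDS:
                if py_param.get(key) != go_param.get(key):
                    problems.append(
                        f"{path}: {name}.{key} differs: "
                        f"python={py_param.get(key)!r} go={go_param.get(key)!r}"
                    )
    return problems
```

The check is deliberately one-directional, mirroring the failure list above: it reports what Python has that Go lacks. Whether extra Go-only commands should also fail is left open here.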

### 3. Enforce golden help and usage parity

For every command and subcommand, run both CLIs and compare normalized results:

- `apm --help`
- `apm <command> --help`
- `apm <command> <subcommand> --help`
- invalid option
- missing required argument
- unknown subcommand

Compare:

- exit code
- stdout
- stderr

The final cutover gate must have no "simplified help" or "approved truncation" exceptions.
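
What "normalized" means is not pinned down in this issue, so the following is only a sketch under assumed rules: strip ANSI color sequences, unify line endings, and drop trailing whitespace. `RunResult`, `normalize`, and `compare_runs` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


# Matches common ANSI CSI sequences such as color codes.
_ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def normalize(text: str) -> str:
    """Apply the assumed normalization rules to one output stream."""
    text = _ANSI.sub("", text).replace("\r\n", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).strip("\n")


def compare_runs(py: RunResult, go: RunResult) -> list[str]:
    """Return the names of the fields that differ after normalization."""
    mismatches = []
    if py.exit_code != go.exit_code:
        mismatches.append("exit code")
    if normalize(py.stdout) != normalize(go.stdout):
        mismatches.append("stdout")
    if normalize(py.stderr) != normalize(go.stderr):
        mismatches.append("stderr")
    return mismatches
```

Keeping the normalization rules this narrow matters: every rule added here is a place where a real difference could be hidden, which is the same risk the "approved truncation" exceptions carry.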

### 4. Add state-diff functional parity tests

Create a reusable black-box contract harness:

1. Create two identical temp homes and temp repos.
2. Run Python in one environment.
3. Run Go in the other environment.
4. Compare command results and post-run state.
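
The first three steps could be sketched as below. `seed_sandbox` and `run_cli` are hypothetical helpers, and the minimal environment (only `PATH` and `HOME`) is an assumption; the real harness may need to pass more variables through.

```python
import os
import subprocess
from pathlib import Path


def seed_sandbox(root: Path, files: dict[str, str]) -> Path:
    """Create a temp home and a temp repo under root, seeded with files."""
    repo = root / "repo"
    (root / "home").mkdir(parents=True)
    repo.mkdir()
    for rel, content in files.items():
        target = repo / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    return repo


def run_cli(command: list[str], args: list[str], root: Path) -> subprocess.CompletedProcess:
    """Run one CLI inside its own sandbox and capture the result."""
    env = {
        "PATH": os.environ.get("PATH", ""),
        "HOME": str(root / "home"),
    }
    return subprocess.run(
        [*command, *args],
        cwd=root / "repo",
        env=env,
        capture_output=True,
        text=True,
        timeout=120,
    )
```

Seeding both sandboxes from the same `files` mapping is what makes the two starting states identical, so any later difference can be attributed to the CLI under test.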

Compare:

- exit code
- stdout/stderr after normalization
- files created, removed, and modified
- file contents
- generated `AGENTS.md`, `CLAUDE.md`, Copilot, Codex, Gemini, Cursor, Windsurf, and OpenCode outputs
- `apm.yml`
- `apm.lock.yaml`
- `.apm/` package directories
- marketplace config files
- user config files
- cache layout where deterministic
- audit output artifacts such as JSON, SARIF, and markdown
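
For the file-level part of this comparison, one simple approach is to hash every file under each environment root and diff the two maps. This is a sketch, not the harness's actual design; `snapshot` and `diff_state` are hypothetical names, and non-deterministic paths (for example parts of the cache) would need to be excluded before comparing.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path relative to root onto the SHA-256 of its bytes."""
    state = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            state[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return state


def diff_state(py_state: dict[str, str], go_state: dict[str, str]) -> dict[str, list[str]]:
    """Report files present on one side only and files whose contents differ."""
    shared = set(py_state) & set(go_state)
    return {
        "only_python": sorted(set(py_state) - set(go_state)),
        "only_go": sorted(set(go_state) - set(py_state)),
        "content_differs": sorted(p for p in shared if py_state[p] != go_state[p]),
    }
```

Taking a snapshot before and after each run also yields the created, removed, and modified sets for a single implementation, which helps when a contract fails and the difference has to be explained.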

### 5. Build fixture-first command contracts

Crane should port one Python behavior at a time by adding one black-box contract test first, then implementing Go until that exact contract passes.

Do not add broad "parity-ish" tests that only check exit code or absence of WIP strings.

Required command families:

- `init`
  - defaults
  - explicit project name
  - explicit targets
  - plugin mode
  - marketplace mode
  - existing `apm.yml`
  - non-interactive behavior

- `compile`
  - all canonical targets
  - `--target`
  - `--all`
  - `--clean`
  - `--validate`
  - `--dry-run`
  - generated file content parity
  - orphan cleanup parity

- `install`
  - local bundle
  - local plugin directory
  - local skill bundle
  - fixture git repository
  - transitive dependencies
  - skill subset
  - MCP dependency
  - global scope
  - `--frozen`
  - policy enforcement
  - collision handling
  - `--force`
  - lockfile writes
  - managed-file manifest writes
  - no live network

- `uninstall`
  - direct dependency
  - transitive orphan cleanup
  - global scope
  - dry-run
  - stale integrated file cleanup
  - manifest mutation parity

- `update` and `deps update`
  - no-op
  - changed refs
  - selected packages
  - target-specific update
  - lockfile mutation
  - dry-run

- `deps`
  - list
  - tree
  - info
  - clean
  - global scope
  - insecure-only views

- `pack` and `unpack`
  - bundle artifact contents
  - marketplace artifact contents
  - manifest fields
  - hidden-character audit behavior
  - JSON output
  - dry-run
  - `--force`
  - `--skip-verify`

- `marketplace`
  - add/list/remove/update/browse/validate
  - init/check/outdated/doctor/publish dry-run/migrate
  - package add/set/remove
  - config mutation parity
  - fixture refs only

- `audit`
  - package scan
  - arbitrary file scan
  - hidden Unicode findings
  - `--strip`
  - `--dry-run`
  - text/json/sarif/markdown outputs
  - `--ci`
  - policy/no-policy/no-cache/no-drift combinations

- `policy`
  - status
  - check
  - local policy
  - inherited policy
  - no-cache
  - output format parity

- `mcp`
  - list
  - search
  - inspect/show
  - install
  - fixture registry
  - client config mutation parity

- `run`, `preview`, and `list`
  - scripts with params
  - missing script
  - default script
  - prompt output parity
  - verbose output

- `config`
  - get
  - set
  - unset
  - config file path handling
  - missing config
  - invalid key/value

- `runtime`
  - list
  - setup
  - remove
  - status
  - fixture runtime adapters

- `targets`
  - auto-detection
  - explicit `apm.yml` targets
  - `--json`
  - `--all`
  - ambiguous/missing target behavior

- `view`
  - installed package metadata
  - missing package
  - versions using fixture refs
  - global scope

- `cache`
  - info
  - clean
  - prune
  - cache path/env behavior

- `prune`
  - no-op
  - removes undeclared packages
  - dry-run
  - integration cleanup

- `outdated`
  - missing lockfile behavior
  - no outdated dependencies
  - outdated dependency fixture
  - prerelease behavior where applicable

- `self-update`
  - `--check`
  - platform-specific messaging
  - no actual network/install in parity tests

### 6. Replace live network with fixtures

Parity tests must not rely on GitHub, registries, or external services.

Use:

- local git repos
- local tarballs
- local fixture registries
- local HTTP fixture servers when HTTP behavior is needed
- fake home directories
- fake cache roots
- fake config paths

Real network smoke tests may exist, but they must not be required for cutover parity.
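
A sketch of how fake home, cache, and config roots might be wired into the environment of each CLI run. `isolated_env` is a hypothetical helper. It assumes both CLIs resolve their paths from `HOME` and the XDG variables, which has not been verified here, and it points the proxy variables at a closed local port so that accidental network access fails fast in clients that honor them.

```python
import os
from pathlib import Path


def isolated_env(root: Path) -> dict[str, str]:
    """Build an environment whose home, cache, and config all live under root."""
    home = root / "home"
    cache = root / "cache"
    config = root / "config"
    for directory in (home, cache, config):
        directory.mkdir(parents=True, exist_ok=True)
    return {
        "PATH": os.environ.get("PATH", ""),
        "HOME": str(home),
        "XDG_CACHE_HOME": str(cache),
        "XDG_CONFIG_HOME": str(config),
        # Assumption: the CLIs honor proxy variables; port 9 should refuse connections.
        "HTTP_PROXY": "http://127.0.0.1:9",
        "HTTPS_PROXY": "http://127.0.0.1:9",
        "NO_COLOR": "1",
    }
```

Building the environment from scratch, rather than copying `os.environ`, keeps developer tokens and real config paths out of the parity runs.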

### 7. Keep original Python tests, but do not confuse them with Go parity

Continue running the Python unit test suite while Python exists.

However, cutover readiness must come from black-box contracts that both Python and Go satisfy. The fact that internal Python unit tests pass is not evidence that Go implements the same behavior.

### 8. Update PR reporting

Every Crane PR commit should report:

- surface parity percentage
- help parity percentage
- functional contract pass count
- state-diff contract pass count
- remaining known exceptions
- benchmark table
- top failing command families
- next recommended command behavior to port

Post this as a PR comment per commit SHA, updating the same comment on reruns.
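
One common way to update the same comment on reruns is to embed a hidden marker keyed by commit SHA in the comment body and look for it before posting. The sketch below shows only the rendering and lookup, with no API calls. The marker text, the `metrics` keys, and the function names are assumptions for illustration; `comments` is assumed to be a list of dicts with `id` and `body` fields, as returned by a PR comment listing.

```python
from typing import Optional

# Hidden HTML comment used to find this commit's report on reruns.
MARKER = "<!-- crane-parity-report sha={sha} -->"


def render_report(sha: str, metrics: dict) -> str:
    """Render the per-commit report as a Markdown comment body."""
    lines = [
        MARKER.format(sha=sha),
        f"### Crane parity report for `{sha[:7]}`",
        "",
        "| Metric | Value |",
        "| --- | --- |",
        f"| Surface parity | {metrics['surface_parity']:.1%} |",
        f"| Help parity | {metrics['help_parity']:.1%} |",
        f"| Functional contracts passing | {metrics['functional_pass']} / {metrics['functional_total']} |",
        f"| State-diff contracts passing | {metrics['state_diff_pass']} / {metrics['state_diff_total']} |",
        f"| Known exceptions remaining | {metrics['known_exceptions']} |",
    ]
    return "\n".join(lines)


def find_comment_to_update(comments: list[dict], sha: str) -> Optional[int]:
    """Return the id of the existing report for this SHA, or None to post anew."""
    marker = MARKER.format(sha=sha)
    for comment in comments:
        if marker in comment.get("body", ""):
            return comment["id"]
    return None
```

The benchmark table, top failing command families, and next recommended behavior would be appended to the same body in the same way.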

## Acceptance Criteria

- `migration_score: 1.0` is impossible if Python is missing.
- `migration_score: 1.0` is impossible if any approved exception remains.
- `migration_score: 1.0` is impossible if only help/exit-code tests pass.
- `migration_score: 1.0` requires file-state parity for mutating commands.
- CI runs the parity and benchmark gates on every Crane PR commit.
- The benchmark results are posted to the PR for every Crane PR commit.
- The score output clearly distinguishes progress from cutover readiness.
- The Python CLI is not removed until this issue's gates pass.

## Suggested Crane Operating Rule

For every iteration:

1. Pick one Python CLI behavior.
2. Add a black-box Python-vs-Go contract test for it.
3. Run the test and see it fail for Go.
4. Implement the Go behavior.
5. Confirm the contract passes.
6. Update the migration report with the exact behavior now covered.

No final cutover exceptions. No vacuous parity. No deleting Python until the deletion-grade gate is green.

