diff --git a/AGENTS.md b/AGENTS.md
index 1a5d9a3c..a65813f3 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -76,6 +76,16 @@ structured so independent changes stay in disjoint files. Keep it that way:
 - `assembly init` templates live in `aai_cli/init/templates/` and are **committed**, including renamed dotfiles (`gitignore` → `.gitignore`, `env.example`). The wheel force-includes them via `[tool.hatch.build.targets.wheel] artifacts`, excluding `__pycache__/*.pyc`. Editing templates needs care — see the parametrized contract tests (`tests/test_init_template_*.py`).
 - `audioop` left the stdlib in 3.13; `audioop-lts` backfills it (conditional dependency). Supported Pythons: 3.12–3.13.
 - **Releasing is tag-triggered.** The version is **derived from the git tag** by hatch-vcs and written to a gitignored `aai_cli/_version.py` at build time — there is no version string to keep in sync across `pyproject.toml` or `aai_cli/__init__.py`, and `bump_patch.sh` no longer exists. To cut a release, run `scripts/cut_release.sh` from a clean `main` in sync with `origin/main`: no argument → next patch above the latest `vX.Y.Z` tag; `cut_release.sh X.Y.Z` → explicit version. It tags + pushes, which fires `.github/workflows/release.yml` — that builds the prebuilt arm64 Homebrew bottle (`Formula/assembly.rb`), cuts the GitHub Release, and opens the formula PR. **You don't need a local checkout to release:** `release.yml` also has a manual `workflow_dispatch` (GitHub's "Run workflow" button, or `actions_run_trigger` from a Claude web session) taking an optional `version` input — its `tag` job resolves the version and creates+pushes the tag (reusing `cut_release.sh --no-push`), and the rest of the pipeline then runs in that same workflow run. Tag creation lives *inside* the release run on purpose: a `GITHUB_TOKEN` tag push wouldn't re-trigger the `on: push` half, so a separate "push the tag" workflow would silently never build. (`dry_run: true` builds the bottle for an existing tag without publishing.) Bottling matters because the deps include Rust-backed sdists (`pydantic-core`, `jiter`, `cryptography`) that would otherwise compile from source on `brew install`. The Homebrew formula builds from a git-less GitHub source tarball, so `Formula/assembly.rb`'s `def install` sets the generic `SETUPTOOLS_SCM_PRETEND_VERSION` env var (installing resources first under a clean env, then setting the var for our package only) to feed the tag version to the build. **`cut_release.sh` only runs from a clean `main` in sync with `origin/main`** (it hard-errors on a feature branch / dirty tree), so cut releases from `main`, not your working branch. The "update available" notice users see is `aai_cli/update_check.py`.
+- **Release-run operational gotchas (cost prior sessions a follow-up PR each).** Two
+  things bite the `release.yml` path specifically: (1) the bot-opened formula PR (`Bottle
+  vX.Y.Z`) is authored with `GITHUB_TOKEN`, which **does not trigger CI**, so its required
+  check never reports — merge it with the admin override; the diff is formula-only by
+  construction. (2) The manual `workflow_dispatch` `tag` job checks out with
+  `persist-credentials: false` and must **set a git identity** before invoking
+  `cut_release.sh`, because the script cuts an *annotated* tag (`git tag -a`) which needs a
+  committer — without it the run dies with `empty ident name` and the bottle/publish jobs
+  skip silently (this path only ever "worked" locally, where maintainers have a global
+  identity). Mirror the `publish` job's `git config user.{name,email}` step.
 
 ## Manual QA / running the CLI in sandboxed sessions
 
diff --git a/tests/AGENTS.md b/tests/AGENTS.md
index 4b93caab..d35dff73 100644
--- a/tests/AGENTS.md
+++ b/tests/AGENTS.md
@@ -87,6 +87,35 @@ uv run pytest -q -n auto --timeout=60 --cov=aai_cli --cov-branch --cov-context=t
 uv run python scripts/mutation_sweep.py aai_cli/config.py # or omit paths for the whole package
 ```
 
+## Cross-platform portability (a green Linux gate isn't a green macOS/Windows run)
+
+`scripts/check.sh` runs **Linux-only** (it's bash plus Go/Homebrew/shell tooling),
+and that's the only gate a web session can run. But CI also runs the pytest suite
+on `windows-latest` (the `tests (windows)` job), and maintainers run the full gate
+on macOS — so OS-specific failures you never see on Linux still land on `main`.
+These have each cost a session a follow-up PR; bake the fix in up front:
+
+- **POSIX-only imports at module scope crash collection on Windows.** A top-level
+  `import termios` / `fcntl` / `os.openpty` (e.g. `tests/test_hotkey.py`'s pty driver)
+  aborts collection before any skip can apply. Guard it with
+  `pytest.importorskip("termios")` at the top of the module — that skips the whole file
+  on Windows and, unlike a skip/xfail marker, is **not** counted by the Linux
+  escape-hatch gate (which greps for the marker/call forms — so don't paste those literal
+  tokens into a test file or even this guide; that itself trips the count).
+- **Permission-bit asserts are POSIX-only.** `0o600`/`0o700` mode checks (e.g.
+  `tests/test_init_scaffold.py`) don't hold on Windows. Gate the mode assertion on
+  `os.name == "posix"` and assert the cross-platform behavior (file contents, the `.env`
+  rewrite) unconditionally so the test still covers Windows.
+- **macOS filesystems are case-insensitive by default.** A test that distinguishes two
+  paths differing only in case (hard-link / same-file detection) passes on Linux and fails
+  on macOS — assert on a case-stable property instead of the casing.
+- **When you touch `check.sh` itself, don't assume GNU tooling.** macOS ships BSD
+  utilities: BSD/ERE `grep -E` silently *ignores* `\b`, so a baseline-vs-working count that
+  used `git grep -E` on one side and `rg` on the other disagreed and failed the escape-hatch
+  gate on macOS only — use one matcher consistently (`git grep -P`, PCRE). Homebrew 6+ also
+  dropped `brew audit [path]`; a formula must be audited **by name** (copy it into an
+  ephemeral local tap first). Both bit a "green on Linux" branch on the maintainer's Mac.
+
 ## Replay fixtures (offline end-to-end coverage)
 
 `tests/test_replay_e2e.py` drives whole commands (`transcribe`/`transcripts`/`llm`/