This file provides guidance to coding agents (Claude Code, Codex, Cursor, and
others) when working with code in this repository. CLAUDE.md is a symlink to
this file, so Claude Code reads the same instructions.
Guidance is split per directory so many agents can update it concurrently
without conflicting in one file. This root file holds repo-wide invariants;
read the AGENTS.md nearest the code you're changing:
- `aai_cli/AGENTS.md`: architecture, the command-registration convention, cross-cutting state, feature subsystems.
- `tests/AGENTS.md`: test markers, snapshot goldens, hermeticity rules, and the hard-won lessons for getting the patch-coverage and mutation gates green.
This project uses uv. Run every Python tool through uv run so it uses the locked environment (pyproject.toml + uv.lock), not whatever is on PATH:
```bash
uv sync                  # create/refresh the venv (the dev group installs by default)
uv run assembly --help   # run the CLI from the locked environment
./scripts/check.sh       # the full gate CI runs (scripts/check.sh is the source of truth)
```

Dev tooling is a PEP 735 `[dependency-groups]` group with `default-groups = ["dev"]`, not a `[project]` extra, so `uv sync --extra dev` errors.
scripts/check.sh is the authoritative gate; keep this list in sync with it. It runs, in order: uv lock --check → ruff check → ruff format --check → mypy → pyright (src strict) → pyright (tests) → vulture (dead code) → deptry (dependency hygiene) → lint-imports (import-linter architecture contracts) → max-file-length (500 lines) → xenon (cyclomatic complexity: function max B, module avg A, project avg A) → swiftlint + swift compile (macOS only, skipped elsewhere) → markdownlint → codespell (spell-check code/comments/docs via uvx; config in [tool.codespell]) → prettier (init template JS/CSS) → shellcheck → actionlint + zizmor (workflow lint/audit) → gitleaks (secret scan) → generated --show-code compile gate → init template contract gate → unused snapshot/fixture gate (scripts/unused_fixtures_gate.py: orphaned .ambr/API fixtures, since xdist disables syrupy's own unused detection) → docs consistency gate (scripts/docs_consistency_gate.py: REFERENCE.md/README.md env vars, exit codes, and assembly … command refs stay in sync with the code) → docstring coverage gate (scripts/docstring_coverage_gate.py: public-API docstring ratchet, an interrogate stand-in that handles PEP 695 generics) → brew audit --strict (the shipped Formula/assembly.rb; self-skips without Homebrew) → pytest (90% branch coverage) → Textual TUI coverage (≥90% on the textual-importing modules — a per-surface floor so a fragile TUI module can't rot under the project-wide average; the module set is derived from the textual import and reuses the pytest .coverage, no re-run) → diff-cover (100% patch coverage vs origin/main) → mutation gate (diff-scoped: mutates each changed line and reruns the tests that cover it — a surviving mutant fails the gate, so changed lines need assertions that would fail if the line broke, not just coverage; suppress a genuinely unassertable line with # pragma: no mutate) → a "no new escape hatches" gate (# type: ignore / # noqa / pragma: no cover / Any / cast( / test 
skip/xfail/sleep, all count-gated against the merge-base so moving an existing hatch in a refactor doesn't false-positive but a net-new one fails) → uv build + twine check --strict. The vulture/deptry/lint-imports/xenon, patch-coverage, and mutation stages catch the failures that ruff+mypy alone won't — don't claim the gate is green until the script prints All checks passed. CodeQL is intentionally NOT in this gate — it's the slowest check (~minutes) and is enforced separately by the codeql.yml workflow (which also covers CI; check.sh self-skipped it on the hosted runner anyway), so dropping it keeps the local gate fast with no loss of CI coverage. scripts/codeql_gate.py still exists to reproduce a code-scanning alert locally (uv run python scripts/codeql_gate.py).
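The count-gating behind the "no new escape hatches" stage can be pictured as follows. This is a minimal sketch, not the repository's actual gate script: the pattern table and the names `count_hatches` and `net_new_hatches` are assumptions made for illustration, and the real gate compares the merge-base tree with the working tree rather than two in-memory mappings.

```python
import re
from collections import Counter

# Hypothetical pattern table; the real gate's list and regexes may differ.
HATCH_PATTERNS = {
    "type: ignore": re.compile(r"#\s*type:\s*ignore"),
    "noqa": re.compile(r"#\s*noqa"),
    "pragma: no cover": re.compile(r"#\s*pragma:\s*no cover"),
    "cast(": re.compile(r"\bcast\("),
}


def count_hatches(sources: dict[str, str]) -> Counter[str]:
    """Count each escape hatch across a {path: source text} mapping."""
    totals: Counter[str] = Counter()
    for text in sources.values():
        for name, pattern in HATCH_PATTERNS.items():
            totals[name] += len(pattern.findall(text))
    return totals


def net_new_hatches(base: dict[str, str], head: dict[str, str]) -> dict[str, int]:
    """Return only the hatches whose repo-wide count grew from base to head."""
    before, after = count_hatches(base), count_hatches(head)
    return {name: after[name] - before[name] for name in after if after[name] > before[name]}
```

Because the comparison is on repo-wide totals, moving an existing `# type: ignore` from one file to another yields an empty result, while a net-new hatch shows up with its increase.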
Commits are gated. On success `check.sh` records a working-tree signature (`scripts/gate_marker.py record` → `.git/aai-gate-pass`), and a PreToolUse hook (`.claude/hooks/require-gate-before-commit.sh`) blocks `git commit` unless that signature still matches. Run the full gate to completion before committing (a single-file pytest does not satisfy it), and re-run it after any further edit. Iterate with the fast targeted commands below, then gate once at the end. For a deliberate work-in-progress commit, prefix the command: `AAI_ALLOW_COMMIT=1 git commit …`.
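The record-then-compare idea behind the commit gate can be sketched as below. It is an illustration under stated assumptions: the real `scripts/gate_marker.py` may compute its signature differently, and `tree_signature`, `record`, and `still_valid` are hypothetical names. Here the signature is a SHA-256 over the relative path and contents of every file under a directory.

```python
import hashlib
from pathlib import Path


def tree_signature(root: Path, marker: Path) -> str:
    """Hash every file under root (path + bytes), skipping the marker itself."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file() and p != marker):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def record(root: Path, marker: Path) -> None:
    """Called after a green gate: store the signature of the checked tree."""
    marker.write_text(tree_signature(root, marker))


def still_valid(root: Path, marker: Path) -> bool:
    """Called before a commit: True only if nothing changed since record()."""
    return marker.exists() and marker.read_text() == tree_signature(root, marker)
```

Any edit after `record()` changes the signature, which is why the gate has to be re-run after further edits before a commit is allowed.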
Individual tools (all via uv run):
```bash
uv run ruff check .   # lint
uv run ruff format .  # format (line-length 100)
uv run mypy           # files = ["aai_cli", "tests"] from pyproject; src is full --strict bar disallow_untyped_calls (jiwer ships no stubs); tests relax the untyped-body flags
prettier --check "aai_cli/init/templates/**/*.{js,css}"  # JS/CSS template formatting
uv run pytest -q                                         # default unit suite
uv run pytest tests/test_transcribe.py -q                # a single file
uv run pytest tests/test_transcribe.py::test_name -q     # a single test
```

The post-edit hook (`.claude/settings.json`) runs `ruff check --fix --unfixable F401` + `ruff format` on every edited `*.py`. `--unfixable F401` means a just-added import is not auto-deleted while it's momentarily unused, so adding an import in one edit and its usage in the next is safe. The flip side: a genuinely unused import survives the hook and only fails at `ruff check` in the gate, so still prefer making the import and its first usage land in the same edit.
Dozens of sessions may be working on this repo concurrently; the codebase is structured so independent changes stay in disjoint files. Keep it that way:
- Check for in-flight duplicates before starting a fix. Before implementing a bug fix or small feature, scan open PRs and the last few `origin/main` commits touching the same files (two sessions once shipped the identical fix; the slower PR was closed as redundant). The `pr-overlap` workflow also warns when a PR's changed files intersect another open PR's; treat that warning as a prompt to reconcile, not noise.
- A new command edits no shared file. Registration, help ordering, and the snapshot partition are all derived from the command module's own `SPEC` declaration (see `aai_cli/AGENTS.md`). If you find yourself editing a shared list to add a command, you're fighting the convention.
- Dependency changes are not part of feature PRs. `uv.lock` is the one file two branches can never merge cleanly; add, bump, or remove dependencies in a dedicated, single-purpose PR so feature branches don't collide in the lockfile. Dropping a dependency still rewrites `uv.lock`, so a removal gets its own PR too, even when it rides along with deleting the code that used it.
- Land through the merge queue. The diff-scoped gates compare against `origin/main`, which moves constantly; two individually-green PRs can be jointly red. PRs should merge via GitHub's merge queue (a repository setting) so the gate re-runs against the combined state before landing; don't bypass it with direct pushes to `main`.
- Update the `AGENTS.md` nearest your change when you learn something durable; don't grow this root file.
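One way to picture "a new command edits no shared file" is a registry derived by scanning command modules for their own declaration. The sketch below is hypothetical: the real `SPEC` shape and discovery code are documented in `aai_cli/AGENTS.md`, and `CommandSpec`, its fields, and `collect_specs` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from types import ModuleType


@dataclass(frozen=True)
class CommandSpec:
    """Hypothetical per-command declaration living in the command's own module."""

    name: str
    help_order: int
    snapshot_group: str


def collect_specs(modules: list[ModuleType]) -> list[CommandSpec]:
    """Derive the registry from each module's SPEC; there is no central list to edit."""
    specs = [m.SPEC for m in modules if isinstance(getattr(m, "SPEC", None), CommandSpec)]
    # Help ordering comes from the declarations themselves, not from a shared file.
    return sorted(specs, key=lambda spec: (spec.help_order, spec.name))
```

With a layout like this, two sessions adding two different commands touch two different modules and never conflict in a shared registration list.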
- The package/module is `aai_cli`; the distribution name is `aai-cli`; the console command is `assembly` (`[project.scripts] assembly = "aai_cli.main:run"`).
- `assembly init` templates live in `aai_cli/init/templates/` and are committed, including renamed dotfiles (`gitignore` → `.gitignore`, `env.example`). The wheel force-includes them via `[tool.hatch.build.targets.wheel] artifacts`, excluding `__pycache__`/`*.pyc`. Editing templates needs care; see the parametrized contract tests (`tests/test_init_template_*.py`).
- `audioop` left the stdlib in 3.13; `audioop-lts` backfills it (conditional dependency). Supported Pythons: 3.12–3.13.
- Releasing is tag-triggered. The version is derived from the git tag by hatch-vcs and written to a gitignored `aai_cli/_version.py` at build time, so there is no version string to keep in sync across `pyproject.toml` or `aai_cli/__init__.py`, and `bump_patch.sh` no longer exists. To cut a release, run `scripts/cut_release.sh` from a clean `main` in sync with `origin/main`: no argument → next patch above the latest `vX.Y.Z` tag; `cut_release.sh X.Y.Z` → explicit version. It tags + pushes, which fires `.github/workflows/release.yml`; that workflow builds the prebuilt arm64 Homebrew bottle (`Formula/assembly.rb`), cuts the GitHub Release, and opens the formula PR. You don't need a local checkout to release: `release.yml` also has a manual `workflow_dispatch` (GitHub's "Run workflow" button, or `actions_run_trigger` from a Claude web session) taking an optional `version` input. Its `tag` job resolves the version and creates+pushes the tag (reusing `cut_release.sh --no-push`), and the rest of the pipeline then runs in that same workflow run. Tag creation lives inside the release run on purpose: a `GITHUB_TOKEN` tag push wouldn't re-trigger the `on: push` half, so a separate "push the tag" workflow would silently never build. (`dry_run: true` builds the bottle for an existing tag without publishing.) Bottling matters because the deps include Rust-backed sdists (`pydantic-core`, `jiter`, `cryptography`) that would otherwise compile from source on `brew install`. The Homebrew formula builds from a git-less GitHub source tarball, so `Formula/assembly.rb`'s `def install` sets the generic `SETUPTOOLS_SCM_PRETEND_VERSION` env var (installing resources first under a clean env, then setting the var for our package only) to feed the tag version to the build. `cut_release.sh` only runs from a clean `main` in sync with `origin/main` (it hard-errors on a feature branch / dirty tree), so cut releases from `main`, not your working branch. The "update available" notice users see is `aai_cli/update_check.py`.
- Release-run operational gotchas (cost prior sessions a follow-up PR each). Two things bite the `release.yml` path specifically: (1) the bot-opened formula PR (`Bottle vX.Y.Z`) is authored with `GITHUB_TOKEN`, which does not trigger CI, so its required check never reports; merge it with the admin override, since the diff is formula-only by construction. (2) The manual `workflow_dispatch` `tag` job checks out with `persist-credentials: false` and must set a git identity before invoking `cut_release.sh`, because the script cuts an annotated tag (`git tag -a`) which needs a committer. Without it the run dies with `empty ident name` and the bottle/publish jobs skip silently (this path only ever "worked" locally, where maintainers have a global identity). Mirror the `publish` job's `git config user.{name,email}` step.
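The "no argument → next patch above the latest `vX.Y.Z` tag" rule can be illustrated in a few lines. This sketches the arithmetic only: `cut_release.sh` is a shell script whose actual parsing is not shown here, and `next_patch` is a hypothetical helper name.

```python
import re

TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_patch(tags: list[str]) -> str:
    """Return the version one patch above the highest vX.Y.Z tag."""
    # Compare as integer tuples so that v0.10.2 sorts above v0.9.9.
    versions = [tuple(map(int, m.groups())) for t in tags if (m := TAG_RE.match(t))]
    if not versions:
        raise ValueError("no vX.Y.Z tag found")
    major, minor, patch = max(versions)
    return f"{major}.{minor}.{patch + 1}"
```

Tags that do not match the `vX.Y.Z` shape are ignored, and the comparison is numeric rather than lexicographic.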
Lessons that cost time in agent sessions — read before exercising uv run assembly by hand:
- Web/remote containers are fully provisioned at session start (`.claude/hooks/session-start.sh`): system deps, `markdownlint`/`prettier`, and the Go gate binaries (`actionlint`, `gitleaks`) are installed at CI's pinned versions, so `./scripts/check.sh` enforces the same gates CI does. A gate that "self-skips locally" should not be skipping in a web session. If one is, read `/tmp/session-start.log` to see what failed to provision. Keep the hook's stdout terse (one line per step); it is injected into the agent's context every session.
- Probe network reachability first. Remote/sandboxed environments often allowlist PyPI but block `api.assemblyai.com`/`streaming.assemblyai.com`/`llm-gateway.assemblyai.com` (`curl -s https://api.assemblyai.com/v2/transcript -H "authorization: $ASSEMBLYAI_API_KEY"` returning a proxy 403 like "Host not in allowlist" means no real-API path can work; test error handling and `--show-code` instead of burning time on happy paths).
- Isolate the config dir per test run. The CLI persists profiles in `platformdirs`-resolved `config.toml` (e.g. `~/.config/assemblyai/`). Concurrent or destructive manual tests (corrupt-config probes, profile/env switches) stomp each other through that shared file; set `XDG_CONFIG_HOME=$(mktemp -d)` per run instead.
- Write scratch output to `/tmp`, never the repo root. Redirects like `cmd > out.txt` in the repo show up as untracked files and trip commit hooks/gates.
- Headless boxes have no mic/speakers/browser. `assembly stream`/`assembly agent` mic paths and `assembly login`'s browser flow can't complete; wrap exploratory runs in `timeout 30 …` so a blocking path can't wedge the session. For pytest, `--timeout N` (pytest-timeout, in the dev group) does the same per-test.
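The isolation and timeout advice in this list can be combined in a small wrapper for exploratory runs. This is a sketch, not a repository utility: `run_isolated` is a hypothetical name, and it assumes the CLI's config location follows `XDG_CONFIG_HOME`, as the config-isolation item describes.

```python
import os
import subprocess
import tempfile


def run_isolated(argv: list[str], timeout: float = 30.0) -> subprocess.CompletedProcess[str]:
    """Run a command with a throwaway config dir and a hard timeout."""
    with tempfile.TemporaryDirectory() as config_home:
        env = {**os.environ, "XDG_CONFIG_HOME": config_home}
        # capture_output keeps scratch output in memory instead of the repo root;
        # timeout raises subprocess.TimeoutExpired instead of wedging the session.
        return subprocess.run(
            argv, env=env, capture_output=True, text=True, timeout=timeout, check=False
        )
```

For example, `run_isolated(["uv", "run", "assembly", "--help"])` would exercise the CLI without touching the shared config file, and the temporary directory is removed once the call returns.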
- `from __future__ import annotations` at the top of every module; modern typing (`X | None`).
- Ruff lint set: see `[tool.ruff.lint]` in `pyproject.toml`. `S603`/`S607` are ignored project-wide because the CLI intentionally shells out to `claude`/`npx` with controlled args. `B008` is ignored (Typer uses `typer.Option`/`Argument` calls as defaults).
- mypy is strict on `aai_cli` (`disallow_untyped_defs`); tests are type-checked but exempt from return annotations.
- Errors → stderr, data → stdout. Preserve this split; it's what makes the CLI pipeline-safe.
- Help copy is terse and period-less (Codex-CLI style): one-line command summaries (the docstring's first line) and single-sentence option/argument `help=` strings are imperative, sentence-case, and carry no trailing period, e.g. `"Burn always-visible captions into a video"`, not `"…video."`. Only genuinely multi-sentence help (e.g. `"X. Default: Y."`) keeps normal punctuation. The strings render in `assembly --help`, so they're pinned by the syrupy `--help` goldens (`tests/__snapshots__/test_snapshots_help_*.ambr`); regenerate with `--snapshot-update`, never hand-edit. Don't drop the period on internal helper docstrings (they aren't snapshot-covered, so the mutation gate would flag the changed line).
- Deprecate flags with hidden traps, not removal: keep the old flag parsing (`hidden=True`), emit a one-line "use X instead" warning, and drop it a release or two later; never hard-break a script mid-cycle. `login --api-key` (→ `--with-api-key`) is the pattern to copy.
- Secrets never ride argv: a key/token-valued option must read from stdin (`--with-api-key`) or the env, so it can't leak into shell history or `ps`. Run commands deliberately have no `--api-key` at all.
- Every NDJSON stream line carries a `"type"` field (see REFERENCE.md "JSON output"); new event types are additive, existing fields stay stable.