Merged
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -25,7 +25,7 @@ uv run assembly --help # run the CLI from the locked environment

Dev tooling is a PEP 735 `[dependency-groups]` group with `default-groups = ["dev"]`, not a `[project]` extra — `uv sync --extra dev` errors.

`scripts/check.sh` is the authoritative gate; keep this list in sync with it. It runs, in order: `uv lock --check` → `ruff check` → `ruff format --check` → `mypy` → `pyright` (src strict) → `pyright` (tests) → `vulture` (dead code) → `deptry` (dependency hygiene) → `lint-imports` (import-linter architecture contracts) → max-file-length (500 lines) → `xenon` (cyclomatic complexity: function max B, module avg A, project avg A) → `swiftlint` + swift compile (macOS only, skipped elsewhere) → `markdownlint` → `codespell` (spell-check code/comments/docs via `uvx`; config in `[tool.codespell]`) → `prettier` (init template JS/CSS) → `shellcheck` → `actionlint` + `zizmor` (workflow lint/audit) → `gitleaks` (secret scan) → generated `--show-code` compile gate → init template contract gate → unused snapshot/fixture gate (`scripts/unused_fixtures_gate.py`: orphaned `.ambr`/API fixtures, since xdist disables syrupy's own unused detection) → docs consistency gate (`scripts/docs_consistency_gate.py`: REFERENCE.md/README.md env vars, exit codes, and `assembly …` command refs stay in sync with the code) → docstring coverage gate (`scripts/docstring_coverage_gate.py`: public-API docstring ratchet, an `interrogate` stand-in that handles PEP 695 generics) → `brew audit --strict` (the shipped `Formula/assembly.rb`; self-skips without Homebrew) → `pytest` (90% branch coverage) → `diff-cover` (100% patch coverage vs `origin/main`) → **mutation gate** (diff-scoped: mutates each changed line and reruns the tests that cover it — a surviving mutant fails the gate, so changed lines need assertions that would *fail* if the line broke, not just coverage; suppress a genuinely unassertable line with `# pragma: no mutate`) → a "no new escape hatches" gate (`# type: ignore` / `# noqa` / `pragma: no cover` / `Any` / `cast(` / test skip/xfail/sleep, all **count-gated against the merge-base** so moving an existing hatch in a refactor doesn't false-positive but a net-new one fails) → `uv build` + 
`twine check --strict`. The `vulture`/`deptry`/`lint-imports`/`xenon`, patch-coverage, and mutation stages catch the failures that `ruff`+`mypy` alone won't — don't claim the gate is green until the script prints `All checks passed.` **CodeQL is intentionally NOT in this gate** — it's the slowest check (~minutes) and is enforced separately by the `codeql.yml` workflow (which also covers CI; `check.sh` self-skipped it on the hosted runner anyway), so dropping it keeps the local gate fast with no loss of CI coverage. `scripts/codeql_gate.py` still exists to reproduce a code-scanning alert locally (`uv run python scripts/codeql_gate.py`).
`scripts/check.sh` is the authoritative gate; keep this list in sync with it. It runs, in order: `uv lock --check` → `ruff check` → `ruff format --check` → `mypy` → `pyright` (src strict) → `pyright` (tests) → `vulture` (dead code) → `deptry` (dependency hygiene) → `lint-imports` (import-linter architecture contracts) → max-file-length (500 lines) → `xenon` (cyclomatic complexity: function max B, module avg A, project avg A) → `swiftlint` + swift compile (macOS only, skipped elsewhere) → `markdownlint` → `codespell` (spell-check code/comments/docs via `uvx`; config in `[tool.codespell]`) → `prettier` (init template JS/CSS) → `shellcheck` → `actionlint` + `zizmor` (workflow lint/audit) → `gitleaks` (secret scan) → generated `--show-code` compile gate → init template contract gate → unused snapshot/fixture gate (`scripts/unused_fixtures_gate.py`: orphaned `.ambr`/API fixtures, since xdist disables syrupy's own unused detection) → docs consistency gate (`scripts/docs_consistency_gate.py`: REFERENCE.md/README.md env vars, exit codes, and `assembly …` command refs stay in sync with the code) → docstring coverage gate (`scripts/docstring_coverage_gate.py`: public-API docstring ratchet, an `interrogate` stand-in that handles PEP 695 generics) → `brew audit --strict` (the shipped `Formula/assembly.rb`; self-skips without Homebrew) → `pytest` (90% branch coverage) → Textual TUI coverage (≥90% on the `textual`-importing modules — a per-surface floor so a fragile TUI module can't rot under the project-wide average; the module set is derived from the `textual` import and reuses the pytest `.coverage`, no re-run) → `diff-cover` (100% patch coverage vs `origin/main`) → **mutation gate** (diff-scoped: mutates each changed line and reruns the tests that cover it — a surviving mutant fails the gate, so changed lines need assertions that would *fail* if the line broke, not just coverage; suppress a genuinely unassertable line with `# pragma: no mutate`) → a "no new escape hatches" 
gate (`# type: ignore` / `# noqa` / `pragma: no cover` / `Any` / `cast(` / test skip/xfail/sleep, all **count-gated against the merge-base** so moving an existing hatch in a refactor doesn't false-positive but a net-new one fails) → `uv build` + `twine check --strict`. The `vulture`/`deptry`/`lint-imports`/`xenon`, patch-coverage, and mutation stages catch the failures that `ruff`+`mypy` alone won't — don't claim the gate is green until the script prints `All checks passed.` **CodeQL is intentionally NOT in this gate** — it's the slowest check (~minutes) and is enforced separately by the `codeql.yml` workflow (which also covers CI; `check.sh` self-skipped it on the hosted runner anyway), so dropping it keeps the local gate fast with no loss of CI coverage. `scripts/codeql_gate.py` still exists to reproduce a code-scanning alert locally (`uv run python scripts/codeql_gate.py`).
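
The count-gating idea behind the "no new escape hatches" stage can be sketched roughly as below. This is not the logic in `check.sh`, which is not shown here; the names `HATCH_PATTERNS`, `count_hatches`, and `new_hatches` are hypothetical, the patterns cover only a subset of the hatches listed above, and the sources are passed in as plain dicts rather than read from the merge-base.

```python
import re

# Hypothetical subset of the escape hatches named above (illustrative patterns only).
HATCH_PATTERNS = {
    "type-ignore": re.compile(r"#\s*type:\s*ignore"),
    "noqa": re.compile(r"#\s*noqa"),
    "no-cover": re.compile(r"pragma:\s*no cover"),
    "cast": re.compile(r"\bcast\("),
}


def count_hatches(sources: dict[str, str]) -> dict[str, int]:
    """Count each hatch across all files, ignoring which file it lives in."""
    return {
        name: sum(len(pattern.findall(text)) for text in sources.values())
        for name, pattern in HATCH_PATTERNS.items()
    }


def new_hatches(base: dict[str, str], head: dict[str, str]) -> dict[str, int]:
    """Return the net-new count per hatch; an empty dict means the gate would pass."""
    base_counts = count_hatches(base)
    head_counts = count_hatches(head)
    return {
        name: head_counts[name] - base_counts[name]
        for name in HATCH_PATTERNS
        if head_counts[name] > base_counts[name]
    }
```

Because only totals are compared, moving a `# noqa` from one file to another leaves the counts equal and reports nothing, while adding one anywhere raises the total and is flagged.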

**Commits are gated.** On success `check.sh` records a working-tree signature (`scripts/gate_marker.py record` → `.git/aai-gate-pass`), and a PreToolUse hook (`.claude/hooks/require-gate-before-commit.sh`) blocks `git commit` unless that signature still matches — so run the full gate to completion *before* committing (a single-file `pytest` does not satisfy it), and re-run it after any further edit. Iterate with the fast targeted commands above, gate once at the end. For a deliberate work-in-progress commit, prefix `AAI_ALLOW_COMMIT=1 git commit …`.

6 changes: 4 additions & 2 deletions aai_cli/code_agent/modals.py
@@ -65,8 +65,10 @@ class ApprovalScreen(ModalScreen[str]):

DEFAULT_CSS = """
ApprovalScreen { align: center bottom; background: transparent; }
/* width: 100% (not 1fr) so the box honors its 1-col side margins — a docked 1fr container
ignores horizontal margin and overflows the screen, clipping the right border off-edge. */
ApprovalScreen #approvalbox {
dock: bottom; width: 1fr; height: auto;
dock: bottom; width: 100%; height: auto;
border: round #f59e0b; background: #000000; padding: 0 1; margin: 0 1 1 1;
}
ApprovalScreen #approvalbox Label { height: auto; }
@@ -163,7 +165,7 @@ class AskScreen(ModalScreen[str]):
DEFAULT_CSS = """
AskScreen { align: center bottom; background: transparent; }
AskScreen #askbox {
dock: bottom; width: 1fr; height: auto;
dock: bottom; width: 100%; height: auto;
border: round #3a3f55; background: #000000; padding: 0 1; margin: 0 1 1 1;
}
"""
4 changes: 3 additions & 1 deletion aai_cli/code_agent/tui.py
@@ -75,7 +75,9 @@ class CodeAgentApp(_VoiceLegs):
/* The transcript is a scroll container of mounted message widgets (not a RichLog), so the
reply streams in place and tool output can expand/collapse. */
#log {{ height: 1fr; border: none; background: #000000; padding: 1 2; }}
#promptbar {{ dock: bottom; height: 3; background: #000000; border: round #3a3f55; margin: 1 1; }}
/* width: 100% (not the 1fr default) so the bordered box fits inside its 1-col side margins;
a docked 1fr container ignores horizontal margin and overflows, clipping the right border. */
#promptbar {{ dock: bottom; height: 3; width: 100%; background: #000000; border: round #3a3f55; margin: 1 1; }}
#promptmark {{ width: 3; color: {banner.BRAND_HEX}; content-align: center middle; }}
#prompt {{ border: none; background: #000000; padding: 0; }}
/* Shown in place of the prompt while voice capture is on (Ctrl-V brings the prompt back). */
6 changes: 6 additions & 0 deletions pyproject.toml
@@ -117,6 +117,12 @@ dev = [
# failure instead of a wedged session (not in addopts — opt-in per run).
"pytest-timeout>=2.3.1",
"time-machine>=3.1.0",
# Visual-regression snapshots for the Textual TUIs (`assembly code` / `live`): the
# `snap_compare` fixture renders an app to SVG and diffs it against a committed golden,
# catching CSS/layout/docking regressions the behavioral pilot tests can't see. Stores
# SVGs under tests/__snapshots__/<module>/ (regenerate with --snapshot-update like the
# .ambr goldens). See tests/AGENTS.md "Textual visual snapshots".
"pytest-textual-snapshot>=1.0.0",
"hypothesis>=6.155.1",
"ruff>=0.15.15",
"mypy>=2.1.0",
15 changes: 15 additions & 0 deletions scripts/check.sh
@@ -238,6 +238,21 @@ echo "==> pytest (with branch-coverage gate)"
# splitting it across workers is safe.
uv run pytest -q --strict-config --strict-markers -n auto -m "not e2e and not install" --cov=aai_cli --cov-branch --cov-context=test --cov-report=term-missing --cov-report=xml --cov-fail-under=90

echo "==> Textual TUI coverage (>=90% on the textual-importing modules)"
# The project-wide 90% gate above is an average, so a TUI module can rot while the rest
# of the suite carries it. The Textual TUIs (`assembly code` / `live`) are the most
# layout-fragile, regression-prone surface in the repo (see tests/AGENTS.md), so hold
# them to their own >=90% floor. The module set is *derived* — every aai_cli file that
# imports `textual` — so a new TUI module is picked up automatically with no list to
# hand-maintain. Reuses the .coverage data the pytest step just wrote (no re-run), and
# counts branches because that data was collected with --cov-branch.
tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, -)"

@aikido-pr-checks (Bot), Jun 19, 2026:

The `if [[ -z "$tui_modules" ]]` branch is effectively unreachable: `git grep` exits non-zero when nothing matches, so with `set -euo pipefail` the `git grep ... | paste ...` pipeline fails and the script aborts before that check runs.

Suggested change
tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, -)"
tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, - || true)"
✨ AI Reasoning
1) The new block tries to derive module paths and then handle the "none found" case explicitly.
2) The pipeline used for derivation returns failure when no files match.
3) In the current shell mode, that failure aborts execution before the explicit fallback branch is evaluated.
4) This creates contradictory control flow: a branch intended to handle an empty result is effectively unreachable in the exact scenario it describes.
5) This is a definite logic issue in the changed control flow, not a style concern.


if [[ -z "$tui_modules" ]]; then
  echo " no textual-importing modules found (the derive pattern is stale?)"
  exit 1
fi
uv run coverage report --include="$tui_modules" --fail-under=90

echo "==> diff-cover (patch coverage: every changed line must be tested)"
# The 90% gate above is project-wide, so new code can ride on the existing suite and
# stay untested. diff-cover requires 100% coverage of the lines changed versus the
19 changes: 19 additions & 0 deletions tests/AGENTS.md
@@ -20,6 +20,25 @@ CLI output is pinned by **syrupy snapshot tests** (`tests/__snapshots__/*.ambr`)

The `--help` goldens are split per command group (`tests/test_snapshots_help_<group>.py`) so concurrent branches touching different commands regenerate *different* `.ambr` files. The partition (`HELP_GROUPS` in `tests/_snapshot_surface.py`) is **derived from each command module's `SPEC.panel`** (see `aai_cli/command_registry.py`), so a new command lands in the right group automatically; `tests/test_snapshots_help_groups.py` guards that the derived partition matches the live Typer tree. The root `assembly --help` screen — which every new command changes — has its own golden (`tests/test_snapshots_help_root.py`), so that churn stays confined to one trivially-regenerable `.ambr` file.

## Textual visual snapshots (the `code` / `live` TUIs)

The two Textual apps — `CodeAgentApp` (`assembly code`) and `LiveAgentApp` (`assembly live`) — are **the most layout-fragile surface in the repo**: a one-line CSS edit (a dock, a width, a margin, a transparent background) silently shifts the whole painted frame, and the pilot tests (`test_code_tui.py` / `test_live_tui.py`) only ever assert one widget, region, or flag at a time — they can't see "the modal's right border is now clipped off-screen". So they're backed by **visual-regression snapshots** (`tests/test_tui_snapshots.py`, on top of the `pytest-textual-snapshot` `snap_compare` fixture): each test renders an app (or a pushed modal) to an SVG and diffs it against a committed golden under `tests/__snapshots__/test_tui_snapshots/*.raw`. (This is how the `width: 1fr` → `width: 100%` overflow bug in `#promptbar`/`#approvalbox`/`#askbox` was found — a docked `1fr` container ignores horizontal margin and overflows, and the pilot region asserts never checked the right edge.)

The two layers are complementary, so add to whichever fits: a **behavioral** assertion (a key press changes state, a modal returns a value, a region stays docked) goes in the pilot tests; a **visual** change (chrome, colors, spacing, a new transcript widget) earns a `snap_compare` golden. When a visual fix lands, pin the precise invariant in a pilot test too (e.g. `box.region.right <= 100`) so a mutant is killed deterministically, not only by the SVG diff.

Regenerate after an intentional UI change with `uv run pytest tests/test_tui_snapshots.py --snapshot-update` and **eyeball every changed SVG before committing** — a blessed-but-wrong baseline is worse than no snapshot. (No SVG viewer in a headless session? Reconstruct the text by grouping each `<text>` element's content by its `y` coordinate; that's enough to read the frame and spot a clipped border.)
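
As a rough illustration of that headless fallback, the sketch below groups `<text>` elements by their `y` attribute using only the standard library. The function name `svg_text_rows` is hypothetical, and it assumes each `<text>` element carries plain numeric `x`/`y` attributes, which may not hold for every SVG that `snap_compare` writes. It recovers left-to-right order within a row but does not reconstruct horizontal gaps.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def svg_text_rows(svg_source: str) -> list[str]:
    """Rebuild readable rows from an SVG by grouping <text> content by y coordinate."""
    root = ET.fromstring(svg_source)
    rows: dict[float, list[tuple[float, str]]] = defaultdict(list)
    for element in root.iter():
        # Tags are namespaced, e.g. "{http://www.w3.org/2000/svg}text".
        if element.tag.rsplit("}", 1)[-1] != "text":
            continue
        try:
            x = float(element.get("x", "0"))
            y = float(element.get("y", "0"))
        except ValueError:
            continue  # Assumption: skip elements with non-numeric coordinates.
        # Non-breaking spaces are common in rendered terminal output; show them as spaces.
        content = "".join(element.itertext()).replace("\xa0", " ")
        rows[y].append((x, content))
    return [
        "".join(text for _, text in sorted(cells, key=lambda cell: cell[0]))
        for _, cells in sorted(rows.items(), key=lambda item: item[0])
    ]
```

Feeding it the text of a changed golden and printing the rows one per line gives an approximate plain-text view of the frame, which is usually enough to notice a missing right border character.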

A Textual app renders non-deterministically unless four things are frozen — all handled by `tests/_tui_snapshot.py` (read its module docstring before adding a test):

- **`banner.version()`** in the splash is the hatch-vcs git-tag string (`v0.1.devN+g<sha>`), different on every commit — `pin_banner_version` freezes it.
- **The voice bar's meter** advances on a 0.3s `set_interval`; the frame at screenshot time depends on wall-clock scheduling — `freeze_animation` pins it to one frame and stops the timer (and the spinner's).
- **`LiveAgentApp` starts the blocking cascade on a worker thread on mount**, which `exit()`s the app before the screenshot — `build_live_app` returns a subclass whose `_start` is a no-op, and the test drives the transcript methods directly.
- **The code status line** renders the cwd, git branch, and `~`-abbreviated home (all machine/platform-specific) — `stable_workdir` pins `Path.home` and builds a fixed `~/demo` cwd with a fake `.git/HEAD`.

The `.raw` SVGs live in a `tests/__snapshots__/test_tui_snapshots/` **subdirectory**, so `scripts/unused_fixtures_gate.py` (which globs only top-level `*.ambr`) doesn't police them — delete a renamed test's stale `.raw` by hand.

On top of the project-wide 90% gate, `check.sh` enforces a **per-surface ≥90% coverage floor on the Textual modules** (every `aai_cli` file that imports `textual` — derived, not hand-listed — reusing the pytest `.coverage`), so a fragile TUI module can't rot while the rest of the suite carries the average. Keep these modules well-covered by the pilot tests; a new TUI module is held to the floor automatically.
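
For readers who want to see which modules the derived set would contain, here is a minimal Python sketch of the same idea. It is an approximation, not the gate itself: `check.sh` uses `git grep`, which only sees tracked files, whereas this hypothetical `textual_modules` helper walks the directory on disk. The regular expression is the one from the script.

```python
import re
from pathlib import Path

# Same pattern check.sh passes to `git grep -P`; MULTILINE makes ^ match at each line start.
TEXTUAL_IMPORT = re.compile(r"^\s*(from|import) textual", re.MULTILINE)


def textual_modules(package_root: Path) -> list[str]:
    """Return sorted POSIX paths of every .py file under package_root that imports textual."""
    return sorted(
        path.as_posix()
        for path in package_root.rglob("*.py")
        if TEXTUAL_IMPORT.search(path.read_text(encoding="utf-8"))
    )
```

Joining the result with commas yields a value of the same shape as the script's `tui_modules` variable. An empty list corresponds to the "no textual-importing modules found" failure case.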

## Hermeticity (enforced three ways)

The suite is hermetic by construction (`tests/conftest.py` + `pyproject.toml` `[tool.pytest.ini_options]`): **pytest-randomly** shuffles order, an autouse `pin_timezone` fixture pins `TZ` to a fixed non-UTC zone (UTC-normalized rendering must be unaffected; use **time-machine** to freeze `now`), and **pytest-socket** (`--disable-socket`) blocks real network so an unmocked SDK/HTTP call fails loudly instead of hitting the API. A test that only binds a loopback server opts back in with the tight `@pytest.mark.allow_hosts(["127.0.0.1"])` (still blocks external hosts). The `e2e`/`install` marker suites legitimately reach the real network in-process (PyPI reachability probes, real-API runs), so a `pytest_collection_modifyitems` hook in `conftest.py` auto-grants them full sockets — adding a network marker is all that's needed, no per-test `enable_socket`.