Add visual-regression snapshot testing for the code/live TUIs#253
Conversation
The two Textual apps (assembly code / live) are the most layout-fragile surface in the repo: a one-line CSS edit silently reflows the whole frame, and the pilot tests only ever assert one widget/region at a time. Add pytest-textual-snapshot SVG goldens that pin the painted frame — splash, prompt bar, status line, voice bar, transcript widgets, and the docked approval/ask modals. Building the harness surfaced a real bug: a docked container with `width: 1fr` (the Horizontal/Vertical default) ignores horizontal margin and overflows the screen, clipping its rounded right border off-edge. This affected #promptbar (code TUI) and the #approvalbox / #askbox modals — all three rendered as unclosed boxes on the right. Fixed by switching them to `width: 100%`, which honors the side margins, and added precise region assertions (box.region.right <= screen width) so a regression is killed deterministically, not only by the SVG diff. tests/_tui_snapshot.py freezes the four sources of non-determinism (version string, voice-bar animation timer, the cascade worker, and the cwd/branch status line); the strategy is documented in tests/AGENTS.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
The project-wide 90% coverage gate is an average, so a layout-fragile TUI module could rot while the rest of the suite carries it. Add a per-surface floor: after the main pytest run, `coverage report --fail-under=90` over just the Textual modules, reusing the .coverage data already written (no re-run, branches included). The module set is derived — every aai_cli file that imports `textual` — so a new TUI module is held to the floor automatically with no list to hand-maintain. Documented in the AGENTS.md gate-stage list and the tests/AGENTS.md Textual testing section. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
Pin four more regression-prone painted states the initial suite missed: - approval modal with args expanded (`e`) — the multi-line content reveal - tool output collapsed — the clipped preview + "(Ctrl+O to expand)" hint - tool output expanded (Ctrl+O) — full content + collapse hint - live "thinking" phase — the amber voice bar between turns Deliberately did not add a snapshot for the `assembly code` voice-input mode: it was the other high-priority candidate, but voice is slated for removal, so pinning it would be counterproductive. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
Pin the remaining medium-value painted states:
assembly code:
- working spinner ("✶ Working… (7s)") docked above the prompt
- reply mid-stream (plain text, literal markdown) before the Markdown swap
- approval prompt for a benign command (no warning label)
- failed turn rendered as a red ✗ error line
assembly live:
- interim (partial) user transcript growing in place while listening
- mid-turn tool-call progress note ("Searching the web…")
- interrupted reply, finalized and tagged "(interrupted)"
- cascade failure surfaced as a red ✗ error line
The spinner frame is rendered through the real _spinner_text formatter with
a fixed elapsed/frame so it doesn't depend on wall-clock timing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
Voice mode is staying, so pin its chrome: in voice mode the prompt bar is swapped for the animated listening bar (with the "(Ctrl-V to type)" hint) and the status line gains the green "● voice on" badge. The mic-capture leg is stubbed (_SnapshotCodeApp._begin_listening no-op) so the listening state renders synchronously without a background thread racing the screenshot. FakeVoice satisfies _VoiceIO and is covered by a small unit test since no render reaches it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
| # imports `textual` — so a new TUI module is picked up automatically with no list to | ||
| # hand-maintain. Reuses the .coverage data the pytest step just wrote (no re-run), and | ||
| # counts branches because that data was collected with --cov-branch. | ||
| tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, -)" |
There was a problem hiding this comment.
The if [[ -z "$tui_modules" ]] branch is effectively unreachable: with set -euo pipefail, git grep ... | paste ... exits early on no matches before that check runs.
| tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, -)" | |
| tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, - || true)" |
Details
✨ AI Reasoning
1) The new block tries to derive module paths and then handle the "none found" case explicitly.
2) The pipeline used for derivation returns failure when no files match.
3) In the current shell mode, that failure aborts execution before the explicit fallback branch is evaluated.
4) This creates contradictory control flow: a branch intended to handle an empty result is effectively unreachable in the exact scenario it describes.
5) This is a definite logic issue in the changed control flow, not a style concern.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
The two Textual apps (assembly code / live) are the most layout-fragile
surface in the repo: a one-line CSS edit silently reflows the whole frame,
and the pilot tests only ever assert one widget/region at a time. Add
pytest-textual-snapshot SVG goldens that pin the painted frame — splash,
prompt bar, status line, voice bar, transcript widgets, and the docked
approval/ask modals.
Building the harness surfaced a real bug: a docked container with
width: 1fr(the Horizontal/Vertical default) ignores horizontal marginand overflows the screen, clipping its rounded right border off-edge.
This affected #promptbar (code TUI) and the #approvalbox / #askbox modals
— all three rendered as unclosed boxes on the right. Fixed by switching
them to
width: 100%, which honors the side margins, and added preciseregion assertions (box.region.right <= screen width) so a regression is
killed deterministically, not only by the SVG diff.
tests/_tui_snapshot.py freezes the four sources of non-determinism
(version string, voice-bar animation timer, the cascade worker, and the
cwd/branch status line); the strategy is documented in tests/AGENTS.md.
Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL