Skip to content

Add visual-regression snapshot testing for the code/live TUIs#253

Merged
alexkroman merged 5 commits into
mainfrom
claude/textual-testing-strategy-9jt37q
Jun 19, 2026
Merged

Add visual-regression snapshot testing for the code/live TUIs#253
alexkroman merged 5 commits into
mainfrom
claude/textual-testing-strategy-9jt37q

Conversation

@alexkroman

Copy link
Copy Markdown
Collaborator

The two Textual apps (assembly code / live) are the most layout-fragile
surface in the repo: a one-line CSS edit silently reflows the whole frame,
and the pilot tests only ever assert one widget/region at a time. Add
pytest-textual-snapshot SVG goldens that pin the painted frame — splash,
prompt bar, status line, voice bar, transcript widgets, and the docked
approval/ask modals.

Building the harness surfaced a real bug: a docked container with
width: 1fr (the Horizontal/Vertical default) ignores horizontal margin
and overflows the screen, clipping its rounded right border off-edge.
This affected #promptbar (code TUI) and the #approvalbox / #askbox modals
— all three rendered as unclosed boxes on the right. Fixed by switching
them to width: 100%, which honors the side margins, and added precise
region assertions (box.region.right <= screen width) so a regression is
killed deterministically, not only by the SVG diff.

tests/_tui_snapshot.py freezes the four sources of non-determinism
(version string, voice-bar animation timer, the cascade worker, and the
cwd/branch status line); the strategy is documented in tests/AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL

claude added 5 commits June 19, 2026 03:18
The two Textual apps (assembly code / live) are the most layout-fragile
surface in the repo: a one-line CSS edit silently reflows the whole frame,
and the pilot tests only ever assert one widget/region at a time. Add
pytest-textual-snapshot SVG goldens that pin the painted frame — splash,
prompt bar, status line, voice bar, transcript widgets, and the docked
approval/ask modals.

Building the harness surfaced a real bug: a docked container with
`width: 1fr` (the Horizontal/Vertical default) ignores horizontal margin
and overflows the screen, clipping its rounded right border off-edge.
This affected #promptbar (code TUI) and the #approvalbox / #askbox modals
— all three rendered as unclosed boxes on the right. Fixed by switching
them to `width: 100%`, which honors the side margins, and added precise
region assertions (box.region.right <= screen width) so a regression is
killed deterministically, not only by the SVG diff.

tests/_tui_snapshot.py freezes the four sources of non-determinism
(version string, voice-bar animation timer, the cascade worker, and the
cwd/branch status line); the strategy is documented in tests/AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
The project-wide 90% coverage gate is an average, so a layout-fragile TUI
module could rot while the rest of the suite carries it. Add a per-surface
floor: after the main pytest run, `coverage report --fail-under=90` over
just the Textual modules, reusing the .coverage data already written (no
re-run, branches included).

The module set is derived — every aai_cli file that imports `textual` — so
a new TUI module is held to the floor automatically with no list to
hand-maintain. Documented in the AGENTS.md gate-stage list and the
tests/AGENTS.md Textual testing section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
Pin four more regression-prone painted states the initial suite missed:

- approval modal with args expanded (`e`) — the multi-line content reveal
- tool output collapsed — the clipped preview + "(Ctrl+O to expand)" hint
- tool output expanded (Ctrl+O) — full content + collapse hint
- live "thinking" phase — the amber voice bar between turns

Deliberately did not add a snapshot for the `assembly code` voice-input
mode: it was the other high-priority candidate, but voice is slated for
removal, so pinning it would be counterproductive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
Pin the remaining medium-value painted states:

assembly code:
- working spinner ("✶ Working… (7s)") docked above the prompt
- reply mid-stream (plain text, literal markdown) before the Markdown swap
- approval prompt for a benign command (no warning label)
- failed turn rendered as a red ✗ error line

assembly live:
- interim (partial) user transcript growing in place while listening
- mid-turn tool-call progress note ("Searching the web…")
- interrupted reply, finalized and tagged "(interrupted)"
- cascade failure surfaced as a red ✗ error line

The spinner frame is rendered through the real _spinner_text formatter with
a fixed elapsed/frame so it doesn't depend on wall-clock timing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
Voice mode is staying, so pin its chrome: in voice mode the prompt bar is
swapped for the animated listening bar (with the "(Ctrl-V to type)" hint)
and the status line gains the green "● voice on" badge.

The mic-capture leg is stubbed (_SnapshotCodeApp._begin_listening no-op) so
the listening state renders synchronously without a background thread racing
the screenshot. FakeVoice satisfies _VoiceIO and is covered by a small unit
test since no render reaches it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BCoAfjg7pCUxD3nxAhGcRL
@alexkroman alexkroman enabled auto-merge June 19, 2026 03:59
Comment thread scripts/check.sh
# imports `textual` — so a new TUI module is picked up automatically with no list to
# hand-maintain. Reuses the .coverage data the pytest step just wrote (no re-run), and
# counts branches because that data was collected with --cov-branch.
tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, -)"

@aikido-pr-checks aikido-pr-checks Bot Jun 19, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if [[ -z "$tui_modules" ]] branch is effectively unreachable: with set -euo pipefail, git grep ... | paste ... exits early on no matches before that check runs.

Suggested change
tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, -)"
tui_modules="$(git grep -lP '^\s*(from|import) textual' -- 'aai_cli/**/*.py' | paste -sd, - || true)"
Details

✨ AI Reasoning
​​1) The new block tries to derive module paths and then handle the "none found" case explicitly.
​2) The pipeline used for derivation returns failure when no files match.
​3) In the current shell mode, that failure aborts execution before the explicit fallback branch is evaluated.
​4) This creates contradictory control flow: a branch intended to handle an empty result is effectively unreachable in the exact scenario it describes.
​5) This is a definite logic issue in the changed control flow, not a style concern.

Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info

@alexkroman alexkroman added this pull request to the merge queue Jun 19, 2026
Merged via the queue into main with commit 7c8af2c Jun 19, 2026
20 checks passed
@alexkroman alexkroman deleted the claude/textual-testing-strategy-9jt37q branch June 19, 2026 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants