Add record/replay fixture harness for end-to-end CLI tests #60
Capture real AssemblyAI API responses once and replay CLI commands through them offline, so transcribe/transcripts/llm/account paths are exercised end-to-end (command + parsing + rendering) without touching the network.
- scripts/record_fixtures.py: manual recorder (outside the gate) that drives the real client/llm/ams functions, scrubs every credential, and writes tests/fixtures/api/*.json. Refresh with `ASSEMBLYAI_API_KEY=… uv run python scripts/record_fixtures.py`.
- tests/replay_fixtures.py: rebuilds a real aai.Transcript (from_response) and an OpenAI ChatCompletion (model_construct, matching the SDK's lenient wire parsing of the gateway's Anthropic-flavored fields) from recorded JSON.
- tests/test_replay_e2e.py: 7 replay tests, one per command family, fully offline (pytest-socket untouched).
- tests/fixtures/api/: 7 scrubbed snapshots (key/JWT redacted, email and account_id faked, private upload URLs redacted; gitleaks-clean).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both pass on Linux CI but fail the dev gate on macOS:
- share.py: `mypy --warn-unreachable` targets one platform at a time, so the
if/return on `sys.platform == "darwin"` proved the other return dead (line 33
on macOS, the first return on Linux). Rewrite as a ternary expression so
neither branch is a statement mypy can flag. Existing tests cover both
branches.
- test_source_validation: a long tmp path let Rich wrap mid-word ("py test"),
defeating the `" ".join(split())` unwrap. Compare with all whitespace
removed, matching the pattern already used in test_share.py.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
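The two fixes in this commit can be illustrated with a minimal sketch. The function names, the returned commands, and the sample strings below are hypothetical; the actual contents of share.py and test_source_validation.py are not shown in this PR text.

```python
import sys


def opener_command() -> str:
    """Hypothetical stand-in for the platform branch in share.py."""
    # One conditional expression instead of if/return followed by return, so
    # there is no second return statement to be reported as unreachable.
    return "open" if sys.platform == "darwin" else "xdg-open"


def squash(text: str) -> str:
    """Remove all whitespace so line wrapping cannot affect a comparison."""
    return "".join(text.split())


# Simulated output that was wrapped mid-word inside a long temporary path.
wrapped = "not a media file: /tmp/py\ntest-of-user/input.txt"
expected = "not a media file: /tmp/pytest-of-user/input.txt"

# Re-joining on single spaces leaves a stray space where the wrap fell.
assert " ".join(wrapped.split()) != expected
# Comparing with all whitespace removed is insensitive to the wrap point.
assert squash(expected) in squash(wrapped)
```

The whitespace-free comparison trades a little strictness (it would also accept text with missing spaces) for independence from terminal width and path length.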
    """
    secret_set = {s for s in secrets if s}

    def scrub(obj: Any) -> Any:
The nested function scrub performs unbounded recursion over input structures (dict/list) without a depth limit; add an explicit max-depth parameter or convert to an iterative traversal to prevent stack overflows on deeply nested inputs.
✨ AI Reasoning
The PR adds a recursive scrubber function used to traverse arbitrary JSON-like objects. It directly calls itself for dict values and list items without any explicit depth limiting or loop-based alternative. Recursive traversal of untrusted or very deep structures can cause unbounded call depth and stack overflow. The change introduced this recursion rather than modifying pre-existing recursive code.
🔧 How do I fix it?
Add depth limiting via counter parameters that are checked and enforced, or replace with iterative approaches using explicit loops or stack data structures. For graphs, combine depth limiting with visited set tracking.
`gitleaks dir` scans the working tree regardless of .gitignore, so high-entropy values in a developer's gitignored `.claude/settings.local.json` (a personal Claude Code file that never enters the repo) fail the local gate while CI, which lacks the file, passes. This cost a real diagnose/move-aside/restore detour this session. Allowlist that one path; the regex is anchored to it, so tracked `.claude/` files (settings.json, agents/, skills/) and everything else stay scanned. Verified: full-repo scan goes 12 findings -> 0 with the file present, and a secret at any other path is still caught.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
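To show why an anchored pattern keeps the rest of `.claude/` scanned, here is a small Python sketch. The pattern is hypothetical: the PR text does not show the actual allowlist regex, and gitleaks evaluates its own configuration with Go regular expressions rather than Python's `re`.

```python
import re

# Hypothetical allowlist pattern: matches exactly one repo-relative path.
ALLOWED_PATH = re.compile(r"\.claude/settings\.local\.json")


def is_allowlisted(path: str) -> bool:
    """Return True only when the whole path equals the allowlisted file."""
    # fullmatch anchors the pattern at both ends of the string.
    return ALLOWED_PATH.fullmatch(path) is not None


for candidate in (
    ".claude/settings.local.json",  # the personal, gitignored file
    ".claude/settings.json",        # tracked, must stay scanned
    ".claude/agents/reviewer.md",   # tracked, must stay scanned
    "src/config.local.json",        # unrelated path, must stay scanned
):
    print(f"{candidate}: allowlisted={is_allowlisted(candidate)}")
```

An unanchored or directory-wide pattern would be shorter, but it would also silence findings in tracked files under the same directory.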
Future agent sessions shouldn't have to rediscover where fixtures live, how to refresh them, or why the LLM response is rebuilt with model_construct rather than model_validate. Add a "Replay fixtures" subsection covering the three moving parts (recorder / fixtures / replay helper), the refresh command, that the recorder is outside the gate, and the Transcript/ChatCompletion reconstruction gotchas.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Replace the `assert sample_id is not None` type-narrowing guard with an explicit `if ... raise CLIError`: asserts are stripped under PYTHONOPTIMIZE, so the check must be a real statement.
- Bound the scrubber's recursion at a max depth (API JSON is shallow; a deeper structure is malformed/hostile input) so it can't stack-overflow. The string redaction is extracted to a module-level `_scrub_str` helper to keep the function under the project's cyclomatic-complexity cap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
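A depth-bounded scrubber along the lines described above might look like the following sketch. Only the names `scrub`, `secret_set`, and `_scrub_str` come from the PR; the factory function, the placeholder text, the depth limit of 32, and the choice to raise `ValueError` are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable

REDACTED = "[REDACTED]"  # hypothetical placeholder text
MAX_DEPTH = 32           # hypothetical bound; real API JSON is far shallower


def _scrub_str(value: str, secrets: Iterable[str]) -> str:
    """Replace every occurrence of each secret inside one string."""
    for secret in secrets:
        value = value.replace(secret, REDACTED)
    return value


def make_scrubber(
    secrets: Iterable[str | None], max_depth: int = MAX_DEPTH
) -> Callable[[Any], Any]:
    """Build a scrub function bound to the given secrets (hypothetical factory)."""
    # Drop empty/None entries; longest first so a secret that contains
    # another secret is redacted as a whole.
    secret_set = {s for s in secrets if s}
    ordered = sorted(secret_set, key=len, reverse=True)

    def scrub(obj: Any, depth: int = 0) -> Any:
        # Refuse structures nested deeper than the bound instead of recursing.
        if depth > max_depth:
            raise ValueError(f"structure nested deeper than {max_depth} levels")
        if isinstance(obj, str):
            return _scrub_str(obj, ordered)
        if isinstance(obj, dict):
            return {key: scrub(value, depth + 1) for key, value in obj.items()}
        if isinstance(obj, list):
            return [scrub(item, depth + 1) for item in obj]
        return obj  # numbers, booleans, None pass through unchanged

    return scrub
```

Raising on excessive depth treats such input as malformed, which matches the commit's reasoning; silently truncating would be an alternative, at the risk of writing an incomplete fixture.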
What
Captures real AssemblyAI API responses once and replays CLI commands through them offline, so the transcribe / transcripts / llm / account command families are exercised end-to-end (command + parsing + rendering) without touching the network.
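The record-once, replay-offline idea can be sketched in a few lines. This is a generic illustration, not the harness itself: `load_fixture`, `render_account`, `replay_account`, and the fixture name `account` are hypothetical, and the real tests rebuild SDK objects rather than passing plain dictionaries.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable


def load_fixture(fixtures_dir: Path, name: str) -> Any:
    """Read one recorded (already scrubbed) JSON snapshot from disk."""
    return json.loads((fixtures_dir / f"{name}.json").read_text(encoding="utf-8"))


def render_account(fetch: Callable[[], dict[str, Any]]) -> str:
    """Hypothetical stand-in for a CLI command: fetch, parse, render."""
    account = fetch()
    return f"{account['email']} (account {account['account_id']})"


def replay_account(fixtures_dir: Path) -> str:
    """Run the command against the recorded response instead of the network."""
    recorded = load_fixture(fixtures_dir, "account")
    return render_account(lambda: recorded)
```

Because the network call is replaced at the boundary, everything downstream of it (parsing and rendering) still runs for real, which is what makes the test end-to-end.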
Harness
- `scripts/record_fixtures.py`: a manual recorder (deliberately outside the gate; it hits the network). Drives the same `client.*` / `llm.*` / `ams.*` functions the CLI uses, scrubs every credential on the way out, and writes `tests/fixtures/api/*.json`. Refresh with: `ASSEMBLYAI_API_KEY=… uv run python scripts/record_fixtures.py` (… `aai login`. Neither is ever written to a fixture.)
- `tests/replay_fixtures.py`: rebuilds the boundary objects from recorded JSON: a real `aai.Transcript` via `from_response`, and an OpenAI `ChatCompletion` via `model_construct` (which mirrors the SDK's own lenient wire parsing: the gateway returns Anthropic-flavored fields like `finish_reason: "end_turn"` that strict validation rejects).
- `tests/test_replay_e2e.py`: 7 replay tests, one per command family, fully offline (pytest-socket untouched).
- `tests/fixtures/api/`: 7 scrubbed snapshots: API key / JWT redacted, `email` → `user@example.com`, `account_id` → `12345`, private `cdn.assemblyai.com/upload/<hash>` URLs redacted. gitleaks-clean.
Drive-by fixes (two pre-existing cross-platform gate failures)
Both pass on Linux CI but fail the dev gate on macOS:
- `share.py`: `mypy --warn-unreachable` targets one platform at a time, so the `if sys.platform == "darwin": return …` made the other `return` provably dead (line 33 on macOS, the first return on Linux). Rewritten as a ternary expression so neither branch is a statement mypy can flag.
- `test_source_validation.py`: a long tmp path let Rich wrap mid-word (`py test`), defeating the test's `" ".join(split())` unwrap. Now compares with all whitespace removed, matching the existing pattern in `test_share.py`.
Testing
`./scripts/check.sh` → All checks passed (1391 tests, 100% patch coverage, mutation gate clean, build + twine OK).
🤖 Generated with Claude Code