diff --git a/AGENTS.md b/AGENTS.md index 2c651933..4bc00dc5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -58,6 +58,26 @@ The post-edit hook (`.claude/settings.json`) runs `ruff check --fix --unfixable The suite is hermetic by construction, enforced three ways (`tests/conftest.py` + `pyproject.toml` `[tool.pytest.ini_options]`): **pytest-randomly** shuffles order, an autouse `pin_timezone` fixture pins `TZ` to a fixed non-UTC zone (UTC-normalized rendering must be unaffected; use **time-machine** to freeze `now`), and **pytest-socket** (`--disable-socket`) blocks real network so an unmocked SDK/HTTP call fails loudly instead of hitting the API. A test that only binds a loopback server opts back in with the tight `@pytest.mark.allow_hosts(["127.0.0.1"])` (still blocks external hosts). The `e2e`/`install`/`install_script` marker suites legitimately reach the real network in-process (PyPI reachability probes, real-API runs), so a `pytest_collection_modifyitems` hook in `conftest.py` auto-grants them full sockets — adding a network marker is all that's needed, no per-test `enable_socket`. +### Manual QA / running the CLI in sandboxed sessions + +Lessons that cost time in agent sessions — read before exercising `uv run aai` by hand: + +- **Probe network reachability first.** Remote/sandboxed environments often allowlist + PyPI but block `api.assemblyai.com` / `streaming.assemblyai.com` / `llm-gateway.assemblyai.com` + (`curl -s https://api.assemblyai.com/v2/transcript -H "authorization: $ASSEMBLYAI_API_KEY"` + returning a proxy 403 like "Host not in allowlist" means **no** real-API path can work — + test error handling and `--show-code` instead of burning time on happy paths). +- **Isolate the config dir per test run.** The CLI persists profiles in + `platformdirs`-resolved `config.toml` (e.g. `~/.config/assemblyai/`). 
Concurrent or + destructive manual tests (corrupt-config probes, profile/env switches) stomp each other + through that shared file — set `XDG_CONFIG_HOME=$(mktemp -d)` per run instead. +- **Write scratch output to `/tmp`, never the repo root.** Redirects like `cmd > out.txt` + in the repo show up as untracked files and trip commit hooks/gates. +- **Headless boxes have no mic/speakers/browser.** `aai stream`/`aai agent` mic paths and + `aai login`'s browser flow can't complete; wrap exploratory runs in `timeout 30 …` so a + blocking path can't wedge the session. For pytest, `--timeout N` (pytest-timeout, in the + dev group) does the same per-test. + ## Naming & packaging gotchas - The **package/module** is `aai_cli`; the **distribution** name is `aai-cli`; the **console command** is `aai` (`[project.scripts] aai = "aai_cli.main:run"`). @@ -70,7 +90,7 @@ A Typer CLI. `aai_cli/main.py` builds the `app`, registers each command sub-app, ### Command layer -Each file in `aai_cli/commands/` is a Typer sub-app (`transcribe`, `stream`, `transcripts`, `agent`, `llm`, `login`, `doctor`, `init`, `claude`). Command bodies run through `context.run_command(ctx, fn, json=...)`, which maps any `CLIError` to clean stderr output + the error's exit code. Commands never print tracebacks for expected failures. +Each file in `aai_cli/commands/` is a Typer sub-app (`transcribe`, `stream`, `transcripts`, `agent`, `llm`, `login` (login/logout/whoami), `doctor`, `init`, `dev`, `share`, `deploy`, `setup`, `onboard`, `account` (balance/usage/limits), `keys`, `sessions`, `audit`). Command bodies run through `context.run_command(ctx, fn, json=...)`, which maps any `CLIError` to clean stderr output + the error's exit code. Commands never print tracebacks for expected failures. 
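The command-layer contract in the paragraph above (an expected failure becomes one clean stderr message plus the error's exit code, with no traceback) can be pictured with a minimal sketch. This is not the repository's `context.run_command`, whose real signature takes `ctx`, `fn`, and `json=`. The names `run_command_sketch` and `failing_body`, and the simplified `CLIError` class, are hypothetical stand-ins used only for illustration.

```python
import sys
from collections.abc import Callable
from typing import Optional, TextIO


class CLIError(Exception):
    """Hypothetical stand-in for the project's CLIError: a message, a type, an exit code."""

    def __init__(self, message: str, *, error_type: str = "cli_error", exit_code: int = 1) -> None:
        super().__init__(message)
        self.error_type = error_type
        self.exit_code = exit_code


def run_command_sketch(fn: Callable[[], None], *, stderr: Optional[TextIO] = None) -> int:
    """Run a command body and return the process exit code it should produce."""
    err = stderr if stderr is not None else sys.stderr
    try:
        fn()
    except CLIError as exc:
        # Expected failure: one clean line on stderr, no traceback, stdout untouched.
        print(f"Error ({exc.error_type}): {exc}", file=err)
        return exc.exit_code
    return 0


def failing_body() -> None:
    """Hypothetical command body that hits an expected failure."""
    raise CLIError("File not found: missing.wav", error_type="file_not_found", exit_code=2)
```

Only `CLIError` is caught here, so an unexpected exception would still surface with a traceback. That separation between expected and unexpected failures is an assumption of this sketch rather than something the diff states.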
### Cross-cutting state (resolution order matters) @@ -78,7 +98,7 @@ Each file in `aai_cli/commands/` is a Typer sub-app (`transcribe`, `stream`, `tr - **`config.py`** — profiles persisted in `config.toml` (via `platformdirs`); the **API key lives only in the OS keyring** (`KEYRING_SERVICE = "assemblyai-cli"`), never in a dotfile. Key resolution order: `--api-key` flag (validation paths only) → `ASSEMBLYAI_API_KEY` env → keyring. **Run commands deliberately expose no `--api-key` flag** so keys can't leak into `ps`/shell history. - **`environments.py`** — a frozen `Environment` (api_base, streaming_host, llm_gateway_base, ams_base, stytch_*). `DEFAULT_ENV` is **`production`**; use `--sandbox` (or `--env sandbox000` / `AAI_ENV`) to target the sandbox. The active environment is a process-global set once at startup; precedence: `--env` → `AAI_ENV` → profile's stored env → default. A credential is only valid against the environment that minted it. - **`client.py`** — thin wrappers over the `assemblyai` SDK (`transcribe`, `list_transcripts`, `stream_audio`, etc.). It normalizes SDK exceptions: auth failures become a single clean `auth_failure()` `CLIError`; everything else becomes `APIError`. New SDK calls should follow this try/except shape. -- **`errors.py`** — the `CLIError` hierarchy (each with `error_type` + `exit_code`). `output.py` emits errors to **stderr**; stdout stays clean for pipelines. `--json` (auto-enabled when piped/agent-run) switches to machine-readable output. +- **`errors.py`** — the `CLIError` hierarchy (each with `error_type` + `exit_code`). `output.py` emits errors to **stderr**; stdout stays clean for pipelines. `--json` switches to machine-readable output; it is never auto-enabled — `output.resolve_json()` deliberately keeps human text the default even when piped or agent-run. 
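The two resolution orders listed in the bullets above can be sketched as plain functions. This is a hedged illustration, not the code in `config.py` or `environments.py`: `resolve_env_name`, `resolve_api_key`, and the `keyring_lookup` callable are hypothetical names, and treating an empty string as "unset" is an assumption of the sketch.

```python
from collections.abc import Callable, Mapping
from typing import Optional

DEFAULT_ENV = "production"  # stated in the notes above; everything else here is hypothetical


def resolve_env_name(
    flag: Optional[str], environ: Mapping[str, str], profile_env: Optional[str]
) -> str:
    """First non-empty value wins: --env flag, then AAI_ENV, then the profile's stored env."""
    for candidate in (flag, environ.get("AAI_ENV"), profile_env):
        if candidate:  # assumption: an empty string counts as unset
            return candidate
    return DEFAULT_ENV


def resolve_api_key(
    flag: Optional[str],
    environ: Mapping[str, str],
    keyring_lookup: Callable[[], Optional[str]],
) -> Optional[str]:
    """First non-empty value wins: --api-key flag, then ASSEMBLYAI_API_KEY, then the keyring."""
    if flag:
        return flag
    env_key = environ.get("ASSEMBLYAI_API_KEY")
    if env_key:
        return env_key
    # The keyring is consulted last, and only if nothing earlier supplied a key.
    return keyring_lookup()
```

For example, `resolve_env_name(None, {"AAI_ENV": "sandbox000"}, "production")` picks `sandbox000`, because the environment variable outranks the profile's stored value. Passing the keyring as a callable keeps the lookup lazy, so the keyring is never touched when the flag or environment variable already supplies a key.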
### Feature subsystems diff --git a/aai_cli/agent/render.py b/aai_cli/agent/render.py index e10c4351..5c62cf7d 100644 --- a/aai_cli/agent/render.py +++ b/aai_cli/agent/render.py @@ -40,13 +40,15 @@ def connected(self) -> None: self._line(Text("Connected — start talking. (Ctrl-C to stop)", style="aai.muted")) def notice(self, text: str) -> None: - """Print a human-facing notice (suppressed in JSON; to stderr in text mode).""" + """Print a human-facing notice: suppressed in JSON, to stderr otherwise. + + Stderr in *every* non-JSON mode (not just ``-o text``): the default human + mode is also piped sometimes (``aai agent | head``), and a notice on stdout + would be consumed as transcript data there. + """ if self.json_mode: return - if self.text_mode: - self._status(text.rstrip("\n")) - else: - self._line(text.rstrip("\n")) + self._status(text.rstrip("\n")) # --- user -------------------------------------------------------------- def user_partial(self, text: str) -> None: diff --git a/aai_cli/agent/session.py b/aai_cli/agent/session.py index 77e52674..f5375430 100644 --- a/aai_cli/agent/session.py +++ b/aai_cli/agent/session.py @@ -3,6 +3,7 @@ import base64 import contextlib import json +import logging import threading from collections.abc import Callable from dataclasses import dataclass @@ -31,6 +32,9 @@ def ws_url() -> str: # session.error codes that mean the connection is unauthorized -> exit 2. _AUTH_ERROR_CODES = {"UNAUTHORIZED", "FORBIDDEN"} +# A pre-upgrade HTTP 403 on the WebSocket handshake (see _is_rejected_key). +_HTTP_FORBIDDEN = 403 + # The websocket connection, the `connect` factory, and the renderer/player/mic I/O # objects come from libraries/modules with no usable type stubs. Alias that untyped # boundary here so each role is named in signatures and `Any` stays in one place. 
@@ -189,10 +193,44 @@ def _send_audio_loop(ws: _WebSocket, session: VoiceAgentSession, mic: _IO) -> No return +# The sync websockets client logs through these; both are silenced for the session +# (the parent covers any future child logger, the client logger is the one that fires). +_WEBSOCKETS_LOGGERS = ("websockets", "websockets.client") + + +def _silence_websockets_logging() -> None: + """Keep websockets' internal logging off the user's stderr for the session. + + The sync client's background reader thread logs unhandled teardown errors (e.g. + ``EOFError: stream ended``) as "unexpected internal error" + traceback through the + ``websockets.client`` logger, which would land on stderr right next to our clean + CLIError. Those internals are never user-actionable from the CLI, so raise the + loggers above every level they emit at. Idempotent: re-setting the level is a no-op. + """ + for name in _WEBSOCKETS_LOGGERS: + logging.getLogger(name).setLevel(logging.CRITICAL) + + +def _is_rejected_key(exc: Exception) -> bool: + """Is this connect/session failure auth-shaped (the key itself was rejected)? + + Mirrors how `stream` classifies handshake failures: a plain HTTP 403 on the + WebSocket upgrade stays an API error there ("Streaming error: WebSocket handshake + rejected (HTTP 403)"), so it must not become "Your API key was rejected" here — + 403 also covers non-credential blocks (WAF, region, plan). Only 401, the Voice + Agent's 1008 policy-violation close, or an explicitly auth-worded message + (`is_auth_failure`'s text hints) count as a rejected key. 
+ """ + status = getattr(getattr(exc, "response", None), "status_code", None) + if status == _HTTP_FORBIDDEN: + return False + return is_auth_failure(exc) + + def _auth_or_api_error(exc: Exception, message: str) -> CLIError: """Map a connect/session exception to the right CLIError: a rejected key becomes auth_failure(), anything else becomes APIError(f"{message}: {exc}").""" - if is_auth_failure(exc): + if _is_rejected_key(exc): return auth_failure() return APIError(f"{message}: {exc}") @@ -243,6 +281,7 @@ def run_session( the agent's first reply to the spoken input and the capture thread waits for session.ready before streaming the source. """ + _silence_websockets_logging() if connect is None: from websockets.sync.client import connect diff --git a/aai_cli/auth/flow.py b/aai_cli/auth/flow.py index c8a086fe..62243eda 100644 --- a/aai_cli/auth/flow.py +++ b/aai_cli/auth/flow.py @@ -8,7 +8,7 @@ from aai_cli import output from aai_cli.auth import ams, discovery, endpoints, loopback -from aai_cli.errors import APIError +from aai_cli.errors import APIError, NotAuthenticated @dataclass @@ -97,8 +97,8 @@ def _open_browser(url: str) -> None: ) -def _capture() -> loopback.CallbackResult: - return loopback.capture_callback() +def _start_capture() -> loopback.CallbackCapture: + return loopback.start_capture() def _reusable_cli_key(token: _Token) -> str | None: @@ -137,13 +137,21 @@ def find_or_create_cli_key(account_id: int, session_jwt: str) -> str: def run_login_flow() -> LoginResult: """Drive the full browser + AMS login and return a LoginResult.""" + # Bind the loopback callback server *before* opening the browser: if the port is + # taken, fail cleanly now instead of stranding the user mid-OAuth in a flow that + # can never call back. + capture = _start_capture() _open_browser(discovery.build_start_url()) - result = _capture() + output.error_console.print( + "[aai.muted]Waiting up to 2 minutes for you to finish signing in…[/aai.muted]\n" + "[aai.muted]No browser here? 
Run 'aai login --api-key ' instead.[/aai.muted]" + ) + result = capture.wait() if result.error == "timeout": - raise APIError( + raise NotAuthenticated( "Login timed out waiting for the browser.", - suggestion="Run 'aai login' again.", + suggestion="Run 'aai login' again, or use 'aai login --api-key '.", ) if result.token_type != "discovery_oauth" or not result.token: # noqa: S105 raise APIError( diff --git a/aai_cli/auth/loopback.py b/aai_cli/auth/loopback.py index b27b77e7..b00b8cac 100644 --- a/aai_cli/auth/loopback.py +++ b/aai_cli/auth/loopback.py @@ -30,15 +30,47 @@ class CallbackResult: error: str | None = None -def capture_callback( - timeout: float = 120.0, # pragma: no mutate (default window; tests pass explicit timeouts) -) -> CallbackResult: - """Bind the fixed loopback port, capture one OAuth callback, return its token. +@dataclass +class CallbackCapture: + """A loopback callback server that is already bound and serving. - Only a callback to the registered path that carries a `token` is accepted; any - other request (a different path, or no token) gets a 4xx and the server keeps - waiting, so a stray request can't end the capture early. Returns a - CallbackResult; `error="timeout"` if no matching callback arrives in time. + Splitting the bind (`start_capture`) from the blocking wait lets the login flow + fail on a taken port *before* it sends the user's browser into the OAuth flow. + `wait()` blocks for one matching callback and always shuts the server down. + """ + + result: CallbackResult + done: threading.Event + server: HTTPServer + thread: threading.Thread + + def wait( + self, + timeout: float = 120.0, # pragma: no mutate (default window; tests pass explicit timeouts) + ) -> CallbackResult: + """Block for one OAuth callback (or the timeout), then shut the server down. + + Returns the CallbackResult; `error="timeout"` if no matching callback + arrived in time. 
+ """ + try: + if not self.done.wait(timeout): + self.result.error = "timeout" + finally: + self.server.shutdown() # stop serve_forever() + self.thread.join(timeout=5) # pragma: no mutate (cleanup grace period only) + self.server.server_close() # close the listening socket (shutdown() leaves it open) + return self.result + + +def start_capture() -> CallbackCapture: + """Bind the fixed loopback port and start serving; the returned capture's + ``wait()`` collects one OAuth callback. + + Raises a clean APIError when the bind fails (port taken) so callers can abort + before opening the browser. Only a callback to the registered path that carries + a `token` is accepted; any other request (a different path, or no token) gets a + 4xx and the server keeps waiting, so a stray request can't end the capture early. """ result = CallbackResult() done = threading.Event() @@ -81,11 +113,11 @@ def log_message(self, format: str, *args: object) -> None: # silence stderr log ) from exc thread = threading.Thread(target=server.serve_forever, daemon=True) thread.start() - try: - if not done.wait(timeout): - result.error = "timeout" - finally: - server.shutdown() # stop serve_forever() - thread.join(timeout=5) - server.server_close() # close the listening socket (shutdown() leaves it open) - return result + return CallbackCapture(result=result, done=done, server=server, thread=thread) + + +def capture_callback( + timeout: float = 120.0, # pragma: no mutate (default window; tests pass explicit timeouts) +) -> CallbackResult: + """Bind the port, capture one OAuth callback, and shut down (one-shot helper).""" + return start_capture().wait(timeout) diff --git a/aai_cli/client.py b/aai_cli/client.py index 7e59d4ea..fe327147 100644 --- a/aai_cli/client.py +++ b/aai_cli/client.py @@ -49,19 +49,36 @@ def resolve_audio_source(source: str | None, *, sample: bool, check_local: bool don't have yet is legitimate. 
""" if sample: + if source: + # Never silently prefer one over the other: the user asked for both. + raise UsageError( + "An audio source and --sample cannot be combined.", + suggestion="Pass the file/URL or --sample, not both.", + ) return SAMPLE_AUDIO_URL if not source: raise UsageError( "Provide an audio path or URL.", suggestion="Or pass --sample to use the hosted demo file.", ) - if check_local and not source.startswith(("http://", "https://")) and not Path(source).exists(): - raise CLIError( - f"File not found: {source}", - error_type="file_not_found", - exit_code=2, - suggestion="Check the path. For remote audio, pass an http(s):// URL.", - ) + if check_local and not source.startswith(("http://", "https://")): + path = Path(source) + if not path.exists(): + raise CLIError( + f"File not found: {source}", + error_type="file_not_found", + exit_code=2, + suggestion="Check the path. For remote audio, pass an http(s):// URL.", + ) + if not path.is_file(): + # A directory (or socket/FIFO) would otherwise fall through to credential + # resolution and fail much later as an opaque upload error. + raise CLIError( + f"Not a file: {source}", + error_type="not_a_file", + exit_code=2, + suggestion="Pass an audio file, not a directory.", + ) return source @@ -90,17 +107,42 @@ def _sdk_errors(message: str) -> Generator[None]: raise APIError(f"{message}: {exc}") from exc +def _list_transcript_params(limit: int) -> aai.ListTranscriptParameters: + """List-transcripts params that serialize without the spurious ``model_config`` key. + + assemblyai==0.64.4 under pydantic==2.13.4: the SDK's pydantic-v1-shim request model + picks up the v2-style ``model_config`` class attribute as a regular field, so the + ``.dict(exclude_none=True)`` the SDK puts on the query string ships a junk + ``?model_config=...`` param on every request. Null the bogus field out so + ``exclude_none`` drops it from the wire. 
+ """ + params = aai.ListTranscriptParameters(limit=limit) + object.__setattr__(params, "model_config", None) + return params + + +# httpx-backed SDK errors embed a multi-line repr ("…\nReason: …\nRequest: "). +_REQUEST_REPR_RE = re.compile(r"Request: <[^>]*>") + + +def _compact_reason(exc: object) -> str: + """``str(exc)`` as a single clean line: drop the trailing ``Request: <…>`` repr and + collapse all whitespace/newlines, keeping the informative reason text.""" + text = _REQUEST_REPR_RE.sub("", str(exc)) + return re.sub(r"\s+", " ", text).strip() + + def validate_key(api_key: str) -> bool: """True if the key authenticates, False on an auth failure. Raises APIError otherwise.""" _configure(api_key) try: - aai.Transcriber().list_transcripts(aai.ListTranscriptParameters(limit=1)) + aai.Transcriber().list_transcripts(_list_transcript_params(1)) except aai.types.AssemblyAIError as exc: if is_auth_failure(exc): return False - raise APIError(f"Could not validate key: {exc}") from exc + raise APIError(f"Could not validate key: {_compact_reason(exc)}") from exc except Exception as exc: - raise APIError(f"Network error contacting AssemblyAI: {exc}") from exc + raise APIError(f"Network error contacting AssemblyAI: {_compact_reason(exc)}") from exc return True @@ -114,7 +156,7 @@ def _item_to_dict(item: Any) -> dict[str, Any]: def list_transcripts(api_key: str, *, limit: int = 10) -> list[dict[str, object]]: _configure(api_key) with _sdk_errors("Could not list transcripts"): - resp = aai.Transcriber().list_transcripts(aai.ListTranscriptParameters(limit=limit)) + resp = aai.Transcriber().list_transcripts(_list_transcript_params(limit)) return [_item_to_dict(item) for item in resp.transcripts] diff --git a/aai_cli/code_gen/__init__.py b/aai_cli/code_gen/__init__.py index 5ed4cbaa..7758414d 100644 --- a/aai_cli/code_gen/__init__.py +++ b/aai_cli/code_gen/__init__.py @@ -42,11 +42,15 @@ def stream( merged: dict[str, object], *, llm: dict[str, object] | None = None, + source: 
str | None = None, ) -> str: """Generate runnable Python that reproduces this streaming invocation. - With `llm` (a dict of ``prompts``/``model``/``max_tokens``/``interval``), the script + ``source`` mirrors the CLI argument: ``None`` streams the microphone, ``"-"`` + reads raw PCM16 from stdin, and anything else is a file path/URL decoded through + ffmpeg — so the generated script reads the same input the real run would. With + `llm` (a dict of ``prompts``/``model``/``max_tokens``/``interval``), the script refreshes a prompt-chain over the growing transcript every ``interval`` seconds (0 = every turn) — the live sibling of `transcribe --llm` — mirroring how `stream --llm` runs. """ - return _stream.render(merged, llm=llm) + return _stream.render(merged, llm=llm, source=source) diff --git a/aai_cli/code_gen/stream.py b/aai_cli/code_gen/stream.py index e02bd0fd..fed3a910 100644 --- a/aai_cli/code_gen/stream.py +++ b/aai_cli/code_gen/stream.py @@ -15,7 +15,7 @@ "TurnEvent", ] -_PREAMBLE = """import os +_PREAMBLE = """{stdlib_imports} import assemblyai as aai from assemblyai.streaming.v3 import ( @@ -39,8 +39,7 @@ def on_turn(client: StreamingClient, event: TurnEvent) -> None: client.on(StreamingEvents.Turn, on_turn) """ -_LLM_PREAMBLE = """import os -import time +_LLM_PREAMBLE = """{stdlib_imports} import assemblyai as aai from assemblyai.streaming.v3 import ( @@ -108,9 +107,9 @@ def on_turn(client: StreamingClient, event: TurnEvent) -> None: """ _FOOTER = """ -print("Listening… press Ctrl-C to stop.") +{setup}print({banner}) try: - client.stream(aai.extras.MicrophoneStream(sample_rate={rate})) + client.stream({stream_expr}) finally: client.disconnect(terminate=True) """ @@ -118,14 +117,56 @@ def on_turn(client: StreamingClient, event: TurnEvent) -> None: # Same as _FOOTER, but flushes a closing summary (incl. on Ctrl-C) so the turns since the # last interval tick are reflected before disconnecting. 
_LLM_FOOTER = """ -print("Listening… press Ctrl-C to stop.") +{setup}print({banner}) try: - client.stream(aai.extras.MicrophoneStream(sample_rate={rate})) + client.stream({stream_expr}) finally: summarize(final=True) client.disconnect(terminate=True) """ +# Source-specific audio plumbing. The v3 client accepts any iterable of PCM16 byte +# chunks, so the non-mic variants define a small generator and stream that instead of +# aai.extras.MicrophoneStream. Both mirror what the CLI itself runs: StdinSource reads +# raw PCM16 off stdin, and FileSource decodes any file/URL through ffmpeg. +_STDIN_SETUP = """ +# Raw PCM16 mono at {rate} Hz piped on stdin, e.g.: +# ffmpeg -i input.mp4 -f s16le -acodec pcm_s16le -ac 1 -ar {rate} - | python script.py +def stdin_chunks(): + chunk_bytes = {rate} * 2 // 10 # ~100 ms of 16-bit mono PCM + while True: + data = sys.stdin.buffer.read(chunk_bytes) + if not data: + return + yield data + + +""" + +_FILE_SETUP = """ +# Decode the source (any local file or http(s) URL ffmpeg can read) to PCM16 mono at +# {rate} Hz and pace it at ~real time — the same pipeline `aai stream ` runs. 
+def file_chunks(): + chunk_bytes = {rate} * 2 // 10 # ~100 ms of 16-bit mono PCM + ffmpeg = subprocess.Popen( + ["ffmpeg", "-nostdin", "-loglevel", "error", "-i", {source}, + "-f", "s16le", "-acodec", "pcm_s16le", "-ac", "1", "-ar", "{rate}", "-"], + stdout=subprocess.PIPE, + ) + try: + while True: + data = ffmpeg.stdout.read(chunk_bytes) + if not data: + return + yield data + time.sleep(len(data) / ({rate} * 2)) # ~real-time pacing + finally: + ffmpeg.terminate() + ffmpeg.wait() + + +""" + def _imports_block(merged: dict[str, object]) -> str: """Sorted streaming-class import lines; SpeechModel only when a model kwarg is emitted.""" @@ -135,7 +176,7 @@ def _imports_block(merged: dict[str, object]) -> str: return "\n".join(f" {name}," for name in sorted(names)) -def _build_preamble(imports: str, llm: dict[str, object] | None) -> str: +def _build_preamble(imports: str, llm: dict[str, object] | None, stdlib_imports: str) -> str: """Pick and fill the plain vs. LLM-Gateway preamble for the given imports. 
Hosts come from the active environment, so a sandbox run generates a script @@ -145,6 +186,7 @@ def _build_preamble(imports: str, llm: dict[str, object] | None) -> str: if llm: prompts = "\n".join(f" {p!r}," for p in cast("list[str]", llm["prompts"])) return _LLM_PREAMBLE.format( + stdlib_imports=stdlib_imports, imports=imports, api_host=env.streaming_host, base_url=env.llm_gateway_base, @@ -153,7 +195,9 @@ def _build_preamble(imports: str, llm: dict[str, object] | None) -> str: max_tokens=llm["max_tokens"], interval=llm.get("interval", 0.0), ) - return _PREAMBLE.format(imports=imports, api_host=env.streaming_host) + return _PREAMBLE.format( + stdlib_imports=stdlib_imports, imports=imports, api_host=env.streaming_host + ) def _build_connect(merged: dict[str, object]) -> str: @@ -165,16 +209,61 @@ def _build_connect(merged: dict[str, object]) -> str: return f"client.connect(\n StreamingParameters(\n{kwargs}\n )\n)" -def render(merged: dict[str, object], *, llm: dict[str, object] | None = None) -> str: - """Generate a runnable microphone-streaming script with the given params. +def _source_parts(source: str | None, rate: object) -> tuple[set[str], str, str, str]: + """The (stdlib imports, setup block, banner text, stream expression) for a source. - With `llm`, the script transforms the live transcript through the LLM Gateway, - refreshing a prompt chain on every finalized turn (the live sibling of - `transcribe --llm`). + ``source`` mirrors the CLI argument: ``None`` is the microphone, ``"-"`` is raw + PCM16 on stdin, anything else is a file path or URL decoded through ffmpeg. 
+ """ + if source == "-": + return ( + {"sys"}, + _STDIN_SETUP.format(rate=rate), + f"Reading raw PCM16 mono audio at {rate} Hz from stdin…", + "stdin_chunks()", + ) + if source is not None: + return ( + {"subprocess", "time"}, + _FILE_SETUP.format(rate=rate, source=repr(source)), + f"Streaming {source}…", + "file_chunks()", + ) + return ( + set(), + "", + "Listening… press Ctrl-C to stop.", + (f"aai.extras.MicrophoneStream(sample_rate={rate})"), + ) + + +def render( + merged: dict[str, object], + *, + llm: dict[str, object] | None = None, + source: str | None = None, +) -> str: + """Generate a runnable streaming script with the given params. + + ``source`` selects the audio input the script reads, mirroring the CLI run path: + ``None`` captures the microphone, ``"-"`` reads raw PCM16 from stdin, and anything + else is a file path or URL decoded to PCM through ffmpeg (the same pipeline a real + `aai stream ` run uses). With `llm`, the script transforms the live + transcript through the LLM Gateway, refreshing a prompt chain on every finalized + turn (the live sibling of `transcribe --llm`). """ - preamble = _build_preamble(_imports_block(merged), llm) - # Mic capture rate must match StreamingParameters.sample_rate, else audio is corrupt. + # Capture/decode rate must match StreamingParameters.sample_rate, else audio is corrupt. 
rate = merged.get("sample_rate", 16000) + source_stdlib, setup, banner, stream_expr = _source_parts(source, rate) + stdlib = {"os"} | source_stdlib | ({"time"} if llm else set[str]()) + stdlib_imports = "\n".join(f"import {name}" for name in sorted(stdlib)) + preamble = _build_preamble(_imports_block(merged), llm, stdlib_imports) connect = _build_connect(merged) footer = _LLM_FOOTER if llm else _FOOTER - return preamble + "\n" + connect + "\n" + footer.format(rate=rate) + return ( + preamble + + "\n" + + connect + + "\n" + + footer.format(setup=setup, banner=repr(banner), stream_expr=stream_expr) + ) diff --git a/aai_cli/code_gen/transcribe.py b/aai_cli/code_gen/transcribe.py index a35a8723..8c1ab5fa 100644 --- a/aai_cli/code_gen/transcribe.py +++ b/aai_cli/code_gen/transcribe.py @@ -5,12 +5,27 @@ from aai_cli import environments, llm from aai_cli.code_gen import serialize, snippets +# ``-o/--output`` choice -> printed-result code, mirroring the run path's +# ``client._FIELD_RENDERERS`` semantics: plain fields, the speaker-labeled +# utterances loop, the SRT export endpoint, and the raw ``json_response`` payload. +_OUTPUT_SNIPPETS: dict[str, str] = { + "text": "print(transcript.text)", + "id": "print(transcript.id)", + "status": "print(transcript.status.value)", + "utterances": ( + 'for utt in transcript.utterances or []:\n print(f"Speaker {utt.speaker}: {utt.text}")' + ), + "srt": "print(transcript.export_subtitles_srt())", + "json": "print(json.dumps(transcript.json_response, default=str))", +} + def render( merged: dict[str, object], source: str, *, llm_gateway: dict[str, object] | None = None, + output: str | None = None, ) -> str: """Generate a runnable transcribe script reproducing this CLI invocation. @@ -18,21 +33,32 @@ def render( script transforms the transcript through AssemblyAI's LLM Gateway and prints that result instead of the analysis sections — mirroring how `--llm-gateway-prompt` replaces the normal output. 
+ + When `output` (a ``-o/--output`` field name) is given, the script prints that one + field instead — and, as in the real command, it takes precedence over the LLM chain + and the analysis sections. """ - if merged: - kwargs = "\n".join(serialize.config_kwarg_lines(merged, indent=4)) - config_block = f"config = aai.TranscriptionConfig(\n{kwargs}\n)" - call = f"transcript = transcriber.transcribe({source!r}, config=config)" - else: - config_block = "" - call = f"transcript = transcriber.transcribe({source!r})" + if output is not None: + llm_gateway = None # `-o` returns before the chain runs in the real command + parts = ( + _header_block(llm_gateway, output) + + _transcribe_block(merged, source) + + _result_block(merged, llm_gateway, output) + ) + parts.append("") + return "\n".join(parts) + +def _header_block(llm_gateway: dict[str, object] | None, output: str | None) -> list[str]: + """Imports plus the api-key (and non-default environment) settings lines.""" + stdlib_imports = ["import os"] + if output == "json": + stdlib_imports.insert(0, "import json") imports = ["import assemblyai as aai"] if llm_gateway: imports.append("from openai import OpenAI") - parts = [ - "import os", + *stdlib_imports, "", *imports, "", @@ -44,13 +70,20 @@ def render( env = environments.active() if env.api_base != environments.get(environments.DEFAULT_ENV).api_base: parts.append(f"aai.settings.base_url = {env.api_base!r}") - parts += [ - "", - "transcriber = aai.Transcriber()", - ] - if config_block: - parts += ["", config_block] - parts += [ + return parts + + +def _transcribe_block(merged: dict[str, object], source: str) -> list[str]: + """The transcriber setup, optional config, the transcribe call, and error check.""" + parts = ["", "transcriber = aai.Transcriber()"] + if merged: + kwargs = "\n".join(serialize.config_kwarg_lines(merged, indent=4)) + parts += ["", f"config = aai.TranscriptionConfig(\n{kwargs}\n)"] + call = f"transcript = transcriber.transcribe({source!r}, 
config=config)" + else: + call = f"transcript = transcriber.transcribe({source!r})" + return [ + *parts, "", call, "", @@ -59,13 +92,17 @@ def render( "", ] - if llm_gateway: - parts += _llm_gateway_block(llm_gateway) - else: - parts.append(snippets.result_handling(merged)) - parts.append("") - return "\n".join(parts) +def _result_block( + merged: dict[str, object], llm_gateway: dict[str, object] | None, output: str | None +) -> list[str]: + """The printed-result lines: one ``-o`` field, the LLM chain, or the analysis sections.""" + if output is not None: + # Unknown names fall back to the plain text, like select_transcript_field does. + return [_OUTPUT_SNIPPETS.get(output, _OUTPUT_SNIPPETS["text"])] + if llm_gateway: + return _llm_gateway_block(llm_gateway) + return [snippets.result_handling(merged)] def _llm_gateway_block(llm_gateway: dict[str, object]) -> list[str]: diff --git a/aai_cli/commands/account.py b/aai_cli/commands/account.py index 445929eb..c507fe3a 100644 --- a/aai_cli/commands/account.py +++ b/aai_cli/commands/account.py @@ -156,10 +156,12 @@ def usage( """Show usage over a date range (defaults to the last 30 days).""" def body(state: AppState, json_mode: bool) -> None: - _, jwt = resolve_session(state) + # Parse/validate the date flags before any session resolution or network + # work, so a bad --start/--end is a fast usage error even when not logged in. 
today = datetime.now(UTC).date() start_date = _utc_day_start(start or (today - timedelta(days=30)).isoformat()) end_date = _utc_day_start(end or today.isoformat()) + _, jwt = resolve_session(state) data = ams.get_usage(jwt, start_date, end_date, window) def render(d: dict[str, object]) -> object: diff --git a/aai_cli/commands/agent.py b/aai_cli/commands/agent.py index e6240f97..a7cd7d3e 100644 --- a/aai_cli/commands/agent.py +++ b/aai_cli/commands/agent.py @@ -19,6 +19,7 @@ from aai_cli.context import AppState, run_command from aai_cli.errors import CLIError, UsageError from aai_cli.help_text import examples_epilog +from aai_cli.streaming.session import validate_output_flags from aai_cli.streaming.sources import FileSource app = typer.Typer() @@ -56,7 +57,8 @@ def _open_audio( # One full-duplex stream for mic + speaker: macOS rejects two separate # streams on a device, which silently kills capture. duplex = DuplexAudio(target_rate=SAMPLE_RATE, device=device) - # notice() self-suppresses in JSON mode and routes to stderr in text mode. + # notice() self-suppresses in JSON mode and routes to stderr otherwise, so a + # piped `aai agent | …` never reads this advisory as transcript data. renderer.notice( "Use headphones — the mic stays open while the agent speaks, " "so speakers would let it hear itself.\n" @@ -131,6 +133,7 @@ def agent( raise typer.Exit(code=0) def body(state: AppState, json_mode: bool) -> None: + validate_output_flags(json_mode=json_mode, output_field=output_field) text_mode, json_mode = output.stream_output_modes(output_field, json_mode=json_mode) if voice not in VOICES: raise UsageError( @@ -142,6 +145,16 @@ def body(state: AppState, json_mode: bool) -> None: if show_code: # Print-only: emit the equivalent agent script from the flags and exit # without authenticating or opening audio. Raw stdout for `> script.py`. 
+            if source or sample:
+                # A faithful file-driven agent script would need the CLI's whole
+                # ffmpeg-decode + ready-gate + exit-after-reply machinery, which is
+                # impractical to inline; the snippet is microphone-driven, so say so
+                # on stderr instead of silently dropping the source. stderr keeps
+                # `--show-code > script.py` byte-clean.
+                output.error_console.print(
+                    "[aai.warn]Note:[/aai.warn] the generated script uses the microphone; "
+                    "it does not stream the audio source you passed."
+                )
             output.print_code(code_gen.agent(voice, system_prompt_text, greeting))
             return
diff --git a/aai_cli/commands/deploy.py b/aai_cli/commands/deploy.py
index f37a7ff3..487e20a6 100644
--- a/aai_cli/commands/deploy.py
+++ b/aai_cli/commands/deploy.py
@@ -3,6 +3,7 @@
 import shutil
 import subprocess
+import sys
 from dataclasses import dataclass
 from pathlib import Path
@@ -10,8 +11,9 @@
 from aai_cli import help_panels, output
 from aai_cli.context import AppState, run_command
-from aai_cli.errors import CLIError
+from aai_cli.errors import CLIError, UsageError
 from aai_cli.help_text import examples_epilog
+from aai_cli.init import procfile
 # Flattened single-command sub-typer (same pattern as `aai dev`).
 app = typer.Typer()
@@ -22,10 +24,11 @@ class Target:
     name: str  # human label, e.g. "Vercel"
     bin: str  # executable resolved via shutil.which
     flag: str  # CLI selector, e.g. "--vercel"
-    install: str  # full hint sentence shown when the CLI is missing
+    install: str  # hint sentence shown when the CLI is missing (everywhere, or macOS-only)
     deploy_args: tuple[str, ...]  # subcommand(s) appended after `bin`
     supports_prod: bool = False  # whether `--prod` adds a production flag
     post_deploy_args: tuple[str, ...] | None = None  # command run after a successful deploy
+    install_non_darwin: str | None = None  # hint off-macOS, when `install` is brew-specific
     def command(self, *, prod: bool) -> list[str]:
         argv = [self.bin, *self.deploy_args]
@@ -54,7 +57,9 @@ def command(self, *, prod: bool) -> list[str]:
     name="Fly",
     bin="fly",
     flag="--fly",
+    # brew is macOS-specific; elsewhere point at the official install docs.
     install="Install it with `brew install flyctl`.",
+    install_non_darwin="Install it: https://fly.io/docs/flyctl/install/",
     # `fly launch` does it all: creates the app, generates fly.toml (detecting the
     # shipped Dockerfile), and deploys — so no fly.toml needs to exist beforehand.
     deploy_args=("launch",),
@@ -74,10 +79,17 @@ def _resolve_target(selected: list[Target]) -> Target:
     return selected[0] if selected else VERCEL  # Vercel is the default
+def _install_hint(target: Target) -> str:
+    """The platform-appropriate install hint: brew on macOS, docs URL elsewhere."""
+    if target.install_non_darwin is not None and sys.platform != "darwin":
+        return target.install_non_darwin
+    return target.install
+
+
 def _require_cli(target: Target) -> None:
     if shutil.which(target.bin) is None:
         raise CLIError(
-            f"The {target.name} CLI is required to deploy. {target.install}",
+            f"The {target.name} CLI is required to deploy. {_install_hint(target)}",
             error_type="missing_dependency",
             exit_code=1,
         )
@@ -101,6 +113,14 @@ def _confirmed(target: Target, *, assume_yes: bool) -> bool:
 def run_deploy(*, target: Target, prod: bool, assume_yes: bool) -> None:
     """Confirm, then run the target's deploy command in the current directory."""
+    if prod and not target.supports_prod:
+        raise UsageError(
+            "--prod is only supported for Vercel deploys.",
+            suggestion=f"Drop --prod, or drop {target.flag} to deploy to Vercel.",
+        )
+    # Same not-a-project guard as `aai dev`/`aai share`, checked before CLI presence
+    # so an empty directory says "run `aai init`", not "install the Vercel CLI".
+    procfile.require_procfile(Path.cwd())
     _require_cli(target)
     if not _confirmed(target, assume_yes=assume_yes):
         output.console.print("Aborted.")
diff --git a/aai_cli/commands/dev.py b/aai_cli/commands/dev.py
index 0c5cb2e9..8e2861bb 100644
--- a/aai_cli/commands/dev.py
+++ b/aai_cli/commands/dev.py
@@ -17,7 +17,7 @@
 app = typer.Typer()
-def run_dev(*, port: int, no_install: bool, no_open: bool, json_mode: bool) -> None:
+def run_dev(*, port: int, host: str, no_install: bool, no_open: bool, json_mode: bool) -> None:
     """Boot the project's Procfile `web:` process locally, with live reload."""
     target = Path.cwd()
     use_uv = runner.has_uv()
@@ -34,8 +34,11 @@ def run_dev(*, port: int, no_install: bool, no_open: bool, json_mode: bool) -> N
         if any(s["status"] == "failed" for s in report):
             raise typer.Exit(code=1)
-    command = devserver.dev_command(target, web, use_uv=use_uv)
-    url = f"http://localhost:{chosen_port}"
+    command = devserver.dev_command(target, web, use_uv=use_uv, host=host)
+    # The printed URL reflects the actual bind: "localhost" for the loopback
+    # default, the literal host for an explicit --host.
+    url_host = "localhost" if host == devserver.LOCAL_HOST else host
+    url = f"http://{url_host}:{chosen_port}"
     if not json_mode:
         output.console.print(
             f"[aai.heading]Starting[/aai.heading] [aai.url]{escape(url)}[/aai.url]"
@@ -62,6 +65,11 @@ def run_dev(*, port: int, no_install: bool, no_open: bool, json_mode: bool) -> N
 def dev(
     ctx: typer.Context,
     port: int = typer.Option(3000, "--port", help="Local server port."),
+    host: str = typer.Option(
+        devserver.LOCAL_HOST,
+        "--host",
+        help="Interface to bind. Loopback by default; pass 0.0.0.0 to expose on your network.",
+    ),
     no_open: bool = typer.Option(False, "--no-open", help="Launch, but don't open the browser."),
     no_install: bool = typer.Option(
         False, "--no-install", help="Skip dependency install; launch directly."
@@ -75,6 +83,6 @@ def dev(
     """
     def body(_state: AppState, json_mode: bool) -> None:
-        run_dev(port=port, no_install=no_install, no_open=no_open, json_mode=json_mode)
+        run_dev(port=port, host=host, no_install=no_install, no_open=no_open, json_mode=json_mode)
     run_command(ctx, body, json=json_out)
diff --git a/aai_cli/commands/doctor.py b/aai_cli/commands/doctor.py
index b6fe12e9..e4839612 100644
--- a/aai_cli/commands/doctor.py
+++ b/aai_cli/commands/doctor.py
@@ -3,12 +3,12 @@
 import shutil
 import sys
 from collections.abc import Mapping, Sequence
-from typing import Protocol, TypedDict
+from typing import NotRequired, Protocol, TypedDict
 import typer
 from rich.markup import escape
-from aai_cli import client, config, help_panels, options, output, theme
+from aai_cli import client, config, environments, help_panels, options, output, theme
 from aai_cli.context import AppState, resolve_profile, run_command
 from aai_cli.errors import CLIError, NotAuthenticated
 from aai_cli.help_text import examples_epilog
@@ -28,6 +28,11 @@ class Check(TypedDict):
 class DoctorResult(TypedDict):
     ok: bool
+    # Which profile/environment the checks ran against. `aai doctor` always fills
+    # these in; the onboarding wizard reuses `render` for a partial check without
+    # them, so they stay optional.
+    profile: NotRequired[str]
+    environment: NotRequired[str]
     checks: list[Check]
@@ -82,7 +87,8 @@ def _check_api_key(profile: str) -> Check:
             affects=["everything"],
         )
     # validate_key doubles as the connectivity probe: it makes one cheap authed call,
-    # so a pass means the key is valid AND api.assemblyai.com is reachable.
+    # so a pass means the key is valid AND the active environment's API is reachable.
+    api_host = environments.active().api_base.removeprefix("https://")
     try:
         valid = client.validate_key(key)
     except CLIError as exc:
@@ -90,7 +96,7 @@ def _check_api_key(profile: str) -> Check:
             "api-key",
             "fail",
             f"Could not reach AssemblyAI: {exc.message}",
-            fix="Check your network/proxy and that api.assemblyai.com is reachable.",
+            fix=f"Check your network/proxy and that {api_host} is reachable.",
             affects=["everything"],
         )
     if valid:
@@ -197,6 +203,11 @@ def _check_coding_agent() -> Check:
 def render(data: DoctorResult) -> str:
     checks = data["checks"]
     lines = [output.heading("Environment check")]
+    profile, environment = data.get("profile"), data.get("environment")
+    if profile is not None and environment is not None:
+        lines.append(
+            " " + output.hint(f"profile: {escape(profile)} · environment: {escape(environment)}")
+        )
     for c in checks:
         symbol, style = _SYMBOL.get(c["status"], (theme.SYMBOL_HINT, "aai.muted"))
         lines.append(
@@ -238,7 +249,12 @@ def body(state: AppState, json_mode: bool) -> None:
             _check_coding_agent(),
         ]
         ok = not any(c["status"] == "fail" for c in checks)
-        payload: DoctorResult = {"ok": ok, "checks": checks}
+        payload: DoctorResult = {
+            "ok": ok,
+            "profile": profile,
+            "environment": environments.active().name,
+            "checks": checks,
+        }
         output.emit(payload, render, json_mode=json_mode)
         if not ok:
             raise typer.Exit(code=1)
diff --git a/aai_cli/commands/init.py b/aai_cli/commands/init.py
index 11297cd3..1d58b285 100644
--- a/aai_cli/commands/init.py
+++ b/aai_cli/commands/init.py
@@ -182,8 +182,9 @@ def run_init(
     running dev server — it stops after install and leaves the run command as a hint.
     """
     if not json_mode:
-        # Vercel-style banner at the top of the run.
-        output.console.print(
+        # Vercel-style banner at the top of the run. Decoration goes to stderr (data →
+        # stdout): it must never pollute a piped stdout, even on an error path.
+        output.error_console.print(
             f"[aai.heading]AssemblyAI CLI[/aai.heading] [aai.muted]{__version__}[/aai.muted]"
         )
     chosen = _resolve_template(template)
@@ -243,7 +244,13 @@ def run_init(
 def init(
     ctx: typer.Context,
     template: str | None = typer.Argument(
-        None, help="Template to scaffold (omit to pick interactively)."
+        None,
+        # Enumerate the registry so the help text can never drift from the templates
+        # that actually ship.
+        help=(
+            f"Template to scaffold: {', '.join(templates.TEMPLATE_ORDER)} "
+            "(omit to pick interactively)."
+        ),
     ),
     directory: str | None = typer.Argument(None, help="Target directory (default: