AssemblyAI · alexkroman · Jun 17, 2026
diff --git a/.importlinter b/.importlinter
@@ -13,7 +13,7 @@ type = layers
 ; assembles the command layer — main, command_registry, help_panels, options —
 ; stays at the package root, above `commands`, and is intentionally unlisted
 ; (it legitimately imports the command modules to discover/register them).
-; Feature slices (agent, tts, streaming, code_gen, init, auth, onboard) are
+; Feature slices (agent, tts, streaming, code_agent, code_gen, init, auth, onboard) are
 ; likewise unlisted vertical slices governed by contract 2.
 layers =
     commands
@@ -34,6 +34,7 @@ source_modules =
     aai_cli.agent
     aai_cli.agent_cascade
     aai_cli.auth
+    aai_cli.code_agent
     aai_cli.code_gen
     aai_cli.init
     aai_cli.onboard

diff --git a/README.md b/README.md
@@ -51,6 +51,7 @@ That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-ins
 | `assembly agent-cascade` | Same live conversation, but wired client-side from Streaming STT + the LLM Gateway + streaming TTS, like the `agent-cascade` starter (sandbox-only) |
 | `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
 | `assembly llm` | Prompt the LLM Gateway over a transcript, files, stdin, or a live stream |
+| `assembly code` | Terminal coding agent (deepagents SDK) backed only by the LLM Gateway — reads/writes/edits files, runs shell, searches the docs MCP, and can invoke the `assembly` CLI itself; mutating actions ask for approval |
 | `assembly clip` | Cut audio/video with ffmpeg by diarized speaker, text match, LLM pick, or time range (`--video` keeps the picture for URL sources) — clip boundaries snap into nearby silence |
 | `assembly dub` | Re-voice an audio/video file or URL in another language: transcription, LLM translation, per-speaker TTS, ffmpeg track-swap (sandbox-only) |
 | `assembly caption` | Burn always-visible captions into a video: transcribe (or reuse a transcript), fetch SRT, ffmpeg burns it in — audio untouched |

diff --git a/aai_cli/AGENTS.md b/aai_cli/AGENTS.md
@@ -44,8 +44,8 @@ contract:
   `help_panels`, `options`. They assemble/define the command layer (and
   `command_registry` imports the command modules to discover them), so they live
   *above* `commands` and stay at the root.
-- **Feature slices** — `agent/`, `tts/`, `streaming/`, `code_gen/`, `init/`,
-  `auth/`, `onboard/`. These are cohesive vertical slices that internally mix
+- **Feature slices** — `agent/`, `tts/`, `streaming/`, `code_agent/`, `code_gen/`,
+  `init/`, `auth/`, `onboard/`. These are cohesive vertical slices that internally mix
   protocol + rendering, so they aren't a single horizontal layer; contract 2
   forbids them from importing `commands`.
 
@@ -153,6 +153,7 @@ heavily-reworked commands with long bodies; small commands keep the inline
 - **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
 - **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, per-sentence TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker.
 - **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`). The single-voice default-playback path **streams**: `synthesize`'s `on_audio(chunk, sample_rate)` callback is wired to `audio.PcmPlayer.feed`, so speech starts on the first Audio frame (it opens the device lazily, since the rate is only known at Begin) instead of after the whole text — the win for a long `--url` page. `--out` (needs the full buffer) and the multi-voice dialogue path (`synthesize_dialogue` → `_output_audio` → buffered `play_pcm`) stay buffered; `synthesize` still returns the complete PCM for the summary regardless.
+- **`code_agent/`** + `commands/code/` — `assembly code`: a terminal coding agent (a bespoke port of langchain-ai/deepagents' `code` agent) that talks **only** to the LLM Gateway. `model.py` pins the model to `ChatOpenAI` against `llm_gateway_base`; `agent.py` builds the deepagents graph over a cwd-scoped `LocalShellBackend` (filesystem + shell tools), plus extra tools: the custom `assembly` CLI tool (`cli_tool.py`, runs `python -m aai_cli` with the key via child env, never argv), a URL `fetch_url` tool (`fetch_tool.py`), Tavily web search when `TAVILY_API_KEY` is set (`web_search.py`), an `ask_user` tool routed through an `AskBridge` to the front-end (`ask_tool.py`), and best-effort docs MCP tools (`docs_mcp.py`). Middleware adds installed skills (`skills.py`) and long-term memory (`memory.py`), each over its own dedicated backend. Sessions persist via a SQLite checkpointer (`store.py`) keyed by `--session`, so conversations resume. Approval gates the mutating tools (write/edit/execute/`assembly`/`fetch_url`); the general-purpose `task` subagent comes from deepagents by default. `session.py` drives the graph turn-by-turn (interrupt/resume = human approval), emitting framework-agnostic `events.py` to either the Textual TUI (`tui.py`, modeled on deepagents-code: transcript + input + approval/ask modals + clipboard copy) or the Rich fallback (`render.py`). The whole orchestration is tested by driving the **real** graph with a fake `BaseChatModel` (`tests/test_code_agent.py`), so no network/TTY is needed.
 - **`code_gen/`** — backs `--show-code` on `transcribe`/`stream`/`agent`: builds a ready-to-run Python SDK script from exactly the flags passed (no API key needed; generated code reads `ASSEMBLYAI_API_KEY`).
 - **`auth/`** — browser-assisted `assembly login` via AMS + **Stytch B2B OAuth discovery** (`discovery.py`, `flow.py`, `loopback.py`, `ams.py`). Not Stytch Connected Apps.
 - **`init/`** — scaffolds a self-contained FastAPI + HTML starter (`audio-transcription`/`live-captions`/`voice-agent` templates), optionally installs deps and opens the browser; writes the key to a git-ignored `.env`.

diff --git a/aai_cli/code_agent/__init__.py b/aai_cli/code_agent/__init__.py
@@ -0,0 +1,16 @@
+"""`assembly code` — a terminal coding agent built on the deepagents SDK.
+
+A bespoke port of langchain-ai/deepagents' `code` agent, wired so it **only**
+talks to the AssemblyAI LLM Gateway (an OpenAI-compatible endpoint reached via
+`langchain_openai.ChatOpenAI`; see `model.py`). The agent gets deepagents'
+built-in filesystem + shell tools — rooted at the working directory through a
+`LocalShellBackend` — plus a custom `assembly` tool that invokes this very CLI,
+so it can transcribe/stream/run-LLM as part of a coding task (`cli_tool.py`).
+
+The pieces are split so the orchestration (`session.py`) is unit-tested against
+a fake chat model driving the *real* deepagents graph, with no network: `agent.py`
+builds the graph, `render.py` draws the conversation, and the Typer command in
+`aai_cli/commands/code/` wires the gateway model + real CLI runner in.
+"""
+
+from __future__ import annotations
diff --git a/aai_cli/code_agent/agent.py b/aai_cli/code_agent/agent.py
@@ -0,0 +1,86 @@
+"""Assemble the deepagents graph for `assembly code`.
+
+Wires the gateway model to deepagents' built-in coding toolset (filesystem + shell,
+rooted at the working directory via a `LocalShellBackend`), plus the custom `assembly`
+CLI tool and any MCP/docs tools, the installed-skills middleware, and human-in-the-loop
+approval on the mutating tools. The compiled graph is driven turn-by-turn from
+`session.py`; the checkpointer (an `InMemorySaver` unless the caller passes one) gives
+both conversation memory and the interrupt/resume the approval flow needs.
+"""
+
+from __future__ import annotations
+
+from collections.abc import Mapping, Sequence
+from pathlib import Path
+from typing import TYPE_CHECKING, Protocol
+
+from aai_cli.code_agent.cli_tool import CLI_TOOL_NAME
+from aai_cli.code_agent.fetch_tool import FETCH_TOOL_NAME
+from aai_cli.code_agent.prompt import build_system_prompt
+
+if TYPE_CHECKING:
+    from langchain.agents.middleware import AgentMiddleware
+    from langchain_core.language_models.chat_models import BaseChatModel
+    from langchain_core.tools import BaseTool
+    from langgraph.checkpoint.base import BaseCheckpointSaver
+
+# The tools whose effects reach outside the model — file writes, edits, arbitrary
+# shell, the AssemblyAI CLI (which can spend account credits), and URL fetches (which
+# can reach internal/SSRF targets). Each is gated behind human approval unless the
+# session opts into --auto.
+MUTATING_TOOLS = ("write_file", "edit_file", "execute", CLI_TOOL_NAME, FETCH_TOOL_NAME)
+
+
+class CompiledAgent(Protocol):
+    """The slice of the compiled langgraph graph the session drives.
+
+    A structural type so we needn't name langgraph's deeply-generic
+    ``CompiledStateGraph`` (and don't drag its type params through our code).
+    """
+
+    def invoke(
+        self, input: object, config: Mapping[str, object] | None = None
+    ) -> dict[str, object]:
+        """Run one step of the graph, returning the updated state (incl. messages)."""
+
+
+def _interrupt_config(*, auto_approve: bool) -> dict[str, bool] | None:
+    """The ``interrupt_on`` map: gate every mutating tool on approval, or ``None`` under --auto."""
+    if auto_approve:
+        return None
+    return dict.fromkeys(MUTATING_TOOLS, True)
+
+
+def build_agent(
+    *,
+    model: BaseChatModel,
+    root_dir: Path,
+    tools: Sequence[BaseTool] = (),
+    middlewares: Sequence[AgentMiddleware] = (),
+    checkpointer: BaseCheckpointSaver | None = None,
+    auto_approve: bool = False,
+) -> CompiledAgent:
+    """Compile the coding agent over ``root_dir`` with ``tools`` and ``middlewares``.
+
+    ``model`` is the only network seam — tests pass a fake chat model so the real
+    deepagents graph (filesystem + shell tools, approval, checkpointing) runs offline.
+    ``checkpointer`` defaults to an in-memory saver (one ephemeral session); the command
+    passes a SQLite saver for persistent, resumable sessions.
+    """
+    from deepagents import create_deep_agent
+    from deepagents.backends import LocalShellBackend
+    from langgraph.checkpoint.memory import InMemorySaver
+
+    # virtual_mode=True maps the model's "/"-rooted file paths under root_dir and blocks
+    # traversal escapes; it does not sandbox shell commands (hence approval on `execute`).
+    backend = LocalShellBackend(root_dir=str(root_dir), virtual_mode=True)
+
+    return create_deep_agent(
+        model=model,
+        backend=backend,
+        system_prompt=build_system_prompt(str(root_dir)),
+        tools=list(tools),
+        middleware=list(middlewares),
+        interrupt_on=_interrupt_config(auto_approve=auto_approve),
+        checkpointer=checkpointer if checkpointer is not None else InMemorySaver(),
+    )
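Review note (not part of the patch): the approval map built by `_interrupt_config` can be checked in isolation, without deepagents installed. This is a minimal sketch. The tool names are stand-ins for `MUTATING_TOOLS`, and `"fetch_url"` as the value of `FETCH_TOOL_NAME` is an assumption taken from the AGENTS.md description, since `fetch_tool.py` is not shown in this section.

```python
from __future__ import annotations

# Stand-ins for the constants in agent.py; "fetch_url" is assumed, not confirmed here.
MUTATING_TOOLS = ("write_file", "edit_file", "execute", "assembly", "fetch_url")


def interrupt_config(*, auto_approve: bool) -> dict[str, bool] | None:
    """Return a tool-name -> True map, or None when approvals are skipped."""
    if auto_approve:
        return None
    # dict.fromkeys builds one entry per tool name, all sharing the value True.
    return dict.fromkeys(MUTATING_TOOLS, True)


gated = interrupt_config(auto_approve=False)
ungated = interrupt_config(auto_approve=True)
```

Read-only tools and `ask_user` never appear in the map, so under this scheme they would run without an approval prompt in either mode.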
diff --git a/aai_cli/code_agent/ask_tool.py b/aai_cli/code_agent/ask_tool.py
@@ -0,0 +1,51 @@
+"""An `ask_user` tool so the agent can ask the user a question mid-task.
+
+deepagents-code ships an AskUser middleware; base deepagents does not, so we add a
+small tool. The actual prompting is injected through an :class:`AskBridge`: the Rich
+REPL reads a line, the Textual TUI pops an input modal, and tests script the answer —
+the tool itself just calls the bridge, so it stays framework-agnostic. It is *not*
+approval-gated (it is itself the user interaction).
+"""
+
+from __future__ import annotations
+
+from collections.abc import Callable
+from dataclasses import dataclass, field
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from langchain_core.tools import BaseTool
+
+ASK_TOOL_NAME = "ask_user"
+
+
+def _unanswered(_question: str) -> str:
+    """Default handler before a front-end registers one: no human is attached."""
+    return "No user is available to answer; proceed with your best judgment."
+
+
+@dataclass
+class AskBridge:
+    """A late-bound seam for asking the user a question.
+
+    The agent (and its tools) are built before the front-end exists, so the tool
+    captures this bridge and the REPL/TUI sets :attr:`handler` once it's running.
+    """
+
+    handler: Callable[[str], str] = field(default=_unanswered)
+
+    def ask(self, question: str) -> str:
+        return self.handler(question)
+
+
+def build_ask_tool(bridge: AskBridge) -> BaseTool:
+    """Wrap an :class:`AskBridge` as the ``ask_user`` tool."""
+    from langchain_core.tools import tool
+
+    @tool(ASK_TOOL_NAME)
+    def ask_user(question: str) -> str:
+        """Ask the user a clarifying question and return their answer. Use when you
+        genuinely need information only the user has before continuing."""
+        return bridge.ask(question)
+
+    return ask_user
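Review note (not part of the patch): the late-binding behaviour described in the `AskBridge` docstring can be sketched without LangChain. Here a plain closure stands in for the `@tool`-decorated function, and the replacement handler is a hypothetical front-end, not the real REPL or TUI code.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


def unanswered(_question: str) -> str:
    """Default handler used until a front-end registers its own."""
    return "No user is available to answer; proceed with your best judgment."


@dataclass
class AskBridge:
    handler: Callable[[str], str] = field(default=unanswered)

    def ask(self, question: str) -> str:
        return self.handler(question)


def make_ask_tool(bridge: AskBridge) -> Callable[[str], str]:
    """Plain-function stand-in for the decorated ask_user tool."""

    def ask_user(question: str) -> str:
        # The handler is looked up at call time, not when the tool is built.
        return bridge.ask(question)

    return ask_user


bridge = AskBridge()
tool = make_ask_tool(bridge)  # built before any front-end exists
before = tool("Which branch?")

# Hypothetical front-end handler, attached after the tool was created.
bridge.handler = lambda question: f"answer to: {question}"
after = tool("Which branch?")
```

Because the tool holds the bridge rather than the handler, swapping the handler later changes what the already-built tool returns.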
diff --git a/aai_cli/code_agent/banner.py b/aai_cli/code_agent/banner.py
@@ -0,0 +1,42 @@
+"""The `assembly code` startup splash — the ASSEMBLY wordmark + a short intro.
+
+Rendered once at session start (in the TUI transcript and the headless REPL). The
+wordmark uses the ANSI-Shadow block font and is built from a per-letter map so the rows
+stay aligned without hand-editing one giant string. The accent is the AssemblyAI brand blue.
+"""
+
+from __future__ import annotations
+
+from aai_cli.ui import theme
+
+# The wordmark accent — the AssemblyAI brand blue (Cobalt 400), as a hex literal so it
+# renders identically in Rich and Textual without our theme being loaded.
+BRAND_HEX = theme.BRAND
+
+# Intro copy, shared by both front-ends so the wording stays in one place.
+READY_LINE = "Ready to code! What would you like to build?"
+TIP_LINE = "Tip: approve tools as they run, or pass --auto to skip the prompts."
+
+# Each glyph is six rows tall (ANSI-Shadow). Only the letters in "ASSEMBLY" are needed.
+_LETTERS: dict[str, list[str]] = {
+    "A": [" █████╗ ", "██╔══██╗", "███████║", "██╔══██║", "██║  ██║", "╚═╝  ╚═╝"],
+    "S": ["███████╗", "██╔════╝", "███████╗", "╚════██║", "███████║", "╚══════╝"],
+    "E": ["███████╗", "██╔════╝", "█████╗  ", "██╔══╝  ", "███████╗", "╚══════╝"],
+    "M": ["███╗   ███╗", "████╗ ████║", "██╔████╔██║", "██║╚██╔╝██║", "██║ ╚═╝ ██║", "╚═╝     ╚═╝"],
+    "B": ["██████╗ ", "██╔══██╗", "██████╔╝", "██╔══██╗", "██████╔╝", "╚═════╝ "],
+    "L": ["██╗     ", "██║     ", "██║     ", "██║     ", "███████╗", "╚══════╝"],
+    "Y": ["██╗   ██╗", "╚██╗ ██╔╝", " ╚████╔╝ ", "  ╚██╔╝  ", "   ██║   ", "   ╚═╝   "],
+}
+_ROWS = 6
+
+
+def wordmark() -> list[str]:
+    """The six plain rows of the ASSEMBLY block wordmark."""
+    return [" ".join(_LETTERS[ch][row] for ch in "ASSEMBLY") for row in range(_ROWS)]
+
+
+def version() -> str:
+    """The CLI version string (e.g. ``v0.1.19``)."""
+    from aai_cli import __version__
+
+    return f"v{__version__}"
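Review note (not part of the patch): the row-wise join in `wordmark()` is easier to follow with a toy font. The two-row glyphs below are hypothetical and only illustrate the technique; the real map uses six-row ANSI-Shadow glyphs.

```python
from __future__ import annotations

# Hypothetical two-row glyphs; every glyph of a letter has the same width.
LETTERS: dict[str, list[str]] = {
    "A": ["^^", "AA"],
    "B": ["BB", "bb"],
}
ROWS = 2


def wordmark(text: str) -> list[str]:
    """Build each output row by joining the same row index of every letter."""
    return [" ".join(LETTERS[ch][row] for ch in text) for row in range(ROWS)]


rows = wordmark("ABA")
```

As long as each letter's rows share one width, every output row has the same length, which is what keeps the banner aligned.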
diff --git a/aai_cli/code_agent/cli_tool.py b/aai_cli/code_agent/cli_tool.py
@@ -0,0 +1,87 @@
+"""Expose the AssemblyAI CLI to the agent as a tool.
+
+The agent gets an ``assembly`` tool that runs *this* CLI as a subprocess
+(``python -m aai_cli …``), so a coding task can transcribe a file, run an LLM
+transform, list transcripts, etc. without the model hand-rolling shell quoting.
+
+Secrets never ride argv (the project-wide rule): the resolved API key is injected
+into the child's environment, never appended to the argument list, so it can't leak
+into ``ps`` or the model's own transcript of the command it ran.
+"""
+
+from __future__ import annotations
+
+import subprocess
+import sys
+from collections.abc import Callable
+from typing import TYPE_CHECKING
+
+from aai_cli.core import config, env
+
+if TYPE_CHECKING:
+    from langchain_core.tools import BaseTool
+
+# The tool name the model calls and the approval flow gates on.
+CLI_TOOL_NAME = "assembly"
+
+# Cap captured output so a chatty command can't blow the model's context window.
+_MAX_OUTPUT_CHARS = 20000
+# Backstop so a hung command (e.g. a stuck network call) can't wedge the session.
+_DEFAULT_TIMEOUT = 600
+
+# A runner takes the CLI argument list and returns the combined, formatted output.
+CliRunner = Callable[[list[str]], str]
+
+
+def _truncate(text: str) -> str:
+    """Clip captured output to the context-window budget, marking that we did."""
+    if len(text) <= _MAX_OUTPUT_CHARS:
+        return text
+    return text[:_MAX_OUTPUT_CHARS] + "\n…[output truncated]"
+
+
+def _format_result(proc: subprocess.CompletedProcess[str]) -> str:
+    """Render a finished CLI run as text the model can read: exit code + both streams."""
+    parts = [f"exit code: {proc.returncode}"]
+    if proc.stdout:
+        parts.append(f"stdout:\n{proc.stdout.rstrip()}")
+    if proc.stderr:
+        parts.append(f"stderr:\n{proc.stderr.rstrip()}")
+    return _truncate("\n".join(parts))
+
+
+def run_assembly(args: list[str], *, api_key: str, timeout: float = _DEFAULT_TIMEOUT) -> str:
+    """Run ``assembly <args>`` as a subprocess and return its formatted output.
+
+    Invoked as ``python -m aai_cli`` so it's the very CLI in use, independent of
+    whatever ``assembly`` may (or may not) be on PATH. The key is passed through the
+    environment, never argv.
+    """
+    proc = subprocess.run(
+        [sys.executable, "-m", "aai_cli", *args],
+        capture_output=True,
+        text=True,
+        stdin=subprocess.DEVNULL,
+        env=env.child_env(**{config.ENV_API_KEY: api_key}),
+        timeout=timeout,
+        check=False,
+    )
+    return _format_result(proc)
+
+
+def build_cli_tool(runner: CliRunner) -> BaseTool:
+    """Wrap a :data:`CliRunner` as the ``assembly`` LangChain tool the agent can call.
+
+    The runner is injected so the orchestration is tested without spawning a real
+    subprocess; the command layer passes :func:`run_assembly` bound to the session's key.
+    """
+    from langchain_core.tools import tool
+
+    @tool(CLI_TOOL_NAME)
+    def assembly(arguments: list[str]) -> str:
+        """Run the AssemblyAI CLI. Pass CLI arguments as a list of strings, e.g.
+        ["transcribe", "audio.mp3", "--json"]. Returns the command's exit code and
+        output. Do not include an API key — it is provided via the environment."""
+        return runner(arguments)
+
+    return assembly
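Review note (not part of the patch): the "secrets never ride argv" rule can be demonstrated with a small standalone script. This is a sketch under stated assumptions: `DEMO_API_KEY` is a hypothetical variable name, and a plain copy of `os.environ` stands in for the project's `env.child_env` helper, whose implementation is not shown in this section.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Hypothetical variable name, used for this demonstration only.
DEMO_KEY_VAR = "DEMO_API_KEY"


def run_child(args: list[str], *, api_key: str) -> subprocess.CompletedProcess[str]:
    """Run a Python child with the key in its environment, never in argv."""
    child_env = {**os.environ, DEMO_KEY_VAR: api_key}
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,
        env=child_env,
        timeout=30,
        check=False,
    )


# The child prints the key's length (not the key) and the arguments it received.
CHILD_CODE = "import os, sys; print(len(os.environ['DEMO_API_KEY'])); print(sys.argv[1:])"

proc = run_child(["-c", CHILD_CODE, "transcribe", "audio.mp3"], api_key="secret-123")
```

The child can read the key, yet the argument list that a process listing would show contains only the script and the CLI arguments.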