Merged
2 changes: 1 addition & 1 deletion aai_cli/AGENTS.md
@@ -153,7 +153,7 @@ heavily-reworked commands with long bodies; small commands keep the inline
 - **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
 - **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, per-sentence TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`); under `-v` (`debuglog.active()`) `brain._run_graph` *streams* that graph instead of `invoke`-ing it and logs each tool call/result/interim line as it lands (reusing `code_agent.events.message_events`), so a spoken turn that stalls mid-tool is debuggable — plain `invoke` runs the whole loop internally and `-v` would otherwise show only the httpx lines.
 - **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`). The single-voice default-playback path **streams**: `synthesize`'s `on_audio(chunk, sample_rate)` callback is wired to `audio.PcmPlayer.feed`, so speech starts on the first Audio frame (it opens the device lazily, since the rate is only known at Begin) instead of after the whole text — the win for a long `--url` page. `--out` (needs the full buffer) and the multi-voice dialogue path (`synthesize_dialogue` → `_output_audio` → buffered `play_pcm`) stay buffered; `synthesize` still returns the complete PCM for the summary regardless.
-- **`code_agent/`** + `commands/code/` — `assembly code`: a terminal coding agent (a bespoke port of langchain-ai/deepagents' `code` agent) that talks **only** to the LLM Gateway. `model.py` pins the model to `ChatOpenAI` against `llm_gateway_base`; `agent.py` builds the deepagents graph over a cwd-scoped `LocalShellBackend` (filesystem + shell tools), plus extra tools: the custom `assembly` CLI tool (`cli_tool.py`, runs `python -m aai_cli` with the key via child env, never argv), a URL `fetch_url` tool (`fetch_tool.py`), Tavily web search when `TAVILY_API_KEY` is set (`web_search.py`), an `ask_user` tool routed through an `AskBridge` to the front-end (`ask_tool.py`), and best-effort docs MCP tools (`docs_mcp.py`). Middleware adds installed skills (`skills.py`) and long-term memory (`memory.py`), each over its own dedicated backend. Sessions persist via a SQLite checkpointer (`store.py`) keyed by `--session`, so conversations resume. Approval gates the mutating tools (write/edit/execute/`assembly`/`fetch_url`); the general-purpose `task` subagent comes from deepagents by default. `session.py` drives the graph turn-by-turn (interrupt/resume = human approval), emitting framework-agnostic `events.py` to either the Textual TUI (`tui.py`, modeled on deepagents-code: transcript + input + approval/ask modals + clipboard copy) or the Rich fallback (`render.py`). The whole orchestration is tested by driving the **real** graph with a fake `BaseChatModel` (`tests/test_code_agent.py`), so no network/TTY is needed. **Voice is the default front-end in an interactive TTY** (`voice.py` + `_exec._run_voice`): `VoiceSession.listen` captures one spoken turn over Streaming STT (gating the mic shut the instant a turn finalizes) and `VoiceSession.speak` reads each assistant reply back over streaming TTS. It runs the **Rich REPL** loop (not the keyboard TUI) with a voice `read_line` + a reply-speaking sink. Readback needs streaming TTS, so it's **sandbox-only** (`tts.session.is_available`); in production the mic input still works and replies stay on screen. A mic-less box degrades to typed input on the first `AUDIO_ERROR_TYPES` `CLIError`; `--no-voice` selects the TUI, and a non-TTY (pipe/CI) the headless loop. Both legs (STT/TTS) are injected like the cascade's, so `tests/test_code_voice.py` drives it with fakes — no mic/speaker/socket.
+- **`code_agent/`** + `commands/code/` — `assembly code`: a terminal coding agent (a bespoke port of langchain-ai/deepagents' `code` agent) that talks **only** to the LLM Gateway. `model.py` pins the model to `ChatOpenAI` against `llm_gateway_base`; `agent.py` builds the deepagents graph over a cwd-scoped `LocalShellBackend` (filesystem + shell tools), plus extra tools: the custom `assembly` CLI tool (`cli_tool.py`, runs `python -m aai_cli` with the key via child env, never argv), a URL `fetch_url` tool (`fetch_tool.py`), Firecrawl web search when `FIRECRAWL_API_KEY` is set (`firecrawl_search.py`, shared with the live voice agent), an `ask_user` tool routed through an `AskBridge` to the front-end (`ask_tool.py`), and best-effort docs MCP tools (`docs_mcp.py`). Middleware adds installed skills (`skills.py`) and long-term memory (`memory.py`), each over its own dedicated backend. Sessions persist via a SQLite checkpointer (`store.py`) keyed by `--session`, so conversations resume. Approval gates the mutating tools (write/edit/execute/`assembly`/`fetch_url`); the general-purpose `task` subagent comes from deepagents by default. `session.py` drives the graph turn-by-turn (interrupt/resume = human approval), emitting framework-agnostic `events.py` to either the Textual TUI (`tui.py`, modeled on deepagents-code: transcript + input + approval/ask modals + clipboard copy) or the Rich fallback (`render.py`). The whole orchestration is tested by driving the **real** graph with a fake `BaseChatModel` (`tests/test_code_agent.py`), so no network/TTY is needed. **Voice is the default front-end in an interactive TTY** (`voice.py` + `_exec._run_voice`): `VoiceSession.listen` captures one spoken turn over Streaming STT (gating the mic shut the instant a turn finalizes) and `VoiceSession.speak` reads each assistant reply back over streaming TTS. It runs the **Rich REPL** loop (not the keyboard TUI) with a voice `read_line` + a reply-speaking sink. Readback needs streaming TTS, so it's **sandbox-only** (`tts.session.is_available`); in production the mic input still works and replies stay on screen. A mic-less box degrades to typed input on the first `AUDIO_ERROR_TYPES` `CLIError`; `--no-voice` selects the TUI, and a non-TTY (pipe/CI) the headless loop. Both legs (STT/TTS) are injected like the cascade's, so `tests/test_code_voice.py` drives it with fakes — no mic/speaker/socket.
 - **`code_gen/`** — backs `--show-code` on `transcribe`/`stream`/`agent`: builds a ready-to-run Python SDK script from exactly the flags passed (no API key needed; generated code reads `ASSEMBLYAI_API_KEY`).
 - **`auth/`** — browser-assisted `assembly login` via AMS + **Stytch B2B OAuth discovery** (`discovery.py`, `flow.py`, `loopback.py`, `ams.py`). Not Stytch Connected Apps.
 - **`init/`** — scaffolds a self-contained FastAPI + HTML starter (`audio-transcription`/`live-captions`/`voice-agent` templates), optionally installs deps and opens the browser; writes the key to a git-ignored `.env`.
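The `code_agent/` bullet's fallback rule (a mic-less box degrades to typed input on the first audio `CLIError`) can be sketched as a `read_line` wrapper. This is a minimal, hypothetical illustration: `AudioError`, `make_read_line`, `listen`, and `typed_input` are stand-in names rather than the real `voice.py`/`_exec._run_voice` API, the real code matches against `AUDIO_ERROR_TYPES` instead of one exception class, and returning `None` to mean "no more input" is an assumption.

```python
from typing import Callable, Optional


class AudioError(Exception):
    """Hypothetical stand-in for an audio-device CLIError (e.g. no input device)."""


def make_read_line(
    listen: Callable[[], Optional[str]],
    typed_input: Callable[[], Optional[str]],
) -> Callable[[], Optional[str]]:
    """Build a read_line that prefers a spoken turn but falls back to typing for good."""
    use_voice = True

    def read_line() -> Optional[str]:
        nonlocal use_voice
        if use_voice:
            try:
                return listen()
            except AudioError:
                # First audio failure: stop trying the mic for the rest of the session.
                use_voice = False
        return typed_input()

    return read_line


def broken_mic() -> Optional[str]:
    raise AudioError("no input device")


typed_lines = iter(["hello", "bye"])
read_line = make_read_line(broken_mic, lambda: next(typed_lines, None))
print(read_line(), read_line(), read_line())  # → hello bye None
```

Because the mic and keyboard are plain callables, the fallback can be exercised with fakes and no audio hardware, in the same spirit as the injected STT/TTS legs the bullet describes.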
2 changes: 1 addition & 1 deletion aai_cli/code_agent/fetch_tool.py
@@ -1,6 +1,6 @@
"""A URL-fetch tool for the coding agent (deepagents-code parity).

Distinct from web *search* (Tavily): this fetches a specific URL the agent already
Distinct from web *search* (Firecrawl): this fetches a specific URL the agent already
knows and returns its text. It is approval-gated (see ``MUTATING_TOOLS``) because an
arbitrary fetch can reach internal/SSRF targets, so the user confirms each one.
"""
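The approval gate that `fetch_tool.py`'s docstring refers to can be illustrated with a small synchronous stand-in. This is a sketch under assumptions: `MUTATING`, `ToolCall`, `run_gated`, and the example tools are hypothetical, the names in the set are illustrative rather than the actual contents of `MUTATING_TOOLS`, and the real agent pauses the graph with interrupt/resume for human approval instead of calling an `approve` callback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical gated set; the real MUTATING_TOOLS may use different names.
MUTATING = frozenset({"write_file", "edit_file", "execute", "assembly", "fetch_url"})


@dataclass
class ToolCall:
    """A tool invocation proposed by the model (hypothetical shape)."""

    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def run_gated(
    call: ToolCall,
    tools: Dict[str, Callable[..., str]],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Execute a tool call, asking the user first if the tool is in the gated set."""
    if call.name in MUTATING and not approve(call):
        # A declined call never runs; the refusal is reported back to the agent.
        return f"rejected: {call.name}"
    return tools[call.name](**call.args)


def read_file(path: str) -> str:
    return f"contents of {path}"


def fetch_url(url: str) -> str:
    return f"fetched {url}"


def deny(call: ToolCall) -> bool:
    return False


tools = {"read_file": read_file, "fetch_url": fetch_url}
print(run_gated(ToolCall("read_file", {"path": "notes.txt"}), tools, deny))  # → contents of notes.txt
print(run_gated(ToolCall("fetch_url", {"url": "http://169.254.169.254/"}), tools, deny))  # → rejected: fetch_url
```

A read-only tool runs without a prompt, while `fetch_url` is held back even though it only reads, because the target URL itself (here an internal metadata address) is the risk.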
6 changes: 3 additions & 3 deletions aai_cli/code_agent/firecrawl_search.py
@@ -1,12 +1,12 @@
"""Optional Firecrawl web search for the live voice agent.
"""Optional Firecrawl web search for the coding and live voice agents.

Firecrawl grounds the agent with live web search, enabled when a ``FIRECRAWL_API_KEY``
is present in the environment. Search is read-only, so it is *not* gated behind the
approval flow. With no key set we simply omit the tool (the agent still has its URL
fetch and the AssemblyAI docs MCP), rather than erroring.

This mirrors ``web_search.py`` (Tavily) but reuses Firecrawl's official LangChain
integration; the live agent prefers it as its default search tool.
Both ``assembly code`` (approval-gated, opt-out via ``--no-web``) and the live voice
agent share this single search tool via Firecrawl's official LangChain integration.
"""

from __future__ import annotations
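The key-gated construction in the `firecrawl_search.py` docstring, which `test_web_search_tool_gated_on_api_key` asserts, reduces to a small contract: no key, no tool; with a key, a tool named `firecrawl_search`. The sketch below shows only that contract. `SearchTool` and the placeholder `run` are hypothetical stand-ins for the LangChain tool the real module builds through Firecrawl's integration, and passing an explicit `environ` mapping (rather than reading the process environment through `env.get`) is an assumption made for easy testing. Treating an empty value as unset mirrors the `not env.get(...)` check in `_web_note`.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

FIRECRAWL_API_KEY_ENV = "FIRECRAWL_API_KEY"


@dataclass(frozen=True)
class SearchTool:
    """Hypothetical stand-in for the LangChain tool object."""

    name: str
    run: Callable[[str], str]


def build_web_search_tool(environ: Mapping[str, str] = os.environ) -> Optional[SearchTool]:
    """Sketch of the gating contract: return None when no key is configured."""
    key = environ.get(FIRECRAWL_API_KEY_ENV)
    if not key:
        # No key (or an empty one): omit the tool instead of raising.
        return None

    def run(query: str) -> str:
        # Placeholder only; the real module delegates to Firecrawl's LangChain integration.
        return f"search results for {query!r}"

    return SearchTool(name="firecrawl_search", run=run)


print(build_web_search_tool({}))  # → None
print(build_web_search_tool({"FIRECRAWL_API_KEY": "fc-key"}).name)  # → firecrawl_search
```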
37 changes: 0 additions & 37 deletions aai_cli/code_agent/web_search.py

This file was deleted.

2 changes: 1 addition & 1 deletion aai_cli/commands/code/__init__.py
@@ -58,7 +58,7 @@ def code(
         True, "--skills/--no-skills", help="Load installed agent skills (e.g. the assemblyai skill)"
     ),
     web: bool = typer.Option(
-        True, "--web/--no-web", help="Enable Tavily web search when TAVILY_API_KEY is set"
+        True, "--web/--no-web", help="Enable Firecrawl web search when FIRECRAWL_API_KEY is set"
     ),
     memory: bool = typer.Option(
         True, "--memory/--no-memory", help="Load and persist the agent's long-term memory"
9 changes: 5 additions & 4 deletions aai_cli/commands/code/_exec.py
@@ -25,6 +25,7 @@
 from aai_cli.code_agent.docs_mcp import load_docs_tools
 from aai_cli.code_agent.events import AssistantText, Event
 from aai_cli.code_agent.fetch_tool import build_fetch_tool
+from aai_cli.code_agent.firecrawl_search import FIRECRAWL_API_KEY_ENV, build_web_search_tool
 from aai_cli.code_agent.memory import build_memory_middleware
 from aai_cli.code_agent.model import build_model
 from aai_cli.code_agent.prompt import DEFAULT_MODEL
@@ -38,7 +39,6 @@
     build_voice_session,
     spoken_summary,
 )
-from aai_cli.code_agent.web_search import TAVILY_API_KEY_ENV, build_web_search_tool
 from aai_cli.core import env, errors, stdio
 from aai_cli.ui import output

@@ -136,10 +136,11 @@ def _read_line() -> str | None:


 def _web_note(opts: CodeOptions) -> str | None:
-    """The "web search disabled" notice when --web is on but no Tavily key is set."""
-    if opts.web and not env.get(TAVILY_API_KEY_ENV):
+    """The "web search disabled" notice when --web is on but no Firecrawl key is set."""
+    if opts.web and not env.get(FIRECRAWL_API_KEY_ENV):
         return (
-            "TAVILY_API_KEY is not set, so web search is disabled. Get a key at https://tavily.com"
+            "FIRECRAWL_API_KEY is not set, so web search is disabled. "
+            "Get a key at https://firecrawl.dev"
         )
     return None

1 change: 0 additions & 1 deletion pyproject.toml
@@ -80,7 +80,6 @@ dependencies = [
     "langchain-core>=1.4.7",
     "langchain-mcp-adapters>=0.3.0",
     "textual>=8.2.7",
-    "langchain-tavily>=0.2.18",
     "langgraph-checkpoint-sqlite>=3.1.0",
     "pyperclip>=1.11.0",
     "langchain-text-splitters>=1.0.0",
4 changes: 2 additions & 2 deletions tests/__snapshots__/test_snapshots_help_run.ambr
@@ -299,8 +299,8 @@
 │ --skills --no-skills Load installed agent skills (e.g. │
 │ the assemblyai skill) │
 │ [default: skills] │
-│ --web --no-web Enable Tavily web search when
-│ TAVILY_API_KEY is set
+│ --web --no-web Enable Firecrawl web search when │
+│ FIRECRAWL_API_KEY is set │
 │ [default: web] │
 │ --memory --no-memory Load and persist the agent's │
 │ long-term memory │
12 changes: 6 additions & 6 deletions tests/test_code_agent.py
@@ -20,10 +20,10 @@
     docs_mcp,
     events,
     fetch_tool,
+    firecrawl_search,
     memory,
     skills,
     store,
-    web_search,
 )
 from aai_cli.code_agent import model as model_mod
 from aai_cli.code_agent.agent import MUTATING_TOOLS, build_agent
@@ -225,12 +225,12 @@ def test_skills_middleware_present_and_absent(tmp_path: Path) -> None:


 def test_web_search_tool_gated_on_api_key(monkeypatch: pytest.MonkeyPatch) -> None:
-    monkeypatch.delenv("TAVILY_API_KEY", raising=False)
-    assert web_search.build_web_search_tool() is None
+    monkeypatch.delenv("FIRECRAWL_API_KEY", raising=False)
+    assert firecrawl_search.build_web_search_tool() is None

-    monkeypatch.setenv("TAVILY_API_KEY", "tvly-key")
-    tool = web_search.build_web_search_tool()
-    assert tool is not None and tool.name == "tavily_search"
+    monkeypatch.setenv("FIRECRAWL_API_KEY", "fc-key")
+    tool = firecrawl_search.build_web_search_tool()
+    assert tool is not None and tool.name == "firecrawl_search"


 def test_message_events_coerces_list_content() -> None:
4 changes: 2 additions & 2 deletions tests/test_code_command.py
@@ -192,10 +192,10 @@ def test_build_agent_wires_model_tools_and_checkpointer(monkeypatch):


 def test_web_note_only_without_key(monkeypatch):
-    monkeypatch.delenv("TAVILY_API_KEY", raising=False)
+    monkeypatch.delenv("FIRECRAWL_API_KEY", raising=False)
     assert _exec._web_note(_opts(web=True)) is not None
     assert _exec._web_note(_opts(web=False)) is None
-    monkeypatch.setenv("TAVILY_API_KEY", "tvly")
+    monkeypatch.setenv("FIRECRAWL_API_KEY", "fc-x")
     assert _exec._web_note(_opts(web=True)) is None


17 changes: 0 additions & 17 deletions uv.lock
