assembly code/live: gateway tool-calling resilience, voice interruptibility, gpt-5.1 default #251

Merged
alexkroman merged 6 commits into main from code-agent-fixes
Jun 18, 2026

Conversation

@alexkroman

Collaborator

Fixes uncovered while driving assembly code (and assembly live) against the LLM Gateway, plus the gpt-5.1 default switch. Each gateway issue is also reported upstream for a server-side fix; the CLI-side changes keep it resilient regardless of the upstream timeline, and all become harmless no-ops once the gateway is fixed.

Commits

  • read_skill tool + empty tool-args workaround (57bea22) — the skills middleware loads skills from ~/.claude/skills but told the model to open them with the cwd-bound read_file; added a dedicated read_skill tool. Also: the gateway drops Anthropic's required tool_use.input when OpenAI arguments is empty (""/"{}"), 400/500-ing and wedging the session — outgoing empty args now get a placeholder.
  • Drop spurious blank tool-call delta (11b2027) — every streamed turn starts with an empty tool-call delta {"function":{"id":"","name":"","arguments":""}}; on a pure-text turn that became a tool call with name="" (Error: is not a valid tool). Now dropped before langchain sees it.
  • Voice readback interruptible on the daemon thread (6ff3241) — readback played on a daemon thread where neither Ctrl-C's KeyboardInterrupt nor the between-synth-chunks flag check could stop it; the cancel flag is now polled during playback so Ctrl-C interrupts speaking.
  • Default assembly code and assembly live to gpt-5.1 (8819df6) — both override with --model; assembly llm is unchanged. Verified the gateway accepts gpt-5.1; --help snapshot regenerated.
  • Test split + type narrowing (1d81481) — moved the model tests into test_code_model.py (the original was over the 500-line gate) and narrowed types.
  • WIP: Firecrawl search + live agent MCP tools (c0c8ad2) — @alexkroman's in-progress work, which rides along on this branch (now gate-clean).
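
For the read_skill change in the first commit, the commit message describes a read-only tool bound to the skills directory with a traversal guard. A minimal sketch of such a guard might look like the following. The function signature, the error strings, and the `SKILLS_ROOT` constant are assumptions for illustration, not the CLI's actual code.

```python
from pathlib import Path

# Assumed root: the PR only says skills are loaded from ~/.claude/skills.
SKILLS_ROOT = Path.home() / ".claude" / "skills"


def read_skill(relative_path: str, root: Path = SKILLS_ROOT) -> str:
    """Return the text of a file under `root`, refusing paths that escape it."""
    root = root.resolve()
    candidate = (root / relative_path).resolve()
    # Traversal guard: once ".." segments and symlinks are resolved, the
    # target must still live inside the skills root.
    if not candidate.is_relative_to(root):
        return f"Error: {relative_path!r} is outside the skills directory"
    if not candidate.is_file():
        return f"Error: {relative_path!r} not found"
    return candidate.read_text(encoding="utf-8")
```

Resolving before comparing is the point of the design: a plain string-prefix check would accept a path such as `demo/../../secret.txt`. `Path.is_relative_to` requires Python 3.9 or later.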

Verification

./scripts/check.sh: All checks passed (ruff/mypy/pyright/xenon/import-linter, 100% patch coverage, mutation gate with 13 mutants, escape-hatch gate, build + twine).

The streaming tool-call id fix from this investigation already landed as #247.

🤖 Generated with Claude Code

alexkroman-assembly and others added 6 commits June 18, 2026 15:26
… empty tool-args

Two `assembly code` fixes uncovered while building a voice agent:

1. read_skill tool. The skills middleware loads skills from its own backend
   rooted at ~/.claude/skills, but deepagents' stock prompt tells the model to
   open each SKILL.md with `read_file` — which is bound to the cwd sandbox and
   can't reach them, so the model got `File '/aai-cli/SKILL.md' not found`. Add
   a read-only `read_skill` tool bound to the skills directory (with a traversal
   guard) and a prompt that points the model at it. build_skills() now returns
   the (middleware, tool) pair, wired together in _build_agent.

2. Empty tool-call arguments. The LLM Gateway maps OpenAI `arguments` onto
   Anthropic `tool_use.input` but drops `input` entirely when arguments are
   empty (""/"{}"), which Anthropic rejects (400, surfaced as 500 when
   streaming) — and because the failing call sits in history, every later turn
   fails too, wedging the session. _ensure_tool_call_arguments substitutes a
   minimal non-empty placeholder in the outgoing payload so the gateway emits a
   valid input. Request-only; the tool already ran locally with its real args.
   (Reported upstream for a server-side fix; this keeps the CLI resilient.)
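
The empty-arguments workaround in item 2 can be sketched roughly as below, assuming an OpenAI-style chat payload whose assistant messages carry `tool_calls[*].function.arguments` as a JSON string. The function name `fill_empty_tool_arguments` and the placeholder value are hypothetical; the commit only says the real `_ensure_tool_call_arguments` substitutes a minimal non-empty placeholder.

```python
import copy
import json
from typing import Any

# Hypothetical placeholder; the commit does not state the value actually used.
PLACEHOLDER_ARGUMENTS = json.dumps({"_": ""})


def _is_empty_arguments(arguments: Any) -> bool:
    """True for missing, "", "{}" (or any JSON that decodes to an empty object)."""
    if arguments is None or arguments == "":
        return True
    if isinstance(arguments, str):
        try:
            return json.loads(arguments) == {}
        except json.JSONDecodeError:
            return False
    return arguments == {}


def fill_empty_tool_arguments(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `payload` in which every tool call has non-empty arguments."""
    patched = copy.deepcopy(payload)  # request-only: stored history is untouched
    for message in patched.get("messages", []):
        for call in message.get("tool_calls") or []:
            function = call.get("function")
            if isinstance(function, dict) and _is_empty_arguments(function.get("arguments")):
                function["arguments"] = PLACEHOLDER_ARGUMENTS
    return patched
```

Working on a deep copy mirrors the "request-only" property described above: the conversation history keeps the arguments the tool really ran with, and only the outgoing request is altered.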

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…elta

Every streaming turn (when tools are available) begins with an empty tool-call
delta — {"function": {"id": "", "name": "", "arguments": ""}}. On a pure-text
turn (e.g. the agent asking clarifying questions) no real tool call follows, so
langchain is left with a tool call whose name is "", deepagents dispatches it,
and the turn dies with `Error:  is not a valid tool`.

Extend the streaming normalizer to drop any tool-call delta with no name, id, or
arguments before langchain converts the chunk (this also harmlessly drops the
gateway's empty argument-continuation deltas). A real text+tool turn still yields
exactly one correct tool call; a pure-text turn yields none.

Reported upstream for a server-side fix; this keeps the CLI resilient meanwhile.
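
The filtering described above can be illustrated with a small sketch. It assumes streamed chunks arrive as OpenAI-style dicts with `choices[*].delta.tool_calls`, and it checks for an id both at the top level of the call and inside `function`, since the delta quoted above carries it there. The function names are hypothetical and this is not the CLI's actual normalizer.

```python
from typing import Any


def _is_blank_tool_call(call: dict[str, Any]) -> bool:
    """True when a tool-call delta has no id, no name, and no arguments."""
    function = call.get("function")
    if not isinstance(function, dict):
        function = {}
    fields = (
        call.get("id"),
        function.get("id"),
        function.get("name"),
        function.get("arguments"),
    )
    return not any(fields)


def drop_blank_tool_call_deltas(chunk: dict[str, Any]) -> dict[str, Any]:
    """Remove blank tool-call deltas from a streamed chunk, in place."""
    for choice in chunk.get("choices", []):
        delta = choice.get("delta")
        if not isinstance(delta, dict):
            continue
        calls = delta.get("tool_calls")
        if not isinstance(calls, list):
            continue
        kept = [c for c in calls if not (isinstance(c, dict) and _is_blank_tool_call(c))]
        if kept:
            delta["tool_calls"] = kept
        else:
            # No real tool call left: drop the key so a pure-text turn stays pure text.
            del delta["tool_calls"]
    return chunk
```

A delta that carries any one of the fields survives, so a real call's first delta (with a name or id) and its later non-empty argument fragments pass through unchanged.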

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The coding-agent TUI plays each spoken reply on a daemon thread. Its two cancel
channels both failed there: PcmPlayer chunks writes "so a Ctrl-C lands between
them", but that relies on KeyboardInterrupt reaching the *playing* thread — true
for the foreground `assembly speak` CLI, not the TUI, where Ctrl-C is handled by
Textual on the UI thread. The only cross-thread signal, the `_cancel` event, was
checked solely between synthesizer chunks (the feed wrapper), never during
sounddevice's blocking playback. So readback was effectively uninterruptible:
Ctrl-C did nothing, and the daemon thread stayed blocked in speak() instead of
advancing to listen for the next turn.

Poll the cancel flag inside PcmPlayer's piece-write loop (abort the device and
drop the rest of the chunk when set), and have voice.speak hand the player a live
poll of `_cancel`. Cancellation is now honored within ~one 4 KiB piece (~85 ms)
regardless of which thread the interrupt arrives on. The poll is optional, so
`assembly speak` (foreground, KeyboardInterrupt-driven) is unchanged.
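
A rough sketch of the piece-write loop with an optional cancel poll follows. The `OutputDevice` interface is a hypothetical stand-in rather than sounddevice's real API, and the 24 kHz, 16-bit mono format used in the latency arithmetic is an assumption chosen because it reproduces the ~85 ms figure; the commit does not state the audio format.

```python
from typing import Callable, Optional, Protocol

PIECE_BYTES = 4096  # 4 KiB piece size, as quoted in the commit message


class OutputDevice(Protocol):
    """Hypothetical minimal device interface used only for this sketch."""

    def write(self, data: bytes) -> None: ...

    def abort(self) -> None: ...


def piece_duration_ms(sample_rate_hz: int, bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Playback time of one piece, which approximates the cancel latency."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return PIECE_BYTES / bytes_per_second * 1000.0


def play_chunk(
    device: OutputDevice,
    chunk: bytes,
    cancelled: Optional[Callable[[], bool]] = None,
) -> bool:
    """Write `chunk` piece by piece; return False if playback was cancelled."""
    for start in range(0, len(chunk), PIECE_BYTES):
        # The poll is optional: with no callback the loop behaves as before.
        if cancelled is not None and cancelled():
            device.abort()  # stop the device and drop the rest of the chunk
            return False
        device.write(chunk[start : start + PIECE_BYTES])
    return True
```

A caller on another thread could pass `cancelled=event.is_set` for a `threading.Event`, which works no matter which thread sets the event. Under the assumed format, `piece_duration_ms(24000)` is 4096 / 48000 s, about 85.3 ms.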

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Switch the default LLM Gateway model for `assembly code` (code_agent/prompt.py)
and `assembly live` (agent_cascade/config.py) to gpt-5.1. The live default is now
a literal rather than llm.DEFAULT_MODEL, so it's independent of the one-shot
`assembly llm` default (still claude-haiku). Both override with --model.

Verified gpt-5.1 is accepted by the gateway. Updated the cascade config test and
regenerated the `--help` snapshot (both commands show [default: gpt-5.1]).
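
The independence of the defaults can be pictured with a small stand-in. This sketch uses argparse purely for illustration (the CLI's real option framework is not stated here), and the constant names are hypothetical; only the model names and the `--model` flag come from the commit message.

```python
import argparse

# Each command owns its default; the live default is a literal rather than an
# alias of the llm default, so changing one does not move the other.
LLM_DEFAULT_MODEL = "claude-haiku"  # one-shot `assembly llm` default, as named above
CODE_DEFAULT_MODEL = "gpt-5.1"
LIVE_DEFAULT_MODEL = "gpt-5.1"


def build_parser() -> argparse.ArgumentParser:
    """Build a toy `assembly` parser with a per-command --model default."""
    parser = argparse.ArgumentParser(prog="assembly")
    commands = parser.add_subparsers(dest="command", required=True)
    defaults = (
        ("code", CODE_DEFAULT_MODEL),
        ("live", LIVE_DEFAULT_MODEL),
        ("llm", LLM_DEFAULT_MODEL),
    )
    for name, default in defaults:
        command = commands.add_parser(name)
        command.add_argument(
            "--model",
            default=default,
            help="LLM Gateway model (default: %(default)s)",
        )
    return parser
```

With this layout, `build_parser().parse_args(["live"]).model` falls back to the live literal, while an explicit `--model` always wins.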

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Make the branch gate-clean now that the firecrawl WIP no longer blocks it:
- Move the code_agent/model.py unit tests out of test_code_agent.py (which had
  grown past the 500-line file-length gate) into a new test_code_model.py, and
  add it to pyrightconfig.tests.json's ignore list alongside the other
  langchain-boundary test files.
- Narrow `delta` to dict in _hoist_in_choice via early returns so mypy accepts
  the in-place `delta["tool_calls"]` assignment, and isinstance-narrow the chat
  model in the payload/convert-chunk tests.
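
The early-return narrowing mentioned for `_hoist_in_choice` follows a common typing pattern, sketched below. The function `attach_tool_calls` and its signature are hypothetical and only illustrate the pattern; what the real helper hoists is not shown here.

```python
from collections.abc import Mapping
from typing import Any


def attach_tool_calls(choice: Mapping[str, object], tool_calls: list[dict[str, Any]]) -> bool:
    """Assign `tool_calls` into the choice's delta; return True if it was set."""
    delta = choice.get("delta")  # static type at this point: object | None
    if not isinstance(delta, dict):
        return False  # early return: there is no dict to assign into
    if not tool_calls:
        return False  # early return: nothing to assign
    # Past both returns, type checkers treat `delta` as a dict, so the
    # in-place item assignment type-checks without a cast.
    delta["tool_calls"] = tool_calls
    return True
```

Without the `isinstance` check, a checker would reject item assignment on a value typed as `object`; the early return narrows the type for everything after it.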

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- code_agent/tui.py: guard on_worker_state_changed on is_running — a turn worker
  can finish after the app tears down (quit / test run_test exit), and driving
  _finish_turn then queries an unmounted DOM (NoMatches on "#spinner"). Skip it
  when the app isn't running. Fixes test_code_tui_voice flakiness on Windows.
- test_live_tui.py: the reply-text assertion waited only for the AssistantMessage
  widget to mount, but agent_transcript sets the text via a separate
  call_from_thread hop — so the wait raced the text on a slow runner. Wait for the
  text itself.
- pyrightconfig.tests.json: ignore test_code_tui_voice.py alongside the other
  Textual-boundary test files (it now duck-types a Worker.StateChanged event).
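
The is_running guard from the first bullet can be pictured with a framework-free stand-in. `TurnUI` below is a hypothetical class, not the Textual app; the `RuntimeError` stands in for the NoMatches error raised when an unmounted DOM is queried, and the boolean argument replaces the real worker state-change event.

```python
class TurnUI:
    """Toy app: `is_running` becomes False once the app has torn down."""

    def __init__(self) -> None:
        self.is_running = True
        self.finished_turns = 0

    def _finish_turn(self) -> None:
        if not self.is_running:
            # Stand-in for querying "#spinner" on an unmounted DOM.
            raise RuntimeError("DOM is unmounted")
        self.finished_turns += 1

    def on_worker_state_changed(self, worker_finished: bool) -> None:
        # Guard: a worker can finish after teardown; skip UI work in that case.
        if not self.is_running:
            return
        if worker_finished:
            self._finish_turn()
```

The guard sits in the event handler rather than in `_finish_turn`, so late worker events are ignored before any UI code runs.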

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexkroman alexkroman added this pull request to the merge queue Jun 18, 2026
Merged via the queue into main with commit c765bc1 Jun 18, 2026
20 checks passed
@alexkroman alexkroman deleted the code-agent-fixes branch June 18, 2026 22:58
@alexkroman alexkroman restored the code-agent-fixes branch June 18, 2026 23:24
@alexkroman alexkroman deleted the code-agent-fixes branch June 18, 2026 23:30

2 participants