assembly code/live: gateway tool-calling resilience, voice interruptibility, gpt-5.1 default #251
Merged
… empty tool-args
Two `assembly code` fixes uncovered while building a voice agent:
1. read_skill tool. The skills middleware loads skills from its own backend
rooted at ~/.claude/skills, but deepagents' stock prompt tells the model to
open each SKILL.md with `read_file` — which is bound to the cwd sandbox and
can't reach them, so the model got `File '/aai-cli/SKILL.md' not found`. Add
a read-only `read_skill` tool bound to the skills directory (with a traversal
guard) and a prompt that points the model at it. build_skills() now returns
the (middleware, tool) pair, wired together in _build_agent.
2. Empty tool-call arguments. The LLM Gateway maps OpenAI `arguments` onto
Anthropic `tool_use.input` but drops `input` entirely when arguments are
empty (""/"{}"), which Anthropic rejects (400, surfaced as 500 when
streaming) — and because the failing call sits in history, every later turn
fails too, wedging the session. _ensure_tool_call_arguments substitutes a
minimal non-empty placeholder in the outgoing payload so the gateway emits a
valid input. Request-only; the tool already ran locally with its real args.
(Reported upstream for a server-side fix; this keeps the CLI resilient.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
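The two fixes in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the function names `read_skill` and `ensure_tool_call_arguments`, their signatures, and the `PLACEHOLDER_ARGUMENTS` value are hypothetical, and the messages are assumed to be OpenAI-style dicts whose tool calls carry `function.arguments` as a JSON string.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical placeholder; the value the CLI actually sends is not shown in this PR.
PLACEHOLDER_ARGUMENTS = json.dumps({"_": ""})


def read_skill(skills_root: Path, relative_path: str) -> str:
    """Read a file under skills_root, refusing any path that escapes it."""
    root = skills_root.resolve()
    target = (root / relative_path).resolve()
    # Traversal guard: "../x" and absolute paths resolve outside the root.
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes the skills directory: {relative_path}")
    return target.read_text(encoding="utf-8")


def _is_empty_arguments(arguments: Any) -> bool:
    """True for None, "", whitespace, or a JSON string encoding an empty object."""
    if not isinstance(arguments, str):
        return not arguments
    text = arguments.strip()
    if not text:
        return True
    try:
        return json.loads(text) == {}
    except json.JSONDecodeError:
        return False


def ensure_tool_call_arguments(messages: list[dict]) -> list[dict]:
    """Return a copy of the outgoing messages with empty tool-call args replaced."""
    patched = []
    for message in messages:
        calls = message.get("tool_calls")
        if not calls:
            patched.append(message)
            continue
        new_calls = []
        for call in calls:
            function = dict(call.get("function", {}))  # copy; never mutate history
            if _is_empty_arguments(function.get("arguments")):
                function["arguments"] = PLACEHOLDER_ARGUMENTS
            new_calls.append({**call, "function": function})
        patched.append({**message, "tool_calls": new_calls})
    return patched
```

Because the sketch copies each patched message instead of editing it in place, the stored history keeps the real (empty) arguments and only the request payload changes, which matches the "request-only" intent described above.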
…elta
Every streaming turn (when tools are available) begins with an empty tool-call
delta — {"function": {"id": "", "name": "", "arguments": ""}}. On a pure-text
turn (e.g. the agent asking clarifying questions) no real tool call follows, so
langchain is left with a tool call whose name is "", deepagents dispatches it,
and the turn dies with `Error: is not a valid tool`.
Extend the streaming normalizer to drop any tool-call delta with no name, id, or
arguments before langchain converts the chunk (this also harmlessly drops the
gateway's empty argument-continuation deltas). A real text+tool turn still yields
exactly one correct tool call; a pure-text turn yields none.
Reported upstream for a server-side fix; this keeps the CLI resilient meanwhile.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
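A rough sketch of the filtering step described in this commit, assuming the chunk is an OpenAI-style streaming dict (`choices[].delta.tool_calls`). The function name `drop_empty_tool_call_deltas` is hypothetical, and the check looks for `id` both at the top level and inside `function`, since the delta quoted above carries it inside `function`.

```python
from typing import Any


def _is_empty_tool_call(call: dict[str, Any]) -> bool:
    """True when the delta carries no id, no name, and no arguments."""
    function = call.get("function") or {}
    return not (
        call.get("id")
        or function.get("id")
        or function.get("name")
        or function.get("arguments")
    )


def drop_empty_tool_call_deltas(chunk: dict[str, Any]) -> dict[str, Any]:
    """Remove empty tool-call deltas from a streamed chunk, in place."""
    for choice in chunk.get("choices", []):
        delta = choice.get("delta")
        if not isinstance(delta, dict):
            continue
        calls = delta.get("tool_calls")
        if not isinstance(calls, list):
            continue
        kept = [
            call
            for call in calls
            if not (isinstance(call, dict) and _is_empty_tool_call(call))
        ]
        if kept:
            delta["tool_calls"] = kept
        else:
            # Nothing real is left, so the chunk reads as plain text.
            delta.pop("tool_calls", None)
    return chunk
```

Continuation deltas that carry a non-empty `arguments` fragment pass through untouched, so a real tool call still assembles from its pieces; only deltas with nothing in them are removed.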
The coding-agent TUI plays each spoken reply on a daemon thread. Its two cancel channels both failed there:
- PcmPlayer chunks writes "so a Ctrl-C lands between them", but that relies on KeyboardInterrupt reaching the *playing* thread. That holds for the foreground `assembly speak` CLI, not for the TUI, where Ctrl-C is handled by Textual on the UI thread.
- The only cross-thread signal, the `_cancel` event, was checked solely between synthesizer chunks (the feed wrapper), never during sounddevice's blocking playback.

So readback was effectively uninterruptible: Ctrl-C did nothing, and the daemon thread stayed blocked in speak() instead of advancing to listen for the next turn.

Poll the cancel flag inside PcmPlayer's piece-write loop (abort the device and drop the rest of the chunk when set), and have voice.speak hand the player a live poll of `_cancel`. Cancellation is now honored within ~one 4 KiB piece (~85 ms) regardless of which thread the interrupt arrives on. The poll is optional, so `assembly speak` (foreground, KeyboardInterrupt-driven) is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
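The piece-write loop with an optional cancel poll might look roughly like this. It is a sketch under assumptions, not PcmPlayer's actual code: `write_chunk`, `OutputDevice`, and `FakeDevice` are hypothetical names, and `FakeDevice` stands in for the real audio stream so the example runs without audio hardware. The ~85 ms figure is consistent with 24 kHz, 16-bit mono PCM (4096 / (24000 × 2) ≈ 0.085 s), though the PR does not state the audio format.

```python
import threading
from typing import Callable, Optional, Protocol

PIECE_BYTES = 4096  # 4 KiB per write, as described in the commit message


class OutputDevice(Protocol):
    """Minimal interface this sketch assumes the audio stream offers."""

    def write(self, data: bytes) -> None: ...
    def abort(self) -> None: ...


class FakeDevice:
    """Stand-in for a real output stream; records what was written."""

    def __init__(self) -> None:
        self.pieces: list[bytes] = []
        self.aborted = False

    def write(self, data: bytes) -> None:
        self.pieces.append(data)

    def abort(self) -> None:
        self.aborted = True


def write_chunk(
    device: OutputDevice,
    chunk: bytes,
    should_cancel: Optional[Callable[[], bool]] = None,
) -> bool:
    """Write chunk in 4 KiB pieces; return False if cancelled part-way."""
    for start in range(0, len(chunk), PIECE_BYTES):
        # Poll before every piece so a flag set on another thread is honored
        # within one piece. Without a poll, behavior is the plain write loop.
        if should_cancel is not None and should_cancel():
            device.abort()  # discard buffered audio, drop the rest of the chunk
            return False
        device.write(chunk[start:start + PIECE_BYTES])
    return True


# Usage: the UI thread sets the event; the playback thread polls it.
cancel = threading.Event()
device = FakeDevice()
finished = write_chunk(device, b"\x00" * 10_000, cancel.is_set)
```

Passing `cancel.is_set` as a callable rather than the event itself keeps the player unaware of threading details, and leaving the parameter as `None` preserves the foreground, KeyboardInterrupt-driven path.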
Switch the default LLM Gateway model for `assembly code` (code_agent/prompt.py) and `assembly live` (agent_cascade/config.py) to gpt-5.1. The live default is now a literal rather than llm.DEFAULT_MODEL, so it's independent of the one-shot `assembly llm` default (still claude-haiku). Both override with --model.

Verified gpt-5.1 is accepted by the gateway. Updated the cascade config test and regenerated the `--help` snapshot (both commands show [default: gpt-5.1]).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the branch gate-clean now that the firecrawl WIP no longer blocks it:
- Move the code_agent/model.py unit tests out of test_code_agent.py (which had grown past the 500-line file-length gate) into a new test_code_model.py, and add it to pyrightconfig.tests.json's ignore list alongside the other langchain-boundary test files.
- Narrow `delta` to dict in _hoist_in_choice via early returns so mypy accepts the in-place `delta["tool_calls"]` assignment, and isinstance-narrow the chat model in the payload/convert-chunk tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- code_agent/tui.py: guard on_worker_state_changed on is_running. A turn worker can finish after the app tears down (quit / test run_test exit), and driving _finish_turn then queries an unmounted DOM (NoMatches on "#spinner"). Skip it when the app isn't running. Fixes test_code_tui_voice flakiness on Windows.
- test_live_tui.py: the reply-text assertion waited only for the AssistantMessage widget to mount, but agent_transcript sets the text via a separate call_from_thread hop, so the wait raced the text on a slow runner. Wait for the text itself.
- pyrightconfig.tests.json: ignore test_code_tui_voice.py alongside the other Textual-boundary test files (it now duck-types a Worker.StateChanged event).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes uncovered while driving `assembly code` (and `assembly live`) against the LLM Gateway, plus the gpt-5.1 default switch. Each gateway issue is also reported upstream for a server-side fix; the CLI-side changes keep it resilient regardless of timeline, and all are idempotent once the gateway is fixed.

Commits
- `read_skill` tool + empty tool-args workaround (57bea22): the skills middleware loads skills from `~/.claude/skills` but told the model to open them with the cwd-bound `read_file`; added a dedicated `read_skill` tool. Also: the gateway drops Anthropic's required `tool_use.input` when OpenAI `arguments` is empty (`""`/`"{}"`), 400/500-ing and wedging the session; outgoing empty args now get a placeholder.
- (11b2027): every streamed turn starts with an empty tool-call delta `{"function":{"id":"","name":"","arguments":""}}`; on a pure-text turn that became a tool call with `name=""` (`Error: is not a valid tool`). Now dropped before langchain sees it.
- (6ff3241): readback played on a daemon thread where neither Ctrl-C's `KeyboardInterrupt` nor the between-synth-chunks flag check could stop it; the cancel flag is now polled during playback so Ctrl-C interrupts speaking.
- `assembly code` and `assembly live` default to gpt-5.1 (8819df6): both override with `--model`; `assembly llm` is unchanged. Verified the gateway accepts `gpt-5.1`; `--help` snapshot regenerated.
- (1d81481): moved the model tests into `test_code_model.py` (the original was over the 500-line gate) and narrowed types.
- `WIP: Firecrawl search + live agent MCP tools` (c0c8ad2): @alexkroman's in-progress work, rides along on this branch (now gate-clean).

Verification
`./scripts/check.sh` → All checks passed: ruff/mypy/pyright/xenon/import-linter, 100% patch coverage, mutation gate (13 mutants), escape-hatch gate, build + twine.

The streaming tool-call `id` fix from this investigation already landed as #247.

🤖 Generated with Claude Code