Skip to content

Support streaming tool output and deduplication#343

Draft
timvisher-dd wants to merge 33 commits intoxenodium:mainfrom
timvisher-dd:streaming-dedup
Draft

Support streaming tool output and deduplication#343
timvisher-dd wants to merge 33 commits intoxenodium:mainfrom
timvisher-dd:streaming-dedup

Conversation

@timvisher-dd
Copy link
Contributor

@timvisher-dd timvisher-dd commented Feb 26, 2026

Fixes #342

See that issue for the full problem description, root cause analysis, and perf measurements.

Depends on xenodium/acp.el#15 (adds :terminal-capability and :meta-capabilities to acp-make-initialize-request).

Checklist

  • I agree to communicate (PR description and comments) with the author myself (not AI-generated).
  • I've reviewed all code in PR myself and will vouch for its quality.
  • I've read and followed the Contributing guidelines.
  • I've filed a feature request/discussion for a new feature.
  • I've added tests where applicable.
  • I've run M-x checkdoc and M-x byte-compile-file.

Implementation

New files

agent-shell-meta.el — extractors for ACP _meta payloads:

  • agent-shell--meta-lookup — key lookup handling both symbol and string keys in alists.
  • agent-shell--meta-find-tool-response — walks any _meta namespace to find a toolResponse value (used by claude-agent-acp).
  • agent-shell--tool-call-meta-response-text — extracts stdout text from _meta.*.toolResponse in its various shapes (string, alist with stdout key, vector of content blocks).
  • agent-shell--tool-call-terminal-output-data — extracts _meta.terminal_output.data (used by codex-acp for incremental streaming chunks).

agent-shell-streaming.el — streaming tool call update handler:

  • agent-shell--tool-call-normalize-output — strips markdown fences, strips <persisted-output> XML tags (rendering the preview with font-lock-comment-face), and ensures trailing newlines.
  • agent-shell--append-tool-call-output — accumulates streamed output in the state's :tool-calls hash under an :accumulated key per tool call ID.
  • agent-shell--handle-tool-call-update-streaming — the main handler, replacing the inline tool_call_update block in agent-shell.el. Three branches:
    1. Terminal data (_meta.terminal_output.data): normalize the chunk, accumulate it, and immediately append it to the fragment body for live streaming.
    2. Meta response (_meta.*.toolResponse): normalize and accumulate silently (rendered only on final update to avoid duplication).
    3. Final update (status is "completed" or "failed"): render accumulated output (or fall back to content text), log to transcript, clean up permission dialogs, and apply title/label updates.
  • agent-shell--mark-tool-calls-cancelled — marks all in-progress tool calls as cancelled (called from agent-shell-interrupt).

Changes to agent-shell.el

  • (require 'agent-shell-streaming) added.
  • The ~50-line inline tool_call_update rendering block (lines 1290-1346 on main) is replaced by a single call to agent-shell--handle-tool-call-update-streaming. The metadata save (title/description/command/raw-input/diff) remains inline before the handler call.
  • The initialize request now passes :terminal-capability t and :meta-capabilities '((terminal_output . t)) to acp-make-initialize-request.
  • agent-shell-interrupt calls agent-shell--mark-tool-calls-cancelled after sending the cancel notification.
  • shell-maker-define-major-mode call passes 'agent-shell-mode-map (quoted symbol) instead of the bare variable.

Tests

7 new tests in tests/agent-shell-streaming-tests.el:

  • agent-shell--tool-call-meta-response-text-test — extracts text from _meta.claudeCode.toolResponse.stdout.
  • agent-shell--tool-call-normalize-output-test — strips fences and ensures trailing newline.
  • agent-shell--tool-call-normalize-output-persisted-output-test — strips <persisted-output> tags.
  • agent-shell--tool-call-update-writes-output-test — verifies accumulated output is written to the fragment body.
  • agent-shell--tool-call-meta-response-no-duplication-test — meta response text is rendered once, not duplicated with content.
  • agent-shell-initialize-request-meta-capabilities-test — the initialize request includes _meta.terminal_output.
  • agent-shell--tool-call-terminal-output-data-streaming-test — codex-style _meta.terminal_output.data chunks are accumulated and rendered incrementally.

@timvisher-dd timvisher-dd changed the title # Support streaming tool output and deduplication Support streaming tool output and deduplication Feb 26, 2026
@xenodium
Copy link
Owner

I appreciate the contribution, but please be mindful there's a lot being sent my way here, from long issue descriptions to chunky PRs. When I ask folks to file a feature request before sending a PR, I don't mean it to be just a checkbox item so folks send the PR soon after that.

From CONTRIBUTING.org

Before implementing new features, please file a feature request first to discuss the proposal.

There's a span of minutes between the feature request/issue and the PRs submitted for review.

According to commit logs, I see this feature was all built today. Being a larger contribution, please use it for some time before sending it my way. Let it settle for some time. Use it. It takes a lot of concentration for me, energy, and time to parse of all this. I am maintaining this package, so I need to understand the contributions sent my way and that takes a significant effort. Please be mindful.

@timvisher-dd
Copy link
Contributor Author

I appreciate the contribution, but please be mindful there's a lot being sent my way here, from long issue descriptions to chunky PRs. When I ask folks to file a feature request before sending a PR, I don't mean it to be just a checkbox item so folks send the PR soon after that.

From CONTRIBUTING.org

Before implementing new features, please file a feature request first to discuss the proposal.

There's a span of minutes between the feature request/issue and the PRs submitted for review.

So my intent was to be respectful of your wish to use an Issue to discuss the feature and the PR is opened in Draft to indicate that you shouldn't review it yet unless you wish. It's meant to be a both/and offering rather than to indicate that you should feel the need to read both.

If you'd prefer I can keep the draft changeset entirely in my local dev integration branches until you tell me explicitly that I may open a PR. But from my perspective the 'this code might answer this issue if we both agree that the issue is real and make sense' is a useful thing especially when I'm mostly opening the issue because I've fixed it for myself.

According to commit logs, I see this feature was all built today. Being a larger contribution, please use it for some time before sending it my way. Let it settle for some time. Use it.

The commit logs show that just because I rebase and push every time I work to be sure that I'm not in conflict with anything happening on trunk. I assure you that I'm not opening Issues or PRs without using the changes I'm making for at least a few days.

It takes a lot of concentration for me, energy, and time to parse of all this. I am maintaining this package, so I need to understand the contributions sent my way and that takes a significant effort. Please be mindful.

I wonder if there's some way to set a 'max in flight Issues/PRs' setting on the repo? To me I'm opening issues as I see and solve them for myself and I have zero expectation that you'll spend any more time than is reasonable for you on it. If that means my Issues/PRs sit unwatched for months then so be it. That's the nature of open source. Glob knows my own open source projects often will go that long with zero attention paid by me.

That said I'm unsure how better to indicate to you that something's available for you to look at. Like if we had a discord somewhere I'd still be sending you a message 'Hey this thing maybe could use your attention' and then you'd have to decide whether you have any to spare at that time or not. Again it's just kind of the nature of an open source community.

I guess finally if I'm pushing too much into your attention we could go where I just work on my fork and integrate your work and you can check my fork every now and then to see if anything interesting has landed. That's a reasonable stance for you to take as well.

LMK how you'd like to proceed! Really appreciate all the effort you've put into this so far. :)

@timvisher-dd timvisher-dd force-pushed the streaming-dedup branch 3 times, most recently from 16a7ad0 to ae7e165 Compare March 3, 2026 20:21
timvisher-dd and others added 22 commits March 12, 2026 14:55
Tests for agent-shell-usage.el covering notification updates,
context usage indicator rendering, number formatting, and a
compaction replay driven by real observed ACP traffic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the global `agent-shell--state` function redefinition with
a `cl-letf` wrapper macro scoped to each test, preventing
order-dependent side effects across the test suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port 6 tests from the add-terminal-caps branch adapted for
non-terminal streaming/dedup:

- meta-response-text: extract toolResponse from _meta namespaces
- normalize-output trailing newline and persisted-output stripping
- tool-call-update writes output to buffer
- meta-response stdout no-duplication (simplified, no terminal traffic)
- initialize request includes terminal_output meta capability

agent-shell-meta.el is a stub (functions return nil) so the tests
load but fail at assertion time, giving a clean set of failures to
drive the implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace stubs with real implementations of meta-lookup,
meta-find-tool-response, and tool-call-meta-response-text.
Handles three toolResponse shapes: stdout alist, content alist,
and text vector. Omits terminal-specific tool-call-terminal-output-data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Streaming tool call handler that accumulates incremental output
from _meta.*.toolResponse and renders it once on the final update,
avoiding duplicate output.

Includes: normalize-output, content-text extraction, chunk
accumulation (append/output-text/clear/markers), streaming handler
with two cond branches (accumulate non-final, render final), and
mark-tool-calls-cancelled.

Uses declare-function for agent-shell.el functions that this
module calls but does not define.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Require agent-shell-streaming alongside other modules
- Replace inline tool_call_update rendering with
  agent-shell--handle-tool-call-update-streaming (preserves
  rich metadata save and emit-event, delegates rendering)
- Add agent-shell--mark-tool-calls-cancelled in agent-shell-interrupt
- Advertise terminal_output meta capability in initialize request
  so agents send toolResponse.stdout streaming updates
- Fix shell-maker-define-major-mode call to pass quoted symbol
  instead of keymap value (prevents void-function keymap error)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port agent-shell--tool-call-terminal-output-data to
agent-shell-meta.el for extracting _meta.terminal_output.data
from updates. Add terminal-data cond branch to the streaming
handler so codex-acp's incremental output streams live to the
buffer.

Verified with perf demo: codex streaming shows
terminal_chunks=10007, matching the source branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test mirroring codex-acp's notification pattern: tool_call with
terminal content, incremental terminal_output.data chunks, then
completed update. Asserts both chunks render and no duplication.

Fix: remove already-has-output guard from terminal-data branch.
Without the full terminal protocol, all terminal chunks arrive via
the streaming handler (not via a separate terminal process), so
each chunk is new incremental data, not a duplicate of stdout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- meta-find-tool-response: remove dead cond branch where
  (listp (cdr entry)) is redundant with (consp (cdr entry))
- tool-call-terminal-output-data: remove redundant string-key
  fallback (meta-lookup already handles symbol-to-string)
- tool-call-clear-output: remove dead :output-last field
- Move codex test after file header so lexical-binding cookie
  is on line 1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The variable was renamed during conflict resolution but one reference
in the session-resume display code was missed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous fix commits were tailored to the old rebased code.
On clean upstream/main, acp-session (not acp-selected-session) is the
correct variable, and agent-shell--select-session-to-load doesn't
exist — tests should use agent-shell--prompt-select-session.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex-acp sends cumulative re-deliveries of thought text after
incremental tokens, causing doubled output in the buffer.
agent-shell--thought-chunk-delta detects prefix/suffix overlap and
emits only the genuinely new tail. Accumulator state resets at each
new thought group.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rebase onto upstream/main dropped (require 'agent-shell-streaming)
from agent-shell.el and unquoted the mode-map symbol passed to
shell-maker-define-major-mode.  The missing require caused
void-function errors for streaming handlers; the unquoted symbol
caused (void-function keymap) when agent-shell-mode tried to evaluate
the keymap data structure as code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
agent-shell-ui-update-fragment previously deleted and reinserted the
entire fragment on every :append t call.  This destroyed
save-excursion markers, causing point to jump to the fragment start
on each streaming token.

When appending to a fragment that already has a non-empty body, the
new path inserts directly at the body end using insert-and-inherit
(insert alone does not respect text-property stickiness, so
agent-shell-ui-section would not extend to the new text).  Collapsed
fragments inherit invisible via stickiness; expanded fragments
un-hide old trailing whitespace before inserting and re-hide new
trailing whitespace after.  Falls back to the full rebuild for first
body content, label changes, or non-append replacements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upstream's xenodium#345 fix renamed :notification to :acp-notification and
added (shell-maker-busy) guards that treat all notifications as stale
when the shell is idle.  Update every test call site and mock
shell-maker-busy to return t so notifications are processed.

Add agent-shell-ui-update-fragment-append-preserves-point-test to
assert that the append-in-place path keeps point stable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs prevented tool call status labels from updating:

1. agent-shell--tool-call-update-overrides called make-diff-info with
   :tool-call instead of :acp-tool-call.  cl-defun rejects unknown
   keywords, so handle-tool-call-final errored before reaching the
   label update — every completed tool call stayed at [wait].

2. The streaming handler only updated labels on final notifications.
   Upstream updates labels on every tool_call_update, so intermediate
   transitions like pending→in_progress ([wait]→[busy]) were missed.

The non-final label rebuild invalidates the output marker used by live
terminal streaming, so reset it afterwards to let ensure-output-marker
recreate it at the correct body-end position.

Also harden test stubs: replace (lambda (&rest _args) nil) with
cl-function stubs that validate the :acp-tool-call keyword, and add
three tests for label transitions (keyword, done, busy).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
markdown-overlays-put moves point to (point-min) of the narrowed
region — its internal parsers call goto-char without saving point.
Every streaming chunk triggered a full overlay re-process that
silently reset point to the start of the body, causing the cursor
to fight during message streaming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
markdown-overlays-put re-parses the entire body on every chunk
(remove all overlays, find source blocks, inline codes, headers,
bold, italic, etc., then recreate).  As the body grows each chunk
takes progressively longer — O(n) per chunk, O(n²) overall —
causing visible jerkiness during streaming.

For append updates, schedule overlay processing via an idle timer
(0.15s) that resets on each chunk.  Overlays only run once when
streaming pauses.  Non-append updates (new blocks, label changes)
still process synchronously.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The append-in-place optimization (e4b07bc) only checked for
append + new-body + existing-body before taking the fast path,
which inserts body text without touching labels.  When
finalize-session-init passed both :append t and a new :label-left
(in_progress → completed), the label change was silently dropped
and Starting agent stayed stuck at [busy].

Guard the in-place path so it only fires when no new labels are
provided; otherwise fall through to the full rebuild that handles
label, body, and state together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues prevented context text (gathered from the source buffer
via agent-shell--context) from being included when submitting:

1. save-excursion in agent-shell--insert-to-shell-buffer restored
   point to before the inserted text.  Add (goto-char insert-start)
   after save-excursion so point lands at the prompt where the user
   types.

2. shell-maker-with-auto-scroll-edit in agent-shell--update-fragment
   and agent-shell--update-text advanced process-mark past the context
   during subsequent fragment updates.  Fragment updates only touch
   the output area above the prompt, so save and restore process-mark
   around the macro to prevent it from being dragged past pending
   input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test that context inserted at the prompt by
agent-shell--insert-to-shell-buffer satisfies three invariants:

- Point lands at the prompt (not after the context)
- Context sits between process-mark and point-max
- A subsequent fragment update does not drag process-mark past it

Covers representative text from each context source: line (magit),
region (file path + code), files (bare path), error (diagnostic),
and markdown (fenced code blocks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upstream commit 75cc736 introduced prepending a ```console code
block with the saved shell command to the tool call body.  The
streaming extraction into handle-tool-call-final dropped this
feature.  Restore it so execute-kind tool calls display the
command that was run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
timvisher-dd and others added 8 commits March 12, 2026 14:57
Verifies that completed execute-kind tool calls with a saved
:command display a ```console fenced block in the rendered body.
Uses :active-request instead of shell-maker-busy mock to match
the upstream stale-notification guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mock

Upstream commit 4ad3dc9 replaced the (shell-maker-busy) stale-notification
guard with (map-elt state :active-request).  Update all streaming tests
to set :active-request t in state setup and remove the now-ineffective
shell-maker-busy mocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The regex used a literal backtick instead of \` (start-of-string
anchor), so it matched four-backtick sequences but missed standard
``` and ```lang fence lines.  Add test covering plain fences,
language-tagged fences, indented fences, and inline backtick
preservation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cond branch for meta-response only matched non-final updates,
so stdout sent exclusively on the completed tool_call_update was
never accumulated and handle-tool-call-final rendered empty output.
Move meta-response accumulation before the cond so it runs
regardless of final status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Silences byte-compile warning about free variable reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace cl-loop with seq-find in agent-shell-meta.el (CONTRIBUTING
  limits cl-lib to cl-defun)
- Extract store-output lambda to named helper
  agent-shell--store-tool-call-output
- Replace nreverse+copy-sequence with reverse
- Replace save-mark-and-excursion with save-excursion where mark is
  not modified
- Convert agent-shell-show-session-id from defcustom to defvar per
  "avoid premature customization" guideline
- Extract agent-shell--with-preserved-process-mark macro to DRY the
  process-mark save/restore pattern in update-fragment and update-text
- Extract markdown overlay debounce delay to defvar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix completely flat indentation in codex terminal output dedup test
- Move assertions inside cl-letf in meta-response-stdout dedup test
  so unwind-protect cleanup runs unconditionally
- Avoid shadowing agent-shell--state in let* then setq-local (removes
  "Making buffer-local while locally let-bound" warnings)
- Suppress message output in copy-session-id-test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
timvisher-dd and others added 3 commits March 13, 2026 11:40
…flag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The in-place append path (introduced in 9d991c6) copied the
leading-newline strip from the initial-insert path.  The initial
insert correctly strips them (label-to-body padding already adds
\n\n), but appended chunks carry content newlines — list-item
separators, paragraph breaks — that must be preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that single newlines (list-item separators) and double
newlines (paragraph breaks) at the start of appended chunks
survive the in-place append path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support streaming tool output and deduplication

2 participants