Defend against ACP server reporting used > size #364

Draft
timvisher-dd wants to merge 1 commit into xenodium:main from
timvisher-dd:timvisher-dd/tests/add-shell-usage-regression-tests

@timvisher-dd timvisher-dd commented Mar 3, 2026

The ACP server (claude-agent-acp) has a bug where model switches cause used to exceed size in session/update notifications. For example, switching from Opus 1M to Sonnet 200k drops size to 200000 while used keeps growing past it (observed: 419574/200000 = 209.8%). The result is nonsensical context indicators and percentages, e.g. from a real session:

 Context: 420k/200k (209.8%)
  Tokens: 32 in · 11k out · 11m cached (11m total)
    Cost: USD75.96
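
The arithmetic behind that Context line can be reproduced with a minimal Python sketch. This is not agent-shell's code (which is Emacs Lisp); `naive_percent` is a hypothetical helper that divides the reported used by size with no sanity check, which is all it takes to produce a figure above 100%.

```python
def naive_percent(used: int, size: int) -> float:
    """Return used/size as a percentage, with no sanity check (hypothetical helper)."""
    return used / size * 100


# Values observed after switching from Opus 1M to Sonnet 200k.
used, size = 419574, 200000
print(f"{naive_percent(used, size):.1f}%")  # prints "209.8%"
```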

While I intend to get a fix for this improper reporting into claude-agent-acp, agent-shell should be robust against it. To that end, when used exceeds size, the UI now signals unreliable data instead of showing nonsense:

  • Context indicator shows ? with warning face when used > size
  • Formatted usage shows raw numbers with (?) instead of a bogus percentage
  • A regression test replays the real observed ACP traffic from the model-switch scenario so this class of bug is caught going forward
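
The formatting behavior described in the bullets above could look roughly like the following Python sketch. It illustrates the idea only and is not the Emacs Lisp in agent-shell-usage.el: `abbreviate` and `format_context` are hypothetical names, the abbreviation rules are an assumption, and the `size <= 0` guard is an extra assumption not mentioned in this PR.

```python
def abbreviate(n: int) -> str:
    """Abbreviate a token count, e.g. 419574 -> '420k' (assumed format)."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.0f}m"
    if n >= 1_000:
        return f"{n / 1_000:.0f}k"
    return str(n)


def format_context(used: int, size: int) -> str:
    """Format context usage, flagging unreliable data instead of clamping."""
    raw = f"{abbreviate(used)}/{abbreviate(size)}"
    # size <= 0 is an extra guard (assumption) to avoid dividing by zero.
    if size <= 0 or used > size:
        # Inconsistent numbers: show them raw, but no percentage.
        return f"{raw} (?)"
    return f"{raw} ({used / size * 100:.1f}%)"


# Illustrative in-range value (invented, not from the real session).
print(format_context(150_000, 1_000_000))  # 150k/1m (15.0%)
# Observed value after switching from Opus 1M to Sonnet 200k.
print(format_context(419_574, 200_000))    # 420k/200k (?)
```

Showing the raw numbers with (?) rather than clamping to 100% keeps the server's inconsistency visible to the user instead of hiding it.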

This also adds comprehensive ERT test coverage for agent-shell-usage.el: notification updates, indicator scaling/colors, compaction replay, token saving, and number formatting.

For the claude-agent-acp side of this see zed-industries/claude-agent-acp#412

Test plan

  • All 21 ERT tests pass
  • checkdoc clean on both files
  • byte-compile clean on both files
  • Manual baking verification

@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from e528a97 to 3e99679 Compare March 3, 2026 19:49
@timvisher-dd timvisher-dd changed the title Add usage tracking tests and fix existing test failures Add usage tracking and context indicator regression tests Mar 3, 2026
@timvisher-dd timvisher-dd marked this pull request as ready for review March 3, 2026 20:08
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 3e99679 to 6d914d4 Compare March 11, 2026 17:31
@timvisher-dd timvisher-dd changed the title Add usage tracking and context indicator regression tests Defend against ACP server reporting used > size Mar 11, 2026
@timvisher-dd timvisher-dd marked this pull request as draft March 12, 2026 15:54
Add comprehensive ERT tests for agent-shell-usage.el covering
notification updates, context indicator scaling/colors, compaction
replay, token saving, and number formatting.

The ACP server has a bug where model switches cause used to exceed
size in session/update notifications. Rather than clamping, signal
unreliable data: indicator shows ? with warning face, format shows
(?) instead of a bogus percentage. A regression test replays real
observed traffic from the Opus 1M -> Sonnet 200k switch scenario.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 6d914d4 to c98e427 Compare March 13, 2026 01:20