Skip to content

Pane(fix[wait_for_text]): waits for new output, joins wrapped lines, prefers wait-for-channel for completion#47

Open
tony wants to merge 22 commits into
mainfrom
stale-wait-for-text
Open

Pane(fix[wait_for_text]): waits for new output, joins wrapped lines, prefers wait-for-channel for completion#47
tony wants to merge 22 commits into
mainfrom
stale-wait-for-text

Conversation

@tony
Copy link
Copy Markdown
Member

@tony tony commented May 12, 2026

Closes #45.

What changed for users

wait_for_text becomes a "wait for output written after this call" primitive instead of "match anything currently in the pane." For agents whose first poll used to return found=True on stale scrollback, this is a real fix. The CHANGES ## libtmux-mcp 0.1.x (unreleased) section describes the published-release-reader contract.

Concrete user-visible changes vs 0.1.0a6:

  • Anchors on (history_size, cursor_y) at entry; only matches lines written after the call begins.
  • content_start / content_end parameters removed (the anchor supersedes them).
  • Joins visually wrapped lines for matching via tmux's -J flag — long error strings that wrap across rows now match.
  • Rejects empty pattern, sub-10 ms interval, non-positive timeout at entry.
  • Raises ToolError on pane death, pane respawn, or clear-history mid-wait.
  • Emits a notifications/message warning when polling approaches history-limit — the band where tmux's grid_collect_history trim+scroll bounce keeps hsize clamped at the cap and the strict-shrink guard can't catch every trim.
  • Server instructions, send_keys docstring, and wait_for_text docstring all point agents at wait_for_channel (composed with tmux wait-for -S) as the deterministic primitive for command completion. The polling-scraper wait_for_text is now framed for output the agent does not author.
  • docs/topics/prompting.md system-prompt fragment and docs/topics/troubleshooting.md "Timing" bullet brought in line with the channel-first guidance.

Theoretical grounding

Reviewed against tmux source pinned at 134ba6c:

  • Capture math: cmd-capture-pane.c:158 (top = gd->hsize + n)
  • Bottom-row clip: cmd-capture-pane.c:159-160
  • Trim mechanics: grid.c:386 (grid_collect_history decrements hsize by hlimit/10)
  • Scroll-after-trim sequence: grid-view.c:106 (grid_view_scroll_region_up)
  • Resize-grow / hsize pull-back: screen.c:451-465 (screen_resize_y)
  • Pane respawn preserves hsize: screen.c:127 (screen_reinit)
  • No native trim-detection primitive exists: format.c history format strings + options-table.c hooks list both checked, no monotonic counter or history-trimmed hook

The rollover detection's partial coverage is acknowledged explicitly. A 2 ms high-frequency probe across ~3000 samples confirmed there is no sub-poll window where hsize dips below entry once entry was sub-cap — so the strict-shrink guard catches clear-history and the entry-at-cap edge but not bursty trim. The runtime ctx.warning in the trim-risk band is the agent-facing mitigation; wait_for_channel is the deterministic alternative.

Test plan

rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run mypy; uv run py.test --reruns 0 -vvv; just build-docs

Green at every commit. 461 tests pass.

New regression tests cover: stale scrollback no-match (#45), bottom-row clip, mid-wait respawn, pane death under remain-on-exit, input rejection (empty pattern / tiny interval / non-positive timeout), wrap-spanning patterns, resize-grow not raising, history shrinking via clear-history raising, idle wait already in trim-risk band warning, burst wait advancing into trim-risk band warning, cancellation propagation, server instruction substring assertions, send_keys docstring cross-link.

Filed follow-ups (deferred, not in this PR)

Breaking changes

  • content_start / content_end parameters removed from wait_for_text. Pre-alpha API; CHANGES carries a before/after block.
  • wait_for_text raises ToolError on empty pattern, interval < 0.01, timeout <= 0, pane death, pane respawn, and history_size shrink below entry baseline. Previously most of these silently miscaptured or returned found=False after the full timeout.

@tony tony force-pushed the stale-wait-for-text branch from aa8de89 to 317fbee Compare May 12, 2026 22:58
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.18%. Comparing base (30da675) to head (dc0389b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   84.87%   85.18%   +0.31%     
==========================================
  Files          40       40              
  Lines        2268     2316      +48     
  Branches      286      295       +9     
==========================================
+ Hits         1925     1973      +48     
  Misses        260      260              
  Partials       83       83              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tony
Copy link
Copy Markdown
Member Author

tony commented May 13, 2026

Code review

No issues found. Checked for bugs and AGENTS.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony tony force-pushed the stale-wait-for-text branch 3 times, most recently from 5c72dd9 to 9cafb35 Compare May 15, 2026 01:04
@tony
Copy link
Copy Markdown
Member Author

tony commented May 15, 2026

Code review

Found 1 issue:

  1. CHANGES unreleased block orders the subheadings ### Breaking changes### Fixes### Dependencies, but AGENTS.md mandates ### Breaking changes### Dependencies### What's new### Fixes### Documentation### Development. Move the Dependencies block to land between Breaking changes and Fixes.

libtmux-mcp/CHANGES

Lines 7 to 60 in fbe4698

<!-- END PLACEHOLDER - ADD NEW CHANGELOG ENTRIES BELOW THIS LINE -->
### Breaking changes
**{tooliconl}`wait-for-text` drops `content_start` and `content_end`**
The baseline anchor introduced in this release supersedes the manual capture-range knobs: capture now follows the pane's grid position automatically, so the two parameters had no remaining purpose. Agents that named them should drop them from their call sites. (#45)
```python
# Before
wait_for_text(pattern="OK", content_start=-100)
# After
wait_for_text(pattern="OK")
```
### Fixes
**{tooliconl}`wait-for-text` now waits for *new* output, not stale scrollback**
Previously {tooliconl}`wait-for-text` returned `found=True` on the first poll if the pattern was already present anywhere in the pane when the call began. It now snapshots `#{history_size} + #{cursor_y}` at entry and only matches lines written below that baseline, so agents using the tool to synchronise on command output see the result they expected. For the synchronous "is the pattern in the pane right now?" use case, call {tooliconl}`search-panes` instead. (#45)
**{tooliconl}`wait-for-text` guards against tmux's bottom-row capture clip**
`capture-pane -S` clips a below-visible start back to the bottom visible row, so a cursor sitting at the last visible row (the normal state after a fresh shell prompt) would match whatever stale text already occupied that row. The poll loop now reads `#{pane_height}` at entry and short-circuits the capture when the computed start would land below the pane. (#45)
**{tooliconl}`wait-for-text` surfaces pane death and respawn**
Each poll tick now reads `#{pane_dead}` and `#{pane_pid}` alongside the grid state. A pane whose process exited under `remain-on-exit`, or one that was swapped out by `respawn-pane` mid-wait, raises a `ToolError` instead of silently miscapturing against a frozen or reset grid. (#45)
**{tooliconl}`wait-for-text` rejects footgun inputs**
Empty `pattern`, sub-10ms `interval`, and non-positive `timeout` each raise a `ToolError` at entry. The previous behaviour would have matched every line (empty pattern), spun the tmux server (zero interval), or completed a surprise single probe instead of waiting (`timeout=0`). (#45)
**{tooliconl}`wait-for-text` honours the timeout budget end-to-end**
`start_time` is now captured before the initial baseline read so a stalled tmux server cannot consume the budget before the deadline timer even starts. libtmux's `tmux_cmd` uses `Popen.communicate()` with no subprocess timeout, so the worst case was real, not theoretical. (#45)
### Dependencies
**Minimum `libtmux>=0.56.0`** (was `>=0.55.1`). Unlocks the new tmux-command wrappers shipped in libtmux 0.56.0 — {meth}`~libtmux.Pane.respawn`, {meth}`~libtmux.Pane.copy_mode`, {meth}`~libtmux.Pane.pipe`, {meth}`~libtmux.Pane.swap`, {meth}`~libtmux.Pane.paste_buffer`, {meth}`~libtmux.Pane.clear_history`, {meth}`~libtmux.Pane.display_message`, {meth}`~libtmux.Server.delete_buffer`, and the {meth}`~libtmux.Session.next_window` / {meth}`~libtmux.Session.previous_window` / {meth}`~libtmux.Session.last_window` trio — so the MCP no longer falls back to raw `cmd()` calls for tmux commands the upstream API now covers. (#46)
## libtmux-mcp 0.1.0a6 (2026-05-09)
libtmux-mcp 0.1.0a6 is the activation and registration cleanup release. It makes the server much easier for MCP clients to discover from ordinary "pane", "window", and "session" prompts, standardizes new setup docs around the `tmux` registration slug, and adds migration guidance for existing `libtmux` registrations. Existing installs keep working; the release changes defaults and documentation so new installs line up with the tool prefix users actually see.
### What's new
**Bare tmux prompts now find the server**
Agents no longer need the word "tmux" in every prompt before this server becomes relevant. The server instructions now name the positive cases users naturally write, such as "split this pane", "current window", and "this session", while steering clients away from unrelated browser tabs, editor splits, tiling-window-manager panes, and notebook cells. The discovery anchors most likely to answer those prompts - {tooliconl}`list-panes`, {tooliconl}`list-windows`, and {tooliconl}`snapshot-pane` - also carry the preload hint used by clients that keep a small always-available schema set. See {ref}`prompting` for the user-facing activation contract. (#37)
**New installs are documented as `tmux`**

(AGENTS.md rule:

libtmux-mcp/AGENTS.md

Lines 419 to 421 in fbe4698

**Fixed subheadings**, in this order when present: `### Breaking changes`, `### Dependencies`, `### What's new`, `### Fixes`, `### Documentation`, `### Development`. Dev tooling (helper scripts, internal automation) lives under `### Development`. For breaking changes, show the migration path with concrete inline code (e.g. a `# Before` / `# After` fenced code block). Dependency floor bumps use the form ``Minimum `pkg>=X.Y.Z` (was `>=X.Y.W`)``.
)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony tony force-pushed the stale-wait-for-text branch 3 times, most recently from afea390 to 273d08f Compare May 16, 2026 09:40
tony added 19 commits May 16, 2026 07:43
…onger matches (#45)

why: wait_for_text returned found=True on the first poll whenever the
     pattern was already present in the pane when the call began, so
     agents using it to synchronise on command output saw the wrong
     result. The sibling wait_for_content_change already snapshots a
     baseline; wait_for_text now mirrors that pattern.

what:
- Snapshot (history_size, cursor_y) at entry; compute baseline_abs
  as the absolute grid index. Each poll re-reads history_size and
  captures from baseline_abs - hs_now + 1 onward, so the matched
  region tracks the pane's grid even as content scrolls into history.
- Add _read_grid_position and _read_history_size helpers that expand
  the corresponding tmux format strings via display-message.
- Drop content_start / content_end parameters; the baseline anchor
  supersedes them. Pre-alpha API contract makes this a clean drop.
- Rewrite the docstring around the new "wait for new appearance"
  semantics, cross-reference search_panes for synchronous matches,
  and document edge cases (scrollback truncation, reverse-index
  sequences, clear/reset).
…#45)

why: The baseline-anchor fix changes the user-visible contract for
     wait_for_text: a pattern already in the pane before the call
     must now return found=False, not the stale match the old code
     produced. The test suite needs to encode that contract so a
     future regression lands red.

what:
- Restructure WAIT_FOR_TEXT_FIXTURES with separate pre_command and
  during_command fields so fixtures can model "pattern was already
  on screen when wait_for_text started."
- Add stale_scrollback_does_not_match: emit the pattern before
  wait_for_text starts and assert found=False on timeout.
- Add matches_new_output_after_baseline: start the wait task, sleep,
  emit the marker via send_keys, assert found=True. Covers the
  positive path the baseline anchor was designed for.
why: The libtmux 0.56 bump merged on trunk ships Pane.display_message
     as a typed wrapper around display-message, and every other tool
     module in this package adopted it. The two helpers introduced for
     baseline anchoring (_read_grid_position, _read_history_size)
     reached back to raw pane.cmd("display-message", ...) instead.
     Bringing them in line keeps the codebase consistent with the
     same release-notes entry that markets the wrapper migration.

what:
- _read_grid_position and _read_history_size call
  pane.display_message(fmt, get_text=True) directly; the parse logic
  and empty-output fallback are unchanged.
- Rename the local "result" binding to "stdout" to match the call
  surface used in meta.py.
why: capture-pane -S clips a below-visible start back to the bottom
     visible row (cmd-capture-pane.c, in every tmux release we
     support). When the cursor sits at the last visible row at entry,
     start_line = cy0 + 1 lands below the pane and tmux returns the
     bottom row's content. Stale text on that row — a fresh shell
     prompt, a printed marker — matches the pattern instantly, before
     any new output has appeared. The baseline-anchor fix that
     motivated this branch did not close this corner: the math is
     correct, but it assumed an empty capture result when start_line
     pointed below the visible region.

what:
- Add _read_pane_height alongside _read_history_size. Read pane_height
  once at entry; the bottom-row clip fires on entry-state, and a
  per-tick read costs another tmux round-trip without changing
  anything for the common case.
- Before each capture, short-circuit lines to [] when
  start_line >= pane_height. Route through the existing deadline /
  sleep tail so the loop honours the timeout instead of tight-
  spinning.
- tests/test_pane_tools.py: regression for #45. Respawn the pane with
  a sh -c that pre-fills, prints the marker without a trailing
  newline, and sleeps — freezing hsize and cursor_y for the wait
  window so the guard's first-tick contract is what's actually
  tested.
…dget

why: start_time was captured after the initial _read_grid_position
     and _read_pane_height calls, so a stalled tmux server could
     block the tool for an arbitrary period before the user-supplied
     timeout deadline even started. libtmux's tmux_cmd uses
     Popen.communicate() without a subprocess timeout, so the worst
     case is real, not theoretical: a server-side stall on the first
     display-message leaks past the wall-clock budget the caller
     asked for.

what:
- Move start_time and deadline to the start of the resolved-pane
  block, before the baseline and pane-height reads. Same total work,
  no new branches; only the budget accounting changes.
why: Three parameter values turn the wait loop into a footgun rather
     than a wait primitive:

     - ``pattern=""`` — ``re.compile('')`` matches the zero-width
       position of every line, so the first poll returns
       ``found=True`` against whatever was already in the pane.
     - ``interval=0`` — ``asyncio.sleep(0)`` yields the event loop
       without idling, so the loop fires tmux subprocesses as fast
       as the scheduler hands them out (a self-inflicted server-
       side DoS).
     - ``timeout<=0`` — the loop body runs one probe before the
       deadline check, so a non-positive timeout silently turns
       "wait" into "synchronous probe" without saying so.

     Each is rejected at function entry so the failure surfaces as
     an explicit ``ToolError`` instead of corrupting the contract.

what:
- Inline parameter checks before the regex compile in wait_for_text;
  raise ToolError with the offending value in the message.
- tests/test_pane_tools.py: one test per rejection, asserting the
  matching ToolError message.
why: The baseline-anchor mechanism assumes the pane's process and
     grid stay coherent for the duration of the wait. Two events
     break that assumption silently:

     - respawn-pane mid-wait. screen_reinit (screen.c) resets cy to 0
       but preserves hsize, so the absolute anchor keeps pointing at
       the old process's grid — the new process's output ends up at
       rows the capture skips, and the wait times out as if nothing
       arrived.
     - pane death. With remain-on-exit set, tmux keeps the pane alive
       after its child exits and #{pane_dead} flips to 1. Without
       detection, the loop just keeps polling a stale frozen grid
       until the deadline.

     Both cases now surface as explicit ToolErrors so callers can
     react instead of timing out in the dark.

what:
- Replace _read_grid_position, _read_history_size, and
  _read_pane_height with a single _PaneState NamedTuple read via one
  display-message round-trip. Per-tick subprocess count drops from
  three reads (entry + per-tick history + per-tick we'd need height)
  to two (state + capture) without losing any fields.
- Capture pane_pid at entry; each tick checks pane_dead first and
  then verifies pane_pid is unchanged. Either condition raises
  ToolError naming the affected pane and the transition.
- tests/test_pane_tools.py: regression for respawn (asyncio.gather
  pattern: start wait_task, respawn after 0.1s, expect ToolError)
  and for pane death (set remain-on-exit, respawn into ``true``,
  poll #{pane_dead}, expect ToolError).
…me-row blind spot

why: Two parts of the wait_for_text docstring oversold the
     baseline-anchor model:

     - The reverse-index note claimed ``\eM`` "rewrites history
       below the baseline." It does not. screen_write_reverseindex
       (screen-write.c) scrolls the visible region within
       [rupper, rlower] or decrements cy. ``hsize`` is never
       touched. A maintainer reading the tmux source to verify a
       related claim would lose trust in the docstring.
     - The "What 'new' means" block sells the tool as observing
       NEW text without saying that in-place updates to the entry
       cursor row (carriage-return rewrites, progress spinners,
       single-line status updates) are excluded by design. Users
       wiring this against TUIs that update in place see a silent
       timeout.

what:
- Rewrite the reverse-index note around the real hazard — in-place
  repaints below the baseline — and cite screen-write.c for the
  visible-region scroll behaviour rather than asserting history is
  rewritten.
- Add a paragraph under "What 'new' means" that explicitly
  disclaims same-row rewrites and points readers at
  wait_for_content_change for the in-place case.
…md format

why: The prior CHANGES entries on this branch carried commit-message
     shape (why: / what: bodies, mechanism-heavy bullets) into a
     changelog that AGENTS.md specifies as user-vocabulary prose
     under Bold subheadings. Three CHANGES commits were dropped from
     the branch via rebase; this single commit replaces them with
     prose entries in the project's actual changelog format.

what:
- Breaking changes: parameter drop with before/after migration block.
- Fixes: one prose paragraph covering the baseline-anchor behaviour
  change (stale-scrollback fix, bottom-row clip handling,
  death/respawn surfacing) under a single user-vocabulary subheading.
- Fixes: one prose paragraph covering input validation.
- Section ordering: Breaking changes / Dependencies / Fixes, per
  AGENTS.md line 420.
…'t match the implementation

why: Two comments in tests/test_pane_tools.py — the WaitForTextFixture
     docstring and the docstring on
     test_wait_for_text_matches_new_output_after_baseline — describe
     the positive-path coordination as using asyncio.gather. The
     implementation actually uses asyncio.create_task + a sequenced
     await on the emit coroutine, because gather would schedule both
     coroutines concurrently and lose the start-baseline-then-emit
     ordering the test relies on. A future maintainer following the
     comments would either rewrite the test incorrectly or doubt the
     existing shape.

what:
- WaitForTextFixture pre_command docstring: replace the
  asyncio.gather mention with asyncio.create_task plus a sequenced
  await.
- test_wait_for_text_matches_new_output_after_baseline docstring:
  describe the actual create_task + sequenced-await mechanism and
  note explicitly that asyncio.gather is unsuitable here because it
  loses the ordering guarantee.
…t precedent

why: The _stale_settled helper inside the wait_for_text parametrized
     test waited 2 seconds for (history_size, cursor_y) to stabilise
     across two consecutive polls. Every other settle-loop in this
     file (_pane_dead, _ready, _bottom_row_ready) uses a 5-second
     budget for similar tmux-state stability under shell activity.
     The 2-second outlier risked flaking on CI nodes whose zsh
     async-prompt hooks keep firing for longer than the budget,
     missing the two-consecutive-identical-reads window that
     signals settlement.

what:
- retry_until(_stale_settled, 2, raises=True) becomes
  retry_until(_stale_settled, 5, raises=True). Happy-path wall-clock
  is unaffected (the predicate returns True as soon as state
  stabilises, usually within milliseconds); only the failure-mode
  budget widens.
…nchor stays sound

Adds a rollover guard inside the poll loop: when ``state.history_size <
entry.history_size`` (grid_collect_history fired, clear-history ran, or the
session's history-limit option was shrunk mid-wait), the absolute baseline
``hs0 + cy0`` is no longer recoverable. There is no server-side way to
disambiguate "trimmed" from "still anchored", so surface the lost anchor
as ToolError with a steer toward wait_for_channel for deterministic
synchronization.

Also fixes the stale ``pane_height`` reference in the bottom-row clip
guard — it was captured once at entry and never refreshed, so a pane
resize mid-wait left the guard keyed to a stale height. Compare against
``state.pane_height`` re-read each tick so the guard tracks the live
visible region.

Refs tmux grid.c grid_collect_history; verified against tmux master HEAD.
Passes ``join_wrapped=True`` (tmux ``-J``) to ``Pane.capture_pane`` so a
pattern that crosses the pane's visual wrap is matched against the joined
logical line. Without this, long error strings like
"Build failed: module not found" got split across two rows by tmux and a
naive ``re.search`` against each row in isolation never matched.

The returned ``matched_lines`` entry for a wrap-crossing hit is the joined
line and can therefore be longer than ``pane_width``; documented in the
Notes section so agents that bound output understand the implication.
…nistic primitive

Re-frames the wait family so agents pick the cheaper, race-free option
first. Three coordinated doc edits:

* ``send_keys`` docstring now forks the post-send choice into deterministic
  (``wait_for_channel`` + ``tmux wait-for -S``), pattern-match
  (``wait_for_text`` for output the agent doesn't control), and any-change
  (``wait_for_content_change``).
* Server ``_INSTR_WAIT_NOT_POLL`` system instruction names
  ``wait_for_channel`` first; the 2 KB instruction budget is preserved by
  tightening the supporting prose.
* ``wait_for_text`` docstring gains a ``When NOT to use this`` section
  that names the sequential-send_keys race and steers callers to the
  channel pattern with a pointer at the ``run_and_wait`` recipe.

Tests:
- ``test_base_instructions_prefer_wait_over_poll`` now requires
  ``wait_for_channel`` to appear in the system instructions and to be
  named before the fallbacks.
- ``test_send_keys_docstring_cross_links_wait_for_channel`` is a cheap
  regression against future docstring drift.
The Phase 1 rollover predicate (``state.history_size < entry.history_size``)
fires correctly on real row eviction — ``grid_collect_history`` trim,
``clear-history``, ``set-option history-limit <smaller>`` on tmux master —
but ALSO fires on resize-grow when ``hscrolled > 0``. tmux's
``screen_resize_y`` (screen.c:451-465) decrements ``gd->hsize`` on a
vertical grow by pulling rows from history back into the visible region.
The rows are not freed; only the history/visible-region partition shifts
and absolute indices stay valid. The Phase 1 predicate raised a spurious
``ToolError`` in that case.

Trim and resize-grow are distinguished by ``pane_height``: trim leaves it
unchanged, resize-grow increases it. Adding ``state.pane_height <=
entry.pane_height`` as a conjunct makes the predicate the actual signature
of row eviction, exempting resize-grow.

Also walks back two over-broad claims that shipped with Phase 1:
the "monotonically non-decreasing" framing in CHANGES (resize-grow proves
it is not monotone) and the parenthetical about "shrinking history-limit
mid-wait" — that retroactive-trim path requires tmux commit 195a9cf which
is not in any tagged release as of tmux 3.6a.

Adds test_wait_for_text_survives_resize_grow_with_scrolled_history.
… guard catches a subset

A 2 ms high-frequency repro over ~3000 hsize samples showed the rollover
guard's strict-inequality predicate (state.history_size < entry.history_size)
never observes a dip during real history-limit overflow: tmux's
grid_collect_history trims (~10% of hlimit) then immediately scrolls new
lines back, so sampled hsize stays clamped near the cap. The predicate
only fires when entry.history_size happened to land exactly at hlimit
(environment luck). All earlier claims that "wait_for_text raises on
history rollover" were therefore over-broad.

Walk back the contract:

* Docstring's "Scrollback rollover raises" Note now says "rollover
  detection is partial" and enumerates what is and isn't caught
  (clear-history yes; retroactive history-limit shrink on tmux >= 3.7
  yes; grid_collect_history trim during continuous output NO). Points
  to the upcoming risk-band warning and to wait_for_channel for
  deterministic synchronization.
* ToolError message changes from "history rolled over during wait" to
  "history shrank below entry baseline" -- describes what we actually
  detected (a shrink), not what we assumed (a rollover).
* CHANGES breaking-change entry: same walk-back; references the new
  ### Fixes warning entry and the tracking issue filed in libtmux.
* test_wait_for_text_raises_on_history_rollover renamed to
  test_wait_for_text_raises_when_history_is_cleared, since that is
  the path it actually exercises. Docstring trimmed to describe the
  test mechanism, with no dev-history narrative.
* Same dev-history narrative removed from
  test_wait_for_text_survives_resize_grow_with_scrolled_history
  (same anti-pattern, same project guidance about not referencing
  the task or fix history in code comments).

No behavior change beyond the message text and test rename. Phase 7
follow-up adds the risk-band warning predicate that catches what this
guard does not.
… band

The strict-shrink rollover guard misses grid_collect_history trim during
continuous output: tmux trims ~10% of history-limit then immediately
scrolls a line back, so sampled hsize stays clamped at the cap and the
predicate state.history_size < entry.history_size never fires. A 2 ms
high-frequency probe across ~3000 samples confirmed there is no sub-poll
window where hsize dips below entry once entry was sub-cap.

Emit a one-shot ctx.warning when sampled state enters the trim-risk
band (top 10% of history-limit) AND the pane has advanced since the
wait's baseline. MCP clients subscribed to notifications/message see
"correctness is best-effort here. For deterministic synchronization
use wait_for_channel." and can decide whether to keep waiting, retry,
or switch primitives. Warning is one-shot per call so we don't spam.

Adds history_limit: int to _PaneState; the field is read in the same
display-message round-trip as the other per-tick fields, so the loop
still costs two subprocesses per tick (state + capture).

No CHANGES update in this commit -- the consolidated wait-for-text
CHANGES entry will land in a later commit per the Narrative Bleed
doctrine (no published-release user saw "no warning" to be "fixed";
this warning is just part of the new contract).
… output

Adds the test the rollover claim should have had from the start. Earlier
test_wait_for_text_raises_when_history_is_cleared exercises clear-history
only -- not the grid_collect_history trim that fires during bursty
output. This new test covers the actual real-rollover path.

Shape: set history-limit=50 globally, split a fresh pane that inherits
the small limit, attach a stub Context that records ctx.warning calls,
start wait_for_text against a never-appearing pattern, burst 200 echo
lines through the pane. tmux's grid_collect_history fires repeatedly,
sampled history_size enters the risk band (>= 45), and the wait emits
the one-shot trim-risk-band warning.

The wait's found / timed_out result is intentionally not asserted --
the contract is best-effort once polling enters the risk band, so
pinning a specific result would be over-specifying what tmux's grid
model can't actually guarantee. The test pins the warning contract
(what the tool guarantees) only.
…t for a published-release reader

CHANGES walk-back and Phase 5 "spurious resize-grow" entries described
intra-branch transitions a 0.1.0a6 user never saw. Consolidate the
wait-for-text section into entries that describe only what the published
release reader gets:

* Breaking changes: one entry under "waits for new output, not stale
  scrollback" covers the #45 fix (real published bug), the new
  baseline-loss ToolErrors, the resize-grow exemption, the trim-risk-band
  warning, and the wait_for_channel cross-link. Reader needs zero
  branch context.

* Fixes: keep the wrapped-line match and footgun-rejection entries
  (both real published bugs). Drop the "no longer raises spuriously
  on resize-grow" entry (phantom fix: 0.1.0a6 had no rollover guard
  to raise, spuriously or otherwise). The duplicate "waits for new
  output" Fixes entry is also dropped -- the consolidated Breaking
  changes entry covers it.

Docstring Notes section in wait.py scrubbed of the tmux-commit-SHA
citation and the forward-looking libtmux tracking-issue cross-reference.
The tracking issue exists (filed at libtmux as part of this PR) but
that's branch-internal narrative; users running the shipped wait_for_text
don't need to know about a follow-up investigation issue to use the
tool correctly.
@tony tony force-pushed the stale-wait-for-text branch from 273d08f to 91ea744 Compare May 16, 2026 13:01
@tony tony changed the title Pane(fix[wait_for_text]): anchor on baseline so stale scrollback no longer matches Pane(fix[wait_for_text]): waits for new output, joins wrapped lines, prefers wait-for-channel for completion May 16, 2026
@tony
Copy link
Copy Markdown
Member Author

tony commented May 16, 2026

Code review

Found 2 issues:

  1. CHANGES unreleased section ordering still violates AGENTS.md (the May 15 review's finding remains unresolved). AGENTS.md says "Fixed subheadings, in this order when present: ### Breaking changes, ### Dependencies, ### What's new, ### Fixes, ### Documentation, ### Development." Current order is Breaking changesDocumentationDependenciesFixes. ### Documentation must come last (after ### Fixes), and ### Dependencies must come before ### Fixes.

libtmux-mcp/CHANGES

Lines 28 to 44 in 91ea744

Trim that fires during continuous output cannot be reliably detected from `history_size` alone: tmux's trim-then-scroll bounce keeps sampled `hsize` clamped near `history-limit`. When polling approaches that band, the tool emits a `notifications/message` warning so MCP clients can decide whether to keep waiting, retry, or switch to {tooliconl}`wait-for-channel`. For deterministic command-completion synchronization, compose `tmux wait-for -S <channel>` into the shell command and call {tooliconl}`wait-for-channel`. (#45)
### Documentation
**Wait family is re-framed around {tooliconl}`wait-for-channel` as the deterministic primitive**
The {tooliconl}`send-keys` docstring, the server system instructions, and the {tooliconl}`wait-for-text` docstring now point agents at {tooliconl}`wait-for-channel` with composed `tmux wait-for -S` for command completion, and reserve {tooliconl}`wait-for-text` / {tooliconl}`wait-for-content-change` for output the agent does not author. The `run_and_wait` recipe is the canonical status-preserving pattern. (#45 follow-up)
### Dependencies
**Minimum `libtmux>=0.56.0`** (was `>=0.55.1`). Unlocks the new tmux-command wrappers shipped in libtmux 0.56.0 — {meth}`~libtmux.Pane.respawn`, {meth}`~libtmux.Pane.copy_mode`, {meth}`~libtmux.Pane.pipe`, {meth}`~libtmux.Pane.swap`, {meth}`~libtmux.Pane.paste_buffer`, {meth}`~libtmux.Pane.clear_history`, {meth}`~libtmux.Pane.display_message`, {meth}`~libtmux.Server.delete_buffer`, and the {meth}`~libtmux.Session.next_window` / {meth}`~libtmux.Session.previous_window` / {meth}`~libtmux.Session.last_window` trio — so the MCP no longer falls back to raw `cmd()` calls for tmux commands the upstream API now covers. (#46)
### Fixes
**{tooliconl}`wait-for-text` matches patterns across visually-wrapped lines**

AGENTS.md rule:

libtmux-mcp/AGENTS.md

Lines 419 to 421 in 91ea744

**Fixed subheadings**, in this order when present: `### Breaking changes`, `### Dependencies`, `### What's new`, `### Fixes`, `### Documentation`, `### Development`. Dev tooling (helper scripts, internal automation) lives under `### Development`. For breaking changes, show the migration path with concrete inline code (e.g. a `# Before` / `# After` fenced code block). Dependency floor bumps use the form ``Minimum `pkg>=X.Y.Z` (was `>=X.Y.W`)``.

  1. tests/test_pane_tools.py:2074 uses bash/zsh brace expansion {1..100} while every other pre-fill in the same file uses POSIX-portable $(seq 1 N) (8 other call sites: lines 1332, 1519, 1563, 1601, 1662, 1716, 2018). When the new pane's shell is dash or any non-bash POSIX /bin/sh, the literal string {1..100} is iterated once and produces no scrollback — the _prefilled retry_until then exhausts its budget before wait_for_text ever runs. Verified across all tmux 3.2a–master checkouts: shell selection sits below tmux, this is purely a shell-portability issue. Use $(seq 1 100) to match the file's convention.

# history-limit is 50. Risk floor (top 10%) is 45.
# Print 100 lines to ensure hsize reaches the cap (50).
fresh_pane.send_keys("for i in {1..100}; do echo line$i; done", True)
def _prefilled() -> bool:

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony added 2 commits May 16, 2026 08:50
…y_limit once

The risk-band warning's `advanced` conjunction
(state.history_size > entry.history_size OR state.cursor_y != entry.cursor_y)
gated the warning on the wait having moved since entry. Three real cases
that warning was meant to cover slipped past it:

* Entry already in the risk band, idle wait -- no output happens, nothing
  advances, warning never fires even though the baseline is in dangerous
  territory from tick zero.
* Cursor pinned at the bottom row of the visible region -- subsequent
  scrolls advance history_size but leave cursor_y unchanged; if hsize
  also stayed at the cap across two polls, advanced returned False.
* Trim+scroll bounce keeping hsize stable at the cap -- entry at 49,
  every subsequent poll also sees 49; advanced returns False.

Drop the conjunction. The one-shot warned_risk_band flag is already the
right noise filter -- it limits warnings to one per call regardless of
how long the wait sits in the band. Firing on any tick where
state.history_size >= risk_floor catches all three cases.

Also cache baseline_hlimit at entry instead of re-reading
state.history_limit each tick. history-limit is fixed at pane creation
(retroactive change only lands in tmux 3.7+), so per-tick reads are
redundant. The warning message uses the cached value too.

Adds test_wait_for_text_warns_when_already_in_risk_band -- pre-fills a
fresh pane past the risk floor, then runs an idle wait with no further
output. Asserts the warning fires anyway. Pairs with the existing
test_wait_for_text_warns_in_history_limit_risk_band which covers the
advance-into-the-band path.
…ooting at wait_for_channel

Two doc surfaces still recommended the send_keys -> wait_for_text ->
capture_pane pattern that the wait-family re-framing walked back:

* docs/topics/prompting.md has a "For general tmux workflows" code
  block tagged ``:class: system-prompt`` -- specifically meant for
  agents to copy into their AGENTS.md / CLAUDE.md / .cursorrules.
  Higher leverage than tool descriptions because it shapes the
  agent's mental model before any tool call.

* docs/topics/troubleshooting.md item 3 ("Timing") had the same
  recommendation in a "what to check" bullet -- lower visibility
  but still misleading.

Both now lead with wait_for_channel + composed tmux wait-for -S as
the deterministic primitive for command completion, and demote
wait_for_text / wait_for_content_change to "output you don't author"
duty. Matches the send_keys docstring, server instructions, and the
wait_for_text "When NOT to use this" section.

No CHANGES update -- the underlying re-framing already shipped under
### Documentation. These are stragglers being brought in line with
the announced change.
@tony tony force-pushed the stale-wait-for-text branch from 91ea744 to 82cc9cd Compare May 16, 2026 13:52
…proportions and ordering

Three corrections against the Changelog Conventions in AGENTS.md:

* **Section order.** The fixed subheading order is Breaking changes ->
  Dependencies -> What's new -> Fixes -> Documentation -> Development.
  The unreleased block had Documentation in the wrong slot (between
  Breaking changes and Dependencies) and Fixes ahead of Documentation
  out of order. Reordered to match.

* **Proportional prose.** Dropped backend-mechanism phrasing
  ("trim-then-scroll bounce", "sampled hsize clamped near history-limit",
  "no longer falls back to raw cmd() calls") that violates the
  "internal jargon" anti-pattern. The big breaking-change entry is now
  three short paragraphs in user vocabulary (what changed, what
  raises, what the warning means). The Dependencies entry no longer
  walls-of-text 9 specific {meth} method names; it describes the
  capability ("typed wrappers for tmux commands the server invokes").

* **PR ref form.** Replaced the lone `(#45 follow-up)` qualifier with
  bare `(#45)` to match the project's pattern across 25+ other entries.

The Documentation entry also picks up the doc-fragment work that
Phase 12 + Phase 14a actually shipped (quickstart, gotchas, prompting,
troubleshooting, recipes, send-keys topics), not just the three
docstring surfaces named originally.

The reordered breaking change pair leads with the bigger semantic
change (waits for new output) and follows with the syntactic break
(content_start / content_end removal) so readers see the meatier
fix first.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wait_for_text matches stale scrollback immediately — needs baseline anchor

2 participants