Type: architecture · Tier: deferred · Tool: wait_for_text
What's happening
Each poll in `wait_for_text` runs two tmux subprocess calls in sequence:
1. `_read_pane_state` issues `display-message` to read `history_size`, `cursor_y`, `pane_height`, `pane_pid`, `pane_dead`.
2. `pane.capture_pane(start=start_line, end=None, join_wrapped=True)` issues `capture-pane`, where `start_line = baseline_abs - state.history_size + 1`.
Between (1) and (2), tmux can scroll more lines into history. tmux's `capture-pane` computes `top = gd->hsize + n` against the live `hsize` at capture time (cmd-capture-pane.c#L158), not the `hsize` we sampled in step 1. So when N new rows scroll between the two calls:
- We pass `n = baseline_abs - hsize_at_step1 + 1`.
- tmux computes `top = hsize_at_step2 + n = baseline_abs + 1 + (hsize_at_step2 - hsize_at_step1)`.
- The captured window starts N rows past the row we wanted; those N rows are invisible to the wait this tick.
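The arithmetic above can be checked with concrete numbers (illustrative values, not taken from any real pane):

```python
# Worked example of the capture drift, with made-up numbers.
baseline_abs = 1000    # absolute row the wait is anchored at
hsize_at_step1 = 900   # history_size sampled by the state read
hsize_at_step2 = 905   # live history_size when capture-pane runs (N = 5 rows scrolled)

n = baseline_abs - hsize_at_step1 + 1  # start offset we pass to capture-pane
top = hsize_at_step2 + n               # row tmux actually starts capturing from
missed = top - (baseline_abs + 1)      # rows invisible to this tick's wait

print(n, top, missed)  # 101 1006 5  -- the 5 freshly scrolled rows are skipped
```

With zero drift (`hsize_at_step2 == hsize_at_step1`), `top` collapses back to `baseline_abs + 1` and nothing is missed.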
When it matters
Single-tick latency under bursty output. The next poll usually picks the missed rows back up — unless the missed rows have already scrolled past the visible region and been collected by `grid_collect_history`, at which point the rollover guard fires and the wait raises. So the bug surface is:
- One `interval` of latency on transient bursts (default 50 ms; bounded).
- Permanent miss only at the moment of history rollover — but rollover now raises.
In other words: the race exists but its impact is bounded by `interval` and capped at "raise" rather than "silently wrong" thanks to the rollover guard.
Options under consideration
1. Re-read after capture, retry on drift
```python
state_pre = await asyncio.to_thread(_read_pane_state, pane)
start_line = baseline_abs - state_pre.history_size + 1
lines = await asyncio.to_thread(
    pane.capture_pane, start=start_line, end=None, join_wrapped=True
)
state_post = await asyncio.to_thread(_read_pane_state, pane)
delta = state_post.history_size - state_pre.history_size
if delta > 0:
    # capture started `delta` rows too late; re-issue with the start
    # recomputed against the post-capture history_size (the retry can
    # itself race, but the window shrinks each round)
    start_line = baseline_abs - state_post.history_size + 1
    lines = await asyncio.to_thread(
        pane.capture_pane, start=start_line, end=None, join_wrapped=True
    )
```
Adds a third tmux call to every tick, plus a fourth (the re-capture) when drift is detected — double the current per-tick subprocess cost in the worst case. Complicates the `_PaneState` invariant set: now we track two state reads per tick. Test matrix grows.
2. Chain in a single tmux command
Build one `pane.cmd(...)` invocation that issues `display-message ; capture-pane` with tmux's `\;` chaining. One stdout stream needs to be split by the caller. Drops out of libtmux's typed API. Tightly couples to tmux's chaining quirks.
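A minimal sketch of what this option would look like, dropping to `subprocess` rather than `pane.cmd(...)` for clarity (helper names are illustrative, not part of the tool):

```python
import subprocess


def split_chained_output(out: str) -> tuple[int, list[str]]:
    """Split the single stdout stream: the first line is display-message's
    history_size, the remainder is capture-pane's text."""
    first, _, rest = out.partition("\n")
    return int(first), rest.splitlines()


def chained_poll(pane_id: str, start: int) -> tuple[int, list[str]]:
    """Hypothetical helper: one tmux client invocation for both commands.
    The bare ';' argument is tmux's command separator (spelled \\; in a shell),
    so both commands execute against the same server state."""
    proc = subprocess.run(
        ["tmux", "display-message", "-p", "-t", pane_id, "#{history_size}",
         ";",
         "capture-pane", "-p", "-t", pane_id, "-S", str(start), "-J"],
        capture_output=True, text=True, check=True,
    )
    return split_chained_output(proc.stdout)
```

Note the caller-side splitting is exactly the fragile part: both commands write to one stream, and nothing delimits them beyond "first line vs. the rest".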
3. Document, rely on next-tick recovery (current behavior)
Acceptable because:
- The miss is bounded by `interval` (default 50 ms).
- Permanent misses now raise rather than silently return wrong results, courtesy of the rollover guard.
- The deterministic alternative for command-completion synchronization is `wait_for_channel` composed with `tmux wait-for -S` — zero polling, zero races.
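That composition can be sketched as follows; the helper names are illustrative and the real `wait_for_channel` API may differ, but the tmux primitives (`send-keys`, `wait-for -S`, `wait-for`) are as shown:

```python
import subprocess


def signalled_command(command: str, channel: str) -> str:
    """Append a tmux wait-for signal so the shell command announces completion."""
    return f"{command}; tmux wait-for -S {channel}"


def run_and_wait(pane_id: str, command: str, channel: str = "done") -> None:
    """Hypothetical helper: send the command to the pane, then block in the
    tmux server until the command signals the channel. No polling, no race:
    the signal fires only after the command exits."""
    subprocess.run(
        ["tmux", "send-keys", "-t", pane_id,
         signalled_command(command, channel), "Enter"],
        check=True,
    )
    subprocess.run(["tmux", "wait-for", channel], check=True)
```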
Recommendation
Stay on option 3 until real-world telemetry shows flaky single-tick misses. The blast radius is small and the agent-facing escape hatch (`wait_for_channel`) is already documented in the `wait_for_text` "When NOT to use this" section. Re-evaluate if a stress-test fixture starts catching missed transitions.
References
- tmux cmd-capture-pane.c#L158 (`top = gd->hsize + n`)
- tmux grid.c (`grid_collect_history`)