diff --git a/CHANGES b/CHANGES index 19b0e198..3a0ee931 100644 --- a/CHANGES +++ b/CHANGES @@ -6,9 +6,67 @@ _Notes on upcoming releases will be added here_ +### Breaking changes + +**{tooliconl}`wait-for-text` waits for new output, not stale scrollback** + +{tooliconl}`wait-for-text` now matches lines written *after* the call begins. The previous behaviour returned `found=True` on the first poll whenever the pattern already lived in the pane, so agents synchronising on command output got the wrong result. For the synchronous "is the pattern in the pane right now?" case, call {tooliconl}`search-panes` instead. + +Baseline-loss events surface as `ToolError`: pane respawn, pane death, `clear-history`, and any other event that drops history below the entry baseline. Pane resize that pulls lines back from history into the visible region is exempted — the anchor stays valid. + +Trim during heavy output near `history-limit` can't be reliably detected from polling alone. When polling approaches that limit, the tool emits a `notifications/message` warning so MCP clients can decide whether to keep waiting, retry, or switch to {tooliconl}`wait-for-channel`. For deterministic command completion, compose `tmux wait-for -S ` into the shell command and call {tooliconl}`wait-for-channel`. (#45) + +**{tooliconl}`wait-for-text` drops `content_start` / `content_end`** + +The new baseline anchor follows the pane's grid position automatically, so the manual capture-range parameters have no remaining purpose. Drop them from call sites. (#45) + +```python +# Before +wait_for_text(pattern="OK", content_start=-100) + +# After +wait_for_text(pattern="OK") +``` + +**Wait result models drop `timed_out`** + +{class}`~libtmux_mcp.models.WaitForTextResult` and {class}`~libtmux_mcp.models.ContentChangeResult` drop the `timed_out` field. It was mechanically the boolean negation of the primary result (`not found` / `not changed`) and carried no information beyond that. 
Callers should switch to `not result.found` / `not result.changed`. (#47) + +```python +# Before +result = wait_for_text(pattern="OK") +if result.timed_out: + ... + +# After +result = wait_for_text(pattern="OK") +if not result.found: + ... +``` + ### Dependencies -**Minimum `libtmux>=0.56.0`** (was `>=0.55.1`). Unlocks the new tmux-command wrappers shipped in libtmux 0.56.0 — {meth}`~libtmux.Pane.respawn`, {meth}`~libtmux.Pane.copy_mode`, {meth}`~libtmux.Pane.pipe`, {meth}`~libtmux.Pane.swap`, {meth}`~libtmux.Pane.paste_buffer`, {meth}`~libtmux.Pane.clear_history`, {meth}`~libtmux.Pane.display_message`, {meth}`~libtmux.Server.delete_buffer`, and the {meth}`~libtmux.Session.next_window` / {meth}`~libtmux.Session.previous_window` / {meth}`~libtmux.Session.last_window` trio — so the MCP no longer falls back to raw `cmd()` calls for tmux commands the upstream API now covers. (#46) +**Minimum `libtmux>=0.56.0`** (was `>=0.55.1`). Picks up libtmux 0.56's typed wrappers for the tmux commands the server invokes — the MCP now uses libtmux's public API instead of raw command-line escapes for pane lifecycle, scrollback, and session navigation. (#46) + +### Fixes + +**{tooliconl}`wait-for-channel` recipe no longer exits the parent shell** + +The `run_and_wait` prompt template previously appended `exit $__mcp_status` to its shell payload to preserve the command's exit status. In an interactive shell that exits the shell itself, destroying single-pane sessions. The recipe now signals completion via `tmux wait-for -S` without exiting, and the equivalent example in {doc}`/tools/pane/wait-for-channel` is similarly fixed. Exit-status preservation in interactive shells is documented as out-of-scope; agents that need it should inspect the captured output for command-specific success markers. 
(#47) + +**{tooliconl}`wait-for-text` matches patterns across visually-wrapped lines** + +Long patterns like `"Build failed: module not found"` that tmux wraps at the pane's column width are now matched against the joined logical line. Previously the wrap split the pattern across two captured rows and neither row matched. The joined line is returned in `matched_lines` and can exceed the pane width. (#45) + +**{tooliconl}`wait-for-text` rejects misused `pattern` / `interval` / `timeout`** + +Empty `pattern`, `interval` below 10 ms, and non-positive `timeout` each raise `ToolError` at entry. Previously they silently matched every line, spun the tmux server in a tight loop, or completed a surprise single probe. (#45) + +### Documentation + +**Wait family is re-framed around {tooliconl}`wait-for-channel` as the deterministic primitive** + +The {tooliconl}`send-keys` docstring, server system instructions, {tooliconl}`wait-for-text` docstring, and the user-facing quickstart, gotchas, prompting, troubleshooting, recipes, and send-keys topics now point agents at {tooliconl}`wait-for-channel` with composed `tmux wait-for -S` for command completion. {tooliconl}`wait-for-text` and {tooliconl}`wait-for-content-change` are reframed as the fallbacks for output the agent does not author. The `run_and_wait` recipe shows the canonical safe-completion pattern. (#45) ## libtmux-mcp 0.1.0a6 (2026-05-09) diff --git a/docs/demo.md b/docs/demo.md index 21475479..a107f0fb 100644 --- a/docs/demo.md +++ b/docs/demo.md @@ -66,11 +66,11 @@ These are the actual tool headings as they render on tool pages: ### In prose -Use {tooliconl}`search-panes` to find text across all panes. If you know which pane, use {tooliconl}`capture-pane` instead. After running a command with {tooliconl}`send-keys`, always {tooliconl}`wait-for-text` before capturing. +Use {tooliconl}`search-panes` to find text across all panes. If you know which pane, use {tooliconl}`capture-pane` instead. 
After running a command with {tooliconl}`send-keys`, compose `tmux wait-for -S` and call {tooliconl}`wait-for-channel` before capturing. ### Dense inline (toolref, no badges) -The fundamental pattern: {toolref}`send-keys` → {toolref}`wait-for-text` → {toolref}`capture-pane`. For discovery: {toolref}`list-sessions` → {toolref}`list-panes` → {toolref}`get-pane-info`. +The fundamental pattern: {toolref}`send-keys` → {toolref}`wait-for-channel` → {toolref}`capture-pane`. For discovery: {toolref}`list-sessions` → {toolref}`list-panes` → {toolref}`get-pane-info`. ## Environment variable references @@ -87,7 +87,7 @@ Use {tooliconl}`search-panes` before {tooliconl}`capture-pane` when you don't kn ``` ```{warning} -Do not call {toolref}`capture-pane` immediately after {toolref}`send-keys` — there is a race condition. Use {toolref}`wait-for-text` between them. +Do not call {toolref}`capture-pane` immediately after {toolref}`send-keys` — there is a race condition. Compose `tmux wait-for -S` into the command and use {toolref}`wait-for-channel` between them. ``` ```{note} diff --git a/docs/prompts.md b/docs/prompts.md index ed4febcf..00acb44f 100644 --- a/docs/prompts.md +++ b/docs/prompts.md @@ -69,12 +69,12 @@ channel is signalled — strictly cheaper in agent turns than a ````markdown Run this shell command in tmux pane %1 and block -until it finishes, preserving the command's exit status: +until it finishes: ```python send_keys( pane_id='%1', - keys='pytest; __mcp_status=$?; tmux wait-for -S libtmux_mcp_wait_; exit $__mcp_status', + keys='pytest; tmux wait-for -S libtmux_mcp_wait_', ) wait_for_channel(channel='libtmux_mcp_wait_', timeout=60.0) capture_pane(pane_id='%1', max_lines=100) @@ -83,12 +83,19 @@ capture_pane(pane_id='%1', max_lines=100) After the channel signals, read the last ~100 lines to verify the command's behaviour. Do NOT use a `capture_pane` retry loop — `wait_for_channel` is strictly cheaper in agent turns. 
+ +The payload does not preserve the command's exit status: doing so +in an interactive shell would require exiting the shell (which kills +the pane) or routing through an out-of-band file or tmux variable. +If you need the status, inspect the captured output for +command-specific success markers. ```` -The ``__mcp_status=$?`` capture and ``exit $__mcp_status`` mean the -agent observes the command's real exit code via shell-conventional -``$?`` — even though the wait-for signal fires regardless of -success or failure. +Shell ``;`` semantics fire the ``wait-for -S`` whether ``pytest`` +succeeded or failed, so the edge-triggered signal never deadlocks the +agent on a crashed command. Status preservation is intentionally +omitted: chaining ``exit $status`` after the signal would exit the +interactive shell itself, destroying single-pane sessions. --- diff --git a/docs/quickstart.md b/docs/quickstart.md index 515bb4f6..296bdc45 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -51,11 +51,11 @@ Search all my panes for the word "error". When you say "run `make test` and show me the output", the agent executes a three-step pattern: -1. {tool}`send-keys` — send the command to a tmux pane -2. {tool}`wait-for-text` — wait for the shell prompt to return (command finished) +1. {tool}`send-keys` — send the command (composed with `tmux wait-for -S `) to a tmux pane +2. {tool}`wait-for-channel` — block deterministically until the command signals completion 3. {tool}`capture-pane` — read the terminal output -This **send → wait → capture** sequence is the fundamental workflow. Most agent interactions with tmux follow this pattern or a variation of it. +This **send → wait → capture** sequence is the fundamental workflow. For commands the agent authors, the channel pattern is deterministic; for output the agent does not author (third-party log lines, daemon prompts, interactive supervisors), substitute {tool}`wait-for-text` for step 2. 
## Next steps diff --git a/docs/recipes.md b/docs/recipes.md index 9f1e2e88..ea926a75 100644 --- a/docs/recipes.md +++ b/docs/recipes.md @@ -204,8 +204,11 @@ agent calls {tooliconl}`send-keys` in the original pane: ```{warning} Calling {toolref}`capture-pane` immediately after {toolref}`send-keys` is a race condition. {toolref}`send-keys` returns the moment tmux accepts the -keystrokes, not when the command finishes. Always use {toolref}`wait-for-text` -between them. +keystrokes, not when the command finishes. For commands the agent authors, +compose `tmux wait-for -S ` into the command and call +{toolref}`wait-for-channel` — deterministic, race-free. For output the +agent does not author (server-startup banners, test-result lines like +the ones above), use {toolref}`wait-for-text` instead. ``` ### The non-obvious part @@ -391,9 +394,11 @@ long-lived process, I would not hijack it -- I would use a different pane. ### Act The agent calls {tooliconl}`clear-pane`, then {tooliconl}`send-keys` with -`keys: "pytest"`, then {tooliconl}`wait-for-text` with -`pattern: "passed|failed|error"` and `regex: true`, then -{tooliconl}`capture-pane` to read the fresh output. +`keys: "pytest; tmux wait-for -S pytest_done"`, then +{tooliconl}`wait-for-channel` with `channel: "pytest_done"`, then +{tooliconl}`capture-pane` to read the fresh output. Composing the +`tmux wait-for -S` signal directly into the shell command is the +deterministic path for authored commands. ### The non-obvious part diff --git a/docs/tools/index.md b/docs/tools/index.md index 1bd13ba2..a9e153ec 100644 --- a/docs/tools/index.md +++ b/docs/tools/index.md @@ -20,7 +20,8 @@ All tools accept an optional `socket_name` parameter for multi-server support. 
I - Already know the `pane_id` → use it directly **Running a command?** -- {tool}`send-keys` — then {tool}`wait-for-text` + {tool}`capture-pane` +- {tool}`send-keys` (with `tmux wait-for -S ` composed into the keys) → {tool}`wait-for-channel` → {tool}`capture-pane` — the deterministic path for commands the agent authors +- For output the agent does not author (third-party logs, daemon prompts), use {tool}`wait-for-text` or {tool}`wait-for-content-change` between `send-keys` and `capture-pane` - Pasting multi-line text? → {tool}`paste-text` **Creating workspace structure?** diff --git a/docs/tools/pane/send-keys.md b/docs/tools/pane/send-keys.md index 11f061ac..19518a05 100644 --- a/docs/tools/pane/send-keys.md +++ b/docs/tools/pane/send-keys.md @@ -7,7 +7,10 @@ terminal. This is the primary way to execute commands in tmux panes. **Avoid when** you need to run something and immediately capture the result — -send keys first, then use {tooliconl}`capture-pane` or {tooliconl}`wait-for-text`. +compose `tmux wait-for -S ` into the keys and call +{tooliconl}`wait-for-channel` for deterministic completion, or fall back to +{tooliconl}`wait-for-text` / {tooliconl}`wait-for-content-change` when you +must observe output the agent does not author. **Side effects:** Sends keystrokes to the pane. If `enter` is true (default), the command executes. diff --git a/docs/tools/pane/wait-for-channel.md b/docs/tools/pane/wait-for-channel.md index e0c3d815..1fbb7c98 100644 --- a/docs/tools/pane/wait-for-channel.md +++ b/docs/tools/pane/wait-for-channel.md @@ -2,24 +2,26 @@ tmux's `wait-for` command exposes named, server-global channels that clients can signal and block on. These give agents an explicit synchronization primitive — strictly cheaper in agent turns than polling pane content via {tooliconl}`capture-pane` or {tooliconl}`wait-for-text`. -The composition pattern: {tooliconl}`send-keys` a command that emits the signal on its exit, then `wait_for_channel`. 
The signal MUST fire on both success and failure paths or the wait will block until the timeout. +The composition pattern: {tooliconl}`send-keys` a command followed by `; tmux wait-for -S NAME`, then call `wait_for_channel`. Shell `;` semantics fire the second statement whether the first succeeds or fails, so the edge-triggered signal never deadlocks the agent on a crashed command. ```python send_keys( pane_id="%1", - keys="pytest; status=$?; tmux wait-for -S tests_done; exit $status", + keys="pytest; tmux wait-for -S tests_done", ) wait_for_channel("tests_done", timeout=60) ``` -The `; status=$?; tmux wait-for -S NAME; exit $status` idiom is the load-bearing safety contract — `wait-for` is edge-triggered, so a crash before the signal would deadlock until the wait's `timeout`. +The `; tmux wait-for -S NAME` suffix is the load-bearing safety contract — `wait-for` is edge-triggered, so a crash before the signal would deadlock until the wait's `timeout`. The shell separator `;` runs the next statement unconditionally, so the signal fires on both success and failure paths. + +The payload deliberately does not append `exit $?` — in an interactive shell that exits the shell itself, taking single-pane sessions down with it. If exit-status preservation matters, capture the status out-of-band (e.g. write it to a file the agent reads later, or use a dedicated scratch pane). ```{fastmcp-tool} wait_for_tools.wait_for_channel ``` **Use when** the shell command can reliably emit the signal (single test runs, build scripts, dev-server boot, anything composable with -`; status=$?; tmux wait-for -S name; exit $status`). +`; tmux wait-for -S name`). **Avoid when** the signal cannot be guaranteed — for example, when the command might be killed externally. 
Use {tooliconl}`wait-for-text` diff --git a/docs/tools/pane/wait-for-content-change.md b/docs/tools/pane/wait-for-content-change.md index e5d5887c..a6ed108f 100644 --- a/docs/tools/pane/wait-for-content-change.md +++ b/docs/tools/pane/wait-for-content-change.md @@ -31,8 +31,7 @@ Response: { "changed": true, "pane_id": "%0", - "elapsed_seconds": 1.234, - "timed_out": false + "elapsed_seconds": 1.234 } ``` diff --git a/docs/tools/pane/wait-for-text.md b/docs/tools/pane/wait-for-text.md index 813f998c..f7cd9b68 100644 --- a/docs/tools/pane/wait-for-text.md +++ b/docs/tools/pane/wait-for-text.md @@ -35,8 +35,7 @@ Response: "Server listening on port 8000" ], "pane_id": "%2", - "elapsed_seconds": 0.002, - "timed_out": false + "elapsed_seconds": 0.002 } ``` diff --git a/docs/topics/gotchas.md b/docs/topics/gotchas.md index b7aa2e91..76a63441 100644 --- a/docs/topics/gotchas.md +++ b/docs/topics/gotchas.md @@ -31,15 +31,15 @@ The `enter` parameter defaults to `true`, which is correct for commands (`make t {"tool": "capture_pane", "arguments": {"pane_id": "%0"}} ``` -The capture above may return the terminal state **before** pytest runs. Use {tooliconl}`wait-for-text` between them: +The capture above may return the terminal state **before** pytest runs. Compose `tmux wait-for -S ` into the command and block on {tooliconl}`wait-for-channel` — deterministic, race-free: ```json -{"tool": "send_keys", "arguments": {"keys": "pytest", "pane_id": "%0"}} -{"tool": "wait_for_text", "arguments": {"pattern": "passed|failed|error", "pane_id": "%0", "regex": true}} +{"tool": "send_keys", "arguments": {"keys": "pytest; tmux wait-for -S pytest_done", "pane_id": "%0"}} +{"tool": "wait_for_channel", "arguments": {"channel": "pytest_done", "timeout": 60}} {"tool": "capture_pane", "arguments": {"pane_id": "%0"}} ``` -See {ref}`recipes` for the complete pattern. 
+For output the agent does not author (third-party logs, daemon prompts, interactive supervisors), substitute {tooliconl}`wait-for-text` for `wait_for_channel`. See {ref}`recipes` for the complete pattern. ## Window names are not unique across sessions diff --git a/docs/topics/prompting.md b/docs/topics/prompting.md index 3f1fed98..c2c13b05 100644 --- a/docs/topics/prompting.md +++ b/docs/topics/prompting.md @@ -62,9 +62,9 @@ These natural-language prompts reliably trigger the right tool sequences: | Prompt | Agent interprets as | |--------|-------------------| -| [Run `pytest` in my build pane and show results]{.prompt} | {toolref}`send-keys` → {toolref}`wait-for-text` → {toolref}`capture-pane` | -| [Start the dev server and wait until it's ready]{.prompt} | {toolref}`send-keys` → {toolref}`wait-for-text` (for "listening on") | -| [Spin up the dev server in the bottom-right pane]{.prompt} | {toolref}`find-pane-by-position` (corner=bottom-right) → {toolref}`send-keys` → {toolref}`wait-for-text` | +| [Run `pytest` in my build pane and show results]{.prompt} | {toolref}`send-keys` (with `tmux wait-for -S` composed in) → {toolref}`wait-for-channel` → {toolref}`capture-pane` | +| [Start the dev server and wait until it's ready]{.prompt} | {toolref}`send-keys` → {toolref}`wait-for-text` (for "listening on" — third-party output the agent doesn't author) | +| [Spin up the dev server in the bottom-right pane]{.prompt} | {toolref}`find-pane-by-position` (corner=bottom-right) → {toolref}`send-keys` → {toolref}`wait-for-text` (for the server's readiness banner) | | [Check if any pane has errors]{.prompt} | {toolref}`search-panes` with pattern "error" | | [Set up a workspace with editor, server, and tests]{.prompt} | {toolref}`create-session` → {toolref}`split-window` (x2) → {toolref}`set-pane-title` (x3) | | [What's running in my tmux sessions?]{.prompt} | {toolref}`list-sessions` → {toolref}`list-panes` → {toolref}`capture-pane` | @@ -90,8 +90,13 @@ Copy these into your 
agent's system instructions (`AGENTS.md`, `CLAUDE.md`, `.cu When executing long-running commands (servers, builds, test suites), use tmux via the libtmux MCP server rather than running them directly. -This keeps output accessible for later inspection. Use the pattern: -send_keys → wait_for_text (for completion signal) → capture_pane. +This keeps output accessible for later inspection. + +For command completion, compose `tmux wait-for -S ` into the +shell command and call wait_for_channel — deterministic, no polling. +Use wait_for_text or wait_for_content_change for observation flows +(third-party logs, daemon prompts). Never capture_pane immediately +after send_keys — the command may still be running. ``` ### For safe agent behavior @@ -134,6 +139,6 @@ When an agent is unsure which tool to use, these rules help: 1. **Discovery first**: Call {toolref}`list-sessions` or {toolref}`list-panes` before acting on specific targets 2. **Prefer IDs**: Once you have a `pane_id`, use it for all subsequent calls — it never changes during the pane's lifetime -3. **Wait, don't poll**: Use {toolref}`wait-for-text` instead of repeatedly calling {toolref}`capture-pane` in a loop +3. **Wait, don't poll**: For commands the agent authors, prefer {toolref}`wait-for-channel` with `tmux wait-for -S ` composed into the command — deterministic and race-free. Fall back to {toolref}`wait-for-text` or {toolref}`wait-for-content-change` for output the agent doesn't author. Never call {toolref}`capture-pane` in a retry loop. 4. **Content vs. metadata**: If looking for text *in* a terminal, use {toolref}`search-panes`. If looking for pane *properties* (name, PID, path), use {toolref}`list-panes` or {toolref}`get-pane-info` 5. 
**Destructive tools are opt-in**: Never kill sessions, windows, or panes unless the user explicitly asks diff --git a/docs/topics/troubleshooting.md b/docs/topics/troubleshooting.md index 34779398..977aa280 100644 --- a/docs/topics/troubleshooting.md +++ b/docs/topics/troubleshooting.md @@ -75,7 +75,7 @@ Symptom-based guide. Find your problem, follow the steps. 2. **Special characters**: tmux interprets some key names (e.g. `C-c`, `Enter`). If sending literal text, use `literal=true`. -3. **Timing**: After `send_keys`, use `wait_for_text` to wait for the command to complete before capturing output. Don't `capture_pane` immediately — the command may still be running. +3. **Timing**: After `send_keys`, prefer composing `tmux wait-for -S ` into the shell command and calling `wait_for_channel` for deterministic completion. Use `wait_for_text` or `wait_for_content_change` only when waiting on output you do not author. Don't `capture_pane` immediately — the command may still be running. ## Silent startup failure diff --git a/src/libtmux_mcp/models.py b/src/libtmux_mcp/models.py index 85e083fd..866bd170 100644 --- a/src/libtmux_mcp/models.py +++ b/src/libtmux_mcp/models.py @@ -237,7 +237,6 @@ class WaitForTextResult(BaseModel): ) pane_id: str = Field(description="Pane ID that was polled") elapsed_seconds: float = Field(description="Time spent waiting in seconds") - timed_out: bool = Field(description="Whether the timeout was reached") class PaneSnapshot(BaseModel): @@ -445,4 +444,3 @@ class ContentChangeResult(BaseModel): changed: bool = Field(description="Whether the content changed before timeout") pane_id: str = Field(description="Pane ID that was polled") elapsed_seconds: float = Field(description="Time spent waiting in seconds") - timed_out: bool = Field(description="Whether the timeout was reached") diff --git a/src/libtmux_mcp/prompts/recipes.py b/src/libtmux_mcp/prompts/recipes.py index 098c6b28..810f3ef3 100644 --- a/src/libtmux_mcp/prompts/recipes.py +++ 
b/src/libtmux_mcp/prompts/recipes.py @@ -21,9 +21,10 @@ def run_and_wait( """Run a shell command in a tmux pane and wait for completion. The returned template teaches the model the safe composition - pattern — always emit ``tmux wait-for -S`` on both success and - failure paths so a crash never deadlocks the agent on an - edge-triggered signal. See ``docs/topics/prompting.md``. + pattern: shell ``;`` semantics fire ``tmux wait-for -S`` whether + the command succeeds or fails, so the edge-triggered signal + never deadlocks an agent waiting on a crashed command. See + ``docs/topics/prompting.md``. Each invocation embeds a fresh UUID-scoped channel name so concurrent agents (or parallel prompt calls from a single agent) @@ -40,11 +41,9 @@ def run_and_wait( Maximum seconds to wait for the signal. Default 60. """ channel = f"libtmux_mcp_wait_{uuid.uuid4().hex}" - shell_payload = ( - f"{command}; __mcp_status=$?; tmux wait-for -S {channel}; exit $__mcp_status" - ) + shell_payload = f"{command}; tmux wait-for -S {channel}" return f"""Run this shell command in tmux pane {pane_id} and block -until it finishes, preserving the command's exit status: +until it finishes: ```python send_keys( @@ -58,6 +57,12 @@ def run_and_wait( After the channel signals, read the last ~100 lines to verify the command's behaviour. Do NOT use a `capture_pane` retry loop — `wait_for_channel` is strictly cheaper in agent turns. + +The payload does not preserve the command's exit status: doing so +in an interactive shell would require exiting the shell (which kills +the pane) or routing through an out-of-band file or tmux variable. +If you need the status, inspect the captured output for +command-specific success markers. """ @@ -158,6 +163,10 @@ def interrupt_gracefully(pane_id: str) -> str: 2. `wait_for_text(pane_id="{pane_id}", pattern="\\$ |\\# |\\% ", regex=True, timeout=5.0)` — waits for a common shell prompt glyph. Adjust the pattern to match the user's shell theme. 
+ The `wait_for_channel` pattern doesn't apply here — `C-c` is a + signal, not a shell command, so there's no statement to compose + `tmux wait-for -S` into. The shell prompt itself is the only + signal that the interrupt landed. 3. If the wait times out the process is ignoring SIGINT. Stop and ask the caller how to proceed — do NOT escalate automatically to `C-\\` (SIGQUIT) or `kill`. diff --git a/src/libtmux_mcp/server.py b/src/libtmux_mcp/server.py index 8bf3d2ff..8b36557d 100644 --- a/src/libtmux_mcp/server.py +++ b/src/libtmux_mcp/server.py @@ -94,9 +94,9 @@ ) _INSTR_WAIT_NOT_POLL = ( - "WAIT, DON'T POLL: use wait_for_text (text/regex) or " - "wait_for_content_change instead of capture_pane retry loops; " - "both block server-side until the condition or timeout." + "WAIT, DON'T POLL: prefer wait_for_channel (compose `tmux wait-for -S`) " + "for command completion. Else wait_for_text / wait_for_content_change " + "for output you don't author." ) #: Gap-explainer: write-hook tools are intentionally absent. See module diff --git a/src/libtmux_mcp/tools/pane_tools/io.py b/src/libtmux_mcp/tools/pane_tools/io.py index 15870aa3..de209b2d 100644 --- a/src/libtmux_mcp/tools/pane_tools/io.py +++ b/src/libtmux_mcp/tools/pane_tools/io.py @@ -32,9 +32,23 @@ def send_keys( ) -> str: """Send keys (commands or text) to a tmux pane. - After sending, use wait_for_text to block until the command completes, - or capture_pane to read the result. Do not capture_pane immediately — - there is a race condition. + After sending, choose your synchronization primitive based on what you + control: + + - **Deterministic (preferred):** compose ``tmux wait-for -S `` + into the shell command and call ``wait_for_channel``. See the + ``run_and_wait`` prompt for the canonical safe-completion pattern. + Cheaper in agent turns and immune to baseline races. 
+ - **Pattern-match:** call ``wait_for_text`` when the output you await + is yours to author and won't appear before the wait locks its + baseline (e.g. a sentinel ``echo`` after a long command). Fast + ``echo`` statements can race the baseline read; reserve this for + output the agent does not control. + - **Any change:** call ``wait_for_content_change`` when you don't know + the output shape. + + Do NOT call ``capture_pane`` immediately — both the read and the + pattern-match paths race the pane's PTY draw. Parameters ---------- diff --git a/src/libtmux_mcp/tools/pane_tools/wait.py b/src/libtmux_mcp/tools/pane_tools/wait.py index 52767ce7..924634b0 100644 --- a/src/libtmux_mcp/tools/pane_tools/wait.py +++ b/src/libtmux_mcp/tools/pane_tools/wait.py @@ -22,6 +22,9 @@ WaitForTextResult, ) +if t.TYPE_CHECKING: + from libtmux.pane import Pane + logger = logging.getLogger(__name__) #: Exceptions that indicate "client transport is gone, keep polling". @@ -96,6 +99,60 @@ async def _maybe_log( return +class _PaneState(t.NamedTuple): + """Per-tick snapshot of pane state used by :func:`wait_for_text`. + + Read in one ``display-message`` round-trip so the loop costs two + subprocesses per tick (state + capture) instead of growing + linearly with each new field. ``|`` is the field separator — + history/cursor/height are integers, ``pane_pid`` is a numeric PID + string, and ``pane_dead`` is the literal ``"0"``/``"1"`` flag. + """ + + history_size: int + cursor_y: int + pane_height: int + pane_pid: str + pane_dead: bool + + +def _read_pane_state(pane: Pane) -> _PaneState: + """Return a :class:`_PaneState` snapshot for ``pane``. + + Combines the per-tick reads ``wait_for_text`` needs into a single + ``display-message`` call. ``history_size + cursor_y`` gives the + absolute grid anchor at entry; ``pane_height`` gates the bottom- + row capture clip; ``pane_pid`` and ``pane_dead`` surface + respawn-pane and pane-death events that invalidate the baseline. 
+ """ + stdout = pane.display_message( + "#{history_size}|#{cursor_y}|#{pane_height}|#{pane_pid}|#{pane_dead}", + get_text=True, + ) + raw = stdout[0] if stdout else "0|0|0||0" + hs, cy, sy, pid, dead = raw.split("|", 4) + return _PaneState( + history_size=int(hs), + cursor_y=int(cy), + pane_height=int(sy), + pane_pid=pid, + pane_dead=dead == "1", + ) + + +def _read_history_limit(pane: Pane) -> int: + """Read the pane's ``history-limit`` once. + + Fixed at pane creation (retroactive change only lands in tmux 3.7+), + so the result is safe to cache for the lifetime of a wait. Kept out + of :func:`_read_pane_state` so the per-tick read doesn't pay for a + value that never changes between polls. + """ + stdout = pane.display_message("#{history_limit}", get_text=True) + raw = stdout[0] if stdout else "0" + return int(raw) + + @handle_tool_errors_async async def wait_for_text( pattern: str, @@ -107,16 +164,61 @@ async def wait_for_text( timeout: float = 8.0, interval: float = 0.05, match_case: bool = False, - content_start: int | None = None, - content_end: int | None = None, socket_name: str | None = None, ctx: Context | None = None, ) -> WaitForTextResult: - """Wait for text to appear in a tmux pane. - - Polls the pane content at regular intervals until the pattern is found - or the timeout is reached. Use this instead of polling capture_pane - manually — it saves agent tokens and turns. + r"""Wait for NEW text to appear in a tmux pane. + + Polls the pane at regular intervals until ``pattern`` appears on a + line written *after* the call starts, or the timeout is reached. + Use this instead of polling :func:`capture_pane` manually — it + saves agent tokens and turns. + + **What "new" means.** At entry the tool snapshots two things: the + pane's absolute grid position (``history_size + cursor_y``) and the + contents of every row below the entry cursor. 
Each tick captures + the rows below the original baseline and discards any row whose + content matches the entry snapshot — those rows are stale paint + that pre-dates the wait, not output written after it. Scrollback + that was already present when the call began is ignored, and so + is paint-style content left below the cursor by TUI repaints, + ``paste-text``, or manual cursor positioning. For the synchronous + "is the pattern in the pane right now?" check, call + {tooliconl}`search-panes` instead. + + The content-delta filter has a rare false-negative case: if new + output happens to byte-match a row in the entry snapshot, that + new row is filtered out. The patterns agents typically wait on + (command-specific markers, full status strings) make this + collision unlikely in practice. For stricter "any change" + semantics, use {tooliconl}`wait-for-content-change`. + + In-place updates to the entry cursor's row — carriage-return + rewrites, progress spinners, single-line status updates — are + not observed; only rows below the entry cursor count as "new." + Use {tooliconl}`wait-for-content-change` or pair the command + with a sentinel for those cases. + + **Adversarial-safety pattern.** If you cannot trust that the + pattern only appears after your action — for example because the + pane prints recurring prompts, log lines, or output from background + processes you do not control — bracket your command with a unique + sentinel: ``cmd; echo __WAIT_$RANDOM__`` and wait for the sentinel + instead of ``cmd``'s natural output. tmux's grid model cannot + distinguish "your output" from "theirs"; the sentinel can. + + **When NOT to use this — sequential ``send_keys`` race.** If you + call ``send_keys`` and immediately ``wait_for_text``, fast output + (``echo``, prompt-return after ``^C``) can land *before* this tool + snapshots the baseline, and the match is then invisible to the + wait. The race is small but real on CI and over remote sockets. 
+ For commands you author, prefer the channel pattern: append + ``; tmux wait-for -S `` to your ``send_keys`` payload and + call ``wait_for_channel`` instead. The ``run_and_wait`` prompt at + ``libtmux_mcp.prompts.recipes`` shows the safe composition. + Reserve ``wait_for_text`` for output you do not control + (third-party process logs, daemon prompts, interactive + supervisors). When a :class:`fastmcp.Context` is available, this tool emits periodic ``ctx.report_progress`` notifications so MCP clients can @@ -147,10 +249,6 @@ async def wait_for_text( Seconds between polls. Default 0.05 (50ms). match_case : bool Whether to match case. Default False (case-insensitive). - content_start : int, optional - Start line for capture. Negative values reach into scrollback. - content_end : int, optional - End line for capture. socket_name : str, optional tmux socket name. ctx : fastmcp.Context, optional @@ -164,6 +262,53 @@ async def wait_for_text( Notes ----- + **Scrollback rollover detection is partial.** The tool raises + ``ToolError`` when ``hsize`` shrinks below the entry value — which + catches ``clear-history`` and any rollover where the dip is + observable between polls. It does **not** reliably detect + ``grid_collect_history`` trim that fires during continuous output: + tmux trims (~10% of ``history-limit``) then immediately scrolls + new lines back, so sampled ``hsize`` can stay clamped at the cap + and never appear below entry. For deterministic command-completion + synchronization use ``wait_for_channel``; for observation flows + that approach ``history-limit``, the tool emits a runtime + ``ctx.warning`` notification when sampled state enters the + trim-risk band. + + Note that ``hsize`` also decrements on resize-grow when there is + scrolled history available (``screen.c`` ``screen_resize_y``), + but in that case the row data is not freed — only the + history/visible-region boundary moves and absolute indices stay + valid. 
The guard distinguishes the two cases by also requiring + ``pane_height`` to not have grown, so resize-grow continues + polling cleanly. + + **Wrapped lines are joined for matching.** Captures pass tmux's + ``-J`` flag so a pattern that spans the pane's visual wrap is + still matched against the joined logical line. The returned + ``matched_lines`` entry for such a hit is the joined line and + can therefore be longer than ``pane_width``. + + **In-place rewrites below the baseline.** Programs that paint + over rows the tool will capture (cursor-position escape + sequences, full-screen progress displays, anything that rewrites + rows it already wrote) can re-introduce text the caller saw + earlier. Each tick captures the current contents of rows below + the baseline; tmux's grid model cannot distinguish "fresh write" + from "repaint with the same characters." + ``screen_write_reverseindex`` (``screen-write.c``) only scrolls + the visible region within ``[rupper, rlower]`` and never touches + ``hsize``, so ``\eM`` itself does not invalidate the anchor, + but the surrounding TUI render loop may. Full-screen TUIs + typically run on the alternate screen (a separate grid that + this tool does not traverse), so the main-screen pattern is + rare in practice. + + **``clear`` / ``reset``.** With the default ``scroll-on-clear`` + option, cleared content scrolls into history (``screen-write.c`` + ``screen_write_clearscreen``), so the baseline anchor is + unaffected. + **Safety tier.** Tagged ``readonly`` because the tool observes pane state without mutating it. Readonly clients may therefore block for the caller-supplied ``timeout`` (default 8 s, caller @@ -174,6 +319,16 @@ async def wait_for_text( calls. If you need to rate-limit wait tools, do it at the transport layer or with dedicated middleware.
""" + if not pattern: + msg = "pattern must be a non-empty string" + raise ToolError(msg) + if interval < 0.01: + msg = f"interval must be at least 0.01 s (received {interval})" + raise ToolError(msg) + if timeout <= 0: + msg = f"timeout must be positive (received {timeout})" + raise ToolError(msg) + search_pattern = pattern if regex else re.escape(pattern) flags = 0 if match_case else re.IGNORECASE try: @@ -193,10 +348,49 @@ async def wait_for_text( ) assert pane.pane_id is not None - matched_lines: list[str] = [] + + # Anchor ``start_time`` before the baseline read so the elapsed + # time returned in ``WaitForTextResult.elapsed_seconds`` reflects + # total call duration, including the baseline read. The + # user-supplied ``timeout`` still cannot bound a stalled tmux + # command — libtmux's ``tmux_cmd`` uses ``Popen.communicate()`` + # with no subprocess timeout, so a hung tmux read can exceed the + # budget. The early anchor measures that blowout; it doesn't + # prevent it. start_time = time.monotonic() deadline = start_time + timeout + + # Snapshot the pane state before polling. ``hs0 + cy0`` is the + # absolute grid anchor — invariant under subsequent scrolling + # because tmux's ``-S`` is relative to the live ``hsize`` at + # capture time (cmd-capture-pane.c: ``top = gd->hsize + n``). + # ``pane_pid`` lets us detect a respawn-pane mid-wait that would + # otherwise leave the absolute anchor pointing at the old + # process's output. See issue #45. + entry = await asyncio.to_thread(_read_pane_state, pane) + baseline_abs = entry.history_size + entry.cursor_y + baseline_pid = entry.pane_pid + baseline_hlimit = await asyncio.to_thread(_read_history_limit, pane) + + # Snapshot rows below the entry cursor by content. The cursor anchor + # alone matches any row at start_line onward, which includes stale + # paint-style content (TUI repaints, paste-text, manual cursor + # positioning) that pre-dates the wait. 
Filtering per-tick captures + # against this set turns the cursor anchor into an honest "content + # written after entry" predicate. Stored as a frozenset for O(1) + # lookup against the typically small below-cursor row set. + entry_below_cursor: frozenset[str] = frozenset( + await asyncio.to_thread( + pane.capture_pane, + start=entry.cursor_y + 1, + end=None, + join_wrapped=True, + ) + ) + + matched_lines: list[str] = [] found = False + warned_risk_band = False try: while True: @@ -208,14 +402,104 @@ async def wait_for_text( message=f"Polling pane {pane.pane_id} for pattern", ) - # FastMCP direct-awaits async tools on the main event loop; the - # libtmux capture_pane call is a blocking subprocess.run. Push - # to the default executor so concurrent tool calls are not - # starved during long waits. - lines = await asyncio.to_thread( - pane.capture_pane, start=content_start, end=content_end - ) - hits = [line for line in lines if compiled.search(line)] + # FastMCP direct-awaits async tools on the main event loop; + # the libtmux display-message + capture_pane calls are both + # blocking subprocess.run. Push to the default executor so + # concurrent tool calls are not starved during long waits. + state = await asyncio.to_thread(_read_pane_state, pane) + if state.pane_dead: + msg = f"pane {pane.pane_id} died during wait" + raise ToolError(msg) + if state.pane_pid != baseline_pid: + msg = ( + f"pane {pane.pane_id} was respawned during wait " + f"(pid {baseline_pid} -> {state.pane_pid}); " + "baseline anchor no longer valid" + ) + raise ToolError(msg) + # When tmux's ``history-limit`` is reached, ``grid_collect_history`` + # (grid.c) frees the oldest scrollback rows and decrements + # ``gd->hsize``, so absolute index math anchored on + # ``history_size + cursor_y`` is no longer recoverable. The same + # hsize-decrement also fires on ``clear-history``. 
+ # + # ``hsize`` ALSO decrements on resize-grow when ``hscrolled > 0`` + # (``screen.c`` ``screen_resize_y``: rows are pulled from history + # back into the visible region). In that case no row data is freed + # — only the hsize/visible-region partition shifts and absolute + # indices stay valid. Trim and resize-grow are distinguished by + # ``pane_height``: trim leaves it unchanged, resize-grow increases + # it. The conjunction below is the actual signature of row + # eviction; resize-grow falls through cleanly. + if ( + state.history_size < entry.history_size + and state.pane_height <= entry.pane_height + ): + msg = ( + f"pane {pane.pane_id} history shrank below entry " + f"baseline (history_size {entry.history_size} -> " + f"{state.history_size}); baseline anchor lost — " + "re-arm wait_for_text or use wait_for_channel for " + "deterministic synchronization" + ) + raise ToolError(msg) + # The shrink guard above catches clear-history and the + # entry-at-cap rollover edge. It does NOT catch + # grid_collect_history trim during continuous output, where + # hsize bounces between (hlimit - hlimit/10) and hlimit + # faster than we can poll. Emit a one-shot warning when + # sampled state is in the trim-risk band so agents + # subscribed to MCP log notifications know to verify + # results or switch to wait_for_channel. + if not warned_risk_band and baseline_hlimit > 0: + trim_batch = max(baseline_hlimit // 10, 1) + risk_floor = baseline_hlimit - trim_batch + if state.history_size >= risk_floor: + await _maybe_log( + ctx, + level="warning", + message=( + f"pane {pane.pane_id} is polling in the " + "history-limit trim-risk band " + f"(history_size {state.history_size} / " + f"history_limit {baseline_hlimit}); " + "wait_for_text correctness is best-effort " + "here. For deterministic synchronization " + "use wait_for_channel." + ), + ) + warned_risk_band = True + # ``+ 1`` skips the baseline line itself so we don't + # re-match the row the cursor sat on at entry. 
+ start_line = baseline_abs - state.history_size + 1 + # ``capture-pane -S`` clips a below-visible start back to the + # bottom row (cmd-capture-pane.c, post-tmux-3.0), so a naive + # capture would return stale bottom-row text whenever no new rows + # have appeared below the cursor yet. Compare against + # ``state.pane_height`` (re-read each tick) so a resize mid-wait + # doesn't leave the guard keyed to a stale height. + if start_line >= state.pane_height: + lines: list[str] = [] + else: + # ``join_wrapped=True`` adds tmux's ``-J`` so visually + # wrapped lines are returned as one logical line. Without + # this, a pattern that spans tmux's wrap column is split + # across two rows and ``re.search`` against each row in + # isolation never matches. Trade-off: the returned + # ``matched_lines`` can contain a single string longer + # than ``pane_width``. + lines = await asyncio.to_thread( + pane.capture_pane, + start=start_line, + end=None, + join_wrapped=True, + ) + # Filter out lines whose content was already below the + # entry cursor — those are stale paint, not output written + # after the call began. Then run the regex against the + # truly-new lines. + new_lines = [line for line in lines if line not in entry_below_cursor] + hits = [line for line in new_lines if compiled.search(line)] if hits: matched_lines.extend(hits) found = True @@ -250,7 +534,6 @@ async def wait_for_text( matched_lines=matched_lines, pane_id=pane.pane_id, elapsed_seconds=round(elapsed, 3), - timed_out=not found, ) @@ -272,6 +555,11 @@ async def wait_for_content_change( what the output will be — it waits for "something happened" rather than a specific pattern. + Unlike ``wait_for_text``, this tool does not raise ``ToolError`` on + pane respawn, pane death, or ``clear-history`` mid-wait — those events + surface as ``changed=True`` returns instead. For correctness-sensitive + flows prefer ``wait_for_channel`` composed with ``tmux wait-for -S``. 
+ Emits :meth:`fastmcp.Context.report_progress` each tick when a Context is injected, so clients can render a progress indicator during the wait. @@ -369,5 +657,4 @@ async def wait_for_content_change( changed=changed, pane_id=pane.pane_id, elapsed_seconds=round(elapsed, 3), - timed_out=not changed, ) diff --git a/src/libtmux_mcp/tools/wait_for_tools.py b/src/libtmux_mcp/tools/wait_for_tools.py index f1a7841c..cd7d0511 100644 --- a/src/libtmux_mcp/tools/wait_for_tools.py +++ b/src/libtmux_mcp/tools/wait_for_tools.py @@ -17,9 +17,15 @@ timeout and wraps the underlying ``subprocess.run`` call in ``timeout=timeout``. Agents SHOULD use the safe composition pattern:: - send_keys("pytest; status=$?; tmux wait-for -S tests_done; exit $status") - -This ensures the signal fires on both success and failure paths. + send_keys("pytest; tmux wait-for -S tests_done") + +Shell ``;`` semantics fire ``wait-for -S`` whether ``pytest`` succeeded +or failed, so the edge-triggered signal never deadlocks the wait. Do +NOT chain ``exit $status`` after the signal — in interactive shells +that exits the shell itself, which destroys single-pane sessions and +takes the tmux server down with them. Exit-status preservation in +interactive shells is out-of-scope; inspect the captured output for +command-specific success markers. """ from __future__ import annotations @@ -109,15 +115,18 @@ async def wait_for_channel( milestones into explicit synchronisation points:: send_keys( - "pytest; status=$?; tmux wait-for -S tests_done; exit $status", + "pytest; tmux wait-for -S tests_done", pane_id=..., ) wait_for_channel("tests_done", timeout=60) - The ``status=$?; ...; exit $status`` idiom is important: ``wait-for`` - is edge-triggered, so if the shell command crashes before issuing - the signal the wait will block until ``timeout``. Emitting the - signal unconditionally (success or failure) avoids that penalty. 
+ Shell ``;`` semantics fire ``wait-for -S`` whether the command + succeeded or failed, so the edge-triggered signal never deadlocks + on a crash. Do NOT chain ``exit $status`` after the signal; in an + interactive shell that exits the shell itself, which destroys + single-pane sessions. Exit-status preservation in interactive + shells is out-of-scope; inspect the captured output for + command-specific success markers. Parameters ---------- diff --git a/tests/test_pane_tools.py b/tests/test_pane_tools.py index 18d395f5..e7dba3fd 100644 --- a/tests/test_pane_tools.py +++ b/tests/test_pane_tools.py @@ -55,6 +55,22 @@ def test_send_keys(mcp_server: Server, mcp_pane: Pane) -> None: assert "sent" in result.lower() +def test_send_keys_docstring_cross_links_wait_for_channel() -> None: + """``send_keys`` docstring steers agents toward ``wait_for_channel`` first. + + Agents read tool descriptions when picking a synchronization primitive. + After the baseline-anchor design landed, ``send_keys`` → + ``wait_for_text`` can race for fast commands (the baseline locks after + the keys are buffered), and the channel pattern is strictly cheaper + for command completion. The docstring must therefore mention both + ``wait_for_channel`` and ``run_and_wait`` so the agent can find the + safe pattern without a separate docs lookup. + """ + assert send_keys.__doc__ is not None + assert "wait_for_channel" in send_keys.__doc__ + assert "run_and_wait" in send_keys.__doc__ + + def test_capture_pane(mcp_server: Server, mcp_pane: Pane) -> None: """capture_pane returns pane content.""" result = capture_pane( @@ -1128,23 +1144,35 @@ class WaitForTextFixture(t.NamedTuple): """Test fixture for wait_for_text.""" test_id: str - command: str | None + #: Command sent BEFORE ``wait_for_text`` is called. Its output is + #: expected to be present in the pane scrollback (and therefore + #: above the baseline) by the time the wait begins. Used to verify + #: that stale scrollback no longer matches (#45).
The positive + #: "text appears after baseline" case lives in + #: ``test_wait_for_text_matches_new_output_after_baseline`` rather + #: than this fixture because it needs ``asyncio.create_task`` plus + #: a sequenced ``await`` to coordinate emission against the running + #: poll loop — synchronous setup races the shell's enter-processing + #: on CI and shifts the baseline past single-line output. + pre_command: str | None pattern: str timeout: float expected_found: bool WAIT_FOR_TEXT_FIXTURES: list[WaitForTextFixture] = [ + # Regression for #45: pre-existing scrollback must NOT match. WaitForTextFixture( - test_id="text_found", - command="echo WAIT_MARKER_abc123", - pattern="WAIT_MARKER_abc123", - timeout=2.0, - expected_found=True, + test_id="stale_scrollback_does_not_match", + pre_command="echo WAIT_MARKER_stale", + pattern="WAIT_MARKER_stale", + timeout=0.5, + expected_found=False, ), + # Genuinely absent pattern still times out cleanly. WaitForTextFixture( test_id="timeout_not_found", - command=None, + pre_command=None, pattern="NEVER_EXISTS_xyz999", timeout=0.3, expected_found=False, @@ -1161,7 +1189,7 @@ def test_wait_for_text( mcp_server: Server, mcp_pane: Pane, test_id: str, - command: str | None, + pre_command: str | None, pattern: str, timeout: float, expected_found: bool, @@ -1169,8 +1197,48 @@ def test_wait_for_text( """wait_for_text polls pane content for a pattern.""" import asyncio - if command is not None: - mcp_pane.send_keys(command, enter=True) + if pre_command is not None: + mcp_pane.send_keys(pre_command, enter=True) + # Wait until the pane has fully settled before measuring the + # baseline. 
"Settled" means: + # + # (a) the OUTPUT line is present — ``line.strip() == pattern``, + # distinguishing the shell's actual output from the typed + # echo line that contains ``pattern`` as a substring (and + # which would otherwise trip a naive ``pattern in capture`` + # predicate while keys are still buffered pre-enter), and + # (b) ``(history_size, cursor_y)`` is unchanged across two + # consecutive polls — zsh prints async prompt-redraw + # lines (vcs_info, precmd hooks) some milliseconds after + # the initial prompt, and those redraws keep growing + # hsize *during* ``wait_for_text``'s window, pulling + # pre-baseline rows back into the visible-relative + # ``start_line`` capture. Waiting them out anchors the + # baseline below all async output. + # + # A fixed ``time.sleep`` would do the same job but couples the + # test to a wall-clock value (the project's idiom for + # tmux-state waits is ``retry_until`` — used throughout this + # file). + last_state: tuple[int, int] = (-1, -1) + + def _stale_settled() -> bool: + nonlocal last_state + raw = mcp_pane.cmd( + "display-message", "-p", "#{history_size}:#{cursor_y}" + ).stdout + if not raw: + return False + hs_str, cy_str = raw[0].split(":", 1) + state = (int(hs_str), int(cy_str)) + has_output_line = any( + line.strip() == pattern for line in mcp_pane.capture_pane() + ) + settled = state == last_state and has_output_line + last_state = state + return settled + + retry_until(_stale_settled, 5, raises=True) result = asyncio.run( wait_for_text( @@ -1182,7 +1250,6 @@ def test_wait_for_text( ) assert isinstance(result, WaitForTextResult) assert result.found is expected_found - assert result.timed_out is (not expected_found) assert result.pane_id == mcp_pane.pane_id assert result.elapsed_seconds >= 0 @@ -1190,6 +1257,149 @@ def test_wait_for_text( assert len(result.matched_lines) >= 1 +def test_wait_for_text_matches_new_output_after_baseline( + mcp_server: Server, mcp_pane: Pane +) -> None: + """wait_for_text finds 
output written AFTER its baseline snapshot. + + Coordinates the marker emission against the running poll loop by + starting :func:`wait_for_text` via :func:`asyncio.create_task`, + then ``await``-ing the emit coroutine, then ``await``-ing the + wait task. Sequencing matters: the explicit start-then-emit + ordering guarantees ``send_keys`` fires *after* the baseline + read; :func:`asyncio.gather` would schedule both concurrently + and lose that guarantee. Without coordination the test races + the shell's enter-processing — if the shell advances the cursor + before the baseline read on CI, ``start_line`` shifts past the + single-line marker and the poll loop misses it. + """ + import asyncio + + async def emit_after_baseline() -> None: + # The baseline read is a single display-message round trip + # (<5 ms in practice); 0.2 s gives wait_for_text plenty of + # headroom to lock the baseline before the marker fires. + await asyncio.sleep(0.2) + await asyncio.to_thread(mcp_pane.send_keys, "echo WAIT_MARKER_after", True) + + async def run() -> WaitForTextResult: + wait_task = asyncio.create_task( + wait_for_text( + pattern="WAIT_MARKER_after", + pane_id=mcp_pane.pane_id, + timeout=3.0, + socket_name=mcp_server.socket_name, + ) + ) + await emit_after_baseline() + return await wait_task + + result = asyncio.run(run()) + assert result.found is True + assert any("WAIT_MARKER_after" in line for line in result.matched_lines) + + +def test_wait_for_text_ignores_stale_below_cursor( + mcp_server: Server, mcp_pane: Pane +) -> None: + """Stale paint-style content below the cursor must not match. + + The cursor-position anchor (``start_line = cy0 + 1``) captures + rows below the entry cursor — which can include content that + pre-dates the wait (TUI repaints, ``paste-text``, manual cursor + positioning). The entry-time content snapshot filters those rows + out so only content written after entry matches the regex. 
+ + Setup parks the cursor at row 0 with ``STALE_BELOW`` painted on + row 1, then waits for a pattern that's already on screen. The + snapshot filter must drop the row before the regex sees it. + """ + import asyncio + + # Print STALE_BELOW, then move the cursor back to the top-left so + # row 1 holds stale content that wait_for_text would otherwise + # match on the first poll. The trailing sleep keeps the pane state + # frozen for the wait's duration. Double-quote the sh -c argument + # so the inner single-quoted printf format strings don't break the + # outer quoting. + paint_and_park = ( + "printf 'TOP\\nSTALE_BELOW\\n'; " # write 2 rows; cursor lands on row 2 + "printf '\\033[H'; " # ESC[H = move cursor to (0,0) + "sleep 60" + ) + mcp_pane.respawn(kill=True, shell=f'sh -c "{paint_and_park}"') + + def _staged() -> bool: + return any("STALE_BELOW" in line for line in mcp_pane.capture_pane()) + + retry_until(_staged, 5, raises=True) + + result = asyncio.run( + wait_for_text( + pattern="STALE_BELOW", + pane_id=mcp_pane.pane_id, + timeout=0.5, + socket_name=mcp_server.socket_name, + ) + ) + assert result.found is False + + +def test_wait_for_text_does_not_match_bottom_row_clip( + mcp_server: Server, mcp_pane: Pane +) -> None: + """wait_for_text must not match stale text sitting on the cursor row. + + When the cursor is at the last visible row at entry, + ``start_line = cy0 + 1`` points below the visible region and + tmux's ``capture-pane -S`` clips back to the bottom row + (``cmd-capture-pane.c``). Without the bottom-aware guard the + poll loop captures the stale cursor-row text and matches it + instantly. + + The pane is respawned with a shell-free ``sh -c`` command that + prints the marker without a trailing newline and then sleeps — + so ``hsize`` and ``cursor_y`` stay frozen for the duration of + the wait. 
Running this with zsh in the loop produced a + multi-line history burst on shell exit / exec that lowered + ``start_line`` below ``pane_height`` and disengaged the guard. + """ + import asyncio + + # Replace the default shell with a single sh invocation: emit + # filler rows to push the cursor to the bottom of the visible + # region, print the marker without a trailing newline so it + # stays on the cursor row, then sleep so nothing else scrolls + # into history. Fixture teardown kills the pane (and the sleep) + # at test exit. + fill_and_park = ( + "for i in $(seq 1 30); do echo filler; done; " + "printf STALE_BOTTOM_MARKER; sleep 60" + ) + mcp_pane.respawn(kill=True, shell=f"sh -c '{fill_and_park}'") + + def _bottom_row_ready() -> bool: + state = mcp_pane.display_message("#{pane_height}:#{cursor_y}", get_text=True) + if not state: + return False + sy_str, cy_str = state[0].split(":", 1) + if int(cy_str) != int(sy_str) - 1: + return False + return any("STALE_BOTTOM_MARKER" in line for line in mcp_pane.capture_pane()) + + retry_until(_bottom_row_ready, 5, raises=True) + + result = asyncio.run( + wait_for_text( + pattern="STALE_BOTTOM_MARKER", + pane_id=mcp_pane.pane_id, + timeout=0.5, + socket_name=mcp_server.socket_name, + ) + ) + assert result.found is False + + def test_wait_for_text_invalid_regex(mcp_server: Server, mcp_pane: Pane) -> None: """wait_for_text raises ToolError on invalid regex when regex=True.""" import asyncio @@ -1205,6 +1415,370 @@ def test_wait_for_text_invalid_regex(mcp_server: Server, mcp_pane: Pane) -> None ) +def test_wait_for_text_rejects_empty_pattern( + mcp_server: Server, mcp_pane: Pane +) -> None: + """An empty pattern matches every line and returns found=True instantly. + + ``re.compile('')`` succeeds and ``re.search`` reports a zero-width + match on every string, so the first poll would return + ``found=True`` against whatever was in the pane. Reject explicitly. 
+ """ + import asyncio + + with pytest.raises(ToolError, match="pattern must be a non-empty string"): + asyncio.run( + wait_for_text( + pattern="", + pane_id=mcp_pane.pane_id, + socket_name=mcp_server.socket_name, + ) + ) + + +def test_wait_for_text_rejects_tiny_interval( + mcp_server: Server, mcp_pane: Pane +) -> None: + """A sub-10ms interval lets the poll loop saturate the tmux server. + + ``asyncio.sleep(0)`` yields but does not idle, so an unguarded + ``interval=0`` fires tmux subprocesses as fast as the scheduler + hands them out — a self-inflicted server-side DoS. + """ + import asyncio + + with pytest.raises(ToolError, match=r"interval must be at least 0\.01"): + asyncio.run( + wait_for_text( + pattern="anything", + pane_id=mcp_pane.pane_id, + interval=0, + socket_name=mcp_server.socket_name, + ) + ) + + +def test_wait_for_text_raises_on_pane_respawn( + mcp_server: Server, mcp_pane: Pane +) -> None: + """Respawning the pane mid-wait invalidates the baseline anchor. + + The baseline absolute index is computed against the original + pane process's grid. ``respawn-pane`` clears the visible region + but preserves ``hsize`` (``screen_reinit``), so the math keeps + pointing at the *old* process's content — silently miscapturing. + ``wait_for_text`` detects the ``pane_pid`` change and surfaces + it as a ToolError instead. + """ + import asyncio + + async def respawn_after_delay() -> None: + # Let wait_for_text capture its baseline first, then swap + # the pane process so pane_pid changes. 
+ await asyncio.sleep(0.1) + await asyncio.to_thread(mcp_pane.respawn, kill=True, shell="sleep 30") + + async def run() -> WaitForTextResult: + wait_task = asyncio.create_task( + wait_for_text( + pattern="NEVER_APPEARS_xyz", + pane_id=mcp_pane.pane_id, + timeout=3.0, + socket_name=mcp_server.socket_name, + ) + ) + await respawn_after_delay() + return await wait_task + + with pytest.raises(ToolError, match="respawned during wait"): + asyncio.run(run()) + + +def test_wait_for_text_raises_on_pane_death(mcp_server: Server, mcp_pane: Pane) -> None: + """A pane whose process has exited surfaces as a ToolError. + + With ``remain-on-exit`` set, tmux keeps the pane alive after its + child exits and reports ``#{pane_dead}=1``. The wait loop checks + that flag every tick and bails with a ToolError instead of + polling stale content until timeout. + """ + import asyncio + + mcp_pane.window.set_option("remain-on-exit", "on") + mcp_pane.respawn(kill=True, shell="true") + + def _is_dead() -> bool: + flag = mcp_pane.display_message("#{pane_dead}", get_text=True) + return bool(flag) and flag[0] == "1" + + retry_until(_is_dead, 3, raises=True) + + with pytest.raises(ToolError, match="died during wait"): + asyncio.run( + wait_for_text( + pattern="anything", + pane_id=mcp_pane.pane_id, + timeout=1.0, + socket_name=mcp_server.socket_name, + ) + ) + + +def test_wait_for_text_rejects_non_positive_timeout( + mcp_server: Server, mcp_pane: Pane +) -> None: + """A non-positive timeout is ambiguous; reject rather than guess. + + The loop body runs one probe before the deadline check, so + ``timeout=0`` would complete a single synchronous capture in a + "wait" tool — surprising. Reject explicitly so callers pick a + meaningful budget. 
+ """ + import asyncio + + with pytest.raises(ToolError, match="timeout must be positive"): + asyncio.run( + wait_for_text( + pattern="anything", + pane_id=mcp_pane.pane_id, + timeout=0, + socket_name=mcp_server.socket_name, + ) + ) + + +def test_wait_for_text_raises_when_history_is_cleared( + mcp_server: Server, mcp_pane: Pane +) -> None: + """``clear-history`` during a wait drops ``hsize`` to 0, tripping the guard. + + Pre-fills scrollback, starts the wait, then runs ``clear-history`` + on the pane. tmux's ``grid_clear_history`` sets ``gd->hsize = 0`` + synchronously, so the next poll sees ``state.history_size < + entry.history_size`` and raises ``ToolError``. + """ + import asyncio + + mcp_pane.send_keys("for i in $(seq 1 100); do echo prefill$i; done", enter=True) + + def _prefilled() -> bool: + hs = mcp_pane.display_message("#{history_size}", get_text=True) + return bool(hs) and int(hs[0]) >= 50 + + retry_until(_prefilled, 5, raises=True) + + async def clear_after_delay() -> None: + # Let wait_for_text snapshot the baseline first, then drop + # hsize to 0 with clear-history. + await asyncio.sleep(0.1) + await asyncio.to_thread(mcp_pane.cmd, "clear-history") + + async def run() -> WaitForTextResult: + wait_task = asyncio.create_task( + wait_for_text( + pattern="NEVER_APPEARS_rollover", + pane_id=mcp_pane.pane_id, + timeout=3.0, + socket_name=mcp_server.socket_name, + ) + ) + await clear_after_delay() + return await wait_task + + with pytest.raises(ToolError, match="history shrank below entry baseline"): + asyncio.run(run()) + + +def test_wait_for_text_succeeds_when_history_grows_normally( + mcp_server: Server, mcp_pane: Pane +) -> None: + """Monotonic history growth without trim does NOT trip the rollover guard. + + The guard fires only when ``state.history_size < entry.history_size``. + Many lines scrolling into a generous ``history-limit`` keep hsize + monotonically increasing, so a long-output command followed by a + sentinel marker must still match cleanly. 
+ """ + import asyncio + + async def emit_after_baseline() -> None: + await asyncio.sleep(0.1) + cmd = "for i in $(seq 1 50); do echo line$i; done; echo WAIT_MARKER_grows_ok" + await asyncio.to_thread(mcp_pane.send_keys, cmd, True) + + async def run() -> WaitForTextResult: + wait_task = asyncio.create_task( + wait_for_text( + pattern="WAIT_MARKER_grows_ok", + pane_id=mcp_pane.pane_id, + timeout=3.0, + socket_name=mcp_server.socket_name, + ) + ) + await emit_after_baseline() + return await wait_task + + result = asyncio.run(run()) + assert result.found is True + + +def test_wait_for_text_survives_resize_grow_with_scrolled_history( + mcp_server: Server, mcp_pane: Pane +) -> None: + """Resize-grow that pulls lines from history must NOT trip the rollover guard. + + tmux's ``screen_resize_y`` decrements ``gd->hsize`` on a vertical + grow when ``hscrolled > 0`` — rows from history are pulled back + into the visible region. The rows themselves are NOT freed; only + the history/visible-region boundary shifts and absolute indices + stay valid. The guard's conjunction with ``pane_height <= + entry.pane_height`` exempts this case, because resize-grow also + increases ``pane_height``. + """ + import asyncio + + # Pre-fill scrollback so hscrolled > 0 — rows must have already + # scrolled past the visible region for screen_resize_y to have + # anything to pull back on grow. + mcp_pane.send_keys("for i in $(seq 1 100); do echo prefill$i; done", enter=True) + + def _prefilled() -> bool: + hs = mcp_pane.display_message("#{history_size}", get_text=True) + return bool(hs) and int(hs[0]) >= 50 + + retry_until(_prefilled, 5, raises=True) + + # Read current pane height; we'll grow past it during the wait. 
+ height_raw = mcp_pane.display_message("#{pane_height}", get_text=True) + assert height_raw is not None + current_height = int(height_raw[0]) + target_height = current_height + 3 + + async def grow_after_delay() -> None: + # Let wait_for_text snapshot the baseline first, then grow + # the window vertically. screen_resize_y pulls rows from + # history back into view, decrementing hsize. + await asyncio.sleep(0.1) + await asyncio.to_thread( + mcp_pane.window.cmd, + "resize-window", + "-y", + str(target_height), + ) + + async def run() -> WaitForTextResult: + wait_task = asyncio.create_task( + wait_for_text( + pattern="NEVER_APPEARS_resize_grow", + pane_id=mcp_pane.pane_id, + timeout=1.0, + socket_name=mcp_server.socket_name, + ) + ) + await grow_after_delay() + return await wait_task + + # The wait must complete cleanly via timeout — NOT a ToolError. + result = asyncio.run(run()) + assert result.found is False + + +def test_wait_for_text_handles_resize_during_wait( + mcp_server: Server, mcp_pane: Pane +) -> None: + """Mid-wait resize keys the bottom-row clip to the LIVE pane height. + + Without the ``state.pane_height`` fix, the bottom-row clip guard + stays keyed to the entry-time pane height. Shrinking the pane + mid-wait would then leave the guard too lax — the capture would + fire past the new bottom and tmux's ``-S`` clip would return stale + bottom-row text. The fix re-reads ``pane_height`` each tick so the + guard matches the current visible region. + """ + import asyncio + + # Park a stale marker on the last visible row and freeze output. + # Same parking shape as test_wait_for_text_does_not_match_bottom_row_clip. 
+ fill_and_park = ( + "for i in $(seq 1 30); do echo filler; done; " + "printf STALE_RESIZE_MARKER; sleep 60" + ) + mcp_pane.respawn(kill=True, shell=f"sh -c '{fill_and_park}'") + + def _ready() -> bool: + return any("STALE_RESIZE_MARKER" in line for line in mcp_pane.capture_pane()) + + retry_until(_ready, 5, raises=True) + + async def resize_after_delay() -> None: + await asyncio.sleep(0.1) + await asyncio.to_thread(mcp_pane.cmd, "resize-pane", "-y", "5") + + async def run() -> WaitForTextResult: + wait_task = asyncio.create_task( + wait_for_text( + pattern="STALE_RESIZE_MARKER", + pane_id=mcp_pane.pane_id, + timeout=0.5, + socket_name=mcp_server.socket_name, + ) + ) + await resize_after_delay() + return await wait_task + + result = asyncio.run(run()) + assert result.found is False + + +def test_wait_for_text_matches_pattern_across_wrap( + mcp_server: Server, mcp_pane: Pane +) -> None: + """A pattern that spans tmux's visual wrap matches via ``-J``. + + The poll loop passes ``join_wrapped=True`` to ``capture-pane`` so a + pattern that crosses the wrap boundary is matched against the + joined logical line. Without that flag, each visual row is its own + string and a regex against any one row never sees the full marker. + + The command line is composed of three ``printf`` calls so the + echoed command text does NOT contain the marker as a literal + substring — only the produced output (after the three pieces + concatenate on stdout) does. 
+ """ + import asyncio + + width_raw = mcp_pane.display_message("#{pane_width}", get_text=True) + assert width_raw is not None + pane_width = int(width_raw[0]) + + filler_len = max(1, pane_width - 5) + payload = ( + f"printf 'x%.0s' $(seq 1 {filler_len}); " + "printf 'WRA'; printf 'PPED_MARKER'; printf '_xyz'; echo" + ) + marker = "WRAPPED_MARKER_xyz" + + async def emit_after_baseline() -> None: + await asyncio.sleep(0.2) + await asyncio.to_thread(mcp_pane.send_keys, payload, True) + + async def run() -> WaitForTextResult: + wait_task = asyncio.create_task( + wait_for_text( + pattern=marker, + pane_id=mcp_pane.pane_id, + timeout=3.0, + socket_name=mcp_server.socket_name, + ) + ) + await emit_after_baseline() + return await wait_task + + result = asyncio.run(run()) + assert result.found is True + assert any(marker in line for line in result.matched_lines) + + def test_wait_for_text_reports_progress(mcp_server: Server, mcp_pane: Pane) -> None: """wait_for_text calls ``ctx.report_progress`` at each poll tick. @@ -1242,7 +1816,6 @@ async def warning(self, message: str) -> None: ) ) assert result.found is False - assert result.timed_out is True assert len(progress_calls) >= 2 first_progress, first_total, first_msg = progress_calls[0] assert first_progress >= 0.0 @@ -1335,7 +1908,6 @@ async def warning(self, message: str) -> None: ) ) assert result.found is False - assert result.timed_out is True def test_wait_for_text_warns_on_invalid_regex( @@ -1390,9 +1962,9 @@ def test_wait_for_text_warns_on_timeout(mcp_server: Server, mcp_pane: Pane) -> N Sibling guard to the invalid-regex warning. The timeout case is where operators most need a structured signal — the tool returns - ``timed_out=True`` in the result but agents and human log readers - have to dig into the ``WaitForTextResult`` to notice. The warning - surfaces it directly. + ``found=False`` but agents and human log readers have to dig into + the ``WaitForTextResult`` to notice. The warning surfaces it + directly. 
""" import asyncio @@ -1421,12 +1993,156 @@ async def warning(self, message: str) -> None: ) ) - assert result.timed_out is True + assert result.found is False assert any( level == "warning" and "timeout" in msg.lower() for level, msg in log_calls ), f"expected a timeout warning, got: {log_calls}" +def test_wait_for_text_warns_in_history_limit_risk_band( + mcp_server: Server, mcp_pane: Pane +) -> None: + """``wait_for_text`` emits a warning when polling near ``history-limit``. + + With a small ``history-limit`` and a burst of output that forces + ``grid_collect_history`` to fire repeatedly, sampled ``history_size`` + enters the trim-risk band (top 10% of ``history_limit``). The wait's + strict-shrink predicate cannot see those trims (hsize stays clamped + at the cap), so the tool emits a one-shot ``ctx.warning`` notification + so MCP clients can decide whether to keep waiting, retry, or switch + to ``wait_for_channel``. + + The wait's ``found`` result is intentionally not asserted — once + polling enters the risk band, correctness is best-effort. The test + pins the warning contract (what the tool guarantees), not the + match contract (what tmux's grid model fundamentally can't). + """ + import asyncio + + # ``history-limit`` is session-scope and the effective per-pane value + # is locked in at pane creation. Set the option globally, then split a + # fresh pane that inherits the small limit. The mcp_pane fixture's + # original pane keeps its larger limit and is unaffected. 
+ mcp_pane.session.cmd("set-option", "-g", "history-limit", "50") + fresh_pane = mcp_pane.window.split() + assert fresh_pane.pane_id is not None + + def _hlimit_locked() -> bool: + hl = fresh_pane.display_message("#{history_limit}", get_text=True) + return bool(hl) and int(hl[0]) == 50 + + retry_until(_hlimit_locked, 5, raises=True) + + log_calls: list[tuple[str, str]] = [] + + class _RecordingContext: + async def report_progress( + self, + progress: float, + total: float | None = None, + message: str = "", + ) -> None: + return + + async def warning(self, message: str) -> None: + log_calls.append(("warning", message)) + + async def burst_after_delay() -> None: + await asyncio.sleep(0.1) + await asyncio.to_thread( + fresh_pane.send_keys, + "for i in $(seq 1 200); do echo burst$i; done", + True, + ) + + async def run() -> None: + wait_task = asyncio.create_task( + wait_for_text( + pattern="WILL_NEVER_MATCH_riskband_qZ9", + pane_id=fresh_pane.pane_id, + timeout=2.0, + interval=0.05, + socket_name=mcp_server.socket_name, + ctx=t.cast("t.Any", _RecordingContext()), + ) + ) + await burst_after_delay() + try: + await wait_task + except ToolError: + # The strict-shrink guard may or may not fire depending on + # whether the dip is observable between polls. Either way, + # we only assert the warning contract, not the result type. + return + + asyncio.run(run()) + + assert any( + level == "warning" and "trim-risk band" in msg for level, msg in log_calls + ), f"expected a trim-risk-band warning, got: {log_calls}" + + +def test_wait_for_text_warns_when_already_in_risk_band( + mcp_server: Server, mcp_pane: Pane +) -> None: + """``wait_for_text`` warns immediately if entry is already in the risk band. + + Unlike ``test_wait_for_text_warns_in_history_limit_risk_band`` which + advances into the band, this covers the case where the pane is + already near ``history-limit`` at entry. 
Without output (idle wait), + the simplified predicate (no ``advanced`` gate) must still fire the + one-shot warning. + """ + import asyncio + + mcp_pane.session.cmd("set-option", "-g", "history-limit", "50") + fresh_pane = mcp_pane.window.split() + assert fresh_pane.pane_id is not None + + def _hlimit_locked() -> bool: + hl = fresh_pane.display_message("#{history_limit}", get_text=True) + return bool(hl) and int(hl[0]) == 50 + + retry_until(_hlimit_locked, 5, raises=True) + + # history-limit is 50. Risk floor (top 10%) is 45. + # Print 100 lines to ensure hsize reaches the cap (50). + fresh_pane.send_keys("for i in $(seq 1 100); do echo line$i; done", True) + + def _prefilled() -> bool: + hs = fresh_pane.display_message("#{history_size}", get_text=True) + # We need it to be in the risk band (>= 45). + return bool(hs) and int(hs[0]) >= 45 + + retry_until(_prefilled, 10, raises=True) + + log_calls: list[tuple[str, str]] = [] + + class _RecordingContext: + async def report_progress(self, *args: t.Any, **kwargs: t.Any) -> None: + return + + async def warning(self, message: str) -> None: + log_calls.append(("warning", message)) + + async def run() -> None: + # Idle wait: no new output, no cursor movement. 
+ await wait_for_text( + pattern="NEVER_MATCH_idle_risk", + pane_id=fresh_pane.pane_id, + timeout=0.5, + interval=0.1, + socket_name=mcp_server.socket_name, + ctx=t.cast("t.Any", _RecordingContext()), + ) + + asyncio.run(run()) + + assert any( + level == "warning" and "trim-risk band" in msg for level, msg in log_calls + ), f"expected a trim-risk-band warning during idle wait, got: {log_calls}" + + def test_wait_for_content_change_warns_on_timeout( mcp_server: Server, mcp_pane: Pane ) -> None: @@ -1485,7 +2201,7 @@ async def warning(self, message: str) -> None: ctx=t.cast("t.Any", _RecordingContext()), ) ) - assert result.timed_out is True + assert result.changed is False assert any( level == "warning" and "timeout" in msg.lower() for level, msg in log_calls ), f"expected a timeout warning, got: {log_calls}" @@ -1815,7 +2531,6 @@ def _send_later() -> None: thread.join() assert isinstance(result, ContentChangeResult) assert result.changed is True - assert result.timed_out is False assert result.elapsed_seconds > 0 @@ -1828,7 +2543,7 @@ def test_wait_for_content_change_timeout(mcp_server: Server, mcp_pane: Pane) -> machines the shell prompt can take well over 500 ms to fully render (cursor blink, zsh right-prompt, git status async hooks) and would otherwise be observed as pane-content change during the test window, - failing ``timed_out=True`` spuriously under ``--reruns=0``. + spuriously failing the ``changed is False`` assertion under ``--reruns=0``. 
""" import time @@ -1867,7 +2582,6 @@ def test_wait_for_content_change_timeout(mcp_server: Server, mcp_pane: Pane) -> ) assert isinstance(result, ContentChangeResult) assert result.changed is False - assert result.timed_out is True # --------------------------------------------------------------------------- diff --git a/tests/test_prompts.py b/tests/test_prompts.py index 99e32278..253d6a21 100644 --- a/tests/test_prompts.py +++ b/tests/test_prompts.py @@ -50,14 +50,29 @@ def test_prompts_as_tools_enabled_by_env( def test_run_and_wait_returns_string_template() -> None: - """``run_and_wait`` prompt produces a string with the safe idiom.""" + """``run_and_wait`` prompt produces a string with the safe idiom. + + The rendered payload must NOT contain ``exit`` in the shell command + portion: an interactive-shell ``exit`` after the signal kills the + parent shell, which destroys single-pane tmux sessions. The signal + fires unconditionally via shell ``;`` semantics whether the command + succeeds or fails — the wait-for primitive doesn't need an exit to + preserve safety. + """ from libtmux_mcp.prompts.recipes import run_and_wait text = run_and_wait(command="pytest", pane_id="%1", timeout=30.0) assert "tmux wait-for -S libtmux_mcp_wait_" in text assert "wait_for_channel" in text - # Exit-status preservation is the whole point — pin it. - assert "exit $__mcp_status" in text + # The shell payload (between `keys=` and the closing quote) must + # not append ``exit`` after the wait-for — that would kill the + # parent shell in an interactive pane. Check the rendered keys= + # line for the absence of the exit suffix. 
+ keys_line = next( + line for line in text.splitlines() if line.strip().startswith("keys=") + ) + assert "; exit" not in keys_line + assert "exit $" not in keys_line def test_run_and_wait_channel_is_uuid_scoped() -> None: diff --git a/tests/test_server.py b/tests/test_server.py index de61814a..65db7d1d 100644 --- a/tests/test_server.py +++ b/tests/test_server.py @@ -151,14 +151,23 @@ def test_base_instructions_surface_flagship_read_tools() -> None: def test_base_instructions_prefer_wait_over_poll() -> None: - """_BASE_INSTRUCTIONS names wait_for_text and wait_for_content_change. - - The wait tools block server-side, which is dramatically cheaper in - agent turns than ``capture_pane`` in a retry loop. Making them - discoverable from the instructions is a no-cost UX win. + """_BASE_INSTRUCTIONS names the wait family with the right primacy. + + ``wait_for_channel`` is the deterministic primitive (composes + ``tmux wait-for -S``) and should appear first; ``wait_for_text`` + and ``wait_for_content_change`` are the fallbacks for output the + agent doesn't author. Making the channel primitive discoverable + from the instructions steers agents off the polling-scraper path + for command-completion synchronization. """ + assert "wait_for_channel" in _BASE_INSTRUCTIONS assert "wait_for_text" in _BASE_INSTRUCTIONS assert "wait_for_content_change" in _BASE_INSTRUCTIONS + # The channel primitive should be named before the fallbacks so an + # agent that scans top-to-bottom encounters the cheaper option first. + assert _BASE_INSTRUCTIONS.index("wait_for_channel") < _BASE_INSTRUCTIONS.index( + "wait_for_text" + ) def test_base_instructions_document_hook_boundary() -> None: