wait_for_text: adaptive backoff to reduce subprocess pressure on parallel waits #52

@tony

Type: performance · Tier: deferred · Tool: wait_for_text

What's happening

Default interval=0.05 means ~20 polls/sec. Each useful poll runs two tmux subprocess-backed operations:

  • display-message (one tmux command via libtmux Pane.display_message)
  • capture-pane (one tmux command via libtmux Pane.capture_pane)

That is roughly 40 tmux subprocess calls per second per active wait. Ten parallel wait_for_text calls across agent instances produce ~400 tmux subprocesses per second hammering the same tmux server.
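The arithmetic above can be written out as a small helper. This is only a back-of-the-envelope sketch: `tmux_calls_per_second` is a hypothetical name, not part of the project, and it assumes each poll costs exactly two tmux commands and takes no time itself, so the result is an upper bound.

```python
def tmux_calls_per_second(
    interval: float,
    calls_per_poll: int = 2,
    parallel_waits: int = 1,
) -> float:
    """Upper-bound estimate of tmux subprocess calls per second.

    Assumes every poll issues `calls_per_poll` tmux commands
    (display-message + capture-pane) and that the poll itself is instant.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return parallel_waits * calls_per_poll / interval


if __name__ == "__main__":
    # One wait at the default interval, then ten parallel waits.
    print(round(tmux_calls_per_second(0.05)))
    print(round(tmux_calls_per_second(0.05, parallel_waits=10)))
```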

Not a correctness issue — but a real cost in parallel-agent flows, and a real load on the tmux event loop that other clients share.

Why this isn't urgent

The deterministic alternative for command completion is already shipped: wait_for_channel, which blocks server-side via cmd-wait-for.c and so costs zero subprocesses per second while waiting. The send_keys docstring and the server system instructions both now name wait_for_channel first, so the agent's default mental path leads away from the polling scraper at the moment the choice is being made.

wait_for_text is the right primitive when the agent does not author the output (third-party process logs, daemon prompts, interactive supervisors). That's a smaller set of calls and a smaller subprocess footprint.

Two viable directions

1. Adaptive backoff inside the poll loop

Use the same exponential-backoff pattern the project already implements in ReadonlyRetryMiddleware:

base_delay = 0.1
max_delay = 1.0
backoff_multiplier = 2.0

Apply backoff only when a tick finds no match. The first tick sleeps for interval; after each no-match tick the sleep grows by backoff_multiplier, capped at max_delay. The only reset case is a caller-supplied higher interval; a reset on match is moot because the loop exits on match.
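A minimal sketch of the loop described above, not the project's actual implementation. The names `backoff_delays` and `wait_for`, the injectable `sleep` and `clock` parameters, and the `check` callback (standing in for the display-message plus capture-pane poll) are all hypothetical. It also assumes the first delay is the caller's interval and that an interval above max_delay raises the ceiling to that interval.

```python
import time
from typing import Callable, Iterator


def backoff_delays(
    interval: float = 0.05,
    max_delay: float = 1.0,
    multiplier: float = 2.0,
) -> Iterator[float]:
    """Yield sleep durations: interval first, then growing up to a ceiling."""
    # Assumption: a caller-supplied interval above max_delay wins.
    ceiling = max(interval, max_delay)
    delay = interval
    while True:
        yield delay
        delay = min(delay * multiplier, ceiling)


def wait_for(
    check: Callable[[], bool],
    timeout: float = 8.0,
    interval: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    for delay in backoff_delays(interval):
        if check():  # match found: exit immediately, no reset needed
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
    return False  # unreachable; backoff_delays is infinite
```

A fast match still returns on the first tick, while a long wait settles at one poll per second (two tmux commands per second per wait) instead of twenty.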

Pros:

  • Cheap implementation.
  • Precedent exists in the same codebase.
  • Agents that hit a fast match get fast latency; agents that wait long get cheap polling.

Cons:

  • A new exposed knob (max_interval?) or a hardcoded ceiling that the caller can't override.

2. Raise the default interval

interval: float = 0.2  # was 0.05

Pros:

  • Simpler, more honest about the cost.
  • Callers who need 50 ms polling can still pass it.

Cons:

  • Default-behavior change visible to existing agents (perceived as "slower").
  • Doesn't help long waits.
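To put a rough number on the long-wait point, the sketch below counts polls over a wait of a given length for a fixed interval versus a capped doubling backoff. `polls_within` is a hypothetical helper; it works in integer milliseconds to avoid float drift and assumes each poll itself takes no time.

```python
from typing import Optional


def polls_within(
    duration_ms: int,
    interval_ms: int,
    max_delay_ms: Optional[int] = None,
    multiplier: int = 2,
) -> int:
    """Count polls that start before duration_ms has elapsed.

    With max_delay_ms=None the interval is fixed; otherwise the delay
    is multiplied after every poll and capped at max_delay_ms.
    """
    elapsed, delay, polls = 0, interval_ms, 0
    while elapsed < duration_ms:
        polls += 1
        elapsed += delay
        if max_delay_ms is not None:
            delay = min(delay * multiplier, max_delay_ms)
    return polls


if __name__ == "__main__":
    thirty_seconds = 30_000
    print(polls_within(thirty_seconds, 50))        # current default: 600 polls
    print(polls_within(thirty_seconds, 200))       # option 2: 150 polls
    print(polls_within(thirty_seconds, 50, 1000))  # option 1: 34 polls
```

Under these assumptions a 30-second wait drops from 600 polls to 150 with the higher default, and to 34 with backoff capped at one second.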

Recommendation

Adaptive backoff (option 1): the precedent already exists, and the agent-facing API stays unchanged. Knob exposure can wait until someone needs it. If a stress-test fixture lands that measures subprocesses per second under N parallel waits, this change should land alongside it.
