Daemon freezes when one pane produces heavy output (single-threaded tokio runtime on Unix) #34

@arung-northwestern

Description

When a single pane is producing heavy, continuous output (e.g. an ipython REPL running Dask distributed computations with local cluster), the entire rmux daemon becomes unresponsive — all other panes freeze, send-keys and capture-pane block, and any external client (Emacs, scripts) cannot communicate with the daemon until the heavy output subsides.

Environment

  • rmux v0.5.0
  • macOS 15 (Apple M4 Pro, arm64)
  • Terminal: WarpTerminal
  • Shell: zsh

Reproduction

  1. Start an rmux session with two panes (e.g. one running Emacs, one running ipython)
  2. In the ipython pane, run a Dask computation that produces continuous output:
    from dask.distributed import Client
    client = Client()  # local cluster

    def heavy_func(i):  # placeholder: any function that streams progress/logging
        print(f"task {i}", flush=True)
        return i

    futures = client.map(heavy_func, range(1000))
    results = client.gather(futures)
  3. While the computation is running, try to use the other pane (Emacs, shell, anything)
  4. Try running rmux send-keys or rmux capture-pane from an external terminal

Expected: Other panes and CLI commands remain responsive (as they do in zellij).

Actual: The entire multiplexer freezes. All panes stop updating, all CLI commands block, the attach session stops rendering.

Root Cause

The daemon uses a single-threaded tokio runtime on Unix:

// src/main.rs (~line 233)
#[cfg(unix)]
let runtime = Builder::new_current_thread().enable_all().build()?;

#[cfg(windows)]
let runtime = Builder::new_multi_thread()
    .worker_threads(hidden_daemon_worker_threads())
    .enable_all()
    .build()?;

On Unix, everything runs on one OS thread: all pane PTY readers, all client IPC socket handlers, all send-keys/capture-pane dispatch, and all attach streams.

When one pane produces heavy output:

  1. Its reader task (PaneReaderRuntime, wire.rs) saturates the single worker with reads, VT parsing, and OutputRing writes
  2. After MAX_IMMEDIATE_PANE_READS (8 reads / ~64 KiB), it calls tokio::task::yield_now() — but if data is immediately available, the task re-enters right away
  3. All other daemon tasks (IPC listener, client connections, other pane readers) are starved — they never get scheduled
  4. Every CLI call (send-keys, capture-pane, etc.) blocks on the Unix domain socket waiting for a response the daemon can't produce

This is a daemon-wide bottleneck, not a per-pane issue.
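
The effect of the read burst can be illustrated with a toy model. The sketch below is not tokio's scheduler and not rmux code: it assumes a plain FIFO run queue on one simulated thread, where a hypothetical `Heavy` task performs `MAX_IMMEDIATE_PANE_READS` units of work per turn and then re-queues itself, while each `Light` task does one unit per turn. It only shows how much of the single thread the heavy task takes under those assumptions.

```rust
use std::collections::VecDeque;

/// Read burst a heavy pane reader performs before yielding (value from the issue).
const MAX_IMMEDIATE_PANE_READS: usize = 8;

/// Hypothetical task kinds for the toy model.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Task {
    /// A pane reader that always has data available.
    Heavy,
    /// Any other daemon task (IPC handler, idle pane reader), identified by id.
    Light(u32),
}

/// Runs `turns` scheduling turns on one simulated thread and returns the
/// order in which units of work were executed.
fn simulate(initial: &[Task], turns: usize) -> Vec<Task> {
    let mut queue: VecDeque<Task> = initial.iter().copied().collect();
    let mut trace = Vec::new();
    for _ in 0..turns {
        let task = match queue.pop_front() {
            Some(task) => task,
            None => break,
        };
        let units = match task {
            Task::Heavy => MAX_IMMEDIATE_PANE_READS,
            Task::Light(_) => 1,
        };
        for _ in 0..units {
            trace.push(task);
        }
        // Cooperative yield: the task goes to the back of the queue.
        queue.push_back(task);
    }
    trace
}

/// Fraction of executed work units that belonged to the heavy task.
fn heavy_share(trace: &[Task]) -> f64 {
    if trace.is_empty() {
        return 0.0;
    }
    let heavy = trace.iter().filter(|t| **t == Task::Heavy).count();
    heavy as f64 / trace.len() as f64
}
```

With one heavy and two light tasks, three turns execute ten units of work, eight of them by the heavy task. In the real daemon each heavy unit also includes VT parsing and OutputRing writes, so the delay seen by the other tasks would be longer than this model suggests.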

Comparison with Zellij

Zellij does not have this problem because it uses a multi-threaded runtime:

// zellij: zellij-utils/src/global_async_runtime.rs
let runtime = Builder::new_multi_thread().worker_threads(4).enable_all().build()?;

With 4 OS threads, heavy output from one pane saturates one worker; the other 3 remain free for client I/O, rendering, and other pane processing. Zellij also uses 64 KiB read buffers (vs rmux's 8 KiB) and separates I/O from parsing via channels to a dedicated Screen thread.
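
The separation of I/O from parsing mentioned above can be sketched with standard-library threads and a channel. This is not Zellij's code: `Chunk`, `spawn_parser`, and `run_pipeline` are hypothetical names, and the "parser" only counts bytes and newlines where a real implementation would run a VT parser and update the screen.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical unit of pane output handed from the reader to the parser.
type Chunk = Vec<u8>;

/// Spawns a dedicated parser thread. The "parser" only counts bytes and
/// newlines; it stands in for VT parsing and screen updates.
fn spawn_parser(rx: mpsc::Receiver<Chunk>) -> thread::JoinHandle<(usize, usize)> {
    thread::spawn(move || {
        let mut bytes = 0;
        let mut lines = 0;
        // recv() returns Err once every Sender has been dropped.
        while let Ok(chunk) = rx.recv() {
            bytes += chunk.len();
            lines += chunk.iter().filter(|b| **b == b'\n').count();
        }
        (bytes, lines)
    })
}

/// Reader side: forwards raw chunks without parsing them, then closes the
/// channel and waits for the parser's totals.
fn run_pipeline(chunks: Vec<Chunk>) -> (usize, usize) {
    let (tx, rx) = mpsc::channel::<Chunk>();
    let parser = spawn_parser(rx);
    for chunk in chunks {
        tx.send(chunk).expect("parser thread exited early");
    }
    drop(tx); // signals end of input to the parser
    parser.join().expect("parser thread panicked")
}
```

The reader loop does no parsing work of its own, so a slow parser delays only the parser thread. Note that `mpsc::channel` is unbounded; `mpsc::sync_channel` would add backpressure at the cost of blocking the reader when the parser falls behind.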

Suggested Fix

The most direct fix is to switch to a multi-threaded runtime on Unix, matching what Windows already does:

// Option A: Simple — one line change
let runtime = Builder::new_multi_thread().worker_threads(4).enable_all().build()?;

// Option B: Dedicated runtime for pane readers
let pane_runtime = Builder::new_multi_thread()
    .worker_threads(2)
    .thread_name("rmux-pane-reader")
    .enable_all()
    .build()?;

The existing tokio tasks don't need to change — they just need multiple OS threads to run on. The PaneReaderRuntime::spawn() calls already produce Send + 'static futures, so they're compatible with a multi-threaded runtime.
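
The `Send + 'static` property can be checked at compile time. The sketch below uses only the standard library; `assert_send_static` and `PaneReaderSketch` are hypothetical names, not rmux items. The bound mirrors the one `tokio::spawn` places on a task, so a future that passes this check can be moved between worker threads.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Compile-time check mirroring the bound a multi-threaded spawn places on a
/// task. It always returns true; a non-conforming future fails to compile.
fn assert_send_static<F>(_future: &F) -> bool
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    true
}

/// Hypothetical stand-in for a pane reader task. It owns its buffer, so it
/// borrows nothing ('static) and holds only `Send` state.
struct PaneReaderSketch {
    buffer: Vec<u8>,
}

impl Future for PaneReaderSketch {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<usize> {
        // A real reader would wait for PTY readiness; this one finishes at once.
        Poll::Ready(self.buffer.len())
    }
}

// By contrast, a task with an `std::rc::Rc` field would be rejected by
// `assert_send_static`, because `Rc` is not `Send`.
```

A check like this could be run against the actual reader future type before switching runtimes, to confirm the claim for any task that has not yet been spawned on a multi-threaded runtime.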

Additional improvements that could help even with a single-threaded runtime:

  • Per-quantum byte budget (not just 8-read count) with forced sleep after budget exceeded
  • Bounded async channels between pane reader and publisher
  • Separate the IPC listener onto a dedicated thread so CLI commands always get a response
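
The first item, a per-quantum byte budget, could look roughly like the following. This is a sketch, not rmux code: `ReadBudget`, `Decision`, and the limits used in the example are hypothetical, and what the reader does on `Pause` (for example a short timer-based sleep instead of an immediate re-poll) is left to the caller.

```rust
/// What the pane reader should do after accounting for one read.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Decision {
    /// Budget remains; another immediate read is allowed.
    Continue,
    /// Budget exhausted; the reader should pause before reading again.
    Pause,
}

/// Hypothetical per-quantum budget limiting both read count and byte count.
struct ReadBudget {
    max_reads: usize,
    max_bytes: usize,
    reads: usize,
    bytes: usize,
}

impl ReadBudget {
    fn new(max_reads: usize, max_bytes: usize) -> Self {
        ReadBudget { max_reads, max_bytes, reads: 0, bytes: 0 }
    }

    /// Records one completed read of `n` bytes and reports whether the
    /// quantum is used up. Reaching either limit ends the quantum.
    fn charge(&mut self, n: usize) -> Decision {
        self.reads += 1;
        self.bytes = self.bytes.saturating_add(n);
        if self.reads >= self.max_reads || self.bytes >= self.max_bytes {
            Decision::Pause
        } else {
            Decision::Continue
        }
    }

    /// Starts a new quantum after the reader has paused.
    fn reset(&mut self) {
        self.reads = 0;
        self.bytes = 0;
    }
}
```

With limits of 8 reads and 16 KiB, two full 8 KiB reads end the quantum, whereas a count-only limit would allow eight such reads before the reader gives up the thread.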

Workaround

Reducing output volume in the heavy pane (e.g. Client(silence_logs=True) in Dask) mitigates but does not eliminate the issue.

Metadata

Labels

bug (Something isn't working)
