Harden Python perf interop: mplex teardown, yamux throughput, and perf_test.py reliability #1344

@acul71

Summary

Python↔Python perf interop in the unified-testing / test-plans harness is failing or flaky on real benchmark stacks. This issue tracks hardening perf_test.py, mplex (tls+mplex) teardown, and yamux window/throughput so perf runs complete reliably while preserving or improving measured performance.

Problem

  • tcp + tls + mplex: expected connection-closed errors raised during post-benchmark cleanup cause a non-zero exit even though the benchmarks have finished.
  • ws + noise + yamux: slow large transfers were cut off (about 36% of the way through upload 9) by an interaction between the listener lifecycle and timeouts; in addition, yamux recv-window batching caused excessive round-trips on 1 GiB transfers.
  • Harness ergonomics: there is no local, Docker-free runner for fast iteration on perf stacks and PY_YAMUX_* tuning.
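
To give a feel for why recv-window batching matters on 1 GiB transfers, here is a rough back-of-the-envelope sketch. It only counts how many WINDOW_UPDATE frames are needed if each one grants a fixed amount of credit. The helper name `window_updates_needed` is hypothetical, 256 KiB is the initial stream window from the yamux specification, and the larger sizes are arbitrary comparison points rather than values taken from py-libp2p.

```python
import math

GIB = 1024 ** 3


def window_updates_needed(transfer_bytes: int, credit_per_update: int) -> int:
    """Lower bound on WINDOW_UPDATE frames when each grants a fixed credit."""
    if credit_per_update <= 0:
        raise ValueError("credit_per_update must be positive")
    return math.ceil(transfer_bytes / credit_per_update)


# Compare a few per-update credit sizes for a single 1 GiB transfer.
for credit in (256 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    count = window_updates_needed(GIB, credit)
    print(f"{credit // 1024:>6} KiB per update -> {count} updates")
```

This is a simplified model: an update only costs a full round-trip when the sender has already exhausted its window, so the real stall count depends on timing as well as on credit size.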

Scope (implementation in PR #1337)

  1. perf_test.py (interop/perf/perf_test.py)

    • Listener: replace sleep_forever() with peer-polling teardown; apply the connect deadline only until the first peer connects; require a run of consecutive empty get_live_peers() polls before shutting down.
    • Dialer: swallow connection-closed errors only after _benchmarks_complete.
    • PERF_LOCAL_ADDR_FILE for local runner address handoff.
    • Configurable timeouts (TEST_TIMEOUT_SECS, DIAL_TIMEOUT_SECS, etc.).
  2. mplex / tls+mplex

  3. yamux throughput

    • Go-like GrowTo hysteresis, per-read WINDOW_UPDATE credit release, bounded read chunking.
    • PY_YAMUX_* env knobs for perf A/B (documented in scripts/perf/README.md).
    • Unit tests: tests/core/stream_muxer/yamux/test_yamux_growto_hysteresis.py.
  4. Local tooling

    • scripts/perf/run_local_perf.py + README for matrix/quick runs without Redis/Docker.
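
The listener teardown described in item 1 could look roughly like the sketch below. It is a synchronous simplification for illustration, not the code in PR #1337: the function name, its parameters, and the return strings are all hypothetical, and only `get_live_peers` is a name taken from this issue (here it is passed in as a plain callable).

```python
import time
from typing import Callable, Sequence


def wait_for_peers_to_leave(
    get_live_peers: Callable[[], Sequence[object]],
    *,
    connect_deadline_secs: float,
    empty_polls_required: int = 3,
    poll_interval_secs: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "no_peer" if nobody connects in time, else "drained"."""
    start = clock()
    seen_peer = False
    empty_streak = 0
    while True:
        if get_live_peers():
            # A peer is connected: the connect deadline no longer applies.
            seen_peer = True
            empty_streak = 0
        elif seen_peer:
            # Peers were here earlier; require several empty polls in a row
            # so a brief reconnect gap does not trigger shutdown.
            empty_streak += 1
            if empty_streak >= empty_polls_required:
                return "drained"
        elif clock() - start >= connect_deadline_secs:
            # The deadline only matters until the first peer shows up.
            return "no_peer"
        sleep(poll_interval_secs)
```

Injecting `clock` and `sleep` keeps the loop testable without real delays; an async listener would presumably await its own sleep primitive instead.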

Expected outcome

  • python-v0.x x python-v0.x (tcp, tls, mplex) perf interop exits 0 after full benchmark.
  • python-v0.x x python-v0.x (ws, noise, yamux) completes all 10 upload/download/latency iterations on 1 GiB workloads.
  • Local ./scripts/perf/run_local_perf.py --matrix passes and can serve as a regression check during development.
  • Throughput on yamux stacks does not regress against the pre-fix baseline (target: fewer window-update round-trips).
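
As an illustration of how hysteresis can reduce window-update traffic, the sketch below checks on every read whether credit should be released, but only sends an update once at least half the window can be returned. The half-window threshold is loosely modeled on the `GrowTo` check in Go yamux implementations; the `RecvWindow` class and its fields are hypothetical and are not the py-libp2p implementation.

```python
from dataclasses import dataclass


@dataclass
class RecvWindow:
    max_window: int    # target receive window in bytes
    advertised: int    # credit the peer still believes it may send
    buffered: int = 0  # bytes received but not yet read by the application

    def on_data(self, n: int) -> None:
        """Account for n bytes arriving from the peer."""
        if n > self.advertised:
            raise ValueError("peer exceeded the advertised window")
        self.advertised -= n
        self.buffered += n

    def on_read(self, n: int, force: bool = False) -> int:
        """Application consumed n bytes; return credit to grant (0 = none)."""
        self.buffered -= min(n, self.buffered)
        return self.grow_to(force)

    def grow_to(self, force: bool = False) -> int:
        current = self.advertised + self.buffered
        if current >= self.max_window:
            return 0
        delta = self.max_window - current
        # Hysteresis: skip small top-ups unless the caller forces one.
        if delta < self.max_window // 2 and not force:
            return 0
        self.advertised += delta
        return delta


KIB = 1024
win = RecvWindow(max_window=256 * KIB, advertised=256 * KIB)
win.on_data(256 * KIB)  # peer fills the whole window
grants = [win.on_read(64 * KIB) for _ in range(4)]
print(grants)  # credit granted after each 64 KiB read
```

In this run, four 64 KiB reads produce two updates of 128 KiB each instead of four updates of 64 KiB, which is the kind of reduction the throughput target refers to.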

Test plan

Related
