feat(tunnel): pipelined polls with adaptive depth, wseq ordering, STUN blocking#5

Draft
yyoyoian-pixel wants to merge 191 commits into main from feat/pipeline-tunnel-polls

Owner

@yyoyoian-pixel yyoyoian-pixel commented May 13, 2026

Summary

Pipelined full-tunnel with adaptive pipeline depth, write-sequence ordering on the tunnel-node, and WebRTC TCP fallback.

Pipelining

  • Optimistic start at depth 2 — every session begins with 2 in-flight polls (free, no elevation permit)
  • Adaptive ramp to depth 4 — sessions with sustained download data (>32KB) elevate with permit
  • Fast-path uploads — when pipeline is full, upload data bypasses depth cap (+4 extra ops)
  • Timer-based refill — non-blocking 100ms steps in the select loop, polls after 1s
  • Single-loop architecture — upload reads and reply processing in one select loop for natural back-pressure
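
As a rough illustration of the depth ramp described above, here is a minimal sketch. The struct name, the `permit_available` flag, and the choice to reset the byte counter after each step are assumptions made for the example; the real logic lives in src/tunnel_client.rs and may differ.

```rust
// Figures taken from the summary above; the names are hypothetical.
const START_DEPTH: usize = 2;
const MAX_DEPTH: usize = 4;
const ELEVATE_THRESHOLD_BYTES: u64 = 32 * 1024;

/// Per-session pipeline state (illustrative only).
#[derive(Debug)]
pub struct SessionPipeline {
    pub depth: usize,
    downloaded: u64,
}

impl SessionPipeline {
    /// Every session starts with two in-flight polls and needs no permit for that.
    pub fn new() -> Self {
        Self { depth: START_DEPTH, downloaded: 0 }
    }

    /// Record the bytes carried by one poll reply. The session steps up one
    /// level once it has seen more than 32 KB and a permit is available.
    pub fn on_reply(&mut self, bytes: u64, permit_available: bool) {
        self.downloaded = self.downloaded.saturating_add(bytes);
        if self.downloaded > ELEVATE_THRESHOLD_BYTES
            && self.depth < MAX_DEPTH
            && permit_available
        {
            self.depth += 1;
            // Assumption: require fresh download data before the next step.
            self.downloaded = 0;
        }
    }
}
```

With this shape, a keep-alive session that only trickles a few bytes per reply stays at depth 2, which is the behavior the 32KB threshold is meant to protect.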

Write ordering (wseq)

  • Client assigns monotonic wseq to data-bearing ops (not polls)
  • Tunnel-node buffers out-of-order writes per session, flushes in wseq order
  • Backward compatible: old clients without wseq write immediately
  • Prevents TLS corruption from pipelined batches completing out of order
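
The per-session reordering can be pictured with a small buffer keyed by wseq. This is a sketch under stated assumptions, not the tunnel-node code: the type name is hypothetical, sequence numbers are assumed to start at 0, and stale sequence numbers are assumed to be dropped.

```rust
use std::collections::BTreeMap;

/// Reorder buffer for one session (hypothetical type).
pub struct WriteReorder {
    next_wseq: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl WriteReorder {
    pub fn new() -> Self {
        Self { next_wseq: 0, pending: BTreeMap::new() }
    }

    /// Accept one write and return the payloads that may now be written,
    /// in wseq order. `None` models an old client without wseq: its data
    /// is passed through immediately.
    pub fn push(&mut self, wseq: Option<u64>, data: Vec<u8>) -> Vec<Vec<u8>> {
        let Some(seq) = wseq else {
            return vec![data];
        };
        if seq < self.next_wseq {
            // Assumption: an already-flushed sequence number is ignored.
            return Vec::new();
        }
        self.pending.insert(seq, data);

        // Flush the contiguous run starting at the next expected wseq.
        let mut ready = Vec::new();
        while let Some(chunk) = self.pending.remove(&self.next_wseq) {
            ready.push(chunk);
            self.next_wseq += 1;
        }
        ready
    }
}
```

If a batch carrying wseq 1 completes before the batch carrying wseq 0, the first call returns nothing and the second returns both payloads in order, so the TLS byte stream reaches the destination intact.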

STUN/TURN blocking

  • block_stun config (default true) with Android UI toggle
  • Rejects STUN/TURN ports (3478/5349/19302) so WebRTC apps (Meet, WhatsApp) instantly fall back to TCP TURN
  • Eliminates 10-30s ICE negotiation timeout
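
The port check itself is small. A minimal sketch, assuming a free function in the proxy path (the function name is hypothetical; the port list and the `block_stun` flag come from the bullets above):

```rust
/// STUN/TURN ports listed in this PR.
const STUN_TURN_PORTS: [u16; 3] = [3478, 5349, 19302];

/// Returns true when a connection to `dest_port` should be refused so the
/// WebRTC stack gives up on these ports quickly and falls back to TCP TURN.
pub fn should_reject(dest_port: u16, block_stun: bool) -> bool {
    block_stun && STUN_TURN_PORTS.contains(&dest_port)
}
```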

Tunnel-node improvements

  • LONGPOLL_DEADLINE 4s (must stay below client batch timeout)
  • Reader buffer 2MB (was 64KB)
  • Drain loop: keeps reading until buffer empty (max 1s), accumulates up to 2MB+ per drain
  • Upload size logging

Android

  • Pipeline debug overlay (SYSTEM_ALERT_WINDOW) — temporary, shows session depths and events
  • Tokio worker threads: 4 (was 2)
  • block_stun toggle in Advanced settings

Other

  • Legacy detection removed (was false-triggering)
  • consecutive_empty gate removed from refill (was killing idle sessions)
  • 32KB download threshold for elevation (prevents keep-alive sessions from over-elevating)
  • Unbounded mux channel (prevents upload flood from blocking downloads)

Files changed

  • src/tunnel_client.rs — pipelining, fast-path, wseq, timer refill, single-loop
  • src/domain_fronter.rs — wseq field on BatchOp
  • src/proxy_server.rs — STUN blocking
  • src/config.rs — block_stun config
  • src/android_jni.rs — pipelineDebugJson JNI, worker_threads=4
  • tunnel-node/src/main.rs — wseq ordering, 2MB reader, drain loop, LONGPOLL 4s
  • android/ — ConfigStore, HomeScreen, PipelineDebugOverlay, MhrvVpnService, Native, Manifest

Test plan

  • Pipelining: sessions ramp 2→3→4, downloads overlap
  • Fast-path: uploads bypass full pipeline
  • wseq ordering: tunnel-node logs show in-order writes
  • STUN blocking: Google Meet connects instantly via TCP TURN
  • Video upload: starts immediately, no stall (single-loop)
  • Telegram messaging: messages send with expected delay
  • Debug overlay: shows sessions, depth, events
  • Long-running stability test

🤖 Generated with Claude Code

dazzling-no-more and others added 30 commits April 25, 2026 16:49
…ecycle-reliability

fix(android): tighten VPN session lifecycle reliability
Five small but real Android-only fixes:

1. Connect/Disconnect button gated on VpnState.isRunning state-flow
   with 12s backstop, replacing the fixed 2s transitionCooldown
   timer. Closes the race where a tap-after-Stop hit "Address already
   in use" because the previous teardown's listener-socket release
   wasn't done.

2. Tun2proxy.stop() wrapped in 2s join() — if the native call hangs,
   bounded teardown still releases the listener port instead of
   holding the teardown thread.

3. fd-leak fixed between parcelFd.detachFd() and Thread.start(): an
   OOM-thrown Thread.start used to orphan the detached fd. Now
   adopted into a fresh ParcelFileDescriptor purely so we can close()
   it.

4. Misleading teardown doc-comment rewritten — the "step 2 closes
   the TUN fd to force EBADF on read" claim has been factually
   wrong since detachFd landed.

5. Recursive crash trap: Log.e in MhrvApp's uncaught handler now
   wrapped in try/catch so a logd failure during exception logging
   falls through to the previous handler with the real exception.

No Rust changes; 98 lib + 22 tunnel-node tests still pass.

Local Android build verified, APK installed on mhrv_test emulator,
launches cleanly with v1.6.1 in title.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

In `relay_parallel_range`, when a chunk failed validation
(`extract_exact_range_body` returned Err) OR the stitched body length
didn't match the advertised total, the fallback path called
`rewrite_206_to_200(&first)` — which converted the 256 KiB probe
response into HTTP 200 + Content-Length=262144 and returned that as
if it were the full file. Browsers saw a complete-looking 200 and
treated the download as finished at 256 KB.

Common triggers for the chunk-validation failure (per the user
reports):
- Apps Script's UrlFetchApp stripping `Content-Range` from chunk
  responses while preserving it on the probe
- Origin returning 200-OK on follow-up Range requests (some servers
  flatten ranges after the first one)
- Mismatched `total` field across chunks for paths behind a varying
  cache layer

The correct fallback is a single GET without any Range header —
Apps Script fetches the whole URL (up to its 50 MiB cap) and
returns a normal 200 with the complete body. Slower than parallel
for large files but produces a correct response, which is the
minimum bar.

Two independent reports (Ehsan in therealaleph#162, Recruit1992 confirming).
98 lib tests still pass. The existing `validate_probe_range_rejects_*`
and `extract_exact_range_body_*` tests already cover the validation
side; the fallback path was observed in integration testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

buildNotif() hardcoded `proxyPort + 1` for the SOCKS5 line, ignoring
cfg.socks5Port entirely. With the default Android config
(listenPort=8080, socks5Port=1081) the foreground notification read
"Routing via SOCKS5 127.0.0.1:8081" but the real listener was on 1081 —
so users configuring per-app SOCKS5 (Telegram, etc.) against the
notification value silently failed.

Use the same `cfg.socks5Port ?: (cfg.listenPort + 1)` elvis fallback the
real listener uses, and surface both ports in the notification:
  HTTP 127.0.0.1:8080  ·  SOCKS5 127.0.0.1:1081
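
The actual fix is in Kotlin; the same fallback rule, transliterated to Rust for illustration, looks roughly like this. The `Cfg` struct and both function names are hypothetical stand-ins for the Android config and `buildNotif()`.

```rust
/// Hypothetical mirror of the two config fields involved.
pub struct Cfg {
    pub listen_port: u16,
    pub socks5_port: Option<u16>,
}

/// Same rule the listener uses: explicit SOCKS5 port if set, else listen port + 1.
pub fn effective_socks5_port(cfg: &Cfg) -> u16 {
    // saturating_add avoids an overflow panic at 65535 (an edge case the
    // source does not discuss).
    cfg.socks5_port.unwrap_or(cfg.listen_port.saturating_add(1))
}

/// Notification line that surfaces both ports.
pub fn notification_text(cfg: &Cfg) -> String {
    format!(
        "HTTP 127.0.0.1:{}  ·  SOCKS5 127.0.0.1:{}",
        cfg.listen_port,
        effective_socks5_port(cfg)
    )
}
```

Deriving the notification from the same helper as the listener keeps the two values from drifting apart again.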

Reported by vpnineh and l3est (with netstat screenshots showing the
exact mismatch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The batch-build loop blocked on a 30 ms timeout for the first message,
then drained whatever else was in the channel via try_recv() and fired
the batch. Under any non-bursty workload, the channel queue was always
empty by the time the first op woke us up — so every "batch" had
exactly one op, defeating the entire batching premise. Reporter (w0l4i)
saw `batch: 1 ops → ..., rtt=6.3 s` repeating in logs even under high
concurrency.

Fix: after the first op lands, hold the buffer open for an 8 ms
coalescing window. Concurrent ops (parallel fetches, HTTP/2 stream
openings, etc.) now accumulate into the same batch. 8 ms is rounding
error against the 2–7 s Apps Script RTT we're amortizing, and restores
the multi-op-per-batch behavior the rest of the code already supports
(MAX_BATCH_OPS=50, MAX_BATCH_PAYLOAD_BYTES=4 MiB).
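
The coalescing window can be sketched with the standard library's blocking channel. This is an analog for illustration only: the real loop is presumably async, the function name is hypothetical, and the payload-size cap is left out to keep the example short.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

const FIRST_OP_TIMEOUT: Duration = Duration::from_millis(30);
const COALESCE_WINDOW: Duration = Duration::from_millis(8);
const MAX_BATCH_OPS: usize = 50;

/// Wait for a first op, then keep the batch open for a short window so
/// concurrent ops join the same batch instead of each travelling alone.
pub fn next_batch<T>(rx: &Receiver<T>) -> Vec<T> {
    let mut batch = Vec::new();
    match rx.recv_timeout(FIRST_OP_TIMEOUT) {
        Ok(op) => batch.push(op),
        Err(_) => return batch, // nothing arrived, or the channel is closed
    }

    let deadline = Instant::now() + COALESCE_WINDOW;
    while batch.len() < MAX_BATCH_OPS {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            break;
        }
        match rx.recv_timeout(remaining) {
            Ok(op) => batch.push(op),
            Err(_) => break, // window elapsed or sender dropped
        }
    }
    batch
}
```

The difference from the old behavior is the second loop: instead of a single non-blocking drain right after the first op, the receiver keeps waiting until the 8 ms deadline or the op cap is reached.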

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: include twitter.com in X.com URL normalization (therealaleph#245)
- therealaleph#245 (@Parsa307): match twitter.com in X.com URL normalization
- therealaleph#255 (@dazzling-no-more): copy-logs button + selectable log lines on Android
- therealaleph#257 (@dazzling-no-more): bulk paste of multiple deployment IDs on Android
- therealaleph#256 (@dazzling-no-more): plain HTTP proxy passthrough in google_only mode
  (used to return 502; now falls through to direct TCP / upstream_socks5,
  matching the existing CONNECT behavior)

No protocol or wire-format changes; existing config and Apps Script
deployments work unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…table VoIP, faster browsing - needs new tunnel deployment for udpgw (therealaleph#222)

* feat: native udpgw protocol alongside existing UDP associate

Why udpgw is needed even with UDP associate:

UDP associate (udp_open/udp_data) creates one tunnel session per UDP
destination and polls each independently. On high-latency or shaky
networks this compounds — N simultaneous UDP flows need N separate
polling loops, each paying its own batch round-trip overhead. Google
Meet calls, which fire dozens of concurrent STUN + RTP flows, stall
or fail entirely because the per-destination polling can't keep up.

udpgw multiplexes ALL UDP over one persistent TCP-like session using
conn_id framing. One batch op carries frames for many destinations.
Persistent sockets per (conn_id, dest) with continuous reader tasks
keep source ports stable — critical for protocols like Telegram VoIP
and STUN that expect replies on the same port.

Both paths coexist — they serve different traffic:
  - UDP associate (SOCKS5): apps that negotiate SOCKS5 UDP relay
  - udpgw (198.18.0.1:7300): TUN-captured UDP (DNS, QUIC, Meet, etc.)

tun2proxy vendored as git submodule at v0.7.20 with one transparent
commit adding udpgw_server to the Android JNI run() function.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: block QUIC (UDP 443) and DNS (UDP 53) from udpgw

QUIC through udpgw is slower than TCP/HTTP2 through the batch pipeline
— blocking it forces browsers to fall back to TCP, improving YouTube
and general browsing speed.

DNS is better handled by tun2proxy's virtual DNS / SOCKS5 UDP associate
path which is more reliable for single request-response exchanges.

VoIP (Telegram, Meet) still flows through udpgw normally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace submodule with [patch.crates-io] for tun2proxy udpgw

Use the idiomatic Rust [patch.crates-io] mechanism instead of a git
submodule. Points to yyoyoian-pixel/tun2proxy fork with the udpgw
JNI parameter patch (upstream PR: tun2proxy/tun2proxy#247).

Will be removed once upstream ships the change in tun2proxy >= 0.8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: pin tun2proxy patch SHA in Cargo.lock

Locks tun2proxy at dfc24ed1 so the patch resolution is recorded and
any branch rewrite is visible in the lockfile diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use AbortHandle for ConnSocket readers to prevent FD leaks

JoinHandle::drop detaches the task without aborting it. When
udpgw_server_task is cancelled (session close), the post-loop
cleanup never runs and per-(conn_id, dest) reader tasks become
zombies holding Arc<UdpSocket> file descriptors.

AbortHandle::drop aborts the task automatically, so cleanup is
correct by construction regardless of how the parent task exits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: yyoyoian-pixel <279225925+yyoyoian-pixel@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Highlights:
- Native udpgw protocol in Full mode (therealaleph#222) — Telegram voice/video
  calls and Google Meet now work in Full mode on Android. UDP flows
  through one persistent TCP tunnel (instead of session-per-destination)
  so STUN/RTP flow counts no longer stall. Requires redeploying the
  tunnel-node Docker image (ghcr.io/therealaleph/mhrv-tunnel-node:1.7.0).
- Android home screen restructure (therealaleph#258, closes therealaleph#246) — Connect button
  now pinned under Mode field, App picker shows pre-selected apps at
  top. With long deployment-ID lists, Connect no longer scrolls
  off-screen.
- release-drafter + prepare-release tooling (therealaleph#260) — incrementally
  drafts release notes from merged PR titles; manual workflow_dispatch
  prepares version bumps + changelog stubs.

No protocol breaking changes; existing apps_script-mode and Full-mode
deployments work unchanged. Full-mode users get udpgw automatically
once the tunnel-node Docker image is updated.

Thanks to @yyoyoian-pixel and @dazzling-no-more.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…te revocation (therealaleph#121)

* feat(cert): add --remove-cert flag and Remove CA button for clean-slate revocation

* fix(cert): testable euid-root branch + orphan enterprise_roots warning

mhrv-rs --remove-cert (CLI) and Remove CA button (UI) for verified
clean-slate revocation. Clears OS trust store, NSS browser stores
(Linux Firefox/Chrome), and the on-disk ca/ directory. config.json
and the Apps Script deployment are untouched.

By-name trust verification runs before browser-state mutation; OS
removal failures return RemovalIncomplete with browser state intact
so retries are idempotent. Sudo-aware on Unix (re-roots HOME to the
real user). 29 new unit tests on the pure logic (Firefox user.js
marker handling, getent passwd parsing, NSS stderr classification,
NssReport state rules).

Tested end-to-end on Windows by the contributor; macOS verified at
merge time on real hardware (login keychain delete + NSS-missing
fallback). Linux paths await user testing.

Closes therealaleph#121.
Thanks @dazzling-no-more.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Telegram release notifier used to post just the universal APK with
a single-document caption. This change ships the per-platform binaries
for macOS (amd64+arm64 CLI), Linux (amd64+arm64 CLI), Windows
(amd64 UI), and Android (universal APK) as a single Telegram media
group with one caption listing every filename + SHA-256.

Workflow side (.github/workflows/release.yml):
- The telegram job now downloads ALL artifacts (was: APK only).
- New `Prepare files for Telegram media group` step extracts the raw
  binaries out of each per-platform .tar.gz / .zip (no archive
  wrappers in the channel) and renames them with version suffixes
  (mhrv-rs-linux-amd64-v1.7.2, mhrv-rs-windows-amd64-ui-v1.7.2.exe,
  etc.). Per-platform extraction is best-effort: a missing artifact
  emits a `::warning::` and skips that platform rather than failing
  the whole post.
- The post step builds a `--files <path>` arg list from tg-files/,
  sorted for deterministic order across runs, and invokes the
  notifier without --with-changelog (the script auto-replies with
  changelog whenever --files is used).

Script side (.github/scripts/telegram_release_notify.py):
- New --files arg (repeatable). 2..=10 files → sendMediaGroup; 1 file
  → sendDocument with the same caption shape; 0 → error. Telegram's
  sendMediaGroup rejects single-item groups, so the 1-file fallback
  isn't optional.
- New build_media_group_caption() composes title + per-file
  filename+SHA list + repo/release URLs. Fits ~860 chars for a 6-file
  release; falls back to a filename-only list if a future release
  pushes the caption past Telegram's 1024-char cap.
- send_media_group() handles the multipart/form-data shape with each
  file referenced as `attach://fileN` from the media JSON. Caption is
  attached to file 0 only (Telegram clients render per-item captions
  inconsistently for media groups; first-item-only is the safe
  pattern).
- Legacy --apk path kept for any caller that hasn't migrated; either
  --apk or --files must be present (validated at startup).
- _content_type_for() picks application/vnd.android.package-archive
  for .apk and application/octet-stream for everything else, so
  Telegram clients label the APK with the Android icon and label
  desktop binaries by filename without a misleading icon.
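
The file-count rule for `--files` can be summarized as a small decision function. The notifier itself is a Python script; the sketch below only restates the rule in Rust, with hypothetical names, and the error for more than 10 files is an assumption since the source only describes the 0, 1 and 2..=10 cases.

```rust
/// Which Telegram Bot API method the notifier would call.
#[derive(Debug, PartialEq)]
pub enum SendPlan {
    /// One file: sendDocument, because sendMediaGroup rejects single-item groups.
    Document,
    /// Two to ten files: sendMediaGroup with the caption on the first item.
    MediaGroup,
}

pub fn plan_send(file_count: usize) -> Result<SendPlan, String> {
    match file_count {
        0 => Err("no files given".to_string()),
        1 => Ok(SendPlan::Document),
        2..=10 => Ok(SendPlan::MediaGroup),
        // Assumption: anything above the 10-item group limit is an error.
        n => Err(format!("{n} files exceed the 10-item media group limit")),
    }
}
```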

Behavioural change for users:
- The Telegram channel now sees one grouped post per release with all
  primary platform binaries inline, instead of just the APK. macOS
  users wanting the gatekeeper-friendly .app.zip still grab it from
  the GitHub Releases page; the Telegram drop is for the "give me
  the binary, I'll run it" path.
- The Persian/English changelog reply that used to be opt-in (via
  TELEGRAM_INCLUDE_CHANGELOG=true) is now automatic in the --files
  path because the per-file SHA list eats the caption budget that
  previously held the FA brief-note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…therealaleph#266)

* feat(android): config import/export via clipboard, QR code, deep link, and share sheet

- Clipboard paste: banner auto-detects mhrv:// or raw JSON in clipboard,
  one tap to import. Clipboard cleared after successful import.
- Export dialog: QR code + compressed hash + copy button + Android share
  sheet (sends QR image + text together).
- QR scanner: ZXing embedded scanner in portrait orientation.
- Deep link: mhrv:// URIs auto-open the app and import the config.
- Compact encoding: only non-default fields included, DEFLATE compressed
  before base64. Accepts both compressed and raw JSON on import.
- ConfigStore.loadFromJson() deduplicated — shared by file load + import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: deep link requires confirmation, trust warning on import, mhrv-rs:// scheme

Security fix: deep link (mhrv-rs://) no longer auto-imports config.
Stashes decoded config for UI confirmation dialog — same flow as
clipboard paste and QR scan.

Import confirmation dialog now shows:
- Trust warning: "Importing routes your traffic through the deployment
  IDs in this config. Only import from trusted sources."
- Mode and deployment ID count with first 3 IDs previewed
- Explicit Import / Cancel buttons

Also:
- Renamed scheme from mhrv:// to mhrv-rs:// (less collision risk)
- Deduplicated import dialog into shared ImportConfirmDialog composable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: yyoyoian-pixel <279225925+yyoyoian-pixel@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- mhrv-rs:// deep links, QR scanner, clipboard banner, share sheet
- DEFLATE-compressed base64 encoding (~200 chars vs ~800 raw)
- Every import path requires explicit user confirmation; the dialog
  shows the new deployment IDs and a trust warning so an attacker
  posting a malicious mhrv-rs:// link in a public channel can't
  silently overwrite a user's auth_key + script_ids
- ZXing for QR generation/scanning (no Google Play Services)

Closes therealaleph#266. Thanks @yyoyoian-pixel — the rebase from auto-import
to confirmation-gated import is exactly the right shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ph#271)

Uses tun2proxy_run_with_cli_args (the C API) via dlsym instead of
modifying the JNI run() signature. The upstream tun2proxy maintainer
recommended this path — the CLI API accepts --udpgw-server natively.

- Cargo.toml: enable udpgw feature, remove [patch.crates-io]
- MhrvVpnService.kt: build CLI args with --udpgw-server in full mode
- Native.kt + android_jni.rs: dlsym wrapper for the C API
- Tun2proxy.kt: reverted to upstream signature

No fork, no patch, no submodule.

Co-authored-by: yyoyoian-pixel <279225925+yyoyoian-pixel@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move from yyoyoian-pixel/tun2proxy fork (with patched JNI signature)
to canonical tun2proxy 0.7.21 from crates.io with feature flag
"udpgw". Cargo.toml [patch.crates-io] section removed entirely.

The Android side now resolves tun2proxy_run_with_cli_args at runtime
via dlsym from libtun2proxy.so, which is the upstream maintainer's
recommended path for callers that need full CLI flexibility.
mhrv-rs builds the CLI string in MhrvVpnService and passes it through
Native.runTun2proxy → src/android_jni.rs → dlsym → tun2proxy.

Future tun2proxy upgrades are now a single Cargo version bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pre-therealaleph#275, youtube_via_relay=true routed every YouTube-related host
through Apps Script — including ytimg.com (thumbnails) and any
googlevideo.com chunk request the player issued. Two problems:

1. ytimg.com via Apps Script is wasted quota — image CDN, no
   Restricted Mode logic to bypass.
2. googlevideo.com wasn't even in SNI_REWRITE_SUFFIXES, so video
   chunks hit the relay regardless of the flag. A single chunk
   timeout aborted the whole video on Firefox; long videos risked
   the Apps Script 6-min execution cap mid-playback.

Fix: split YouTube into "API/HTML hosts" (where Restricted Mode
lives, gated by the flag) and "asset CDNs" (always direct). The
new YOUTUBE_RELAY_HOSTS list is youtube.com, youtu.be,
youtube-nocookie.com, youtubei.googleapis.com — those go through
relay when the flag is on. ytimg.com, googlevideo.com (added),
ggpht.com all stay on SNI rewrite.

The matches_sni_rewrite logic was also restructured: the carve-out
now runs FIRST before the SNI suffix match, so the broad
googleapis.com entry can't override the narrower
youtubei.googleapis.com decision.
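
The ordering matters most for `youtubei.googleapis.com`, which matches both lists. A minimal sketch of "carve-out first, then suffix match" follows; the `Route` enum and function names are hypothetical, and the SNI list is limited to the suffixes named in this message (the real list is longer).

```rust
#[derive(Debug, PartialEq)]
pub enum Route {
    Relay,
    SniRewrite,
    /// Neither list matched; later rules (not modeled here) decide.
    Other,
}

const YOUTUBE_RELAY_HOSTS: [&str; 4] = [
    "youtube.com",
    "youtu.be",
    "youtube-nocookie.com",
    "youtubei.googleapis.com",
];

// Partial list: only the suffixes mentioned in this commit message.
const SNI_REWRITE_SUFFIXES: [&str; 4] =
    ["googleapis.com", "ytimg.com", "googlevideo.com", "ggpht.com"];

/// True for the domain itself or any subdomain of it.
fn matches_suffix(host: &str, suffix: &str) -> bool {
    host == suffix
        || host
            .strip_suffix(suffix)
            .map_or(false, |rest| rest.ends_with('.'))
}

pub fn route(host: &str, youtube_via_relay: bool) -> Route {
    let host = host.to_ascii_lowercase();
    // Carve-out runs first so the broad googleapis.com suffix cannot win.
    if youtube_via_relay && YOUTUBE_RELAY_HOSTS.iter().any(|h| matches_suffix(&host, h)) {
        return Route::Relay;
    }
    if SNI_REWRITE_SUFFIXES.iter().any(|s| matches_suffix(&host, s)) {
        return Route::SniRewrite;
    }
    Route::Other
}
```

With the flag on, `youtubei.googleapis.com` goes to the relay while a `googlevideo.com` chunk host still takes the SNI rewrite path; with the flag off, `youtubei.googleapis.com` falls through to the `googleapis.com` suffix.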

Reported with detailed analysis by @amirabbas117. Will ship in v1.7.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The tunnel-docker job in v1.7.3 release failed with:

  error: failed to unpack package `serde_json v1.0.149`
  Caused by: failed to open `/usr/local/cargo/registry/src/.../serde_json-1.0.149/.cargo-ok`
  Caused by: File exists (os error 17)

Root cause: BuildKit's default cache-mount sharing is "shared" — both
linux/amd64 and linux/arm64 build stages mount the SAME on-disk cache
dir. Cargo's registry source extraction is non-atomic; both arches
race on `tar -xzf serde_json-1.0.149.crate` into the same destination,
and the loser hits EEXIST mid-unpack.

Fix: scope each cache mount with `id=cargo-registry-${TARGETPLATFORM}`
(and matching for cargo-git + target). BuildKit then keeps separate
on-disk caches per architecture — no race. Per-arch warm-build speedup
is preserved (each cache fills with that arch's pre-built deps); the
only loss is one cache miss per arch on the first build after this
change, which we already paid in v1.7.3.

The target/ mount is also platform-scoped since target/ holds compiled
object files for a single ABI; sharing across arches would either miss
or, worse, link wrong-ABI objects together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ative-cache + pre-warm)

therealaleph#275: youtube_via_relay no longer routes video/image CDNs through
Apps Script. The flag now correctly carves out only the API/HTML
hosts where Restricted Mode is enforced; video chunks come direct
from googlevideo.com (which was missing from the SNI rewrite list
entirely — fixed). Long videos no longer hit Apps Script's 6-min
execution cap, and single-chunk timeouts no longer abort playback.

therealaleph#280: TunnelMux now caches "destination unreachable" responses from
the tunnel-node (Network is unreachable / No route to host) for 30
seconds, short-circuiting subsequent CONNECTs to that destination
with 502 (HTTP) or 0x04 (SOCKS5). Saves ~5 batches/second on
IPv6-only host probes. Startup pre-warm pool grew 12→24.
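
A 30-second negative cache of this kind can be sketched as a map from destination to expiry time. The type and method names are hypothetical, and passing `now` explicitly is a choice made here so the example can be exercised without sleeping; TunnelMux may structure this differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const NEGATIVE_TTL: Duration = Duration::from_secs(30);

/// Remembers destinations the tunnel-node recently reported as unreachable.
#[derive(Default)]
pub struct UnreachableCache {
    entries: HashMap<String, Instant>, // destination -> expiry time
}

impl UnreachableCache {
    /// Record an "unreachable" reply for `dest` observed at `now`.
    pub fn mark_unreachable(&mut self, dest: &str, now: Instant) {
        self.entries.insert(dest.to_string(), now + NEGATIVE_TTL);
    }

    /// True while the entry is fresh, meaning the CONNECT can be refused
    /// locally without spending a batch. Expired entries are evicted.
    pub fn is_blocked(&mut self, dest: &str, now: Instant) -> bool {
        let expires = self.entries.get(dest).copied();
        match expires {
            Some(t) if now < t => true,
            Some(_) => {
                self.entries.remove(dest);
                false
            }
            None => false,
        }
    }
}
```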

143/143 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

w0l4i has been asking for client-side QUIC block since therealaleph#213. Now
implemented as a small config flag.

When `block_quic = true`, the SOCKS5 UDP relay drops any datagram
destined for port 443 — that's HTTP/3-over-UDP. The client's QUIC
stack retries a couple of times and then falls back to TCP/HTTPS
through the regular CONNECT path (which goes through the relay
normally).
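
In code, the drop amounts to a filter in front of the UDP relay. A minimal sketch with a hypothetical function name; only the `block_quic` flag and port 443 come from the message:

```rust
/// Returns the payload to forward, or None when the datagram is silently dropped.
pub fn filter_datagram<'a>(
    dest_port: u16,
    payload: &'a [u8],
    block_quic: bool,
) -> Option<&'a [u8]> {
    if block_quic && dest_port == 443 {
        // No error reply is sent; the client's QUIC stack times out and
        // retries over TCP.
        None
    } else {
        Some(payload)
    }
}
```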

Why client-side rather than server-side udpgw block: the udpgw
block in therealaleph#222 is bound to Full mode + Android tun2proxy. This
covers everyone — apps_script users, desktop, Full mode, all the
same path. Skipping at the SOCKS5 layer rather than the tunnel-node
layer also avoids paying 200–500 ms tunnel-node round-trip per
QUIC datagram drop, which compounds during browser retries.

Silent drop is the contractually correct shape: SOCKS5 UDP wire
has no `host unreachable` reply (RFC 1928 §6 only defines that for
TCP CONNECT). Browsers' QUIC stacks have a "no response → fall
back" timeout, so silent drop matches what the protocol expects.

Default false (opt-in) — udpgw mitigates QUIC partly via persistent
sockets, and a tiny minority of sites only support HTTP/3.

Will ship in v1.7.5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resume the practice (dropped after v1.1.0) of committing prebuilt
binaries to the repo's releases/ folder. Iranian users behind state
network filtering frequently can't reach the GitHub Releases page
(/releases/tag/...) but CAN reach the static source tree via
Code → Download ZIP — that pulls the in-repo releases/ folder along
with the source. Telegram channel feedback explicitly requested
this be resumed.

The new commit-releases job:
1. Runs after release+build+android succeed.
2. Wipes existing binary artifacts from releases/ (.apk, .tar.gz,
   .zip) but preserves README.md and .gitattributes.
3. Copies all desktop archives (which already have stable
   platform-suffixed names like mhrv-rs-linux-amd64.tar.gz).
4. Copies all per-ABI Android APKs (so users on slow connections
   can grab the ~37 MB arm64-v8a APK instead of the ~110 MB
   universal).
5. sed-updates the "Current version" line and APK filename refs
   in releases/README.md (both English and Persian copies).
6. Commits as github-actions[bot] and pushes to main.

The GitHub Release page itself keeps the canonical versioned
artifacts as before — this in-repo folder is the fallback for
users who can't reach that URL.

Tag protection rules don't apply to refs/heads/main so the push
isn't gated. release-drafter.yml triggers on push-to-main but only
updates the next-release draft, no cycle risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… folder

- Adds `block_quic = true` config flag for client-side QUIC drop.
  SOCKS5 UDP relay refuses UDP/443 datagrams; browsers fall back to
  TCP/HTTPS through the relay. Opt-in. Thanks @w0l4i
- Workflow now auto-refreshes the in-repo releases/ folder on each
  release tag, so Iranian users behind GitHub-Releases-page filtering
  can download via Code → Download ZIP. Practice was started before
  v1.1.0 then dropped; resumed at user request.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot and others added 16 commits May 11, 2026 00:02
Auto-committed by release workflow so users behind GitHub-Releases-page filtering can download via the in-repo releases/ folder. The GitHub Release page itself still has the canonical versioned artifacts; this folder is the fallback path for users who can only reach the static source tree (Code → Download ZIP).

…#1040 (therealaleph#1041)

Completes [therealaleph#1040](therealaleph#1040) (v1.9.21). therealaleph#1040 skipped H2 for `tunnel_batch_request_to` but missed `tunnel_request` — the single-op path used for plain `connect` ops. The 16-17s long-poll stalls persisted on full-tunnel sessions that go through the single-op path; this PR closes that gap.

Same fix shape: remove the H2 try/fallback/NonRetryable block from `tunnel_request`, go straight to H1 pool `acquire()`. H2 remains active for relay-mode paths (`do_relay_once_with`, exit-node `relay()`).

## All h2_relay_request call sites audited

| Call site | Function | Mode | H2 skipped? |
|---|---|---|---|
| `do_relay_once_with` | relay | Relay | No (correct — relay benefits from H2) |
| `relay()` exit-node | relay | Relay | No (correct) |
| `tunnel_request` | tunnel single op | Full tunnel | **YES — this PR** |
| `tunnel_batch_request_to` | tunnel batch | Full tunnel | Yes (PR therealaleph#1040) |
| `tunnel_batch_request_with_timeout` | tunnel batch | Full tunnel | Yes (PR therealaleph#1040) |

No other full-tunnel paths use H2 after this fix.

## Verified locally on top of v1.9.21

- `cargo test --lib --release`: 209/209 ✅
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

Reviewed via Anthropic Claude.

Co-Authored-By: yyoyoian-pixel <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ops (therealaleph#1041)

Bumps Cargo.toml v1.9.21 → v1.9.22. Ships @yyoyoian-pixel's PR therealaleph#1041
which completes therealaleph#1040 — v1.9.21 skipped H2 for tunnel_batch_request_to
but missed tunnel_request (single-op connect path). 5/5 h2_relay_request
call sites now audited; all full-tunnel paths use H1, relay paths keep
H2. 209 lib tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Auto-committed by release workflow so users behind GitHub-Releases-page filtering can download via the in-repo releases/ folder. The GitHub Release page itself still has the canonical versioned artifacts; this folder is the fallback path for users who can only reach the static source tree (Code → Download ZIP).

…aleph#1047, therealaleph#1049)

Fixes therealaleph#1047. `_doSingle`'s normal-relay path (cache disabled or cache miss on
non-cachable request) ran `UrlFetchApp.fetch` → `getContent` → `base64Encode`
with no error wrapper. Any throw — most commonly when the response body
approaches Apps Script's ~50 MB ceiling and `base64Encode` blows the V8 heap,
also URL-too-long / payload-too-large / quota exhaustion / 6-minute execution
timeout — propagated unhandled, and Apps Script served its default
`<title>Web App</title>` HTML error page in place of the JSON envelope.

The Rust client (`parse_relay_json` in `domain_fronter.rs`) then failed to
find JSON and surfaced the cryptic `bad response: no json in: <!DOCTYPE html>...`
with no signal as to the actual cause.

The reporter's symptom — a single failing host (`shc-dist.lostsig.co`,
sonichacking.org) serving large ROM-hack binaries — matches this exactly.
Every other download worked because they were all under the body-size
ceiling.

## Fix

Wrap the normal-relay block in `_doSingle` with
`try { ... } catch (err) { return _json({ e: "fetch failed: " + String(err) }); }`.
Mirrors the per-item try/catch already present in `_doBatch`. Turns the
silent HTML crash into a structured `FronterError::Relay("fetch failed: …")`
on the client side that pinpoints the real underlying error.

Cache path intentionally untouched:
- `_fetchAndCache` already wraps its own fetch in try/catch and returns
  `null` on any failure (so `_doSingle` falls through cleanly to the
  normal relay).
- The cached-read path is bounded to ≤ `CACHE_MAX_BODY_BYTES` (35 KB)
  so it cannot trip the size limits that caused this bug.

## Verified locally on top of v1.9.22

- `node --check assets/apps_script/Code.gs`: clean ✅
- `cargo test --lib --release`: 209/209 ✅ (sanity — no Rust change)

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… 50 MiB cap (therealaleph#1042, therealaleph#1085)

Fixes therealaleph#1042 — range-capable downloads larger than ~50 MiB through the
Apps Script relay returned `504 Relay timeout — Apps Script unresponsive`
instead of the file. The 104 MiB v2rayN DMG in the reported logs was the
canonical repro (also matches @Paymanonline's report in therealaleph#1077, closed
prior as "architectural limit, use mirrors" — this PR makes it actually
work via streaming).

## Root cause

`relay_parallel_range` capped the stitched response at 64 MiB and fell
back to a single `relay()` for anything larger. Single-GET routes through
Apps Script's ~50 MiB response ceiling, so Apps Script killed the script
mid-execution and we hung for the full 25s relay timeout before returning
504.

## Fix

Convert `relay_parallel_range` into a writer-based API that streams large
files chunk-by-chunk to the client socket. Each chunk is still one
≤256 KiB Apps Script call (well under the 50 MiB cap); only the host-side
buffering changes. Backward-compatible `Vec<u8>` wrapper preserves the
pre-1.9.23 API surface for external library consumers.

Four-way dispatch via `RangeDispatch { Buffered, Stream, FallbackSingleGet,
RejectTooLarge }` and the pure `dispatch_range_response(total,
streaming_allowed)` predicate:

- **`Buffered`** — `total ≤ APPS_SCRIPT_BODY_MAX_BYTES` (40 MiB) on either
  surface. Existing stitch + single-GET fallback path; fully recovers on
  chunk failure.
- **`Stream`** — writer API above 40 MiB. Streams; chunk failure flushes
  the committed prefix and returns `Err` so the `Content-Length`
  mismatch tells download clients to resume via `Range`.
- **`FallbackSingleGet`** — wrapper above 64 MiB. Matches pre-1.9.23 cliff
  for external library consumers stuck on the old API.
- **`RejectTooLarge`** — writer API above 16 GiB. Refuses with 502;
  bounds worst-case Apps Script quota drain from a hostile origin
  advertising an absurd `Content-Range` total.
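
A minimal sketch of how the dispatch predicate described above could look. The 40 MiB, 64 MiB, and 16 GiB thresholds come from the bullets; the constant names `WRAPPER_STITCH_MAX_BYTES` and `STREAM_MAX_BYTES` are hypothetical, and the behaviour for the wrapper between 40 MiB and 64 MiB is an assumption (treated here as `Buffered`, matching the pre-1.9.23 stitch cap), since the description does not spell it out.

```rust
const MIB: u64 = 1024 * 1024;
const APPS_SCRIPT_BODY_MAX_BYTES: u64 = 40 * MIB;
// Hypothetical names for the two other thresholds named in the PR text.
const WRAPPER_STITCH_MAX_BYTES: u64 = 64 * MIB;
const STREAM_MAX_BYTES: u64 = 16 * 1024 * MIB; // 16 GiB

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RangeDispatch {
    Buffered,
    Stream,
    FallbackSingleGet,
    RejectTooLarge,
}

/// Pure predicate: `streaming_allowed` is true for the writer API and
/// false for the backward-compatible `Vec<u8>` wrapper.
fn dispatch_range_response(total: u64, streaming_allowed: bool) -> RangeDispatch {
    if total <= APPS_SCRIPT_BODY_MAX_BYTES {
        RangeDispatch::Buffered
    } else if streaming_allowed {
        if total > STREAM_MAX_BYTES {
            RangeDispatch::RejectTooLarge
        } else {
            RangeDispatch::Stream
        }
    } else if total > WRAPPER_STITCH_MAX_BYTES {
        RangeDispatch::FallbackSingleGet
    } else {
        // Assumption: the wrapper keeps stitching in memory up to 64 MiB.
        RangeDispatch::Buffered
    }
}
```

Under these assumptions the 104 MiB DMG from the report maps to `Stream` on the writer API and to `FallbackSingleGet` on the wrapper.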

## Memory bounds

Lazy `plan_remaining_ranges` (via `std::iter::from_fn` + `saturating_*`):
range planning is `O(1)` memory regardless of advertised total. Even a
`u64::MAX` total no longer drives a ~6 GB `Vec<(u64, u64)>` allocation.
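
For illustration, a lazy planner in the spirit of the one described might look like the sketch below. The signature (`start`, `total`, `chunk`) is an assumption, not the actual `plan_remaining_ranges` API; the point is that `std::iter::from_fn` yields one inclusive `(start, end)` pair at a time and saturating arithmetic keeps a hostile total from overflowing.

```rust
/// Lazily yields inclusive byte ranges covering `start..total` in steps of
/// `chunk` bytes. Nothing is allocated up front, so memory use does not
/// depend on `total`. (Signature is illustrative, not the crate's own.)
fn plan_remaining_ranges(
    start: u64,
    total: u64,
    chunk: u64,
) -> impl Iterator<Item = (u64, u64)> {
    let mut next = start;
    std::iter::from_fn(move || {
        if chunk == 0 || next >= total {
            return None;
        }
        let range_start = next;
        // Clamp the end to the last valid byte index, total - 1.
        let range_end = range_start.saturating_add(chunk - 1).min(total - 1);
        next = range_end.saturating_add(1);
        Some((range_start, range_end))
    })
}
```

A caller can pull ranges with `.take(n)` or a `for` loop, so even an advertised `u64::MAX` total only ever materialises the ranges actually requested.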

## CORS interaction

MITM HTTPS and plain-HTTP call sites updated to use `relay_parallel_range_to`
with a CORS-aware `transform_head` closure. Extracted `inject_cors_into_head`
(head-only variant of `inject_cors_response_headers`) so the streaming
path can rewrite ACL headers before the body has been assembled.

## Verified locally on top of v1.9.22

- `cargo test --lib --release`: 227/227 ✅ (was 209; +18 new — 15 stated
  in PR body + 3 incidental from the helper extractions)
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

Manual repro of the 104 MiB v2rayN DMG download remains unchecked in the PR
test plan; the unit tests cover the dispatch, streaming, and flush
contracts instead. The architectural reasoning is sound and the new
test count (+18) is concrete.

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ealaleph#1042, therealaleph#1085)

Bumps Cargo.toml v1.9.22 → v1.9.23. Ships @dazzling-no-more's PR therealaleph#1085
which converts relay_parallel_range into a writer-based API that streams
files >50 MiB chunk-by-chunk instead of trying to buffer the whole
response and hitting Apps Script's body ceiling. Four-way dispatch
(Buffered / Stream / FallbackSingleGet / RejectTooLarge) with O(1)
memory range planning + a 16 GiB hostile-origin guard. 209 → 227 lib
tests (+18 new). Unblocks GitHub releases / large CDN binaries through
apps_script mode without needing Full mode or external mirrors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-committed by release workflow so users behind GitHub-Releases-page filtering can download via the in-repo releases/ folder. The GitHub Release page itself still has the canonical versioned artifacts; this folder is the fallback path for users who can only reach the static source tree (Code → Download ZIP).
…alaleph#620, therealaleph#1117)

Fixes therealaleph#620 — `tunnel-node/Dockerfile` used BuildKit-only `RUN --mount=type=cache` directives, breaking on Cloud Run's `gcloud run deploy --source .` path (the underlying `gcr.io/cloud-builders/docker` builder doesn't enable BuildKit, and `--set-build-env-vars DOCKER_BUILDKIT=1` doesn't flip it on either).

Reworked to use **cargo-chef**: a dedicated planner stage emits `recipe.json` for dependency metadata, a `cargo chef cook` stage builds just the deps in their own Docker layer, the final build stage adds `src/` on top. Docker's regular layer cache handles dependency reuse — warm rebuilds where only `src/` changes still skip the slow crate compile.

## Changes (`tunnel-node/Dockerfile`-only)

- Dropped `# syntax=docker/dockerfile:1` parser directive and all `RUN --mount=type=cache,...` blocks
- Added cargo-chef multi-stage build (`chef` → `planner` → `builder`)
- Pinned `cargo-chef` to exact `0.1.77` with `--locked` for reproducible installs
- Bumped base from `rust:1.85-slim` → `rust:1.90-slim` (cargo-chef's transitive deps require rustc 1.86+; tunnel-node's `Cargo.toml` has no `rust-version` pin so the bump is internal-only)
- Removed `ARG TARGETPLATFORM` per-platform cache-id workaround — Docker's regular layer cache is already arch-scoped

## Non-changes (deliberate)

- `tunnel-node/Cargo.toml` left alone — the old Dockerfile comment claimed "matches MSRV in Cargo.toml" but no `rust-version` field actually exists. The Docker base bump is internal build-env, not a declared MSRV.
- Base image digest pinning left on tag refs — without Renovate/Dependabot to keep digests fresh, pinning trades automatic glibc/openssl/ca-certificates CVE patching for a reproducibility property this repo doesn't currently need.

## Verified locally

- `cd tunnel-node && cargo build --release`: clean (binary side unchanged)
- `cd tunnel-node && cargo test --release`: 36/36
- Local `docker build` couldn't run (daemon not started on the dev machine); the PR author's test plan documents successful build under classic Docker daemon.

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eph#1088, therealaleph#1108)

Fixes therealaleph#1088 — under Full mode, a single slow Apps Script edge cascade-killed every in-flight tunnel session sharing its batch. Users on 1.9.21+ saw frequent 10s "batch timeout" errors and lost download progress on Telegram / browser sessions.

## Root cause

`read_http_response` in `domain_fronter.rs` had a **hardcoded 10s header-read timeout** that ran *inside* `tunnel_batch_request_to` — independent of and shorter than the outer `tokio::time::timeout(batch_timeout, …)` in `fire_batch`. Apps Script cold starts routinely land in the 8-12s range (PR therealaleph#1040's A/B recorded 4/30 H1 batches timing out at exactly 10s after the H2→H1 switch), so the inner cliff fired as a false-positive batch timeout well before `request_timeout_secs` (default 30s) could.

Secondary: even with a parameterized timeout, the per-read `timeout(d, stream.read(...))` form would silently extend its budget if a peer drip-fed bytes just under `d` each — a slow edge could keep the loop alive past the outer `batch_timeout` and defeat the whole wiring.

## Fix (two changes in `domain_fronter.rs`)

1. **`tunnel_batch_request_to` passes `batch_timeout` to the header read** via new `read_http_response_with_header_timeout` helper. `Config::request_timeout_secs` is now the only knob controlling how long we wait for an Apps Script edge to start responding. Other callers (relay path, exit-node) keep the historical 10s value.

2. **Header read uses a single absolute deadline** (`tokio::time::timeout_at(deadline, …)`) instead of per-read `timeout()`. Total elapsed across all header reads is bounded by `header_read_timeout`, regardless of read cadence.
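
The difference between the two timeout forms can be modelled without any I/O. The sketch below is a pure simulation, not the `domain_fronter.rs` code: each entry in `delays` is how long a hypothetical peer takes to deliver the next header fragment, and both function names are illustrative.

```rust
use std::time::Duration;

/// Per-read form: every read gets a fresh budget `d`.
/// Returns Ok(total elapsed) if all reads finish, Err(elapsed at abort) otherwise.
fn per_read_timeout(delays: &[Duration], d: Duration) -> Result<Duration, Duration> {
    let mut elapsed = Duration::ZERO;
    for &delay in delays {
        if delay > d {
            return Err(elapsed + d);
        }
        elapsed += delay;
    }
    Ok(elapsed)
}

/// Absolute-deadline form: all reads share one budget.
fn absolute_deadline(delays: &[Duration], budget: Duration) -> Result<Duration, Duration> {
    let mut elapsed = Duration::ZERO;
    for &delay in delays {
        if elapsed + delay > budget {
            // The deadline fires while this read is still pending.
            return Err(budget);
        }
        elapsed += delay;
    }
    Ok(elapsed)
}
```

With five fragments arriving 9s apart and a 10s budget, the per-read form completes after 45s, while the absolute-deadline form aborts at 10s. That is the drip-feed case the second change closes.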

## Bonus (in `tunnel_client.rs`)

3. **`TunnelMux::reply_timeout` co-varies with `batch_timeout`**: computed at construction as `fronter.batch_timeout() + 5s slack` instead of the fixed 35s const. Operators raising `request_timeout_secs` no longer have sessions abandon `reply_rx` just before `fire_batch`'s HTTP round-trip would complete.
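
As a quick arithmetic check of the co-variance, assuming `batch_timeout` equals `request_timeout_secs` (the helper and constant names below are hypothetical):

```rust
use std::time::Duration;

// Slack added on top of the batch timeout, per the description above.
const REPLY_SLACK: Duration = Duration::from_secs(5);

/// Derives the mux reply timeout from the configured batch timeout.
fn reply_timeout(batch_timeout: Duration) -> Duration {
    batch_timeout + REPLY_SLACK
}
```

With the default 30s this reproduces the old fixed 35s; raising the setting to 60s moves the reply timeout to 65s rather than leaving it at 35s.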

## Verified locally (on top of v1.9.23 / main after therealaleph#1117 merge)

- `cargo test --lib --release`: **231/231** ✅ (was 209 in the v1.9.22 baseline; this PR adds 22 new tests covering the deadline/co-variance behavior)
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

## Interaction with v1.9.20 (PR therealaleph#1029)

PR therealaleph#1029 added `H1_OPEN_TIMEOUT_SECS = 8` to bound the TCP+TLS handshake in `open()`. That bound is **separate** from the header-read timeout this PR addresses — both bounds exist in the same call chain. Issue therealaleph#1131 (BuffOvrFlw, just opened) reports `h1 open timed out after 8s` errors which are the `open()` bound firing, not the header-read bound. Worth a follow-up to make `H1_OPEN_TIMEOUT_SECS` parameterized too, but that's a separate change.

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
therealaleph#1088, therealaleph#620)

Bumps v1.9.23 → v1.9.24. Two PRs from @dazzling-no-more:
- therealaleph#1108 (therealaleph#1088): batch header read honors request_timeout_secs.
  Closes the 10s inner timeout cliff that was cascade-killing tunnel
  sessions under slow Apps Script edges. +22 regression tests (231 total).
- therealaleph#1117 (therealaleph#620): cargo-chef Dockerfile so tunnel-node builds without
  BuildKit. Cloud Run's gcloud-deploy path now works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aleph#251, therealaleph#1143)

Closes therealaleph#251. In Android Full mode, Telegram worked but Google search and most other websites failed silently. `apps_script` mode on the same setup was unaffected.

**Root cause**: the udpgw magic destination (`198.18.0.1:7300`) was inside `198.18.0.0/15` — the exact range tun2proxy's `--dns virtual` allocator uses to synthesise fake IPs for hostname lookups. Whenever virtual DNS assigned `198.18.0.1` to a real hostname, that hostname's traffic was intercepted by tun2proxy *itself* as a udpgw connection and dropped. Telegram was immune because it uses hardcoded numeric IPs; `apps_script` mode was immune because it never sets `--udpgw-server`.

**Fix**: move `UDPGW_MAGIC_IP` to `192.0.2.1` (RFC 5737 TEST-NET-1) — outside any virtual-DNS allocation pool. Coordinated change across the tunnel-node constant and the Android `--udpgw-server` flag.
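
The collision is easy to confirm with a small CIDR membership check. This is a standalone sketch; `in_cidr` is a hypothetical helper, not a function from the tunnel-node or tun2proxy.

```rust
use std::net::Ipv4Addr;

/// Returns true if `addr` falls inside `net/prefix`.
fn in_cidr(addr: Ipv4Addr, net: Ipv4Addr, prefix: u32) -> bool {
    let prefix = prefix.min(32);
    // checked_shl returns None for a shift of 32 (prefix 0), meaning "match all".
    let mask = u32::MAX.checked_shl(32 - prefix).unwrap_or(0);
    (u32::from(addr) & mask) == (u32::from(net) & mask)
}
```

The legacy `198.18.0.1` is inside `198.18.0.0/15`, whereas `192.0.2.1` is outside it and sits in `192.0.2.0/24` (TEST-NET-1).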

## Back-compat

v1.9.25 tunnel-nodes still recognise the legacy `198.18.0.1:7300` for one deprecation cycle (removal in v1.10.0).

| Android | Tunnel-node | Full-mode UDP |
|---|---|---|
| v1.9.25 | v1.9.25 | ✅ fully fixed |
| ≤v1.9.24 | v1.9.25 | ⚠️ handshake works (legacy IP still recognised), but the old client still asks tun2proxy for `198.18.0.1`, so the therealaleph#251 virtual-DNS collision is still live on-device |
| v1.9.25 | ≤v1.9.24 | ❌ breaks silently (old node rejects `192.0.2.1`) |

The fix lives on the client side (which magic IP it asks tun2proxy to reserve). The back-compat is on the tunnel-node side (accepting both during the deprecation window).
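
On the tunnel-node side, accepting both destinations during the deprecation window could be as small as the sketch below. `UDPGW_LEGACY_MAGIC_IP`, `UDPGW_MAGIC_PORT`, and `is_udpgw_dest` are hypothetical names, and keeping port 7300 for the new IP is an assumption.

```rust
use std::net::{Ipv4Addr, SocketAddrV4};

const UDPGW_MAGIC_IP: Ipv4Addr = Ipv4Addr::new(192, 0, 2, 1);
// Hypothetical name for the legacy address kept until v1.10.0.
const UDPGW_LEGACY_MAGIC_IP: Ipv4Addr = Ipv4Addr::new(198, 18, 0, 1);
// Assumption: the port is unchanged by the IP move.
const UDPGW_MAGIC_PORT: u16 = 7300;

/// True if a connect request targets the udpgw magic destination,
/// accepting both the new and the legacy IP.
fn is_udpgw_dest(dest: SocketAddrV4) -> bool {
    dest.port() == UDPGW_MAGIC_PORT
        && (*dest.ip() == UDPGW_MAGIC_IP || *dest.ip() == UDPGW_LEGACY_MAGIC_IP)
}
```

Dropping the legacy arm in v1.10.0 would then be a one-line change.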

## Verified locally

- `cargo test --lib --release`: 231/231 ✅
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅
- `(cd tunnel-node && cargo test --release)`: 38/38 ✅ (+2 new tests for the IP change)

## Version bump

Cargo.toml already bumped to 1.9.25 in this PR; `docs/changelog/v1.9.25.md` pre-baked. Will combine with any other PRs landing into v1.9.25 before tagging.

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, therealaleph#1159)

Closes therealaleph#1145. LibreWolf users were getting `MOZILLA_PKIX_ERROR_MITM_DETECTED` when visiting HSTS-protected sites (bing.com, youtube.com, …) through MasterHttpRelayVPN's MITM mode. HSTS gives no "Add Exception" affordance, so users were fully locked out of those sites despite the OS-level CA install having succeeded.

**Root cause**: `cert_installer.rs` only scanned Firefox profile roots (`~/.mozilla/firefox`, the snap variant, `%APPDATA%\Mozilla\Firefox\Profiles`, `~/Library/Application Support/Firefox/Profiles`). LibreWolf is a Firefox fork with strict privacy defaults; it shares Firefox's NSS DB layout and respects the same `security.enterprise_roots.enabled` pref, but stores its profile tree under its own app dir. Neither the per-profile `certutil -A` install nor the `user.js` enterprise-roots auto-trust fallback ever touched LibreWolf, so the browser never trusted our CA.

Same failure mode behind already-closed therealaleph#955 and therealaleph#959 (Firefox-fork users reporting the identical "secure connection could not be established" symptom).

**Fix**: extend Mozilla-family profile discovery to cover LibreWolf on every supported platform. No behavioural change for Firefox installs.

## Changes (`src/cert_installer.rs`-only)

- Renamed `firefox_profile_dirs()` → `mozilla_family_profile_dirs()`. Same flat-vec return type so all five call sites read identically; the rename is signposting only.
- Extracted `mozilla_family_profile_roots(os, home, appdata, xdg_config_home)`: returns the union of Firefox + LibreWolf profile root directories, per-OS:
  - **Linux**: `~/.mozilla/firefox`, snap variant, `~/.librewolf`, `$XDG_CONFIG_HOME/librewolf` (LibreWolf respects XDG by default).
  - **macOS**: `~/Library/Application Support/Firefox/Profiles`, `~/Library/Application Support/LibreWolf/Profiles`.
  - **Windows**: `%APPDATA%\Mozilla\Firefox\Profiles`, `%APPDATA%\LibreWolf\Profiles`.
- All five existing call sites (per-profile install, enterprise-roots fallback, uninstall, dry-run reporter, test-mode reporter) read from the renamed function without further changes.
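
A rough sketch of the extracted root-discovery helper, under stated assumptions: the `Os` enum and the exact parameter types are illustrative, and the Firefox snap root is omitted because its path is not spelled out here.

```rust
use std::path::{Path, PathBuf};

#[derive(Clone, Copy)]
enum Os {
    Linux,
    MacOs,
    Windows,
}

/// Returns the Firefox + LibreWolf profile roots for one OS.
/// Purely computes candidate paths; it does not touch the filesystem.
fn mozilla_family_profile_roots(
    os: Os,
    home: &Path,
    appdata: Option<&Path>,
    xdg_config_home: Option<&Path>,
) -> Vec<PathBuf> {
    let mut roots = Vec::new();
    match os {
        Os::Linux => {
            roots.push(home.join(".mozilla").join("firefox"));
            // (Firefox snap variant omitted in this sketch.)
            roots.push(home.join(".librewolf"));
            if let Some(xdg) = xdg_config_home {
                roots.push(xdg.join("librewolf"));
            }
        }
        Os::MacOs => {
            let support = home.join("Library").join("Application Support");
            roots.push(support.join("Firefox").join("Profiles"));
            roots.push(support.join("LibreWolf").join("Profiles"));
        }
        Os::Windows => {
            if let Some(appdata) = appdata {
                roots.push(appdata.join("Mozilla").join("Firefox").join("Profiles"));
                roots.push(appdata.join("LibreWolf").join("Profiles"));
            }
        }
    }
    roots
}
```

Taking the environment as parameters rather than reading it inside the function is what makes per-OS discovery testable on a single host.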

## Verified locally (on top of v1.9.24)

- `cargo test --lib --release`: **239/239** ✅ (was 231; this PR adds 8 new tests covering LibreWolf-path discovery on each OS).
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

## Will combine with therealaleph#1143

PR therealaleph#1143 already pre-baked the v1.9.25 release files (Cargo.toml + changelog). This PR doesn't touch either, so the squash-merge will land cleanly alongside therealaleph#1143's changes. Will edit v1.9.25's changelog to include therealaleph#1159 as a second bullet before tagging.

Reviewed via Anthropic Claude.

Co-Authored-By: dazzling-no-more <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…all (therealaleph#251, therealaleph#1145)

v1.9.25 ships two bug fixes from @dazzling-no-more:

- therealaleph#1143 (therealaleph#251): Android Full-mode `udpgw magic IP` moved from
  198.18.0.1 → 192.0.2.1 to avoid clash with tun2proxy's virtual-DNS
  allocator range. Resolves "Google + most websites silently broken
  while Telegram works" on Android Full mode. Back-compat: legacy IP
  still recognised by tunnel-node for one deprecation cycle.
- therealaleph#1159 (therealaleph#1145): MITM CA now installs into LibreWolf NSS stores
  alongside Firefox. Closes `MOZILLA_PKIX_ERROR_MITM_DETECTED` HSTS
  lockout on LibreWolf. Same class as already-closed therealaleph#955/therealaleph#959.

Cargo.toml bump (1.9.24 → 1.9.25) came in via therealaleph#1143. This commit
amends the pre-baked v1.9.25 changelog to include therealaleph#1159 and refreshes
Cargo.lock.

239 lib tests + 38 tunnel-node tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yyoyoian-pixel yyoyoian-pixel changed the title feat(tunnel): pipelined polls with adaptive depth feat(tunnel): pipelined polls with adaptive depth, wseq ordering, STUN blocking May 14, 2026
@yyoyoian-pixel yyoyoian-pixel force-pushed the feat/pipeline-tunnel-polls branch 2 times, most recently from af703b1 to 5c010ed Compare May 14, 2026 21:58
yyoyoian-pixel and others added 9 commits May 15, 2026 00:01
…nt reads

Three improvements to full-tunnel throughput and latency:

1. **Overlapped client reads**: tunnel_loop reads from the client socket
   concurrently with the batch reply wait via tokio::select!, buffering
   upload data for the next op instead of blocking on a fresh read timeout.

2. **Pipelined polls with seq echo**: add a per-op sequence number echoed
   by the tunnel-node so the client can reorder out-of-order replies.
   Sessions with sustained data flow (consecutive_data >= 2) ramp up to
   MAX_INFLIGHT_PER_SESSION polls in flight, with 1s stagger between sends
   so they land in separate batches. Drops to serial on first empty reply.

3. **Adaptive pipeline depth**: idle sessions stay at depth 1 (no extra
   polls). Data-bearing sessions gradually ramp 1→2→3→...→10. At most
   MAX_ELEVATED_PER_DEPLOYMENT (6) sessions per deployment can be elevated
   simultaneously, preventing semaphore exhaustion. Elevation slots are
   released immediately on first empty reply or session close.

Wire protocol: BatchOp and TunnelResponse gain an optional `seq` field.
Fully backward compatible — old tunnel-nodes ignore the field, new clients
fall back to serial (depth 1) when resp.seq is None.

Tunnel-node: LONGPOLL_DEADLINE reduced from 15s to 4s for faster poll
turnaround while keeping persistent connections (Telegram) stable.

Includes bench-pipeline.sh for comparing serial vs pipelined throughput.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…STUN blocking

Pipeline improvements:
- Optimist start at depth 2 (free, no permit), drop to 1 on 2 consecutive empties
- Elevation permit only for depth 3+ with 32KB download threshold (prevents
  keep-alive sessions like Telegram from over-elevating)
- Fast-path uploads bypass full pipeline with +4 cap and 20ms coalesce
- Data-op preference: 20ms client read check before sending empty polls
- 1s stagger always applied for batch separation
- Client socket close breaks immediately (no waiting for in-flight polls)
- consecutive_data no longer resets on single empties
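
The depth rules above can be read as a small per-session state machine. The sketch below is one possible reading, not the `tunnel_client.rs` implementation: `SessionDepth` and its fields are hypothetical, the cap of 4 follows the PR summary, and counting cumulative downloaded bytes against the 32KB threshold is an assumption.

```rust
const OPTIMIST_DEPTH: usize = 2; // free, no permit needed
const MAX_DEPTH: usize = 4; // assumption: cap taken from the PR summary
const ELEVATE_THRESHOLD_BYTES: usize = 32 * 1024;

struct SessionDepth {
    depth: usize,
    consecutive_empty: u32,
    downloaded: usize,
    holds_permit: bool,
}

impl SessionDepth {
    fn new() -> Self {
        SessionDepth { depth: OPTIMIST_DEPTH, consecutive_empty: 0, downloaded: 0, holds_permit: false }
    }

    /// Update depth after one reply. `permit_free` says whether the
    /// deployment still has an elevation slot available.
    fn on_reply(&mut self, bytes: usize, permit_free: bool) {
        if bytes == 0 {
            self.consecutive_empty += 1;
            // A single empty reply is tolerated; two in a row drop to serial.
            if self.consecutive_empty >= 2 {
                self.depth = 1;
                self.holds_permit = false;
                self.downloaded = 0;
            }
            return;
        }
        self.consecutive_empty = 0;
        self.downloaded += bytes;
        if self.depth < OPTIMIST_DEPTH {
            self.depth = OPTIMIST_DEPTH;
        }
        // Depth 3+ requires both sustained download and a permit.
        if self.downloaded >= ELEVATE_THRESHOLD_BYTES
            && self.depth < MAX_DEPTH
            && (self.holds_permit || permit_free)
        {
            self.holds_permit = true;
            self.depth += 1;
        }
    }
}
```

In this model a keep-alive session that only trickles a few bytes never crosses the threshold and stays at depth 2 without consuming a permit.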

Android:
- Pipeline debug overlay (SYSTEM_ALERT_WINDOW) with per-session tracking
- Tokio worker threads 4 (was 2) to prevent burst stalls
- STUN/TURN port blocking (3478/5349/19302) for instant WebRTC TCP fallback
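
The port check itself is tiny. A sketch, with `STUN_TURN_PORTS` and `should_block` as hypothetical names and `block_stun` standing in for the config flag:

```rust
// Ports listed in the commit message above.
const STUN_TURN_PORTS: [u16; 3] = [3478, 5349, 19302];

/// True if a connection to `port` should be rejected up front so the
/// WebRTC stack falls back to TCP TURN immediately.
fn should_block(port: u16, block_stun: bool) -> bool {
    block_stun && STUN_TURN_PORTS.contains(&port)
}
```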

Tunnel-node:
- LONGPOLL_DEADLINE 4s (must stay below client batch timeout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tunnel-node:
- Drain loop: keep reading until buffer empty (max 1s), accumulates
  up to 2MB+ per drain for streaming video (was 100KB)
- Upload size logging for debugging
- 512KB reader buffer (was 64KB)
- LONGPOLL_DEADLINE 4s

Client:
- INFLIGHT_ACTIVE 4 (was 10) to prevent semaphore exhaustion
- Upload loop-read in initial path (1s max, accumulates fat uploads)
- Fast-path 200ms coalesce loop (was single 20ms read)
- 32KB download threshold for elevation (prevents keep-alive sessions
  like Telegram from over-elevating)
- consecutive_data no longer resets on single empties
- block_stun config (default true) with Android UI toggle
- 512KB client read buffer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Architecture:
- Upload task (spawned): reads client socket → sends MuxMsg::Data with
  wseq directly to mux → sends InflightEntry to download task. Fully
  independent, never blocked by downloads.
- Download task (inline): processes replies, sends refill polls (timer),
  accepts InflightEntry. Never blocked by uploads.
- Lock-free mpsc channels throughout — no Mutex contention.

Write ordering (wseq):
- Client assigns monotonic wseq to data-bearing ops only (not polls).
- Tunnel-node buffers out-of-order writes per session, flushes in wseq
  order. Backward compatible: old clients without wseq write immediately.
- Fixes data corruption from pipelined batches completing out of order.
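
The tunnel-node side of this can be sketched as a per-session reorder buffer. This is an illustration under assumptions: `WriteReorder` is a hypothetical type, wseq is assumed to start at 0, and stale (already flushed) sequence numbers are assumed to be dropped.

```rust
use std::collections::BTreeMap;

/// Per-session buffer that releases upload chunks in wseq order.
struct WriteReorder {
    next: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl WriteReorder {
    fn new() -> Self {
        WriteReorder { next: 0, pending: BTreeMap::new() }
    }

    /// Accepts one write and returns the chunks that may now be written
    /// to the upstream socket, in order.
    fn accept(&mut self, wseq: Option<u64>, data: Vec<u8>) -> Vec<Vec<u8>> {
        let seq = match wseq {
            // Old client without wseq: write immediately.
            None => return vec![data],
            Some(seq) => seq,
        };
        if seq < self.next {
            // Assumption: a stale or duplicate wseq is ignored.
            return Vec::new();
        }
        self.pending.insert(seq, data);
        let mut ready = Vec::new();
        // Flush the contiguous run starting at the expected sequence number.
        while let Some(chunk) = self.pending.remove(&self.next) {
            ready.push(chunk);
            self.next += 1;
        }
        ready
    }
}
```

If wseq 1 arrives before wseq 0, the first call returns nothing and the second returns both chunks in order, so the upstream TLS stream never sees bytes out of sequence.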

Upload accumulation:
- Adaptive: 50ms initial window for small messages (low latency).
- If >= 32KB accumulated, extend to 1s / 1MB cap (fat uploads for files).

Other:
- Removed consecutive_empty gate on refill (was killing idle sessions).
- Tunnel-node reader buffer 2MB (was 512KB).
- Removed legacy detection (was false-triggering on merged replies).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduces upload chunk size to prevent large video uploads from starving
heartbeat polls in shared batches. Adaptive accumulation:
- 50ms initial window, 10ms per-read gap timeout
- >= 8KB triggers extended 1s window (capped at 256KB)
- Smaller chunks clear batches faster, heartbeats get through

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Mux channel unbounded (was 512) — prevents upload flood from blocking
  download task's poll sends and ack processing
- Pipeline debug functions no-op'd — std::sync::Mutex was blocking tokio
  workers under contention during heavy uploads
- Upload accumulation yields between reads
- Added batch response mismatch logging (r.len vs sent ops)
- Open issue: r.len()=0 from Apps Script during heavy uploads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Upload semaphore: max 3 unacked data ops per session (TCP-like flow
  control). Permit held in inflight future until reply arrives.
- Suppress refill polls while data ops are in flight — prevents upload
  acks from being delayed behind slow poll responses in pending_writes.
- data_ops_in_flight counter tracks active upload ops per session.
- upload_cap config field (default 3, not yet wired to Android UI).

Root cause of video upload stall: r.len()=0 batch responses from Apps
Script when batches are large (19+ ops). Needs Apps Script investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The split upload/download task architecture caused video upload stalls:
upload ack responses were delayed behind slow poll responses in the
pending_writes ordering buffer. The single-loop naturally serializes
uploads with reply processing, giving steady ack delivery.

Single-loop keeps all pipelining benefits (elevated polls, adaptive
depth, fast-path uploads) while avoiding the ordering issue.

Removed dead upload_cap config field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yyoyoian-pixel yyoyoian-pixel force-pushed the feat/pipeline-tunnel-polls branch from 5c010ed to 377add3 Compare May 14, 2026 22:01