fix(firmware): stop ESP32-S3 sendto ENOMEM tight loop - 10 Hz self-ping, 300 ms backoff, 128 TX buffers (#1135) by binhusmachado-code · Pull Request #1142 · ruvnet/RuView

binhusmachado-code · 2026-06-20T05:33:50Z

A fresh ESP32-S3 node enters a permanent sendto ENOMEM loop and never emits a single UDP frame. This applies the three mitigations proposed in #1135 and gets the egress path back to 9–13 pps / 300+ CSI frames/min, measured end-to-end on real S3 hardware.

Addresses #1135 — fixes bug #1 (egress ENOMEM) only; bug #2 (phantom LD2410 on a floating UART) is intentionally out of scope (see Testing & scope).

Problem

On a fresh ESP32-S3, from the second CSI callback onward the node spins in a permanent sendto ENOMEM loop and zero UDP frames ever leave the device — the aggregator stays esp32:offline — even though Wi-Fi, DHCP and ICMP are all healthy. pkt_yield_per_sec sits at 0 pps.

This matches the earlier S3 report in #1107, whose ENOMEM/yield-0 half #1119 explicitly deferred as "most likely a separate network-path issue." #1135 pins that separate egress issue down.

Root cause

Per the analysis in #1135: during the first ~1 s after boot, the 50 Hz self-ping + mmWave UART probe + ESPNOW init + promiscuous sniffer all contend for the same lwIP pbuf / Wi-Fi dynamic-TX pools. sendto returns ENOMEM, and the fixed 100 ms backoff introduced in #132 is too short to let the pools drain, so the backoff re-fires into a still-full pool every cycle and loops forever. The S3 contends harder for these buffers than the C6 the original 0.6.x/0.7.0 tuning was verified against, which is why it surfaces there.

The fix — the three mitigations proposed in #1135

main/csi_collector.c — self-ping cadence 50 Hz → 10 Hz (cfg.interval_ms 20 → 100). Removes ~52 back-to-back boot-time datagrams/s of TX flood while keeping the CSI OFDM source alive. Interval comment, block-header comment, and the ESP_LOGI startup string updated to @10Hz.
main/stream_sender.c — ENOMEM_COOLDOWN_MS 100 → 300. The backoff now outlasts the pbuf/mbox pressure instead of re-firing into a still-full pool. (Flat, not exponential — see Testing & scope.)
sdkconfig.defaults — CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM 64 → 128. TX headroom for the boot contention window. 128 is the max of the Kconfig range 1 128 on ESP-IDF v5.4; the project has CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER=y (no PSRAM → dynamic TX), so the value is actually applied, not silently dropped.
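
The flat cooldown behind the second mitigation can be sketched as a small gate around the send path. This is a minimal illustration only, not the code in main/stream_sender.c: the `send_gate_t` struct and the `send_gate_open` / `send_gate_on_result` helpers are hypothetical names, and only the `ENOMEM_COOLDOWN_MS` value of 300 comes from this PR.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ENOMEM_COOLDOWN_MS 300  /* flat backoff value from this PR (was 100) */

/* Hypothetical state holder; the real stream_sender.c layout is not shown here. */
typedef struct {
    int64_t  cooldown_until_ms;  /* sends are skipped while now < this value */
    uint32_t enomem_count;       /* number of ENOMEM results seen so far */
} send_gate_t;

/* Returns true when the caller may attempt a send at time now_ms. */
bool send_gate_open(const send_gate_t *g, int64_t now_ms)
{
    return now_ms >= g->cooldown_until_ms;
}

/* Record the outcome of a send attempt: 0 on success, or an errno value. */
void send_gate_on_result(send_gate_t *g, int64_t now_ms, int err)
{
    if (err == ENOMEM) {
        g->enomem_count++;
        /* Flat backoff: the same fixed pause after every ENOMEM. */
        g->cooldown_until_ms = now_ms + ENOMEM_COOLDOWN_MS;
    }
}
```

In this sketch the caller checks `send_gate_open()` before each `sendto` and reports the result afterwards, so an ENOMEM at t = 0 ms keeps the gate closed until t = 300 ms rather than retrying after 100 ms.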

Scoped to the S3. Because the bump lives in the base sdkconfig.defaults, the C6 build would inherit it through the overlay chain. Since this was hardware-tested on the S3 only, I pinned CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64 in sdkconfig.defaults.esp32c6, so the C6 build is byte-identical to today. Happy to drop that pin if you'd rather the C6 take the bump too.

These are conservative, reversible tunings: two constants, one buffer-count bump (~few KB heap), one C6 pin, and two stale-comment fixes. No API, protocol, or dependency change.

Measured results

Built and flashed with ESP-IDF v5.4 on an ESP32-S3-DevKitC-1-class board (QFN56 rev v0.2, 16 MB flash / 8 MB PSRAM, native USB-Serial/JTAG; WPA2 2.4 GHz). Aggregator = a stdlib UDP listener on :5005 (macOS).

| Metric | Before | After |
| --- | --- | --- |
| `sendto` ENOMEM | tight loop, never drains | none observed |
| `pkt_yield_per_sec` | 0 pps | 9–13 pps |
| CSI frames reaching host | 0 | 300+ / min |
| Vitals packets | none | parsing on host |

The ~10 pps floor stays comfortably above the min_pkt_yield = 5 pps DEGRADED gate, so the node does not flap between states. The edge pipeline measures its true sample rate from inter-frame timestamps and re-tunes (#987), so cutting the self-ping rate 5× does not break the vitals/BPM math; 10 Hz is still >5× Nyquist for HR.
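
To make the two checks above concrete, here is a rough sketch of estimating the sample rate from inter-frame timestamps and comparing a yield figure against the 5 pps gate. The function names and the microsecond timestamp array are assumptions for illustration; the actual edge-pipeline code from #987 is not shown in this PR.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_PKT_YIELD_PPS 5.0  /* DEGRADED gate quoted in this PR */

/* Hypothetical helper: average frame rate over a window of monotonic
 * timestamps (microseconds). Returns 0.0 if the window is unusable. */
double estimate_rate_hz(const int64_t *ts_us, size_t n)
{
    if (ts_us == NULL || n < 2) {
        return 0.0;
    }
    int64_t span_us = ts_us[n - 1] - ts_us[0];
    if (span_us <= 0) {
        return 0.0;
    }
    /* n timestamps bound n - 1 inter-frame intervals. */
    return (double)(n - 1) * 1e6 / (double)span_us;
}

/* Hypothetical helper: nonzero when the yield is below the DEGRADED gate. */
int yield_is_degraded(double pps)
{
    return pps < MIN_PKT_YIELD_PPS;
}
```

With frames arriving every 100 ms, the estimate comes out at roughly 10 Hz, which this sketch would not flag as degraded; a pipeline that derives its rate this way picks up the new cadence without a hard-coded 50 Hz assumption.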

Testing & scope (honesty)

S3 only, not C6. Verified end-to-end on ESP32-S3 hardware only (no C6 board on hand). This inverts RuView's usual C6-first coverage, so I'm flagging it plainly. The C6 build is left unchanged by this PR (the TX-buffer pin above).
Flat 300 ms backoff, not exponential. #1135 also suggested an exponential schedule; I kept it flat because only the flat build was hardware-verified, and with the self-ping flood cut + 128 buffers the backoff now rarely fires (0 ENOMEM observed). Exponential under sustained pressure could grow the cooldown enough to starve CSI sends below the min_pkt_yield = 5 DEGRADED threshold, and it needs added state, so it is better handled as its own hardware-tested follow-up with a capped max.
Version-tree caveat. The patched tree is v0.7.0 (version.txt); #1135 was filed against v0.8.1-esp32. The egress root cause and the 100 ms backoff are identical in both, but please confirm it lands cleanly on the current v0.8.x branch.
Not a total ENOMEM elimination. This kills the boot-time loop. A residual airtime-bound feature_state-emit ENOMEM under load (noted in the in-tree sdkconfig.defaults comment block) is a separate adaptive-controller emit-cadence follow-up; the 300 ms backoff likely masks part of it but does not address it.
Single board, single network, no long soak. @Gelsoluis offered to test patches on the v0.8.1-esp32 S3 + Docker rig — that validation before merge would de-risk the version-tree and single-board caveats.
No CHANGELOG entry is included (the repo CHANGELOG looks release-automated); happy to add one if you'd like.
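
For reference, the capped exponential schedule mentioned above as a possible follow-up might look roughly like the sketch below. It is not part of this PR and has not been run on hardware; the function name, the consecutive-failure counter, and the 2400 ms cap are all hypothetical choices made for illustration.

```c
#include <stdint.h>

#define COOLDOWN_BASE_MS 300u   /* matches the flat value used in this PR */
#define COOLDOWN_MAX_MS  2400u  /* hypothetical cap, not taken from the PR */

/* Hypothetical helper: cooldown after the given number of consecutive
 * ENOMEM results. Doubles from the base value and saturates at the cap. */
uint32_t capped_exp_cooldown_ms(uint32_t consecutive_enomem)
{
    if (consecutive_enomem == 0) {
        return 0;  /* no failure, no cooldown */
    }
    uint32_t ms = COOLDOWN_BASE_MS;
    /* Stop doubling once the cap is reached, which also avoids overflow. */
    for (uint32_t i = 1; i < consecutive_enomem && ms < COOLDOWN_MAX_MS; i++) {
        ms *= 2;
    }
    return ms > COOLDOWN_MAX_MS ? COOLDOWN_MAX_MS : ms;
}
```

Under these assumed constants the schedule runs 300, 600, 1200, 2400 ms and then stays at 2400 ms. A pause that long sends nothing for more than two seconds, which illustrates the starvation concern raised above, and the extra counter is the added state the flat version avoids.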

Refs

#132 (the original ENOMEM backoff this extends — the 100 ms value #1135 says is too short on S3), #521 / #954 (self-ping CSI source), #1107 / #1119 (the mmwave-validation cluster + the deferred separate egress half this resolves).

…uvnet#1135)

On a fresh ESP32-S3 the node enters a permanent `sendto ENOMEM` loop from the second CSI callback onward and zero UDP frames ever leave the device (the aggregator stays `esp32:offline`), even though Wi-Fi, DHCP and ICMP are healthy and pkt_yield sits at 0 pps.

Per the analysis in ruvnet#1135, during the first ~1 s after boot the 50 Hz self-ping + mmWave UART probe + ESPNOW init + promiscuous sniffer all contend for the same lwIP pbuf / Wi-Fi dynamic-TX pools; `sendto` returns ENOMEM and the fixed 100 ms backoff from ruvnet#132 is too short to let the pools drain, so it re-fires into a still-full pool every cycle and loops forever. The S3 contends harder for these buffers than the C6 the original 0.6.x/0.7.0 tuning was verified against.

Implements the three mitigations proposed in ruvnet#1135:

* csi_collector.c: self-ping cadence 50 Hz -> 10 Hz (interval_ms 20 -> 100). Cuts ~52 back-to-back boot-time datagrams/s of TX flood while keeping the CSI OFDM source alive. Interval comment, header comment and log string updated.
* stream_sender.c: ENOMEM_COOLDOWN_MS 100 -> 300 so the backoff outlasts the pool pressure instead of re-firing into a still-full pool.
* sdkconfig.defaults: CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM 64 -> 128 (max of the IDF 1..128 range) for TX headroom during the boot contention window.

Scoped to the S3: the bump lives in the base sdkconfig.defaults, so to leave the untested C6 build unchanged it is pinned back to 64 in sdkconfig.defaults.esp32c6.

Also tidied a stale "50 Hz" self-ping header comment and a stale "100 ms" backoff comment in adaptive_controller.c so they match the new runtime behavior.

Measured on an ESP32-S3-DevKitC-1-class board (QFN56 rev v0.2, 16MB/8MB, USB-Serial/JTAG, WPA2 2.4 GHz; aggregator UDP :5005 on macOS), built and flashed with ESP-IDF v5.4:

before: sendto ENOMEM tight loop, yield 0 pps, 0 frames reach the host
after: yield 9-13 pps, no ENOMEM, 300+ CSI frames/min received, vitals parsing

Fixes the egress/ENOMEM half (bug #1) of ruvnet#1135 only; the phantom-LD2410-on-floating-UART detection (bug #2) is out of scope and belongs with the ruvnet#1107/ruvnet#1119 mmwave-validation work. Verified on ESP32-S3 only, not on C6.

Refs ruvnet#132, ruvnet#521, ruvnet#954, ruvnet#1107, ruvnet#1119.

binhusmachado-code wants to merge 1 commit into ruvnet:main from binhusmachado-code:fix/s3-sendto-enomem-1135
