stream_sender: sendto ENOMEM in a tight loop on ESP32-S3 (v0.8.1-esp32) — 0 UDP frames ever leave the node

# `stream_sender: sendto ENOMEM` in a tight loop on ESP32-S3 (v0.8.1-esp32) — 0 UDP frames ever leave the node

## Summary
A fresh ESP32-S3 (8 MB, no peripherals attached) flashed with the **v0.8.1-esp32** release assets enters a permanent `sendto ENOMEM` loop on the second CSI callback. The aggregator's source state never advances past `esp32:offline`. While diagnosing this I also found what looks like a separate **phantom LD2410 detection** on a floating UART — I'm filing both here since they were observed in the same run; happy to split into two issues if preferred.

## Environment
| Component | Details |
|---|---|
| Board | ESP32-S3, QFN56 rev v0.2, 8 MB embedded PSRAM, USB-Serial/JTAG |
| Flash | 8 MB, DIO @ 80 MHz |
| Firmware | **v0.8.1-esp32** release assets (`bootloader.bin`, `partition-table.bin`, `ota_data_initial.bin`, `esp32-csi-node-s3-8mb.bin`) at `0x0 / 0x8000 / 0xf000 / 0x20000` |
| esptool | 5.3.0 (host: macOS) |
| Aggregator | `ruvnet/wifi-densepose:latest` in Docker, UDP `0.0.0.0:5005` mapped to host |
| Wi-Fi | WPA2-PSK, 2.4 GHz, ch 10, RSSI -39 to -42 dBm |
| Network | ESP32 ↔ host on same /24, ICMP RTT ~12 ms |
| mmWave sensor | **None physically connected** (UART1 TX/RX pins floating) |
| NVS | Wiped before each test (`esptool erase-region 0x9000 0x6000`) and re-provisioned via the bundled `provision.py` |

## What works
- Wi-Fi associates, DHCP returns an IP, host↔ESP32 ICMP is healthy.
- CSI capture itself is alive: callbacks fire at expected cadence with valid `len` and `rssi` fields.
- The aggregator's UDP receiver works: injecting a hand-built UDP packet at `127.0.0.1:5005` from the host immediately promotes its source from `simulated` to `esp32`. So the failure is strictly on the node's egress path.
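
For anyone who wants to repeat the loopback check, here is a minimal host-side sketch of the injection, assuming POSIX sockets. `send_udp_datagram` is a name I made up for illustration, and the payload is whatever bytes the caller passes in; the CSI frame layout the aggregator expects is not described in this issue, so it has to be taken from the firmware source.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one UDP datagram to ip:port.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_udp_datagram(const char *ip, uint16_t port,
                          const void *payload, size_t len)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
        return -1;                    /* not a valid IPv4 address */
    }

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* Unconnected UDP socket: a single sendto() is enough. */
    ssize_t sent = sendto(fd, payload, len, 0,
                          (const struct sockaddr *)&dst, sizeof dst);
    close(fd);
    return sent;
}
```

Calling `send_udp_datagram("127.0.0.1", 5005, frame, frame_len)` with a correctly formatted frame should reproduce the `simulated` to `esp32` promotion described above.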

## What fails
Annotated boot log (timestamps in ms since reset; nothing else is talking on this LAN):

```
I (4124) main: Got IP: 192.168.0.19
I (4124) stream_sender: UDP sender initialized: 192.168.0.25:5005
I (4144) csi_collector: WiFi modem sleep disabled (WIFI_PS_NONE) for CSI capture
I (4154) wifi:ic_enable_sniffer
I (4154) csi_collector: Promiscuous mode enabled (MGMT-only, RuView#396)
I (4164) csi_collector: self-ping started -> 192.168.0.1 @50Hz (CSI OFDM source, fix #521/#954)
I (4184) ESPNOW: espnow [version: 2.0] init
I (4194) edge_proc: Initializing edge processing (tier=2, top_k=8, vital_interval=1000ms, ...)
I (4294) mmwave: Probing UART1 (TX=17, RX=18) for mmWave sensor...
I (4304) mmwave: Probing at 115200 baud (MR60BHA2)...
I (4494) csi_collector: CSI cb #1: len=128 rssi=-25 ch=10        ← first send succeeds (no log = OK)
I (5364) mmwave: Probing at 256000 baud (LD2410)...
I (5544) csi_collector: CSI cb #2: len=128 rssi=-25 ch=10
W (5544) stream_sender: sendto ENOMEM — backing off for 100 ms   ← first failure on CSI cb #2
W (5544) csi_collector: sendto failed (fail #1)
I (5564) mmwave: Detected LD2410 at 256000 baud (caps=0x000c)    ← (see "bug #2" below)
I (5564) mmwave: mmWave UART task started (type=LD2410)
W (5564) stream_sender: sendto suppressed (ENOMEM backoff, 1 dropped)
... (steady-state — every send either ENOMEMs or is suppressed)
```

The aggregator stays at `{"source":"esp32:offline"}` indefinitely; **zero CSI frames reach it over the network** even though L2/L3 is healthy.

### Bug #1 — primary: permanent `sendto ENOMEM` from CSI cb #2 onward
The first `stream_sender_send` (on CSI cb #1) appears to succeed (no failure log). The very next one fails with `ENOMEM` and never recovers: every subsequent attempt either returns `ENOMEM` or is suppressed by the 100 ms backoff. The fixed 100 ms backoff appears to be shorter than the time the underlying pbuf/mbox pressure needs to clear, so the node stays stuck.

The 1050 ms gap between cb #1 and cb #2 is occupied by:
- the **50 Hz self-ping** to the gateway (`csi_collector: self-ping started ... @50Hz`): at 50 Hz, 1050 ms is ~52 ping packets enqueued into LWIP;
- the **MR60BHA2 UART probe** at 115200 baud for ~1060 ms;
- **ESPNOW** init + `c6_espnow` tx loop;
- **promiscuous + sniffer** RX consuming Wi-Fi RX buffers.
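
The gap and the ping count come straight from the log timestamps. A small sketch of that arithmetic, assuming the usual ESP-IDF log prefix of a level letter followed by `(<ms>)`; both function names are hypothetical helpers, not firmware code:

```c
#include <stdio.h>

/* Parse the "(<ms>)" timestamp of an ESP-IDF style log line such as
 * "I (4494) csi_collector: CSI cb #1: ...". Returns -1 if not found. */
long log_timestamp_ms(const char *line)
{
    char level;
    long ms;
    if (sscanf(line, " %c (%ld)", &level, &ms) != 2) {
        return -1;
    }
    return ms;
}

/* Number of periodic events at rate_hz that fit into gap_ms (rounded down). */
long events_in_gap(long gap_ms, long rate_hz)
{
    return gap_ms * rate_hz / 1000;
}
```

Feeding it the two `CSI cb` lines from the log gives 5544 - 4494 = 1050 ms, and `events_in_gap(1050, 50)` gives 52.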

It looks like LWIP pbufs, Wi-Fi dynamic TX buffers, or the UDP send mbox saturate during that ~1 s and never drain. `sdkconfig.defaults` already mentions a sibling fix for an earlier ENOMEM (see the comment above `CONFIG_LWIP_UDP_RECVMBOX_SIZE=32` / `CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64` / `CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64`), but on S3 those values don't appear sufficient, possibly because S3 + sniffer + 50 Hz self-ping + ESPNOW compete harder for buffers than on the C6 target the 0.6.7 build was verified against.

Possible fixes worth considering:
- Drop the self-ping cadence (50 Hz → 10 Hz?) when the LD2410/mmWave or ESPNOW tasks are also TX-active.
- Raise `CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM` / `CONFIG_LWIP_TCPIP_RECVMBOX_SIZE` further in the S3-specific sdkconfig overlays.
- When `stream_sender` has been in ENOMEM backoff for >N consecutive cycles, exponentially extend the backoff (the current fixed 100 ms is too short) and emit a single warning instead of one per attempt.
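
To make the last suggestion concrete, here is a rough sketch of an exponential backoff with a single warning per failure streak. It is not the current `stream_sender` code: the struct, the function names, and the 3200 ms ceiling are all assumptions of mine, and only the 100 ms starting value comes from the log.

```c
#include <stdbool.h>
#include <stdint.h>

#define BACKOFF_MIN_MS 100u   /* the 100 ms seen in the log */
#define BACKOFF_MAX_MS 3200u  /* hypothetical ceiling, not a firmware value */

typedef struct {
    uint32_t delay_ms;     /* current backoff window; 0 means not backing off */
    uint32_t resume_at_ms; /* earliest time the next send may be attempted */
    uint32_t failures;     /* consecutive ENOMEM results */
} send_backoff_t;

/* True if a send may be attempted at now_ms (wraparound-safe compare). */
bool backoff_may_send(const send_backoff_t *b, uint32_t now_ms)
{
    return b->delay_ms == 0 || (int32_t)(now_ms - b->resume_at_ms) >= 0;
}

/* Record the result of a send attempt. Returns true when a warning should
 * be logged, which is only on the first failure of a streak. */
bool backoff_record(send_backoff_t *b, uint32_t now_ms, bool enomem)
{
    if (!enomem) {
        b->delay_ms = 0;          /* success clears the backoff */
        b->failures = 0;
        return false;
    }
    b->failures++;
    if (b->delay_ms == 0) {
        b->delay_ms = BACKOFF_MIN_MS;
    } else if (b->delay_ms < BACKOFF_MAX_MS) {
        b->delay_ms *= 2;         /* 100, 200, 400, ... up to the ceiling */
    }
    b->resume_at_ms = now_ms + b->delay_ms;
    return b->failures == 1;
}
```

The send path would call `backoff_may_send()` before each `sendto` and `backoff_record()` after it, logging only when the latter returns true. Whether a longer window actually lets the buffers drain on this board is something I have not verified.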

### Bug #2 — secondary: phantom LD2410 detection on a floating UART
With **no mmWave sensor wired to UART1 (TX=17, RX=18)**, the firmware still concludes `Detected LD2410 at 256000 baud (caps=0x000c)` and spawns the LD2410 reader task. The v0.8.1-esp32 release notes specifically called out a fix for "false MR60BHA2 detection → ENOMEM by requiring validated sensor headers instead of accepting bare byte patterns" — the **LD2410 path looks like it still accepts loose patterns** and so trips on floating-pin noise at 256000 baud.

This isn't the trigger of Bug #1 (the timing rules it out — first ENOMEM at 5544 ms, LD2410 declared at 5564 ms), but the resulting mmWave UART task adds steady load to a system that's already in a fragile buffer state.

Suggested fix: gate `mmwave: Detected LD2410` on a validated frame header (length + checksum + magic), matching what was done for MR60BHA2 in v0.8.1.
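
A sketch of what such a gate could look like, assuming the LD2410 report framing as I understand it from the sensor's serial protocol document: a 4-byte head `F4 F3 F2 F1`, a 16-bit little-endian payload length, the payload, and a 4-byte tail `F8 F7 F6 F5`. Please verify that against the datasheet. The function name and the 64-byte payload bound are hypothetical, and this version checks head, length, and tail only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed LD2410 report framing (verify against the datasheet):
 * F4 F3 F2 F1 | len (u16, little-endian) | payload[len] | F8 F7 F6 F5 */
static const uint8_t LD2410_HEAD[4] = {0xF4, 0xF3, 0xF2, 0xF1};
static const uint8_t LD2410_TAIL[4] = {0xF8, 0xF7, 0xF6, 0xF5};
#define LD2410_MAX_PAYLOAD 64u  /* hypothetical sanity bound */

/* True only if buf starts with one complete, well-framed report. */
bool ld2410_frame_is_valid(const uint8_t *buf, size_t n)
{
    if (n < 4 + 2 + 4) {
        return false;                         /* too short for any frame */
    }
    if (memcmp(buf, LD2410_HEAD, 4) != 0) {
        return false;                         /* wrong magic */
    }
    size_t len = (size_t)buf[4] | ((size_t)buf[5] << 8);
    if (len == 0 || len > LD2410_MAX_PAYLOAD) {
        return false;                         /* implausible length */
    }
    if (n < 4 + 2 + len + 4) {
        return false;                         /* frame not fully received */
    }
    return memcmp(buf + 6 + len, LD2410_TAIL, 4) == 0;
}
```

With a check like this, the probe would declare an LD2410 only after at least one buffer passes, which random bytes from a floating RX pin are very unlikely to do.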

## What I tried
1. `release_bins/s3-adr110/` in-tree bins — same ENOMEM loop.
2. `release_bins/s3-fair-adr110/` in-tree bins — same.
3. Fresh download of **v0.8.1-esp32** release assets — same.
4. `esptool erase-region 0x9000 0x6000` to wipe NVS, then `provision.py --reset --edge-tier 2 --target-ip <host> --target-port 5005` — same.
5. Confirmed Wi-Fi credentials, IP, gateway, and aggregator IP/port are correct (ping host↔ESP32 OK).
6. Confirmed the aggregator's UDP receiver works by sending a synthetic CSI packet from the host — source promoted to `esp32` immediately, then back to `esp32:offline` after the synthetic stream stops.

## Repro
1. Flash a bare ESP32-S3 (8 MB, **no mmWave sensor connected**) with the v0.8.1-esp32 release assets at `0x0 / 0x8000 / 0xf000 / 0x20000`.
2. `python3 provision.py --port <port> --chip esp32s3 --ssid <SSID> --password <pw> --target-ip <host> --target-port 5005 --edge-tier 2 --reset`.
3. Run the RuView aggregator on `<host>:5005`.
4. Watch ESP32 serial: first `stream_sender: sendto ENOMEM — backing off for 100 ms` appears on CSI cb #2 and never goes away. Phantom `mmwave: Detected LD2410 ...` appears in the same window.
5. Watch `GET /api/v1/status` on the aggregator — stays `esp32:offline` indefinitely.

Happy to test patches on this board if useful.

stream_sender: sendto ENOMEM in a tight loop on ESP32-S3 (v0.8.1-esp32) — 0 UDP frames ever leave the node #1135
