
feat(eth): port mesh/http to RP2350 + W5500 (HTTP/80 + HTTPS/443)#10573

Open
cvaldess wants to merge 12 commits into meshtastic:develop from cvaldess:feature/eth-tls-api

@cvaldess
Contributor

Summary

Ports the phone-API endpoints that today only ship on ESP32 (mesh/http/ContentHandler) over to RP2350 boards with W5500 Ethernet. Adds:

  • HTTP server on TCP/80 exposing /api/v1/{fromradio,toradio} over the existing arduino-libraries/Ethernet stack.
  • HTTPS server on TCP/443 reusing the same handlers via an IStreamReadWrite abstraction and a TLS engine compiled from mbedTLS sources, with a self-signed ECDSA P-256 cert (SAN=IP) persisted under LittleFS.

The new feature is off by default, gated by HAS_ETHERNET_API (HTTP/80) and HAS_ETHERNET_TLS_API (HTTPS/443). No upstream variant enables these flags in this PR; W5500 variants can opt in independently.

Why two servers and not Mongoose / esp32_https_server?

The existing mesh/http/ implementation depends on esp32_https_server, which is not portable to arduino-pico, and Mongoose's built-in TCP/IP stack would conflict with the arduino-libraries/Ethernet stack (EthernetServer) that OTA / MQTT / NTP already rely on. The cleanest path was to keep using arduino-libraries/Ethernet for the socket layer and bring our own HTTP/TLS code on top, sharing handlers between the two transports.

Architecture

                +----------+         +----------+
                | TCP/80   |         | TCP/443  |
                +----+-----+         +----+-----+
                     |                    | mbedtls_ssl_read/write
                     v                    v
       EthernetClientStream         TlsClientStream
                     \                  /
                      v                v
                   IStreamReadWrite (Print)
                              |
                              v
                       parseRequest()
                              |
                              v
                    handleApiClient()
                       |          |
                       v          v
                handleFromRadio  handleToRadio  (-> PhoneAPI)
  • ethApiHandlers.{h,cpp}: Request struct, parser (CORS + Content-Length), EthHttpAPI : PhoneAPI (api_type = TYPE_HTTP, checkIsConnected = true).
  • ethApiServer.{h,cpp}: listener on TCP/80, accept loop on its own OSThread (adaptive 20 / 100 / 500 ms tick).
  • ethTlsApiServer.{h,cpp}: listener on TCP/443, same OSThread pattern, wraps EthernetClient in an mbedtls_ssl_context.
  • ethCert.{h,cpp}: first-boot ECDSA P-256 keygen + self-signed cert (SAN=IP) with KeyUsage(digitalSignature + keyEncipherment, critical) + ExtendedKeyUsage(serverAuth, critical). Generation runs once on a dedicated OSThread; the result is persisted under /prefs/eth_*.der and reloaded on subsequent boots.
  • scripts/add_mbedtls_sources.py: compiles the mbedTLS 3.6.2 sources already shipped in pico-sdk (pico-sdk/lib/mbedtls/library/*.c) via BuildSources(), so the build doesn't depend on a precompiled library that arduino-pico doesn't ship.
  • src/mbedtls_user_config.h: undefines the time and timing options (MBEDTLS_HAVE_TIME, HAVE_TIME_DATE, TIMING_C), NET_C, FS_IO, PSA storage (PSA_ITS_FILE_C, PSA_CRYPTO_STORAGE_C) and MBEDTLS_SSL_PROTO_TLS1_3, and defines MBEDTLS_NO_PLATFORM_ENTROPY (the platform check in entropy_poll.c is not flag-gated). Keygen and the f_rng path use pico-sdk's get_rand_64() directly, bypassing mbedtls_entropy.
  • rp2040Loop(): adds an RP2xx0 hardware watchdog (watchdog_enable(8000, true)), fed from the main loop. Because a TLS connection can hold the OSThread inside mbedTLS I/O for more than 8 s (a Chrome handshake, or a keep-alive client idling inside mbedtls_ssl_read()), the netRecv and netSend poll loops also feed the watchdog every 2 ms so a slow or quiet client cannot trigger a reset.
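For reference, a user config header built only from the options this PR names (the list above plus the #undef MBEDTLS_SSL_PROTO_TLS1_3 from the commit history) might look like the sketch below. It is an illustration, not the file in the diff. mbedTLS 3.x includes MBEDTLS_USER_CONFIG_FILE after its default config, which is why plain #undef lines are enough to override pico-sdk's defaults.

```cpp
// mbedtls_user_config.h: illustrative sketch, not the PR's actual file.
// mbedTLS includes this after its default config, so these lines override it.
#ifndef ETH_MBEDTLS_USER_CONFIG_SKETCH_H
#define ETH_MBEDTLS_USER_CONFIG_SKETCH_H

// No wall clock or timing module on the board.
#undef MBEDTLS_HAVE_TIME
#undef MBEDTLS_HAVE_TIME_DATE
#undef MBEDTLS_TIMING_C

// Sockets come from arduino-libraries/Ethernet, not BSD net I/O.
#undef MBEDTLS_NET_C

// No stdio file I/O or PSA persistent storage; cert/key go through LittleFS.
#undef MBEDTLS_FS_IO
#undef MBEDTLS_PSA_ITS_FILE_C
#undef MBEDTLS_PSA_CRYPTO_STORAGE_C

// Server-side TLS 1.3 compiled out so ServerHello only ever offers TLS 1.2.
#undef MBEDTLS_SSL_PROTO_TLS1_3

// entropy_poll.c's platform source is not flag-gated; randomness comes from
// get_rand_64() passed directly as f_rng instead.
#define MBEDTLS_NO_PLATFORM_ENTROPY

#endif  // ETH_MBEDTLS_USER_CONFIG_SKETCH_H
```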

Performance

HTTP/1.1 keep-alive across both servers — handleApiClient loops on the same connection until the peer closes or parseRequest hits its 3 s idle timeout. Measured against client.meshtastic.org doing its initial sync (~80 sequential /api/v1/fromradio polls = MyInfo + every Config_* + every ModuleConfig_* + every NodeInfo):

  • Before keep-alive: 80 × ~625 ms ECDSA P-256 handshake = ~50 s of pure handshake overhead.
  • After: one handshake (641 ms, CHACHA20-POLY1305), with all ~80 requests reusing that connection; full config + 40 NodeInfos in ~6-8 s. Per-request post-handshake latency ~10-25 ms.
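A minimal sketch of the keep-alive loop, assuming a hypothetical readRequest() that returns std::nullopt when the peer closes or the 3 s idle deadline passes. The 64-request cap comes from a later commit in this PR; a client that reaches it simply re-handshakes. The names and types are illustrative, not the PR's.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the PR's Request struct.
struct Request {
  std::string method;
  std::string path;
};

constexpr unsigned long kIdleTimeoutMs = 3000;      // parseRequest idle deadline
constexpr std::size_t kMaxRequestsPerSession = 64;  // cap from a later commit

// Serves requests on one connection until the peer closes, the idle deadline
// passes (readRequest returns std::nullopt), or the per-session cap is hit.
// Returns the number of requests served; the caller then closes the socket.
std::size_t serveKeepAlive(
    const std::function<std::optional<Request>(unsigned long)>& readRequest,
    const std::function<void(const Request&)>& handle,
    const std::function<void()>& yieldToScheduler) {
  std::size_t served = 0;
  while (served < kMaxRequestsPerSession) {
    std::optional<Request> req = readRequest(kIdleTimeoutMs);
    if (!req) break;     // idle close after served > 0 is normal, not an error
    handle(*req);        // writes one response framed with Content-Length
    ++served;
    yieldToScheduler();  // let other OSThreads run between requests
  }
  return served;
}
```

The arithmetic behind the numbers above: one ~625 ms handshake amortized over up to 64 requests, instead of 80 × ~625 ms ≈ 50 s when every poll opened a new TLS session.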

Validation

Validated end-to-end against a WIZnet W5500-EVB-Pico2 + E22-900M30S board on the local network:

  • OPTIONS /api/v1/fromradio → 204 with CORS + X-Protobuf-Schema.
  • PUT /api/v1/toradio (ToRadio{want_config_id=1}) → 200 + echo.
  • GET /api/v1/fromradio?all=true → 200 + ~3500 B (config + channels + nodeinfos).
  • GET /api/v1/nonexistent → 404.
  • HTTPS handshake via curl -kv and via Firefox / Chrome / Edge after accepting the self-signed cert; curl with two URLs reports "Reusing existing https: connection", and the server returns "Connection: keep-alive" explicitly.
  • Watchdog stress test: held a Chrome "Test Connection" session open through idle periods longer than 8 s; no watchdog reset and no socket exhaustion across back-to-back requests.

Cert generation timing: ~430 ms for ECDSA P-256 key + cert generation on a cold first boot, run on the dedicated OSThread (off the Periodic stack and off the Ethernet reconnect path).

Build impact

Numbers from the wiznet variant on a downstream build that enables both flags:

  • HTTP-only (HAS_ETHERNET_API): +~11 KB flash, +1.8 KB RAM.
  • HTTPS (HAS_ETHERNET_TLS_API): +~600 KB flash (mbedTLS + cert path), +~50 KB RAM. The cert key and DER blobs are heap-allocated only during generation and freed once they have been persisted.

Not included

  • No variant in this PR enables HAS_ETHERNET_API / HAS_ETHERNET_TLS_API. Activation lives in the W5500 variant configs (separate PRs).
  • No OTA endpoints over HTTP/HTTPS. The HAS_ETHERNET_OTA path from #10136 ("feat: add Ethernet OTA support for RP2350/W5500 boards") is unchanged. A follow-up commit can route OTA through handleApiClient once that PR lands.

Files

  • New: scripts/add_mbedtls_sources.py, src/mbedtls_user_config.h, src/mesh/eth/{ethApiServer,ethApiHandlers,ethCert,ethTlsApiServer}.{cpp,h}.
  • Modified: src/mesh/eth/ethClient.cpp (2 init/loop hooks gated by the new flags), src/main.cpp + src/main.h (RP2040 loop hook), src/platform/rp2xx0/main-rp2xx0.cpp (watchdog enable + poke).

Net diff: 14 files changed, 1353 insertions(+), 1 deletion(-).

cvaldess added 11 commits May 29, 2026 00:00
First step of porting mesh/http/ to RP2350 + W5500 (today ESP32-only).
Phase 0 stands up the listener over the existing arduino-libraries/Ethernet
stack (no Mongoose — its built-in TCP/IP would conflict with EthernetServer
and break OTA/MQTT/NTP). Skeleton parses request line, logs it, replies 503.
Real handlers (/api/v1/{info,fromradio,toradio}) come in later phases.

- New mesh/eth/ethApiServer.{h,cpp} gated by HAS_ETHERNET_API
- Wired into ethClient init+loop in parallel with existing ethHttpOTA (port 4244)
- Enabled in both wiznet_5500_evb_pico2_e22p and pico2_w5500_e22 variants
- Build impact on wiznet: +~9 KB flash (63.0% used), <1 KB RAM
…honeAPI

Replace phase 0 skeleton with the real handlers that bridge the HTTP transport
to PhoneAPI, mirroring mesh/http/ContentHandler (ESP32) semantics.

- EthHttpAPI : public PhoneAPI (api_type=TYPE_HTTP, checkIsConnected=true),
  outside the MESHTASTIC_EXCLUDE_WEBSERVER gate so it builds on RP2350.
- Minimal HTTP parser: method/path/query/Content-Length, no allocations beyond
  Arduino String, capped at 32 header lines x 256 bytes (anti-DoS).
- OPTIONS preflight -> 204 with CORS + X-Protobuf-Schema.
- GET /api/v1/fromradio[?all=true]: stream-write up to 64 protobufs from
  webAPI.getFromRadio(), Connection: close framing (HTTP/1.0 style).
- PUT /api/v1/toradio: read Content-Length bytes (<=512), call
  webAPI.handleToRadio(), echo body back.
- 404 default for unknown paths, 405 for wrong method, 400 for bad body.
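A host-side sketch of a parser with these caps, written against std::string / std::istream instead of Arduino String so it can be exercised off-target. The limits mirror the numbers above; the struct and function names are hypothetical. Because std::getline reads a whole line before the length check, an on-device version would stop reading at the cap instead.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Limits quoted in the commit message; names are hypothetical.
constexpr std::size_t kMaxHeaderLines = 32;
constexpr std::size_t kMaxLineLen = 256;
constexpr std::size_t kMaxBody = 512;

struct ParsedRequest {
  std::string method, path, query;
  std::size_t contentLength = 0;
};

// Reads one line, drops the trailing '\r', rejects lines over the cap.
static bool readLine(std::istream& in, std::string& line) {
  if (!std::getline(in, line)) return false;
  if (!line.empty() && line.back() == '\r') line.pop_back();
  return line.size() <= kMaxLineLen;
}

static std::string trim(const std::string& s) {
  const std::size_t b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  const std::size_t e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Parses the request line and headers; the caller then reads contentLength
// body bytes. Returns false on malformed input or any exceeded cap (-> 400).
bool parseRequest(std::istream& in, ParsedRequest& req) {
  std::string line;
  if (!readLine(in, line)) return false;
  const std::size_t sp1 = line.find(' ');
  if (sp1 == std::string::npos) return false;
  const std::size_t sp2 = line.find(' ', sp1 + 1);
  if (sp2 == std::string::npos) return false;
  req.method = line.substr(0, sp1);
  const std::string target = line.substr(sp1 + 1, sp2 - sp1 - 1);
  const std::size_t q = target.find('?');
  req.path = target.substr(0, q);
  req.query = (q == std::string::npos) ? "" : target.substr(q + 1);

  for (std::size_t n = 0;; ++n) {
    if (!readLine(in, line)) return false;
    if (line.empty()) return true;             // blank line ends the headers
    if (n >= kMaxHeaderLines) return false;    // too many headers (anti-DoS)
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // skip malformed header lines
    std::string name = line.substr(0, colon);
    for (char& c : name) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (name != "content-length") continue;
    const std::string value = trim(line.substr(colon + 1));
    if (value.empty()) return false;
    std::size_t len = 0;
    for (char c : value) {
      if (c < '0' || c > '9') return false;
      len = len * 10 + static_cast<std::size_t>(c - '0');
      if (len > kMaxBody) return false;        // body cap; also prevents overflow
    }
    req.contentLength = len;
  }
}

// Convenience overload for host testing.
bool parseRequest(const std::string& raw, ParsedRequest& req) {
  std::istringstream in(raw);
  return parseRequest(in, req);
}
```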

Validated e2e against wiznet @ 192.168.1.143:
- OPTIONS /api/v1/fromradio -> 204 + correct CORS headers
- PUT ToRadio{want_config_id=1} (2 bytes) -> 200 + echo
- GET /api/v1/fromradio?all=true -> 200 + 3476 bytes (config+channels+nodeinfo)
- GET /api/v1/nonexistent -> 404 unknown endpoint
- OTA HTTP on port 4244 untouched and still responsive

Build impact on wiznet_5500_evb_pico2_e22p: +2 KB flash (63.1%), +1.8 KB RAM
(17.6%) vs phase 0.

The API server was being polled from the Ethernet client Periodic, which runs
every 5 s. Measured impact before this fix on wiznet @ 192.168.1.143:

  req 1: ttfb=6.458s total=6.509s    (matches the 5s tick + handler overhead)
  req 2: connection refused          (W5500 sockets exhausted)
  req 3: connection refused

After: same hardware, same network, back-to-back requests:

  req 2: ttfb=0.287s
  req 3: ttfb=0.015s
  req 4: ttfb=0.022s
  req 5: ttfb=0.020s

The web client (meshtastic/web served from localhost) was visibly stalling
midway through the config handshake: it had pulled 109 nodes + 19 messages,
but channels and device info never arrived. With periodic-only polling, every
request takes ~5 s and the W5500's hardware sockets fill up under the burst.

EthApiServerThread mirrors the WebServerThread pattern from ESP32
mesh/http/WebServer.cpp: adaptive interval — 20ms when there's recent
traffic, 100ms after 5s of idle, 500ms after 30s. Auto-registers with the
OSThread scheduler on construction.
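The adaptive interval reduces to a small function of idle time. This sketch assumes a millis()-style 32-bit clock (unsigned subtraction keeps it correct across wrap-around) and returns the delay an OSThread's runOnce() would hand back to the scheduler; the helper name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the commit's schedule: 20 ms with recent
// traffic, 100 ms after 5 s idle, 500 ms after 30 s idle.
std::uint32_t nextIntervalMs(std::uint32_t nowMs, std::uint32_t lastActivityMs) {
  const std::uint32_t idleMs = nowMs - lastActivityMs;  // wrap-safe unsigned math
  if (idleMs >= 30000) return 500;
  if (idleMs >= 5000) return 100;
  return 20;
}
```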

ethHttpOTA still ticks from the periodic; it is left unchanged because OTA is
one large transfer that tolerates the latency, and to keep this commit to a
single behaviour change.
…rite

Phase 2.0 of the TLS port — separate transport from request handling so
the upcoming HTTPS server can reuse the exact same parser + routing logic.

- New ethApiHandlers.{h,cpp}: Request, parser, CORS helpers, fromradio/toradio
  handlers, EthHttpAPI : PhoneAPI subclass. All driven by a single
  IStreamReadWrite interface that inherits Print (so client.print(...) keeps
  working transparently).
- ethApiServer.cpp slimmed to ~90 LOC: now just the EthernetServer(80) +
  OSThread + an EthernetClientStream adapter that forwards reads/writes to
  the underlying EthernetClient. No behaviour change.
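A minimal sketch of the shape of that interface. To stay buildable on a host, it declares a small stand-in for Arduino's Print (an assumption; on device the real Print from Arduino.h is used) and an in-memory transport in place of EthernetClientStream or the TLS stream. The available() / read() / connected() members are hypothetical. A handler written once against IStreamReadWrite then works over either transport.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Host stand-in for Arduino's Print: subclasses implement write(uint8_t).
class Print {
 public:
  virtual ~Print() = default;
  virtual std::size_t write(std::uint8_t b) = 0;
  virtual std::size_t write(const std::uint8_t* buf, std::size_t len) {
    std::size_t n = 0;
    while (n < len && write(buf[n]) == 1) ++n;
    return n;
  }
  std::size_t print(const char* s) {
    return write(reinterpret_cast<const std::uint8_t*>(s), std::strlen(s));
  }
};

// Transport-neutral stream: handlers only ever see this interface.
class IStreamReadWrite : public Print {
 public:
  virtual int available() = 0;
  virtual int read() = 0;  // -1 when no byte is ready
  virtual bool connected() = 0;
};

// Hypothetical in-memory transport: reads from `input`, collects writes.
class MemoryStream : public IStreamReadWrite {
 public:
  explicit MemoryStream(std::string input) : in_(std::move(input)) {}
  using Print::write;  // keep the buffer overload visible
  std::size_t write(std::uint8_t b) override {
    out_.push_back(static_cast<char>(b));
    return 1;
  }
  int available() override { return static_cast<int>(in_.size() - pos_); }
  int read() override {
    return pos_ < in_.size() ? static_cast<unsigned char>(in_[pos_++]) : -1;
  }
  bool connected() override { return true; }
  const std::string& output() const { return out_; }

 private:
  std::string in_;
  std::size_t pos_ = 0;
  std::string out_;
};

// Written once against the interface; transport-agnostic.
void sendNotFound(IStreamReadWrite& s) {
  s.print("HTTP/1.1 404 Not Found\r\n");
  s.print("Content-Length: 0\r\n");
  s.print("Connection: close\r\n\r\n");
}
```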

Validated on wiznet @ 192.168.1.143: 5 curl requests TTFB 18-110ms (same
as pre-refactor), protobuf round-trip PUT 200 + GET ?all=true 3499 bytes.
Flash impact: +200 bytes (63.2%); RAM unchanged (17.6%).

Adds an mbedTLS-based cert generation module that produces a SAN=IP self-signed
ECDSA P-256 server certificate, persisted under LittleFS so subsequent
boots reuse the same key. Generation runs once on a dedicated OSThread
so the ECDSA keygen path (~430 ms) never blocks the Periodic stack or
the Ethernet reconnect loop.

  - mbedTLS 3.6.2 sources compiled in via scripts/add_mbedtls_sources.py
    (BuildSources of pico-sdk/lib/mbedtls/library/*.c, all 108 files).
    MBEDTLS_USER_CONFIG_FILE is injected as a CPPDEFINES tuple in the script,
    because build_flags shell escaping mangles the embedded quotes on Windows.
  - src/mbedtls_user_config.h undefs MBEDTLS_HAVE_TIME, HAVE_TIME_DATE,
    TIMING_C, NET_C, FS_IO, PSA_ITS_FILE_C, PSA_CRYPTO_STORAGE_C, and
    defines MBEDTLS_NO_PLATFORM_ENTROPY (entropy_poll.c uses a raw
    platform check, not gated by a flag).
  - src/mesh/eth/ethCert.{h,cpp}: ECDSA P-256 keypair + cert with
    SAN(IP=current) using pico-sdk get_rand_64() directly as f_rng,
    bypassing mbedtls_entropy. DER buffers heap-allocated to keep the
    OSThread stack within budget. Cert + key + ip persisted under
    /prefs/eth_*.der so subsequent boots reuse the same identity.
  - Gated by HAS_ETHERNET_TLS_API. Standalone phase: generation only;
    HTTPS server (TCP/443) wired up in the follow-up commit.
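Using get_rand_64() as f_rng amounts to adapting a 64-bit random source to mbedTLS's RNG callback shape, int (*)(void *p_rng, unsigned char *out, size_t len). A hedged sketch follows; the Rand64Source wrapper and the function name are hypothetical, and on the RP2350 next64 would point at pico-sdk's get_rand_64().

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical context: any 64-bit random source. On the RP2350, next64
// would point at pico-sdk's get_rand_64() from <pico/rand.h>.
struct Rand64Source {
  std::uint64_t (*next64)();
};

// Has the mbedTLS f_rng shape: fills `out` eight bytes at a time from the
// source in p_rng and returns 0 (success).
int rngFromRand64(void* p_rng, unsigned char* out, std::size_t len) {
  const auto* src = static_cast<const Rand64Source*>(p_rng);
  while (len > 0) {
    const std::uint64_t r = src->next64();
    const std::size_t n = len < sizeof(r) ? len : sizeof(r);
    std::memcpy(out, &r, n);  // copy only what is still needed
    out += n;
    len -= n;
  }
  return 0;
}
```

On device, a `Rand64Source src{get_rand_64};` would then be passed as the f_rng / p_rng pair to calls such as mbedtls_ecp_gen_key() or mbedtls_x509write_crt_der().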

Brings up an mbedTLS server on port 443 that uses the ECDSA P-256
self-signed cert produced by ethCert. Reuses the HTTP request /
response handlers from ethApiHandlers via the IStreamReadWrite
interface — there is exactly one code path for routing / CORS / PhoneAPI
integration regardless of whether the transport is plain TCP or TLS.

EthTlsApiServerThread (OSThread):
  - Phase A: poll isEthCertReady() every 500 ms while the cert worker
    runs. Once true, parse cert chain + key, build ssl_config (TLS server,
    stream transport, default preset, VERIFY_NONE since the server does not
    request client certificates), install own cert, run ssl_setup, bind
    tlsServer on 443.
  - Phase B: standard adaptive accept loop (20 / 100 / 500 ms tick),
    identical to the plain-HTTP server.

Per-connection flow:
  1. session_reset on the static ssl context (1 in-flight session — multi-
     session pool is Phase 3 if needed)
  2. set_bio routes mbedtls I/O to two C callbacks (netSend / netRecv)
     that bridge to the live EthernetClient via the void* ctx
  3. handshake loop (sync, blocking, with 10 s recv timeout) — mbedTLS
     errors are logged with mbedtls_strerror
  4. wrap (ssl, client) in MbedTlsStream → handleApiClient(stream)
  5. close_notify + client.stop
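Step 3 follows mbedTLS's usual pattern: call the handshake until it returns 0, treat WANT_READ / WANT_WRITE as "call again", and treat anything else as fatal. The sketch below injects the step function so it builds without mbedTLS; the local constants copy the documented values of MBEDTLS_ERR_SSL_WANT_READ, _WANT_WRITE and _TIMEOUT, and the retry cap is a hypothetical addition.

```cpp
#include <functional>

// Documented mbedTLS values, copied so the sketch builds without mbedTLS.
constexpr int kWantRead = -0x6900;   // MBEDTLS_ERR_SSL_WANT_READ
constexpr int kWantWrite = -0x6880;  // MBEDTLS_ERR_SSL_WANT_WRITE
constexpr int kTimeout = -0x6800;    // MBEDTLS_ERR_SSL_TIMEOUT

// Calls step() (on device: mbedtls_ssl_handshake on the session) until it
// returns 0 (done) or a fatal error. WANT_READ / WANT_WRITE mean "call again";
// maxRetries is a hypothetical safety bound on top of the BIO timeouts.
int driveHandshake(const std::function<int()>& step,
                   const std::function<void()>& beforeRetry, int maxRetries) {
  for (int retries = 0;; ++retries) {
    const int ret = step();
    if (ret == 0) return 0;
    if (ret != kWantRead && ret != kWantWrite) return ret;  // fatal: log it
    if (retries >= maxRetries) return kTimeout;
    beforeRetry();  // e.g. feed the watchdog and yield
  }
}
```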

Stack budget continues the Phase 2.1-bis discipline: every large buffer
(ssl context, cert chain, pk_key, ssl_config) lives in BSS as a static
global. The OSThread stack only holds the per-connection EthernetClient
adapter and small mbedtls return codes.

Validated on wiznet 192.168.1.143:
  - cert load-from-FS path: 13 ms (regen skipped on second boot)
  - cert gen path: 290 ms (first boot only)
  - ssl_setup chain: parse cert, parse key, ssl_config_defaults,
    conf_own_cert, ssl_setup all return 0
  - end-to-end: curl -k https://192.168.1.143/api/v1/fromradio → 200 OK
    with application/x-protobuf, CORS, X-Protobuf-Schema headers, server
    initiates close_notify cleanly

Footprint vs 2.0:
  - flash 70.5% → 87.3% (+220 KB for mbedtls_ssl + x509 server-side code)
  - RAM static 17.6% → 19.8% (+11 KB BSS for ssl_context + cert chain)
  - heap at runtime adds ~32 KB per active session (mbedtls in/out
    record buffers, default MBEDTLS_SSL_*_CONTENT_LEN=16384)

Next: validate from Firefox direct (https://192.168.1.143 → warning
accept → JSON visible), then from client.meshtastic.org hosted to
confirm the mixed-content block is gone.
…er compat

Two browser-compat fixes that surfaced in Firefox validation:

1. Cert v1 only had Basic Constraints + SAN. NSS / Firefox refuse to
   treat a cert as a TLS server cert without an Extended Key Usage
   extension naming id-kp-serverAuth since 2023 — the error surfaces as
   a non-overridable 'Secure Connection Failed' with no 'Accept the
   Risk' path. Add KeyUsage(digitalSignature + keyEncipherment, critical)
   and ExtendedKeyUsage(serverAuth, critical). Bump cert/key/ip file
   paths to '_v2' so live boards regenerate on next start instead of
   loading a v1 cert that the browser silently refuses.

2. pico-sdk mbedtls defines MBEDTLS_SSL_PROTO_TLS1_3 in its default
   config, but the server-side 1.3 plumbing in this vendored build is
   incomplete: Firefox and openssl-3.5's s_client default to 1.3 and
   the handshake dies 4 ms in with MBEDTLS_ERR_ERROR_GENERIC_ERROR
   (-0x0001). curl/SChannel happened to default to 1.2 so it masked the
   issue earlier. Cap min/max_tls_version to TLS 1.2 — clients downgrade
   transparently and we keep the ECDHE-ECDSA + AES-GCM / CHACHA20-POLY1305
   suites that already work end-to-end.

Validated on wiznet 192.168.1.143:
  - cert v2 dumps with the 4 extensions visible (openssl x509 -text):
    BasicConstraints CA:FALSE, KeyUsage(critical) digitalSignature +
    keyEncipherment, ExtendedKeyUsage(critical) serverAuth, SAN IP.
  - openssl s_client (no version flag): downgrades to TLS 1.2, handshake
    completes, verify_return=18 (self-signed, expected).
  - Firefox: warning self-signed -> Advanced -> Accept Risk -> handshake
    OK in 632 ms with CHACHA20-POLY1305, request reaches handleApiClient
    and returns the protobuf body.
…er request

Before this change /fromradio used 'Connection: close' framing (HTTP/1.0
style with no Content-Length), forcing each poll to redo the full TLS
handshake. client.meshtastic.org needs ~80 sequential /fromradio polls
during initial sync (one per packet from the config-replay state
machine: MyInfo, channels, every Config_*, every ModuleConfig_*, every
NodeInfo), so the user-visible load time was ~50 s of pure handshake
overhead (80 requests * ~625 ms ECDSA P-256 each).

Three coordinated changes:

1. handleApiClient() now loops on the same connection until the peer
   closes or parseRequest hits its 3 s idle timeout. requestsServed
   counter keeps the 'bad/timeout request' debug log from firing on
   the natural idle close after a keep-alive sequence.

2. handleFromRadio() buffers all packets into a std::vector before
   writing, so it can emit a real Content-Length and 'Connection:
   keep-alive'. Buffer is dynamic — common 1-packet response only
   allocates ~256 B; ?all=true keeps the 64-packet cap which tops out
   around 16 KB. handleToRadio + sendPreflight switched to keep-alive
   too (they already had real Content-Length). sendError keeps close —
   errors are terminal.

3. ethTlsApiServer netRecv RECV_TIMEOUT_MS dropped from 10 s to 3 s so
   mbedtls_ssl_read can't outlast the handler's idle deadline (a quiet
   browser leaving the socket open would otherwise wedge the OSThread
   for 10 s past the natural close).
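The framing change in item 2 can be sketched as follows: buffer the body, then write a header block carrying the exact Content-Length, so the client knows where the response ends without the server closing the socket. The function name is hypothetical, and the CORS / X-Protobuf-Schema headers the real handler sends are omitted for brevity.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: concatenate the buffered packet bytes, then prefix a
// header block whose Content-Length matches the body exactly.
std::string buildFromRadioResponse(
    const std::vector<std::vector<std::uint8_t>>& packets) {
  std::string body;
  for (const auto& p : packets) body.append(p.begin(), p.end());
  const std::string head =
      "HTTP/1.1 200 OK\r\n"
      "Content-Type: application/x-protobuf\r\n"
      "Content-Length: " + std::to_string(body.size()) + "\r\n"
      "Connection: keep-alive\r\n\r\n";
  return head + body;
}
```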

Measured on wiznet 192.168.1.143 against client.meshtastic.org:
  - one TLS handshake (641 ms, CHACHA20-POLY1305)
  - ~80 requests served sequentially over the same session
  - full config + 40 NodeInfos in ~6-8 s (vs ~50 s before)
  - per-request latency post-handshake: ~10-25 ms
  - curl -kv with two URLs: 'Reusing existing https: connection',
    server returns 'Connection: keep-alive' explicitly.
…rowsers

Phase 3 keep-alive shipped a working Firefox flow, but client.meshtastic.org's
polling loop and Chrome's 'Test Connection' both rebooted the board. There
were four distinct issues: some kept the OSThread inside serveClient() too
long or spin-looping without yielding, and pico-sdk mbedTLS's TLS 1.3 code
path choked on Chrome's modern ClientHello.

1. Watchdog reset during keep-alive idle. Once a client drained the
   replay queue and entered 3 s poll mode, the OSThread sat inside
   netRecv()'s busy-wait waiting for the next request. Two consecutive
   3 s waits plus prior handler time crossed the 8 s RP2350 hardware
   watchdog. Pet the watchdog inside netRecv()'s poll loop (every 2 ms)
   so a quiet client can never starve the watchdog. Same fix in the
   ethApiHandlers per-request yield path.

2. Cap session at 64 requests + yield() between. Defense-in-depth: a
   pathological client can't monopolize serveClient indefinitely; after
   the cap it just re-handshakes (~625 ms), still vastly cheaper than
   the per-request handshake we had before keep-alive.

3. TLS 1.3 code compiled out of mbedtls entirely
   (#undef MBEDTLS_SSL_PROTO_TLS1_3 in mbedtls_user_config). Capping
   max_tls_version=TLS1_2 at runtime is enough for Firefox / openssl
   (they downgrade cleanly), but Chrome's ClientHello carries TLS 1.3
   extensions — post-quantum key shares, Encrypted ClientHello,
   etc. — that the vendored mbedtls 1.3 parser crashes on before the
   downgrade decision happens. Removing the 1.3 sources sidesteps the
   parser; ServerHello just announces TLS 1.2 and Chrome accepts.

4. netSend infinite WANT_WRITE spin. When W5500's TX buffer momentarily
   filled mid-handshake (Chrome draining slower than Firefox during
   ServerKeyExchange), EthernetClient::write() returned 0, our netSend
   returned MBEDTLS_ERR_SSL_WANT_WRITE without delay, mbedtls retried
   immediately, repeating at ~180k iter/sec and starving the board's other
   threads. Log signature: ret=-0x6880
   tight-looping in the handshake iter trace. Rewrite netSend to block
   with delay(2) + watchdog_update() and a 3 s timeout — same shape as
   netRecv. Return MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY on disconnect
   (was incorrectly returning WANT_WRITE).
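A host-testable sketch of the netSend shape described in item 4. It assumes EthernetClient, millis(), delay() and watchdog_update() are injected as hooks (hypothetical, so the loop can run off-target); on device the function would sit behind an mbedTLS send callback, int (*)(void *ctx, const unsigned char *buf, size_t len), with ctx pointing at the hooks. Returning MBEDTLS_ERR_SSL_TIMEOUT on expiry is an assumption, not necessarily what the PR does.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Documented mbedTLS values, copied under local names for a host build.
constexpr int kPeerCloseNotify = -0x7880;  // MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY
constexpr int kSslTimeout = -0x6800;       // MBEDTLS_ERR_SSL_TIMEOUT

// Hypothetical hooks standing in for EthernetClient, millis(), delay() and
// watchdog_update().
struct SendHooks {
  std::function<bool()> connected;
  std::function<std::size_t(const unsigned char*, std::size_t)> write;
  std::function<std::uint32_t()> nowMs;
  std::function<void(std::uint32_t)> sleepMs;
  std::function<void()> feedWatchdog;
};

// Blocks until at least one byte is accepted, the peer disconnects, or
// timeoutMs elapses. It never returns WANT_WRITE, so mbedTLS cannot spin
// on a momentarily full W5500 TX buffer.
int boundedSend(const SendHooks& h, const unsigned char* buf, std::size_t len,
                std::uint32_t timeoutMs = 3000) {
  if (len == 0) return 0;
  const std::uint32_t start = h.nowMs();
  for (;;) {
    if (!h.connected()) return kPeerCloseNotify;
    const std::size_t n = h.write(buf, len);
    if (n > 0) return static_cast<int>(n);  // partial writes are fine
    if (h.nowMs() - start >= timeoutMs) return kSslTimeout;
    h.feedWatchdog();  // keep the 8 s hardware watchdog from firing
    h.sleepMs(2);      // give the W5500 time to drain its TX buffer
  }
}
```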

Also added granular per-iteration handshake logging, gated to the first 20
iterations plus every 50th after that, so any future regression can be
localized from the COM9 serial log without a J-Link RTT probe.
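The logging gate amounts to a one-line predicate. Here is a sketch with a hypothetical name, counting iterations from 0.

```cpp
// Hypothetical gate: log iterations 0-19, then every 50th (50, 100, ...).
bool shouldLogHandshakeIter(unsigned iter) {
  return iter < 20 || iter % 50 == 0;
}
```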

Validated on wiznet 192.168.1.143:
  - Firefox: client.meshtastic.org full sync + idle poll stable (no
    reset during the previously-crashing 'replay drain complete' phase)
  - Chrome: 'Test Connection' accepts the cert prompt and connects
  - Edge: same as Chrome
  - openssl s_client default + tls1_2 forced: both negotiate TLS 1.2
    with ECDHE-ECDSA + AES-GCM / CHACHA20-POLY1305, verify=18 (self-
    signed, expected)

The cert pipeline + TLS context init had step-by-step LOG_INFOs and
ubiquitous Serial.flush() that were essential while diagnosing the
Phase 2.1-bis stack overflow, the Chrome handshake crash, and the
keep-alive watchdog reset. Once those bugs were fixed the logs just
clutter COM9 on every boot.

Kept on hand:
  - cert: 'loaded from FS', 'generating ...', 'generated N B in T ms',
    'persisted to LittleFS', plus all LOG_ERROR / LOG_WARN paths
  - tls: 'server listening on TCP port 443', 'client connected from',
    'handshake OK in N ms ciphersuite=', 'handshake failed -0xXXXX (...)',
    plus all init LOG_ERRORs

Dropped:
  - cert: 'step 1/8 pk_setup' through 'step 8/8 copy key DER', 'thread
    woke', 'pipeline OK' (now silent on success)
  - tls: 'cert is ready, initializing', 'parsing cert chain', 'parsing
    key', 'ssl_config_defaults', 'conf_own_cert', 'ssl_setup',
    'server worker scheduled', and the per-iter handshake trace
  - obsolete 'Optional mbedtls debug bridge' commented-out stub
  - all Serial.flush() that were added defensively for the
    debug-the-crash phase

The binary shrinks by ~40 KB (logs + format strings). Validated on wiznet
192.168.1.143: HTTP 200 round-trip works post-flash, no regression.
github-actions (bot) added the hardware-support label (Hardware related: new devices or modules, problems specific to hardware) on May 28, 2026.