Prerequisites
Description
A single malformed or wrong-encoding setup request permanently wedges the agent's setup socket. The setup path is served on a ZMQ REQ/REP socket; when an incoming setupRequest fails to decode (e.g. a JSON-speaking dApp connecting to an ASN.1-configured agent, or any garbage bytes), the handler logs the decode failure and returns without sending a reply. The REP state machine is then stuck in its "must send before next recv" state, so the agent can never receive another setup request: every subsequent dApp setup times out until the whole agent process (in our deployment, the gNB) is restarted. One misconfigured client takes down dApp onboarding for the entire agent.
Steps to reproduce
- Build main (Release + tests + examples, all defaults):
./build_libe3 -c -d build -j $(nproc) -r -t
- Start the bundled example agent with its defaults (ASN.1 encoding, setup socket on tcp://*:9990):
./build/simple_agent
- From a second shell, send one undecodable setup request (raw garbage; a well-formed JSON setupRequest against the ASN.1-configured agent reproduces it identically) and observe no reply:
python3 - <<'EOF'
import zmq

ctx = zmq.Context()
s = ctx.socket(zmq.REQ)
s.setsockopt(zmq.RCVTIMEO, 3000)  # give up on the reply after 3 s
s.setsockopt(zmq.LINGER, 0)       # do not block on exit with an unsent message
s.connect("tcp://127.0.0.1:9990")
s.send(b"not-a-valid-e3-setup")   # 20 bytes of raw garbage
try:
    print("reply:", s.recv())
except zmq.error.Again:
    print("no reply within 3 s")
EOF
- Now attempt a valid setup, e.g. run the bundled example dApp:
./build/simple_dapp
→ it times out waiting for the setup response. Any further setup attempt from any client does the same.
- Restart simple_agent and run ./build/simple_dapp again → setup succeeds immediately, confirming that the timeout in the previous step came from wedged agent-side state, not networking.
Expected behavior
A setup request that fails to decode should affect only that request, never the agent. The agent should send back a best-effort negative/empty reply (the requester sees its setup rejected or times out once), log the decode failure, and keep the setup socket serviceable, so the next valid setupRequest from any dApp succeeds without restarting the agent. A single misbehaving or misconfigured client must not be able to disable dApp onboarding process-wide.
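One way to guarantee the behavior described above is to compute the reply in a function that returns a payload for every input, and to send that payload from a single place. The sketch below illustrates the shape only. It is not the libe3 implementation: the JSON framing, the "type" field, and the names reply_for, serve_once, REJECTED and ACCEPTED are all hypothetical stand-ins.

```python
import json
import logging

log = logging.getLogger("setup")

# Hypothetical reply payloads; the real libe3 wire format is not shown here.
REJECTED = b'{"setupResponse": "rejected"}'
ACCEPTED = b'{"setupResponse": "accepted"}'


def reply_for(raw: bytes) -> bytes:
    """Map request bytes to a reply, rejecting anything undecodable."""
    try:
        pdu = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        log.error("Failed to decode setup request: %s", exc)
        return REJECTED
    if not isinstance(pdu, dict) or pdu.get("type") != "setupRequest":
        log.error("Unexpected PDU type in setup")
        return REJECTED
    return ACCEPTED


def serve_once(recv, send) -> None:
    """Receive one request and always answer it, whatever it contained."""
    raw = recv()
    send(reply_for(raw))  # single send site: no branch can skip the reply
```

Because the send happens in exactly one place, adding a new rejection branch later cannot reintroduce a path that leaves the requester unanswered.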
Actual behavior
There is no crash, sanitizer report, or stack trace; the failure mode is silence. On the malformed request the agent logs a single error and nothing else:
[E3Interface] ERROR: Failed to decode setup request; ret=20
After that, every setup attempt from any client times out with no agent-side output at all (the probe in the reproduction prints no reply within 3 s; simple_dapp hangs waiting for its setup response). The agent process stays alive and looks healthy.
Root cause is visible in src/core/e3_interface.cpp (main @ 6295811, setup loop around lines 318–335). All three early-exit branches execute continue; without sending any reply on the REP socket: decode failure ("Failed to decode setup request"), wrong PDU type ("Unexpected PDU type in setup"), and variant extraction failure ("Failed to get SetupRequest from PDU"). The ZMQ REP state machine requires a send before the next receive, so every subsequent recv on the setup socket fails. Those failures are swallowed by the loop's ret <= 0 → continue path, which is why nothing further is logged. All three branches reproduce the same wedge.
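The mechanism above can be illustrated with a toy model of the REP-side alternation rule. This is a pure-Python stand-in, not libzmq and not the libe3 loop: ToyRepSocket, StateError and run_setup_loop are hypothetical names, and StateError plays the role of the error libzmq reports when recv is called while a reply is still owed.

```python
class StateError(Exception):
    """Stands in for the state-machine error a real REP socket reports."""


class ToyRepSocket:
    """Enforces strict recv/send alternation, like a REP socket."""

    def __init__(self, inbox):
        self._inbox = list(inbox)
        self._must_send = False
        self.sent = []

    def recv(self):
        if self._must_send:
            raise StateError("recv called while a reply is still owed")
        if not self._inbox:
            return None  # nothing left to deliver
        self._must_send = True
        return self._inbox.pop(0)

    def send(self, reply):
        if not self._must_send:
            raise StateError("send called with no request pending")
        self._must_send = False
        self.sent.append(reply)


def run_setup_loop(sock, reply_on_error, max_iterations=10):
    """Drive a setup loop; return how many valid requests were served."""
    served = 0
    for _ in range(max_iterations):
        try:
            msg = sock.recv()
        except StateError:
            continue  # mirrors the silent ret <= 0 -> continue path
        if msg is None:
            break
        if msg != b"valid":
            if reply_on_error:
                sock.send(b"rejected")  # fixed behavior: reply, then move on
            continue  # unfixed behavior: skip the reply entirely
        sock.send(b"accepted")
        served += 1
    return served
```

With an inbox of one garbage request followed by two valid ones, run_setup_loop(sock, reply_on_error=False) serves none of them, because every recv after the first raises and is silently skipped. With reply_on_error=True both valid requests are served.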
Deterministic in-tree evidence: the regression test added on the fix branch (test_setup_bad_request) times out on unfixed main and passes with the fix applied.
Build type
Release (-r)
Exact build command
./build_libe3 -c -d build -j $(nproc) -r -t
CMake feature flags (mark non-defaults)
How is libe3 being used?
Standalone — examples/simple_agent or similar
libe3 version
0.0.5 (latest main, commit 6295811)
Operating system + architecture
Ubuntu 24.04 aarch64 (container on NVIDIA GH200) — also reproduced on macOS 15 arm64; the wedge is platform-independent ZMQ REQ/REP state-machine behavior.
Compiler and version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0 — primary environment; also reproduced with Apple clang version 21.0.0 (clang-2100.1.1.101) on macOS.
Test output (if relevant)
No test on current main exercises this path — that's part of the gap this report covers. Below is the regression test from the fix branch (test_setup_bad_request) run against unfixed main sources (Release, dual-encoder config, Ubuntu 24.04 aarch64):
1/1 Test #16: test_setup_bad_request ...........***Failed 5.61 sec
[FAIL] SetupChannel_garbageRequest_repliesAndChannelSurvives
Exception: tests/test_setup_bad_request.cpp:120: Assertion failed: n >= 0 (got -1 vs 0)
========================================
Results: 0 passed, 1 failed out of 1
========================================
0% tests passed, 1 tests failed out of 1
The following tests FAILED:
16 - test_setup_bad_request (Failed)
The failing assertion is the receive after the garbage request: n = -1 means no reply ever arrives and the channel is dead from that point on. The same test passes with the fix applied (verified 18/18 on the full suite, both Release dual-encoder and JSON-only configurations).
Logs / sanitizer output
Agent-side stderr around the event. Two setup requests were sent (one garbage, then one valid); note that only the first request is logged. The second produces nothing because the wedged REP socket never receives it:
[2026-06-11 23:58:12.480] [INFO ] [E3Interface] Setup request received: 20 bytes
[2026-06-11 23:58:12.480] [ERROR] [E3Interface] Failed to decode setup request; ret=20
(No further output, indefinitely. The same applies via the sibling branches "Unexpected PDU type in setup" and "Failed to get SetupRequest from PDU" — all three continue without replying.)
Client-side, the only observable is a receive timeout (from the regression test, assertion on the post-garbage receive):
[FAIL] SetupChannel_garbageRequest_repliesAndChannelSurvives
Exception: tests/test_setup_bad_request.cpp:120: Assertion failed: n >= 0 (got -1 vs 0)
ASan/TSan/UBSan: not applicable and nothing reported — this is a deterministic protocol-state defect (ZMQ REQ/REP state machine left in send-state), not a memory or threading error. GDB backtrace: n/a, the process never crashes; attaching shows the setup thread parked in zmq_recv on the REP socket, which can no longer deliver (every receive fails until a send occurs).
Additional context
No response