Decouple dApp-report handling from the inbound thread via a reusable LockFreeQueue #13

Merged: Thecave3 merged 2 commits into main from fix-report-path on Jun 12, 2026

@Ninjabippo1205 Ninjabippo1205 commented Apr 29, 2026

Contributor

Moves handle_dapp_report() off the RAN inbound (ZMQ-recv) thread so downstream OAI/iApp work can no longer stall inbound reads. Reports are handed to a dedicated worker thread through a bounded lock-free queue.

What changed

  • Report path decoupling: the RAN inbound loop pushes each DAppReport onto a report_queue_; a dedicated report_worker_thread_ drains it and invokes handle_dapp_report(). The worker is started only on the RAN role (the side that receives dApp reports).
  • Reusable queue wrapper: the Pdu-specific ResponseQueue is generalized into a header-only LockFreeQueue<T> (adaptive 3-phase spin-wait blocking pop + shutdown() semantics over the existing MpmcQueue<T>). Both paths now share it: LockFreeQueue<Pdu> (outbound) and LockFreeQueue<DAppReport> (inbound reports). No back-compat alias — renamed throughout.
  • The report worker reuses the queue's blocking pop(timeout) + shutdown(), mirroring the outbound loop.
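
The hand-off described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `SpscRing`, `QueueSketch`, `Report`, and `run_report_path_demo` are hypothetical names, a single-producer/single-consumer ring stands in for `MpmcQueue<T>` (whose API is not shown in this PR), and the spin/yield/sleep thresholds are illustrative guesses at what an adaptive 3-phase wait might look like.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's MpmcQueue<T>: a bounded ring that is
// only safe for ONE producer thread and ONE consumer thread.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) { assert(capacity > 0); }
    bool try_push(T v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[t] = std::move(v);
        tail_.store(next, std::memory_order_release);
        return true;
    }
    std::optional<T> try_pop() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = std::move(buf_[h]);
        head_.store((h + 1) % buf_.size(), std::memory_order_release);
        return v;
    }
private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0}, tail_{0};
};

// Sketch of a wrapper adding a timed pop (spin, then yield, then sleep) and shutdown().
// Phase thresholds (64 / 256 attempts, 50 us sleep) are illustrative guesses.
template <typename T>
class QueueSketch {
public:
    explicit QueueSketch(std::size_t capacity) : ring_(capacity) {}
    bool try_push(T v) { return !is_shutdown() && ring_.try_push(std::move(v)); }
    std::optional<T> try_pop() { return ring_.try_pop(); }
    std::optional<T> pop(std::chrono::milliseconds timeout) {
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        for (unsigned attempt = 0;; ++attempt) {
            if (auto v = ring_.try_pop()) return v;
            if (is_shutdown()) return ring_.try_pop();  // one last look, then give up
            if (std::chrono::steady_clock::now() >= deadline) return std::nullopt;
            if (attempt < 64) continue;                                  // phase 1: busy spin
            if (attempt < 256) { std::this_thread::yield(); continue; }  // phase 2: yield
            std::this_thread::sleep_for(std::chrono::microseconds(50));  // phase 3: sleep
        }
    }
    void shutdown() { shutdown_.store(true, std::memory_order_release); }
    bool is_shutdown() const { return shutdown_.load(std::memory_order_acquire); }
private:
    SpscRing<T> ring_;
    std::atomic<bool> shutdown_{false};
};

struct Report { int seq = 0; };  // placeholder for DAppReport
struct DemoResult { int handled = 0; int dropped = 0; bool in_order = true; };

// The calling thread plays the inbound loop; a worker thread plays the report handler.
DemoResult run_report_path_demo(int total, std::size_t capacity) {
    QueueSketch<Report> queue(capacity);
    DemoResult result;
    std::thread worker([&queue, &result] {
        int last = -1;
        auto handle = [&](const Report& r) {
            result.in_order = result.in_order && r.seq > last;
            last = r.seq;
            ++result.handled;
        };
        for (;;) {
            if (auto r = queue.pop(std::chrono::milliseconds(10))) { handle(*r); continue; }
            if (!queue.is_shutdown()) continue;           // plain timeout: keep waiting
            while (auto r = queue.try_pop()) handle(*r);  // drain what arrived before shutdown
            break;
        }
    });
    for (int i = 0; i < total; ++i) {
        if (!queue.try_push(Report{i})) ++result.dropped;  // never block the inbound side
    }
    queue.shutdown();
    worker.join();
    return result;
}
```

Calling `run_report_path_demo(100, 8)` pushes 100 reports through an 8-slot queue. Every report is either handled by the worker or counted as dropped at the push site, so a slow handler costs reports rather than stalling the producer. Whether the production inbound loop drops or retries on a full queue is not stated here; dropping is an assumption of this sketch.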

Tests

  • test_response_queue.cpp -> test_lockfree_queue.cpp (now LockFreeQueue<Pdu>, plus a DAppReport specialisation case).
  • test_report_drop.cpp: unit tests for the report queue (FIFO, explicit overflow, burst-producer/slow-consumer, multi-producer no-loss/no-dup) against the production LockFreeQueue<DAppReport> wrapper.
  • test_e2e_report_path.cpp: end-to-end integration test driving a real E3Agent + fake dApp ZMQ peer through the full inbound pipeline to the registered handler.
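
For readers unfamiliar with the "explicit overflow" and "multi-producer no-loss/no-dup" cases, the checks could look roughly like the sketch below. It does not reproduce `test_report_drop.cpp`: `BoundedQueueStandIn` is a hypothetical mutex-guarded queue (deliberately not lock-free) used only so the example is self-contained, and `multi_producer_no_loss_no_dup` is an illustrative helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mutex-guarded stand-in; NOT lock-free and not the project's LockFreeQueue<T>.
template <typename T>
class BoundedQueueStandIn {
public:
    explicit BoundedQueueStandIn(std::size_t cap) : cap_(cap) {}
    bool try_push(T v) {
        std::lock_guard<std::mutex> g(m_);
        if (q_.size() >= cap_) return false;  // explicit overflow: caller sees the failure
        q_.push_back(std::move(v));
        return true;
    }
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> g(m_);
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop_front();
        return v;
    }
private:
    std::size_t cap_;
    std::mutex m_;
    std::deque<T> q_;
};

// Returns true when every item pushed by any producer is popped exactly once.
bool multi_producer_no_loss_no_dup(int producers, int per_producer, std::size_t cap) {
    BoundedQueueStandIn<int> q(cap);
    std::vector<int> seen(static_cast<std::size_t>(producers) * per_producer, 0);
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&q, p, per_producer] {
            for (int i = 0; i < per_producer; ++i) {
                const int id = p * per_producer + i;  // unique id per item
                while (!q.try_push(id)) std::this_thread::yield();  // retry so nothing is lost
            }
        });
    }
    const std::size_t total = seen.size();
    for (std::size_t got = 0; got < total;) {  // single consumer on the calling thread
        if (auto v = q.try_pop()) {
            ++seen[static_cast<std::size_t>(*v)];
            ++got;
        } else {
            std::this_thread::yield();
        }
    }
    for (auto& t : threads) t.join();
    for (int count : seen) {
        if (count != 1) return false;  // 0 means lost, >1 means duplicated
    }
    return true;
}
```

The no-loss check makes producers retry on a full queue, because it asserts on delivery rather than on drop behaviour. The overflow case is the opposite: fill the queue to capacity and assert that the next `try_push` returns `false`.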

Note

The original ZMQ_CONFLATE removal from this branch is dropped — it already landed on main independently (commit 398a17e, "fix ZMQ multi-dApp fan-out"). This PR is now strictly the worker-thread decoupling + queue refactor.

github-actions Bot commented Apr 29, 2026

Contributor

📊 MPMC Queue Benchmark Results (commit 475d655)

MpmcQueue End-to-End Latency (SPSC, 30000 items per queue size)

| Queue Capacity | P50 (ns) | P95 (ns) | P99 (ns) | P99.9 (ns) |
|---|---|---|---|---|
| 16 | 684 | 995 | 1075 | 8218 |
| 64 | 2768 | 4051 | 4441 | 26502 |
| 256 | 2307 | 8458 | 26071 | 39085 |
| 1024 | 35419 | 39276 | 43734 | 44035 |
| 4096 | 27223 | 46740 | 47552 | 47732 |

LockFreeQueue End-to-End Latency (SPSC, 30000 items per queue size)

| Queue Capacity | P50 (ns) | P95 (ns) | P99 (ns) |
|---|---|---|---|
| 16 | 1035 | 2207 | 2388 |
| 64 | 3920 | 7076 | 7276 |
| 256 | 18507 | 28957 | 33155 |
| 1024 | 113176 | 144455 | 159292 |
| 4096 | 486782 | 510627 | 512480 |

Throughput (400000 items per configuration)

| Configuration | Queue Cap | Throughput (Mops/s) |
|---|---|---|
| SPSC (1P×1C) | 256 | 15.23 |
| MPSC (4P×1C) | 1024 | 11.07 |
| MPMC (4P×4C) | 4096 | 13.25 |
| MPMC (8P×4C) | 4096 | 10.62 |

Stress / Correctness (100000 items each)

| Test | Items | Result |
|---|---|---|
| SPSC (1P×1C) | 100000 | ✅ PASS |
| MPSC (4P×1C) | 100000 | ✅ PASS |
| MPMC (4P×4C) | 100000 | ✅ PASS |
| MPMC (8P×8C) | 100000 | ✅ PASS |

Benchmarked on ubuntu-latest, Release build

@Ninjabippo1205 Ninjabippo1205 requested a review from Thecave3 April 29, 2026 17:45
@Thecave3

Collaborator

Not sure which PR is fixing this now. The comments on the other PRs apply here as well.

@Thecave3 Thecave3 left a comment

Collaborator

Some things are not valid anymore. In any case, don't use the MPMC queue directly; use the same logic as the response queue.

@Thecave3 Thecave3 marked this pull request as draft June 4, 2026 22:51
@Ninjabippo1205 Ninjabippo1205 deleted the fix-report-path branch June 11, 2026 19:01
@Thecave3 Thecave3 restored the fix-report-path branch June 11, 2026 21:03
@Thecave3 Thecave3 reopened this Jun 11, 2026

github-actions Bot commented Jun 12, 2026

Contributor

⏱️ Full-loop Latency Benchmark (commit 475d655)

Full-loop latency benchmark (N=1023 after 50 warmup)

All values in microseconds (us). Transport: ZMQ over IPC, encoding: ASN.1 APER.

| Phase | mean | p50 | p99 | max |
|---|---|---|---|---|
| 1. Collect indication data | 0 | 0 | 0 | 10 |
| 2. Create & encode indication | 0 | 0 | 1 | 16 |
| 3. Deliver indication (RAN -> dApp) | 97 | 95 | 161 | 178 |
| 4. Decode indication | 0 | 0 | 0 | 11 |
| 5. Process data | 0 | 0 | 0 | 0 |
| 6. Create & encode control | 0 | 0 | 0 | 10 |
| 7. Deliver control (dApp -> RAN) | 108 | 105 | 179 | 228 |
| 8. Decode & handle control | 0 | 0 | 0 | 15 |
| Total round-trip | 209 | 195 | 327 | 358 |

Benchmarked on ubuntu-latest, Release build, ZMQ + IPC, ASN.1 APER.

github-actions Bot commented Jun 12, 2026

Contributor

🔄 E2E dApp Integration Results (commit 475d655)

posix/ipc

  • dApp exit: 0
  • Indications received: 7

posix/tcp

  • dApp exit: 0
  • Indications received: 7

zmq/ipc

  • dApp exit: 0
  • Indications received: 7

zmq/tcp

  • dApp exit: 0
  • Indications received: 7

example_simple_agent + example_simple_dapp on ubuntu-latest, Release build

github-actions Bot commented Jun 12, 2026

Contributor

🔀 E2E Topologies — multi-dApp / multi-RAN (commit 475d655)

zmq/ipc

  • 1 RAN - 1 dApp: indications=5
    • dapp peer=t11 ran=ran-solo sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0.4 max=1 @seq=0) hist[<=1:5 2-5:0 6-10:0 >10:0]
  • 1 RAN - 2 dApps: dApp#1 ind=5 sub=1, dApp#2 ind=6 sub=2, RAN saw 2 dApps
    • dapp1 peer=t12 ran=ran-shared sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0.6 max=1 @seq=2) hist[<=1:5 2-5:0 6-10:0 >10:0]
    • dapp2 peer=t12 ran=ran-shared sub=2 indications=6 seq=[0..5] dropped=0 (0%) age_ms(avg=0.5 max=1 @seq=3) hist[<=1:6 2-5:0 6-10:0 >10:0]
  • 2 RANs - 1 dApp: from ran-a ind=5, from ran-b ind=5
    • dapp peer=t2a ran=ran-a sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0.6 max=1 @seq=2) hist[<=1:5 2-5:0 6-10:0 >10:0]
    • dapp peer=t2b ran=ran-b sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0.2 max=1 @seq=4) hist[<=1:5 2-5:0 6-10:0 >10:0]

zmq/tcp

  • 1 RAN - 1 dApp: indications=5
    • dapp peer=default ran=ran-solo sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0.4 max=1 @seq=3) hist[<=1:5 2-5:0 6-10:0 >10:0]
  • 1 RAN - 2 dApps: dApp#1 ind=5 sub=1, dApp#2 ind=6 sub=2, RAN saw 2 dApps
    • dapp1 peer=default ran=ran-shared sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0 max=0 @seq=0) hist[<=1:5 2-5:0 6-10:0 >10:0]
    • dapp2 peer=default ran=ran-shared sub=2 indications=6 seq=[0..5] dropped=0 (0%) age_ms(avg=0 max=0 @seq=0) hist[<=1:6 2-5:0 6-10:0 >10:0]
  • 2 RANs - 1 dApp: from ran-a ind=5, from ran-b ind=5
    • dapp peer=default ran=ran-a sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0.2 max=1 @seq=0) hist[<=1:5 2-5:0 6-10:0 >10:0]
    • dapp peer=off100 ran=ran-b sub=1 indications=5 seq=[0..4] dropped=0 (0%) age_ms(avg=0 max=0 @seq=0) hist[<=1:5 2-5:0 6-10:0 >10:0]

example_simple_agent + example_simple_dapp on ubuntu-latest, Release. Indication age is report-only.

@Thecave3 Thecave3 changed the title from "Modified report handling and added tests" to "Decouple dApp-report handling from the inbound thread via a reusable LockFreeQueue" Jun 12, 2026

Two races on the shared /tmp/dapps IPC directory caused
test_e2e_report_path to fail intermittently in CI (agent.start()
returning CONNECTION_FAILED) under `ctest --parallel`, while passing
in serial local runs:

1. The ZMQ connector treated any mkdir(/tmp/dapps) failure as fatal.
   Concurrent agents race between stat() and mkdir(); the loser gets
   EEXIST. Now ignore EEXIST, matching the POSIX connector.

2. Both connectors called rmdir(IPC_BASE_DIR) on dispose. Since the
   directory is shared, a finishing agent removed it out from under
   another agent that had already stat()'d it and was about to bind a
   socket inside it, making that bind fail with ENOENT. Drop the rmdir;
   leaving the empty shared directory behind is harmless.
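
The first fix can be illustrated with a small helper. This is a sketch under assumptions, not the connector's actual code: `ensure_shared_dir` is a hypothetical name, and the 0755 mode is an arbitrary choice for the example. It relies only on POSIX `mkdir()` reporting `EEXIST` when the path already exists.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper: create `path` if needed and tolerate losing the creation race.
bool ensure_shared_dir(const std::string& path) {
    if (::mkdir(path.c_str(), 0755) == 0) return true;  // this caller created it
    if (errno != EEXIST) return false;                   // genuine failure (EACCES, ENOENT, ...)
    // Something already exists at `path` (possibly created by a concurrent agent).
    // Accept it only if it is a directory.
    struct stat st {};
    return ::stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode);
}
```

Calling `mkdir()` unconditionally and inspecting `errno` avoids the window between a separate `stat()` and `mkdir()`, so concurrent callers all succeed regardless of which one actually creates the directory.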
@Thecave3 Thecave3 marked this pull request as ready for review June 12, 2026 00:39
@Thecave3 Thecave3 merged commit 6295811 into main Jun 12, 2026
15 checks passed
@Thecave3 Thecave3 deleted the fix-report-path branch June 12, 2026 00:39