#112 - Add OpenTelemetry Metrics#111
| Filename | Overview |
|---|---|
| src/metrics.h | New header providing the daqiri::metrics API. Uses #if DAQIRI_ENABLE_OTEL_METRICS to switch between real declarations and inline no-op stubs; heavy implementation lives entirely in metrics.cpp which is only compiled when the CMake option is ON. |
| src/metrics.cpp | New implementation of the OTel observable-counter registry. The previous deadlock concern (RemoveCallback called while holding mutex_) is resolved — shutdown now moves instruments to locals, clears state, releases the lock, then calls RemoveCallback outside the critical section. |
| src/managers/dpdk/daqiri_dpdk_stats.cpp | Refactors xstat name parsing into a reusable parse_queue_xstat lambda and populates per-queue and per-port OTel counters from rte_eth_stats and queue-level xstats. |
| src/managers/rdma/daqiri_rdma_mgr.cpp | Adds per-thread CounterSet handle captured at thread start; increments rx/tx on work completions and add_dropped on all error paths. |
| src/managers/socket/daqiri_socket_mgr.cpp | Consolidates the pre-existing per-protocol tx_pkts_/tx_bytes_ update into a single post-send block and adds metrics::add_tx/add_dropped. No regressions to the free contract. |
| examples/grafana/otel_prometheus.cpp | Wires the OTel Prometheus exporter from an environment variable. The entire implementation body is wrapped in #if defined(DAQIRI_GRAFANA_PROMETHEUS), which is redundant since CMakeLists.txt already handles file exclusion. |
| CMakeLists.txt | Adds top-level option(DAQIRI_ENABLE_OTEL_METRICS ...) and propagates it to the pkg-config output. The same option() is duplicated in src/CMakeLists.txt. |
| src/CMakeLists.txt | Conditionally compiles metrics.cpp and links opentelemetry-cpp::api when DAQIRI_ENABLE_OTEL_METRICS is ON. Duplicates the same option() already declared in the root CMakeLists.txt. |
| examples/CMakeLists.txt | Correctly gates otel_prometheus.cpp on the Prometheus exporter target being available; falls back gracefully with a status message when the exporter is absent. |
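The shutdown ordering described for src/metrics.cpp (move instruments to locals, clear state, release the lock, then remove callbacks) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchRegistry`, `FakeInstrument`, and `remove_callback` are hypothetical stand-ins, and no OpenTelemetry SDK types are used.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an observable instrument; not an OTel SDK type.
struct FakeInstrument {
  // Called on teardown. It may re-enter the registry, which is what makes
  // calling it under the registry lock dangerous.
  std::function<void()> remove_callback;
};

class SketchRegistry {
 public:
  void add_instrument(FakeInstrument inst) {
    std::lock_guard<std::mutex> lock(mutex_);
    instruments_.push_back(std::move(inst));
  }

  std::size_t instrument_count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return instruments_.size();
  }

  void shutdown() {
    std::vector<FakeInstrument> local;
    {
      // Critical section: only move state out and clear it.
      std::lock_guard<std::mutex> lock(mutex_);
      local = std::move(instruments_);
      instruments_.clear();
    }
    // The lock is released here, so a callback that takes mutex_ again
    // (directly or via another thread) cannot deadlock against shutdown().
    for (auto& inst : local) {
      if (inst.remove_callback) inst.remove_callback();
    }
  }

 private:
  std::mutex mutex_;
  std::vector<FakeInstrument> instruments_;
};
```

If `remove_callback` were invoked inside the locked scope, a callback that calls `instrument_count()` would try to lock a non-recursive `std::mutex` already held by the same thread, which is undefined behavior and typically hangs.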
Sequence Diagram
sequenceDiagram
participant App as Application
participant Common as daqiri::shutdown()
participant DPDK as DpdkStats::Run()
participant RDMA as rdma_thread()
participant Socket as SocketMgr TX/RX
participant Reg as metrics::Registry
participant CS as CounterSet (shared_ptr)
participant OTel as OTel SDK (collection thread)
participant Prom as Prometheus / Grafana
App->>Common: daqiri_init()
Common->>Reg: get_or_create_queue(backend, iface, port, queue)
Reg-->>Common: "shared_ptr<CounterSet>"
DPDK->>CS: set_rx_packets / set_tx_packets / set_dropped
RDMA->>CS: add_rx / add_tx / add_dropped
Socket->>CS: add_rx / add_tx / add_dropped
OTel->>Reg: observe_rx_packets callback
Reg->>Reg: snapshot_counters() [acquires mutex briefly]
Reg->>CS: rx_packets.load()
Reg-->>OTel: Observe(value, attrs)
Prom->>OTel: HTTP GET /metrics
OTel-->>Prom: "daqiri_rx_packets_total{...}"
App->>Common: daqiri::shutdown()
Common->>Reg: shutdown() [moves instruments, clears state, releases lock]
Reg->>OTel: RemoveCallback() [outside lock — no deadlock]
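The data path in the diagram (hot paths updating a shared `CounterSet`, the collection thread taking a brief snapshot and reading atomics) might look something like the sketch below. The method names `get_or_create_queue`, `snapshot_counters`, `add_rx`, `add_tx`, `add_dropped`, and `set_rx_packets` come from the diagram; their signatures, the `Sketch`-suffixed type names, and the summing `observe_rx_packets` helper are assumptions for illustration. The real callback reports one value per queue with attributes rather than a single total.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <tuple>
#include <vector>

// Assumed shape of a per-queue counter set: lock-free atomics only.
struct CounterSetSketch {
  std::atomic<std::uint64_t> rx_packets{0};
  std::atomic<std::uint64_t> tx_packets{0};
  std::atomic<std::uint64_t> dropped{0};

  // Incremental updates, as used by the RDMA and socket paths.
  void add_rx(std::uint64_t n) { rx_packets.fetch_add(n, std::memory_order_relaxed); }
  void add_tx(std::uint64_t n) { tx_packets.fetch_add(n, std::memory_order_relaxed); }
  void add_dropped(std::uint64_t n) { dropped.fetch_add(n, std::memory_order_relaxed); }
  // Absolute update, as used when a driver already reports running totals.
  void set_rx_packets(std::uint64_t total) { rx_packets.store(total, std::memory_order_relaxed); }
};

class QueueRegistrySketch {
 public:
  using Key = std::tuple<std::string, std::string, int, int>;

  std::shared_ptr<CounterSetSketch> get_or_create_queue(const std::string& backend,
                                                        const std::string& iface,
                                                        int port, int queue) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& slot = queues_[Key{backend, iface, port, queue}];
    if (!slot) slot = std::make_shared<CounterSetSketch>();
    return slot;  // callers keep this handle and never touch the mutex again
  }

  // Copy the handles under the lock; values are read after it is released.
  std::vector<std::shared_ptr<CounterSetSketch>> snapshot_counters() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<std::shared_ptr<CounterSetSketch>> out;
    out.reserve(queues_.size());
    for (const auto& kv : queues_) out.push_back(kv.second);
    return out;
  }

  // Simplified stand-in for the collection callback: sums rx over all queues.
  std::uint64_t observe_rx_packets() {
    std::uint64_t total = 0;
    for (const auto& cs : snapshot_counters()) {
      total += cs->rx_packets.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  std::mutex mutex_;
  std::map<Key, std::shared_ptr<CounterSetSketch>> queues_;
};
```

Holding the mutex only while copying `shared_ptr` handles keeps the packet-processing threads off the lock entirely: they write through a handle obtained once at startup, and the collector reads the atomics without blocking them.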
Reviews (5): Last reviewed commit: "#111 - Populate DPDK port metrics"
Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
150478a to 5129d27
Signed-off-by: Denis Leshchev <dleshchev@nvidia.com>
While running the Grafana example, we noticed the DPDK per-interface Prometheus series were staying at zero even though the queue-level series were moving. Specifically, the I pushed commit
This PR adds OpenTelemetry-compatible metrics that can be exported to backends such as Prometheus and Grafana. A working example based on raw_tx_rx is provided.