Problem
Grafana Tempo is deployed in the monitoring namespace (v2.6.0, OTLP gRPC + HTTP, 3-day retention, dashboards ready). The `observability` package exists in `service-runtime`. But the core mesh services aren't actually shipping traces:
| Service | OTel Instrumentation | Ships to Tempo? |
| --- | --- | --- |
| Cerebro (not in mesh) | Full — otelhttp, traceparent in CloudEvents, OTLP export | Yes |
| llm-gateway | None — no OTel dep in go.mod | No |
| meter | None | No |
| ensemble-tap | Transitive indirect only | No |
| gate | Has spans, but exports to stdout only | No |
The Loop 3 revenue flow (Tap → Pipeline → Ensemble → LLM Gateway → Meter) — the flow you'd most want to trace end-to-end — is completely invisible in the tracing backend.
Ironically, the only service shipping traces is Cerebro, which operates outside the mesh entirely.
Proposed changes
1. `identityclient` — propagate trace context on introspection calls
Wrap the HTTP client with `otelhttp.NewTransport` so that every identity introspection call carries `traceparent`. This means a request that hits the gateway, introspects with identity, then calls the LLM provider shows up as a single trace.
2. `natsbus` — inject/extract trace context in CloudEvent envelopes
Cerebro already does this (`otel.GetTextMapPropagator().Inject/Extract` on CloudEvent headers). Port the same pattern into `natsbus.Publish` and the consumer side so that traces survive async NATS hops. This connects the Tap → Pipeline → Ensemble event chain into one trace.
3. httpkit — default to otelhttp.NewMiddleware for server spans
The observability package initializes the TracerProvider; httpkit should wire it into the default middleware chain so services get server spans without manual setup.
4. Gate — switch from `stdouttrace` to OTLP exporter
Gate already has OTel spans; it just writes them to stdout instead of shipping them to Tempo. Swap the exporter in the tracer setup — no instrumentation changes needed.
Why this matters
You cannot debug the revenue loop, the agent delegation chain, or the cost attribution pipeline without distributed tracing. The infrastructure exists. The shared package exists. The data isn't flowing. This is the "last mile" connection.
Context
Identified during org-wide architecture review (2026-04-12). Cerebro's implementation (`internal/events/jetstream.go`, `internal/providers/http_client.go`) is the reference pattern.