Wire OTel trace propagation through identityclient, natsbus, and httpkit #33

@haasonsaas

Problem

Grafana Tempo is deployed in the monitoring namespace (v2.6.0, OTLP gRPC + HTTP, 3-day retention, dashboards ready). The observability package exists in service-runtime. But the core mesh services aren't actually shipping traces:

| Service | OTel Instrumentation | Ships to Tempo? |
| --- | --- | --- |
| Cerebro (not in mesh) | Full: otelhttp, traceparent in CloudEvents, OTLP export | Yes |
| llm-gateway | None: no OTel dep in go.mod | No |
| meter | None | No |
| ensemble-tap | Transitive indirect only | No |
| gate | Has spans, but exports to stdout only | No |

The Loop 3 revenue flow (Tap → Pipeline → Ensemble → LLM Gateway → Meter) — the flow you'd most want to trace end-to-end — is completely invisible in the tracing backend.

Ironically, the only service shipping traces is Cerebro, which operates outside the mesh entirely.

Proposed changes

1. identityclient — propagate trace context on introspection calls

Wrap the HTTP client with otelhttp.NewTransport so that every identity introspection call carries traceparent. This means a request that hits the gateway, introspects with identity, then calls the LLM provider shows up as a single trace.
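
To make the mechanism concrete, here is a minimal stdlib-only sketch of what a propagating transport does on the wire. `propagatingTransport` is a simplified stand-in for the RoundTripper that `otelhttp.NewTransport` returns, not the real implementation, and `withTraceparent`, `newIntrospectionClient`, and the `identity.invalid/introspect` URL are hypothetical names used only for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// ctxKey is a private context key for the stand-in trace context.
type ctxKey struct{}

// withTraceparent attaches a W3C traceparent value to ctx. In real code the
// active OTel span in ctx plays this role; this helper is hypothetical.
func withTraceparent(ctx context.Context, tp string) context.Context {
	return context.WithValue(ctx, ctxKey{}, tp)
}

// propagatingTransport is a simplified stand-in for otelhttp.NewTransport:
// it copies trace context from the request context onto outbound headers.
type propagatingTransport struct {
	base http.RoundTripper
}

func (t propagatingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	tp, _ := req.Context().Value(ctxKey{}).(string)
	if tp == "" {
		return t.base.RoundTrip(req)
	}
	// A RoundTripper must not modify the caller's request, so clone it first.
	out := req.Clone(req.Context())
	out.Header.Set("traceparent", tp)
	return t.base.RoundTrip(out)
}

// newIntrospectionClient builds the client identityclient would use. The
// actual change would pass otelhttp.NewTransport(base) as the Transport.
func newIntrospectionClient(base http.RoundTripper) *http.Client {
	if base == nil {
		base = http.DefaultTransport
	}
	return &http.Client{Transport: propagatingTransport{base: base}}
}

// roundTripFunc adapts a function to http.RoundTripper for the demo below.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// capturedTraceparent sends one request through the wrapped client against a
// fake in-memory backend and returns the traceparent header that backend saw.
func capturedTraceparent(tp string) (string, error) {
	var seen string
	backend := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("traceparent")
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       http.NoBody,
			Request:    r,
		}, nil
	})

	ctx := withTraceparent(context.Background(), tp)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, "http://identity.invalid/introspect", nil)
	if err != nil {
		return "", err
	}
	resp, err := newIntrospectionClient(backend).Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return seen, nil
}

func main() {
	got, err := capturedTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Taking the base `http.RoundTripper` as a parameter keeps the wrapping in one place, so swapping the stand-in for `otelhttp.NewTransport(base)` would not touch call sites.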

2. natsbus — inject/extract trace context in CloudEvent envelopes

Cerebro already does this (otel.GetTextMapPropagator().Inject/Extract on CloudEvent headers). Port the same pattern into natsbus.Publish and the consumer side so that traces survive async NATS hops. This connects the Tap → Pipeline → Ensemble event chain into one trace.
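
As a rough sketch of the natsbus side, the piece a propagator needs is a carrier that reads and writes the event's extension attributes. The `Event` type below is a hypothetical, trimmed-down envelope (the real natsbus and Cerebro types are not shown in this issue), and `eventCarrier` simply provides the `Get`/`Set`/`Keys` method set that OTel's `propagation.TextMapCarrier` interface expects. The example stays stdlib-only, so the points where `Inject` and `Extract` would be called are marked in comments rather than executed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Event is a hypothetical, simplified CloudEvent envelope for illustration.
type Event struct {
	Type       string            `json:"type"`
	Extensions map[string]string `json:"extensions,omitempty"`
	Data       json.RawMessage   `json:"data,omitempty"`
}

// eventCarrier exposes Event.Extensions through Get, Set, and Keys, the
// method set of OpenTelemetry's propagation.TextMapCarrier interface.
type eventCarrier struct {
	ev *Event
}

func (c eventCarrier) Get(key string) string { return c.ev.Extensions[key] }

func (c eventCarrier) Set(key, value string) {
	if c.ev.Extensions == nil {
		c.ev.Extensions = make(map[string]string)
	}
	c.ev.Extensions[key] = value
}

func (c eventCarrier) Keys() []string {
	keys := make([]string, 0, len(c.ev.Extensions))
	for k := range c.ev.Extensions {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// roundTrip simulates one async hop: the publisher stamps the event, the
// event is serialized, and the consumer reads the trace context back out.
func roundTrip(traceparent string) (string, error) {
	out := &Event{Type: "example.event", Data: json.RawMessage(`{"ok":true}`)}

	// Publisher side. With OTel this would be:
	//   otel.GetTextMapPropagator().Inject(ctx, eventCarrier{ev: out})
	eventCarrier{ev: out}.Set("traceparent", traceparent)

	wire, err := json.Marshal(out)
	if err != nil {
		return "", err
	}

	var in Event
	if err := json.Unmarshal(wire, &in); err != nil {
		return "", err
	}

	// Consumer side. With OTel this would be:
	//   ctx = otel.GetTextMapPropagator().Extract(ctx, eventCarrier{ev: &in})
	return eventCarrier{ev: &in}.Get("traceparent"), nil
}

func main() {
	got, err := roundTrip("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Because the trace context rides inside the serialized envelope, it survives the hop regardless of how the message is transported.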

3. httpkit — default to otelhttp.NewMiddleware for server spans

The observability package initializes the TracerProvider; httpkit should wire it into the default middleware chain so services get server spans without manual setup.
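
One possible shape for that default chain, sketched with the standard library only. `Middleware`, `Chain`, and `NewHandler` are hypothetical names, not httpkit's actual API, and the `tag` middleware is a stand-in that just records call order. In a real service the tracing argument would be the value returned by `otelhttp.NewMiddleware`, which has the same `func(http.Handler) http.Handler` shape.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Middleware matches the shape returned by otelhttp.NewMiddleware.
type Middleware func(http.Handler) http.Handler

// Chain wraps h so that the first middleware listed is the outermost layer.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// NewHandler is a hypothetical httpkit constructor. Tracing always goes
// first, so the server span also covers the service's own middleware.
func NewHandler(h http.Handler, tracing Middleware, extra ...Middleware) http.Handler {
	return Chain(h, append([]Middleware{tracing}, extra...)...)
}

// tag returns a stand-in middleware that appends its name to log when it runs.
func tag(name string, log *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*log = append(*log, name)
			next.ServeHTTP(w, r)
		})
	}
}

// callOrder serves one in-memory request and reports which layers ran, in order.
func callOrder() []string {
	var log []string
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log = append(log, "handler")
	})
	// tag("tracing", ...) stands in for otelhttp.NewMiddleware("service-name").
	h := NewHandler(app, tag("tracing", &log), tag("auth", &log))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return log
}

func main() {
	fmt.Println(callOrder())
}
```

Making tracing a fixed first argument rather than one entry in a variadic list means a service cannot accidentally drop it or order it after its own middleware.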

4. Gate — switch from stdouttrace to OTLP exporter

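Gate already has OTel spans; it just writes them to stdout instead of Tempo. Swap the stdouttrace exporter for an OTLP exporter (otlptracegrpc or otlptracehttp, since Tempo accepts both) and point it at the Tempo endpoint.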

Why this matters

You cannot debug the revenue loop, the agent delegation chain, or the cost attribution pipeline without distributed tracing. The infrastructure exists. The shared package exists. The data isn't flowing. This is the "last mile" connection.

Context

Identified during org-wide architecture review (2026-04-12). Cerebro's implementation (internal/events/jetstream.go, internal/providers/http_client.go) is the reference pattern.
