Add OpenTelemetry distributed tracing and OTLP push metrics #212

Open
surapallykishore wants to merge 2 commits into juspay:main from
surapallykishore:feature/add-opentelemetry-distributed-tracing

@surapallykishore
Summary

  • Integrate OpenTelemetry to enable end-to-end distributed tracing via OTLP/gRPC export to any compatible backend (Jaeger, Tempo, Datadog, etc.)
  • Add W3C TraceContext propagation — traceparent headers are extracted from incoming requests and injected into outgoing HTTP calls, enabling full trace correlation across services (e.g. OMS → CG → Decision Engine → CG)
  • Add OTLP push-based metrics alongside existing Prometheus pull metrics, with a dual-record helper for incremental migration
  • Embed trace_id and span_id in JSON log output for log-trace correlation
  • Include Docker Compose stack (OTel Collector + Jaeger) for local development and testing
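
To illustrate the propagation step described above, here is a minimal, standard-library-only sketch of reading a version `00` `traceparent` header and building the header for an outgoing call. It is not the PR's code, which relies on the OpenTelemetry propagator; the type `TraceParent` and the functions `parse_traceparent` and `child_traceparent` are hypothetical names chosen for this example, and only version `00` is accepted as a simplification.

```rust
/// Parsed fields of a W3C `traceparent` header (version 00 only).
/// Hypothetical type for illustration; not taken from the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String, // 32 lowercase hex characters
    pub span_id: String,  // 16 lowercase hex characters
    pub sampled: bool,    // lowest bit of the trace-flags byte
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True if every character of `s` is '0' (all-zero IDs are invalid).
fn is_all_zero(s: &str) -> bool {
    s.bytes().all(|b| b == b'0')
}

/// Parse `version-trace_id-parent_id-flags`, e.g. from an incoming request.
/// Returns `None` for anything that is not a well-formed version 00 header.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 {
        return None;
    }
    let (version, trace_id, span_id, flags) = (parts[0], parts[1], parts[2], parts[3]);
    if version != "00"
        || !is_lower_hex(trace_id, 32)
        || !is_lower_hex(span_id, 16)
        || !is_lower_hex(flags, 2)
        || is_all_zero(trace_id)
        || is_all_zero(span_id)
    {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        span_id: span_id.to_string(),
        sampled: flags & 0x01 == 0x01,
    })
}

/// Build the header for an outgoing request: same trace ID and sampling
/// decision, but the ID of the span that makes the call.
pub fn child_traceparent(parent: &TraceParent, child_span_id: &str) -> Option<String> {
    if !is_lower_hex(child_span_id, 16) || is_all_zero(child_span_id) {
        return None;
    }
    Some(format!(
        "00-{}-{}-{:02x}",
        parent.trace_id,
        child_span_id,
        u8::from(parent.sampled)
    ))
}
```

Because the trace ID is carried over unchanged while the span ID is replaced at each hop, a backend can stitch the spans from every service into a single trace.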

All telemetry is disabled by default and fully backward-compatible — no config changes are required for existing deployments.

Configuration

[telemetry.tracing]
enabled = true
otlp_endpoint = "http://otel-collector:4317"
sampling_ratio = 1.0

[telemetry.metrics]
enabled = true
otlp_endpoint = "http://otel-collector:4317"
export_interval_secs = 60
export_timeout_secs = 30
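
As a rough picture of how the keys above could map onto config structs that stay disabled unless explicitly enabled, here is a standard-library-only sketch. It is an assumption-laden illustration: the PR uses serde defaults in src/logger/config.rs, whereas this version uses plain `Default` impls; the struct names, the `validate` method, and the default endpoint `http://localhost:4317` are hypothetical, and the numeric defaults simply mirror the sample values shown above.

```rust
/// Hypothetical mirror of `[telemetry.tracing]`.
#[derive(Debug, Clone, PartialEq)]
pub struct TracingConfig {
    pub enabled: bool,
    pub otlp_endpoint: String,
    pub sampling_ratio: f64,
}

/// Hypothetical mirror of `[telemetry.metrics]`.
#[derive(Debug, Clone, PartialEq)]
pub struct MetricsConfig {
    pub enabled: bool,
    pub otlp_endpoint: String,
    pub export_interval_secs: u64,
    pub export_timeout_secs: u64,
}

/// Hypothetical mirror of the whole `[telemetry]` section.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct TelemetryConfig {
    pub tracing: TracingConfig,
    pub metrics: MetricsConfig,
}

impl Default for TracingConfig {
    fn default() -> Self {
        Self {
            enabled: false, // off unless the operator opts in
            otlp_endpoint: "http://localhost:4317".to_string(), // assumed default
            sampling_ratio: 1.0, // mirrors the sample config
        }
    }
}

impl Default for MetricsConfig {
    fn default() -> Self {
        Self {
            enabled: false, // off unless the operator opts in
            otlp_endpoint: "http://localhost:4317".to_string(), // assumed default
            export_interval_secs: 60, // mirrors the sample config
            export_timeout_secs: 30,  // mirrors the sample config
        }
    }
}

impl TelemetryConfig {
    /// Reject values that cannot be meaningful: a sampling ratio outside
    /// 0.0..=1.0 (NaN included) or a zero export interval.
    pub fn validate(&self) -> Result<(), String> {
        if !(0.0..=1.0).contains(&self.tracing.sampling_ratio) {
            return Err(format!(
                "sampling_ratio must be within 0.0..=1.0, got {}",
                self.tracing.sampling_ratio
            ));
        }
        if self.metrics.export_interval_secs == 0 {
            return Err("export_interval_secs must be greater than 0".to_string());
        }
        Ok(())
    }
}
```

With defaults like these, a deployment whose config file has no `[telemetry]` section would end up with tracing and metrics export both off, which is the backward-compatible behaviour the description promises.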

Files changed

| File | Change |
| --- | --- |
| Cargo.toml | Add opentelemetry, opentelemetry_sdk, opentelemetry-otlp, opentelemetry-http, tracing-opentelemetry |
| src/logger/config.rs | Add Telemetry config structs with serde defaults |
| src/config.rs | Add telemetry field to GlobalConfig |
| src/logger/setup.rs | Init TracerProvider, MeterProvider, W3C propagator, OTel tracing layer |
| src/logger/formatter.rs | Extract trace_id/span_id into JSON log output |
| src/bin/open_router.rs | Pass telemetry config to logger setup |
| src/api_client.rs | Inject traceparent header into outgoing HTTP requests |
| src/metrics.rs | Add OTel metric instruments and dual-record helper |
| config/development.toml | Add telemetry config section (disabled) |
| config/otel-collector-config.yaml | OTel Collector config for local dev |
| docker-compose.otel.yaml | Collector + Jaeger stack for local testing |
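
The dual-record helper in src/metrics.rs is not shown in this description, so the following is only one hypothetical shape it could take: every increment goes to the existing Prometheus counter and, when OTLP metrics are enabled, to the OTel instrument as well. The trait `CounterSink`, the types `InMemorySink` and `DualRecorder`, and the metric name used in the usage note are invented for illustration; in-memory maps stand in for the real Prometheus and OpenTelemetry instruments.

```rust
use std::collections::HashMap;

/// Anything that can accumulate a named counter (hypothetical abstraction).
pub trait CounterSink {
    fn add(&mut self, name: &str, value: u64);
}

/// In-memory stand-in for a real metrics backend.
#[derive(Debug, Default)]
pub struct InMemorySink {
    pub totals: HashMap<String, u64>,
}

impl CounterSink for InMemorySink {
    fn add(&mut self, name: &str, value: u64) {
        *self.totals.entry(name.to_string()).or_insert(0) += value;
    }
}

/// Records to the pull-based backend always, and to the push-based
/// backend only when it is configured (`otel` is `Some`).
pub struct DualRecorder<P: CounterSink, O: CounterSink> {
    pub prometheus: P,
    pub otel: Option<O>,
}

impl<P: CounterSink, O: CounterSink> DualRecorder<P, O> {
    pub fn record(&mut self, name: &str, value: u64) {
        // Existing behaviour is untouched: Prometheus always gets the value.
        self.prometheus.add(name, value);
        // OTLP export is additive and skipped when telemetry is disabled.
        if let Some(otel) = self.otel.as_mut() {
            otel.add(name, value);
        }
    }
}
```

A call such as `recorder.record("decisions_total", 1)` would then update both backends from a single call site, so individual metrics can be migrated one at a time without changing what the /metrics endpoint reports.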

Test plan

  • cargo check passes
  • cargo fmt applied
  • Tested locally with OTel Collector + Jaeger — traces flow end-to-end with correct parent-child span relationships
  • Verified trace_id and span_id appear in JSON log output
  • Verified existing /metrics Prometheus endpoint still works
  • Verify no impact on existing deployments when telemetry is disabled (default)

   Integrate OpenTelemetry into the decision-engine to enable end-to-end
   distributed tracing and push-based metrics export via OTLP/gRPC,
   while preserving the existing Prometheus pull metrics endpoint.
Integrate OpenTelemetry to enable end-to-end distributed tracing and
push-based metrics export via OTLP/gRPC. Existing Prometheus pull
metrics are fully preserved — this is a purely additive change.
@surapallykishore
Author

Hey @tinu-hareesswar, @aniket-agrawal
I'm from the Pine Labs engineering team. As you know, we already use decision-engine in our production environment. I made a few changes that add distributed tracing and include the traceId and spanId in the logs. Could you please take a look and let me know if I'm missing something? Thanks