Summary
Design and implement OpenTelemetry integration for production debugging and monitoring.
Parent issue: #105 — Tier 3, Priority #13
Why
The turn log is good for audit; OpenTelemetry is good for understanding latency, cost, and failure patterns in real time. Production pipelines need distributed tracing, metrics (tokens used, step duration, error rates), and integration with existing observability stacks.
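As a rough illustration of the three metrics named above (tokens used, step duration, error rates), here is a std-only Rust sketch of the per-step data an exporter would eventually carry. `StepRecord` and `PipelineMetrics` are hypothetical names, not existing ail-core types, and no OpenTelemetry crate is involved; it only shows the shape of the measurements.

```rust
use std::time::Duration;

/// Hypothetical record of one finished pipeline step (not an ail-core type).
#[derive(Debug, Clone)]
pub struct StepRecord {
    pub step_id: String,
    pub tokens_used: u64,
    pub duration: Duration,
    pub failed: bool,
}

/// Running totals for tokens used, step duration, and error rate.
#[derive(Debug, Default)]
pub struct PipelineMetrics {
    steps: u32,
    errors: u32,
    tokens: u64,
    total_duration: Duration,
}

impl PipelineMetrics {
    /// Fold one finished step into the running totals.
    pub fn record(&mut self, step: &StepRecord) {
        self.steps += 1;
        self.tokens += step.tokens_used;
        self.total_duration += step.duration;
        if step.failed {
            self.errors += 1;
        }
    }

    /// Total tokens across all recorded steps.
    pub fn total_tokens(&self) -> u64 {
        self.tokens
    }

    /// Fraction of recorded steps that failed; 0.0 when nothing was recorded.
    pub fn error_rate(&self) -> f64 {
        if self.steps == 0 {
            0.0
        } else {
            f64::from(self.errors) / f64::from(self.steps)
        }
    }

    /// Mean step duration, or None when nothing was recorded.
    pub fn mean_duration(&self) -> Option<Duration> {
        if self.steps == 0 {
            None
        } else {
            Some(self.total_duration / self.steps)
        }
    }
}
```

In a real integration these values would more likely be attached to `tracing` spans or reported as OpenTelemetry instruments than held in a struct like this; how that is done is part of the design decisions below.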
Design Decisions Needed
- opentelemetry + tracing-opentelemetry?
- observability: block? Environment variables? OTLP endpoint?
- tracing instrumentation
Spec Reference
- Referenced in spec/core/s21*.md as an exploratory feature
- ail-core already uses tracing; this builds on that foundation
Acceptance Criteria