Design & implement observability — OpenTelemetry (§21) #123

## Summary

Design and implement OpenTelemetry integration for production debugging and monitoring.

Parent issue: #105 — Tier 3, Priority #13

## Why

The turn log is good for audit; OpenTelemetry is good for understanding latency, cost, and failure patterns in real time. Production pipelines need distributed tracing, metrics (tokens used, step duration, error rates), and integration with existing observability stacks.

## Design Decisions Needed

- [ ] OTel SDK — which Rust crate? `opentelemetry` + `tracing-opentelemetry`?
- [ ] What to instrument — spans per step? Per pipeline? Per runner call?
- [ ] Attributes — step_id, runner, model, token_count, cost, error_type?
- [ ] Export configuration — YAML `observability:` block? Environment variables? OTLP endpoint?
- [ ] Metrics vs. traces vs. logs — which OTel signals to support?
- [ ] Interaction with existing `tracing` instrumentation
- [ ] Whether observability config is per-pipeline or global
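To make the export-configuration question concrete, here is a minimal, std-only sketch of one possible resolution order: a YAML `observability:` value wins, and the standard OTel environment variable `OTEL_EXPORTER_OTLP_ENDPOINT` is the fallback. The names `ObservabilityConfig`, `ExportTarget`, and `resolve_export_target` are hypothetical, and the precedence shown is an assumption for discussion, not a decision.

```rust
/// Hypothetical shape of a parsed `observability:` YAML block.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ObservabilityConfig {
    /// OTLP endpoint from YAML, if the user set one.
    pub otlp_endpoint: Option<String>,
}

/// Where telemetry should go once configuration is resolved.
#[derive(Debug, Clone, PartialEq)]
pub enum ExportTarget {
    /// Nothing configured: no exporter is installed.
    Disabled,
    /// Export over OTLP to the given endpoint.
    Otlp { endpoint: String },
}

/// Trim the value and treat blank strings as "not set".
fn non_empty(value: Option<String>) -> Option<String> {
    value
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
}

/// Resolve the export target. Assumed precedence: YAML first, then the
/// `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable, else disabled.
/// The environment is passed in as a lookup function so the logic can be
/// exercised without mutating process state.
pub fn resolve_export_target(
    yaml: &ObservabilityConfig,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> ExportTarget {
    let endpoint = non_empty(yaml.otlp_endpoint.clone())
        .or_else(|| non_empty(env_lookup("OTEL_EXPORTER_OTLP_ENDPOINT")));

    match endpoint {
        Some(endpoint) => ExportTarget::Otlp { endpoint },
        None => ExportTarget::Disabled,
    }
}
```

In a real binary the lookup could be `|key| std::env::var(key).ok()`. Whether YAML should override the environment or the reverse is exactly the open question above; swapping the two `non_empty` calls flips the precedence.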

## Spec Reference

- Referenced in `spec/core/s21*.md` as an exploratory feature
- `ail-core` already uses `tracing` — this builds on that foundation

## Acceptance Criteria

- [ ] Spec section authored
- [ ] OTel traces emitted for pipeline and step execution
- [ ] Key attributes (step_id, duration, model, tokens) attached to spans
- [ ] Configurable export endpoint
- [ ] Zero overhead when OTel is not configured
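One way to approach the last criterion, sketched with std only: attribute construction is deferred behind a closure, so a disabled pipeline pays only for a single branch. `StepSpanAttributes` and `Telemetry` are hypothetical names, the in-memory `Vec` stands in for a real exporter, and the field set simply mirrors the key attributes listed above.

```rust
use std::time::Duration;

/// Hypothetical attribute set for one step span, mirroring the
/// acceptance criteria (step_id, duration, model, tokens).
#[derive(Debug, Clone, PartialEq)]
pub struct StepSpanAttributes {
    pub step_id: String,
    pub model: String,
    pub tokens: u64,
    pub duration: Duration,
}

/// Telemetry sink. `Disabled` is the state when OTel is not configured.
pub enum Telemetry {
    Disabled,
    /// Stand-in for a real exporter: finished spans are kept in memory.
    Enabled { finished: Vec<StepSpanAttributes> },
}

impl Telemetry {
    /// Record a finished step. The `build` closure runs only when
    /// telemetry is enabled, so no strings are allocated otherwise.
    pub fn record_step(&mut self, build: impl FnOnce() -> StepSpanAttributes) {
        if let Telemetry::Enabled { finished } = self {
            finished.push(build());
        }
    }

    /// Spans recorded so far (always empty when disabled).
    pub fn finished(&self) -> &[StepSpanAttributes] {
        match self {
            Telemetry::Disabled => &[],
            Telemetry::Enabled { finished } => finished.as_slice(),
        }
    }
}
```

This is "near zero" rather than strictly zero overhead, since the enabled check still runs per step. An actual implementation would more likely rely on `tracing` subscriber filtering, which this sketch does not model.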

