Add comprehensive logging and tracing observability with OpenTelemetry #6

Draft
ryanhill4L wants to merge 1 commit into main from terragon/add-logging-tracing-provider

Summary

  • Introduces a full observability stack with structured logging, distributed tracing, and monitoring
  • Adds OpenTelemetry integration with support for Jaeger, Zipkin, OTLP, and console exporters
  • Implements enhanced logging in providers (OpenAI, Anthropic, Gemini) with request context and verbose modes
  • Adds context-aware logging and tracing to the agent runner and to tool execution
  • Provides an observability example with detailed usage instructions and environment configuration
  • Includes Docker Compose setup for Jaeger, OpenTelemetry Collector, Prometheus, and Grafana

Changes

Observability Example

  • New examples/observability directory with:
    • main.go: Demonstrates logging, tracing, error handling, and environment variable configuration
    • README.md: Detailed guide on observability features, setup, and best practices
    • docker-compose.yml: Services for Jaeger, OTEL Collector, Prometheus, and Grafana
    • otel-collector-config.yaml and prometheus.yml: Configuration files for metrics and tracing

Logging Enhancements

  • New pkg/logging package:
    • Structured and console loggers with configurable levels and verbosity
    • Context helpers for trace IDs and request IDs
  • Providers now use the logging package for detailed request/response logs
  • Environment variables control log level, format, verbosity, and tracing enablement
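
As a rough illustration of environment-driven logger configuration, the sketch below builds a logger from two variables. It uses the standard library's `log/slog` as a stand-in, since this description does not say which logging backend `pkg/logging` wraps. The variable names `EXAMPLE_LOG_LEVEL` and `EXAMPLE_LOG_FORMAT` and the helpers `parseLevel` and `newLogger` are hypothetical and are not taken from the PR.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"strings"
)

// Hypothetical variable names; the PR description does not list the real ones.
const (
	envLogLevel  = "EXAMPLE_LOG_LEVEL"  // debug | info | warn | error
	envLogFormat = "EXAMPLE_LOG_FORMAT" // json | console
)

// parseLevel maps a case-insensitive level name to a slog.Level,
// falling back to info for empty or unknown values.
func parseLevel(s string) slog.Level {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "debug":
		return slog.LevelDebug
	case "warn", "warning":
		return slog.LevelWarn
	case "error":
		return slog.LevelError
	default:
		return slog.LevelInfo
	}
}

// newLogger builds a logger that writes to w. getenv is injected so that
// callers can supply values without touching the process environment.
func newLogger(w io.Writer, getenv func(string) string) *slog.Logger {
	opts := &slog.HandlerOptions{Level: parseLevel(getenv(envLogLevel))}
	if strings.EqualFold(getenv(envLogFormat), "json") {
		return slog.New(slog.NewJSONHandler(w, opts))
	}
	return slog.New(slog.NewTextHandler(w, opts))
}

func main() {
	logger := newLogger(os.Stderr, os.Getenv)
	logger.Debug("visible only when the level is debug")
	logger.Info("logger configured", "format", os.Getenv(envLogFormat))
}
```

Injecting `getenv` rather than calling `os.Getenv` directly keeps the configuration logic easy to exercise in tests.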

Tracing Integration

  • New pkg/tracing package:
    • OpenTelemetry tracer implementation with support for multiple exporters
    • Global tracer provider with environment-based configuration
    • Helper functions for tracing provider, agent, and tool operations
  • Agent runner enhanced to create trace spans for turns, provider calls, and tool executions
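
One way the environment-based exporter selection could work is sketched below. It covers only the configuration step and does not construct a real OpenTelemetry exporter. The `TracingConfig` struct, the `configFromEnv` helper, and the `EXAMPLE_TRACING_*` variable names are assumptions made for illustration; the actual names in `pkg/tracing` are not given in this description.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Exporter names one of the trace backends listed in this PR.
type Exporter string

const (
	ExporterConsole Exporter = "console"
	ExporterJaeger  Exporter = "jaeger"
	ExporterZipkin  Exporter = "zipkin"
	ExporterOTLP    Exporter = "otlp"
)

// TracingConfig is a hypothetical settings struct, not the real pkg/tracing type.
type TracingConfig struct {
	Enabled  bool
	Exporter Exporter
	Endpoint string
}

// configFromEnv reads hypothetical EXAMPLE_TRACING_* variables. An unset
// exporter defaults to console; an unrecognized one is reported as an error.
func configFromEnv(getenv func(string) string) (TracingConfig, error) {
	cfg := TracingConfig{Exporter: ExporterConsole}
	cfg.Enabled = strings.EqualFold(getenv("EXAMPLE_TRACING_ENABLED"), "true")
	name := strings.ToLower(strings.TrimSpace(getenv("EXAMPLE_TRACING_EXPORTER")))
	if name != "" {
		switch e := Exporter(name); e {
		case ExporterConsole, ExporterJaeger, ExporterZipkin, ExporterOTLP:
			cfg.Exporter = e
		default:
			return TracingConfig{}, fmt.Errorf("unknown trace exporter %q", name)
		}
	}
	cfg.Endpoint = getenv("EXAMPLE_TRACING_ENDPOINT")
	return cfg, nil
}

func main() {
	cfg, err := configFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "tracing not configured:", err)
		return
	}
	fmt.Printf("tracing enabled=%t exporter=%s endpoint=%q\n",
		cfg.Enabled, cfg.Exporter, cfg.Endpoint)
}
```

Rejecting unknown exporter names at startup, instead of silently falling back, makes a typo in the environment visible immediately.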

Provider Updates

  • OpenAI, Anthropic, and Gemini providers updated to:
    • Initialize with logger instances
    • Log request start, details, and completion with token usage
    • Support verbose logging of messages and tool calls

Agent Runner

  • Adds trace and logging context propagation
  • Wraps agent turns and provider completions in trace spans
  • Logs errors and metrics with trace correlation
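
Trace correlation in logs can be sketched as a handler that copies IDs from the context onto every record. The example below uses the standard library's `log/slog` and stores the IDs under private context keys. With OpenTelemetry in place, the trace ID would more likely come from the active span context (for example via `trace.SpanContextFromContext`) than from a custom key. All names here (`withTraceID`, `withRequestID`, `newRunContext`, `idsFrom`, `correlatingHandler`) are hypothetical.

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// ctxKey is a private type so these keys cannot collide with other packages.
type ctxKey string

const (
	traceIDKey   ctxKey = "trace_id"
	requestIDKey ctxKey = "request_id"
)

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey, id)
}

func withRequestID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, requestIDKey, id)
}

// newRunContext starts a fresh context carrying both IDs.
func newRunContext(traceID, requestID string) context.Context {
	return withRequestID(withTraceID(context.Background(), traceID), requestID)
}

// idsFrom returns the IDs stored in ctx; missing ones come back empty.
func idsFrom(ctx context.Context) (traceID, requestID string) {
	traceID, _ = ctx.Value(traceIDKey).(string)
	requestID, _ = ctx.Value(requestIDKey).(string)
	return traceID, requestID
}

// correlatingHandler wraps another handler and stamps each record with the
// trace and request IDs found in the context.
type correlatingHandler struct{ slog.Handler }

func (h correlatingHandler) Handle(ctx context.Context, r slog.Record) error {
	traceID, requestID := idsFrom(ctx)
	r = r.Clone() // avoid mutating state shared with other copies of the record
	if traceID != "" {
		r.AddAttrs(slog.String("trace_id", traceID))
	}
	if requestID != "" {
		r.AddAttrs(slog.String("request_id", requestID))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the derived handler so correlation survives.
func (h correlatingHandler) WithAttrs(as []slog.Attr) slog.Handler {
	return correlatingHandler{h.Handler.WithAttrs(as)}
}

func (h correlatingHandler) WithGroup(name string) slog.Handler {
	return correlatingHandler{h.Handler.WithGroup(name)}
}

func main() {
	logger := slog.New(correlatingHandler{slog.NewJSONHandler(os.Stdout, nil)})
	ctx := newRunContext("trace-123", "req-456")
	logger.ErrorContext(ctx, "tool execution failed", "tool", "search")
}
```

Correlation only works when callers use the context-aware methods such as `ErrorContext`; plain `Error` calls pass a background context and carry no IDs.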

Basic Example Update

  • Initializes global tracer and logs current logging/tracing configuration
  • Demonstrates shutdown of tracer on exit
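
Shutting the tracer down on exit typically means flushing buffered spans within a bounded time. The sketch below shows that pattern with a stub in place of a real tracer provider. The `shutdowner` interface mirrors the `Shutdown(ctx) error` method that the OpenTelemetry SDK's `TracerProvider` exposes, while `stubTracer`, `runWithTracer`, and the five-second timeout are assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"time"
)

// shutdownTimeout bounds how long exit may wait for spans to flush (assumed value).
const shutdownTimeout = 5 * time.Second

// shutdowner is the one method this sketch needs from a tracer provider.
type shutdowner interface {
	Shutdown(ctx context.Context) error
}

// stubTracer stands in for a real provider and records that it was closed.
type stubTracer struct{ closed bool }

func (t *stubTracer) Shutdown(ctx context.Context) error {
	t.closed = true
	return ctx.Err() // non-nil only if the deadline has already passed
}

// runWithTracer runs work and then always shuts the tracer down, even when
// work fails. Errors from both steps are joined into the returned error.
func runWithTracer(work func() error, tp shutdowner) (err error) {
	defer func() {
		ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
		defer cancel()
		err = errors.Join(err, tp.Shutdown(ctx))
	}()
	return work()
}

func main() {
	tp := &stubTracer{}
	work := func() error {
		fmt.Println("agent run finished")
		return nil
	}
	if err := runWithTracer(work, tp); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("tracer closed:", tp.closed)
}
```

Running the shutdown in a deferred function, with its own fresh context, means spans are still flushed when the work itself returns an error or its context has been cancelled.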

Module and Dependency Updates

  • Adds OpenTelemetry and related dependencies
  • Updates go.mod and example modules accordingly

Test plan

  • Run examples/observability/main.go with different environment variable settings
  • Verify logs appear in console and JSON formats
  • Start Jaeger and OTEL Collector via Docker Compose and confirm traces are collected
  • Verify that provider calls log detailed request and response information
  • Confirm agent runner traces agent turns and tool calls
  • Validate error logging and trace error propagation
  • Check environment variable overrides for logging and tracing
  • Use Prometheus and Grafana to monitor metrics when running the full stack

This PR significantly improves the observability capabilities of the Agents SDK, enabling developers to monitor, debug, and analyze agent behavior and provider interactions effectively.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/53fedd2e-6d19-46e0-98c1-bccce25b1f85

… tracing

- Introduce structured logging with configurable levels and formats
- Integrate OpenTelemetry tracing with support for Jaeger, Zipkin, OTLP, and console exporters
- Add context propagation for trace and request IDs
- Enhance providers (OpenAI, Anthropic, Gemini) with detailed logging and tracing
- Implement tracing spans for agent, provider, and tool operations
- Add observability example with detailed README and Docker Compose setup
- Update runner to include tracing and logging context
- Provide environment variable configuration for observability features
- Add new logging and tracing packages with console and structured loggers
- Support graceful tracer shutdown and error handling

This enables detailed monitoring, debugging, and performance analysis for the Agents SDK.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>