Apache Burr is uniquely positioned to handle complex, stateful agent workflows. To take its observability support from beta to enterprise-grade readiness, I propose a formal initiative to evolve our current OpenTelemetry (OTel) integration.
Current Limitations:
Async/Reactive Gaps: Existing integrations face stability challenges in fully asynchronous event loops (e.g., issue #413).
Semantic Consistency: Lack of standardized semantic conventions for agent steps (e.g., token usage, state transitions, and actor latency) leads to fragmented trace data.
Context Propagation: Inability to maintain parent-child trace context across state machine transitions in distributed agentic workflows.
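The async gap and the context-propagation gap share a root question: does the active span survive `await` points and task boundaries? OpenTelemetry's Python API keeps its current context in `contextvars` by default, and `asyncio` copies the caller's context into every task it creates. The stdlib-only sketch below imitates that mechanism to show the parent-child relationship an async-safe bridge has to preserve. `SpanRecord`, `span`, `run_step`, and `run_application` are hypothetical illustrations, not Burr or OpenTelemetry APIs.

```python
from __future__ import annotations

import asyncio
import contextvars
import itertools
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


# Hypothetical stand-in for an OTel span: only the fields needed to show parentage.
@dataclass
class SpanRecord:
    name: str
    span_id: int
    parent_id: Optional[int]


_ids = itertools.count(1)
# Holds the active span for the current task, similar to OTel's runtime context.
current_span: contextvars.ContextVar[Optional[SpanRecord]] = contextvars.ContextVar(
    "current_span", default=None
)
finished: list[SpanRecord] = []


@contextmanager
def span(name: str) -> Iterator[SpanRecord]:
    # Whatever span is active when we enter becomes the parent.
    parent = current_span.get()
    record = SpanRecord(name, next(_ids), parent.span_id if parent else None)
    token = current_span.set(record)
    try:
        yield record
    finally:
        # Restore the previous span in this same context, then record completion.
        current_span.reset(token)
        finished.append(record)


async def run_step(action_name: str) -> None:
    # Each action gets a child span of the span active when its task was created.
    with span(f"action:{action_name}"):
        await asyncio.sleep(0)  # yield to the event loop; the context survives the await


async def run_application() -> None:
    with span("application.run"):
        # gather wraps each coroutine in a Task, and each Task copies the current
        # context, so both steps see "application.run" as their parent.
        await asyncio.gather(run_step("retrieve"), run_step("generate"))


asyncio.run(run_application())
for rec in finished:
    print(rec)
```

In the OTel SDK, `Tracer.start_as_current_span` plays the role of `span` here. The design point is that each action's span is opened and closed inside the task that runs the action, so concurrent steps never overwrite each other's parent.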
Proposed Scope of Work:
Semantic Convention Standardization: Define and implement standard OTel attributes for Burr (e.g., ai.agent.state, ai.agent.transition_id, llm.token_usage).
Native Async Instrumentation: Refactor the bridge to use current opentelemetry-instrumentation patterns that propagate trace context across `await` boundaries without blocking the event loop.
PII/Sensitive Data Masking: Implement a built-in scrubbing mechanism that masks sensitive fields in agent state payloads before they are exported to OTel collectors. This is critical for adoption in healthcare and other regulated industries.
Correlation IDs: Automatic injection of trace_id into all persisted state snapshots, allowing users to jump from a database record directly to the corresponding trace.
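The Semantic Convention Standardization item could start as a small module of attribute keys plus a helper that flattens one agent step into OTel-compatible values. The key names below are the examples from this proposal, not an agreed convention, and `step_attributes` is a hypothetical helper. One constraint worth designing around: OTel attribute values must be primitives (`str`, `bool`, `int`, `float`) or sequences of them, so nested state has to be serialized, ideally after sensitive fields have been masked.

```python
from __future__ import annotations

import json
from typing import Any, Mapping, Optional, Union

# Attribute keys taken from this proposal's examples; final names are still to be agreed.
AI_AGENT_STATE = "ai.agent.state"
AI_AGENT_TRANSITION_ID = "ai.agent.transition_id"
LLM_TOKEN_USAGE = "llm.token_usage"

# Primitive types OTel accepts as attribute values (sequences of these are also allowed).
AttributeValue = Union[str, bool, int, float]


def step_attributes(
    state: Mapping[str, Any],
    transition_id: str,
    token_usage: Optional[int] = None,
) -> dict[str, AttributeValue]:
    """Build a flat attribute dict for one agent step (hypothetical helper)."""
    attrs: dict[str, AttributeValue] = {
        # Nested state is not a valid attribute value, so serialize it deterministically.
        AI_AGENT_STATE: json.dumps(dict(state), sort_keys=True, default=str),
        AI_AGENT_TRANSITION_ID: transition_id,
    }
    if token_usage is not None:
        # Only emit token usage for steps that actually called an LLM.
        attrs[LLM_TOKEN_USAGE] = token_usage
    return attrs


attrs = step_attributes({"query": "hi"}, "retrieve->generate", token_usage=128)
print(attrs)
```

On a live span, the resulting dict could be passed to `span.set_attributes(...)` from the OpenTelemetry Python API.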
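For PII/Sensitive Data Masking, one minimal approach is a recursive scrubber applied to the state payload just before export. This sketch assumes two hypothetical policies: redact any value stored under a known-sensitive key, and mask e-mail addresses found in free text. `SENSITIVE_KEYS`, `EMAIL_RE`, and `scrub` are illustrative names. Key lists and regexes like these are only a baseline; they will not catch names or identifiers that appear without a recognizable pattern, so a production version would need pluggable detectors.

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical defaults; a real deployment would make these configurable.
SENSITIVE_KEYS = frozenset({"ssn", "password", "api_key", "date_of_birth"})
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
REDACTED = "[REDACTED]"


def scrub(value: Any, sensitive_keys: frozenset[str] = SENSITIVE_KEYS) -> Any:
    """Return a masked copy of a state payload; the input is left unmodified."""
    if isinstance(value, dict):
        # Redact whole values under sensitive keys (case-insensitive), recurse elsewhere.
        return {
            k: REDACTED if str(k).lower() in sensitive_keys else scrub(v, sensitive_keys)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        scrubbed = [scrub(v, sensitive_keys) for v in value]
        # Preserve the container type so downstream serialization is unchanged.
        return scrubbed if isinstance(value, list) else tuple(scrubbed)
    if isinstance(value, str):
        # Mask free-text PII that is not under a known key, e.g. e-mail addresses.
        return EMAIL_RE.sub(REDACTED, value)
    # Numbers, booleans, None, etc. pass through unchanged.
    return value


state = {
    "patient": {"name": "Ada", "SSN": "123-45-6789"},
    "notes": ["contact ada@example.com"],
    "turns": 3,
}
print(scrub(state))
```

Because `scrub` returns a copy, the same state object can still be persisted unmasked (or differently masked) while only the exported trace payload is redacted.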
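For Correlation IDs, the snapshot side of the feature only needs the active trace ID rendered the way tracing backends display it: 32 lowercase hex digits, as in W3C Trace Context, where an all-zero ID is invalid. In OpenTelemetry Python the ID is available as an integer via `trace.get_current_span().get_span_context().trace_id`; this sketch takes that integer as input so it runs without the SDK. `TRACE_ID_KEY`, `trace_id_hex`, and `with_trace_id` are hypothetical names.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical key under which the trace ID is stored in each persisted snapshot.
TRACE_ID_KEY = "__trace_id"


def trace_id_hex(trace_id: int) -> str:
    """Render a 128-bit trace ID as 32 lowercase hex digits."""
    if not 0 < trace_id < 2**128:
        # W3C Trace Context treats an all-zero trace ID as invalid.
        raise ValueError("trace_id must be a non-zero 128-bit integer")
    return format(trace_id, "032x")


def with_trace_id(snapshot: Mapping[str, Any], trace_id: int) -> dict[str, Any]:
    """Return a copy of a state snapshot tagged with the active trace ID."""
    tagged = dict(snapshot)
    tagged[TRACE_ID_KEY] = trace_id_hex(trace_id)
    return tagged


snap = with_trace_id({"step": 4}, 0x4BF92F3577B34DA6A3CE929D0E0E4736)
print(snap)
```

Storing the ID inside the snapshot, rather than in a separate table, keeps the lookup to a single read: take the stored ID from the record and search for it in the tracing backend.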
Why this aligns with the Apache Way:
This work would make Burr a strong choice for regulated AI agent development, where visibility and auditability are non-negotiable. I am prepared to contribute the initial reference implementation, drawing on my experience building production-grade OTel modules for Spring AI.
Next Steps:
I would welcome feedback from the PPMC and maintainers on this roadmap. If there is agreement, I will open a series of PRs, beginning with the core semantic conventions.