OLS-3274: add structured audit event logging with independent OTEL tracing by vimalk78 · Pull Request #84 · openshift/lightspeed-agentic-sandbox

vimalk78 · 2026-06-23T18:08:14Z

Description:

Summary

Adds an AuditLogger class that emits structured JSON audit events (audit.agent.started, tool.call, tool.result, text, thinking, completed) to stdout during agent execution
Audit JSON logging and OTEL tracing are independent controls per spec: JSON logs are gated by LIGHTSPEED_AUDIT_ENABLED, and OTEL spans are created whenever an endpoint is configured
Phase derivation from request context (analysis/execution/verification/escalation)
Wired into /run route for all providers
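
The independence of the two controls can be illustrated with a minimal sketch. It is not the PR's implementation: the helper names, the `event` and `timestamp` field names, the truthy-string parsing of `LIGHTSPEED_AUDIT_ENABLED`, and the use of `OTEL_EXPORTER_OTLP_ENDPOINT` as the endpoint setting are all assumptions made for illustration.

```python
import io
import json
import os
import sys
from datetime import datetime, timezone


def audit_enabled(env=os.environ) -> bool:
    # Assumption: the flag is a boolean-like string such as "true" or "1".
    return env.get("LIGHTSPEED_AUDIT_ENABLED", "").strip().lower() in {"1", "true", "yes"}


def tracing_enabled(env=os.environ) -> bool:
    # Assumption: a configured OTLP endpoint is what turns span creation on.
    return bool(env.get("OTEL_EXPORTER_OTLP_ENDPOINT"))


def emit_audit_event(event_type, fields, *, env=os.environ, stream=None) -> bool:
    """Write one JSON audit record if audit logging is enabled.

    Returns True when a record was written. The tracing flag is never consulted.
    """
    if not audit_enabled(env):
        return False
    record = {
        "event": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    out = sys.stdout if stream is None else stream
    out.write(json.dumps(record) + "\n")
    return True


# Usage: audit logging on, tracing off -> one JSON line is written, no spans.
demo_env = {"LIGHTSPEED_AUDIT_ENABLED": "true"}
demo_out = io.StringIO()
emit_audit_event("audit.agent.started", {"phase": "analysis"}, env=demo_env, stream=demo_out)
```

Keeping the two checks in separate functions mirrors the cluster test matrix in the test plan: each flag can be flipped without affecting the other.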

Test plan

make test — 130 unit tests pass
make lint — clean
Verified on cluster: audit logs enabled + OTEL disabled → JSON logs to stdout, no sandbox spans
Verified on cluster: audit logs disabled + OTEL enabled → no JSON logs, spans in Jaeger
Verified on cluster: both enabled → JSON logs + spans
Verified trace_id correlation across analysis and execution phases

coderabbitai · 2026-06-23T18:08:28Z

Warning

Review limit reached

@vimalk78, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes and 57 seconds. Learn how PR review limits work.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e07ab6a9-f2ec-481a-bc6a-03d143652936

📥 Commits

Reviewing files that changed from the base of the PR and between 7dfa648 and de0dbbd.

📒 Files selected for processing (5)

src/lightspeed_agentic/audit.py
src/lightspeed_agentic/providers/openai.py
src/lightspeed_agentic/routes/query.py
tests/test_audit.py
tests/test_routes.py

📝 Walkthrough

Walkthrough

Adds structured audit logging for agent runs, including phase derivation, streaming text/thinking/tool events, completion records with token and cost metrics, provider usage extraction, and query route instrumentation.

Changes

Audit logging flow

Layer / File(s)	Summary
Phase derivation and startup records `src/lightspeed_agentic/audit.py`, `tests/test_audit.py`	`derive_phase` resolves the audit phase from context, `AuditLogger` initializes tracing and buffering state, and startup tests assert the emitted `audit.agent.started` record and common fields.
Streaming events and completion records `src/lightspeed_agentic/audit.py`, `tests/test_audit.py`	`AuditLogger.process_event` buffers text and thinking, emits tool call/result records, flushes on block boundaries, and `complete` emits `audit.agent.completed`; tests cover buffering, tool spans, completion, and disabled logging.
Query route wiring and result metrics `src/lightspeed_agentic/providers/openai.py`, `src/lightspeed_agentic/routes/query.py`, `tests/test_routes.py`	`OpenAIProvider` now reads token counts from `result.context_wrapper.usage`; `/run` creates an `AuditLogger`, forwards provider events, records token and cost metrics, and route tests cover enabled/disabled audit output and phase derivation.
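
The buffering behaviour summarized in the table (accumulate text and thinking chunks, flush on block boundaries and before tool records) might look roughly like the sketch below. The `Event` type, the `BufferingAuditLogger` class, and the `audit.agent.`-prefixed record names are hypothetical stand-ins, not the code in `src/lightspeed_agentic/audit.py`; records are collected in a list instead of printed so the flow is easy to inspect.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical stand-in for the provider event type.
    type: str  # "text", "thinking", "tool_call" or "tool_result"
    text: str = ""
    name: str = ""


class BufferingAuditLogger:
    def __init__(self):
        self.records = []    # JSON lines, collected instead of written to stdout
        self._text = []      # pending text chunks
        self._thinking = []  # pending thinking chunks

    def _emit(self, event_type, **fields):
        self.records.append(json.dumps({"event": event_type, **fields}))

    def _flush_buffers(self):
        # Emit one record per non-empty buffer, then reset that buffer.
        if self._thinking:
            self._emit("audit.agent.thinking", text="".join(self._thinking))
            self._thinking.clear()
        if self._text:
            self._emit("audit.agent.text", text="".join(self._text))
            self._text.clear()

    def process_event(self, event):
        if event.type == "text":
            if self._thinking:  # a thinking block just ended
                self._flush_buffers()
            self._text.append(event.text)
        elif event.type == "thinking":
            if self._text:  # a text block just ended
                self._flush_buffers()
            self._thinking.append(event.text)
        elif event.type == "tool_call":
            self._flush_buffers()
            self._emit("audit.agent.tool.call", tool=event.name)
        elif event.type == "tool_result":
            self._flush_buffers()
            self._emit("audit.agent.tool.result", tool=event.name)

    def complete(self, *, success):
        self._flush_buffers()
        self._emit("audit.agent.completed", success=success)
```

Flushing before every tool record keeps the emitted records in the same order as the stream, so a reader of the log sees the model's text before the tool call it led to.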

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant run_endpoint
  participant OpenAIProvider
  participant AuditLogger
  participant stdout
  Client->>run_endpoint: POST /v1/agent/run
  run_endpoint->>OpenAIProvider: query(...)
  OpenAIProvider-->>run_endpoint: ProviderEvent stream
  run_endpoint->>AuditLogger: process_event(event)
  OpenAIProvider-->>run_endpoint: ResultEvent usage
  run_endpoint->>AuditLogger: complete(success, input_tokens, output_tokens, cost_usd)
  AuditLogger->>stdout: emit audit JSON

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 23.68% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: structured audit logging with separate OTEL tracing.
Description check	✅ Passed	The description is clearly related to the PR and describes the audit logging and tracing changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


openshift-ci · 2026-06-23T18:08:38Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xrajesh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vimalk78 · 2026-06-23T18:39:23Z

/retest

red-hat-konflux · 2026-06-23T18:39:37Z

All PipelineRuns for this commit have already succeeded. Use /retest <pipeline-name> to re-run a specific pipeline or /test to re-run all pipelines.

vimalk78 · 2026-06-24T05:47:15Z

/test lint

vimalk78 · 2026-06-24T06:30:20Z

/test e2e-claude

openshift-ci-robot · 2026-06-25T08:04:13Z

@vimalk78: This pull request references OLS-3274 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

blublinsky

should-fix: `_tool_span` leak on timeout, crash, or back-to-back tool calls

Location: src/lightspeed_agentic/audit.py:56-68

Problem:

The AuditLogger starts an OTEL span on tool_call and ends it on tool_result, but there are three paths where the span is never ended:

Timeout/crash mid-tool — tool_call fires, agent hangs, TimeoutError raised. complete(success=False) doesn't end the span.
Back-to-back tool_call — Two tool_call events without an intervening tool_result (e.g., parallel tool calls). Second call overwrites _tool_span without ending the first.
Exception during tool execution — Error raised between tool_call and tool_result events.

Suggested fix:

Add cleanup to complete():

def complete(self, *, success: bool, ...) -> None:
    self._flush_buffers()
    if self._tool_span is not None:
        self._tool_span.end()
        self._tool_span = None
    self._emit("audit.agent.completed", ...)

And handle back-to-back in process_event:

case "tool_call":
    self._flush_buffers()
    if self._tool_span is not None:
        self._tool_span.end()  # end previous before starting new
    self._last_tool_name = event.name or "unknown"
    self._tool_span = self._tracer.start_span(...)
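
A self-contained sketch of the suggested cleanup is given below, so the three leak paths can be exercised without an OTEL backend. `FakeSpan`, `FakeTracer`, and `SpanTrackingLogger` are hypothetical stand-ins; only the `start_span(...)` and `end()` calls mirror the snippets above. Routing every path through one `_end_tool_span` helper is a design choice of this sketch, not something the review prescribes.

```python
class FakeSpan:
    """Hypothetical stand-in for an OTEL span; records whether end() was called."""

    def __init__(self, name):
        self.name = name
        self.ended = False

    def end(self):
        self.ended = True


class FakeTracer:
    """Hypothetical stand-in for a tracer; keeps every span it hands out."""

    def __init__(self):
        self.spans = []

    def start_span(self, name):
        span = FakeSpan(name)
        self.spans.append(span)
        return span


class SpanTrackingLogger:
    def __init__(self, tracer):
        self._tracer = tracer
        self._tool_span = None

    def _end_tool_span(self):
        # Single place that ends the open tool span, if there is one.
        if self._tool_span is not None:
            self._tool_span.end()
            self._tool_span = None

    def on_tool_call(self, name):
        self._end_tool_span()  # covers back-to-back tool_call events
        self._tool_span = self._tracer.start_span(f"tool.{name}")

    def on_tool_result(self):
        self._end_tool_span()

    def complete(self, *, success):
        self._end_tool_span()  # covers timeout, crash, and mid-tool exceptions


# Usage: two tool calls with no result in between, then a failed completion.
tracer = FakeTracer()
logger = SpanTrackingLogger(tracer)
logger.on_tool_call("get_pods")
logger.on_tool_call("get_events")
logger.complete(success=False)
```

One caveat: with genuinely parallel tool calls, ending the previous span when the next call arrives truncates its recorded duration. Tracking open spans per call identifier would avoid that, at the cost of more state.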

blublinsky

nice-to-have: Simpler token counting via SDK aggregated usage

Location: src/lightspeed_agentic/providers/openai.py:262-263

Current:

input_tokens = sum(r.usage.input_tokens for r in result.raw_responses)
output_tokens = sum(r.usage.output_tokens for r in result.raw_responses)

Suggested:

input_tokens = result.context_wrapper.usage.input_tokens
output_tokens = result.context_wrapper.usage.output_tokens

The SDK already aggregates token usage across all responses internally. Single access, no loop, and it's the SDK-recommended approach for totals.
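
The two access patterns can be compared side by side with stand-in objects. The dataclasses below are hypothetical and only mimic the attribute paths quoted above (`raw_responses[*].usage` and `context_wrapper.usage`); they do not import or reproduce the real SDK types, and the token numbers are made up for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class RawResponse:
    usage: Usage


@dataclass
class ContextWrapper:
    usage: Usage


@dataclass
class RunResult:
    raw_responses: list
    context_wrapper: ContextWrapper


def tokens_by_summing(result):
    # Current approach: loop over every raw response.
    return (
        sum(r.usage.input_tokens for r in result.raw_responses),
        sum(r.usage.output_tokens for r in result.raw_responses),
    )


def tokens_from_aggregate(result):
    # Suggested approach: read the totals the run already carries.
    usage = result.context_wrapper.usage
    return usage.input_tokens, usage.output_tokens


# Stand-in result: two model responses and a matching aggregated total.
result = RunResult(
    raw_responses=[RawResponse(Usage(120, 30)), RawResponse(Usage(200, 45))],
    context_wrapper=ContextWrapper(Usage(320, 75)),
)
```

The sketch assumes the aggregate equals the per-response sum; that claim comes from the review comment and is worth confirming against the SDK version in use.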

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_agentic/routes/query.py`:
- Around line 133-158: The audit completion in the query flow is recorded too
early, so `audit_logger.complete` can mark `success=True` even when the final
`RunResponse` is unsuccessful. Move or recompute the `success` value in
`query.py`’s `run()` handling so it reflects the actual returned outcome after
the empty-text check and the `parsed.get("success", True)` response shaping, and
keep `audit_logger.complete` consistent in both success and failure paths.
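
A rough sketch of the ordering this comment asks for: shape the response first, then derive the audited `success` from the shaped result. `finish_run`, `RecordingAudit`, and the response dictionary layout are hypothetical; only `parsed.get("success", True)`, the empty-text check, and the `complete(...)` keyword names come from this PR page.

```python
import json


class RecordingAudit:
    """Hypothetical stand-in for AuditLogger that records completion calls."""

    def __init__(self):
        self.completed = []

    def complete(self, *, success, input_tokens=0, output_tokens=0, cost_usd=0.0):
        self.completed.append(success)


def finish_run(final_text, audit_logger):
    """Shape the response first, then audit the outcome that is returned."""
    if not final_text.strip():
        # Empty-text check: the run produced nothing usable.
        response = {"success": False, "summary": "empty response"}
    else:
        try:
            parsed = json.loads(final_text)
        except json.JSONDecodeError:
            parsed = {}
        if not isinstance(parsed, dict):
            parsed = {}
        response = {
            "success": bool(parsed.get("success", True)),
            "summary": parsed.get("summary", final_text),
        }
    # Single completion call, fed by the final value, for both outcomes.
    audit_logger.complete(success=response["success"])
    return response
```

Because `complete` is called exactly once and after all shaping, the audit record cannot disagree with the response the client receives.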


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 15fe14f2-33e1-40c7-8a96-d457be6efa69

📥 Commits

Reviewing files that changed from the base of the PR and between c9daf3f and 7dfa648.

📒 Files selected for processing (5)

src/lightspeed_agentic/audit.py
src/lightspeed_agentic/providers/openai.py
src/lightspeed_agentic/routes/query.py
tests/test_audit.py
tests/test_routes.py

🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

openshift/lightspeed-agentic-operator (manual)

Adds AuditLogger that emits structured JSON audit events to stdout during agent execution. Logging and OTEL tracing are independent controls per spec: JSON logs are gated by LIGHTSPEED_AUDIT_ENABLED, OTEL spans are always created when an endpoint is configured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vimal Kumar <vimal78@gmail.com>

openshift-ci · 2026-06-25T11:42:24Z

@vimalk78: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 23, 2026

openshift-ci Bot requested review from blublinsky and raptorsun June 23, 2026 18:08

vimalk78 force-pushed the ols-3274-audit-events branch from dc5d789 to 4c50771 Compare June 24, 2026 06:05

vimalk78 force-pushed the ols-3274-audit-events branch 2 times, most recently from 8671661 to 7c3aee4 Compare June 25, 2026 07:43

vimalk78 changed the title ~~WIP: Ols 3274 audit events~~ OLS-3274: add structured audit event logging with independent OTEL tracing Jun 25, 2026

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 25, 2026

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2026

blublinsky reviewed Jun 25, 2026

vimalk78 force-pushed the ols-3274-audit-events branch from 7c3aee4 to 7dfa648 Compare June 25, 2026 10:33

coderabbitai Bot requested changes Jun 25, 2026

Comment thread src/lightspeed_agentic/routes/query.py

vimalk78 force-pushed the ols-3274-audit-events branch from 7dfa648 to de0dbbd Compare June 25, 2026 11:29

coderabbitai Bot approved these changes Jun 25, 2026
