
TKAI-2: add session tracing propagation #45

Open
figitaki wants to merge 5 commits into yourbuddyconner:main from tkhq:carey/tkai-2-session-tracing


@figitaki (Collaborator) commented May 6, 2026

Summary

  • add a dependency-free OTLP/HTTP JSON tracer shared by worker and runner
  • propagate W3C traceparent through sandbox env, DO→runner protocol messages, and runner→OpenCode HTTP calls
  • instrument worker lifecycle/dispatch spans plus runner bootstrap, repo setup, turn, workflow, tool, and LLM usage spans
  • document TKAI-2 implementation coverage in the session tracing spec
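The propagation bullet relies on the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). Below is a minimal sketch of parse/format helpers such a tracer might use when reading the header from the sandbox env or a DO→runner message and re-emitting it on runner→OpenCode calls. The function and type names are hypothetical, and the sketch only accepts version `00`.

```typescript
// Minimal W3C Trace Context `traceparent` helpers (version 00 only).
// Hypothetical names; not the PR's actual tracer API.
export interface TraceContext {
  traceId: string; // 32 lowercase hex chars, must not be all zeros
  spanId: string; // 16 lowercase hex chars (the "parent-id" field), must not be all zeros
  sampled: boolean; // bit 0 of trace-flags
}

// The spec requires lowercase hex, so uppercase input is rejected.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

export function parseTraceparent(header: string | undefined): TraceContext | null {
  if (!header) return null;
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, traceId, spanId, flags] = m;
  // All-zero trace or parent IDs are invalid and must be ignored.
  if (ALL_ZEROS.test(traceId) || ALL_ZEROS.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

export function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}
```

When forwarding across a hop, the receiver would keep `traceId`, swap in the ID of its own current span as `spanId`, and format the result for the next outbound call.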

Test plan

  • Unit tests
  • Smoke test
  • Deploy

The following environment variables need to be set:

OTEL_EXPORTER_OTLP_ENDPOINT=<grafana otlp endpoint>
OTEL_EXPORTER_OTLP_HEADERS=<auth headers>
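For reference, the OpenTelemetry exporter spec defines `OTEL_EXPORTER_OTLP_HEADERS` as comma-separated `key=value` pairs with percent-encoded values, and for OTLP/HTTP it appends the per-signal path (`/v1/traces`) to `OTEL_EXPORTER_OTLP_ENDPOINT`. A sketch of how a dependency-free tracer could turn these into an export URL and headers, assuming it reads them from a plain env record (the function name is hypothetical):

```typescript
// Hypothetical helper: derive the OTLP/HTTP JSON traces URL and request
// headers from the standard OTel exporter environment variables.
export interface OtlpExportConfig {
  url: string;
  headers: Record<string, string>;
}

export function otlpConfigFromEnv(env: Record<string, string | undefined>): OtlpExportConfig | null {
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT;
  if (!endpoint) return null; // no endpoint configured: tracing export disabled

  // The base endpoint gets the traces signal path appended.
  const url = `${endpoint.replace(/\/+$/, "")}/v1/traces`;
  const headers: Record<string, string> = { "Content-Type": "application/json" };

  for (const pair of (env.OTEL_EXPORTER_OTLP_HEADERS ?? "").split(",")) {
    // Split on the first "=" only, so base64 padding in values survives.
    const idx = pair.indexOf("=");
    if (idx <= 0) continue;
    const key = pair.slice(0, idx).trim();
    headers[key] = decodeURIComponent(pair.slice(idx + 1).trim());
  }
  return { url, headers };
}
```

An auth value containing a space is therefore written percent-encoded, e.g. `Authorization=Basic%20<base64>`.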

Verification

  • pnpm --filter @valet/shared typecheck
  • pnpm --filter @valet/runner typecheck
  • pnpm --filter @valet/worker typecheck
  • pnpm --filter @valet/runner test -- src/prompt.test.ts
  • pnpm --filter @valet/worker exec vitest run src/durable-objects/prompt-queue.test.ts src/durable-objects/runner-link.test.ts

Refs TKAI-2

@figitaki requested a review from yourbuddyconner on May 6, 2026 02:48
Review comment on packages/runner/src/agent-client.ts (outdated):
private reconnectCallback?: () => void;

- private promptHandler: ((messageId: string, content: string, model?: string, author?: PromptAuthor, modelPreferences?: string[], attachments?: PromptAttachment[], channelType?: string, channelId?: string, opencodeSessionId?: string, continuationContext?: string, threadId?: string, replyChannelType?: string, replyChannelId?: string) => void | Promise<void>) | null = null;
+ private promptHandler: ((messageId: string, content: string, model?: string, author?: PromptAuthor, modelPreferences?: string[], attachments?: PromptAttachment[], channelType?: string, channelId?: string, opencodeSessionId?: string, continuationContext?: string, threadId?: string, replyChannelType?: string, replyChannelId?: string, traceparent?: string) => void | Promise<void>) | null = null;
@figitaki (Collaborator, Author):
This payload is getting quite big. Maybe we should consider breaking it up into an object shape, enforced by an object type, instead of a long positional function signature.

Review comment on packages/shared/src/tracing.ts (outdated):
}

function msToUnixNano(ms: number): string {
  return `${Math.floor(ms) * 1_000_000}`;
}
@figitaki (Collaborator, Author):
Number math can exceed Number.MAX_SAFE_INTEGER (2^53 - 1) for epoch nanoseconds, causing rounding errors in span timestamps. Use BigInt.
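To make the suggestion concrete: present-day epoch milliseconds are around 1.7e12, so multiplying by 1e6 gives roughly 1.7e18, far beyond 2^53 (about 9.0e15), where doubles can no longer represent every integer. A minimal BigInt version, keeping the original's truncation of sub-millisecond precision, might look like this (a sketch, not necessarily the code that landed):

```typescript
// Convert epoch milliseconds to the decimal nanosecond string used for
// OTLP/JSON timestamps (64-bit integers are strings in the proto3 JSON mapping).
const NANOS_PER_MS = BigInt(1_000_000);

export function msToUnixNano(ms: number): string {
  // BigInt() throws on non-integers, so truncate to whole milliseconds first;
  // the multiplication then happens in arbitrary precision and cannot round.
  return (BigInt(Math.floor(ms)) * NANOS_PER_MS).toString();
}
```

`BigInt(...)` is used instead of an `n` literal so the snippet also builds with toolchains targeting below ES2020; the runtime still needs BigInt support.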

figitaki added 3 commits May 8, 2026 11:50
…-flush

msToUnixNano multiplied a JS number by 1_000_000, exceeding MAX_SAFE_INTEGER
for present-day epoch milliseconds and silently rounding span timestamps.

Adds maxQueuedSpans + scheduleFlush options so a hot tracer can auto-flush
in batches once a buffer threshold is hit, with a host hook for ctx.waitUntil.
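A sketch of how a `maxQueuedSpans` threshold and a `scheduleFlush` host hook can fit together: spans accumulate in a buffer, and when the threshold is reached the tracer starts a flush and hands the promise to the host (e.g. `ctx.waitUntil`) instead of awaiting it. The class name, span shape, and `exportSpans` transport are hypothetical stand-ins, not the PR's `SimpleTracer`.

```typescript
// Hypothetical buffered tracer illustrating threshold-based auto-flush.
export interface FinishedSpan {
  name: string;
  startTimeUnixNano: string;
  endTimeUnixNano: string;
}

export interface TracerOptions {
  maxQueuedSpans?: number; // auto-flush once this many spans are buffered
  scheduleFlush?: (flush: Promise<void>) => void; // host hook, e.g. ctx.waitUntil
  exportSpans: (batch: FinishedSpan[]) => Promise<void>; // e.g. an OTLP/HTTP POST
}

export class SketchTracer {
  private queue: FinishedSpan[] = [];
  private readonly opts: TracerOptions;

  constructor(opts: TracerOptions) {
    this.opts = opts;
  }

  record(span: FinishedSpan): void {
    this.queue.push(span);
    const max = this.opts.maxQueuedSpans;
    if (max !== undefined && this.queue.length >= max) {
      // Drop failed batches rather than letting telemetry errors reach the host.
      const pending = this.flush().catch(() => {});
      this.opts.scheduleFlush?.(pending);
    }
  }

  async flush(): Promise<void> {
    if (this.queue.length === 0) return;
    // Swap the buffer synchronously so spans recorded during export start a new batch.
    const batch = this.queue;
    this.queue = [];
    await this.opts.exportSpans(batch);
  }
}
```

With this shape, one HTTP request covers `maxQueuedSpans` spans, and explicit `flush()` calls remain available for points where the host may go idle before the threshold is reached.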

Addresses review feedback on #1 and yourbuddyconner#45.
The prompt handler signature had grown to 14 positional args (messageId,
content, model, author, modelPreferences, attachments, channelType,
channelId, opencodeSessionId, continuationContext, threadId,
replyChannelType, replyChannelId, traceparent), making call sites and
the wire shape unreviewable.

Introduces PromptDispatch / PromptHandlerFn so onPrompt and handlePrompt
take a single typed object. Updates the agent-client dispatcher,
PromptHandler.handlePrompt, the bin.ts callback, and the prompt unit
tests to use the new shape.
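A sketch of what an object-shaped dispatch could look like, reusing the 14 parameter names from the old positional signature. The real `PromptDispatch` may differ; `PromptAuthor` and `PromptAttachment` are defined elsewhere in the runner and appear here only as hypothetical placeholder shapes so the snippet stands alone.

```typescript
// Placeholder shapes (hypothetical); the runner defines the real ones.
type PromptAuthor = { id: string; name?: string };
type PromptAttachment = { url: string; mimeType?: string };

// One typed object replaces 14 positional arguments, so call sites name
// every field and new optional fields cannot shift argument positions.
export interface PromptDispatch {
  messageId: string;
  content: string;
  model?: string;
  author?: PromptAuthor;
  modelPreferences?: string[];
  attachments?: PromptAttachment[];
  channelType?: string;
  channelId?: string;
  opencodeSessionId?: string;
  continuationContext?: string;
  threadId?: string;
  replyChannelType?: string;
  replyChannelId?: string;
  traceparent?: string; // W3C trace context to forward on OpenCode calls
}

export type PromptHandlerFn = (dispatch: PromptDispatch) => void | Promise<void>;
```

A call site then reads `handler({ messageId, content, threadId, traceparent })`, and omitted optional fields no longer need `undefined` placeholders.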

Addresses figitaki review on yourbuddyconner#45.
Previously, each session-agent dispatch path called flushTracing() inline
after ending a span, fanning out to a separate ctx.waitUntil(flush()) per
span and POSTing the buffer to OTLP each time. Across spawn → dispatch →
dispatch → hibernate → restore lifecycles, this generated many small HTTP
requests against Grafana Cloud.

Configures SimpleTracer with maxQueuedSpans=50 and a scheduleFlush hook
that registers auto-flush promises with ctx.waitUntil. Drops the
per-span flushTracing() calls at dispatch sites; explicit flushes remain
at session-level boundaries (hibernate, terminate, wake guard, child
session error finally blocks) where the DO may go idle before the
threshold is hit.
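The split described above (threshold flushes on hot paths, explicit flushes at session boundaries) can be sketched with minimal structural types. `WaitUntilContext`, `Flushable`, `tracerConfigFor`, and `withBoundaryFlush` are hypothetical names standing in for the Durable Object context and tracer; only `maxQueuedSpans=50` and the `ctx.waitUntil` registration come from the commit message.

```typescript
// Hypothetical structural stand-ins for the DO context and the tracer.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}
interface Flushable {
  flush(): Promise<void>;
}
interface TracerConfig {
  maxQueuedSpans: number;
  scheduleFlush: (flush: Promise<void>) => void;
}

// Hot dispatch paths: one waitUntil registration per batch of 50 spans,
// rather than one per ended span.
export function tracerConfigFor(ctx: WaitUntilContext): TracerConfig {
  return { maxQueuedSpans: 50, scheduleFlush: (p) => ctx.waitUntil(p) };
}

// Session boundaries (hibernate, terminate, error paths): the DO may go idle
// before the threshold is hit, so flush explicitly in a finally block.
export async function withBoundaryFlush<T>(tracer: Flushable, work: () => Promise<T>): Promise<T> {
  try {
    return await work();
  } finally {
    // Swallow export errors so tracing never masks the work's result or error.
    await tracer.flush().catch(() => {});
  }
}
```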

Addresses f3nry review on #1.
