burn diagnose: surface execution-graph (relationships + tool-result chronology) #111

@willwashburn

Context

PR #77 landed the execution-graph substrate (#42) — SessionRelationshipRecord and ToolResultEventRecord are now persisted by the ledger. From the PR's deferred-work list:

Consumer CLI surface: burn summary --subagent-tree, burn diagnose, burn waste --patterns, burn summary --by-relationship. The execution graph is now persisted, so these can be built on top.

Today burn diagnose <session> (packages/cli/src/commands/diagnose.ts) reads queryAll, queryCompactions, and readContent, then runs attributeWaste + detectPatterns. It never reads the relationship or tool-result-event tables that PR #77 introduced — there's no queryRelationships / queryToolResultEvents call. That makes it impossible to answer the chronology / outcome questions issue #42 was meant to unlock:

  • which subagent invocation cost the most?
  • which tool calls errored three times before succeeding?
  • how many spawned subagents never completed successfully?

The data is in the ledger; burn diagnose just doesn't read it yet.

Proposal

Extend burn diagnose <session> to:

  • Query SessionRelationshipRecords and ToolResultEventRecords for the session (and its child / parent sessions where relationship rows make the connection).
  • Render a compact relationship summary: root, subagent chains, fork / continuation links if present. Reuse / share the renderer with the subagent-tree consumer migration in this PR's sibling issue.
  • Render a tool-result-event timeline: per toolUseId, list the chronological (eventIndex, ts, status, eventSource, contentLength) rows. Highlight tool calls that have multiple events with errored statuses; this is the substrate for the retry-loop attribution called out in #42 (Execution graph for passive readers: session relationships and tool-result event chronology) and #11 (Waste-pattern detection: retry loops, consecutive failures, compaction loss, edit-revert).
  • Roll up terminal-status counts per session (completed / errored / cancelled / unknown) so the top-of-output session totals can include "X of N tool calls errored".
  • Add a new burn summary --by-relationship subcommand variant that aggregates totals (turns, cost, tokens) grouped by relationshipType across the queried sessions, mirroring the existing --by-subagent-type.
  • Add --json parity for both the diagnose graph view and the summary --by-relationship view.
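A rough sketch of the timeline and terminal-status bullets above, to make the grouping concrete. The field names (toolUseId, eventIndex, ts, status, eventSource, contentLength) and the four status values come from this issue; the ToolResultEventRow shape, its field types, and the helper names buildTimelines and countTerminalStatuses are assumptions for illustration and are not the ledger's actual API.

```typescript
// Hypothetical row shape: field names are taken from the issue text, types are assumed.
type ToolStatus = "completed" | "errored" | "cancelled" | "unknown";

interface ToolResultEventRow {
  toolUseId: string;
  eventIndex: number;
  ts: string; // assumed to be an ISO-8601 timestamp
  status: ToolStatus;
  eventSource: string;
  contentLength: number;
}

interface ToolCallTimeline {
  toolUseId: string;
  events: ToolResultEventRow[]; // sorted by eventIndex
  erroredCount: number;
  terminalStatus: ToolStatus; // status of the last event in eventIndex order
  retryLoop: boolean; // true when more than one event errored
}

// Group events per toolUseId and order each group by eventIndex.
function buildTimelines(rows: ToolResultEventRow[]): ToolCallTimeline[] {
  const byId = new Map<string, ToolResultEventRow[]>();
  for (const row of rows) {
    const bucket = byId.get(row.toolUseId) ?? [];
    bucket.push(row);
    byId.set(row.toolUseId, bucket);
  }

  const timelines: ToolCallTimeline[] = [];
  byId.forEach((bucket, toolUseId) => {
    const events = bucket.slice().sort((a, b) => a.eventIndex - b.eventIndex);
    const erroredCount = events.filter((e) => e.status === "errored").length;
    const last = events[events.length - 1];
    timelines.push({
      toolUseId,
      events,
      erroredCount,
      terminalStatus: last ? last.status : "unknown",
      retryLoop: erroredCount > 1,
    });
  });
  return timelines;
}

// Roll up terminal statuses so session totals can report "X of N tool calls errored".
function countTerminalStatuses(
  timelines: ToolCallTimeline[],
): Record<ToolStatus, number> {
  const counts: Record<ToolStatus, number> = {
    completed: 0,
    errored: 0,
    cancelled: 0,
    unknown: 0,
  };
  for (const t of timelines) counts[t.terminalStatus] += 1;
  return counts;
}
```

Treating the last event by eventIndex as the terminal status is one possible reading of "terminal-status counts"; the real implementation may define it differently.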

This issue covers all of burn diagnose's execution-graph surfacing plus burn summary --by-relationship, since they're the same wiring (query the new tables, group, render).
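The --by-relationship aggregation could look roughly like the following. This is a minimal sketch under stated assumptions: RelatedSessionTotals, RelationshipGroup, percentile, and groupByRelationship are hypothetical names, the input is assumed to be one pre-computed totals row per related session, and the percentile uses the nearest-rank method, which may differ from whatever --by-subagent-type uses today.

```typescript
// Hypothetical input: one totals row per related session (shape assumed).
interface RelatedSessionTotals {
  relationshipType: string;
  turns: number;
  cost: number;
  tokens: number;
}

interface RelationshipGroup {
  relationshipType: string;
  sessions: number;
  turns: number;
  cost: number;
  tokens: number;
  medianCost: number;
  p95Cost: number;
}

// Nearest-rank percentile over an ascending-sorted array (returns 0 when empty).
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const idx = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[idx] ?? 0;
}

// Aggregate totals per relationshipType, keeping first-seen order of types.
function groupByRelationship(rows: RelatedSessionTotals[]): RelationshipGroup[] {
  const byType = new Map<string, RelatedSessionTotals[]>();
  for (const row of rows) {
    const bucket = byType.get(row.relationshipType) ?? [];
    bucket.push(row);
    byType.set(row.relationshipType, bucket);
  }

  const groups: RelationshipGroup[] = [];
  byType.forEach((bucket, relationshipType) => {
    const costs = bucket.map((r) => r.cost).sort((a, b) => a - b);
    groups.push({
      relationshipType,
      sessions: bucket.length,
      turns: bucket.reduce((sum, r) => sum + r.turns, 0),
      cost: bucket.reduce((sum, r) => sum + r.cost, 0),
      tokens: bucket.reduce((sum, r) => sum + r.tokens, 0),
      medianCost: percentile(costs, 50),
      p95Cost: percentile(costs, 95),
    });
  });
  return groups;
}
```

An empty input yields an empty array, so the renderer and the --json path can share the same result without a special case.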

Acceptance criteria

  • burn diagnose <session> includes a relationship summary section listing each relationship row with type, related session id, and (when present) subagentType / description / parentToolUseId.
  • burn diagnose <session> includes a tool-result chronology section keyed by toolUseId, listing events in eventIndex order with status / event source / content length.
  • burn diagnose --json includes relationships and toolResultEvents arrays alongside today's fields.
  • burn summary --by-relationship aggregates queried sessions by relationshipType, producing the same shape as today's --by-subagent-type (totals, counts, median / p95 cost where appropriate).
  • A session with no relationship / event rows renders today's diagnose output unchanged (graceful empty sections).
  • Tests exercise both populated and empty-graph sessions.
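One way to satisfy the "graceful empty sections" criterion is to have each section renderer return no lines when there are no rows. The sketch below assumes a hypothetical RelationshipRow shape and renderRelationshipSection helper; only subagentType, description, and parentToolUseId are named in this issue, and the "Relationships" heading and line layout are invented for illustration.

```typescript
// Hypothetical row shape; the optional fields are the ones named in the acceptance criteria.
interface RelationshipRow {
  relationshipType: string;
  relatedSessionId: string;
  subagentType?: string;
  description?: string;
  parentToolUseId?: string;
}

// Returns the lines of the relationship summary section.
// With no rows it returns an empty array, so existing diagnose output is unchanged.
function renderRelationshipSection(rows: RelationshipRow[]): string[] {
  if (rows.length === 0) return [];

  const lines: string[] = ["Relationships"];
  for (const row of rows) {
    // Only print optional fields that are actually present.
    const extras: string[] = [];
    if (row.subagentType) extras.push(`subagentType=${row.subagentType}`);
    if (row.description) extras.push(`description=${row.description}`);
    if (row.parentToolUseId) extras.push(`parentToolUseId=${row.parentToolUseId}`);

    const suffix = extras.length > 0 ? ` (${extras.join(", ")})` : "";
    lines.push(`  ${row.relationshipType} -> ${row.relatedSessionId}${suffix}`);
  }
  return lines;
}
```

Returning lines rather than printing directly keeps the function easy to cover with both the populated and the empty-graph test cases.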

Out of scope

Refs
