What
Expose per-run token and cost telemetry in the CLI artifacts and governance history, instead of keeping cost accounting only inside adapters / agent traces.
Motivation
The project already computes input_tokens, output_tokens, and cost_usd at the adapter layer, and BaseAgent accumulates per-agent totals through CostGuard. But the normal CLI flow does not persist that data into workspace/governance_history.json or surface it in debate artifacts.
That creates an observability gap:
- cost exists internally but is not first-class in the user-facing run record
- mixed-model runs can become materially more expensive without the CLI making that obvious
ac score and downstream analysis cannot reason about debate quality vs cost tradeoffs
This is separate from issue #8, which is about whether soft-limit warnings belong in the runtime path.
Scope
- persist total input/output tokens and total cost for each CLI run
- optionally break out cost by role: analyst / critic / judge
- include cost metadata in saved debate artifacts and/or CLI summary output
- decide whether governance history should store raw telemetry, aggregates, or both
- add tests covering real adapters and mock behavior where possible
- document pricing visibility expectations for role-specific model configurations
Why this matters
If Agent Constitution is going to encourage role-specific model choices, cost needs to be inspectable at the same level as verdicts and audit trails. Otherwise users cannot tell whether a governance pattern is operationally viable.
What
Expose per-run token and cost telemetry in the CLI artifacts and governance history, instead of keeping cost accounting only inside adapters / agent traces.
Motivation
The project already computes
input_tokens,output_tokens, andcost_usdat the adapter layer, andBaseAgentaccumulates per-agent totals throughCostGuard. But the normal CLI flow does not persist that data intoworkspace/governance_history.jsonor surface it in debate artifacts.That creates an observability gap:
ac scoreand downstream analysis cannot reason about debate quality vs cost tradeoffsThis is separate from issue #8, which is about whether soft-limit warnings belong in the runtime path.
Scope
Why this matters
If Agent Constitution is going to encourage role-specific model choices, cost needs to be inspectable at the same level as verdicts and audit trails. Otherwise users cannot tell whether a governance pattern is operationally viable.