Add trust UI, flight recorder, ROI and benchmark harness (v0.9.0) by 00PrabalK00 · Pull Request #44 · 00PrabalK00/Continuum

00PrabalK00 · 2026-05-31T16:36:08Z

Summary

Surfaces the evidence Continuum already records and adds cost/benefit accounting on top of it. Closes the remaining trust-UI and cost roadmap items.

- Agent Flight Recorder (continuum flight-record <task>, flight.py): a replayable record built entirely from stored state: context packet, claimed vs touched files, gate evidence with SHAs, events, messages and detected risks. It never trusts agent self-reports.
- Agent ROI (continuum roi, roi.py): tokens, cost-per-accepted-change, out-of-scope edits, reruns, manual corrections and provider usage, plus deterministic routing recommendations. No dollar pricing is fabricated (USD is left null).
- Benchmark harness (continuum benchmark capture|compare, benchmark.py): diffs with-vs-without task metrics and emits an evidence-based verdict.
- Control Center trust views: Workflow Timeline, Multi-Agent Worktree Board, Agent Flight Recorder, Agent ROI and Context Packet Studio, backed by the read-only endpoints /api/flight-records, /api/flight-record, /api/timeline, /api/worktree-board, /api/context-packets and /api/roi. Rendering is XSS-safe (escapeHtml).
- store.messages now carries workflow/task refs, so flight records and the timeline attribute messages correctly.
- Bumped to 0.9.0; CHANGELOG, README, roadmap and TASKS updated.
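
For readers skimming the list above, the bookkeeping it describes can be illustrated with a minimal Python sketch. Every name here (out_of_scope_edits, RoiSummary, summarize_roi, compare_runs) is hypothetical and not taken from flight.py, roi.py or benchmark.py. The sketch also assumes that lower is better for every benchmark metric, which may not match the real verdict logic.

```python
from dataclasses import dataclass
from typing import Optional


def out_of_scope_edits(claimed: set, touched: set) -> list:
    """Files that were touched without being claimed (sorted for stable output)."""
    return sorted(touched - claimed)


@dataclass
class RoiSummary:
    total_tokens: int
    accepted_changes: int
    tokens_per_accepted_change: Optional[float]
    usd: Optional[float] = None  # left null: this sketch assumes no pricing table


def summarize_roi(total_tokens: int, accepted_changes: int) -> RoiSummary:
    """Cost per accepted change, measured in tokens rather than dollars."""
    per_change = total_tokens / accepted_changes if accepted_changes else None
    return RoiSummary(total_tokens, accepted_changes, per_change)


def compare_runs(with_run: dict, without_run: dict) -> dict:
    """Diff two metric captures; assumes lower is better for every metric."""
    shared = sorted(with_run.keys() & without_run.keys())
    deltas = {name: with_run[name] - without_run[name] for name in shared}
    improved = [name for name in shared if deltas[name] < 0]
    regressed = [name for name in shared if deltas[name] > 0]
    if improved and not regressed:
        verdict = "improved"
    elif regressed and not improved:
        verdict = "regressed"
    elif improved and regressed:
        verdict = "mixed"
    else:
        verdict = "no-change"
    return {"deltas": deltas, "verdict": verdict}
```

The point of the sketch is that each figure is derived from recorded inputs (claimed and touched file sets, token counts, captured metrics), and that a missing dollar price is reported as None instead of being estimated.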

Closes #28, #29, #30, #34, #35.

Test plan

286 tests pass locally under CI-mimic (GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_SYSTEM=/dev/null python -m unittest discover -s tests).
New coverage: flight record (content + CLI json exit), Control Center trust endpoints (timeline/board/context/flight/roi over live HTTP), ROI/benchmark capture+compare. Worktree-touching tests persist git identity per the CI gotcha.
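
The CI-mimic invocation in the test plan can also be driven from Python. This is a sketch only: the helper names (ci_mimic_env, ci_mimic_command, run_ci_mimic) are hypothetical and not part of Continuum, and /dev/null assumes a POSIX system. The two environment variables make git ignore the user-level and system-level config files, which is why tests that create commits need to set a git identity themselves.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def ci_mimic_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and point both git config levels at /dev/null."""
    env = dict(os.environ if base is None else base)
    env["GIT_CONFIG_GLOBAL"] = "/dev/null"
    env["GIT_CONFIG_SYSTEM"] = "/dev/null"
    return env


def ci_mimic_command(tests_dir: str = "tests") -> list:
    """The unittest discovery command from the test plan, using this interpreter."""
    return [sys.executable, "-m", "unittest", "discover", "-s", tests_dir]


def run_ci_mimic(tests_dir: str = "tests") -> int:
    """Run the suite and return its exit code (0 means every test passed)."""
    result = subprocess.run(ci_mimic_command(tests_dir), env=ci_mimic_env())
    return result.returncode
```

Calling run_ci_mimic() from the repository root should behave like the shell command quoted above, assuming the tests live in a tests directory.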

🤖 Generated with Claude Code

Surface the evidence Continuum already records and add cost/benefit accounting on top of it.

- flight.py: replayable Agent Flight Recorder built from stored state (context packet, claimed vs touched files, gate evidence, events, messages, risks), never from agent self-reports.
- roi.py: Agent ROI evidence (tokens, cost-per-accepted-change, out-of-scope edits, reruns, manual corrections, provider usage) plus deterministic routing recommendations.
- benchmark.py: with-vs-without capture/compare harness with an evidence-based verdict.
- control_center.py + ui: Workflow Timeline, Multi-Agent Worktree Board, Agent Flight Recorder, Agent ROI and Context Packet Studio views over new read-only API endpoints (XSS-safe rendering).
- core.py: store.messages now carries workflow/task refs for attribution.
- CLI: continuum flight-record, roi, benchmark capture/compare.
- Bumped to 0.9.0; CHANGELOG, README, roadmap and TASKS updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

00PrabalK00 merged commit 797563b into main May 31, 2026
12 checks passed

00PrabalK00 deleted the feat/trust-ui-and-cost branch May 31, 2026 16:40

00PrabalK00 merged 1 commit into main from feat/trust-ui-and-cost
