Add trust UI, flight recorder, ROI and benchmark harness (v0.9.0) #44

Merged
00PrabalK00 merged 1 commit into main from feat/trust-ui-and-cost
May 31, 2026

Conversation

@00PrabalK00
Owner

Summary

Surfaces the evidence Continuum already records and adds cost/benefit accounting on top of it. Closes the remaining trust-UI and cost roadmap items.

  • Agent Flight Recorder (continuum flight-record <task>, flight.py) — a replayable record built entirely from stored state: context packet, claimed vs touched files, gate evidence with SHAs, events, messages and detected risks. Never trusts agent self-reports.
  • Agent ROI (continuum roi, roi.py) — tokens, cost-per-accepted-change, out-of-scope edits, reruns, manual corrections, provider usage, plus deterministic routing recommendations. No fabricated dollar pricing (USD left null).
  • Benchmark harness (continuum benchmark capture|compare, benchmark.py) — diffs with-vs-without task metrics and emits an evidence-based verdict.
  • Control Center trust views — Workflow Timeline, Multi-Agent Worktree Board, Agent Flight Recorder, Agent ROI and Context Packet Studio, backed by read-only /api/flight-records, /api/flight-record, /api/timeline, /api/worktree-board, /api/context-packets, /api/roi. Rendering is XSS-safe (escapeHtml).
  • store.messages now carries workflow/task refs so records and timeline attribute messages correctly.
  • Bumped to 0.9.0; CHANGELOG, README, roadmap, TASKS updated.
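The scope and ROI bullets above can be illustrated with a minimal sketch. It is not the code in flight.py or roi.py: the `TaskEvidence` record, its field names, and the use of tokens as the cost unit are all assumptions made for illustration. It shows out-of-scope edits derived as "touched but not claimed" and a per-accepted-change figure that leaves the USD field null rather than inventing a price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskEvidence:
    # Hypothetical record shape; these field names are illustrative only
    # and are not taken from Continuum's stored state.
    claimed_files: set[str]
    touched_files: set[str]
    tokens: int
    accepted: bool


def out_of_scope_edits(ev: TaskEvidence) -> set[str]:
    """Files the task touched without having claimed them."""
    return ev.touched_files - ev.claimed_files


def tokens_per_accepted_change(records: list[TaskEvidence]) -> Optional[float]:
    """Total tokens divided by accepted changes; None when nothing was accepted."""
    accepted = sum(1 for r in records if r.accepted)
    if accepted == 0:
        return None
    return sum(r.tokens for r in records) / accepted


def roi_summary(records: list[TaskEvidence]) -> dict:
    """Aggregate evidence; the USD cost stays None because no pricing is assumed."""
    out_of_scope: set[str] = set()
    for r in records:
        out_of_scope |= out_of_scope_edits(r)
    return {
        "tokens": sum(r.tokens for r in records),
        "tokens_per_accepted_change": tokens_per_accepted_change(records),
        "out_of_scope_edits": sorted(out_of_scope),
        "cost_usd": None,
    }
```

Every value here is computed from recorded evidence, so the summary can be regenerated from stored state without asking the agent what it did.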

Closes #28, #29, #30, #34, #35.
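For readers unfamiliar with a with-vs-without comparison, here is a rough sketch of the idea behind the benchmark bullet. It does not reproduce benchmark.py: the metric names, the "lower is better" convention, and the verdict labels are assumptions chosen for the example.

```python
def compare(with_run: dict[str, float], without_run: dict[str, float]) -> dict:
    """Diff two captured metric sets and derive a verdict from the deltas.

    Assumption for this sketch: every metric is "lower is better"
    (for example tokens or reruns), so a negative delta is an improvement.
    """
    # Only metrics present in both captures can be compared.
    shared = sorted(with_run.keys() & without_run.keys())
    deltas = {name: with_run[name] - without_run[name] for name in shared}
    improved = [name for name in shared if deltas[name] < 0]
    regressed = [name for name in shared if deltas[name] > 0]

    if not shared:
        verdict = "insufficient-evidence"
    elif improved and not regressed:
        verdict = "improved"
    elif regressed and not improved:
        verdict = "regressed"
    elif not improved and not regressed:
        verdict = "no-change"
    else:
        verdict = "mixed"

    return {
        "deltas": deltas,
        "improved": improved,
        "regressed": regressed,
        "verdict": verdict,
    }
```

The verdict is a pure function of the two captures, which is what makes it evidence-based: the same inputs always produce the same result.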

Test plan

  • 286 tests pass locally under CI-mimic (GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_SYSTEM=/dev/null python -m unittest discover -s tests).
  • New coverage: flight record (content + CLI json exit), Control Center trust endpoints (timeline/board/context/flight/roi over live HTTP), ROI/benchmark capture+compare. Worktree-touching tests persist git identity per the CI gotcha.
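The CI-mimic invocation above can also be driven from Python, as in the sketch below. `GIT_CONFIG_GLOBAL` and `GIT_CONFIG_SYSTEM` are real Git environment variables; the helper names `ci_mimic_env` and `run_suite` are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def ci_mimic_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy an environment and point Git's global and system config at /dev/null."""
    env = dict(os.environ if base is None else base)
    # With both variables set, Git ignores any user-level or machine-level
    # configuration (such as user.name), as on a clean CI runner.
    env["GIT_CONFIG_GLOBAL"] = "/dev/null"
    env["GIT_CONFIG_SYSTEM"] = "/dev/null"
    return env


def run_suite(tests_dir: str = "tests") -> int:
    """Run unittest discovery under the isolated environment; return the exit code."""
    cmd = [sys.executable, "-m", "unittest", "discover", "-s", tests_dir]
    return subprocess.run(cmd, env=ci_mimic_env()).returncode
```

Calling `run_suite()` from the repository root is intended to match the shell command in the test plan. Tests that create commits need to set a Git identity themselves, since none is inherited from the developer's machine.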

🤖 Generated with Claude Code

Surface the evidence Continuum already records and add cost/benefit
accounting on top of it.

- flight.py: replayable Agent Flight Recorder built from stored state
  (context packet, claimed vs touched files, gate evidence, events,
  messages, risks) — never from agent self-reports.
- roi.py: Agent ROI evidence (tokens, cost-per-accepted-change,
  out-of-scope edits, reruns, manual corrections, provider usage) plus
  deterministic routing recommendations.
- benchmark.py: with-vs-without capture/compare harness with an
  evidence-based verdict.
- control_center.py + ui: Workflow Timeline, Multi-Agent Worktree Board,
  Agent Flight Recorder, Agent ROI and Context Packet Studio views over
  new read-only API endpoints (XSS-safe rendering).
- core.py: store.messages now carries workflow/task refs for attribution.
- CLI: continuum flight-record, roi, benchmark capture/compare.
- Bumped to 0.9.0; CHANGELOG, README, roadmap and TASKS updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
00PrabalK00 merged commit 797563b into main on May 31, 2026
12 checks passed
00PrabalK00 deleted the feat/trust-ui-and-cost branch on May 31, 2026 at 16:40

Development

Successfully merging this pull request may close these issues.

UI: Workflow Timeline MVP