Add trust UI, flight recorder, ROI and benchmark harness (v0.9.0) #44
Merged
Surface the evidence Continuum already records and add cost/benefit accounting on top of it.

- flight.py: replayable Agent Flight Recorder built from stored state (context packet, claimed vs touched files, gate evidence, events, messages, risks), never from agent self-reports.
- roi.py: Agent ROI evidence (tokens, cost-per-accepted-change, out-of-scope edits, reruns, manual corrections, provider usage) plus deterministic routing recommendations.
- benchmark.py: with-vs-without capture/compare harness with an evidence-based verdict.
- control_center.py + ui: Workflow Timeline, Multi-Agent Worktree Board, Agent Flight Recorder, Agent ROI and Context Packet Studio views over new read-only API endpoints (XSS-safe rendering).
- core.py: store.messages now carries workflow/task refs for attribution.
- CLI: continuum flight-record, roi, benchmark capture/compare.
- Bumped to 0.9.0; CHANGELOG, README, roadmap and TASKS updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This was referenced May 31, 2026
Summary
Surfaces the evidence Continuum already records and adds cost/benefit accounting on top of it. Closes the remaining trust-UI and cost roadmap items.
- Agent Flight Recorder (`continuum flight-record <task>`, `flight.py`): a replayable record built entirely from stored state (context packet, claimed vs touched files, gate evidence with SHAs, events, messages and detected risks). It never trusts agent self-reports.
- Agent ROI (`continuum roi`, `roi.py`): tokens, cost-per-accepted-change, out-of-scope edits, reruns, manual corrections and provider usage, plus deterministic routing recommendations. No fabricated dollar pricing (USD left null).
- Benchmark harness (`continuum benchmark capture|compare`, `benchmark.py`): diffs with-vs-without task metrics and emits an evidence-based verdict.
- Control Center views (Workflow Timeline, Multi-Agent Worktree Board, Agent Flight Recorder, Agent ROI, Context Packet Studio) over new read-only API endpoints: `/api/flight-records`, `/api/flight-record`, `/api/timeline`, `/api/worktree-board`, `/api/context-packets`, `/api/roi`. Rendering is XSS-safe (`escapeHtml`).
- `store.messages` now carries workflow/task refs so records and the timeline attribute messages correctly.

Closes #28, #29, #30, #34, #35.
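For reviewers who want a concrete picture of the accounting described above, here is a minimal sketch of how out-of-scope edits and a per-accepted-change figure could be derived from stored state. It is an illustration only: `TaskEvidence`, its fields and the three functions are hypothetical names, not the schema or API in `flight.py` or `roi.py`. It uses tokens as the cost unit and leaves the USD figure as `None`, matching the "USD left null" note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskEvidence:
    """Hypothetical record shape; not Continuum's actual schema."""
    claimed_files: frozenset[str]  # files the task declared it would edit
    touched_files: frozenset[str]  # files actually changed (e.g. from a diff)
    tokens: int                    # tokens spent on the task
    accepted: bool                 # whether the change was accepted


def out_of_scope_edits(task: TaskEvidence) -> list[str]:
    """Files that were touched without being claimed, in sorted order."""
    return sorted(task.touched_files - task.claimed_files)


def tokens_per_accepted_change(tasks: list[TaskEvidence]) -> Optional[float]:
    """Total tokens over all tasks divided by the number of accepted changes.

    Returns None when nothing was accepted, rather than inventing a number.
    """
    accepted = sum(1 for t in tasks if t.accepted)
    if accepted == 0:
        return None
    return sum(t.tokens for t in tasks) / accepted


def roi_summary(tasks: list[TaskEvidence]) -> dict:
    """Aggregate the evidence; dollar cost stays None (no pricing assumed)."""
    return {
        "total_tokens": sum(t.tokens for t in tasks),
        "accepted_changes": sum(1 for t in tasks if t.accepted),
        "tokens_per_accepted_change": tokens_per_accepted_change(tasks),
        "out_of_scope_edits": sum(len(out_of_scope_edits(t)) for t in tasks),
        "cost_usd": None,
    }


if __name__ == "__main__":
    sample = [
        TaskEvidence(frozenset({"a.py"}), frozenset({"a.py", "b.py"}), 1200, True),
        TaskEvidence(frozenset({"c.py"}), frozenset({"c.py"}), 800, False),
    ]
    print(roi_summary(sample))
```

The point of the sketch is the direction of trust: scope violations come from a set difference over recorded file lists, and rejected work still counts toward cost, so neither figure depends on what an agent says about itself.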
Test plan
- Run the unit tests with global and system Git config disabled (`GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_SYSTEM=/dev/null python -m unittest discover -s tests`).

🤖 Generated with Claude Code