Add end-to-end trust demo flow integration test #46
Merged
Drives the full advertised v0.9.0 pipeline as one chained flow on a real temporary git project: init -> objective (schedule: tasks + worktree lanes + claims) -> commit change in lane worktree -> record PASS/APPROVED gates -> evidence -> pr-packet -> flight-record (merge_ready) -> roi -> benchmark capture/compare (improved verdict).
Also asserts each CLI command exits 0 on the populated project and exits 1 (no traceback) on a bogus task id. Documents the file-level lane-scoping detail in the README demo section.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What this proves
Adds tests/test_demo_flow.py, a single class-based unittest integration test that drives the full advertised v0.9.0 trust pipeline as one chained flow on a real temporary git project, through the public API (MemoryStore + managers) and the CLI surface (continuum.cli.main([...])):
continuum init -> objective --mode schedule (tasks + worktree lanes + file claims) -> real change + commit inside a lane worktree -> record_tests (PASS) + record_review (APPROVED) -> evidence -> pr-packet -> flight-record -> roi -> benchmark capture/compare.
Assertions
- objective ... --mode schedule produces tasks with lanes/owned paths and worktrees (exit 0).
- gather_evidence shows the changed file and PASS/APPROVED with no risks.
- flight-record reports final_status == "merge_ready".
- roi reports flight_records >= 1 and merge_ready_tasks >= 1.
- benchmark capture then compare against a synthetic worse baseline yields verdict "Continuum improved the measured run." (both via the pure functions and the CLI render path).
- Each CLI command (objective, evidence, pr-packet, flight-record, roi, benchmark capture/compare) returns exit 0 on the populated project and exit 1 (no traceback, graceful Error: on stderr) on a bogus task id.
Git identity is persisted in setUp (CI runners have no global identity); the project is a tempfile.TemporaryDirectory and all CLI calls pass --project so the real repo is never touched.
Integration result
The pipeline chains correctly end to end with no code changes required to objective.py or cli.py.
Finding (documented, not a blocker)
The out-of-scope risk heuristic in evidence._risks matches changed files against claimed paths by exact path, so owning a directory lane (e.g. --path backend=src) does not cover a nested change like src/app.py; it is flagged as an out-of-scope edit. The demo therefore owns precise file paths (--path backend=src/app.py), which is also how a reviewer wants scope expressed. This is in evidence.py (owned by another agent), so it was not modified; the behavior is documented in the README demo section and called out here for routing if directory-prefix scope matching is desired.
Docs
Adds an "End-To-End Trust Demo" subsection to the README "Plan Objectives And Evidence" section showing the full chained flow and the file-level scoping note.
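The file-level scoping note can be illustrated with a small standalone sketch. It is not the code in evidence._risks; the function names covered_exact and covered_by_prefix are hypothetical and exist only to contrast the exact-path behavior described in the finding with one possible directory-prefix rule.

```python
from pathlib import PurePosixPath


def covered_exact(changed: str, claimed: list[str]) -> bool:
    """Exact-path rule: a change is in scope only if its path was claimed verbatim."""
    return PurePosixPath(changed) in {PurePosixPath(p) for p in claimed}


def covered_by_prefix(changed: str, claimed: list[str]) -> bool:
    """Hypothetical alternative: a claimed directory also covers files beneath it."""
    target = PurePosixPath(changed)
    for claim in claimed:
        claim_path = PurePosixPath(claim)
        # Compare path components, not raw strings, so "src" never matches "src2/...".
        if target == claim_path or claim_path in target.parents:
            return True
    return False


# Directory claim, nested change: the case the finding describes.
print(covered_exact("src/app.py", ["src"]))          # False -> flagged as out of scope
print(covered_by_prefix("src/app.py", ["src"]))      # True
# File-level claim, as used by the demo.
print(covered_exact("src/app.py", ["src/app.py"]))   # True
# A sibling directory sharing a string prefix is still out of scope.
print(covered_by_prefix("src2/app.py", ["src"]))     # False
```

Matching on path components rather than with str.startswith avoids treating src2/app.py as covered by a claim on src, which is the main pitfall if prefix matching is ever adopted.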
Verification
CI-mimic from the worktree root:
All 289 tests pass (286 prior + 3 new).
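For readers who want to reproduce the graceful-failure check listed under Assertions, the pattern can be sketched roughly as follows. This is an illustration under stated assumptions: fake_main is a hypothetical stand-in for a CLI entry point that accepts an argv list, the --task flag and the task ids are invented for the example, and run_cli is a helper name that does not come from the PR.

```python
import contextlib
import io
import sys


def fake_main(argv: list[str]) -> int:
    """Hypothetical stand-in for a CLI entry point; not the Continuum CLI."""
    known_tasks = {"task-1"}  # assumed data, for illustration only
    try:
        task_id = argv[argv.index("--task") + 1]
        if task_id not in known_tasks:
            raise KeyError(task_id)
    except (ValueError, IndexError, KeyError) as exc:
        # Report the problem on stderr and return a non-zero code instead of raising.
        print(f"Error: unknown or missing task {exc}", file=sys.stderr)
        return 1
    return 0


def run_cli(main, argv):
    """Call main(argv) and return (exit_code, captured_stderr)."""
    stderr = io.StringIO()
    with contextlib.redirect_stderr(stderr):
        try:
            code = main(argv)
        except SystemExit as exc:
            # Entry points that call sys.exit() surface their code here.
            code = exc.code
    return code, stderr.getvalue()


ok_code, ok_err = run_cli(fake_main, ["evidence", "--task", "task-1"])
bad_code, bad_err = run_cli(fake_main, ["evidence", "--task", "bogus"])

assert ok_code == 0 and ok_err == ""
assert bad_code == 1
assert bad_err.startswith("Error:")
assert "Traceback" not in bad_err
```

Capturing stderr with contextlib.redirect_stderr keeps the check in-process, so the same helper can be pointed at any entry point that takes an argv list and returns an exit code.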
🤖 Generated with Claude Code