A Codex Desktop App plugin that orchestrates multi-stage agentic work: manifest -> research -> plan -> test-write -> execute -> policy -> verify -> drift-detect -> critique -> auto-promote -> manager, with human-approval gates at manifest, plan, and manager-decision (the last auto-fires on clean runs at v0.5+). Built from real lessons across CivicCast, CivicSuite, AgentSuiteLocal and other projects where autonomous agent runs go wrong silently and "manager-PROMOTE" failures slip past CI.
Current release: v0.9.1 (baseline isolation for release-rehearsal failures and hangs). CHANGELOG | User Manual | Architecture | Landing page | Discussions
Agentic work fails in predictable ways:
- The agent doesn't understand the project's conventions, so it improvises and the work silently diverges from the spec.
- The agent claims tests pass without running them against a fresh dependency set.
- The agent merges in-flight work while a scope question is open.
- The agent picks architectural decisions silently rather than surfacing them.
This plugin enforces a structural pattern that catches every one of those:
-
Manifest + scope-lock gate - every run starts with an explicit, human-approved manifest plus a scope lock naming the canonical release-plan rung, proof statement, allowed terms, and forbidden future-rung terms.
-
Director-decisions gate - the researcher surfaces open questions; the human picks; choices are recorded as binding constraints before the planner runs.
-
Plan gate - the planner produces a plan; the human approves or sends back.
-
Execute readiness gate - before the runner may advance from execute to policy/verify,
implementation-report.mdmust prove the full manifest Definition of Done is ready, not merely that a backend or test slice passed. The report needs a machine-readable**DoD readiness: READY**line and a zero-blocker DoD checklist covering expected outputs, DoD clauses, project UX/docs/testing gates, and prior blockers. -
Policy stage - automated checks block the run if the manifest fails strict schema validation (v0.5), the scope lock disagrees with the canonical release plan (v0.5.9), execute readiness is missing or blocked, any change falls outside
allowed_paths, edited paths or commit messages belong to another rung, docs contradict the locked rung, the diff contains TODO/FIXME/HACK markers, or an existing ADR was modified. -
Verifier stage - independent fresh-context check against every manifest exit criterion.
-
Manager gate - final PROMOTE/BLOCK/REPLAN decision, must cite verifier evidence verbatim.
-
Control-loop gate - before any final response, stop, deferral, or skipped action during an authorized run,
.agent-runs/<run-id>/active-control-state.mdmust record a valid stop condition,scripts/policy/check_pipeline_control_loop.py --run <run-id>must pass,scripts/policy/final_response_gate.py --require-active-runmust printfinal_response_gate: ALLOW, andscripts/policy/agent_decision_gate.pymust allow the specific decision through the sharedstop_validator.py. Green CI, successful push, draft PR status, stale state, unverified blocker risk, and a recommended next action are not stop conditions.
Install through the Codex Desktop App plugin flow once the repository is published, or clone it locally and add the local path to your Codex plugin configuration:
git clone https://github.com/scottconverse/agent-pipeline-codex.git ~/agent-pipeline-codex-pluginBefore calling the plugin installed, run the release verifier and prove the Codex runtime can see it:
python scripts/verify_plugin_release.py --liveThat gate must print PLUGIN-RELEASE-VERIFY: PASSED, including the nested
PLUGIN-INSTALL-ACCEPTANCE: PASSED live check. Repository tests, standalone
skill copies, and docs deployment are not install proof.
The live gate first verifies deterministic source and installed-cache state, then runs repeated fresh Codex probes. One transient model enumeration miss is preserved in the transcript but does not fail the release if another fresh probe sees the complete namespaced plugin surface and no plugin-specific loader warning appears.
Fresh-session prompts must verify the namespaced plugin skills, not only the standalone skill names. The expected names are:
agent-pipeline-codex:agent-pipelineagent-pipeline-codex:intakeagent-pipeline-codex:pipeline-initagent-pipeline-codex:new-runagent-pipeline-codex:run-pipelineagent-pipeline-codex:audit-initagent-pipeline-codex:show-run-statusagent-pipeline-codex:validate-manifest
If a session only sees agent-pipeline, pipeline-init, or run-pipeline
without the agent-pipeline-codex: prefix, it may be seeing stale standalone
skills under $CODEX_HOME/skills; that is not proof the current plugin is
active.
For headless CI that cannot see the local Codex Desktop registry, use the source-only gate:
python scripts/verify_plugin_release.py --source-onlyDrop into the project root (or a fresh empty directory) and ask Codex to use the pipeline-init skill:
Use pipeline-init for this project.
The plugin asks one question - what do you have? - and accepts one of three inputs:
- A PRD or spec document. Paste the path or contents. The plugin reads it, derives the project's conventions, scaffolds
AGENTS.md+.pipelines/+scripts/policy/+ a.gitignoreentry, and produces a project-orientation summary. - A repo URL (or a local path to an existing repo). The plugin clones (or reads), inspects
README,AGENTS.md,pyproject.toml/package.json,.github/workflows/,docs/adr/, and recent commits. Produces a project-orientation summary, flags missing pieces, installs.pipelines/+scripts/policy/. - A project description paragraph or two. Plugin reads it, asks "scaffold a new project from this, or use as context for an existing repo?" and routes to (1) or (2) accordingly.
After init, your project has:
.pipelines/
|-- feature.yaml # stage sequence for new functionality
|-- bugfix.yaml # stage sequence for bug fixes
|-- module-release.yaml # six-phase release pipeline (v0.2+)
|-- manifest-template.yaml # blank template with field docs
|-- scope-lock-template.yaml # blank canonical rung/scope lock
|-- directive-template.yaml # optional v0.6 directive contract template
|-- action-classification.yaml # opt-in: enables the v0.4 judge layer
|-- self-classification-rules.md # pre-authorized cases the executor handles solo
`-- roles/
|-- researcher.md
|-- planner.md
|-- test-writer.md
|-- executor.md # has the v0.5 pre-edit fact-forcing gate
|-- verifier.md # emits v0.5 parseable criteria count line
|-- drift-detector.md # v0.5 - manifest contract vs assembled state
|-- critic.md # v0.5 - adversarial cold read, six lenses
|-- manager.md # v0.5 auto-promote-aware
|-- judge.md # v0.4 - used only when judge layer is opt-in
|-- preflight-auditor.md # v0.2 module-release Phase 0
|-- local-rehearsal.md # v0.2 module-release Phase 2
|-- cross-agent-auditor.md # v0.3 audit-handoff
`-- implementer-pre-push.md # v0.3 audit-handoff
scripts/policy/
|-- check_manifest_schema.py # v0.5 - strict manifest contract validator
|-- check_scope_lock.py # v0.5.9 - canonical rung lock validator
|-- check_directive_conformance.py # v0.6 - manifest/scope directive conformance
|-- check_plan_against_directive.py # v0.6 - plan assertion auto-approval gate
|-- check_allowed_paths.py # generic, manifest-driven
|-- check_rung_file_ownership.py # v0.5.9 - future-rung path/commit blocker
|-- check_release_docs_consistency.py # v0.5.9 - docs vs release-plan consistency
|-- final_response_gate.py # pre-final continuation gate
|-- agent_decision_gate.py # stop/skip/defer decision gate + ledger
|-- pipeline_continue.py # executable next-action navigator
|-- stop_validator.py # v0.5.10 - shared stop-condition truth validator
|-- check_decision_ledger.py # schema validator for decision-ledger.ndjson
|-- show_run_status.py # read-only run state summary
|-- check_no_todos.py # generic, configurable scan dirs
|-- check_adr_gate.py # generic, ADRs are append-only
|-- auto_promote.py # v0.5 - six-condition machine-checkable promote
`-- run_all.py # runner
.agent-runs/ # gitignored - pipeline run artifacts
hooks/ # plugin-bundled v0.7 lifecycle hooks
If you know the work in plain English but do not have a manifest yet, start with the separate intake skill:
Use intake for Add account deletion to settings, including tests, docs, and a clear rollback path.
intake creates draft run artifacts and stops. It does not validate the
manifest, approve scope, run agents, or start implementation. It writes:
intake.md- the verbatim source description plus conservative interpretationmanifest.yaml- a draft manifest with uncertain scope left as TODOsscope-lock.yaml- a draft scope lock that refuses to invent release-plan authorityintake-questions.md- missing answers that must be resolved before validation
The hard boundary is unchanged: draft intake is not approval. The next step is
to complete the TODOs and run agent-pipeline-codex:validate-manifest; only
then should agent-pipeline-codex:run-pipeline start.
The trusted hook layer now includes a file-backed memory foundation for active
runs. When hooks can identify .agent-runs/<run-id>/active-control-state.md,
they write compact receipts under:
.agent-runs/<run-id>/memory/
|-- events.jsonl # all memory-worthy hook events
|-- turns.jsonl # user prompts submitted during the active run
|-- decisions.jsonl # tool warnings, denials, and approval decisions
|-- open_loops.jsonl # failed-tool guidance and invalid-stop continuations
|-- memory_probe.log # proof that hooks fired and saw the run
`-- handoff_current.md # compact wake-up summary regenerated from JSONL
SessionStart injects handoff_current.md alongside the active run context on
the next startup/resume, so a new Codex session wakes up with the run state,
recent warnings, open loops, and resume checklist without depending on chat
memory. This is deliberately local and deterministic: no daemon, no vector
database, and no external service are required.
The JSONL files are memory receipts, not approval authority. Before relying on remembered state, re-run the relevant manifest, scope, directive, policy, or release checks. A future optional adapter can mirror these receipts into a localmem MCP server for semantic search and longer-term retention, but the built-in memory layer stands on its own.
local-rehearsal.md now inserts Step 2.5 after the local verify-release.sh
run. On any failure or hang, the role requires a quick branch-vs-merge-base
check before deeper debugging so the agent can tell whether it is fixing its
branch or chasing a pre-existing main-branch flake.
verifier.md cross-references the same rule when a verifier test run fails or
hangs in a way the implementation report did not show. The durable behavior is
role text, not a chat-side reminder.
Once a project is initialized:
Use intake for Add account deletion to settings with tests and docs.
# draft intake.md + manifest/scope-lock
Use new-run for feature my-task-slug. # initialize blank manifest + scope-lock skeletons
# (you fill in both, then:)
Use validate-manifest for 2026-05-09-my-task-slug.
# fix schema issues before agent work starts
Use run-pipeline for feature 2026-05-09-my-task-slug.
# orchestrates the full sequence,
# stops at human gates and on failure,
# resumable from .agent-runs/<run-id>/run.log
Use show-run-status for 2026-05-09-my-task-slug.
# read-only status when you need orientation
Three human-approval gates per run: manifest, plan, manager-decision. Each is a one-question prompt: APPROVE or describe what should change.
Before starting a run, validate the manifest and scope lock with the same checks used by the policy stage:
python scripts/policy/validate_manifest.py --run 2026-05-09-my-task-slug
python scripts/policy/check_scope_lock.py --run 2026-05-09-my-task-slugThe validators report exact violations and the fix path before the pipeline spends agent time on a fuzzy manifest or the wrong canonical rung.
For work whose end-state is a published release artifact (module version bump, dependency migration), use module-release instead of feature. It adds two stages that prevent the most expensive class of failure: cascading discovery of pre-existing CI infrastructure bugs during the remote release run.
Use new-run for module-release my-module-v1.2.0-migration.
Use run-pipeline for module-release 2026-05-11-my-module-v1.2.0-migration.
The pipeline runs six phases:
- Phase 0 - Preflight auditor. Audits the module's release workflow before any product code is touched. YAML parse, workflow run health, referenced scripts exist, local
verify-release.shon fresh state, cross-platform reality check, diagnostic instrumentation, audit-punchlist correlation. Bugs found are bundled into ONE PR. Seepipelines/roles/preflight-auditor.md. - Phase 1 - Scoped product work. The executor role with pre-authorized self-classification rules (LIVE-STATE / FROZEN-EVIDENCE / SHAPE-GUARD / OWN-MODULE-VERSION for grep hits; MECHANICAL-CI-BUG / CONTRACT-CHANGE / ENVIRONMENTAL / NOVEL for failures) so the agent doesn't halt-and-ask on routine cases. See
pipelines/self-classification-rules.md. - Phase 2 - Local release rehearsal. Mirrors the CI environment and runs the release sequence locally on fresh state before tag push. The workflow becomes the execution mechanism, not the discovery mechanism. See
pipelines/roles/local-rehearsal.md. - Phase 3 - Remote release + umbrella reconciliation. Tag push, release workflow watch, umbrella PR through the project's release-lockstep gate.
- Phase 4 - Verifier. Independent fresh-context check of all release artifacts and durable docs.
- Phase 5 - Manager. Final PROMOTE/BLOCK/REPLAN with verifier evidence cited verbatim.
Human gates at Phase 0 results review, Phase 2 rehearsal-ok, and Phase 5 release-ok.
Operator reference: docs/module-release-handbook.md covers initial setup, expected timing per sprint type, and an honest "what this pipeline does NOT prevent" section. Originating failure receipts are documented in the handbook so future operators understand why each stage exists.
For projects where one AI implements and a second AI audits, v0.3 adds a complementary discipline that catches drift the pipeline doesn't:
Use audit-init for this project.
This scaffolds three artifacts:
<PROJECT>_AUDIT_GATE.md(out-of-repo) - short mandatory gate the auditing agent reads every verification turn.<PROJECT>_AUDIT_PROTOCOL.md(out-of-repo) - long reference protocol with the 10-section output shape, status-word rules, and a known drift patterns catalog.<project>/docs/process/5-lens-self-audit.md(in-repo, via PR) - shared discipline both agents read. The implementer runs a hostile 5-lens self-audit before every push (Engineering / UX / Tests / Docs / QA), plus a post-push SHA-propagation step.
Plus per-agent wiring (Codex project instructions feedback file on the Codex side; the equivalent project-context or skill-registration mechanism on the other AI's side) so each agent reads the right artifact on session start.
The discipline is symmetric - any AI can play either role. The plugin runs in Codex Desktop App; the second AI can be any tool that exposes a project-context or standing-instructions surface.
Stacking with v0.2:
- Pipeline (v0.2) catches execution-cascade failures: pre-existing CI bugs, tag-move dances, halt-and-ask loops.
- Audit-handoff (v0.3) catches drift failures: wrong endpoint, stale CHANGELOG, "Closed" without evidence, status-word abuse.
The two address different failure modes. Use both for projects with two AI systems.
Operator reference: docs/audit-handoff-handbook.md.
Real-time action-level supervision inside the executor stage. This is not a new pipeline stage; it is opt-in infrastructure that intercepts the executor's tool calls, classifies each one by risk, and spawns a judge subagent for the dangerous ones. Built from the Lindy case study (May 2026, Nate Jones, "LLM-as-Judge"): an agent that sent 14 unauthorized emails because manual confirmation prompts had trained the operator to click "okay" reflexively. Prompts don't hold across long context; the architectural fix is a second agent whose sole loyalty is the manifest.
# Opt in by creating .pipelines/action-classification.yaml in your project.
# The orchestrator detects the file at run start and uses Handler 3a
# (classify -> judge -> execute) instead of Handler 3 for the executor stage.
# No other commands change.
Every tool call the executor proposes is classified against .pipelines/action-classification.yaml and routed by risk class:
read_only(e.g.cat,grep,git log) -> execute + log.reversible_write(e.g.str_replace_editor,git commit,cp) -> execute + log.external_facing(e.g.git pushto non-main,gh pr create,curl -X POST) -> STOP, spawn judge.high_risk(e.g.rm -rf,git push --force,git push main,DROP TABLE,npm publish) -> STOP, spawn judge, and require human confirm even after the judge ALLOWs.
The judge subagent receives the manifest, the matched action policy, prior judge decisions for this run, and the structured action proposal - but not the executor's internal reasoning chain. Context isolation is the mechanism: the judge cannot be persuaded by the same chain of inferences that produced the over-broad action.
allow- within scope, evidence supports the justification. Proceed.block- outside scope, violates policy, or unsupported by evidence. Halt.revise- legitimate basis, wrong form. Send concrete revision instruction back to the executor (e.g. "push to feature branch, not main"). Max 3 revision cycles per action; auto-escalate after.escalate- requires human judgment (money, credentials, legal, or judge confidence < 0.7). Pipeline pauses for a specific human question.
judge-log.yaml- chronological log of every action: tool, arguments, class, disposition (auto_allow / judged_allow / judged_revise / judged_block / judged_escalate / human_confirmed / human_blocked), judge reason, revision instruction. Verifier and manager read this.judge-metrics.yaml- counts by class and disposition,escalation_rate,judge_invocations,revision_cycles. The escalation rate is the operator's tuning signal - too low means rules too permissive, too high means trust is being eroded by reflexive APPROVE clicking.
If .pipelines/action-classification.yaml does not exist in your project, the executor stage runs exactly as in v0.3 and earlier - no judge, no judge-log.yaml, no behavioral change. The presence of the file at run start opts that run into the judge layer. Operators choose whether to enable it per-project (or per-branch, since it's a file in the repo).
Stacking with v0.2 and v0.3:
- Pipeline (v0.2) catches execution-cascade failures - pre-existing CI bugs, tag-move dances, halt-and-ask loops.
- Audit-handoff (v0.3) catches drift failures - wrong endpoint, stale CHANGELOG, status-word abuse.
- Judge layer (v0.4) catches unauthorized actions in real time - destructive commands, external writes, force pushes, and credential-touching operations evaluated against the manifest at the action boundary instead of after the fact.
Operator reference: USER-MANUAL.md Section "The judge layer (v0.4)".
v0.7.0 adds optional Codex lifecycle hooks. When Codex is configured with
[features].plugin_hooks = true, the plugin can add active-run context at
session start, warn on stale standalone skill names, preflight risky tool
calls, deny clearly unsafe approval requests, add corrective context after
failed tools, and continue instead of stopping when active-control-state.md
says the run is not at a valid stop condition.
Hooks do not replace the directive contract, judge layer, policy scripts, or human fallback gates. They move existing pipeline discipline closer to Codex's runtime lifecycle. The default mode is warn/context; hard blocks are limited to concrete safety violations such as destructive commands, force pushes, secret exposure, out-of-scope active-run writes, and invalid pipeline stops.
Local hooks work with a normal signed-in Codex session. Codex access tokens are documented only for ChatGPT Business and Enterprise workspaces, so they are optional for CI automation and not required for local Max/Pro-style operation.
Six structural changes that make the pipeline run with one AI while still blocking drift and rogue-agent failures. Built from the design question: "can the pipeline do both action-level judge AND post-hoc audit with one AI?" Answer is yes - at the cost of accepting some correlated single-model-family blind spots, with explicit mitigations.
- Critic stage (
pipelines/roles/critic.md) - adversarial cold read of every artifact in a fresh context. Walks six lenses (engineering, UX, tests, docs, QA, scope). Emits a parseable**Findings:**count line. Structural substitute for cross-family verification. - Drift-detector stage (
pipelines/roles/drift-detector.md) - compares manifest contract against assembled final state. Catches durable doc drift, status-word abuse, cross-file inconsistency, ledger top-totals vs row counts. Emits parseable**Drift:**count line. - Pre-edit fact-forcing in executor - before the first edit per file, the executor must produce importers/callers, public API affected, schema, and the manifest goal quoted verbatim. Forces investigation that catches blast-radius surprises before they hit the verifier.
- Expanded judge classification - five new
high_riskpatterns:npm install -g,sudo, non-editable non-userpip install,git commitwith BREAKING in message. - Machine-checkable auto-promote (
scripts/auto_promote.py) - six conditions checked from the artifact stack: verifier-clean, critic-clean, drift-clean, policy-passed, judge-clean, tests-passed. When all six pass, the manager gate auto-fires; when any fails, the human gate remains. - Strict manifest schema validation (
scripts/check_manifest_schema.py) - minimum-lengthgoalanddefinition_of_done, non-emptyexpected_outputs/non_goals/rollback_plan, forbidden status words banned from manifest contracts, broadallowed_pathsrequires non-emptyforbidden_paths. Fuzzy manifests fail at the start of the run, not after they cascade.
Pre-v0.5: manifest -> research -> plan -> test-write -> execute -> policy -> verify -> manager
v0.5: manifest -> research -> plan -> test-write -> execute -> policy -> verify -> drift-detect -> critique -> auto-promote -> manager
The new three stages add ~10-20 minutes of wall-clock per run depending on artifact size. The auto-promote stage typically lands the manager decision in <30 seconds when the six conditions are green, eliminating the third human gate on clean runs.
Single-model-family blind spots correlate. If both the executor and the critic share a wrong assumption that fits the manifest, both sign off and auto-promote fires green. Dual-AI (v0.3) is the only structural defense against this. Recommended mitigation: periodic sample audit by a different model family on a weekly cadence. The v0.3 audit-init discipline still applies; v0.5 does not replace it.
- v0.2 catches execution-cascade failures (pre-executor).
- v0.3 catches drift failures via cross-family audit (post-executor, separate session).
- v0.4 catches unauthorized actions in real time (during executor).
- v0.5 catches the drift class without needing a second AI (during verify -> drift-detect -> critique -> auto-promote).
The four stack. Most projects run v0.4 + v0.5 by default and reach for v0.3 when they have two model families available.
Operator reference: USER-MANUAL.md Section "v0.5 single-AI hardening" + ARCHITECTURE.md Section 8 "Single-AI hardening (v0.5)".
v0.6.0 closes the directive-conformant re-approval loop. Operators can add
.agent-runs/<run-id>/directive.yaml before a run to pre-declare the exact
manifest, exact scope lock, plan acceptance assertions, and manager acceptance
assertions. The runner binds the directive hash into run.log, auto-approves
the manifest and plan gates only when on-disk artifacts mechanically conform,
and extends auto-promote from six conditions to six plus the directive's
manager assertions.
Manual fallback is preserved: absent directives, malformed directives,
mismatches, failed assertions, and hash tampering keep or restore the existing
interactive gate. Judge-layer high-risk confirmations and escalate prompts
remain human surfaces because a directive cannot know future tool calls.
v0.5.10 closes the stale-control-state escape hatch by centralizing stop
truth in stop_validator.py. final_response_gate.py,
pipeline_continue.py, and agent_decision_gate.py now call the same validator
instead of each trusting active-control-state.md independently. Human gates
are valid only at manifest, plan, or manager stages; failed-gate stops must be
backed by current run.log evidence when a log exists; stale scope-repair
states cannot authorize stopping after a passing scope-lock-receipt.txt; and
unrecorded blocker claims require an evidence file, not just plausible text.
v0.5.9 adds the missing guard between "the agent is authorized to continue"
and "the agent is working on the right thing." A run now carries
.agent-runs/<run-id>/scope-lock.yaml, which binds the work to a specific
section of the canonical release plan. The policy stage blocks missing locks,
release-plan mismatches, future-rung paths, contradictory docs, and commit
subjects that belong to another rung.
Before starting rung work from a prompt, the orchestrator runs:
python scripts/policy/agent_decision_gate.py --run <run-id> --intent start_rung_work --claimed-rung <rung> --prompt-text "<user wording>"If the user says "Begin v0.6 publish-dashboard work" while the release plan
says v0.6 is "Summary + signed records," the gate returns SCOPE_CONFLICT.
The agent stops before edits and requires an explicit scope amendment. It may
not infer that the release ladder changed.
- It will not propose broad autonomous mode. Hooks add guardrails around explicit pipeline authority; every gate remains explicit or mechanically directive-bound.
- It will not silently expand scope. The policy stage blocks any change outside
allowed_paths. - It will not silently change rungs. The scope-lock gates block prompt, path, docs, and commit-message conflicts with the canonical release plan.
- It will not skip tests. AGENTS.md hard-rule "never skip tests" is enforced as a project default.
- It will not promote a run if the verifier marked any criterion NOT MET. Manager hard rule.
After pipeline-init, the files installed in your repo are yours to edit. Add project-specific policy checks alongside the generic ones (e.g., a CivicCast-style check_ffmpeg_wrapper.py). Customize the role files to reference your project's specific ADRs, AGENTS.md sections, test patterns. The plugin's workflow skills work against whatever's in your repo's .pipelines/ and scripts/policy/ directories.
skills/*/SKILL.md- Codex skill entrypoints for onboarding, new-run, validation, run-pipeline, status, and audit-initcommands/pipeline-init.md- the full onboarding workflow logiccommands/new-run.md- the run-init workflow logiccommands/validate-manifest.md- manifest schema preflightcommands/run-pipeline.md- the orchestrator workflow logic (includes v0.4 Handler 3a judge interception and v0.5 Handler 4 auto-promote-aware manager)commands/show-run-status.md- read-only run status workflowcommands/audit-init.md- scaffolds the v0.3 dual-AI audit-handoff infrastructurepipelines/roles/*.md- what each role does, what's forbiddenpipelines/manifest-template.yaml- every manifest field with inline docspipelines/scope-lock-template.yaml- canonical rung/scope lock templatescripts/auto_promote.py- v0.5 six-condition promote-eligibility checker (--versionto confirm release)scripts/check_manifest_schema.py- v0.5 manifest validator (--versionto confirm release)scripts/check_scope_lock.py- canonical release-plan rung validatorscripts/check_rung_file_ownership.py- future-rung path and commit-subject blockerscripts/check_release_docs_consistency.py- docs vs locked rung consistency gatescripts/validate_manifest.py- standalone manifest preflight wrapper for source and scaffolded projectsscripts/check_decision_ledger.py- schema-v1 decision-ledger validatorscripts/show_run_status.py- read-only status summary over run.log and active-control-state.pipelines/templates/workflow-cost-directives.md- canonical GitHub Actions cost-discipline contractUSER-MANUAL.md- end-to-end operator reference, every command + gate + troubleshooting caseARCHITECTURE.md- diagrams, file layout, artifact data flow, extension pointsCHANGELOG.md- every release with rationale, what changed, what didn't
This plugin's defaults reflect failures from prior projects. Notably:
- Halts apply to ALL repo state changes, including in-flight cleanup work
- Auto mode never overrides explicit stops
- "Tests pass locally" is not evidence; CI on a fresh dep set is
- Manifest amendments are corrections, not expansions, and the pattern of needing them means the manifest template should use directory-level path granularity for same-module test paths
- The manager must cite verifier evidence verbatim; encouragement and summarization are forbidden in the role file
Apache 2.0.