feat: add JSON settings runtime policy flow by bma-d · Pull Request #1 · bmad-code-org/automator

bma-d · 2026-04-14T08:12:50Z

Summary

add the JSON settings implementation plan plus bundled policy, prompt templates, parse contracts, and review contract payloads
move runtime prompt, parse, verifier, retry-budget, and state-snapshot behavior onto runtime_policy and shared success verifier flows
extend unit, smoke, and verify coverage for snapshot resume semantics, parser/runtime policy loading, and validator compatibility edge cases

Testing

npm run verify

Summary by CodeRabbit

New Features
- Runtime policy system with JSON-configurable orchestration, pinned policy snapshots, and a new unified verify-step CLI; validate-story-creation documented as a legacy wrapper.
Bug Fixes & Improvements
- Fail-closed resume semantics, stricter state-file propagation across create/build→monitor→verify handoffs, clearer verifier/parse error responses, and safer asset/snapshot path validation.
Documentation
- Expanded changelogs, CLI/dev guides, plans, and examples describing policy/verifier flows and migration.
Tests
- Added unit/integration suites for policy, parsing, verifiers, state snapshots, and resume behavior; added Python test script.

coderabbitai · 2026-04-14T08:13:04Z

Walkthrough

Adds a runtime policy system and verifier registry, refactors CLI to forward --state-file and use policy-driven step contracts, introduces orchestration policy/prompt/parse artifacts, snapshots/policy-hash resume semantics, many new tests, and extensive documentation and changelog updates.

Changes

Cohort / File(s)	Summary
Core policy & verifiers `source/src/story_automator/core/runtime_policy.py`, `source/src/story_automator/core/success_verifiers.py`	New runtime policy loader/merge/validation/snapshot APIs and a verifier registry with implementations for session_exit, create_story_artifact, review_completion, epic_complete; policy hash and path-safety checks.
CLI command surface `source/src/story_automator/commands/orchestrator.py`, `source/src/story_automator/commands/orchestrator_parse.py`, `source/src/story_automator/commands/tmux.py`, `source/src/story_automator/commands/validate_story_creation.py`, `source/src/story_automator/commands/state.py`	Added `verify-step`, propagated `--state-file` handling, wired CLI flows to load runtime policy/step contracts, unified verifier dispatch, stronger error JSON shapes and exit-code behavior.
Workflow path resolution `source/src/story_automator/core/workflow_paths.py`	Replaced filesystem-first asset discovery with policy-driven step asset resolution and centralized path factory.
Parse/agent runtime `source/src/story_automator/commands/orchestrator_parse.py`	Parse-output uses policy-provided parse schema/prompt and runtime parser config (provider/model/timeout); stronger validation of sub-agent JSON against requiredKeys/schema.
State handling & templates `payload/.claude/skills/bmad-story-automator/templates/state-document.md`, `docs/state-and-resume.md`, `source/src/story_automator/commands/state.py`	State frontmatter now includes `policyVersion`, `policySnapshotFile`, `policySnapshotHash`, `legacyPolicy`; state build validates/snapshots effective policy and resume semantics require snapshot/hash for new-format state.
Payload policy & assets `payload/.claude/skills/bmad-story-automator/data/orchestration-policy.json`, `payload/.claude/skills/bmad-story-automator-review/contract.json`, `payload/.claude/skills/bmad-story-automator/data/prompts/`, `payload/.claude/skills/bmad-story-automator/data/parse/`	Added orchestration-policy.json (runtime settings, workflow, step configs), review contract, new prompt templates and parse schemas for steps.
Payload docs / handoffs `payload/.claude/skills/bmad-story-automator/data/monitoring-pattern.md`, `payload/.claude/skills/bmad-story-automator/data/code-review-loop.md`, `payload/.claude/skills/bmad-story-automator/data/retry-fallback-implementation.md`, `payload/.claude/skills/bmad-story-automator/steps-c/*`, `payload/.claude/skills/bmad-story-automator/workflow.md`	Examples updated to propagate `--state-file`, replace legacy create validation with `orchestrator-helper verify-step`, and switch spawn→monitor→parse to spawn→monitor→verify flows.
CLI docs & developer flow `docs/cli-reference.md`, `payload/.claude/skills/bmad-story-automator/data/scripts-reference.md`, `docs/development.md`	Documented `orchestrator-helper verify-step`, marked `validate-story-creation` as legacy wrapper, added verify/create examples, and added `npm run test:python` into verification flow.
Package & smoke checks `package.json`, `scripts/smoke-test.sh`	Added `test:python` npm script and inserted it into `verify`; smoke test expanded to assert new policy assets and `--state-file` usage in docs.
Tests `source/tests/test_runtime_policy.py`, `source/tests/test_success_verifiers.py`, `source/tests/test_orchestrator_parse.py`, `source/tests/test_state_policy_metadata.py`, `source/tests/test_success_verifiers.py`	New comprehensive unittest suites covering runtime policy loading/merge/snapshotting, verifier behavior and safety, parse-output validation, state/policy metadata, and CLI integration.
Docs / plans / changelog `docs/plans/json-settings/*`, `docs/changelog/{260413,260414,260415}.md`	Added multi-file JSON-settings design and migration plan, and extended changelog entries documenting snapshot, verifier, and resume semantics.
Small payload updates `payload/.claude/skills/bmad-story-automator/data/retry-fallback-implementation.md`, `payload/.claude/skills/bmad-story-automator/data/scripts-reference.md`, `payload/.claude/skills/bmad-story-automator/steps-c/step-03a-execute-review.md`	Updated scripts/docs to forward `--state-file` and to use verifier-driven checks instead of before/after deltas.

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant RuntimePolicy as RuntimePolicy\n(runner)
  participant Verifier as SuccessVerifiers\n(verifier)
  participant ParserAgent as SubAgent\n(parser)
  participant FS as Filesystem

  Note over CLI,RuntimePolicy: CLI invokes spawn/monitor/verify-step/parse-output
  CLI->>RuntimePolicy: load_runtime_policy(state_file?)
  RuntimePolicy->>FS: read bundled/override policy & snapshot
  FS-->>RuntimePolicy: policy JSON + snapshot hash
  RuntimePolicy-->>CLI: step contract + parser/runtime config

  alt verify-step / monitor completion
    CLI->>Verifier: run_success_verifier(step, story_key, contract, state_file)
    Verifier->>FS: read artifacts / story files / sprint-status
    FS-->>Verifier: artifact list / frontmatter / sprint data
    Verifier-->>CLI: {verified: true|false, reason?, metadata}
  else parse-output
    CLI->>ParserAgent: run sub-agent with provider/model/timeout and prompt
    ParserAgent->>FS: read prompt templates / parse schema if needed
    ParserAgent-->>CLI: agent stdout (JSON)
    CLI->>RuntimePolicy: validate returned JSON against parse schema
    RuntimePolicy-->>CLI: validation result (ok / error reason)
  end

  CLI->>FS: write/read policy snapshot or emit structured JSON result

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I nibbled at JSON, pinned a tiny map,
Verifiers hopped on, checked every gap.
Prompts and schemas, snug in a row,
State-file in paw, ready to go.
A rabbit cheers: "Snapshots — steady and sap!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: add JSON settings runtime policy flow' clearly describes the main change: introducing a JSON-based runtime policy system. It is concise, specific, and aligns with the substantial refactoring across documentation, configuration, and code modules.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bma-d/json-settings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

augmentcode · 2026-04-14T08:16:55Z

🤖 Augment PR Summary

Summary: This PR introduces a snapshot-backed JSON “runtime policy” layer for story-automator so prompts, parse contracts, budgets, and success verification become data-driven and deterministic on resume.

Changes:

Adds bundled policy + prompt templates + parse schema JSON under the installed skill payload.
Implements runtime_policy.py to load/merge overrides, validate, resolve assets/paths, and write pinned policy snapshots.
Adds a named success verifier registry (success_verifiers.py) and routes review verification through a structured contract file.
Refactors tmux-wrapper build-cmd and monitor-session to use policy-backed prompt rendering and verifier dispatch.
Adds orchestrator-helper verify-step and updates docs/workflow examples to pass --state-file consistently.
Extends unit + smoke coverage and updates npm run verify to include Python unit tests.

Technical Notes: Resume paths for new-format state docs are enforced to use the pinned snapshot (hash-verified), while legacy state docs continue to use bundled defaults.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 3 suggestions posted.

Comment augment review to trigger a new review at any time.

coderabbitai

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

payload/.claude/skills/bmad-story-automator/steps-c/step-03-execute.md (1)
155-157: ⚠️ Potential issue | 🟠 Major

Pass --state-file in dev build-cmd to keep policy snapshot pinned.
Line 155-157 still builds the dev command without --state-file, while create now pins it. This can cause create/dev to resolve different runtime policy sources in the same orchestration run.
Proposed fix
 session=$("$scripts" tmux-wrapper spawn dev {epic} {story_id} \
   --agent "$current_agent" \
-  --command "$("$scripts" tmux-wrapper build-cmd dev {story_id} --agent "$current_agent")")
+  --command "$("$scripts" tmux-wrapper build-cmd dev {story_id} --agent "$current_agent" --state-file "$state_file")")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@payload/.claude/skills/bmad-story-automator/steps-c/step-03-execute.md`
around lines 155 - 157, The dev tmux-wrapper invocation (session=...
tmux-wrapper spawn dev) builds the dev command via tmux-wrapper build-cmd but
omits the --state-file flag, so dev can pick a different policy snapshot than
create; update the build-cmd call inside the session assignment to pass the same
--state-file value used by create (e.g. --state-file "$state_file" or
{state_file}) so tmux-wrapper build-cmd dev {story_id} --agent "$current_agent"
includes --state-file and both spawn dev and build-cmd use the pinned policy
snapshot.
source/src/story_automator/commands/tmux.py (1)
206-219: ⚠️ Potential issue | 🔴 Critical

Use shlex.quote() to properly escape the rendered prompt before passing it to the shell.

The current escaping prompt.replace("\\", "\\\\").replace('"', '\\"') only neutralizes \ and " characters. However, within double-quoted shell strings, $VAR, $(...), and backticks will still be expanded. Since prompt is constructed from policy templates (repo-controlled) and extra_instruction (CLI-provided), these shell metacharacters create a command injection vulnerability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/src/story_automator/commands/tmux.py` around lines 206 - 219, The
prompt built by _render_step_prompt is being escaped manually into escaped and
passed into shell strings which leaves shell metacharacters injectable; replace
the manual replace-based escaping with shlex.quote(prompt) (import shlex at top)
and use that quoted value in both branches (the codex exec command and the
fallback print using cli) so the prompt is safely shell-escaped; update
references to escaped in the codex branch (the CODEX_EXEC string) and the else
branch (the cli invocation) to use shlex.quote(prompt) instead of the current
escaped variable.

🧹 Nitpick comments (1)

source/tests/test_state_policy_metadata.py (1)

254-262: Assert the state-file value is excluded too.

This only checks that the literal --state-file token is absent. A parser regression could still leak str(state_file) into the rendered prompt and this test would stay green.

✅ Small test hardening

         rendered = stdout.getvalue()
         self.assertNotIn("--state-file", rendered)
+        self.assertNotIn(str(state_file), rendered)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@source/tests/test_state_policy_metadata.py` around lines 254 - 262, The test
test_build_cmd_does_not_treat_state_file_flag_as_prompt_text currently only
asserts the literal "--state-file" is absent from the rendered output; extend
the assertion to also ensure the actual state file path value (the
str(state_file) variable) is not present. Modify the test to add an assertion
like self.assertNotIn(str(state_file), rendered) after captured output is read
so the test rejects any regression that leaks the state file value into the
rendered prompt.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/development.md`:
- Around line 16-18: The verify sequence is inconsistent: the bullet list under
`npm run verify` currently lists `npm run test:python -> npm run pack:dry-run ->
npm run test:smoke` but the Mermaid diagram shows `npm run test:python -> npm
run test:smoke -> npm run pack:dry-run`; pick the correct canonical order and
make both places match by editing either the bullet list or the Mermaid flow to
use the same sequence of scripts (`npm run test:python`, `npm run pack:dry-run`,
`npm run test:smoke` or the alternate ordering), ensuring the script names
(`test:python`, `pack:dry-run`, `test:smoke`, and the `npm run verify`
description) are updated consistently.

In `@docs/plans/json-settings/02-policy-model.md`:
- Around line 132-160: The example uses unimplemented keys (workflow.optional
and workflow.triggers) and models "retro" as a trigger
(retrospective_on_epic_complete), which will be rejected by the current
validator; update the example to only use the implemented policy surface: keep
workflow.sequence (include "retro" in the sequence if you want that step),
workflow.repeat (review block with maxCycles, successVerifier, onIncomplete,
onExhausted) and workflow.crash (maxRetries, onExhausted), and remove
workflow.optional, the triggers array, and the retrospective_on_epic_complete
trigger entry as well as skipWhenOverride/skipAutomate usage so the example
matches the bundled policy and covered tests.

In `@docs/plans/json-settings/README.md`:
- Around line 61-63: The README currently lists only the snapshot pointer/hash
but omits other runtime-required metadata; update the "state-policy metadata
contract" to explicitly enumerate all four fields used by the runtime loader:
the snapshot pointer, snapshot hash, policyVersion, and legacyPolicy (or
alternatively add a cross-reference link to the section that fully defines
them). For clarity, show expected types/semantics for each field (e.g., pointer:
string, hash: string, policyVersion: semver/string, legacyPolicy: boolean) and
indicate how the loader uses policyVersion and legacyPolicy to distinguish
legacy state from snapshot-backed state.

In `@payload/.claude/skills/bmad-story-automator-review/workflow.yaml`:
- Line 19: The top-level "contract" field in workflow.yaml is unused by the
runtime; remove or deprecate the top-level contract entry ("contract:
\"./contract.json\"") and ensure all policy contracts are referenced via
step-level settings ("contractFile" / "contractPath") in the step configs or
update the loader to read the top-level field—pick one approach: either delete
the top-level "contract" key from workflow.yaml to avoid confusion, or modify
the policy resolution code to merge/consider the top-level "contract" alongside
step-level "contractFile"/"contractPath" so runtime enforcement is consistent.

In `@payload/.claude/skills/bmad-story-automator/data/orchestration-policy.json`:
- Around line 110-115: The policy currently relies on the text field
defaultExtraInstruction to trigger autonomous auto-fixes which is brittle;
instead add a stable boolean flag (e.g., "autoFix": true) at the same level as
"prompt" or inside the "prompt" object and remove reliance on
defaultExtraInstruction for behavior control; update the
orchestration-policy.json entry containing "prompt", "templateFile",
"interactionMode", "acceptExtraInstruction", and "defaultExtraInstruction" so
that the runtime checks the new "autoFix" flag (and ignore/override
defaultExtraInstruction) to deterministically enable auto-fix behavior.
- Around line 43-49: The snapshot currently records only
templateFile/schemaFile/contract paths (e.g., templateFile, schemaFile,
contractPath) while runtime consumers call read_text() to load files, so mid-run
edits bypass snapshot integrity; change the snapshot logic that builds
orchestration-policy.json to store the actual file contents (e.g.,
templateContent, schemaContent, contractContent) or at minimum the content hash
alongside paths, and update consumers that now call read_text() to instead use
the embedded content from the snapshot (or verify read_text() results against
the stored content hash and fail if mismatched); specifically update the code
that resolves "prompt.templateFile"/"parse.schemaFile" to capture file contents
or hashes at snapshot time and use those fields during resume.

In `@payload/.claude/skills/bmad-story-automator/data/parse/review.json`:
- Around line 2-14: The parser currently only checks top-level "requiredKeys"
presence but must enforce the nested "schema" rules; update the
parsing/validation logic so that "status" only accepts the enum values
SUCCESS|FAILURE|AMBIGUOUS, "issues_found" is an object with integer fields
"critical","high","medium","low", "all_fixed" is a boolean (true|false), and
"next_action" only allows proceed|retry|escalate; locate the code that reads the
"requiredKeys"/"schema" definition (the parser/validator handling review.json)
and implement recursive shape validation and enum/type checks for those specific
keys ("status","issues_found","all_fixed","next_action") and return/raise a
clear validation error when the shape or types do not match.

In `@source/src/story_automator/commands/orchestrator_parse.py`:
- Around line 69-77: The _load_parse_contract function currently only checks
that payload.get("requiredKeys") is a list; update _load_parse_contract to
validate that each element of requiredKeys is a non-empty string (or other
expected key type) and raise ValueError("invalid parse schema") if any item is
not; locate the checks in _load_parse_contract (payload, requiredKeys, schema)
and add an iteration over requiredKeys to assert isinstance(k, str) and
k.strip() != "" (or the project-specific key format), so malformed entries like
true are rejected before runtime and avoid downstream _has_required_keys
failures.
- Around line 14-21: The loop that parses args currently silently ignores a bare
"--state-file" token and leaves state_file as empty, causing
load_runtime_policy(...) to read the live policy; change the parsing in the
while loop that inspects args and state_file so that when args[idx] ==
"--state-file" but there is no following value (idx + 1 >= len(args)) you fail
closed: raise or return an error (e.g., raise ValueError or exit with non-zero
and clear error message) indicating a malformed "--state-file" flag instead of
continuing; ensure the error path is tied to the same parsing block that sets
state_file so load_runtime_policy(...) is only called when a valid non-empty
state_file was provided.

In `@source/src/story_automator/commands/orchestrator.py`:
- Around line 424-444: The loop over tail should treat a flag present without a
following value as a contract error instead of silently ignoring it and the
verifier run must not allow ValueError to escape; update the for loop that
inspects tail (the block that sets state_file/output_file) so that if arg ==
"--state-file" or "--output-file" but idx+1 is out of range or
tail[idx+1].startswith("--"), raise a PolicyError (or otherwise set payload to
the same verifier_contract_invalid error path) instead of ignoring it, and
expand the exception handler to catch ValueError in addition to
FileNotFoundError and PolicyError so that resolve_success_contract and
run_success_verifier errors (including malformed contract ValueError) map to the
verifier_contract_invalid payload.

In `@source/src/story_automator/commands/state.py`:
- Line 46: The snapshot_effective_policy(get_project_root()) call in the
state-doc creation path can raise and currently will crash the command; wrap
that call in a try/except, catch exceptions from snapshot_effective_policy, log
the error, and ensure the command returns a structured JSON result instead of
propagating the exception (for example include keys like "policy_snapshot": null
and "policy_error": "<error message>" in the state-doc output). Update the code
around snapshot_effective_policy/get_project_root so failures are handled
gracefully and the rest of state-doc creation continues or returns the handled
JSON error response.

In `@source/src/story_automator/commands/tmux.py`:
- Around line 787-788: The current branch returning None when verifier_name is
one of {"create_story_artifact", "review_completion", "epic_complete"} and
story_key.strip() is empty hides a verifier misconfiguration; change this to
return an explicit failure (e.g., raise a ValueError or return a failure
sentinel) instead of None so cmd_monitor_session can detect and abort rather
than falling back to normal_completion; update the branch around the
verifier_name check that references story_key.strip() to raise (or return) a
clear error containing the verifier_name and that story_key is required.

In `@source/src/story_automator/commands/validate_story_creation.py`:
- Around line 180-184: The try/except around
create_check_payload()/build_check_response() currently only catches PolicyError
and ValueError so a FileNotFoundError from resolve_success_contract (when
--state-file points to a missing snapshot or resolved contract is absent)
bubbles up; update the exception handling to also catch FileNotFoundError and
call print_check_error(story_id, reason=str(exc), before_count=before_count,
after_count=after_count) so missing snapshot/contract errors are normalized into
the JSON retry/error payload. Ensure the change references create_check_payload,
build_check_response, resolve_success_contract, and print_check_error so the
missing-file path is handled the same as PolicyError/ValueError.

In `@source/src/story_automator/core/review_verify.py`:
- Around line 16-20: The wrapper currently uses "success_contract or
resolve_success_contract(...)" which prevents an injected empty contract and
only catches FileNotFoundError/ValueError, so PolicyError leaks; change to
select the contract with an explicit None check (if success_contract is None:
contract = resolve_success_contract(... ) else: contract = success_contract) and
extend the except clause to include PolicyError (e.g., except
(FileNotFoundError, ValueError, PolicyError) as exc) so malformed/missing
contracts are normalized to the existing {"verified": False, "reason":
"review_contract_invalid", "input": story_key, "error": str(exc)} return;
reference resolve_success_contract and review_completion to locate the code to
update.

In `@source/src/story_automator/core/runtime_policy.py`:
- Around line 70-74: The snapshot currently only hashes the policy JSON but
keeps templateFile, schemaFile, and contractFile as paths so resumed runs
re-resolve those files against the working tree; update
snapshot_effective_policy()/_stable_policy_json to read and embed the actual
contents (or content hashes) of any referenced templateFile, schemaFile, and
contractFile into the stable JSON before computing snapshot_hash and writing
snapshot_path, and ensure load_policy_snapshot() uses those embedded contents
(or verifies embedded content hashes) instead of re-reading from disk; reference
the symbols _stable_policy_json, snapshot_effective_policy,
load_policy_snapshot, templateFile, schemaFile, contractFile, snapshot_dir,
snapshot_path, and policySnapshotHash when making the change.
- Around line 68-74: The code builds snapshot_dir from
_snapshot_relative_dir(policy) and then calls ensure_dir/write_atomic, allowing
an absolute or path-traversal override via snapshot.relativeDir; restrict this
by validating and normalizing the returned path against the workspace root:
obtain root (project_root), compute candidate = (root /
_snapshot_relative_dir(policy)).resolve(), ensure
candidate.is_relative_to(root.resolve()) (or compute candidate.relative_to(root)
and raise/log and abort if it fails), and use that constrained candidate for
ensure_dir and write_atomic; apply the same validation/sanitization to the other
places that use _snapshot_relative_dir or snapshot.relativeDir (the occurrences
around the other reported blocks).
- Around line 354-386: Both _resolve_candidate_file and _resolve_data_path
currently allow absolute paths and parent traversal to arbitrary files; update
them to reject any resolved path that escapes its allowed root(s). For
_resolve_candidate_file (references: function name _resolve_candidate_file,
variables skill_dir, candidates, project_root) resolve each candidate path and
ensure candidate.resolve().is_relative_to(skill_dir.resolve()) (or use try:
candidate.resolve().relative_to(skill_dir.resolve())) before checking is_file(),
and if it escapes raise PolicyError. For _resolve_data_path (references:
function name _resolve_data_path, parameters project_root and bundle_root)
disallow absolute paths that are outside both bundle_root.resolve() and
project_root.resolve(); for relative candidates resolve against each base and
verify resolved_candidate.relative_to(base.resolve()) before accepting it,
otherwise raise PolicyError indicating the path escapes allowed roots. Ensure
checks occur prior to returning file paths so no external filesystem locations
can be referenced.

In `@source/tests/test_runtime_policy.py`:
- Around line 120-132: The tests currently expect load_runtime_policy to
silently fall back to the live policy when a .claude/.story-automator-active
marker is dangling or malformed; instead update the two tests
(test_missing_marker_state_falls_back_to_effective_policy and
test_malformed_marker_falls_back_to_effective_policy) to assert that
load_runtime_policy(str(self.project_root)) raises PolicyError for those marker
conditions (dangling stateFile and invalid JSON respectively), so the codebase
fails closed on a broken active-run marker rather than switching back to the
live policy.

---

Outside diff comments:
In `@payload/.claude/skills/bmad-story-automator/steps-c/step-03-execute.md`:
- Around line 155-157: The dev tmux-wrapper invocation (session=... tmux-wrapper
spawn dev) builds the dev command via tmux-wrapper build-cmd but omits the
--state-file flag, so dev can pick a different policy snapshot than create;
update the build-cmd call inside the session assignment to pass the same
--state-file value used by create (e.g. --state-file "$state_file" or
{state_file}) so tmux-wrapper build-cmd dev {story_id} --agent "$current_agent"
includes --state-file and both spawn dev and build-cmd use the pinned policy
snapshot.

In `@source/src/story_automator/commands/tmux.py`:
- Around line 206-219: The prompt built by _render_step_prompt is being escaped
manually into escaped and passed into shell strings which leaves shell
metacharacters injectable; replace the manual replace-based escaping with
shlex.quote(prompt) (import shlex at top) and use that quoted value in both
branches (the codex exec command and the fallback print using cli) so the prompt
is safely shell-escaped; update references to escaped in the codex branch (the
CODEX_EXEC string) and the else branch (the cli invocation) to use
shlex.quote(prompt) instead of the current escaped variable.

---

Nitpick comments:
In `@source/tests/test_state_policy_metadata.py`:
- Around line 254-262: The test
test_build_cmd_does_not_treat_state_file_flag_as_prompt_text currently only
asserts the literal "--state-file" is absent from the rendered output; extend
the assertion to also ensure the actual state file path value (the
str(state_file) variable) is not present. Modify the test to add an assertion
like self.assertNotIn(str(state_file), rendered) after captured output is read
so the test rejects any regression that leaks the state file value into the
rendered prompt.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0373739-64c5-4287-bdbb-6d49573cb2e8

📥 Commits

Reviewing files that changed from the base of the PR and between 27e6a1c and 55b2fd4.

📒 Files selected for processing (45)

docs/changelog/260413.md
docs/cli-reference.md
docs/development.md
docs/plans/json-settings/01-why-json-settings.md
docs/plans/json-settings/02-policy-model.md
docs/plans/json-settings/03-code-and-payload-changes.md
docs/plans/json-settings/04-migration-testing-and-risks.md
docs/plans/json-settings/README.md
docs/plans/json-settings/TODO.md
docs/state-and-resume.md
package.json
payload/.claude/skills/bmad-story-automator-review/contract.json
payload/.claude/skills/bmad-story-automator-review/workflow.yaml
payload/.claude/skills/bmad-story-automator/data/code-review-loop.md
payload/.claude/skills/bmad-story-automator/data/monitoring-pattern.md
payload/.claude/skills/bmad-story-automator/data/orchestration-policy.json
payload/.claude/skills/bmad-story-automator/data/parse/auto.json
payload/.claude/skills/bmad-story-automator/data/parse/create.json
payload/.claude/skills/bmad-story-automator/data/parse/dev.json
payload/.claude/skills/bmad-story-automator/data/parse/retro.json
payload/.claude/skills/bmad-story-automator/data/parse/review.json
payload/.claude/skills/bmad-story-automator/data/prompts/auto.md
payload/.claude/skills/bmad-story-automator/data/prompts/create.md
payload/.claude/skills/bmad-story-automator/data/prompts/dev.md
payload/.claude/skills/bmad-story-automator/data/prompts/retro.md
payload/.claude/skills/bmad-story-automator/data/prompts/review.md
payload/.claude/skills/bmad-story-automator/data/retry-fallback-implementation.md
payload/.claude/skills/bmad-story-automator/data/scripts-reference.md
payload/.claude/skills/bmad-story-automator/steps-c/step-03-execute.md
payload/.claude/skills/bmad-story-automator/templates/state-document.md
payload/.claude/skills/bmad-story-automator/workflow.md
scripts/smoke-test.sh
source/src/story_automator/commands/orchestrator.py
source/src/story_automator/commands/orchestrator_parse.py
source/src/story_automator/commands/state.py
source/src/story_automator/commands/tmux.py
source/src/story_automator/commands/validate_story_creation.py
source/src/story_automator/core/review_verify.py
source/src/story_automator/core/runtime_policy.py
source/src/story_automator/core/success_verifiers.py
source/src/story_automator/core/workflow_paths.py
source/tests/test_orchestrator_parse.py
source/tests/test_runtime_policy.py
source/tests/test_state_policy_metadata.py
source/tests/test_success_verifiers.py

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

payload/.claude/skills/bmad-story-automator/steps-c/step-03a-execute-review.md (1)

77-84: ⚠️ Potential issue | 🟠 Major

The compact review summary drops issue counts from the shipped review contract.

parse-output "$review_log" review is validated against the review parse contract, which guarantees issues_found and summary, not an .issues array. In this form, issues_count will stay 0 and top_issues will stay [] unless the model happens to emit undocumented extras, so the parent flow loses the actual review signal.

Suggested fix

-review_summary=$("$scripts" orchestrator-helper parse-output "$review_log" review --state-file "$state_file" | jq -c '
-  {
-    next_action: (.next_action // "retry"),
-    confidence: (.confidence // 0),
-    error_class: (.error_class // "unknown"),
-    issues_count: ((.issues // []) | length),
-    top_issues: ((.issues // [])[:3])
-  }
-')
+review_summary=$("$scripts" orchestrator-helper parse-output "$review_log" review --state-file "$state_file" | jq -c '
+  {
+    next_action: (.next_action // "retry"),
+    confidence: (.confidence // 0),
+    error_class: (.error_class // "unknown"),
+    issues_count: ((.issues_found // {}) | [.critical, .high, .medium, .low] | map(. // 0) | add),
+    top_issues: ([.summary] | map(select(type == "string" and length > 0))[:3])
+  }
+')

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@payload/.claude/skills/bmad-story-automator/steps-c/step-03a-execute-review.md`
around lines 77 - 84, The compact review summary is reading a non-existent
.issues array so issues_count and top_issues are always empty; update the
parsing of the output produced by parse-output "$review_log" review to use the
contract fields (e.g. use .issues_found for the count and .summary for the
returned summary/top issues) instead of .issues; modify the block that builds
review_summary (the jq transform invoked after parse-output) to compute
issues_count from (.issues_found // 0) and set top_issues from (.summary // []
or appropriate shape) while preserving the existing
next_action/confidence/error_class fallbacks.

🧹 Nitpick comments (2)

source/src/story_automator/core/runtime_policy.py (1)
501-516: Consider clarifying the fallback logic for missing policy fields.

The _state_policy_mode function returns ("", "", True) (legacy mode) in two cases: when legacyPolicy == "true" and when all policy fields are absent. This fallback behavior is correct but the duplicate return on lines 515-516 could be consolidated.
💡 Simplify fallback return

The final two returns (lines 514-516) both return ("", "", True). Consider consolidating:
     if legacy_policy == "true":
         return "", "", True
-    return "", "", True
+    # No policy metadata present - treat as legacy
+    return "", "", True
A comment would clarify that reaching line 516 means no policy metadata was found, defaulting to legacy mode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/src/story_automator/core/runtime_policy.py` around lines 501 - 516, In
_state_policy_mode, consolidate the duplicate final returns that both return
("", "", True) by removing the redundant branch and leaving a single default
return for legacy mode; specifically, handle the explicit legacy_policy ==
"true" case with one return and then have a final return ("", "", True) as the
default when no metadata is present, and add a short comment explaining that the
final return is the fallback to legacy mode when no policy metadata is supplied.
source/tests/test_orchestrator_parse.py (1)
131-146: Verify mock assertion for runtime config wiring.

The test checks that run_cmd is called with expected provider/model arguments and timeout. However, the assertion uses call_args.args[:4] which assumes positional argument ordering.
💡 Consider using keyword argument assertions for clarity

If run_cmd signature changes, this test could become fragile. Consider verifying specific arguments by name if the mock supports it, or add a comment documenting the expected argument order.
# Current assertion relies on positional order: (provider, "-p", "--model", model)
self.assertEqual(mock_run.call_args.args[:4], ("claude", "-p", "--model", "sonnet"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/test_orchestrator_parse.py` around lines 131 - 146, The
assertion in test_parser_runtime_uses_policy_settings is fragile because it
inspects run_cmd positional args via mock_run.call_args.args[:4]; update the
test to assert the specific parameters more robustly by either checking
mock_run.call_args.kwargs for named parameters (e.g., "timeout") and using
helper checks for the provider/model flags, or use mock_run.assert_any_call /
mock_run.assert_called_with with the exact expected signature to explicitly
verify ("claude", "-p", "--model", "sonnet") and timeout=33; update references
to run_cmd, parse_output_action, and mock_run accordingly so the test no longer
depends on positional ordering.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/src/story_automator/commands/tmux.py`:
- Around line 793-801: The code in resolve_success_contract handling returns
None when the contract omits "verifier", causing cmd_monitor_session() to treat
the run as "normal_completion"; change that behavior to fail closed like
orchestrator.py by returning the same error tuple used for invalid contracts.
Specifically, inside the block after verifier_name =
str(contract.get("verifier") or "").strip(), replace the "if not verifier_name:
return None" with a return ({"verified": False, "reason":
"verifier_contract_invalid"}, "") so resolve_success_contract /
cmd_monitor_session and the verifier_name logic (and the subsequent special-case
check for create_story_artifact/review_completion/epic_complete) handle missing
verifiers as an invalid contract rather than a successful normal completion.
- Around line 202-216: The code currently catches
load_runtime_policy/step_contract errors but calls _render_step_prompt
afterwards which can raise FileNotFoundError (missing template) and crash;
update the flow to catch template-loading/rendering errors too by either moving
the call to _render_step_prompt (and subsequent shlex.quote usage) into the same
try block that handles load_runtime_policy and step_contract, or wrap the
_render_step_prompt call in its own try/except that catches FileNotFoundError
(and PolicyError if applicable), prints the exception to stderr (same style as
existing handler) and returns 1; reference policy, contract,
_render_step_prompt, prompt, and shlex.quote so the fix keeps consistent error
handling for missing prompt templates.

---

Outside diff comments:
In
`@payload/.claude/skills/bmad-story-automator/steps-c/step-03a-execute-review.md`:
- Around line 77-84: The compact review summary is reading a non-existent
.issues array so issues_count and top_issues are always empty; update the
parsing of the output produced by parse-output "$review_log" review to use the
contract fields (e.g. use .issues_found for the count and .summary for the
returned summary/top issues) instead of .issues; modify the block that builds
review_summary (the jq transform invoked after parse-output) to compute
issues_count from (.issues_found // 0) and set top_issues from (.summary // []
or appropriate shape) while preserving the existing
next_action/confidence/error_class fallbacks.

---

Nitpick comments:
In `@source/src/story_automator/core/runtime_policy.py`:
- Around line 501-516: In _state_policy_mode, consolidate the duplicate final
returns that both return ("", "", True) by removing the redundant branch and
leaving a single default return for legacy mode; specifically, handle the
explicit legacy_policy == "true" case with one return and then have a final
return ("", "", True) as the default when no metadata is present, and add a
short comment explaining that the final return is the fallback to legacy mode
when no policy metadata is supplied.

In `@source/tests/test_orchestrator_parse.py`:
- Around line 131-146: The assertion in test_parser_runtime_uses_policy_settings
is fragile because it inspects run_cmd positional args via
mock_run.call_args.args[:4]; update the test to assert the specific parameters
more robustly by either checking mock_run.call_args.kwargs for named parameters
(e.g., "timeout") and using helper checks for the provider/model flags, or use
mock_run.assert_any_call / mock_run.assert_called_with with the exact expected
signature to explicitly verify ("claude", "-p", "--model", "sonnet") and
timeout=33; update references to run_cmd, parse_output_action, and mock_run
accordingly so the test no longer depends on positional ordering.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c8edcd3e-5163-4da9-91f9-7aab50250164

📥 Commits

Reviewing files that changed from the base of the PR and between 55b2fd4 and e8a8f78.

📒 Files selected for processing (21)

docs/changelog/260414.md
docs/changelog/260415.md
docs/development.md
docs/plans/json-settings/02-policy-model.md
docs/plans/json-settings/03-code-and-payload-changes.md
docs/plans/json-settings/README.md
payload/.claude/skills/bmad-story-automator/steps-c/step-03-execute.md
payload/.claude/skills/bmad-story-automator/steps-c/step-03a-execute-review.md
scripts/smoke-test.sh
source/src/story_automator/commands/orchestrator.py
source/src/story_automator/commands/orchestrator_parse.py
source/src/story_automator/commands/state.py
source/src/story_automator/commands/tmux.py
source/src/story_automator/commands/validate_story_creation.py
source/src/story_automator/core/review_verify.py
source/src/story_automator/core/runtime_policy.py
source/src/story_automator/core/success_verifiers.py
source/tests/test_orchestrator_parse.py
source/tests/test_runtime_policy.py
source/tests/test_state_policy_metadata.py
source/tests/test_success_verifiers.py

✅ Files skipped from review due to trivial changes (4)

docs/changelog/260415.md
payload/.claude/skills/bmad-story-automator/steps-c/step-03-execute.md
docs/plans/json-settings/README.md
docs/plans/json-settings/02-policy-model.md

🚧 Files skipped from review as they are similar to previous changes (4)

docs/development.md
docs/plans/json-settings/03-code-and-payload-changes.md
source/src/story_automator/commands/validate_story_creation.py
source/src/story_automator/commands/orchestrator.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/src/story_automator/commands/tmux.py`:
- Around line 736-755: The code currently infers output_verified from the
presence of the output file when calling _emit_monitor, causing contradictory
JSON; instead thread the actual verifier result returned by
_verify_monitor_completion into _emit_monitor. In the branch where verification
is not None (the variables verification -> verified, verifier_name), pass
output_verified as bool(verified.get("verified")) (or the appropriate truthy
check of verified["verified"]) to _emit_monitor for both the "completed" and
"incomplete" calls rather than using bool(output) so the emitted JSON reflects
the verifier outcome; make the same change in the similar block around the lines
noted (the other _emit_monitor calls at 777-779).
- Around line 802-811: The try/except around run_success_verifier currently only
catches PolicyError, so file-missing errors (e.g., FileNotFoundError from a
verifier contract referencing a deleted JSON) escape cmd_monitor_session; update
the handler around run_success_verifier to also catch FileNotFoundError (and
optionally OSError) and return ({"verified": False, "reason":
"verifier_contract_invalid"}, verifier_name) just like the existing PolicyError
branch; touch the block that calls run_success_verifier (and related
resolve_success_contract flow) so missing verifier-side files are normalized to
the same verifier_contract_invalid result.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0dc5f9d4-2169-44a1-910f-1d56791d95a1

📥 Commits

Reviewing files that changed from the base of the PR and between e8a8f78 and d7ffdf5.

📒 Files selected for processing (4)

docs/changelog/260415.md
source/src/story_automator/commands/tmux.py
source/tests/test_state_policy_metadata.py
source/tests/test_success_verifiers.py

🚧 Files skipped from review as they are similar to previous changes (1)

docs/changelog/260415.md

coderabbitai

🧹 Nitpick comments (2)

source/tests/test_success_verifiers.py (1)
700-717: Consider importing os at module level.

The patch_env class imports os locally in each method. Since os is a standard library module with no side effects on import, moving it to the top-level imports would be more idiomatic.
🔧 Suggested change

Add to imports at top of file:
 import io
 import json
+import os
 import shutil
Then simplify the class:
     def __enter__(self) -> None:
-        import os
-
         self.previous = os.environ.get("PROJECT_ROOT")
         os.environ["PROJECT_ROOT"] = self.project_root

     def __exit__(self, exc_type, exc, tb) -> None:
-        import os
-
         if self.previous is None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/test_success_verifiers.py` around lines 700 - 717, Import the
standard library module os at the module level and remove the local imports
inside the patch_env class; update the class (patch_env) to reference os
directly in its __enter__ and __exit__ methods instead of performing "import os"
there, keeping the existing behavior of saving/restoring PROJECT_ROOT via
self.previous and os.environ.
source/src/story_automator/commands/tmux.py (1)
743-765: Dead code path at line 765.

_verify_monitor_completion() always returns a tuple[dict, str]—no code path returns None. This makes line 765 unreachable under the current implementation. While not harmful (it's defensive), consider either:

Removing the None return type from _verify_monitor_completion and this fallback, or

Adding a comment explaining this is intentional defensive code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/src/story_automator/commands/tmux.py` around lines 743 - 765, The
final fallback branch handling verification is None is unreachable because
_verify_monitor_completion() always returns a (dict, str) tuple; remove the dead
path by deleting the unconditional return that assumes verification is None and
instead rely on the verified/incomplete branches; also update
_verify_monitor_completion's return type/signature (and any callers) to reflect
it never returns None or, if you prefer defensive clarity, keep the fallback but
add a concise comment above it stating it is intentionally defensive and why.
Ensure references to _verify_monitor_completion and the local variable
verification are updated accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@source/src/story_automator/commands/tmux.py`:
- Around line 743-765: The final fallback branch handling verification is None
is unreachable because _verify_monitor_completion() always returns a (dict, str)
tuple; remove the dead path by deleting the unconditional return that assumes
verification is None and instead rely on the verified/incomplete branches; also
update _verify_monitor_completion's return type/signature (and any callers) to
reflect it never returns None or, if you prefer defensive clarity, keep the
fallback but add a concise comment above it stating it is intentionally
defensive and why. Ensure references to _verify_monitor_completion and the local
variable verification are updated accordingly.

In `@source/tests/test_success_verifiers.py`:
- Around line 700-717: Import the standard library module os at the module level
and remove the local imports inside the patch_env class; update the class
(patch_env) to reference os directly in its __enter__ and __exit__ methods
instead of performing "import os" there, keeping the existing behavior of
saving/restoring PROJECT_ROOT via self.previous and os.environ.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c94a33d9-0635-4ee7-a317-11a091c4498e

📥 Commits

Reviewing files that changed from the base of the PR and between d7ffdf5 and 85f49a4.

📒 Files selected for processing (3)

docs/changelog/260415.md
source/src/story_automator/commands/tmux.py
source/tests/test_success_verifiers.py

🚧 Files skipped from review as they are similar to previous changes (1)

docs/changelog/260415.md

bma-d added 12 commits April 14, 2026 05:12

docs: add json settings implementation plan

1fd07b7

feat: add JSON runtime policy foundation

af8c1d6

fix: harden runtime policy snapshot handling

01a3e1c

feat: wire policy-backed success verifiers

e84e4b6

fix: harden success verifier review repairs

f12226a

refactor: route create validation through verifier

5ea8540

fix: restore verify-step retry contract

1b8e2d0

fix: enforce snapshot-only resume semantics

9236024

fix: tighten state policy compatibility helpers

3b5d55f

fix: harden parser runtime and validator compatibility

54ffd3f

fix: close state-summary and validator gaps

2dbf6f3

docs: update json settings todo status

667b4d5

augmentcode bot reviewed Apr 14, 2026

View reviewed changes

Comment thread source/src/story_automator/core/runtime_policy.py Outdated

Comment thread source/src/story_automator/commands/orchestrator.py Outdated

Comment thread source/src/story_automator/commands/tmux.py

fix: address PR review follow-ups

55b2fd4

coderabbitai bot reviewed Apr 14, 2026

View reviewed changes

bma-d added 2 commits April 14, 2026 22:01

fix: address CodeRabbit review findings

fdb4d71

fix: harden policy resume and review parsing

e8a8f78

coderabbitai bot reviewed Apr 15, 2026

View reviewed changes

Comment thread source/src/story_automator/commands/tmux.py

Comment thread source/src/story_automator/commands/tmux.py

fix: harden tmux monitor failure handling

d7ffdf5

coderabbitai bot reviewed Apr 15, 2026

View reviewed changes

Comment thread source/src/story_automator/commands/tmux.py

Comment thread source/src/story_automator/commands/tmux.py

fix: tighten tmux monitor output verification

85f49a4

coderabbitai bot reviewed Apr 15, 2026

View reviewed changes

bma-d merged commit 15da072 into main Apr 15, 2026
3 checks passed

coderabbitai bot mentioned this pull request Apr 16, 2026

Fix tmux shell compatibility and Claude runner completion #2

Open

Conversation

bma-d commented Apr 14, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

augmentcode bot commented Apr 14, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bma-d commented Apr 14, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 14, 2026 •

edited

Loading