diff --git a/GOAL_PROMPT.md b/GOAL_PROMPT.md
index c6fb0ed..788cd33 100644
--- a/GOAL_PROMPT.md
+++ b/GOAL_PROMPT.md
@@ -10,7 +10,7 @@ This file is the current goal-mode entrypoint. Historical phase prompts live in
 
 Current phase:
 
-- v1.11 JS Dynamic Runtime MVP is planned as the next implementation contract.
+- v1.11 JS Dynamic Runtime MVP is implemented in the current working tree; next phase is not selected yet.
 
 Latest decisions:
 
@@ -31,6 +31,12 @@ Latest decisions:
   SHA-256 in v1.11. CWF must never grant more authority than the parent Codex
   session already has. Generated scripts must be previewed and approved before
   execution.
+- v1.11 implementation closeout: `dynamic-js` now supports local preview-first
+  `workflow.js` harnesses, AST policy validation with Acorn, Node Permission
+  Model child execution, parent CWF JSON-RPC APIs, read-only agent mutation
+  detection, strict `inherit-session` gating, dynamic artifacts, CLI preview
+  smoke, and documentation updates. Controlled live Codex-worker smoke still
+  requires Ender approval.
 
 Archived phase prompts:
 
diff --git a/README.md b/README.md
index 733e821..bb3597f 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ A lightweight, Codex-native workflow layer for multi-agent engineering review.
 
 中文文档: [README.zh-CN.md](README.zh-CN.md)
 
-Codex Flow lets you run repeatable multi-worker workflows using only Codex-native surfaces: no external LLM routers, no private adapters, no separate agent platform. The public pack is read-only by default: review workflows start Codex workers in parallel and aggregate their findings into a stable reduced JSON envelope plus a readable Markdown report. v1.4 ships one narrow gated write workflow, `doc-refresh`, for documentation-only edits after preview and explicit approval. v1.10 adds a safer general write-worker path for bounded patch-mode workflows: a writer works in an isolated target, Codex Flow extracts `artifacts/proposed.patch`, checks `write_policy` paths, runs `git apply --check --3way`, then applies only after the existing approval gate and drift check.
+Codex Flow lets you run repeatable multi-worker workflows using only Codex-native surfaces: no external LLM routers, no private adapters, no separate agent platform. The public pack is read-only by default: review workflows start Codex workers in parallel and aggregate their findings into a stable reduced JSON envelope plus a readable Markdown report. v1.4 ships one narrow gated write workflow, `doc-refresh`, for documentation-only edits after preview and explicit approval. v1.10 adds a safer general write-worker path for bounded patch-mode workflows: a writer works in an isolated target, Codex Flow extracts `artifacts/proposed.patch`, checks `write_policy` paths, runs `git apply --check --3way`, then applies only after the existing approval gate and drift check. v1.11 adds the first JavaScript dynamic workflow runtime: local `workflow.js` harnesses are parsed with an AST policy, copied into run artifacts, previewed, approved, then executed in a Node Permission Model child process that can only talk to parent CWF through JSON-RPC. The implemented preview also includes intent-to-preview generation, built-in local dynamic templates, local save/reuse with SHA-bound trust metadata, and guarded dynamic `safePatch`.
 
 The long-term shape (post-v1) is a thin layer over Codex itself: Codex owns threads, subagents, sandbox, approvals, permissions, skills, plugins, and worktrees; Codex Flow owns workflow specs, run-state evidence, gates, reducer output, and artifact manifests.
 
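Reviewer sketch (not part of the patch): the "SHA-bound trust metadata" named in the v1.11 README paragraph above could work roughly as follows. This is a minimal illustration only. The helper names `sha256Hex`, `makeTrustRecord`, and `isTrusted`, and the `{ id, sha256 }` record shape, are assumptions made for this example; the diff does not show the real `.trust.json` schema or CWF's implementation.

```javascript
// Hypothetical sketch: bind a saved workflow script to its SHA-256 digest
// and fail closed when the script no longer matches the recorded digest.
const crypto = require("node:crypto");

// Hex-encoded SHA-256 of the script source.
function sha256Hex(source) {
  return crypto.createHash("sha256").update(source, "utf8").digest("hex");
}

// Build a trust record for a validated script (field names are assumptions).
function makeTrustRecord(id, source) {
  return { id, sha256: sha256Hex(source) };
}

// Fail closed: a missing record, malformed digest, or any mismatch is untrusted.
function isTrusted(record, source) {
  if (!record || typeof record.sha256 !== "string") return false;
  const expected = Buffer.from(record.sha256, "hex");
  const actual = Buffer.from(sha256Hex(source), "hex");
  // timingSafeEqual requires equal lengths, so check length first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const script = "export default async function run(cwf) {}\n";
const record = makeTrustRecord("local-review", script);

console.log(isTrusted(record, script));                 // unchanged script
console.log(isTrusted(record, script + "// tampered")); // modified script
```

The point of the sketch is the fail-closed default: trust is granted only on an exact digest match, which mirrors the README's statement that a changed script without a matching trust record is not run.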
@@ -22,6 +22,7 @@ The default catalog includes:
 - `research-crosscheck`: source fidelity and unsupported-claim review for research or documentation diffs
 - `release-review`: ship readiness, rollout, rollback, and regression review
 - `doc-refresh`: gated documentation-only write workflow with dry-run preview, approval, diff summary, rollback, and verification artifacts
+- `dynamic-js`: preview-first JavaScript harness execution through `cwf.git`, `cwf.agent.run`, `cwf.safePatch`, `cwf.map`, `cwf.artifacts`, and `cwf.report`
 
 The reducer merges duplicate findings, drops weak unsupported claims, ranks severity, preserves worker provenance, and writes a final report. If a worker fails or falls back from malformed structured output, the final verdict can be `DEGRADED` and the report says which evidence is partial.
 
@@ -68,6 +69,19 @@ cwf run doc-refresh --target
 cwf run workflows/diff-review.yaml --target
 ```
 
+Preview a local dynamic JavaScript workflow:
+
+```bash
+cwf dynamic list
+cwf dynamic show change-summary
+cwf dynamic generate --goal "Summarize this repo diff" --target
+cwf dynamic run fixtures/dynamic/read-only.workflow.js --target
+cwf dynamic run change-summary --target
+cwf dynamic save ./workflow.js --id local-review
+cwf approve approve-dynamic
+cwf resume
+```
+
 Run in the background:
 
 ```bash
@@ -117,6 +131,18 @@ Duplicate workflow ids fail clearly instead of picking one silently.
 
 Gated workflows can pause before a risky or write-capable phase. `cwf status` and `cwf show` explain the waiting gate and print the exact approve/reject commands. `cwf approve ` records the approval, and `cwf resume ` continues only pending phases. `cwf reject --reason ` stops the run cleanly. Write workflows write `artifacts/write-plan.md`, `artifacts/dry-run-preview.md`, `artifacts/verification-plan.md`, and `artifacts/rollback.md` before approval. After approval the writer runs in an isolated target, CWF stores `artifacts/proposed.patch`, checks `write_policy` paths and `git apply --check --3way`, applies the patch, and records diff, verification, and rollback artifacts. The bundled `doc-refresh` workflow uses `direct-docs` only as a docs/readme/release-note policy preset; it still goes through the same isolated patch apply path.
 
+Dynamic JavaScript workflows are also gated. `cwf dynamic run --target ` writes `artifacts/workflow.js`, `workflow.sha256`, `dynamic-preview.md`, `dynamic-capabilities.json`, and `dynamic-budget.json`, then pauses at `approve-dynamic`. The script must export one async default function. The AST gate rejects imports, dynamic import, `require`, `eval`, `Function`, `globalThis`, `process`, `fetch`, constructor/prototype escape paths, direct shell strings, and call expressions outside `cwf` or approved builtins. Execution fails closed unless the child process can run with Node Permission Model active and without target repo filesystem, network, child-process, worker, native-addon, WASI, FFI, or inspector permissions. `cwf.agent.run` defaults to `read-only`; read-only agents fail the run if the target diff changes. `cwf.safePatch.apply` is the only dynamic write path that applies patches directly: the script must declare `metadata.safe_patch_policy` so the policy is visible in preview, and the runtime `write_policy` must exactly match that metadata. CWF stores `dynamic-proposed.patch`, checks policy and `git apply --check --3way`, applies through the parent process, runs verification, records rollback evidence, and reverse-applies the patch if verification fails. `inherit-session` is allowed only for `generated-current-session` scripts with matching SHA-256 and a known write-capable parent permission cap; copied, remote, unknown, and hash-mismatched workflows fail closed. Remote dynamic workflow URLs cannot run directly; inspect and save a local trusted copy first.
+
+Dynamic template discovery is local-only:
+
+```text
+./workflows/dynamic/
+./.codex-flow/dynamic-workflows/
+~/.codex-workflows/dynamic/
+```
+
+`cwf dynamic save --id ` copies a validated script into `~/.codex-workflows/dynamic/` and writes a `.trust.json` sidecar bound to the source SHA-256. If the saved script changes without a matching trust record, discovery fails instead of running the tampered copy.
+
 `cwf desktop result` bridges completed filesystem runs back into Codex. When CWF is launched by a Codex skill from an active conversation, the primary UX is for the skill to read the completed run and answer in that same conversation. `--print` prints a concise handoff prompt for that path. Without app-server, the command still writes `artifacts/handoff-prompt.md`. `--new-thread` and `--thread ` require a Codex CLI with app-server support, a running app-server daemon, and remote control enabled:
 
 ```bash
@@ -145,6 +171,15 @@ Run artifacts are stored under:
   tests.json
   safety.json
   artifacts/
+    workflow.js
+    workflow.sha256
+    dynamic-preview.md
+    dynamic-capabilities.json
+    dynamic-budget.json
+    dynamic-events.jsonl
+    dynamic-final.json
+    dynamic-proposed.patch
+    dynamic-safe-patch.json
     write-plan.md
     dry-run-preview.md
     verification-plan.md
@@ -219,7 +254,7 @@ It does not attempt exact product parity with Claude Code Dynamic Workflows:
 
 - no native `/workflows` UI
 - no automatic `workflow` keyword trigger
-- no generated JavaScript workflow scripts
+- no unrestricted `node workflow.js`; JavaScript dynamic workflows must pass preview, AST policy, approval, and permissioned child-process execution
 - no non-Codex model routing
 - no web UI
 
@@ -231,6 +266,7 @@ See [docs/claude-vs-codex-workflows.md](docs/claude-vs-codex-workflows.md).
 - `doc-refresh` remains the only bundled user-facing write workflow. It is documentation-only, gated, reversible, and applies through the isolated patch path after explicit approval.
 - General non-doc write-capable workflows must declare `write_policy` and use patch mode. CWF refuses paths outside `allowed_paths`, forbidden paths, target drift after preview, `git apply --check --3way` conflicts, and failed workflow verification commands. If patch-mode verification fails after apply, CWF attempts to reverse-apply the same proposed patch before returning a failed run.
 - `direct-docs` is a compatibility policy for `doc-refresh`; source/config write workflows must use explicit patch-mode policy with their own allowed paths and verification commands.
+- Dynamic JavaScript workflows do not receive `fs`, `process`, network, shell, or target repo access. All git, agent, artifact, and report actions go through parent CWF APIs.
 - GitHub PR output is local by default. Nothing is posted unless `cwf github-pr` is run with explicit `--post --repo --pr `.
 - Workflow suggestions are YAML specs only. They are validated after generation, but they are not installed or run automatically.
 - Reviews tracked git diffs; untracked file contents are not included.
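Reviewer sketch (not part of the patch): the `write_policy` refusal rule described in the README bullets above ("CWF refuses paths outside `allowed_paths`, forbidden paths, ...") might be checked along these lines. The function names, the `forbidden_paths` key as used here, the violation shape, and the directory-prefix matching are all assumptions for illustration; the diff does not say whether the real policy uses prefixes, globs, or something else.

```javascript
// Hypothetical sketch: check the files touched by a proposed patch against
// an allowed/forbidden path policy before anything is applied.
const path = require("node:path");

// Normalize to POSIX separators and collapse "." / ".." segments.
function normalize(p) {
  return path.posix.normalize(p.replace(/\\/g, "/"));
}

// True when `file` is the policy entry itself or lives underneath it.
function isUnder(file, entry) {
  const base = normalize(entry).replace(/\/+$/, "");
  return file === base || file.startsWith(base + "/");
}

// Returns a list of violations; an empty list means the patch paths pass.
function checkPatchPaths(files, policy) {
  const violations = [];
  for (const raw of files) {
    const file = normalize(raw);
    if (file === ".." || file.startsWith("../") || path.posix.isAbsolute(file)) {
      violations.push({ file: raw, reason: "escapes target" });
    } else if (policy.forbidden_paths.some((entry) => isUnder(file, entry))) {
      violations.push({ file: raw, reason: "forbidden path" });
    } else if (!policy.allowed_paths.some((entry) => isUnder(file, entry))) {
      violations.push({ file: raw, reason: "outside allowed_paths" });
    }
  }
  return violations;
}

const examplePolicy = {
  allowed_paths: ["docs/", "README.md"],
  forbidden_paths: ["docs/private/"],
};

const result = checkPatchPaths(
  ["docs/guide.md", "docs/private/keys.md", "docs/../src/app.js"],
  examplePolicy
);
console.log(result);
```

Forbidden entries are evaluated before allowed entries so that a forbidden subdirectory inside an allowed directory is still refused, and paths are normalized first so `docs/../src/app.js` is judged as `src/app.js`.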
diff --git a/README.zh-CN.md b/README.zh-CN.md
index bd705a5..3da3526 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -2,7 +2,7 @@
 
 A lightweight, Codex-native workflow layer that splits one engineering review across multiple Codex workers running in parallel, then merges the results into a traceable reduced JSON envelope and a Markdown report.
 
-It relies only on Codex-native capabilities: no third-party model routing, no private adapters, and no separate agent platform. The public pack still defaults to read-only workflows; v1.4 additionally ships one narrow `doc-refresh` workflow that allows documentation writes only, and it enters the write phase only after generating a preview, passing the gate, and receiving explicit approval. v1.10 adds a safer general write-worker path: the writer first produces `artifacts/proposed.patch` in an isolated target, CWF checks allowed/forbidden paths, runs `git apply --check --3way`, then applies the patch to the real target; non-documentation writes must declare an explicit `write_policy`.
+It relies only on Codex-native capabilities: no third-party model routing, no private adapters, and no separate agent platform. The public pack still defaults to read-only workflows; v1.4 additionally ships one narrow `doc-refresh` workflow that allows documentation writes only, and it enters the write phase only after generating a preview, passing the gate, and receiving explicit approval. v1.10 adds a safer general write-worker path: the writer first produces `artifacts/proposed.patch` in an isolated target, CWF checks allowed/forbidden paths, runs `git apply --check --3way`, then applies the patch to the real target; non-documentation writes must declare an explicit `write_policy`. v1.11 adds the first JavaScript dynamic workflow runtime: a local `workflow.js` first passes the AST policy, is copied into run artifacts, gets a preview, and is explicitly approved, then runs in a Node Permission Model child process; the child can call parent CWF APIs only through JSON-RPC. The preview surface now also includes intent-to-preview generation, built-in local dynamic templates, local save/reuse with SHA trust metadata, and guarded dynamic `safePatch`.
 
 Codex owns threads, subagents, permissions, and file-write boundaries; Codex Flow owns workflow specs, the run store, events, gates, the reducer, and artifact manifests.
 
@@ -60,6 +60,19 @@ cwf run doc-refresh --target
 cwf run workflows/diff-review.yaml --target
 ```
 
+Local dynamic JavaScript workflows are previewed first and do not start workers directly:
+
+```bash
+cwf dynamic list
+cwf dynamic show change-summary
+cwf dynamic generate --goal "Summarize this repo diff" --target
+cwf dynamic run fixtures/dynamic/read-only.workflow.js --target
+cwf dynamic run change-summary --target
+cwf dynamic save ./workflow.js --id local-review
+cwf approve approve-dynamic
+cwf resume
+```
+
 For large diffs, running in the background is recommended:
 
 ```bash
@@ -139,6 +152,18 @@ Worker execution now goes through an adapter layer but still uses only Codex. The default is `codex
 
 Gated workflows pause before risky steps. `cwf status` / `cwf show` state which gate the run is waiting on and print the approve / reject commands. `cwf approve ` records the approval, and `cwf resume ` continues only the later phases that have not finished; `cwf reject --reason ` stops the run cleanly. Before approval, write workflows write only run artifacts (`write-plan.md`, `dry-run-preview.md`, `verification-plan.md`, `rollback.md`). After approval the writer writes only in an isolated target; CWF saves `proposed.patch`, checks `write_policy` paths and `git apply --check --3way` before applying to the real target, and records diff, verification, and rollback artifacts. If verification fails after apply, CWF attempts a reverse apply of the same patch and then returns a failed run. The built-in `doc-refresh` `direct-docs` policy is only a docs/readme/release-note policy preset and uses the same isolated patch apply path; source/config writes need an explicit patch policy.
 
+Dynamic JavaScript workflows are gated too. `cwf dynamic run --target ` writes `artifacts/workflow.js`, `workflow.sha256`, `dynamic-preview.md`, `dynamic-capabilities.json`, and `dynamic-budget.json`, then stops at `approve-dynamic`. The script may export only one async default function. The AST gate rejects imports, dynamic import, `require`, `eval`, `Function`, `globalThis`, `process`, `fetch`, constructor/prototype escapes, direct shell strings, and call expressions that do not originate from `cwf` or an allowed builtin. At execution time the child process must have Node Permission Model enabled and is given no target repo filesystem, network, child-process, worker, native-addon, WASI, FFI, or inspector permissions. `cwf.agent.run` defaults to `read-only`; a read-only worker that changes the target diff fails the run. `cwf.safePatch.apply` is the only write path in the dynamic runtime that applies patches directly: the script must first declare `metadata.safe_patch_policy` so the policy appears in the preview, and the `write_policy` passed at runtime must match the metadata exactly. CWF saves `dynamic-proposed.patch`, checks policy and `git apply --check --3way`, applies through the parent process, runs verification, and records rollback evidence; if verification fails, it attempts a reverse apply and then returns a failed run. `inherit-session` is allowed only for `generated-current-session` scripts with a matching SHA-256 and a known, write-capable parent permission cap; copied, remote, unknown, and hash-mismatched scripts all fail closed. Remote dynamic workflow URLs cannot run directly; inspect them and save a local trusted copy first.
+
+Dynamic template discovery scans local paths only:
+
+```text
+./workflows/dynamic/
+./.codex-flow/dynamic-workflows/
+~/.codex-workflows/dynamic/
+```
+
+`cwf dynamic save --id ` copies a validated script into `~/.codex-workflows/dynamic/` and writes a `.trust.json` sidecar bound to the SHA-256. If the saved script is modified without a matching trust record update, discovery fails instead of running it.
+
 `cwf desktop result` brings completed filesystem runs back into Codex. If CWF was launched by a skill in the current Codex conversation, the primary path is for the skill to read the run result and reply directly in that initiating conversation. `--print` prints a concise handoff prompt suited to that path; without app-server, the command still writes `artifacts/handoff-prompt.md`. `--new-thread` and `--thread ` require a Codex CLI with app-server support, a running app-server daemon, and remote control enabled:
 
 ```bash
@@ -196,6 +221,15 @@ Artifacts:
   tests.json
   safety.json
   artifacts/
+    workflow.js
+    workflow.sha256
+    dynamic-preview.md
+    dynamic-capabilities.json
+    dynamic-budget.json
+    dynamic-events.jsonl
+    dynamic-final.json
+    dynamic-proposed.patch
+    dynamic-safe-patch.json
     write-plan.md
     dry-run-preview.md
     verification-plan.md
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index 488a6db..f6d92ef 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -37,6 +37,20 @@ Codex Flow 1.0.0 is the first stable public release of the Codex-native workflow
 - `research-crosscheck`
 - `release-review`
 
+## Implemented Preview
+
+- Dynamic JavaScript workflow runtime:
+  - `cwf dynamic generate`
+  - `cwf dynamic list`
+  - `cwf dynamic show`
+  - `cwf dynamic save`
+  - `cwf dynamic run`
+- Built-in local dynamic templates:
+  - `change-summary`
+  - `docs-change-check`
+- SHA-bound local trust metadata for saved dynamic workflows
+- Guarded dynamic `cwf.safePatch.apply` with preview-visible `metadata.safe_patch_policy`, runtime policy match checks, `git apply --check --3way`, verification, and rollback-on-verification-failure evidence
+
 ## Public Boundary
 
 Codex Flow 1.0.0 intentionally does not include:
@@ -44,7 +58,7 @@ Codex Flow 1.0.0 intentionally does not include:
 - non-Codex model routing
 - private adapters
 - remote workflow marketplace
-- generated JavaScript workflow execution
+- remote dynamic workflow execution by URL
 - broad production write-capable workflows beyond gated documentation refresh
 - automatic GitHub posting
 - automatic installation or execution of generated workflow suggestions
diff --git a/docs/CWF_COMPLETE_STATE_PLAN.md b/docs/CWF_COMPLETE_STATE_PLAN.md
new file mode 100644
index 0000000..6fef4f7
--- /dev/null
+++ b/docs/CWF_COMPLETE_STATE_PLAN.md
@@ -0,0 +1,784 @@
+---
+half_life: 30d
+archive_at: 2026-07-06
+scope_type: roadmap
+scope_name: CWF complete Claude-like dynamic workflow state
+coverage: Complete roadmap for moving Codex Flow from the current v1.11 preview state to a Claude-like, Codex-native complete state, including usage boundaries and phase acceptance criteria.
+not_complete_for: Exact Claude product parity, hosted platform scheduling, unrestricted JavaScript, non-Codex model routing, production deploy automation, database writes, credentials, payments, permissions, or unreviewed autonomous writes.
+verification_level: docs-only
+real_smoke_status: requires_approval
+review_status: reviewed
+reviewer: reasonix-v4pro
+review_command: crb delegate --mode final-review --json review-payload-for-cwf-planning-docs-after-trq212-minli
+review_notes: Approved; no blocker/high/real medium issues after integrating trq212/MinLi failure modes, pattern library, quarantine, and use-case guidance.
+review_owner: Codex
+review_due: resolved 2026-06-06
+---
+
+# CWF Complete State Plan
+
+## Alignment Snapshot
+
+- **Building**: the full roadmap for making CWF feel like Claude Dynamic Workflows while staying Codex-native.
+- **Not building**: a one-step next-phase plan only, an unrestricted Node runtime, an exact Claude clone, hosted queues, model routing, or broad autonomous writes.
+- **Source of truth**: current CWF docs and implementation state, especially `README.md`, `docs/POST_V1_PLAN.md`, `docs/JS_DYNAMIC_WORKFLOWS_PLAN.md`, `docs/WHEN_TO_USE_CWF.md`, `docs/WORKER_APP_THREADS_PLAN.md`, `docs/WRITE_WORKERS_PLAN.md`, and PR #1 state on `codex/v1.11-js-dynamic-runtime`.
+- **External references**: trq212's "A harness for every task" article and MinLi's Chinese annotated breakdown, both cached under `.superx/articles/`.
+- **Deliverables**: complete-state definition, current-vs-target gap table, roadmap phases, PRD, SPEC, acceptance criteria, usage matrix, and staged goal prompts.
+- **Phase scope**: roadmap. It contains multiple implementable phases; it is not itself one goal-mode slice.
+- **Completeness**: complete for deciding what "CWF complete" means and how to get there; not complete for implementation until each phase is opened as its own goal.
+- **Verification level**: docs-only for this plan. Each implementation phase below has its own required local, CI, controlled real-smoke, and review evidence.
+- **Review requirement**: Reasonix/v4Pro review required before this plan is treated as final.
+- **Verification**: `git diff --check`, delivery-doc validator, Reasonix review, then per-phase commands listed in acceptance.
+- **Open decisions**: none blocking. Phase A has an MVP in the current working tree; if that slice is accepted, the next roadmap phase is Phase B: same-conversation result return.
+
+Capability sentence:
+
+This planning pass helps Codex Flow reach a complete Claude-like dynamic workflow experience by defining the finished product, roadmap phases, usage boundaries, and verification gates, using the current CWF runtime/docs evidence, while avoiding unsafe writes, exact Claude parity claims, and duplicated Codex-native infrastructure.
+
+## Direct Answer
+
+The list below is the **complete-state roadmap**, not just the next stage:
+
+1. Codex generates `workflow.js` for the user's task instead of making the user hand-write scripts.
+2. Results return to the initiating Codex conversation by default.
+3. Workers become visible when useful: read-only workers can run as Desktop threads, write workers stay behind `safePatch` or trusted `inherit-session`.
+4. CWF ships built-in dynamic modes such as deep research, repo audit, migration planning, adversarial review, and safe fix loop.
+5. Good dynamic workflows can be saved, reused, and packaged as workflow templates or skills.
+
+The first vertical slice of that roadmap is:
+
+> Given a user request, Codex produces a previewable `workflow.js`, CWF validates it, asks for approval, runs it safely, and returns the result to the same Codex conversation.
+
+Current implementation note: the current working tree already contains a Phase A MVP for `cwf dynamic generate`. Treat Phase A docs as acceptance/hardening guidance unless that code is reverted or rejected.
+
+## Delivery Pack
+
+For implementation handoff, use the split delivery pack instead of copying sections out of this long roadmap:
+
+- `docs/cwf-complete-state/PRD.md`
+- `docs/cwf-complete-state/SPEC.md`
+- `docs/cwf-complete-state/CURRENT_VS_COMPLETE.md`
+- `docs/cwf-complete-state/ACCEPTANCE.md`
+- `docs/cwf-complete-state/GOAL_PROMPTS.md`
+
+The roadmap here explains why and how the pieces fit together. The delivery pack is the sharper surface for goal-mode execution.
+
+## Why CWF Exists: Failure Modes To Prevent
+
+The strongest Claude Dynamic Workflows breakdowns frame workflow as a way to move control flow out of one long chat context. CWF should use the same product logic, with Codex-native safety.
+
+| Failure mode | What happens in one long agent conversation | CWF answer |
+|---|---|---|
+| Agentic laziness | The agent stops after partial progress and calls the task done. | Use explicit phases, worker counts, stop conditions, and artifacts. |
+| Self-preferential bias | The agent prefers its own theory, patch, or ranking when asked to judge it. | Use independent verifier, challenger, tournament, or reducer workers. |
+| Goal drift | Edge constraints disappear after many turns or compaction. | Keep the approved workflow script, preview, budget, and stop rules as external state. |
+| Context pollution | Worker findings, raw logs, and irrelevant detail crowd the main conversation. | Store worker outputs in run artifacts and return only a reduced result to Codex. |
+| Privilege mixing | The same worker reads untrusted input and performs high-permission actions. | Use quarantine: read untrusted content in read-only workers; gated actor workers perform any action. |
+| Budget runaway | Many workers or loops spend too much time/tokens. | Put max agents, max concurrency, timeout, output bytes, and future token budget in preview. |
+
+This means CWF is not just "parallel Codex." It is the layer that holds the plan, evidence, budget, gates, and reducer outside the main conversation.
+
+## What "Complete" Means
+
+CWF is complete when a user can say:
+
+> Run a dynamic workflow to audit this repo for auth risks and fix only the small safe issues after review.
+
+And the system does this:
+
+1. Codex chooses whether CWF is appropriate.
+2. Codex drafts a task-specific `workflow.js`.
+3. CWF shows a readable preview: purpose, phases, agents, budgets, permissions, write policy, stop rules.
+4. The user approves or rejects.
+5. CWF runs the script through a constrained runtime and Codex-native workers.
+6. Read-only workers can appear as Codex Desktop threads when app-server execution is available.
+7. Write work uses only approved paths:
+   - `safePatch` for public/auditable patch mode;
+   - `inherit-session` only for trusted generated scripts and never beyond the parent Codex permission cap.
+8. CWF stores full evidence: script, SHA, preview, events, workers, findings, patches, verification, rollback, result.
+9. The initiating Codex conversation receives a short human result and links to artifacts.
+10. The user can save the workflow as a reusable local workflow template or skill.
+
+Complete does not mean:
+
+- CWF replaces Codex conversation mode.
+- CWF automatically runs on every large task.
+- CWF becomes a hosted agent platform.
+- Dynamic JavaScript can touch files, network, process, shell, or target repo directly.
+- Writes happen without approval, policy, verification, and rollback evidence.
+
+## Current State vs Complete State
+
+| Layer | Current state | Complete state |
+|---|---|---|
+| Static workflows | Stable CLI workflows exist | Still supported as the reliable repeatable base |
+| Safe writes | v1.10 patch-mode path exists | Used as the default write path for dynamic workflows |
+| Dynamic JS runtime | v1.11 preview branch supports local `workflow.js` with preview, gate, AST policy, child runtime, and CWF APIs | Codex can generate the script from user intent and run it through the same guarded path |
+| Same-conversation return | Skill wrapper/manual result handoff is the intended default | CWF invocation from Codex reliably returns a plain result to the initiating thread |
+| Worker visibility | app-thread worker path exists with capability/probe constraints | Read-only workers can be visible Desktop worker threads when available; SDK fallback remains explicit |
+| Write worker visibility | Safe writes run through isolated patch application, not Desktop app-thread writes | Write workers remain safePatch/inherit-session controlled; no hidden Desktop direct writes |
+| Built-in modes | Static catalog plus dynamic fixture | Dynamic catalog: deep-research, repo-audit, migration-plan, adversarial-review, safe-fix-loop, root-cause-investigation, rule-mining, tournament-selection, triage-quarantine, eval-and-rubric |
+| Save/reuse | Local YAML registry and suggestions exist; dynamic JS is preview execution | Approved dynamic scripts can become templates, local workflows, or skills with trust metadata |
+| Native UI parity | CLI status/watch/artifacts; no Claude `/workflows` panel | Codex-native best effort: same-conversation summaries, visible worker threads, artifact links, optional explicit new thread |
+
+## Where CWF Should Be Used
+
+Use CWF when the work benefits from orchestration, evidence, and repeatability.
+
+| Work type | Use CWF? | Best mode | Availability |
+|---|---:|---|---|
+| Code diff review | Yes | `diff-review` | Current stable |
+| Broad repo health audit | Yes | `repo-audit`; later dynamic repo-audit | Current stable for static workflow; dynamic version planned |
+| PRD/SPEC/plan review | Yes | `implementation-plan` | Current stable |
+| Factual/source-fidelity review | Yes | `research-crosscheck` | Current stable |
+| Release readiness | Yes | `release-review` | Current stable |
+| Documentation-only bounded write | Yes | `doc-refresh` | Current stable gated write |
+| Small safe code fix with known paths | Sometimes | patch-mode write; later safe-fix-loop | Current stable for custom patch-mode with per-workflow verification; dynamic loop planned |
+| Large migration planning | Yes | `migration-plan` dynamic mode | Planned |
+| Adversarial review before merge | Yes | `adversarial-review` dynamic mode | Planned |
+| Deep research with many independent sources | Sometimes | `deep-research` dynamic mode; source retrieval still belongs to `superx`, browser, or read tools | Planned |
+| Flaky test or intermittent failure investigation | Yes | `root-cause-investigation` dynamic mode | Planned |
+| Mining repeated corrections from sessions/reviews | Yes | `rule-mining` dynamic mode | Planned |
+| Naming, ranking, or selecting among many candidates | Sometimes | `tournament-selection` dynamic mode | Planned |
+| Large queue triage over untrusted public input | Sometimes | `triage-quarantine` dynamic mode | Planned with quarantine safety |
+| Skill/model/prompt eval against a rubric | Yes | `eval-and-rubric` dynamic mode | Planned |
+| One-file bug fix | Usually no | direct Codex | Current direct Codex path |
+| UI taste, visual design, copywriting | Usually no | MiMo/design/Reasonix, then Codex implements | Current adjacent skills, not CWF |
+| Production deploy, DB migration, credentials, payments, permissions | No by default | separate high-risk plan with explicit approval | Out of CWF public core |
+
+Short rule:
+
+> Use CWF when you need multiple workers, durable evidence, gates, repeatability, or dynamic orchestration. Stay in direct Codex when the task is small, taste-driven, or faster as one conversation.
+
+## PRD
+
+### Problem
+
+The current CWF engine is powerful but still feels tool-shaped. Users have to know whether to run YAML workflows, dynamic JS, safe writes, desktop handoff, or GitHub artifacts.
+
+Claude Dynamic Workflows feel stronger because the user can state an intent and the system creates the harness. The user reviews the plan, approves, and gets one final answer.
+
+CWF needs to keep that experience but with Codex-native boundaries:
+
+- Codex writes and judges;
+- CWF orchestrates and records evidence;
+- Codex-owned sandbox, approvals, subagents, threads, skills, plugins, and worktrees are reused instead of reimplemented.
+
+### Target Users
+
+- Codex users doing complex engineering work.
+- Maintainers who need repeatable review, audit, migration, or release workflows.
+- Skill authors who want reusable workflow templates.
+- Public users comparing CWF to Claude Dynamic Workflows.
+
+### Goals
+
+- Make intent-to-workflow possible: user request to generated `workflow.js`.
+- Keep preview and approval mandatory.
+- Return results to the initiating Codex conversation by default.
+- Make read-only worker visibility native where possible.
+- Keep write workers behind `safePatch` or parent-capped `inherit-session`.
+- Add built-in dynamic workflow modes for the common high-value cases.
+- Save and reuse approved dynamic workflows.
+- Preserve artifacts, reducer output, and run evidence as the source of truth.
+
+### Non-Goals
+
+- No exact Claude product parity.
+- No unrestricted JavaScript runtime.
+- No hidden writes.
+- No direct JS filesystem, network, shell, package import, or target repo access.
+- No non-Codex model routing in the public core.
+- No hosted scheduler or managed-agent platform in this roadmap.
+- No external production writes without a separate high-risk plan and explicit approval.
+
+## SPEC
+
+### Complete Runtime Flow
+
+```text
+user asks for complex workflow
+  -> Codex decides CWF is appropriate
+  -> Codex generates workflow.js from intent
+  -> CWF validates AST and capability use
+  -> CWF renders preview and budget/write summary
+  -> user approves approve-dynamic
+  -> CWF child runtime executes through cwf APIs only
+  -> workers run through Codex-native adapters
+  -> safe writes go through safePatch or capped inherit-session
+  -> reducer produces result and artifacts
+  -> initiating Codex conversation receives summary + artifact links
+  -> user may save workflow as template/skill
+```
+
+### Complete Capability Surface
+
+Required `cwf` APIs:
+
+- `cwf.git.changedFiles`
+- `cwf.git.diff`
+- `cwf.agent.run`
+- `cwf.map`
+- `cwf.artifacts.write`
+- `cwf.report.summarize`
+- `cwf.write.safePatch`
+- `cwf.verify.run`
+- `cwf.classify.route`
+- `cwf.tournament.run`
+- `cwf.loop.until`
+- `cwf.quarantine.read`
+- `cwf.template.save`
+
+Required runtime controls:
+
+- source SHA binding;
+- origin trust enum;
+- AST policy gate;
+- Node Permission Model child;
+- no target repo read from child;
+- max agents;
+- max concurrency;
+- wall-clock timeout;
+- output byte limit;
+- token usage recording where available;
+- gate before dynamic execution;
+- gate before writes;
+- failure summary.
+
+### Result Return Contract
+
+Default:
+
+- result returns to the initiating Codex conversation when launched from Codex.
+
+Optional:
+
+- `--new-thread` creates a separate coordinator/result thread only when explicitly requested;
+- worker app threads are visible only when app-server execution is available and preflight proves real execution;
+- CLI-only users still get `cwf result RUN_ID`.
+
+### Write Contract
+
+Dynamic JS never writes directly.
+
+Allowed write routes:
+
+1. `safePatch`
+   - isolated writer target;
+   - proposed patch artifact;
+   - `allowed_paths`;
+   - `forbidden_paths`;
+   - drift check;
+   - `git apply --check --3way`;
+   - verification;
+   - rollback artifact.
+
+2. `inherit-session`
+   - generated-current-session origin only;
+   - approved script SHA only;
+   - never exceeds parent sandbox or approval policy;
+   - records runtime metadata;
+   - still bounded by task prompt and CWF artifacts.
+
+Forbidden:
+
+- direct Desktop app-thread writes without a stable Codex approval path;
+- remote untrusted dynamic scripts with write permissions;
+- external irreversible writes.
+
+### Quarantine Contract
+
+Quarantine is mandatory whenever a workflow reads untrusted public content, customer messages, third-party issues, Slack/Discord exports, web pages, or arbitrary uploaded files.
+
+Worker classes:
+
+1. **Reader workers**
+   - read untrusted content;
+   - run read-only;
+   - cannot call write, shell, external post, or high-permission APIs;
+   - output structured observations and evidence only.
+
+2. **Verifier workers**
+   - check reader outputs against rubric, source quality, duplication, or policy;
+   - run read-only;
+   - can reject weak or unsafe claims.
+
+3. **Actor workers**
+   - perform any proposed action;
+   - require gate, path policy, safePatch, or explicit external approval;
+   - never receive raw untrusted content unless needed and sanitized.
+
+Safety invariant:
+
+> The worker that reads untrusted content is not the worker that writes, posts, deletes, merges, deploys, or changes permissions.
+
+### Pattern Library
+
+CWF should treat these as first-class patterns for generated dynamic workflows:
+
+| Pattern | Use when | CWF implementation shape |
+|---|---|---|
+| Classify-and-act | Items need routing by type, severity, ownership, or next action. | Classifier worker produces labels; branch executes specific read-only or gated actions. |
+| Fan-out-and-synthesize | Many independent files, claims, items, or hypotheses need separate context. | `cwf.map` spawns workers; reducer waits at a barrier and merges. |
+| Adversarial verification | The main output needs skeptical checking. | Each proposal gets a verifier/challenger worker before final synthesis. |
+| Generate-and-filter | Many ideas or candidate fixes need dedupe and rubric filtering. | Generator workers propose; filter workers score; reducer keeps survivors. |
+| Tournament | Ranking/naming/design/solution selection benefits from comparison. | Agents compete on same task; judges run pairwise comparisons until top candidates remain. |
+| Loop-until-done | The amount of work is unknown. | Workflow repeats until explicit stop condition: no new findings, no failing tests, no new logs, or budget cap. |
+| Quarantine triage | Inputs are untrusted and action may be high privilege. | Reader workers are isolated/read-only; actor workers require gate and sanitized instructions. |
+| Rule mining | Repeated corrections should become durable rules. | Mine sessions/reviews, cluster candidates, adversarially verify, then propose AGENTS/skill updates. |
+
+## Roadmap Phases
+
+### Phase A: Intent To Previewed `workflow.js`
+
+Purpose:
+
+Turn the user's request into a generated workflow harness and preview, without requiring the user to hand-write JS.
+
+Deliverables:
+
+- command or skill path that asks Codex to generate `workflow.js`;
+- preview artifact that explains phases, agents, permissions, budgets, and stop rules in human language;
+- validation that generated script passes AST and capability policy;
+- no execution before approval.
+
+Acceptance:
+
+- [ ] A user request can produce a saved `workflow.js` artifact.
+  - Evidence: fixture or local run creates script plus preview.
+- [ ] Generated script cannot run before `approve-dynamic`.
+  - Evidence: run pauses at gate.
+- [ ] Invalid generated script fails before execution.
+ - Evidence: tests cover forbidden imports/process/fetch/shell. +- [ ] Existing local dynamic workflow smoke still passes. + - Evidence: `npm run check`, `bash scripts/smoke-cli.sh`, controlled dynamic real-smoke. + +### Phase B: Same-Conversation Result Return + +Purpose: + +Make the default UX feel like Codex did the work in this conversation, not like the user has to inspect CLI files. + +Deliverables: + +- skill wrapper or app integration that reads completed run result; +- concise human summary; +- artifact links; +- explicit fallback when host thread cannot be addressed. + +Acceptance: + +- [ ] A CWF run launched from Codex returns a summary in the same conversation. + - Evidence: local manual smoke with copied result or app-host-supported handoff. +- [ ] `--new-thread` remains explicit. + - Evidence: docs and tests do not default to new threads. +- [ ] CLI-only users still work. + - Evidence: `cwf result RUN_ID`. + +### Phase C: Worker Visibility + +Purpose: + +Make worker activity visible when Codex Desktop supports real thread execution. + +Deliverables: + +- read-only workers can use `codex-app-thread` when app-server preflight succeeds; +- failed execution preflight falls back with a clear reason; +- worker JSON records thread ids, turn ids, fallback, sandbox, approval policy. + +Acceptance: + +- [ ] `cwf desktop check` distinguishes schema availability from real execution. + - Evidence: probe thread returns fixed JSON. +- [ ] Read-only worker app threads appear in Desktop when available. + - Evidence: controlled live smoke with thread ids. +- [ ] SDK fallback is explicit when app-thread execution is unavailable. + - Evidence: status/result show fallback reason. + +### Phase D: Write-Capable Dynamic Workers + +Purpose: + +Let dynamic workflows safely propose or apply small scoped code changes. 
+ +Deliverables: + +- `cwf.write.safePatch` dynamic API; +- path policy binding for generated scripts; +- safe fix loop template; +- verification command binding; +- failure cannot be reported as pass. + +Acceptance: + +- [ ] Dynamic safePatch creates `artifacts/proposed.patch`. + - Evidence: fixture run. +- [ ] Forbidden path patch is rejected before target changes. + - Evidence: test. +- [ ] Verification failure marks run failed. + - Evidence: test. +- [ ] Controlled real-smoke modifies only allowed paths. + - Evidence: target diff summary and verification output. + +### Phase E: Built-In Dynamic Modes + +Purpose: + +Make CWF useful without users designing workflows each time. + +Modes: + +- `deep-research`: source collection plan, independent source checks, synthesis. +- `repo-audit`: broad repo review with focused workers. +- `migration-plan`: inventory, risk map, staged migration proposal. +- `adversarial-review`: proposal worker plus challenger workers plus reducer. +- `safe-fix-loop`: find small issues, propose patch, verify, stop on conflict. +- `root-cause-investigation`: independent hypotheses from logs, files, tests, and data, followed by adversarial testing. +- `rule-mining`: mine repeated corrections from sessions or review comments, verify whether each rule would have prevented real mistakes, then propose durable rules. +- `tournament-selection`: generate/rank names, designs, plans, or candidates using pairwise comparison and rubric scoring. +- `triage-quarantine`: classify and dedupe untrusted queue items while isolating reader workers from high-permission actor workers. +- `eval-and-rubric`: evaluate prompts, skills, models, or generated outputs against a fixed rubric with independent graders. + +Acceptance: + +- [ ] Each mode has a template and plain-English preview. + - Evidence: template files and docs. +- [ ] Each mode has fixture tests. + - Evidence: test suite. +- [ ] At least two modes have controlled real-smoke evidence. 
+ - Evidence: run ids and result summaries. +- [ ] Untrusted-input modes enforce quarantine. + - Evidence: tests show reader workers cannot perform gated writes or external actions. + +### Phase F: Save, Reuse, Package + +Purpose: + +Turn good dynamic workflows into reusable local assets. + +Deliverables: + +- save approved dynamic workflow as local template; +- trust metadata with source SHA and origin; +- registry integration for local templates; +- skill packaging guidance. + +Acceptance: + +- [ ] Saved workflow cannot silently change without SHA mismatch warning. + - Evidence: test. +- [ ] Saved workflow appears in local workflow discovery only after explicit enable. + - Evidence: registry test. +- [ ] Remote workflows require inspect/install/enable before run. + - Evidence: URL direct run remains invalid. + +### Phase G: Public Polish And Release + +Purpose: + +Make the complete-state UX understandable to public users. + +Deliverables: + +- README/README.zh-CN integration; +- workflow catalog updates; +- skill routing updates; +- release notes; +- CLI smoke expansion; +- public examples. + +Acceptance: + +- [ ] Public docs explain current, preview, and planned surfaces. + - Evidence: README and Chinese README. +- [ ] Smoke covers the stable command surface. + - Evidence: `bash scripts/smoke-cli.sh`. +- [ ] CI passes. + - Evidence: GitHub Actions success. + +## Staged Goal Prompts + +### Goal 1: Intent To Preview + +```text +/goal +Outcome: +Build Phase A of the CWF complete-state roadmap in this repository: given a user request, Codex can generate a preview-first workflow.js artifact for CWF, validate it, render a human-readable preview, and stop at approve-dynamic before execution. 
+ +Allowed writes: +- src/dynamic-workflow.ts +- src/cli.ts +- src/workflow-suggestion.ts or a new focused generator module +- tests for dynamic workflow generation and validation +- fixtures/dynamic/ +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/JS_DYNAMIC_WORKFLOWS_PLAN.md only if wording must stay aligned + +Forbidden: +- Do not add unrestricted Node.js execution. +- Do not run generated scripts without preview and approval. +- Do not add non-Codex model routing. +- Do not add hosted queues, marketplace execution, production deploys, credentials, payments, database writes, or permissions changes. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- controlled dynamic real-smoke showing generated script preview, approval gate, successful read-only execution, and no target diff mutation +- Reasonix/v4Pro final review focused on overclaiming, sandbox escape, and approval bypass + +Constraints: +- Generated workflow.js must use only the allowed cwf API surface. +- Preview must show agents, permissions, budget, stop rules, and write intent. +- Failure must happen before execution for forbidden APIs. +- Existing YAML workflows and v1.10 safe writes must remain compatible. + +Iteration policy: +- Work in one vertical slice: generate -> preview -> approve gate -> existing dynamic execution. +- After every failing validation, fix the root cause and rerun the narrow test before broad tests. +- Keep user-facing text clear enough for non-CWF experts. + +Stop/Pause conditions: +- Stop complete when verification passes and Reasonix has no blocker/high findings. +- Pause for Ender if implementation requires changing public positioning, expanding write permissions, or adding a new external dependency. +- Stop as blocked after three repeated failures with the same root cause. 
+``` + +### Goal 2: Same-Conversation Result Return + +```text +/goal +Outcome: +Build Phase B of the CWF complete-state roadmap: a CWF run launched from Codex returns a concise result summary and artifact links to the initiating Codex conversation by default, while keeping --new-thread explicit. + +Allowed writes: +- skills/codex-workflows/SKILL.md +- src/cli.ts +- src/desktop-bridge.ts +- tests for handoff/result behavior +- docs/CWF_COMPLETE_STATE_PLAN.md +- README.md and README.zh-CN.md if command docs change + +Forbidden: +- Do not guess the current Codex thread from thread/list. +- Do not make Desktop required for CLI users. +- Do not default to creating a new Desktop thread. +- Do not change workflow execution semantics. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- manual same-conversation handoff smoke or documented app-host fallback +- Reasonix/v4Pro final review + +Constraints: +- CLI artifacts remain source of truth. +- App-server unavailable must produce clear fallback, not failure for completed CLI runs. +- Result summary must include run id, verdict, key findings, verification gaps, and artifact paths. + +Iteration policy: +- Start from existing `cwf desktop result --print` and skill behavior. +- Add tests before broadening UX. +- Keep new-thread behavior opt-in. + +Stop/Pause conditions: +- Stop complete when same-conversation result path is documented and verified. +- Pause if Codex host APIs cannot address the initiating thread safely. +``` + +### Goal 3: Worker Visibility + +```text +/goal +Outcome: +Build Phase C of the CWF complete-state roadmap: read-only CWF workers can use Codex Desktop-visible worker threads when app-server execution is actually available, and fall back explicitly when it is not. 
+ +Allowed writes: +- src/adapters/worker-adapter.ts +- src/desktop-bridge.ts +- src/cli.ts +- tests/worker-adapter.test.ts +- tests/desktop-bridge.test.ts +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/WORKER_APP_THREADS_PLAN.md if behavior changes +- README.md and README.zh-CN.md if user commands change + +Forbidden: +- Do not require Codex Desktop for normal CLI workflows. +- Do not guess the current thread from thread/list. +- Do not create hidden worker threads without recording metadata. +- Do not allow Desktop app-thread writes in this phase. +- Do not mask app-thread execution failure as success. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- cwf desktop check +- controlled app-thread real-smoke when Codex Desktop app-server is available +- Reasonix/v4Pro final review + +Constraints: +- Execution preflight must prove a thread can run and return the expected probe response. +- Worker runtime metadata must include adapter, thread id, turn id, sandbox, approval policy, fallback status, and fallback reason. +- SDK fallback must remain clear and safe. + +Iteration policy: +- First harden fake app-server tests. +- Then verify local CLI behavior. +- Run live app-thread smoke only after deterministic tests pass. + +Stop/Pause conditions: +- Stop complete when read-only workers create visible threads in controlled smoke or clearly fall back when unavailable. +- Pause for Ender if Codex host APIs do not expose a reliable execution path. +- Stop as blocked after three repeated app-server failures with the same root cause. +``` + +### Goal 4: Write-Capable Dynamic Workers + +```text +/goal +Outcome: +Build Phase D of the CWF complete-state roadmap: dynamic workflows can request safe write work only through a guarded safePatch path or parent-capped inherit-session, with no direct JavaScript writes. 
+ +Allowed writes: +- src/dynamic-workflow.ts +- src/safe-write.ts +- src/phase-engine.ts only if safe-write integration requires it +- tests/dynamic-workflow.test.ts +- tests/safe-write.test.ts +- tests/phase-engine.test.ts +- fixtures/dynamic/ +- fixtures/workflows/ +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/WRITE_WORKERS_PLAN.md if behavior changes + +Forbidden: +- Do not let dynamic JavaScript write files directly. +- Do not bypass approve-dynamic or approve-write gates. +- Do not allow patches outside allowed_paths. +- Do not touch credentials, deployments, databases, payments, permissions, or external messages. +- Do not report PASS after verification failure. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- fixture showing dynamic safePatch creates `artifacts/dynamic-proposed.patch` and `artifacts/dynamic-safe-patch.json` +- fixture showing forbidden path rejection leaves target unchanged +- fixture showing verification failure fails the run +- controlled real-smoke modifying only allowed paths +- Reasonix/v4Pro final review + +Constraints: +- safePatch must reuse v1.10 path policy, drift check, git apply --check --3way, verification, and rollback evidence. +- inherit-session must require generated-current-session origin, matching SHA, and known parent permission cap. +- All write results must appear in artifact manifest and final report. + +Iteration policy: +- Implement safePatch before expanding inherit-session behavior. +- Keep every write test narrow and target-diff checked. +- Treat any ambiguous write boundary as a stop condition. + +Stop/Pause conditions: +- Stop complete when write-capable dynamic workflows pass all safety tests and one controlled real-smoke. +- Pause for Ender if the implementation needs broader permissions than safePatch or parent-capped inherit-session. +- Stop as blocked after three repeated write-safety failures with the same root cause. 
+``` + +### Goal 5: Built-In Dynamic Modes And Save/Reuse + +```text +/goal +Outcome: +Build Phases E and F of the CWF complete-state roadmap: ship reusable dynamic workflow templates for high-value tasks and allow approved workflows to be saved/reused with trust metadata. + +Allowed writes: +- workflows/ or a dedicated dynamic templates directory +- src/workflow-registry.ts +- src/dynamic-workflow.ts +- tests for templates, registry, trust metadata, SHA mismatch, and no direct URL run +- docs/workflow-catalog.md +- README.md and README.zh-CN.md +- docs/CWF_COMPLETE_STATE_PLAN.md + +Forbidden: +- Do not execute remote workflows directly by URL. +- Do not enable write-capable templates by default. +- Do not bypass inspect/install/enable. +- Do not add non-Codex model routing or hosted marketplace behavior. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- fixture runs for every template +- controlled real-smoke for at least two templates +- Reasonix/v4Pro final review + +Constraints: +- Templates must declare capabilities and budgets. +- Saved workflows must bind source SHA and origin. +- Dynamic templates must still pass preview, approval, AST policy, and child runtime constraints. + +Iteration policy: +- Add one template at a time with tests. +- Do not add save/reuse until template execution is stable. +- Keep remote/public registry behavior inspect-first. + +Stop/Pause conditions: +- Stop complete when templates are discoverable, test-covered, and safe by default. +- Pause if save/reuse needs a trust model change beyond existing registry docs. +``` + +### Goal 6: Public Polish And Release + +```text +/goal +Outcome: +Build Phase G of the CWF complete-state roadmap: public docs, Chinese docs, workflow catalog, skill routing, release notes, and smoke coverage present CWF's complete-state UX clearly without overclaiming shipped capabilities. 
+ +Allowed writes: +- README.md +- README.zh-CN.md +- RELEASE_NOTES.md +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/WHEN_TO_USE_CWF.md +- docs/workflow-catalog.md +- docs/claude-vs-codex-workflows.md +- skills/codex-workflows/SKILL.md +- scripts/smoke-cli.sh only if stable commands are added +- tests for docs/CLI smoke only if needed + +Forbidden: +- Do not change runtime semantics in this phase. +- Do not claim exact Claude Dynamic Workflows parity. +- Do not imply generated dynamic workflows, worker threads, safe writes, or GitHub posting are available beyond their verified availability label. +- Do not add non-Codex model routing. +- Do not add external writes or publishing automation. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- source audit for overclaiming phrases such as exact parity, automatic trigger, unrestricted JavaScript, or ungated writes +- Reasonix/v4Pro final review +- GitHub CI success after push + +Constraints: +- Public docs must separate current stable, implemented preview, and planned capabilities. +- Chinese README should be the default public entry if project convention keeps Chinese-first docs. +- Skill routing must say when not to use CWF. + +Iteration policy: +- Update one public surface at a time. +- After each docs surface, check whether it contradicts the complete-state plan. +- Keep release notes evidence-backed. + +Stop/Pause conditions: +- Stop complete when public docs and skill routing are aligned, local validation passes, and CI is green. +- Pause for Ender if product positioning changes or public release timing needs a decision. +- Stop as blocked after three repeated review findings about the same overclaim. 
+``` diff --git a/docs/JS_DYNAMIC_WORKFLOWS_PLAN.md b/docs/JS_DYNAMIC_WORKFLOWS_PLAN.md index 068f681..7a4a6f2 100644 --- a/docs/JS_DYNAMIC_WORKFLOWS_PLAN.md +++ b/docs/JS_DYNAMIC_WORKFLOWS_PLAN.md @@ -17,16 +17,16 @@ review_due: resolved 2026-06-06 # JS Dynamic Workflows Plan -Status: reviewed-plan. +Status: reviewed-plan; v1.11 MVP implemented in the current working tree. ## Alignment Snapshot - Building: a Claude-like JavaScript workflow harness for Codex Flow, where Codex can generate a task-specific `workflow.js`, show a preview, ask for approval, and then execute the script through safe CWF runtime APIs. - Not building: unrestricted `node workflow.js`, exact Claude product parity, hidden writes, remote untrusted workflow execution, marketplace lifecycle, daemon scheduling, non-Codex model routing, or production/database/credential/payment/permission writes. - Source of truth: v1.7 app-thread worker evidence, v1.10 safe write worker evidence, current CWF run-store/artifact/reducer contracts, Codex app-server and SDK capabilities, and Claude Dynamic Workflows public descriptions. -- Deliverables: PRD, SPEC, acceptance matrix, phase plan, and a copy-ready v1.11 `/goal` prompt. +- Deliverables: PRD, SPEC, acceptance matrix, phase plan, copy-ready v1.11 `/goal` prompt, and the implemented v1.11 MVP slice. - Phase scope: roadmap-level contract for v1.11-v1.16, with v1.11 as the first implementable version slice. -- Completeness: complete enough to start v1.11 without re-litigating whether JavaScript is the right dynamic-workflow surface. +- Completeness: complete enough to guide v1.11-v1.16; v1.11 MVP now covers local preview-first JavaScript harnesses, AST policy, permissioned child execution, parent CWF JSON-RPC APIs, read-only agent mutation detection, strict `inherit-session` gating, and dynamic artifacts. - Verification level: this planning artifact is docs-only; implementation must prove fixture, local, and controlled real-smoke behavior. 
- Review requirement: Reasonix/v4Pro rereview passed on 2026-06-06 after session-permission inheritance changes. - Verification: `git diff --check`, delivery-doc mechanical validation if available, `npm run check`, `bash scripts/smoke-cli.sh`, Reasonix review, and future implementation evidence listed below. diff --git a/docs/POST_V1_PLAN.md b/docs/POST_V1_PLAN.md index 5eb867b..d8f4310 100644 --- a/docs/POST_V1_PLAN.md +++ b/docs/POST_V1_PLAN.md @@ -19,6 +19,7 @@ Plain English: - v1.7 turns `codex-app-thread` into a live Desktop-visible worker-thread adapter. - v1.8 decides whether Managed-Agents-style platform scheduling is still needed after native worker threads. - v1.10 generalizes safe bounded writes with `write_policy.mode: patch`, isolated writer execution, proposed patch artifacts, path policy checks, `git apply --check --3way`, and verification artifacts. +- v1.11 adds preview-first JavaScript dynamic workflows through AST policy, explicit approval, Node Permission Model child execution, and parent CWF JSON-RPC APIs. - Later work can explore remote workflow sharing. Global rules: @@ -27,7 +28,7 @@ Global rules: - Do not make Codex Desktop required for CLI workflows. - Do not duplicate Codex subagent, sandbox, approval, skill, or plugin mechanisms. - Do not run generated workflow specs until they validate. -- Do not run generated JavaScript in the public core. +- Do not run JavaScript as unrestricted `node workflow.js`; dynamic JavaScript must pass preview, AST policy, approval, permissioned child-process execution, and parent CWF API mediation. - Do not ship write-capable workflows without gates and dry-run evidence. - Keep every new surface optional and gracefully degradable. 
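
The "permissioned child-process execution" rule above can be illustrated with a small sketch of how a parent might assemble arguments for a Node Permission Model child. This is an assumption-heavy illustration, not the CWF implementation: `buildChildArgs`, the artifact directory layout, and running `workflow.js` directly as the entry script are hypothetical, and the flag name `--permission` assumes a recent Node.js release (older releases used `--experimental-permission`). Network restrictions and the JSON-RPC channel are not shown.

```javascript
const path = require("node:path");

// Hypothetical helper: build argv for a permission-restricted child process.
// Read access is limited to the run's artifact directory. No --allow-fs-write,
// --allow-child-process, or --allow-worker flag is passed, so those
// capabilities stay denied under the Permission Model.
function buildChildArgs(artifactDir) {
  const dir = path.resolve(artifactDir);
  // Assumed layout: the approved script copy lives inside the artifact directory.
  const script = path.join(dir, "workflow.js");
  return ["--permission", `--allow-fs-read=${dir}`, script];
}
```

A parent could pass the returned array to `child_process.spawn(process.execPath, args, ...)` and exchange messages over the child's stdio. The key design point is that the target repository path never appears in any `--allow-*` flag.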
diff --git a/docs/SPEC.md b/docs/SPEC.md index 3750c30..2fa0493 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -11,6 +11,11 @@ cwf workflows show cwf workflows validate [workflow-id-or-path] cwf run --target [--background] cwf run --target [--desktop-result] +cwf dynamic list +cwf dynamic show +cwf dynamic save --id +cwf dynamic generate --goal "" --target [--output ] +cwf dynamic run --target [--approve] cwf desktop check cwf desktop result [--thread ] [--new-thread] [--print] cwf github-pr [--format comment|review] [--post --repo --pr ] @@ -117,9 +122,14 @@ Path patterns in `allowed_paths` and `forbidden_paths` support the simple CWF gl ## Runtime Model -Codex Flow has two runtime modes: +Codex Flow has two workflow surfaces: -1. **CLI engine mode**: v1.0 stable behavior. The runner uses local workflow specs, a filesystem run store, and Codex SDK workers. This mode is reliable for CLI and CI-like usage, but worker activity is not guaranteed to appear as Codex App left-sidebar threads. +1. **Static YAML workflows**: v1.0 stable behavior. The runner uses local workflow specs, a filesystem run store, gates, Codex workers, and reducers. +2. **Dynamic JavaScript workflows**: v1.11 behavior. The runner accepts a local `workflow.js`, parses it with an AST policy, copies it into run artifacts, renders a non-skippable preview, waits at `approve-dynamic`, then executes the artifact copy in a Node Permission Model child process. The child process has no target repo filesystem permission, no network permission, and no child-process permission; all work goes through parent CWF JSON-RPC APIs. + +Codex Flow has two worker execution modes: + +1. **CLI engine mode**: stable filesystem-backed execution through Codex SDK workers. This mode is reliable for CLI and CI-like usage, but worker activity is not guaranteed to appear as Codex App left-sidebar threads. 2. **Native Codex runtime mode**: post-v1 behavior. 
The runner reuses Codex App Server threads, turns, review threads, subagents, sandbox, approvals, permissions profiles, and worktrees where available. These native capabilities are delivered incrementally: v1.2 adds explicit result return, v1.3 adds the worker adapter contract, v1.4 introduces gated write-capable workflows, and v1.7 turns `codex-app-thread` into the first live Desktop-visible worker-thread adapter. @@ -149,6 +159,8 @@ Supported phase kinds: - `codex-write`: one gated write worker. The writer runs in an isolated target and CWF applies the extracted patch only after policy checks and `git apply --check --3way`. `direct-docs` is the docs-only policy preset for `doc-refresh`; `patch` is the explicit mode for non-doc safe writes. - `reducer`: merge worker envelopes and artifact evidence into final result. +Dynamic JavaScript runs use a synthetic `dynamic-js` wrapper with `collect -> dynamic-preview -> approve-dynamic -> dynamic-execute`. The script must export exactly one async default function and can call only the exposed CWF runtime object: `cwf.git`, `cwf.agent.run`, `cwf.map`, `cwf.artifacts`, and `cwf.report`. The AST policy rejects imports, dynamic import, `require`, `eval`, `Function`, `globalThis`, `process`, `fetch`, constructor/prototype escapes, direct shell strings, and calls not rooted in `cwf` or approved builtins. `cwf.agent.run` defaults to `read-only`; read-only workers fail if target diff changes. `safePatch` is recognized but not executable until the dynamic run is attached to a v1.10 `write_policy`. `inherit-session` requires strict origin `generated-current-session`, matching SHA-256, and a known write-capable parent permission cap; untrusted origins and hash mismatches fail closed. 
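
The `inherit-session` conditions in the paragraph above can be sketched as a fail-closed check. This is a minimal illustration, not the shipped code: the function name `checkInheritSession`, the request field names, the reason strings, and the `workspace-write` cap value are assumptions introduced here. Only the three conditions (origin `generated-current-session`, matching SHA-256, known write-capable parent cap) come from the text.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical set of parent permission caps treated as write-capable.
// The real cap names are not given in this spec; this value is a placeholder.
const WRITE_CAPABLE_CAPS = new Set(["workspace-write"]);

// Hex SHA-256 of the script source, comparable to a stored approved hash.
function sha256Hex(source) {
  return createHash("sha256").update(source, "utf8").digest("hex");
}

// Fail-closed gate: every condition must hold, otherwise the request is denied.
function checkInheritSession({ origin, source, approvedSha256, parentCap }) {
  if (origin !== "generated-current-session") {
    return { allowed: false, reason: "untrusted-origin" };
  }
  if (typeof approvedSha256 !== "string" || sha256Hex(source) !== approvedSha256) {
    return { allowed: false, reason: "sha-mismatch" };
  }
  if (!WRITE_CAPABLE_CAPS.has(parentCap)) {
    return { allowed: false, reason: "unknown-or-read-only-parent-cap" };
  }
  return { allowed: true, reason: "ok" };
}
```

Checking origin first means a script from any other source is rejected without hashing it, and a missing or unknown parent cap is treated the same as a read-only one.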
+ ### Agent vs Thread In Codex Flow vocabulary: @@ -210,6 +222,13 @@ Each run writes: tests.json safety.json artifacts/ + workflow.js + workflow.sha256 + dynamic-preview.md + dynamic-capabilities.json + dynamic-budget.json + dynamic-events.jsonl + dynamic-final.json reduced-result.json manifest.json result.md diff --git a/docs/WHEN_TO_USE_CWF.md b/docs/WHEN_TO_USE_CWF.md new file mode 100644 index 0000000..8899bce --- /dev/null +++ b/docs/WHEN_TO_USE_CWF.md @@ -0,0 +1,476 @@ +--- +half_life: 30d +archive_at: 2026-07-06 +scope_type: roadmap +scope_name: CWF usage decision and adoption plan +coverage: Complete for deciding where Codex Flow should be used, which workflow surface to choose, and what follow-up docs/product work should make that decision easier. +not_complete_for: Runtime implementation, exact Claude Dynamic Workflows parity, hosted scheduling, marketplace execution, non-Codex model routing, production deploy automation, or broad autonomous writes. +verification_level: docs-only +real_smoke_status: not_required +review_status: reviewed +reviewer: reasonix-v4pro +review_command: crb delegate --mode final-review --json review-payload-for-cwf-planning-docs-after-trq212-minli +review_notes: Approved; no blocker/high/real medium issues after integrating trq212/MinLi failure modes, pattern library, quarantine, and use-case guidance. +review_owner: Codex +review_due: resolved 2026-06-06 +--- + +# When To Use Codex Flow + +## Alignment Snapshot + +- **Building**: a decision plan and public-facing usage guide for where CWF fits in real Codex work. +- **Not building**: new runtime features, a Claude clone, hosted queues, marketplace execution, model routing, or broader autonomous write behavior. +- **Source of truth**: `README.md`, `docs/PRD.md`, `docs/SPEC.md`, `docs/workflow-catalog.md`, `docs/claude-vs-codex-workflows.md`, `docs/JS_DYNAMIC_WORKFLOWS_PLAN.md`, `docs/POST_V1_PLAN.md`, and the current v1.10/v1.11 implementation evidence. 
+- **Deliverables**: PRD, SPEC, usage matrix, acceptance criteria, phase plan, and a copy-ready goal prompt for productizing this usage guide. +- **Phase scope**: roadmap-level usage and adoption contract, not one implementation slice. +- **Completeness**: complete for deciding when CWF should be used across stable, preview, and planned surfaces; not complete for implementing new CWF runtime features. +- **Verification level**: docs-only for this file. Runtime claims below are tied to existing local/CI evidence where available. +- **Review requirement**: Reasonix/v4Pro review required before this plan is treated as final; initial findings were resolved by adding availability and evidence labels. +- **Verification**: `git diff --check`, delivery-doc validator, docs/readability audit, Reasonix review, and optional future `npm run check` if this guide is wired into README/package docs. +- **Open decisions**: none blocking. The selected framing is "CWF is for repeatable, inspectable multi-step Codex work, not a default wrapper around every Codex action." + +Capability sentence: + +This planning pass helps Codex Flow users decide when to use CWF by producing a usage decision contract, workflow-selection guide, and adoption roadmap, using the current CWF docs/runtime evidence, while avoiding new runtime scope, Claude parity claims, and unsafe write expansion. + +## Availability Labels + +This guide uses these labels so users do not confuse today's supported surface with future product work: + +- **Stable public core**: available in the current CLI/package surface and protected by CI-safe smoke. +- **Implemented preview**: implemented and tested on the current v1.11 PR branch, but not treated as fully productized public UX until the PR is merged/released. +- **Planned**: documented direction only; do not use as a shipped command or safety guarantee. 
+ +Current evidence as of 2026-06-06 on branch `codex/v1.11-js-dynamic-runtime`: + +| Surface | Availability | Evidence | +|---|---|---| +| Static read-only workflows | Stable public core | `bash scripts/smoke-cli.sh` validates registry/list/show/validate without live workers. | +| Background/status/watch/result/list/show | Stable public core | CLI smoke covers command surface; tests cover formatting and run-store behavior. | +| GitHub PR artifact generation | Stable public core | CLI smoke generates local `github-pr-comment.md` and `github-pr-review.json` without posting. | +| GitHub PR posting with `--post` | Explicit external action, not default smoke surface | Command requires explicit `--post --repo --pr`; CI smoke does not post. Treat each real post as requiring Ender GO and real-smoke evidence. | +| Desktop result handoff | Stable public core with fallback | README/SPEC document fallback; app-server-dependent posting remains explicit. | +| `doc-refresh` and patch-mode safe write | Stable public core for gated bounded writes | `npm run check` covers safe-write, phase-engine, schema, and run-store tests. Custom patch-mode YAML workflows must still prove their own allowed paths, forbidden paths, and verification commands. | +| Dynamic JavaScript `cwf dynamic run` | Implemented preview | Local verification covers preview, approval, child runtime, template execution, and dynamic safePatch smoke. | +| Generated dynamic workflow authoring from a user request | Implemented preview | `cwf dynamic generate --goal "" --target ` writes a local script, preview metadata, and stops at `approve-dynamic`; it is not an automatic trigger. | +| Saved dynamic workflow templates | Implemented preview | `cwf dynamic save --id ` writes a local SHA-bound trust record; remote URL run remains forbidden. | +| Claude-like native `/workflows` UI | Planned / out of current scope | CWF currently uses CLI status/watch/artifacts and Codex handoff, not a native panel. 
| + +## Plain-Language Result + +CWF is useful when one Codex conversation is no longer the cleanest place to hold all the work. + +Use CWF when the task needs at least one of these: + +- multiple independent review perspectives; +- durable progress and artifacts outside chat; +- a gate before risky or write-capable work; +- a repeatable command that can be rerun on another repo or diff; +- a reducer that merges worker outputs into one accountable result; +- a dynamic harness for a large task-specific investigation; +- adversarial verification against a rubric; +- a tournament or pairwise comparison across many candidates; +- quarantine between untrusted input readers and high-permission actors. + +Do not use CWF just because the task is "important." Use it when workflow structure, evidence, repeatability, or parallelism earns its overhead. + +## PRD + +### Problem + +Codex users now have several ways to work: + +- ask Codex directly in the current conversation; +- call one-off skills such as `check`, `hunt`, `design`, `superx`, or `delivery-planner`; +- run CWF static workflows; +- run CWF safe write workflows; +- run CWF dynamic JavaScript workflows. + +Without a decision guide, users may overuse CWF for trivial tasks or underuse it for work that needs separate worker contexts, durable evidence, and gates. + +The product problem is not "make every task a workflow." The product problem is: + +> Make it obvious when CWF is the right coordination layer, and make the wrong cases easy to reject. + +### Target Users + +- Codex users doing engineering review, release readiness, docs maintenance, research cross-checking, or controlled implementation slices. +- Maintainers who want a public package with clear boundaries and fewer overclaims. +- Skill authors who want to wrap CWF safely from Codex conversations. +- Advanced users comparing CWF with Claude Dynamic Workflows. + +### Goals + +- Define where CWF creates real value. 
+- Define where direct Codex or another skill is better. +- Map common tasks to the right CWF workflow surface. +- Keep write-capable paths explicitly gated. +- Keep dynamic JavaScript workflows preview-first and permission-scoped. +- Preserve "same-conversation result return" as the default Codex UX. +- Provide acceptance criteria for future docs/runtime changes that claim better CWF ergonomics. + +### Non-Goals + +- Do not auto-trigger CWF on vague keywords. +- Do not turn CWF into a replacement for Codex's own conversation, subagent, sandbox, approval, or skill systems. +- Do not introduce non-Codex model routing. +- Do not treat CWF as a general background job platform. +- Do not use CWF for arbitrary shell, network, deploy, database, credential, payment, or permission writes. +- Do not run generated JavaScript without preview, approval, AST policy, and permissioned child execution. +- Do not claim exact Claude Dynamic Workflows parity. + +### User Stories + +1. As a user, I can look at my task and decide whether CWF is worth the overhead. +2. As a user, I can pick `diff-review`, `repo-audit`, `implementation-plan`, `research-crosscheck`, `release-review`, `doc-refresh`, patch-mode write, or `dynamic-js` by task shape. +3. As a user, I can tell when to stay in the current Codex conversation instead. +4. As a cautious user, I can see which CWF modes are read-only, gated write, or inherited permission. +5. As a maintainer, I can reject future feature requests that duplicate Codex-native capabilities without adding workflow value. +6. As a skill author, I can wrap CWF without hiding approvals, artifacts, or failure states. + +## The Decision Rule + +Use CWF when the answer to at least two of these questions is yes: + +| Question | If yes, CWF is likely useful | +|---|---| +| Do I need multiple independent angles? | Use read-only worker workflows or dynamic fan-out. | +| Do I need durable evidence outside chat? | Use CWF run artifacts and reducer output. 
| +| Do I need to pause before a risky phase? | Use gated workflows. | +| Do I need to repeat this on future diffs/repos? | Use bundled YAML or saved workflow specs. | +| Do I need progress visibility during a long run? | Use `--background`, `status`, `watch`, and artifacts. | +| Do I need controlled writes with rollback evidence? | Use `doc-refresh` or patch-mode safe write, not direct dynamic JS writes. | +| Do I need task-specific orchestration logic? | Use `dynamic-js` after preview and approval. | + +If only one answer is yes, direct Codex or a narrower skill is usually better. + +## When To Use CWF + +| Situation | Use CWF? | Best surface | Why | +|---|---:|---|---| +| Code diff needs correctness/tests/safety review | Yes | `diff-review` | Parallel perspectives produce cleaner findings than one pass. | +| Repo structure or maintainability changed | Yes | `repo-audit` | Broader worker roles catch hygiene and release-risk gaps. | +| PRD/SPEC/plan needs pressure test | Yes | `implementation-plan` | Focuses on scope, sequencing, verification, and risk. | +| Research/doc claims need source-fidelity review | Yes | `research-crosscheck` | Good for catching unsupported claims visible in diff. | +| Release is close and needs ship-readiness audit | Yes | `release-review` | Checks rollback, regression, release notes, and rollout gaps. | +| Docs need bounded updates after preview | Yes | `doc-refresh` | Gated write path with preview, patch, verification, rollback. | +| Code implementation can be expressed as bounded patch | Sometimes | patch-mode write workflow | Only with `write_policy`, gate, allowed paths, verification. | +| Large task needs task-specific fan-out/merge logic | Yes, on v1.11 preview branch | `dynamic-js` | Externalizes orchestration into approved JS harness; generated previews and local saved templates are available as implemented preview. 
| +| Flaky test needs several competing theories | Planned | future `root-cause-investigation` | Separate hypothesis workers prevent one theory from dominating too early. | +| Repeated Codex corrections should become rules | Planned | future `rule-mining` | Mine sessions/reviews, cluster rules, adversarially verify before updating AGENTS/skills. | +| Many candidates need qualitative ranking | Planned/sometimes | future `tournament-selection` | Pairwise comparison is more reliable than one huge ranking prompt. | +| Large backlog or public-input triage | Planned/sometimes | future `triage-quarantine` | Reader workers stay read-only; actor workers need gate/approval. | +| Prompt/skill/model output needs rubric eval | Planned | future `eval-and-rubric` | Independent graders and comparison workers reduce self-preferential bias. | +| One small bug in one file | Usually no | direct Codex | CWF overhead is not earned. | +| UI taste, copy, naming, or visual direction | Usually no | MiMo/Reasonix/design skill, then Codex | CWF is not a taste engine. | +| Live web/X research | Usually no | `superx`, `read`, browser | CWF reviews tracked artifacts; it should not replace research tools. | +| Production deploy, DB migration, credentials, payments | No by default | direct G3 plan + approvals | CWF public core does not own irreversible external writes. | +| Need Claude-like background mega-run | Maybe, preview only | `dynamic-js`, conservatively | Use only with budgets, gates, and clear artifact evidence; generated workflow UX is still planned. 
| + +## Workflow Selection Matrix + +| Need | Command shape | Write risk | Verification level | +|---|---|---:|---| +| Validate workflow before spending tokens | `cwf validate WORKFLOW` | none | local | +| Review current git diff | `cwf run diff-review --target REPO` | read-only | local / real-smoke | +| Audit repo health and release risk | `cwf run repo-audit --target REPO` | read-only | local / real-smoke | +| Review planning docs or implementation plan | `cwf run implementation-plan --target REPO` | read-only | local | +| Cross-check factual docs | `cwf run research-crosscheck --target REPO` | read-only | local | +| Release readiness | `cwf run release-review --target REPO` | read-only | local / CI | +| Documentation write | `cwf run doc-refresh --target REPO` then approve | gated write | local | +| Bounded implementation patch | custom YAML with `write_policy.mode: patch` | gated write | per-workflow local + narrow tests | +| Task-specific orchestration | `cwf dynamic generate --goal GOAL --target REPO` or `cwf dynamic run WORKFLOW_JS_OR_ID --target REPO` then approve | read-only by default; `safePatch` only with explicit write policy | implemented-preview local / controlled real-smoke | +| Return result to Codex conversation | `cwf desktop result RUN_ID --print` or skill wrapper | none | local | +| PR artifact generation | `cwf github-pr RUN_ID --format comment|review` | local artifact only | local | +| GitHub posting | `cwf github-pr RUN_ID --post --repo OWNER/REPO --pr NUMBER` | explicit external write, not CI-smoked | Ender GO + per-PR real-smoke | + +## SPEC + +### Product Boundary + +CWF owns: + +- workflow specs and dynamic harness metadata; +- local run state; +- gates and gate decisions; +- worker result envelopes; +- reducer output; +- artifact manifests; +- workflow discovery; +- CLI status/watch/result; +- optional handoff artifacts to Codex Desktop/GitHub. 
+ +Current-vs-planned split: + +| CWF-owned surface | Availability | +|---|---| +| workflow specs, registry, validation, run state, events, reducer output, status/result/watch | Stable public core | +| gated write artifacts, patch checks, rollback and verification records | Stable public core for bounded patch-mode | +| GitHub PR artifact generation | Stable public core; posting requires explicit flags | +| dynamic JS preview, approval, child runtime, CWF JSON-RPC APIs | Implemented preview on v1.11 branch | +| generated dynamic workflow authoring and saved dynamic templates | Implemented preview | +| native Claude-like workflow panel | Planned / not current CWF-owned UI | + +Codex owns: + +- model execution; +- conversation context; +- subagents/threads where available; +- sandbox and approval controls; +- worktrees; +- tools and skills; +- final engineering judgment in the initiating conversation. + +CWF should not duplicate Codex-native capabilities unless the duplication is only a thin adapter over saved run evidence. + +### Usage Modes + +#### 1. Direct Conversation Mode + +Use direct Codex when the task is small, local, and does not need durable orchestration. + +Examples: + +- single-file bug fix; +- quick explanation; +- one narrow refactor; +- local command output; +- UI/copy iteration where taste is the main question. + +#### 2. Read-Only Review Mode + +Use CWF read-only workflows when multiple independent review perspectives are useful and the target diff should not change. + +Safety invariant: + +- target repo diff must not change because of CWF. + +Evidence: + +- worker JSON; +- reduced result; +- result markdown; +- artifact manifest; +- unchanged target diff. + +#### 3. Gated Write Mode + +Use CWF gated writes only when the write boundary is clear before execution. 
+ +Safety invariant: + +- no write phase without a prior gate; +- no patch outside `allowed_paths`; +- forbidden paths stop the run; +- patch conflicts stop before target changes; +- verification failure cannot be reported as pass. + +#### 4. Dynamic Harness Mode + +Use dynamic JavaScript when a static workflow is too rigid and the task benefits from task-specific orchestration. + +Safety invariant: + +- script is copied and hashed; +- preview and approval are mandatory; +- script runs only through CWF APIs; +- no unrestricted Node.js target access; +- read-only agents fail if target diff changes; +- `inherit-session` never exceeds the parent Codex permission cap. + +#### 5. Quarantine Mode + +Use quarantine mode when a workflow reads untrusted public content and may later suggest an action. + +Examples: + +- public issue triage; +- support queue classification; +- Slack/Discord incident mining; +- web/X/source collection for research; +- resume or ticket ranking from uploaded files. + +Safety invariant: + +- reader workers that ingest untrusted content stay read-only; +- verifier workers check evidence, duplication, and policy; +- actor workers perform any write/post/escalation only after gate and sanitized instructions. + +#### 6. Tournament And Rubric Mode + +Use tournament/rubric workflows when the task is qualitative but still judgeable. + +Examples: + +- naming; +- design direction comparison; +- solution approach selection; +- candidate ranking; +- prompt/skill evaluation. + +Safety invariant: + +- the rubric must be written before judging starts; +- generated candidates and judges should be separate workers; +- final output must preserve why winners beat alternatives, not just list the winner. + +### Error And Fallback Behavior + +- If app-thread execution is unavailable, CWF can fall back to SDK workers or handoff artifacts, but must say so. +- If a dynamic script asks for forbidden APIs, validation fails before execution. 
+- If a write workflow lacks a gate or policy, validation fails. +- If GitHub posting fails, local PR artifacts remain the durable output. +- If a worker returns malformed JSON, raw fallback must be visible in status/result. +- If evidence is only fixture/dry-run, the final result must not claim real-smoke completion. + +## Acceptance Criteria + +- [ ] A user can decide whether CWF is appropriate from a task description. + - Evidence: docs-only review confirms the decision rule and "when not to use" cases are explicit; future README wiring should add a link to this guide. + +- [ ] Every bundled workflow has a plain-English use case. + - Evidence: `docs/workflow-catalog.md` plus this guide cover `diff-review`, `repo-audit`, `implementation-plan`, `research-crosscheck`, `release-review`, `doc-refresh`, patch-mode write, and preview `dynamic-js`. + +- [ ] The guide separates direct Codex, static CWF, gated write CWF, and dynamic JS CWF. + - Evidence: SPEC usage modes define each separately; dynamic JS is labelled implemented preview, not fully productized automatic workflow UX. + +- [ ] The guide explains why dynamic workflows exist, not just how to run them. + - Evidence: use cases include adversarial verification, goal-drift prevention, tournament comparison, and quarantine for untrusted input. + +- [ ] The guide does not imply CWF can safely do unrestricted writes. + - Evidence: non-goals and safety invariants forbid ungated, direct, external, and production writes; write guidance points to gates, `write_policy`, patch checks, verification, and rollback. + +- [ ] The guide does not claim exact Claude Dynamic Workflows parity. + - Evidence: non-goals, availability labels, and `docs/claude-vs-codex-workflows.md` keep exact parity out of scope. + +- [ ] Future README/skill docs can link to one decision surface. + - Evidence: follow-up phase plan includes a docs integration slice with README, Chinese README, workflow catalog, and skill docs. 
+ +- [ ] G2/G3 planning quality is reviewed before final status. + - Evidence: Reasonix/v4Pro review status in frontmatter records findings and resolution; final status requires no unresolved blocker/high findings. + +## Phase Plan + +### Phase 1: Canonical Usage Guide + +Status: this document. + +Deliverables: + +- `docs/WHEN_TO_USE_CWF.md` +- decision rule; +- workflow selection matrix; +- safety boundaries; +- acceptance criteria. + +Verification: + +- `git diff --check` +- Reasonix/v4Pro review + +Stop condition: + +- blocker/high review finding remains unresolved. + +### Phase 2: Wire Into Public Docs + +Deliverables: + +- README link near "What It Does" or "Usage"; +- `README.zh-CN.md` matching link and summary; +- `docs/workflow-catalog.md` link to this guide. + +Verification: + +- `npm run check` +- `bash scripts/smoke-cli.sh` +- source audit that old wording does not imply CWF is for every task. + +Stop condition: + +- README starts claiming Claude parity, automatic trigger, or broad writes. + +### Phase 3: Codex Skill Routing Copy + +Deliverables: + +- update `skills/codex-workflows/SKILL.md` with "use CWF when..." trigger boundaries; +- add anti-triggers for trivial fixes, UI/copy taste, live research, and external writes; +- keep same-conversation result return as default. + +Verification: + +- `git diff --check` +- manual trigger-case review against this guide +- `bash scripts/smoke-cli.sh` if package contents change. + +Stop condition: + +- the skill would route too many ordinary Codex tasks into CWF. + +### Phase 4: Dynamic Workflow Productization + +Deliverables: + +- generated `workflow.js` preview flow; +- built-in dynamic templates for repo audit, adversarial review, migration planning, safe fix loop, root-cause investigation, rule mining, tournament selection, triage quarantine, and rubric evaluation; +- stronger docs for budget and cost expectations. 
+ +Verification: + +- fixture dynamic runs; +- controlled real-smoke dynamic run; +- `npm run check`; +- `bash scripts/smoke-cli.sh`; +- Reasonix final review. + +Stop condition: + +- generated scripts can execute without preview/approval or exceed permission caps. + +## Future Goal Prompt + +Use this if the next step is to productize this guide into README/skill routing. + +```text +/goal +Outcome: +Productize the CWF usage decision guide in /Users/sunny/Work/CODEX/codex-workflows so public users and Codex skills can tell when to use CWF, which workflow surface to choose, and when to stay in direct Codex. + +Allowed writes: +- README.md +- README.zh-CN.md +- docs/workflow-catalog.md +- docs/WHEN_TO_USE_CWF.md +- skills/codex-workflows/SKILL.md +- tests/docs or lightweight validation files only if needed + +Forbidden: +- Do not change runtime behavior, workflow execution code, package publishing config, GitHub Actions behavior, credentials, external posting, or generated artifacts outside the repo. +- Do not claim exact Claude Dynamic Workflows parity. +- Do not imply CWF should handle trivial tasks, UI/copy taste work, live research, unrestricted writes, deploys, databases, credentials, payments, or permissions. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- Reasonix/v4Pro final review focused on overclaiming, trigger boundaries, and public-doc clarity + +Constraints: +- Keep CWF framed as a Codex-native workflow/evidence layer, not a separate agent platform. +- Preserve same-conversation result return as the default Codex UX. +- Preserve safe write boundaries: gated writes, patch policy, approval, verification, rollback. +- Keep dynamic JS framed as preview-first and permission-scoped. + +Iteration policy: +- First inspect current README, Chinese README, workflow catalog, and skill docs. +- Make the smallest docs edits that create one clear decision path. 
+- After each review finding, fix blocker/high issues before expanding copy. +- Do not add new runtime scope during this goal. + +Stop/Pause conditions: +- Stop complete when docs are updated, validation passes, and Reasonix has no blocker/high findings. +- Pause and ask Ender if the docs would require changing product positioning, adding new runtime behavior, or approving external writes. +- Stop as blocked after three repeated validation/review failures with the same root cause. +``` diff --git a/docs/WORKER_APP_THREADS_PLAN.md b/docs/WORKER_APP_THREADS_PLAN.md index 6059a14..8030793 100644 --- a/docs/WORKER_APP_THREADS_PLAN.md +++ b/docs/WORKER_APP_THREADS_PLAN.md @@ -171,11 +171,22 @@ If the probe cannot complete setup before a turn exists, keep that separate as ` Timeout tuning: - `options.timeoutMs` is the overall worker deadline; app-thread setup, turn start, and result reading must not exceed it cumulatively. +- `CWF_APP_THREAD_TRANSPORT` selects how app-thread workers reach Codex app-server. The default is `stdio`, which starts a fresh local `codex app-server` process per probe/worker path and avoids stale persistent-daemon quota routing. Set `CWF_APP_THREAD_TRANSPORT=daemon` only when intentionally testing the long-lived `~/.codex/app-server-control` socket. +- `CWF_APP_THREAD_MODEL` optionally pins the Codex Desktop model used by both the app-thread execution probe and worker turns. By default, app-thread workers use `gpt-5.3-codex-spark` so lightweight worker turns stay on the Codex quota lane instead of the host default premium lane. Set `CWF_APP_THREAD_MODEL=host-default` to opt back into the host default model. +- `CWF_APP_THREAD_MODEL_PROVIDER` optionally pins the thread-start model provider, for example `openai`. +- `CWF_APP_THREAD_REASONING_EFFORT` optionally pins the worker/probe turn effort, for example `low`, `medium`, `high`, or `xhigh` when supported by the selected Desktop model. The default app-thread effort is `low`. 
- `CWF_APP_THREAD_WORKER_REQUEST_TIMEOUT_MS` caps individual app-server setup/start requests within that overall deadline. - `CWF_APP_THREAD_RESULT_TIMEOUT_MS` caps worker result polling within the remaining overall deadline. - `CWF_APP_THREAD_CLOSE_TIMEOUT_MS` caps best-effort transport close; close failures must not hide an already collected worker result. +- `CWF_APP_THREAD_DIAGNOSTICS_MAX_BYTES` caps the number of session-log bytes read for fallback diagnostics, with a hard maximum of 1 MiB even if the env value is larger. The adapter reads only Codex session `.jsonl` files under `CODEX_HOME/sessions` and otherwise records only the basename. - Invalid timeout env values fall back to defaults instead of producing `NaN` or immediate accidental timeouts. +Execution diagnostics: + +- When `thread/read` exposes a Codex session path but no assistant response, the app-thread adapter records best-effort diagnostics from the session log: thread status, model, effort, coarse quota availability, `last_agent_message`, and the session log filename. +- A Desktop-visible thread can be created even when the model channel cannot respond. If diagnostics show `quota_unavailable=true`, the adapter must report `app-thread-execution-unavailable` and use `fallback_worker_adapter` when configured. Public worker output must not include raw account balance values. +- If GPT-5.5/GPT-5.4 app-thread runs appear to use a premium quota lane while `codex exec` or fresh stdio app-server runs use the normal Codex quota lane, treat it as persistent-daemon quota routing drift, not proof that the user account lacks Codex quota. Restart the daemon or use the default stdio transport before falling back. + ### Safety Invariants - `thread/list` must never choose the initiating/current conversation. 
diff --git a/docs/cwf-complete-state/ACCEPTANCE.md b/docs/cwf-complete-state/ACCEPTANCE.md new file mode 100644 index 0000000..37f6ac0 --- /dev/null +++ b/docs/cwf-complete-state/ACCEPTANCE.md @@ -0,0 +1,89 @@ +--- +half_life: 30d +archive_at: 2026-07-06 +scope_type: roadmap +scope_name: CWF complete-state acceptance matrix +coverage: Evidence-bound acceptance criteria for each complete-state phase. +not_complete_for: Runtime implementation, exact Claude parity, hosted scheduling, unrestricted JS, non-Codex routing, production deploys, database/credential/payment/permission writes. +verification_level: docs-only +real_smoke_status: requires_approval +review_status: reviewed +reviewer: reasonix-v4pro +review_command: crb delegate --mode final-review --json review-payload-for-cwf-planning-docs-after-trq212-minli +review_notes: Derived from reviewed complete-state and usage plans. +review_owner: Codex +review_due: resolved 2026-06-06 +--- + +# Acceptance Matrix: CWF Complete-State + +## Phase A: Intent To Previewed `workflow.js` + +- [ ] A user request can produce a saved `workflow.js` artifact. + - Test: fixture or local run creates script plus preview artifact. +- [ ] Generated script cannot run before `approve-dynamic`. + - Test: dynamic run pauses at gate. +- [ ] Invalid generated script fails before execution. + - Test: forbidden imports/process/fetch/shell cases fail validation. +- [ ] Existing dynamic workflow smoke still passes. + - Test: `npm run check`, `bash scripts/smoke-cli.sh`, controlled dynamic real-smoke. + +## Phase B: Same-Conversation Result Return + +- [ ] A CWF run launched from Codex returns a concise summary in the initiating conversation. + - Manual evidence: local skill-wrapper smoke or documented app-host fallback. +- [ ] `--new-thread` remains explicit. + - Test: docs and tests show no default new-thread behavior. +- [ ] CLI-only users still work. + - Test: `cwf result RUN_ID`. 
+ +## Phase C: Worker Visibility + +- [ ] `cwf desktop check` distinguishes schema availability from real execution. + - Test: probe thread returns fixed JSON. +- [ ] Read-only worker app threads appear in Desktop when available. + - Manual evidence: controlled live smoke with thread ids and turn ids. +- [ ] SDK fallback is explicit when app-thread execution is unavailable. + - Test: status/result show fallback reason. + +## Phase D: Write-Capable Dynamic Workers + +- [ ] Dynamic `safePatch` creates `artifacts/dynamic-proposed.patch` and `artifacts/dynamic-safe-patch.json`. + - Test: fixture run and `tests/dynamic-workflow.test.ts`. +- [ ] Forbidden path patch is rejected before target changes. + - Test: forbidden-path fixture leaves target unchanged. +- [ ] Verification failure marks the run failed. + - Test: failing verification fixture cannot return PASS. +- [ ] Controlled real-smoke modifies only allowed paths. + - Manual evidence: target diff summary and verification output after Ender GO. + +## Phase E: Built-In Dynamic Modes + +- [ ] Each mode has a template and plain-English preview. + - Test: template files and preview snapshots. +- [ ] Each mode has fixture coverage. + - Test: focused template tests. +- [ ] At least two modes have controlled real-smoke evidence. + - Manual evidence: run ids and result summaries. +- [ ] Untrusted-input modes enforce quarantine. + - Test: reader workers cannot perform gated writes or external actions. + +## Phase F: Save, Reuse, Package + +- [ ] Saved workflow cannot silently change without SHA mismatch warning. + - Test: trust metadata test. +- [ ] Saved workflow appears in local discovery only after explicit enable. + - Test: registry test. +- [ ] Remote workflows require inspect/install/enable before run. + - Test: direct URL run remains invalid. + +## Phase G: Public Polish And Release + +- [ ] Public docs explain current, preview, and planned surfaces. 
+ - Test: source audit over README, Chinese README, workflow catalog, and complete-state docs. +- [ ] Public docs do not claim exact Claude parity. + - Test: source audit for exact parity / automatic trigger / unrestricted JS / ungated writes. +- [ ] CLI smoke covers the stable command surface. + - Test: `bash scripts/smoke-cli.sh`. +- [ ] CI passes after push. + - Manual evidence: GitHub Actions success. diff --git a/docs/cwf-complete-state/CURRENT_VS_COMPLETE.md b/docs/cwf-complete-state/CURRENT_VS_COMPLETE.md new file mode 100644 index 0000000..ea2752c --- /dev/null +++ b/docs/cwf-complete-state/CURRENT_VS_COMPLETE.md @@ -0,0 +1,57 @@ +--- +half_life: 30d +archive_at: 2026-07-06 +scope_type: roadmap +scope_name: CWF current-vs-complete gap +coverage: Self-contained current state and remaining gap summary for CWF complete-state goal execution. +not_complete_for: Runtime implementation, exact Claude parity, hosted scheduling, unrestricted JS, non-Codex routing, production deploys, database/credential/payment/permission writes. +verification_level: docs-only +real_smoke_status: not_required +review_status: reviewed +reviewer: reasonix-v4pro +review_command: crb delegate --mode final-review --json review-payload-for-cwf-complete-state-delivery-pack +review_notes: Added to resolve Reasonix high finding that the delivery pack lacked current-vs-target context; rereview approved with no unresolved blocker/high/real medium findings. +review_owner: Codex +review_due: 2026-06-06 +--- + +# Current vs Complete + +Use this before starting any CWF complete-state goal. It prevents future goal-mode runs from rebuilding already-completed pieces. + +## Snapshot + +| Layer | Current state | Complete state | Next action | +|---|---|---|---| +| Static workflows | Stable CLI workflows exist. | Keep as reliable repeatable base. | No rebuild needed. 
| +| Safe writes | v1.10 safe write workers exist for gated bounded patch flow; dynamic `cwf.safePatch.apply` now reuses the same parent-applied policy/verification path. | Dynamic workflows can call the same guarded safePatch path. | Keep expanding only through explicit write policies and focused fixtures. | +| Dynamic JS runtime | v1.11 preview branch supports local `workflow.js` preview, approval gate, AST policy, child runtime, CWF APIs, and `cwf dynamic generate`. | Codex can generate the script from user intent and run it through the same guarded path. | Productize after local/CI smoke and review. | +| Same-conversation return | Skill wrapper/manual result handoff is the intended default. | Runs launched from Codex reliably return concise result summaries to the initiating conversation. | Phase B productizes this path. | +| Worker visibility | App-thread worker path exists with capability/probe constraints. | Read-only workers can be Desktop-visible threads when execution preflight succeeds. | Phase C hardens and documents this as a user-facing surface. | +| Write worker visibility | Safe writes run through isolated patch application, not Desktop app-thread writes. | Write workers remain safePatch/inherit-session controlled; no hidden Desktop direct writes. | Keep Desktop app-thread writes refused until Codex exposes stable approval/write support. | +| Built-in modes | Static catalog plus two local dynamic templates: `change-summary` and `docs-change-check`. | Dynamic catalog can grow toward deep research, repo audit, migration, adversarial review, safe fix loop, root cause, rule mining, tournament, triage quarantine, and rubric eval. | Add future templates one at a time with fixture and smoke coverage. | +| Save/reuse | Local YAML registry exists; dynamic JS can also be saved under local SHA-bound trust metadata and run by id. | Approved dynamic scripts can become local templates or skills with trust metadata. 
| Keep remote/public registry behavior inspect-first. | +| Native UI parity | CLI status/watch/artifacts; no Claude `/workflows` panel. | Codex-native best effort: same-conversation summaries, visible worker threads, artifact links, optional explicit new thread. | Do not claim exact Claude UI parity. | + +## Already Built: Do Not Rebuild + +- static YAML workflow engine; +- workflow registry/list/show/validate; +- run store, status, watch, result, list/show/latest; +- reducer envelopes and artifact manifests; +- gated safe write path for bounded patches; +- preview-first local dynamic JS execution on the v1.11 preview branch; +- app-thread preflight concept and fallback recording. + +## Still To Build + +1. Productize and release the Phase A `cwf dynamic generate` preview after local/CI smoke and review. +2. Default same-conversation result return polish. +3. Worker-thread visibility as a documented read-only execution path. +4. Additional dynamic templates beyond the first two implemented-preview templates. +5. Broader save/reuse packaging as skills. +6. Public docs and skill routing that explain stable vs preview vs planned surfaces. + +## Human Rule + +If a future goal starts by rebuilding the static engine, the safe-write engine, or the v1.11 local dynamic JS runtime, it is probably doing the wrong job. diff --git a/docs/cwf-complete-state/GOAL_PROMPTS.md b/docs/cwf-complete-state/GOAL_PROMPTS.md new file mode 100644 index 0000000..0766184 --- /dev/null +++ b/docs/cwf-complete-state/GOAL_PROMPTS.md @@ -0,0 +1,331 @@ +--- +half_life: 30d +archive_at: 2026-07-06 +scope_type: roadmap +scope_name: CWF complete-state staged goal prompts +coverage: Copy-ready staged goal prompts for implementing the complete-state roadmap from Phase A through Phase G. +not_complete_for: A single all-in-one goal, exact Claude parity, unrestricted JS, hosted scheduling, non-Codex routing, production deploys, database/credential/payment/permission writes. 
+verification_level: docs-only +real_smoke_status: requires_approval +review_status: reviewed +reviewer: reasonix-v4pro +review_command: crb delegate --mode final-review --json review-payload-for-cwf-planning-docs-after-trq212-minli +review_notes: Derived from reviewed complete-state and usage plans. +review_owner: Codex +review_due: resolved 2026-06-06 +--- + +# Goal Prompts: CWF Complete-State + +Use one goal at a time. The whole roadmap is intentionally not a single `/goal`. + +## Phase A: Intent To Preview + +Current state: + +- Local dynamic `workflow.js` execution already exists on the v1.11 preview branch. +- The existing runtime already covers preview, approval gate, AST policy, child execution, and initial CWF APIs. +- The current working tree already contains a Phase A MVP for `cwf dynamic generate`, pending acceptance/commit/release. +- This phase adds or hardens the missing Codex-generated authoring step from user intent to previewable script. +- Do not rebuild the dynamic execution runtime unless a failing test proves a gap in the existing path. +- If `cwf dynamic generate` already exists, treat this goal as acceptance hardening, docs alignment, and verification rather than a rewrite. + +```text +/goal +Outcome: +Build Phase A of the CWF complete-state roadmap in /Users/sunny/Work/CODEX/codex-workflows: given a user request, Codex can generate a preview-first workflow.js artifact for CWF, validate it, render a human-readable preview, and stop at approve-dynamic before execution. + +Boundaries: +Allowed writes: +- src/dynamic-workflow.ts +- src/cli.ts +- src/workflow-suggestion.ts or a new focused generator module +- tests for dynamic workflow generation and validation +- fixtures/dynamic/ +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/cwf-complete-state/ +- docs/JS_DYNAMIC_WORKFLOWS_PLAN.md only if wording must stay aligned + +Forbidden: +- Do not add unrestricted Node.js execution. +- Do not run generated scripts without preview and approval. 
+- Do not add non-Codex model routing. +- Do not add hosted queues, marketplace execution, production deploys, credentials, payments, database writes, or permissions changes. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- controlled dynamic real-smoke showing generated script preview, approval gate, successful read-only execution, and no target diff mutation +- Reasonix/v4Pro final review focused on overclaiming, sandbox escape, and approval bypass + +Constraints: +- Generated workflow.js must use only the allowed cwf API surface. +- Preview must show agents, permissions, budget, stop rules, and write intent. +- Failure must happen before execution for forbidden APIs. +- Existing YAML workflows and v1.10 safe writes must remain compatible. + +Iteration policy: +- Work in one vertical slice: generate -> preview -> approve gate -> existing dynamic execution. +- After every failing validation, fix the root cause and rerun the narrow test before broad tests. +- Keep user-facing text clear enough for non-CWF experts. + +Stop/Pause conditions: +- Stop complete when verification passes and Reasonix has no blocker/high findings. +- Pause for Ender if implementation requires changing public positioning, expanding write permissions, or adding a new external dependency. +- Stop as blocked after three repeated failures with the same root cause. +``` + +## Phase B: Same-Conversation Result Return + +```text +/goal +Outcome: +Build Phase B of the CWF complete-state roadmap in /Users/sunny/Work/CODEX/codex-workflows: a CWF run launched from Codex returns a concise result summary and artifact links to the initiating Codex conversation by default, while keeping --new-thread explicit. 
+ +Boundaries: +Allowed writes: +- skills/codex-workflows/SKILL.md +- src/cli.ts +- src/desktop-bridge.ts +- tests for handoff/result behavior +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/cwf-complete-state/ +- README.md and README.zh-CN.md if command docs change + +Forbidden: +- Do not guess the current Codex thread from thread/list. +- Do not make Desktop required for CLI users. +- Do not default to creating a new Desktop thread. +- Do not change workflow execution semantics. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- manual same-conversation handoff smoke or documented app-host fallback +- Reasonix/v4Pro final review + +Constraints: +- CLI artifacts remain source of truth. +- App-server unavailable must produce clear fallback, not failure for completed CLI runs. +- Result summary must include run id, verdict, key findings, verification gaps, and artifact paths. + +Iteration policy: +- Start from existing `cwf desktop result --print` and skill behavior. +- Add tests before broadening UX. +- Keep new-thread behavior opt-in. + +Stop/Pause conditions: +- Stop complete when same-conversation result path is documented and verified. +- Pause if Codex host APIs cannot address the initiating thread safely. +- Stop as blocked after three repeated failures with the same root cause. +``` + +## Phase C: Worker Visibility + +```text +/goal +Outcome: +Build Phase C of the CWF complete-state roadmap in /Users/sunny/Work/CODEX/codex-workflows: read-only CWF workers can use Codex Desktop-visible worker threads when app-server execution is actually available, and fall back explicitly when it is not. 
+ +Boundaries: +Allowed writes: +- src/adapters/worker-adapter.ts +- src/desktop-bridge.ts +- src/cli.ts +- tests/worker-adapter.test.ts +- tests/desktop-bridge.test.ts +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/cwf-complete-state/ +- docs/WORKER_APP_THREADS_PLAN.md if behavior changes +- README.md and README.zh-CN.md if user commands change + +Forbidden: +- Do not require Codex Desktop for normal CLI workflows. +- Do not guess the current thread from thread/list. +- Do not create hidden worker threads without recording metadata. +- Do not allow Desktop app-thread writes in this phase. +- Do not mask app-thread execution failure as success. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- cwf desktop check +- controlled app-thread real-smoke when Codex Desktop app-server is available +- Reasonix/v4Pro final review + +Constraints: +- Execution preflight must prove a thread can run and return the expected probe response. +- Worker runtime metadata must include adapter, thread id, turn id, sandbox, approval policy, fallback status, and fallback reason. +- SDK fallback must remain clear and safe. + +Iteration policy: +- First harden fake app-server tests. +- Then verify local CLI behavior. +- Run live app-thread smoke only after deterministic tests pass. + +Stop/Pause conditions: +- Stop complete when read-only workers create visible threads in controlled smoke or clearly fall back when unavailable. +- Pause for Ender if Codex host APIs do not expose a reliable execution path. +- Stop as blocked after three repeated app-server failures with the same root cause. +``` + +## Phase D: Write-Capable Dynamic Workers + +```text +/goal +Outcome: +Build Phase D of the CWF complete-state roadmap in /Users/sunny/Work/CODEX/codex-workflows: dynamic workflows can request safe write work only through a guarded safePatch path or parent-capped inherit-session, with no direct JavaScript writes. 
+ +Boundaries: +Allowed writes: +- src/dynamic-workflow.ts +- src/safe-write.ts +- src/phase-engine.ts only if safe-write integration requires it +- tests/dynamic-workflow.test.ts +- tests/safe-write.test.ts +- tests/phase-engine.test.ts +- fixtures/dynamic/ +- fixtures/workflows/ +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/cwf-complete-state/ +- docs/WRITE_WORKERS_PLAN.md if behavior changes + +Forbidden: +- Do not let dynamic JavaScript write files directly. +- Do not bypass approve-dynamic or approve-write gates. +- Do not allow patches outside allowed_paths. +- Do not touch credentials, deployments, databases, payments, permissions, or external messages. +- Do not report PASS after verification failure. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- fixture showing dynamic safePatch creates `artifacts/dynamic-proposed.patch` and `artifacts/dynamic-safe-patch.json` +- fixture showing forbidden path rejection leaves target unchanged +- fixture showing verification failure fails the run +- controlled real-smoke modifying only allowed paths +- Reasonix/v4Pro final review + +Constraints: +- safePatch must reuse v1.10 path policy, drift check, git apply --check --3way, verification, and rollback evidence. +- inherit-session must require generated-current-session origin, matching SHA, and known parent permission cap. +- All write results must appear in artifact manifest and final report. + +Iteration policy: +- Implement safePatch before expanding inherit-session behavior. +- Keep every write test narrow and target-diff checked. +- Treat any ambiguous write boundary as a stop condition. + +Stop/Pause conditions: +- Stop complete when write-capable dynamic workflows pass all safety tests and one controlled real-smoke. +- Pause for Ender if the implementation needs broader permissions than safePatch or parent-capped inherit-session. +- Stop as blocked after three repeated write-safety failures with the same root cause. 
+``` + +## Phase E-F: Built-In Modes And Save/Reuse + +Sequencing note: + +Implement built-in modes and fixture coverage first. Add save/reuse only after template execution is stable enough that trust metadata has something concrete to bind to. + +```text +/goal +Outcome: +Build Phases E and F of the CWF complete-state roadmap in /Users/sunny/Work/CODEX/codex-workflows: ship reusable dynamic workflow templates for high-value tasks and allow approved workflows to be saved/reused with trust metadata. + +Boundaries: +Allowed writes: +- workflows/ or a dedicated dynamic templates directory +- src/workflow-registry.ts +- src/dynamic-workflow.ts +- tests for templates, registry, trust metadata, SHA mismatch, and no direct URL run +- docs/workflow-catalog.md +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/cwf-complete-state/ +- README.md and README.zh-CN.md + +Forbidden: +- Do not execute remote workflows directly by URL. +- Do not enable write-capable templates by default. +- Do not bypass inspect/install/enable. +- Do not add non-Codex model routing or hosted marketplace behavior. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- fixture runs for every template +- controlled real-smoke for at least two templates +- Reasonix/v4Pro final review + +Constraints: +- Templates must declare capabilities and budgets. +- Saved workflows must bind source SHA and origin. +- Dynamic templates must still pass preview, approval, AST policy, and child runtime constraints. + +Iteration policy: +- Add one template at a time with tests. +- Do not add save/reuse until template execution is stable. +- Keep remote/public registry behavior inspect-first. + +Stop/Pause conditions: +- Stop complete when templates are discoverable, test-covered, and safe by default. +- Pause if save/reuse needs a trust model change beyond existing registry docs. +- Stop as blocked after three repeated failures with the same root cause. 
+``` + +## Phase G: Public Polish And Release + +```text +/goal +Outcome: +Build Phase G of the CWF complete-state roadmap in /Users/sunny/Work/CODEX/codex-workflows: public docs, Chinese docs, workflow catalog, skill routing, release notes, and smoke coverage present CWF's complete-state UX clearly without overclaiming shipped capabilities. + +Boundaries: +Allowed writes: +- README.md +- README.zh-CN.md +- RELEASE_NOTES.md +- docs/CWF_COMPLETE_STATE_PLAN.md +- docs/WHEN_TO_USE_CWF.md +- docs/cwf-complete-state/ +- docs/workflow-catalog.md +- docs/claude-vs-codex-workflows.md +- skills/codex-workflows/SKILL.md +- scripts/smoke-cli.sh only if stable commands are added +- tests for docs/CLI smoke only if needed + +Forbidden: +- Do not change runtime semantics in this phase. +- Do not claim exact Claude Dynamic Workflows parity. +- Do not imply generated dynamic workflows, worker threads, safe writes, or GitHub posting are available beyond their verified availability label. +- Do not add non-Codex model routing. +- Do not add external writes or publishing automation. + +Verification: +- git diff --check +- npm run check +- bash scripts/smoke-cli.sh +- source audit for overclaiming phrases such as exact parity, automatic trigger, unrestricted JavaScript, or ungated writes +- Reasonix/v4Pro final review +- GitHub CI success after push + +Constraints: +- Public docs must separate current stable, implemented preview, and planned capabilities. +- Chinese README should be the default public entry if project convention keeps Chinese-first docs. +- Skill routing must say when not to use CWF. + +Iteration policy: +- Update one public surface at a time. +- After each docs surface, check whether it contradicts the complete-state plan. +- Keep release notes evidence-backed. + +Stop/Pause conditions: +- Stop complete when public docs and skill routing are aligned, local validation passes, and CI is green. 
+- Pause for Ender if product positioning changes or public release timing needs a decision. +- Stop as blocked after three repeated review findings about the same overclaim. +``` diff --git a/docs/cwf-complete-state/PRD.md b/docs/cwf-complete-state/PRD.md new file mode 100644 index 0000000..06231ed --- /dev/null +++ b/docs/cwf-complete-state/PRD.md @@ -0,0 +1,99 @@ +--- +half_life: 30d +archive_at: 2026-07-06 +scope_type: roadmap +scope_name: CWF complete-state PRD +coverage: Product requirements for the complete CWF dynamic workflow experience across multiple implementation phases. +not_complete_for: One-shot implementation, exact Claude parity, hosted managed agents, unrestricted JavaScript, non-Codex routing, production deploy automation, or broad autonomous writes. +verification_level: docs-only +real_smoke_status: not_required +review_status: reviewed +reviewer: reasonix-v4pro +review_command: crb delegate --mode final-review --json review-payload-for-cwf-planning-docs-after-trq212-minli +review_notes: Derived from reviewed complete-state and usage plans. +review_owner: Codex +review_due: resolved 2026-06-06 +--- + +# PRD: CWF Complete-State + +## Problem + +Long Codex conversations can solve hard work, but they are a poor place to hold every phase, worker output, hypothesis, budget, gate, and artifact. + +The failure modes are familiar: + +- the agent finishes too early; +- one theory dominates because the same agent is judging itself; +- constraints disappear after many turns; +- raw logs and worker findings pollute the main conversation; +- untrusted input and high-permission actions mix; +- large runs spend too much time or token budget without a hard stop. + +Claude Dynamic Workflows are compelling because they create a task-specific harness. CWF should deliver the same kind of outcome for Codex users, without copying Claude's product internals or bypassing Codex-native permissions. 
+ +## Target Users + +- Codex users doing complex repo audits, reviews, migrations, investigations, or controlled fixes. +- Maintainers who need repeatable workflows with artifacts and evidence. +- Skill authors who want a safe local workflow layer around Codex. +- Public users comparing CWF with Claude Dynamic Workflows. + +## Product Goal + +CWF is complete when a user can ask: + +> Run a dynamic workflow to audit this repo for auth risks and fix only the small safe issues after review. + +And the system can: + +1. decide that CWF is the right tool; +2. generate a task-specific `workflow.js`; +3. show a readable preview before execution; +4. pause for approval; +5. run through constrained CWF APIs and Codex workers; +6. surface read-only workers as Desktop threads when app-thread execution is available; +7. keep writes behind `safePatch` or tightly capped trusted `inherit-session`; +8. store full run evidence; +9. return the final reduced result to the initiating Codex conversation; +10. let the user save the workflow as a local template or skill. + +## Goals + +- Intent-to-workflow: Codex can turn a user request into a previewable workflow harness. +- Preview-first safety: the user sees purpose, phases, workers, budgets, permissions, write policy, and stop rules before execution. +- Same-conversation result: the default user experience returns the result to the initiating Codex thread. +- Worker visibility: read-only workers can be visible Codex Desktop threads when real app-thread execution is available. +- Safe writes: dynamic workflows cannot write directly; bounded writes use policy, gates, patch checks, verification, and rollback evidence. +- Built-in modes: common patterns are available without users designing every workflow. +- Save/reuse: approved workflows can become local templates or skills with trust metadata. + +## Non-Goals + +- No exact Claude product parity claim. +- No unrestricted JavaScript runtime. +- No hidden writes. 
+- No direct JavaScript filesystem, network, shell, package import, or target repo access. +- No non-Codex model routing in the public core. +- No hosted queue, scheduler, daemon, or managed-agent platform in this roadmap. +- No production deploys, database writes, credentials, payments, permissions, or external messages without a separate high-risk plan and explicit approval. + +## User Stories + +1. As a Codex user, I can ask for a complex workflow in plain language and receive a preview before anything runs. +2. As a cautious user, I can approve or reject the generated workflow before execution. +3. As a reviewer, I can inspect worker outputs, run artifacts, and the reducer result after completion. +4. As a Codex Desktop user, I can see read-only worker threads when the host supports real execution. +5. As a maintainer, I can keep write-capable work bounded to allowed paths and verified commands. +6. As a repeat user, I can save a good workflow as a reusable local template. +7. As a public user, I can tell which features are stable, preview, or planned. + +## Success Criteria + +- Generated workflow preview exists before execution. +- Invalid generated scripts fail before execution. +- Same-conversation result return is the default Codex UX. +- Read-only app-thread workers have real execution preflight and explicit fallback. +- Safe writes cannot bypass gate, path policy, patch check, verification, or rollback evidence. +- Dynamic modes cover research, audit, migration, adversarial review, safe fix loop, root-cause investigation, rule mining, tournament selection, quarantine triage, and rubric eval. +- Docs never imply exact Claude parity, unrestricted writes, or shipped behavior that is only planned. 
diff --git a/docs/cwf-complete-state/README.md b/docs/cwf-complete-state/README.md new file mode 100644 index 0000000..785808f --- /dev/null +++ b/docs/cwf-complete-state/README.md @@ -0,0 +1,63 @@ +--- +half_life: 30d +archive_at: 2026-07-06 +scope_type: roadmap +scope_name: CWF complete-state delivery pack +coverage: Index for the PRD, SPEC, acceptance matrix, and staged goal prompts that turn the CWF complete-state plan into implementable phases. +not_complete_for: Runtime implementation, exact Claude parity, hosted scheduling, unrestricted JavaScript, non-Codex model routing, production deploy automation, or broad autonomous writes. +verification_level: docs-only +real_smoke_status: not_required +review_status: reviewed +reviewer: reasonix-v4pro +review_command: crb delegate --mode final-review --json review-payload-for-cwf-planning-docs-after-trq212-minli +review_notes: Based on the reviewed CWF complete-state plan and usage guide; no blocker/high/real medium issues in the source plan. +review_owner: Codex +review_due: resolved 2026-06-06 +--- + +# CWF Complete-State Delivery Pack + +This folder is the handoff pack for making Codex Flow feel like Claude-style dynamic workflows while staying Codex-native. + +Use this pack when a future Codex goal needs the concrete PRD, SPEC, acceptance criteria, or phase prompt without reading the long roadmap first. + +## Files + +| File | Use it for | +|---|---| +| `PRD.md` | Product intent: who this is for, what "complete" means, what not to build. | +| `SPEC.md` | Runtime contract: flow, APIs, safety boundaries, result return, save/reuse. | +| `CURRENT_VS_COMPLETE.md` | What already exists, what complete state still needs, and what future goals must not rebuild. | +| `ACCEPTANCE.md` | Evidence-bound checklist for each phase. | +| `GOAL_PROMPTS.md` | Copy-ready staged `/goal` prompts from Phase A through Phase G. 
| + +## Source Of Truth + +This pack is extracted from: + +- `docs/CWF_COMPLETE_STATE_PLAN.md` +- `docs/WHEN_TO_USE_CWF.md` +- trq212's "A harness for every task" breakdown +- MinLi's Chinese annotated dynamic workflow breakdown + +## Human Summary + +CWF complete-state means: + +1. Codex decides CWF is worth using. +2. Codex generates a task-specific `workflow.js`. +3. CWF previews the plan, workers, budgets, permissions, and stop rules. +4. The user approves before execution. +5. Workers run through Codex-native execution paths. +6. Read-only workers may become visible Desktop threads when available. +7. Writes only happen through `safePatch` or tightly capped trusted `inherit-session`. +8. The initiating Codex conversation receives the final reduced result. +9. Good workflows can be saved as local templates or skills. + +This is not an exact Claude clone. It is the Codex-native version: Codex remains the brain and permission boundary; CWF is the plan/evidence/gate/reducer layer. + +## Before Opening A Goal + +Read `CURRENT_VS_COMPLETE.md` first. The most important boundary is: + +> v1.11 already has preview-first local dynamic `workflow.js` execution on the current preview branch. Phase A adds Codex-generated workflow authoring; it should not rebuild the dynamic runtime. diff --git a/docs/cwf-complete-state/SPEC.md b/docs/cwf-complete-state/SPEC.md new file mode 100644 index 0000000..99c5db2 --- /dev/null +++ b/docs/cwf-complete-state/SPEC.md @@ -0,0 +1,163 @@ +--- +half_life: 30d +archive_at: 2026-07-06 +scope_type: roadmap +scope_name: CWF complete-state SPEC +coverage: Runtime and safety contract for implementing the complete CWF dynamic workflow roadmap. +not_complete_for: Exact implementation details for every phase, hosted scheduling, unrestricted JS, non-Codex routing, production deploys, database/credential/payment/permission writes. 
+verification_level: docs-only +real_smoke_status: not_required +review_status: reviewed +reviewer: reasonix-v4pro +review_command: crb delegate --mode final-review --json review-payload-for-cwf-planning-docs-after-trq212-minli +review_notes: Derived from reviewed complete-state and usage plans. +review_owner: Codex +review_due: resolved 2026-06-06 +--- + +# SPEC: CWF Complete-State + +## Runtime Flow + +```text +user asks for complex workflow + -> Codex decides CWF is appropriate + -> Codex generates workflow.js from intent + -> CWF validates AST and capability use + -> CWF renders preview and budget/write summary + -> user approves approve-dynamic + -> CWF child runtime executes through cwf APIs only + -> workers run through Codex-native adapters + -> safe writes go through safePatch or capped inherit-session + -> reducer produces result and artifacts + -> initiating Codex conversation receives summary + artifact links + -> user may save workflow as template/skill +``` + +## Capability Surface + +Required `cwf` APIs: + +- `cwf.git.changedFiles` +- `cwf.git.diff` +- `cwf.agent.run` +- `cwf.map` +- `cwf.artifacts.write` +- `cwf.report.summarize` +- `cwf.write.safePatch` +- `cwf.verify.run` +- `cwf.classify.route` +- `cwf.tournament.run` +- `cwf.loop.until` +- `cwf.quarantine.read` +- `cwf.template.save` + +## Runtime Controls + +- source SHA binding; +- origin trust enum; +- AST policy gate; +- Node Permission Model child; +- no target repo read from the child; +- no network, shell, child process, or package import from workflow JS; +- max agents; +- max concurrency; +- wall-clock timeout; +- output byte limit; +- token usage recording where available; +- gate before dynamic execution; +- gate before writes; +- failure summary. + +## Result Return Contract + +Default: + +- result returns to the initiating Codex conversation when launched from Codex. 
+ +Optional: + +- `--new-thread` creates a separate coordinator/result thread only when explicitly requested; +- worker app threads are visible only when app-server execution is available and preflight proves real execution; +- CLI-only users still get `cwf result RUN_ID`. + +Forbidden: + +- do not guess the current thread from `thread/list`; +- do not make Desktop required for CLI users; +- do not hide fallback status. + +## Write Contract + +Dynamic JS never writes directly. + +Allowed write routes: + +1. `safePatch` + - isolated writer target; + - proposed patch artifact; + - `allowed_paths`; + - `forbidden_paths`; + - drift check; + - `git apply --check --3way`; + - verification; + - rollback artifact. + +2. `inherit-session` + - generated-current-session origin only; + - approved script SHA only; + - never exceeds parent sandbox or approval policy; + - records runtime metadata; + - still bounded by task prompt and artifacts. + +Forbidden write routes: + +- direct Desktop app-thread writes without stable Codex approval support; +- remote untrusted dynamic scripts with write permissions; +- external irreversible writes. + +## Quarantine Contract + +Quarantine is mandatory when a workflow reads untrusted public content, customer messages, third-party issues, Slack/Discord exports, web pages, or arbitrary uploaded files. + +Worker classes: + +- Reader workers read untrusted content and stay read-only. +- Verifier workers check reader outputs against rubric, source quality, duplication, or policy. +- Actor workers perform any proposed action only after gate, path policy, safePatch, or explicit external approval. + +Safety invariant: + +> The worker that reads untrusted content is not the worker that writes, posts, deletes, merges, deploys, or changes permissions. + +## Built-In Dynamic Patterns + +| Pattern | Use when | Shape | +|---|---|---| +| Classify-and-act | Items need routing. | Classifier labels; branches execute specific read-only or gated actions. 
| +| Fan-out-and-synthesize | Independent files, claims, or hypotheses need separate context. | `cwf.map` workers; reducer merges. | +| Adversarial verification | A proposal needs skeptical checking. | Verifier/challenger workers before final synthesis. | +| Generate-and-filter | Many candidates need dedupe and rubric filtering. | Generator workers propose; filters score. | +| Tournament | Ranking or selection benefits from comparison. | Pairwise judging until top candidates remain. | +| Loop-until-done | The amount of work is unknown. | Repeat until explicit stop condition or budget cap. | +| Quarantine triage | Inputs are untrusted and actions may be high privilege. | Isolated readers; gated actors. | +| Rule mining | Repeated corrections should become durable rules. | Mine, cluster, adversarially verify, propose rule updates. | + +## Built-In Modes + +- `deep-research` +- `repo-audit` +- `migration-plan` +- `adversarial-review` +- `safe-fix-loop` +- `root-cause-investigation` +- `rule-mining` +- `tournament-selection` +- `triage-quarantine` +- `eval-and-rubric` + +## Availability Labels + +- **Stable public core**: current package surface with CI-safe smoke. +- **Implemented preview**: implemented and tested on current branch, but not fully productized. +- **Planned**: roadmap only; not a shipped command or safety guarantee. diff --git a/docs/workflow-catalog.md b/docs/workflow-catalog.md index fd06b40..c988ad0 100644 --- a/docs/workflow-catalog.md +++ b/docs/workflow-catalog.md @@ -14,6 +14,8 @@ Bundled review workflows: The `doc-refresh` workflow is the bundled user-facing exception: it is write-capable, documentation-only, gated, and requires preview artifacts plus explicit approval before its Codex write phase. Its `direct-docs` mode is a docs/readme/release-note policy preset; the writer still runs in an isolated target and CWF applies only a checked patch. 
v1.10 also supports `write_policy.mode: patch` for bounded non-doc workflows and fixtures: the writer runs in an isolated target, CWF extracts `artifacts/proposed.patch`, checks allowed/forbidden paths, runs `git apply --check --3way`, applies, then records verification and rollback artifacts. +v1.11 also supports local dynamic JavaScript workflow harnesses through `cwf dynamic run`. They are not part of the YAML registry and are never run directly as unrestricted Node.js. CWF copies the script into run artifacts, records SHA-256, renders a preview, waits for `approve-dynamic`, and executes only through a permissioned child process plus parent CWF JSON-RPC APIs. Dynamic workflows can now be generated from intent, discovered from local template folders, saved with SHA-bound trust metadata, and run by id. Remote URL execution is intentionally rejected until a script has been inspected and saved locally. + ## diff-review Review a tracked git diff from correctness, tests, and safety perspectives. @@ -168,6 +170,43 @@ cwf approve approve-write cwf resume ``` +## dynamic-js + +Run a preview-first local JavaScript workflow harness. 
+ +Use when: + +- you need a task-specific orchestration harness instead of a reusable YAML workflow +- the script can stay inside `cwf.git`, `cwf.agent.run`, `cwf.safePatch`, `cwf.map`, `cwf.artifacts`, and `cwf.report` +- you want artifact-backed preview, capabilities, budget, events, worker outputs, and final report + +Do not use when: + +- the script needs direct `fs`, `process`, shell, network, package imports, or target repo access +- the workflow is remote, copied from an untrusted source, or hash-mismatched +- the task requires JavaScript itself to write files directly instead of submitting a guarded `safePatch` +- `inherit-session` would exceed the parent Codex permission cap + +Run: + +```bash +cwf dynamic list +cwf dynamic show change-summary +cwf dynamic generate --goal "Summarize this repo diff" --target +cwf dynamic run change-summary --target +cwf dynamic run fixtures/dynamic/read-only.workflow.js --target +cwf approve approve-dynamic +cwf resume +cwf dynamic save ./workflow.js --id local-review +``` + +Built-in dynamic templates: + +- `change-summary`: read-only summary of changed files and diff size. +- `docs-change-check`: read-only documentation-scope check for README/docs changes. + +`cwf.safePatch.apply` is available only as a guarded parent-applied patch path. The dynamic script must declare `metadata.safe_patch_policy` so the write policy is visible in preview, and the runtime `write_policy` must exactly match that metadata. CWF stores `dynamic-proposed.patch`, enforces `allowed_paths` and `forbidden_paths`, runs `git apply --check --3way`, applies through the parent, runs verification commands, records rollback evidence, and reverse-applies the patch if verification fails. 
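The two policy checks described above (the runtime `write_policy` must exactly match the declared `metadata.safe_patch_policy`, and every patched path must be allowed and not forbidden) can be illustrated with a small sketch. This is not CWF's implementation: `canonical`, `policiesMatch`, `matchesPattern`, and `rejectedPaths` are hypothetical helper names, "exact match" is assumed to mean deep equality with array order preserved, and the pattern matcher is a simplification that handles only exact paths and a trailing `/**`.

```javascript
// Hypothetical sketch only: none of these helpers are CWF APIs.

// Rebuild a value with sorted object keys so comparison ignores key order.
function canonical(value) {
  if (Array.isArray(value)) return value.map(canonical);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, canonical(value[key])]),
    );
  }
  return value;
}

// Assumption: "exactly match" is modeled as deep equality of both policies.
function policiesMatch(declared, runtime) {
  return JSON.stringify(canonical(declared)) === JSON.stringify(canonical(runtime));
}

// Simplified matcher: exact paths and a trailing "/**" prefix pattern only.
function matchesPattern(path, pattern) {
  if (pattern.endsWith("/**")) {
    return path.startsWith(pattern.slice(0, -2));
  }
  return path === pattern;
}

// A path is rejected if it is forbidden or not covered by any allowed pattern.
function rejectedPaths(paths, policy) {
  return paths.filter(
    (path) =>
      policy.forbidden_paths.some((p) => matchesPattern(path, p)) ||
      !policy.allowed_paths.some((p) => matchesPattern(path, p)),
  );
}

// Policy values copied from the safe-patch fixture in this change.
const declared = {
  mode: "patch",
  allowed_paths: ["src/generated/**"],
  forbidden_paths: [".env", ".git", ".git/**"],
  verification_commands: ["test -f src/generated/value.js"],
};
const runtime = { ...declared };

console.log(policiesMatch(declared, runtime)); // prints true
console.log(rejectedPaths(["src/generated/value.js", ".env"], declared)); // prints [ '.env' ]
```

In this sketch a mismatch or any rejected path would be grounds to stop before `git apply --check --3way` runs, which mirrors the order of checks the paragraph describes.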
+ ## Choosing Quickly -Use `diff-review` for code correctness, `repo-audit` for maintainability and project health, `implementation-plan` for plan quality, `research-crosscheck` for factual/source discipline, `release-review` for ship readiness, and `doc-refresh` only for gated documentation writes. +Use `diff-review` for code correctness, `repo-audit` for maintainability and project health, `implementation-plan` for plan quality, `research-crosscheck` for factual/source discipline, `release-review` for ship readiness, `doc-refresh` only for gated documentation writes, and `dynamic-js` for approved local JavaScript orchestration harnesses, generated previews, or SHA-trusted saved templates. diff --git a/fixtures/dynamic/read-only.workflow.js b/fixtures/dynamic/read-only.workflow.js new file mode 100644 index 0000000..70b2308 --- /dev/null +++ b/fixtures/dynamic/read-only.workflow.js @@ -0,0 +1,19 @@ +export default async function workflow(cwf) { + const files = await cwf.git.changedFiles(); + const reviews = await cwf.map( + files, + async (file, index) => + cwf.agent.run({ + id: `review-${index}`, + role: "reviewer", + prompt: `Review ${file} for correctness and test risk.`, + permissions: "read-only", + }), + { concurrency: 2 }, + ); + await cwf.artifacts.write({ + name: "fixture-note.md", + content: "Dynamic fixture wrote this artifact through parent CWF JSON-RPC.\n", + }); + return cwf.report.summarize(reviews); +} diff --git a/fixtures/dynamic/safe-patch-verification-fail.workflow.js b/fixtures/dynamic/safe-patch-verification-fail.workflow.js new file mode 100644 index 0000000..8d912b8 --- /dev/null +++ b/fixtures/dynamic/safe-patch-verification-fail.workflow.js @@ -0,0 +1,28 @@ +export const metadata = { + "id": "safe-patch-verification-fail-fixture", + "title": "Safe Patch Verification Fail Fixture", + "version": "1.0.0", + "permissions": ["safePatch"], + "safe_patch_policy": { + "mode": "patch", + "allowed_paths": ["src/generated/**"], + "forbidden_paths": 
[".env", ".git", ".git/**"], + "verification_commands": ["test -f src/generated/missing.js"] + } +}; + +export default async function workflow(cwf) { + const result = await cwf.safePatch.apply({ + patch: "diff --git a/src/generated/value.js b/src/generated/value.js\nnew file mode 100644\nindex 0000000..42d3b06\n--- /dev/null\n+++ b/src/generated/value.js\n@@ -0,0 +1 @@\n+export const value = 42;\n", + write_policy: { + mode: "patch", + allowed_paths: ["src/generated/**"], + forbidden_paths: [".env", ".git", ".git/**"], + verification_commands: ["test -f src/generated/missing.js"] + } + }); + return { + template: "safe-patch-verification-fail-fixture", + safe_patch: result + }; +} diff --git a/fixtures/dynamic/safe-patch.workflow.js b/fixtures/dynamic/safe-patch.workflow.js new file mode 100644 index 0000000..c3d3924 --- /dev/null +++ b/fixtures/dynamic/safe-patch.workflow.js @@ -0,0 +1,28 @@ +export const metadata = { + "id": "safe-patch-fixture", + "title": "Safe Patch Fixture", + "version": "1.0.0", + "permissions": ["safePatch"], + "safe_patch_policy": { + "mode": "patch", + "allowed_paths": ["src/generated/**"], + "forbidden_paths": [".env", ".git", ".git/**"], + "verification_commands": ["test -f src/generated/value.js"] + } +}; + +export default async function workflow(cwf) { + const result = await cwf.safePatch.apply({ + patch: "diff --git a/src/generated/value.js b/src/generated/value.js\nnew file mode 100644\nindex 0000000..42d3b06\n--- /dev/null\n+++ b/src/generated/value.js\n@@ -0,0 +1 @@\n+export const value = 42;\n", + write_policy: { + mode: "patch", + allowed_paths: ["src/generated/**"], + forbidden_paths: [".env", ".git", ".git/**"], + verification_commands: ["test -f src/generated/value.js"] + } + }); + return { + template: "safe-patch-fixture", + safe_patch: result + }; +} diff --git a/package-lock.json b/package-lock.json index 89d0b68..18f93e9 100644 --- a/package-lock.json +++ b/package-lock.json @@ -9,6 +9,7 @@ "version": "1.0.0", 
"dependencies": { "@openai/codex-sdk": "^0.136.0", + "acorn": "^8.16.0", "yaml": "^2.8.1" }, "bin": { @@ -1118,6 +1119,18 @@ "url": "https://opencollective.com/vitest" } }, + "node_modules/acorn": { + "version": "8.16.0", + "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.16.0.tgz", + "integrity": "sha512-UVJyE9MttOsBQIDKw1skb9nAwQuR5wuGD3+82K6JgJlm/Y+KI92oNsMNGZCYdDsVtRHSak0pcV5Dno5+4jh9sw==", + "license": "MIT", + "bin": { + "acorn": "bin/acorn" + }, + "engines": { + "node": ">=0.4.0" + } + }, "node_modules/assertion-error": { "version": "2.0.1", "resolved": "https://registry.npmjs.org/assertion-error/-/assertion-error-2.0.1.tgz", diff --git a/package.json b/package.json index 2f80cd7..0c191fb 100644 --- a/package.json +++ b/package.json @@ -27,6 +27,7 @@ }, "dependencies": { "@openai/codex-sdk": "^0.136.0", + "acorn": "^8.16.0", "yaml": "^2.8.1" }, "devDependencies": { diff --git a/scripts/smoke-cli.sh b/scripts/smoke-cli.sh index b0e8257..be260e6 100755 --- a/scripts/smoke-cli.sh +++ b/scripts/smoke-cli.sh @@ -14,12 +14,20 @@ echo "==> cwf help" node dist/cli.js --help >/tmp/cwf-help-smoke.txt grep -q "cwf workflows validate" /tmp/cwf-help-smoke.txt grep -q "cwf run --target " /tmp/cwf-help-smoke.txt +grep -q "cwf dynamic list" /tmp/cwf-help-smoke.txt +grep -q "cwf dynamic generate" /tmp/cwf-help-smoke.txt +grep -q "cwf dynamic run --target " /tmp/cwf-help-smoke.txt echo "==> workflow registry smoke" node dist/cli.js workflows list node dist/cli.js workflows show diff-review node dist/cli.js workflows validate +echo "==> dynamic workflow registry smoke" +node dist/cli.js dynamic list +node dist/cli.js dynamic show change-summary +node dist/cli.js dynamic show docs-change-check + echo "==> workflow validation smoke" node dist/cli.js validate workflows/diff-review.yaml node dist/cli.js validate fixtures/workflows/gated-diff-review.yaml @@ -35,6 +43,154 @@ if node dist/cli.js validate fixtures/workflows/write-without-gate.yaml >/tmp/cw fi grep -q 
"writes:true" /tmp/cwf-write-without-gate.txt +echo "==> dynamic workflow preview smoke" +tmp_dynamic_target=$(mktemp -d /tmp/cwf-dynamic-target-XXXXXX) +mkdir -p "$tmp_dynamic_target/src" +printf '{"name":"dynamic-smoke","version":"0.0.0"}\n' > "$tmp_dynamic_target/package.json" +printf 'export const answer = 42;\n' > "$tmp_dynamic_target/src/calc.js" +git -C "$tmp_dynamic_target" init >/dev/null +git -C "$tmp_dynamic_target" config user.email codex-workflows@example.invalid +git -C "$tmp_dynamic_target" config user.name codex-workflows +git -C "$tmp_dynamic_target" add . +git -C "$tmp_dynamic_target" commit -m baseline >/dev/null +printf 'export const answer = 0;\n' > "$tmp_dynamic_target/src/calc.js" +node dist/cli.js dynamic run fixtures/dynamic/read-only.workflow.js --target "$tmp_dynamic_target" >/tmp/cwf-dynamic-preview.txt +grep -q "Approve: cwf approve" /tmp/cwf-dynamic-preview.txt +dynamic_run_id=$(sed -n 's/^Run ID: //p' /tmp/cwf-dynamic-preview.txt) +test -n "$dynamic_run_id" +test -f "$HOME/.codex-workflows/runs/$dynamic_run_id/artifacts/dynamic-preview.md" +test -f "$HOME/.codex-workflows/runs/$dynamic_run_id/artifacts/workflow.sha256" +grep -q "Node Permission Model child process" "$HOME/.codex-workflows/runs/$dynamic_run_id/artifacts/dynamic-preview.md" +rm -rf "$tmp_dynamic_target" "$HOME/.codex-workflows/runs/$dynamic_run_id" /tmp/cwf-dynamic-preview.txt + +echo "==> dynamic generate smoke" +tmp_generate_target=$(mktemp -d /tmp/cwf-dynamic-generate-target-XXXXXX) +mkdir -p "$tmp_generate_target/src" +printf '{"name":"dynamic-generate-smoke","version":"0.0.0"}\n' > "$tmp_generate_target/package.json" +printf 'export const generated = true;\n' > "$tmp_generate_target/src/app.js" +git -C "$tmp_generate_target" init >/dev/null +git -C "$tmp_generate_target" config user.email codex-workflows@example.invalid +git -C "$tmp_generate_target" config user.name codex-workflows +git -C "$tmp_generate_target" add . 
+git -C "$tmp_generate_target" commit -m baseline >/dev/null +printf 'export const generated = false;\n' > "$tmp_generate_target/src/app.js" +tmp_generate_dir=$(mktemp -d /tmp/cwf-dynamic-generate-XXXXXX) +tmp_generated_workflow="$tmp_generate_dir/generated.workflow.js" +node dist/cli.js dynamic generate --goal "Summarize the current fixture diff" --target "$tmp_generate_target" --output "$tmp_generated_workflow" >/tmp/cwf-dynamic-generate.txt +grep -q "Generated: $tmp_generated_workflow" /tmp/cwf-dynamic-generate.txt +generated_run_id=$(sed -n 's/^Run ID: //p' /tmp/cwf-dynamic-generate.txt) +test -n "$generated_run_id" +grep -q "Summarize the current fixture diff" "$HOME/.codex-workflows/runs/$generated_run_id/artifacts/dynamic-preview.md" +rm -rf "$tmp_generate_target" "$tmp_generate_dir" "$HOME/.codex-workflows/runs/$generated_run_id" /tmp/cwf-dynamic-generate.txt + +echo "==> dynamic remote URL rejection smoke" +tmp_remote_target=$(mktemp -d /tmp/cwf-dynamic-remote-target-XXXXXX) +git -C "$tmp_remote_target" init >/dev/null +git -C "$tmp_remote_target" config user.email codex-workflows@example.invalid +git -C "$tmp_remote_target" config user.name codex-workflows +printf '{"name":"dynamic-remote-smoke","version":"0.0.0"}\n' > "$tmp_remote_target/package.json" +git -C "$tmp_remote_target" add . +git -C "$tmp_remote_target" commit -m baseline >/dev/null +if node dist/cli.js dynamic run https://example.com/workflow.js --target "$tmp_remote_target" >/tmp/cwf-dynamic-remote.txt 2>&1; then + echo "Expected remote dynamic workflow URL to fail, but it passed." 
>&2 + cat /tmp/cwf-dynamic-remote.txt >&2 + exit 1 +fi +grep -q "cannot run directly by URL" /tmp/cwf-dynamic-remote.txt +rm -rf "$tmp_remote_target" /tmp/cwf-dynamic-remote.txt + +echo "==> dynamic template execution smoke" +tmp_template_target=$(mktemp -d /tmp/cwf-dynamic-template-target-XXXXXX) +mkdir -p "$tmp_template_target/docs" +printf '# Template smoke\n' > "$tmp_template_target/README.md" +printf '# Notes\n' > "$tmp_template_target/docs/note.md" +git -C "$tmp_template_target" init >/dev/null +git -C "$tmp_template_target" config user.email codex-workflows@example.invalid +git -C "$tmp_template_target" config user.name codex-workflows +git -C "$tmp_template_target" add . +git -C "$tmp_template_target" commit -m baseline >/dev/null +printf '# Template smoke updated\n' > "$tmp_template_target/README.md" +node dist/cli.js dynamic run change-summary --target "$tmp_template_target" --approve >/tmp/cwf-dynamic-template-a.txt +template_run_a=$(sed -n 's/^Run ID: //p' /tmp/cwf-dynamic-template-a.txt) +test -n "$template_run_a" +grep -q "Status: completed" /tmp/cwf-dynamic-template-a.txt +grep -q '"template": "change-summary"' "$HOME/.codex-workflows/runs/$template_run_a/artifacts/dynamic-final.json" +node dist/cli.js dynamic run docs-change-check --target "$tmp_template_target" --approve >/tmp/cwf-dynamic-template-b.txt +template_run_b=$(sed -n 's/^Run ID: //p' /tmp/cwf-dynamic-template-b.txt) +test -n "$template_run_b" +grep -q "Status: completed" /tmp/cwf-dynamic-template-b.txt +grep -q '"template": "docs-change-check"' "$HOME/.codex-workflows/runs/$template_run_b/artifacts/dynamic-final.json" +rm -rf "$tmp_template_target" "$HOME/.codex-workflows/runs/$template_run_a" "$HOME/.codex-workflows/runs/$template_run_b" /tmp/cwf-dynamic-template-a.txt /tmp/cwf-dynamic-template-b.txt + +echo "==> dynamic save/reuse trust smoke" +tmp_trust_target=$(mktemp -d /tmp/cwf-dynamic-trust-target-XXXXXX) +trusted_id="trusted-change-$$" +mkdir -p "$tmp_trust_target/src" +printf 
'{"name":"dynamic-trust-smoke","version":"0.0.0"}\n' > "$tmp_trust_target/package.json" +printf 'export const trusted = true;\n' > "$tmp_trust_target/src/app.js" +git -C "$tmp_trust_target" init >/dev/null +git -C "$tmp_trust_target" config user.email codex-workflows@example.invalid +git -C "$tmp_trust_target" config user.name codex-workflows +git -C "$tmp_trust_target" add . +git -C "$tmp_trust_target" commit -m baseline >/dev/null +printf 'export const trusted = false;\n' > "$tmp_trust_target/src/app.js" +node dist/cli.js dynamic save workflows/dynamic/change-summary.workflow.js --id "$trusted_id" >/tmp/cwf-dynamic-save.txt +grep -q "Saved dynamic workflow: $trusted_id" /tmp/cwf-dynamic-save.txt +node dist/cli.js dynamic run "$trusted_id" --target "$tmp_trust_target" --approve >/tmp/cwf-dynamic-saved-run.txt +trusted_run=$(sed -n 's/^Run ID: //p' /tmp/cwf-dynamic-saved-run.txt) +test -n "$trusted_run" +grep -q "Status: completed" /tmp/cwf-dynamic-saved-run.txt +printf '\n// tampered\n' >> "$HOME/.codex-workflows/dynamic/$trusted_id.workflow.js" +if node dist/cli.js dynamic run "$trusted_id" --target "$tmp_trust_target" --approve >/tmp/cwf-dynamic-tampered.txt 2>&1; then + echo "Expected tampered trusted dynamic workflow to fail, but it passed." 
>&2 + cat /tmp/cwf-dynamic-tampered.txt >&2 + exit 1 +fi +grep -q "SHA mismatch" /tmp/cwf-dynamic-tampered.txt +rm -rf "$tmp_trust_target" "$HOME/.codex-workflows/runs/$trusted_run" "$HOME/.codex-workflows/dynamic/$trusted_id.workflow.js" "$HOME/.codex-workflows/dynamic/$trusted_id.trust.json" /tmp/cwf-dynamic-save.txt /tmp/cwf-dynamic-saved-run.txt /tmp/cwf-dynamic-tampered.txt + +echo "==> dynamic safePatch execution smoke" +tmp_safe_patch_target=$(mktemp -d /tmp/cwf-dynamic-safe-patch-target-XXXXXX) +mkdir -p "$tmp_safe_patch_target/src" +printf '{"name":"safe-patch-smoke","version":"0.0.0"}\n' > "$tmp_safe_patch_target/package.json" +git -C "$tmp_safe_patch_target" init >/dev/null +git -C "$tmp_safe_patch_target" config user.email codex-workflows@example.invalid +git -C "$tmp_safe_patch_target" config user.name codex-workflows +git -C "$tmp_safe_patch_target" add . +git -C "$tmp_safe_patch_target" commit -m baseline >/dev/null +node dist/cli.js dynamic run fixtures/dynamic/safe-patch.workflow.js --target "$tmp_safe_patch_target" --approve >/tmp/cwf-dynamic-safe-patch.txt +safe_patch_run=$(sed -n 's/^Run ID: //p' /tmp/cwf-dynamic-safe-patch.txt) +test -n "$safe_patch_run" +grep -q "Status: completed" /tmp/cwf-dynamic-safe-patch.txt +test -f "$tmp_safe_patch_target/src/generated/value.js" +grep -q "export const value = 42;" "$tmp_safe_patch_target/src/generated/value.js" +grep -q '"status": "passed"' "$HOME/.codex-workflows/runs/$safe_patch_run/artifacts/dynamic-safe-patch.json" +rm -rf "$tmp_safe_patch_target" "$HOME/.codex-workflows/runs/$safe_patch_run" /tmp/cwf-dynamic-safe-patch.txt + +echo "==> dynamic safePatch rollback smoke" +tmp_safe_patch_fail_target=$(mktemp -d /tmp/cwf-dynamic-safe-patch-fail-target-XXXXXX) +mkdir -p "$tmp_safe_patch_fail_target/src" +printf '{"name":"safe-patch-fail-smoke","version":"0.0.0"}\n' > "$tmp_safe_patch_fail_target/package.json" +git -C "$tmp_safe_patch_fail_target" init >/dev/null +git -C "$tmp_safe_patch_fail_target" 
config user.email codex-workflows@example.invalid +git -C "$tmp_safe_patch_fail_target" config user.name codex-workflows +git -C "$tmp_safe_patch_fail_target" add . +git -C "$tmp_safe_patch_fail_target" commit -m baseline >/dev/null +node dist/cli.js dynamic run fixtures/dynamic/safe-patch-verification-fail.workflow.js --target "$tmp_safe_patch_fail_target" >/tmp/cwf-dynamic-safe-patch-fail.txt +safe_patch_fail_run=$(sed -n 's/^Run ID: //p' /tmp/cwf-dynamic-safe-patch-fail.txt) +test -n "$safe_patch_fail_run" +node dist/cli.js approve "$safe_patch_fail_run" approve-dynamic >/tmp/cwf-dynamic-safe-patch-fail-approve.txt +if node dist/cli.js resume "$safe_patch_fail_run" >/tmp/cwf-dynamic-safe-patch-fail-resume.txt 2>&1; then + echo "Expected dynamic safePatch verification failure to fail resume, but it passed." >&2 + cat /tmp/cwf-dynamic-safe-patch-fail-resume.txt >&2 + exit 1 +fi +test ! -f "$tmp_safe_patch_fail_target/src/generated/value.js" +grep -q '"status": "failed"' "$HOME/.codex-workflows/runs/$safe_patch_fail_run/artifacts/dynamic-safe-patch.json" +grep -q '"rollback"' "$HOME/.codex-workflows/runs/$safe_patch_fail_run/artifacts/dynamic-safe-patch.json" +grep -q '"status": "passed"' "$HOME/.codex-workflows/runs/$safe_patch_fail_run/artifacts/dynamic-safe-patch.json" +rm -rf "$tmp_safe_patch_fail_target" "$HOME/.codex-workflows/runs/$safe_patch_fail_run" /tmp/cwf-dynamic-safe-patch-fail.txt /tmp/cwf-dynamic-safe-patch-fail-approve.txt /tmp/cwf-dynamic-safe-patch-fail-resume.txt + echo "==> github-pr artifact smoke" tmp_target=$(mktemp -d /tmp/cwf-gh-target-XXXXXX) mkdir -p "$tmp_target/src" diff --git a/skills/codex-workflows/SKILL.md b/skills/codex-workflows/SKILL.md index 3e022e2..01cc1e0 100644 --- a/skills/codex-workflows/SKILL.md +++ b/skills/codex-workflows/SKILL.md @@ -1,6 +1,6 @@ --- name: codex-workflows -description: Run public Codex-native workflow specs for repeatable multi-worker engineering tasks, including gated documentation refresh, PR-ready 
artifacts, and safe workflow spec suggestions. +description: Run public Codex-native workflow specs and approved local JavaScript dynamic workflows for repeatable multi-worker engineering tasks, including gated documentation refresh, PR-ready artifacts, and safe workflow spec suggestions. when_to_use: "run a workflow, audit a diff, review a branch with multiple perspectives, coordinate Codex workers, repeatable repo audit, gated documentation refresh, GitHub PR artifact, workflow suggestion, compare Codex workflow behavior to Claude Dynamic Workflows" metadata: version: "1.0.0" @@ -29,7 +29,7 @@ This public skill is Codex-native: ## Current Workflows -The bundled review workflows are read-only: `diff-review`, `repo-audit`, `implementation-plan`, `research-crosscheck`, and `release-review`. The bundled user-facing write-capable workflow is `doc-refresh`, which is documentation-only and must pause at a gate before writing. v1.10 also supports bounded patch-mode write workflows when a workflow declares `write_policy.mode: patch`, `allowed_paths`, `forbidden_paths`, and optional `verification_commands`. +The bundled review workflows are read-only: `diff-review`, `repo-audit`, `implementation-plan`, `research-crosscheck`, and `release-review`. The bundled user-facing write-capable workflow is `doc-refresh`, which is documentation-only and must pause at a gate before writing. v1.10 also supports bounded patch-mode write workflows when a workflow declares `write_policy.mode: patch`, `allowed_paths`, `forbidden_paths`, and optional `verification_commands`. v1.11 supports local dynamic JavaScript workflows through `cwf dynamic run`; these scripts are previewed, approved, AST-gated, and executed in a Node Permission Model child process that can only use parent CWF JSON-RPC APIs. The implemented dynamic preview also supports `cwf dynamic generate`, local `dynamic list/show`, `dynamic save` with SHA-bound trust metadata, and guarded `cwf.safePatch.apply`. 
```bash cwf validate workflows/diff-review.yaml @@ -46,6 +46,12 @@ cwf run release-review --target cwf run doc-refresh --target cwf run workflows/diff-review.yaml --target cwf run workflows/diff-review.yaml --target --background +cwf dynamic list +cwf dynamic show change-summary +cwf dynamic generate --goal "" --target +cwf dynamic run change-summary --target +cwf dynamic run fixtures/dynamic/read-only.workflow.js --target +cwf dynamic save ./workflow.js --id local-review cwf status cwf watch cwf latest --target @@ -62,9 +68,9 @@ cwf suggest-workflow --from-run cwf cancel ``` -Bundled workflows are read-only by default. Review workflows inspect a target git diff from independent Codex worker perspectives and reduce the findings into a stable reduced JSON envelope plus one saved Markdown result. `doc-refresh` is the narrow exception: it creates pre-write artifacts, waits for explicit approval, then runs its writer in an isolated target with the `direct-docs` policy preset. All write workflows extract `artifacts/proposed.patch`, enforce `write_policy`, run `git apply --check --3way`, apply, and then record verification plus rollback artifacts. If workflow verification fails after apply, CWF attempts to reverse-apply the proposed patch before returning a failed run. +Bundled workflows are read-only by default. Review workflows inspect a target git diff from independent Codex worker perspectives and reduce the findings into a stable reduced JSON envelope plus one saved Markdown result. `doc-refresh` is the narrow exception: it creates pre-write artifacts, waits for explicit approval, then runs its writer in an isolated target with the `direct-docs` policy preset. All write workflows extract `artifacts/proposed.patch`, enforce `write_policy`, run `git apply --check --3way`, apply, and then record verification plus rollback artifacts. If workflow verification fails after apply, CWF attempts to reverse-apply the proposed patch before returning a failed run. 
Dynamic JavaScript workflows are not registry YAML and not unrestricted `node workflow.js`; use them only for local, approved harnesses that stay inside `cwf.git`, `cwf.agent.run`, `cwf.safePatch`, `cwf.map`, `cwf.artifacts`, and `cwf.report`. Dynamic `safePatch` requires `metadata.safe_patch_policy` in the script preview and rejects runtime policy widening. Remote dynamic workflow URLs must not run directly; inspect and save a local trusted copy first. -Use `docs/workflow-catalog.md` to choose the workflow. Use `diff-review` for code correctness, `repo-audit` for maintainability and project health, `implementation-plan` for plan quality, `research-crosscheck` for factual/source discipline, `release-review` for ship readiness, and `doc-refresh` only for gated documentation writes. +Use `docs/workflow-catalog.md` to choose the workflow. Use `diff-review` for code correctness, `repo-audit` for maintainability and project health, `implementation-plan` for plan quality, `research-crosscheck` for factual/source discipline, `release-review` for ship readiness, `doc-refresh` only for gated documentation writes, and `dynamic-js` for approved local JavaScript orchestration, generated previews, or SHA-trusted templates. Prefer `cwf run diff-review --target ` when the local workflow registry can resolve it. Direct path usage remains supported with `cwf run workflows/diff-review.yaml --target `. 
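The SKILL.md text above says dynamic `safePatch` rejects runtime policy widening relative to `metadata.safe_patch_policy`. The diff does not show how that comparison is done, so the following is only a minimal sketch of one way such a check could look. The function name `findPolicyWidening` and the exact-string comparison of paths are assumptions made for illustration; the field names come from the fixtures in this change.

```javascript
// Hypothetical sketch, not the CWF implementation. Paths are compared as exact
// strings; glob containment (for example "src/**" vs "src/generated/**") is not modeled.
function findPolicyWidening(declared, runtime) {
  const problems = [];
  if (runtime.mode !== declared.mode) {
    problems.push(`mode changed: ${declared.mode} -> ${runtime.mode}`);
  }
  // Every runtime allowed path must already appear in the previewed policy.
  const declaredAllowed = new Set(declared.allowed_paths ?? []);
  for (const path of runtime.allowed_paths ?? []) {
    if (!declaredAllowed.has(path)) problems.push(`allowed path not in preview: ${path}`);
  }
  // Every previewed forbidden path must still be forbidden at runtime.
  const runtimeForbidden = new Set(runtime.forbidden_paths ?? []);
  for (const path of declared.forbidden_paths ?? []) {
    if (!runtimeForbidden.has(path)) problems.push(`forbidden path dropped: ${path}`);
  }
  // Assumption: dropping a previewed verification command also counts as widening.
  const runtimeCommands = new Set(runtime.verification_commands ?? []);
  for (const command of declared.verification_commands ?? []) {
    if (!runtimeCommands.has(command)) problems.push(`verification command dropped: ${command}`);
  }
  return problems;
}

// Policy copied from fixtures/dynamic/safe-patch.workflow.js.
const declared = {
  mode: "patch",
  allowed_paths: ["src/generated/**"],
  forbidden_paths: [".env", ".git", ".git/**"],
  verification_commands: ["test -f src/generated/value.js"],
};

// A runtime policy identical to the preview reports no problems.
console.log(findPolicyWidening(declared, declared));

// Adding an allowed path and dropping forbidden entries is reported.
const widened = {
  ...declared,
  allowed_paths: ["src/generated/**", "src/**"],
  forbidden_paths: [".env"],
};
console.log(findPolicyWidening(declared, widened));
```

Returning a list of problems rather than a boolean is a design choice of this sketch: it lets a caller put the specific reasons into an error message or a run artifact.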
diff --git a/src/adapters/codex-worker.ts b/src/adapters/codex-worker.ts index 5b94441..a4fc231 100644 --- a/src/adapters/codex-worker.ts +++ b/src/adapters/codex-worker.ts @@ -35,6 +35,9 @@ export const WORKER_OUTPUT_SCHEMA = { }, }; +export type CodexWorkerSandboxMode = "read-only" | "workspace-write" | "danger-full-access"; +export type CodexWorkerApprovalPolicy = "never" | "on-request" | "on-failure" | "untrusted"; + export type RunCodexWorkerOptions = { target: string; timeoutMs: number; @@ -42,6 +45,8 @@ export type RunCodexWorkerOptions = { requestedAdapter?: WorkerAdapterName; fallbackUsed?: boolean; fallbackReason?: string; + sandboxMode?: CodexWorkerSandboxMode; + approvalPolicy?: CodexWorkerApprovalPolicy; }; export async function runCodexWorker( @@ -51,7 +56,9 @@ export async function runCodexWorker( ): Promise { const started = Date.now(); const startedAt = new Date(started).toISOString(); - const prompt = buildWorkerPrompt(worker, context); + const sandboxMode = options.sandboxMode ?? "read-only"; + const approvalPolicy = options.approvalPolicy ?? "never"; + const prompt = buildWorkerPrompt(worker, context, sandboxMode); try { const codex = new Codex({ @@ -59,8 +66,8 @@ export async function runCodexWorker( }); const thread = codex.startThread({ workingDirectory: options.target, - sandboxMode: "read-only", - approvalPolicy: "never", + sandboxMode, + approvalPolicy, modelReasoningEffort: "low", webSearchMode: "disabled", webSearchEnabled: false, @@ -182,6 +189,8 @@ function isFinding(value: unknown): boolean { } function buildSdkRuntime(worker: WorkflowWorker, options: RunCodexWorkerOptions): WorkerResult["runtime"] { + const sandboxMode = options.sandboxMode ?? "read-only"; + const approvalPolicy = options.approvalPolicy ?? 
"never"; return { adapter: "codex-sdk-headless", requested_adapter: options.requestedAdapter, @@ -190,13 +199,14 @@ function buildSdkRuntime(worker: WorkflowWorker, options: RunCodexWorkerOptions) fallback_reason: options.fallbackReason, agent_role: worker.perspective || worker.id, transcript_read: false, - sandbox: "read-only", - approval_policy: "never", + sandbox: sandboxMode, + approval_policy: approvalPolicy, }; } -export function buildWorkerPrompt(worker: WorkflowWorker, context: DiffContext): string { - return `You are a read-only Codex worker inside the public codex-workflows diff-review MVP. +export function buildWorkerPrompt(worker: WorkflowWorker, context: DiffContext, sandboxMode: CodexWorkerSandboxMode = "read-only"): string { + const canWrite = sandboxMode !== "read-only"; + return `You are a ${canWrite ? "write-capable" : "read-only"} Codex worker inside the public codex-workflows runtime. Worker id: ${worker.id} Perspective: ${worker.perspective} @@ -205,7 +215,7 @@ Task: ${worker.prompt} Rules: -- Do not modify files. +- ${canWrite ? "Modify files only when the task explicitly asks for it, and keep changes scoped to the worker prompt." : "Do not modify files."} - Review only the supplied context and diff. - Prefer concrete findings with file/diff evidence. - Do not invent line numbers if the diff does not provide them. 
diff --git a/src/adapters/worker-adapter.ts b/src/adapters/worker-adapter.ts index 0df5c5b..be34fa8 100644 --- a/src/adapters/worker-adapter.ts +++ b/src/adapters/worker-adapter.ts @@ -1,5 +1,8 @@ +import { open, realpath, stat } from "node:fs/promises"; +import { homedir } from "node:os"; +import { basename, join, sep } from "node:path"; import { setTimeout as sleep } from "node:timers/promises"; -import { checkDesktopCapability, createDefaultAppServerTransport, type AppServerTransport } from "../desktop-bridge.js"; +import { checkDesktopCapability, createDefaultAppServerTransport, createStdioAppServerTransport, type AppServerTransport } from "../desktop-bridge.js"; import type { DesktopCapabilitySummary, DiffContext, WorkerAdapterName, WorkerResult, WorkflowRuntime, WorkflowWorker } from "../types.js"; import { buildWorkerPrompt, parseWorkerOutput, runCodexWorker, type RunCodexWorkerOptions } from "./codex-worker.js"; @@ -58,6 +61,10 @@ export const defaultWorkerAdapters: WorkerAdapterRegistry = { const appThreadExecutionProbes = new Map>(); const appServerFactoryIds = new WeakMap, number>(); let nextAppServerFactoryId = 1; +const defaultAppThreadModel = "gpt-5.3-codex-spark"; +const defaultAppThreadReasoningEffort = "low"; +const appThreadProbeText = 'Return exactly {"probe":"cwf-app-thread-ok"} and nothing else.'; +const appThreadDiagnosticsMaxBytes = 1_048_576; export async function runWorkerWithAdapter( worker: WorkflowWorker, @@ -175,7 +182,8 @@ async function runCodexAppThreadWorker( } await ensureAppThreadExecutionAvailable(context, options, codexPath); - const appServer = options.appServer ?? (options.appServerFactory ?? (() => createDefaultAppServerTransport()))(codexPath); + const appServer = options.appServer ?? (options.appServerFactory ?? 
createAppThreadAppServerTransport)(codexPath); + const modelOverride = appThreadModelOverride(); const started = Date.now(); const startedAt = new Date(started).toISOString(); @@ -191,14 +199,14 @@ async function runCodexAppThreadWorker( await appServerRequestWithTimeout(appServer, "initialize", buildInitializeParams(), remainingWorkerTimeoutMs(deadline, options.timeoutMs, requestTimeoutMs)); await appServerNotifyWithTimeout(appServer, "initialized", remainingWorkerTimeoutMs(deadline, options.timeoutMs, requestTimeoutMs)); - const thread = await appServerRequestWithTimeout(appServer, "thread/start", buildWorkerThreadStartParams(context), remainingWorkerTimeoutMs(deadline, options.timeoutMs, requestTimeoutMs)); + const thread = await appServerRequestWithTimeout(appServer, "thread/start", buildWorkerThreadStartParams(context, modelOverride), remainingWorkerTimeoutMs(deadline, options.timeoutMs, requestTimeoutMs)); threadId = extractId(thread, "thread"); if (!threadId) { throw new Error("thread/start did not return thread.id"); } await appServerRequestWithTimeout(appServer, "thread/name/set", buildWorkerThreadNameParams(threadId, worker, options), remainingWorkerTimeoutMs(deadline, options.timeoutMs, requestTimeoutMs)); - const turn = await appServerRequestWithTimeout(appServer, "turn/start", buildWorkerTurnStartParams(threadId, prompt, context), remainingWorkerTimeoutMs(deadline, options.timeoutMs, requestTimeoutMs)); + const turn = await appServerRequestWithTimeout(appServer, "turn/start", buildWorkerTurnStartParams(threadId, prompt, context, modelOverride), remainingWorkerTimeoutMs(deadline, options.timeoutMs, requestTimeoutMs)); turnId = extractId(turn, "turn"); const directRaw = extractWorkerRaw(turn, turnId) ?? 
""; const readResult = await readAppThreadWorkerRaw( @@ -248,6 +256,9 @@ async function runCodexAppThreadWorker( transcript_read: transcriptRead, sandbox: "read-only", approval_policy: "never", + model: modelOverride.model, + model_provider: modelOverride.modelProvider, + reasoning_effort: modelOverride.reasoningEffort, result_return_path: "worker-envelope", }, }; @@ -284,6 +295,9 @@ async function runCodexAppThreadWorker( transcript_read: transcriptRead, sandbox: "read-only", approval_policy: "never", + model: modelOverride.model, + model_provider: modelOverride.modelProvider, + reasoning_effort: modelOverride.reasoningEffort, result_return_path: "worker-envelope", }, }; @@ -311,7 +325,8 @@ async function ensureAppThreadExecutionAvailable(context: DiffContext, options: } async function runAppThreadExecutionProbe(context: DiffContext, options: WorkerAdapterOptions, codexPath: string): Promise { - const appServer = (options.appServerFactory ?? (() => createDefaultAppServerTransport()))(codexPath); + const appServer = (options.appServerFactory ?? createAppThreadAppServerTransport)(codexPath); + const modelOverride = appThreadModelOverride(); const probeTimeoutMs = appThreadProbeTimeoutMs(options); const deadline = Date.now() + probeTimeoutMs; let threadId: string | undefined; @@ -325,7 +340,8 @@ async function runAppThreadExecutionProbe(context: DiffContext, options: WorkerA sandbox: "read-only", ephemeral: false, threadSource: "user", - baseInstructions: "You are a Codex Flow app-thread execution probe. Reply with the requested tiny JSON only.", + baseInstructions: "You are a Codex Flow app-thread execution probe. 
Return only the exact JSON object requested by the user.", + ...buildThreadModelParams(modelOverride), }, remainingProbeTimeoutMs(deadline, probeTimeoutMs)); threadId = extractId(thread, "thread"); if (!threadId) { @@ -337,10 +353,12 @@ async function runAppThreadExecutionProbe(context: DiffContext, options: WorkerA }, remainingProbeTimeoutMs(deadline, probeTimeoutMs)); const turn = await appServerRequestWithTimeout(appServer, "turn/start", { threadId, - input: [{ type: "text", text: "{\"probe\":\"cwf-app-thread-ok\"}", text_elements: [] }], + input: [{ type: "text", text: appThreadProbeText, text_elements: [] }], cwd: context.target, approvalPolicy: "never", sandboxPolicy: { type: "readOnly", networkAccess: false }, + outputSchema: appThreadProbeOutputSchema(), + ...buildTurnModelParams(modelOverride), }, remainingProbeTimeoutMs(deadline, probeTimeoutMs)); turnId = extractId(turn, "turn"); if (!turnId) { @@ -403,8 +421,15 @@ function remainingProbeTimeoutMs(deadline: number, originalTimeoutMs: number): n } function appThreadProbeCacheKey(codexPath: string, appServerFactory: WorkerAdapterOptions["appServerFactory"]): string { + const modelOverride = appThreadModelOverride(); + const transport = appServerFactory ? "factory" : appThreadTransportMode(); + const modelKey = [ + modelOverride.model ? `model=${modelOverride.model}` : undefined, + modelOverride.modelProvider ? `provider=${modelOverride.modelProvider}` : undefined, + modelOverride.reasoningEffort ? 
`effort=${modelOverride.reasoningEffort}` : undefined, + ].filter(Boolean).join(","); if (!appServerFactory) { - return `${codexPath}:default`; + return `${codexPath}:${transport}:${modelKey || "host-default"}`; } let id = appServerFactoryIds.get(appServerFactory); if (!id) { @@ -412,7 +437,18 @@ function appThreadProbeCacheKey(codexPath: string, appServerFactory: WorkerAdapt nextAppServerFactoryId += 1; appServerFactoryIds.set(appServerFactory, id); } - return `${codexPath}:factory:${id}`; + return `${codexPath}:factory:${id}:${modelKey || "host-default"}`; +} + +function createAppThreadAppServerTransport(codexPath: string): AppServerTransport { + return appThreadTransportMode() === "daemon" + ? createDefaultAppServerTransport() + : createStdioAppServerTransport(codexPath); +} + +function appThreadTransportMode(): "stdio" | "daemon" { + const value = process.env.CWF_APP_THREAD_TRANSPORT?.trim().toLowerCase(); + return value === "daemon" || value === "socket" || value === "control-socket" ? "daemon" : "stdio"; } async function readAppThreadProbeRaw( @@ -427,6 +463,8 @@ async function readAppThreadProbeRaw( return directRaw; } let lastError: string | undefined; + let lastDiagnostics: string | undefined; + const diagnosticsCache = new Map(); while (Date.now() <= deadline) { try { const read = await appServerRequestWithTimeout( @@ -440,15 +478,18 @@ async function readAppThreadProbeRaw( if (readRaw) { return readRaw; } + lastDiagnostics = await extractAppThreadReadDiagnostics(read, turnId, diagnosticsCache) ?? lastDiagnostics; } catch (error) { - lastError = error instanceof Error ? error.message : String(error); + lastError = safeAppServerError(error); } const remaining = deadline - Date.now(); if (remaining > 0) { await sleep(Math.min(250, remaining)); } } - throw new Error(`model execution channel did not return a readable assistant response${lastError ? 
`; last thread/read error: ${lastError}` : ""}`); + throw new Error( + `model execution channel did not return a readable assistant response${lastError ? `; last thread/read error: ${lastError}` : ""}${lastDiagnostics ? `; diagnostics: ${lastDiagnostics}` : ""}`, + ); } async function appServerRequestWithTimeout( @@ -527,6 +568,17 @@ function assertAppThreadProbeResponse(raw: string, threadId: string | undefined, ); } +function appThreadProbeOutputSchema(): Record { + return { + type: "object", + additionalProperties: false, + required: ["probe"], + properties: { + probe: { type: "string", const: "cwf-app-thread-ok" }, + }, + }; +} + function buildInitializeParams(): Record { return { clientInfo: { name: "codex-flow", title: "Codex Flow", version: "1.7.0" }, @@ -538,7 +590,43 @@ function buildInitializeParams(): Record { }; } -function buildWorkerThreadStartParams(context: DiffContext): Record { +type AppThreadModelOverride = { + model?: string; + modelProvider?: string; + reasoningEffort?: string; +}; + +function appThreadModelOverride(): AppThreadModelOverride { + const model = cleanEnvString("CWF_APP_THREAD_MODEL"); + const reasoningEffort = cleanEnvString("CWF_APP_THREAD_REASONING_EFFORT"); + const useHostDefault = model === "host-default" || model === "default"; + return { + model: useHostDefault ? undefined : model ?? defaultAppThreadModel, + modelProvider: cleanEnvString("CWF_APP_THREAD_MODEL_PROVIDER"), + reasoningEffort: useHostDefault && !reasoningEffort ? undefined : reasoningEffort ?? defaultAppThreadReasoningEffort, + }; +} + +function cleanEnvString(name: string): string | undefined { + const value = process.env[name]?.trim(); + return value ? value : undefined; +} + +function buildThreadModelParams(modelOverride: AppThreadModelOverride): Record { + return { + ...(modelOverride.model ? { model: modelOverride.model } : {}), + ...(modelOverride.modelProvider ? 
{ modelProvider: modelOverride.modelProvider } : {}), + }; +} + +function buildTurnModelParams(modelOverride: AppThreadModelOverride): Record { + return { + ...(modelOverride.model ? { model: modelOverride.model } : {}), + ...(modelOverride.reasoningEffort ? { effort: modelOverride.reasoningEffort } : {}), + }; +} + +function buildWorkerThreadStartParams(context: DiffContext, modelOverride: AppThreadModelOverride): Record { return { cwd: context.target, approvalPolicy: "never", @@ -546,6 +634,7 @@ function buildWorkerThreadStartParams(context: DiffContext): Record { +function buildWorkerTurnStartParams(threadId: string, prompt: string, context: DiffContext, modelOverride: AppThreadModelOverride): Record { return { threadId, input: [{ type: "text", text: prompt, text_elements: [] }], cwd: context.target, approvalPolicy: "never", sandboxPolicy: { type: "readOnly", networkAccess: false }, + ...buildTurnModelParams(modelOverride), }; } @@ -580,6 +670,8 @@ async function readAppThreadWorkerRaw( } const deadline = Date.now() + Math.max(1, Math.min(timeoutMs, timeoutEnvMs("CWF_APP_THREAD_RESULT_TIMEOUT_MS", 120000))); let lastError: string | undefined; + let lastDiagnostics: string | undefined; + const diagnosticsCache = new Map(); while (Date.now() <= deadline) { try { const read = await appServerRequestWithTimeout( @@ -593,8 +685,9 @@ async function readAppThreadWorkerRaw( if (readRaw) { return { raw: readRaw, transcriptRead: true }; } + lastDiagnostics = await extractAppThreadReadDiagnostics(read, turnId, diagnosticsCache) ?? lastDiagnostics; } catch (error) { - lastError = error instanceof Error ? 
error.message : String(error); + lastError = safeAppServerError(error); } const remaining = deadline - Date.now(); if (remaining > 0) { @@ -602,9 +695,168 @@ async function readAppThreadWorkerRaw( } } if (lastError) { - throw new Error(formatAppThreadExecutionUnavailable(`worker read failed; last thread/read error: ${lastError}`, threadId, turnId)); + throw new Error(formatAppThreadExecutionUnavailable(`worker read failed; last thread/read error: ${lastError}${lastDiagnostics ? `; diagnostics: ${lastDiagnostics}` : ""}`, threadId, turnId)); + } + throw new Error(formatAppThreadExecutionUnavailable(`worker did not return an assistant response${lastDiagnostics ? `; diagnostics: ${lastDiagnostics}` : ""}`, threadId, turnId)); +} + +async function extractAppThreadReadDiagnostics(value: unknown, turnId?: string, diagnosticsCache = new Map()): Promise { + if (!value || typeof value !== "object") { + return undefined; + } + const record = value as Record; + const thread = record.thread && typeof record.thread === "object" + ? record.thread as Record + : record; + const parts: string[] = []; + const status = thread.status; + if (status && typeof status === "object") { + const type = (status as { type?: unknown }).type; + if (typeof type === "string") { + parts.push(`thread_status=${type}`); + } + } + const turns = thread.turns; + if (Array.isArray(turns)) { + parts.push(`turns=${turns.length}`); + } + const path = typeof thread.path === "string" ? thread.path : undefined; + if (path) { + const cacheKey = `${path}:${turnId ?? 
""}`; + let sessionDiagnostics: string | undefined; + if (diagnosticsCache.has(cacheKey)) { + sessionDiagnostics = diagnosticsCache.get(cacheKey); + } else { + sessionDiagnostics = await extractAppThreadSessionDiagnostics(path, turnId); + if (sessionDiagnostics && isTerminalSessionDiagnostics(sessionDiagnostics)) { + diagnosticsCache.set(cacheKey, sessionDiagnostics); + } + } + if (sessionDiagnostics) { + parts.push(sessionDiagnostics); + } } - throw new Error(formatAppThreadExecutionUnavailable("worker did not return an assistant response", threadId, turnId)); + return parts.length > 0 ? parts.join("; ") : undefined; +} + +function isTerminalSessionDiagnostics(diagnostics: string): boolean { + return diagnostics.includes("quota_unavailable=") || diagnostics.includes("last_agent_message="); +} + +function safeAppServerError(error: unknown): string { + const message = error instanceof Error ? error.message : String(error); + if (/timed out after \d+ms/.test(message)) { + return message; + } + return "thread-read-failed"; +} + +async function extractAppThreadSessionDiagnostics(path: string, turnId?: string): Promise { + const safePath = await safeCodexSessionLogPath(path); + if (!safePath) { + return path ? 
`session_log=${sessionLogName(path)}` : undefined; + } + try { + const text = await readSessionLogTail(safePath); + let model: string | undefined; + let effort: string | undefined; + let credits: string | undefined; + let lastAgentMessage: string | undefined; + for (const line of text.split("\n")) { + if (!line.trim()) { + continue; + } + let entry: unknown; + try { + entry = JSON.parse(line); + } catch { + continue; + } + if (!entry || typeof entry !== "object") { + continue; + } + const event = entry as { type?: unknown; payload?: Record<string, unknown> }; + const payload = event.payload; + if (!payload || typeof payload !== "object") { + continue; + } + if (turnId && payload.turn_id !== turnId) { + continue; + } + if (event.type === "turn_context") { + model = typeof payload.model === "string" ? payload.model : model; + effort = typeof payload.effort === "string" ? payload.effort : effort; + } + if (event.type === "event_msg" && payload.type === "token_count") { + const rateLimits = payload.rate_limits; + if (rateLimits && typeof rateLimits === "object") { + const rate = rateLimits as { credits?: { has_credits?: unknown } }; + if (rate.credits?.has_credits === false) { + credits = "quota_unavailable=true"; + } + } + } + if (event.type === "event_msg" && payload.type === "task_complete") { + lastAgentMessage = payload.last_agent_message === null + ? "null" + : typeof payload.last_agent_message === "string" ? "present" : lastAgentMessage; + } + } + const parts = [ + model ? `model=${model}` : undefined, + effort ? `effort=${effort}` : undefined, + credits, + lastAgentMessage ? `last_agent_message=${lastAgentMessage}` : undefined, + `session_log=${sessionLogName(safePath)}`, + ].filter(Boolean); + return parts.length > 0 ? parts.join("; ") : undefined; + } catch { + return safePath ? 
`session_log=${sessionLogName(safePath)}` : undefined; + } +} + +async function safeCodexSessionLogPath(path: string): Promise<string | undefined> { + if (!path.endsWith(".jsonl")) { + return undefined; + } + try { + const root = await realpath(join(process.env.CODEX_HOME || join(homedir(), ".codex"), "sessions")); + const resolved = await realpath(path); + const info = await stat(resolved); + if (!info.isFile()) { + return undefined; + } + if (resolved === root || resolved.startsWith(`${root}${sep}`)) { + return resolved; + } + return undefined; + } catch { + return undefined; + } +} + +async function readSessionLogTail(path: string): Promise<string> { + const info = await stat(path); + const maxBytes = Math.min(timeoutEnvMs("CWF_APP_THREAD_DIAGNOSTICS_MAX_BYTES", appThreadDiagnosticsMaxBytes), appThreadDiagnosticsMaxBytes); + const length = Math.min(info.size, maxBytes); + const start = Math.max(0, info.size - length); + const buffer = Buffer.alloc(length); + const file = await open(path, "r"); + try { + await file.read(buffer, 0, length, start); + } finally { + await file.close(); + } + const text = buffer.toString("utf8"); + if (start === 0) { + return text; + } + const firstNewline = text.indexOf("\n"); + return firstNewline >= 0 ? 
text.slice(firstNewline + 1) : text; +} + +function sessionLogName(path: string): string { + return basename(path) || path; } function timeoutEnvMs(name: string, fallbackMs: number): number { diff --git a/src/cli.ts b/src/cli.ts index 7c3958a..4eb417c 100644 --- a/src/cli.ts +++ b/src/cli.ts @@ -6,6 +6,15 @@ import { fileURLToPath } from "node:url"; import { spawn } from "node:child_process"; import { setTimeout as sleep } from "node:timers/promises"; import { checkDesktopCapability, formatDesktopCheck, handleDesktopResult } from "./desktop-bridge.js"; +import { resumeDynamicWorkflow, startDynamicWorkflow, type DynamicWorkflowOrigin, type ParentPermissionCap } from "./dynamic-workflow.js"; +import { generateDynamicWorkflowFromIntent } from "./dynamic-workflow-generator.js"; +import { + formatDynamicWorkflowList, + formatDynamicWorkflowShow, + listDynamicWorkflowEntries, + resolveDynamicWorkflowReference, + saveDynamicWorkflow, +} from "./dynamic-workflow-registry.js"; import { handleGithubPr, type GitHubPrFormat } from "./github-pr.js"; import { loadWorkflowSpec } from "./workflow-loader.js"; import { executeWorkflow, runWorkflow } from "./phase-engine.js"; @@ -27,6 +36,7 @@ type ParsedArgs = { command?: string; workflowPath?: string; workflowSubcommand?: string; + dynamicSubcommand?: string; desktopSubcommand?: string; workflowRef?: string; target?: string; @@ -49,6 +59,11 @@ type ParsedArgs = { goal?: string; fromRunId?: string; output?: string; + id?: string; + origin?: DynamicWorkflowOrigin; + parentSandbox?: ParentPermissionCap["sandbox"]; + parentApproval?: ParentPermissionCap["approval_policy"]; + approve?: boolean; }; async function main(argv: string[]): Promise { @@ -124,6 +139,98 @@ async function main(argv: string[]): Promise { throw new Error("Usage: cwf workflows [workflow-id-or-path]"); } + if (args.command === "dynamic") { + if (args.dynamicSubcommand === "list") { + console.log(formatDynamicWorkflowList(await listDynamicWorkflowEntries())); + 
return; + } + if (args.dynamicSubcommand === "show") { + if (!args.workflowRef) { + throw new Error("Usage: cwf dynamic show "); + } + const resolved = await resolveDynamicWorkflowReference(args.workflowRef); + console.log(formatDynamicWorkflowShow(resolved.entry)); + return; + } + if (args.dynamicSubcommand === "save") { + if (!args.workflowRef || !args.id) { + throw new Error("Usage: cwf dynamic save --id "); + } + const entry = await saveDynamicWorkflow({ sourcePath: args.workflowRef, id: args.id }); + console.log(`Saved dynamic workflow: ${entry.id}`); + console.log(`Path: ${entry.path}`); + console.log(`SHA-256: ${entry.source_sha256}`); + console.log(`Run: cwf dynamic run ${entry.id} --target `); + return; + } + if (args.dynamicSubcommand === "generate") { + if (!args.goal) { + throw new Error('Usage: cwf dynamic generate --goal "" --target [--output ]'); + } + if (!args.target) { + throw new Error('Please pass --target . Example: cwf dynamic generate --goal "Audit auth risks" --target .'); + } + const target = resolve(args.target); + await assertPathExists(target, "target repo"); + const generated = await generateDynamicWorkflowFromIntent({ goal: args.goal, output: args.output }); + const store = await startDynamicWorkflow({ + scriptPath: generated.path, + target, + origin: "generated-current-session", + parentPermissionCap: args.parentSandbox || args.parentApproval + ? { + sandbox: args.parentSandbox ?? "unknown", + approval_policy: args.parentApproval ?? 
"unknown", + } + : undefined, + preview: generated.preview, + }); + console.log(`Generated: ${generated.path}`); + console.log(`Run ID: ${store.runId}`); + console.log(`Run dir: ${store.runDir}`); + console.log(`Preview: ${store.runDir}/artifacts/dynamic-preview.md`); + console.log(`Approve: cwf approve ${store.runId} approve-dynamic`); + console.log(`Resume: cwf resume ${store.runId}`); + return; + } + if (args.dynamicSubcommand !== "run") { + throw new Error('Usage: cwf dynamic ...'); + } + if (!args.workflowPath) { + throw new Error("Usage: cwf dynamic run --target [--approve]"); + } + if (!args.target) { + throw new Error("Please pass --target . Example: cwf dynamic run workflow.js --target ."); + } + const target = resolve(args.target); + await assertPathExists(target, "target repo"); + const resolvedDynamic = await resolveDynamicWorkflowReference(args.workflowPath); + const store = await startDynamicWorkflow({ + scriptPath: resolvedDynamic.path, + target, + origin: args.origin ?? resolvedDynamic.origin, + parentPermissionCap: args.parentSandbox || args.parentApproval + ? { + sandbox: args.parentSandbox ?? "unknown", + approval_policy: args.parentApproval ?? 
"unknown", + } + : undefined, + approve: Boolean(args.approve), + }); + console.log(`Run ID: ${store.runId}`); + console.log(`Run dir: ${store.runDir}`); + const state = await store.readState(); + if (state.status === "waiting") { + console.log(`Preview: ${store.runDir}/artifacts/dynamic-preview.md`); + console.log(`Approve: cwf approve ${store.runId} approve-dynamic`); + console.log(`Resume: cwf resume ${store.runId}`); + } else { + console.log(`Status: ${state.status}`); + console.log(`Result: cwf result ${store.runId}`); + } + return; + } + if (args.command === "__run") { if (!args.runId || !args.workflowPath || !args.target) { throw new Error("Usage: cwf __run --target "); @@ -291,6 +398,13 @@ async function main(argv: string[]): Promise { const store = RunStore.fromRunId(args.runId); const spec = await store.readWorkflow(); const state = await store.readState(); + if (spec.id === "dynamic-js") { + await resumeDynamicWorkflow({ store, target: state.target }); + const nextState = await store.readState(); + const workerResults = await readWorkerResults(store.runDir); + console.log(formatStatus(nextState, workerResults)); + return; + } await executeWorkflow({ spec, specPath: `${store.runDir}/workflow.json`, target: state.target, store, resume: true }); const nextState = await store.readState(); const workerResults = await readWorkerResults(store.runDir); @@ -360,6 +474,40 @@ function parseArgs(argv: string[]): ParsedArgs { } else if (command === "workflows") { parsed.workflowSubcommand = first; parsed.workflowRef = second; + } else if (command === "dynamic") { + parsed.dynamicSubcommand = first; + if (first === "run") { + parsed.workflowPath = second; + } else if (first === "show" || first === "save") { + parsed.workflowRef = second; + } + for (let index = 0; index < rest.length; index += 1) { + const token = rest[index]; + if (token === "--target") { + parsed.target = rest[index + 1]; + index += 1; + } else if (token === "--goal") { + parsed.goal = rest[index + 
1]; + index += 1; + } else if (token === "--output") { + parsed.output = rest[index + 1]; + index += 1; + } else if (token === "--id") { + parsed.id = rest[index + 1]; + index += 1; + } else if (token === "--approve") { + parsed.approve = true; + } else if (token === "--origin") { + parsed.origin = parseDynamicOrigin(rest[index + 1]); + index += 1; + } else if (token === "--parent-sandbox") { + parsed.parentSandbox = parseParentSandbox(rest[index + 1]); + index += 1; + } else if (token === "--parent-approval") { + parsed.parentApproval = parseParentApproval(rest[index + 1]); + index += 1; + } + } } else if (command === "__run") { parsed.runId = first; parsed.workflowPath = second; @@ -476,6 +624,11 @@ Usage: cwf workflows show cwf workflows validate [workflow-id-or-path] cwf run --target [--background] + cwf dynamic list + cwf dynamic show + cwf dynamic save --id + cwf dynamic generate --goal "" --target [--output ] + cwf dynamic run --target [--approve] cwf desktop check cwf desktop result [--thread ] [--new-thread] [--print] cwf github-pr [--format comment|review] [--post --repo --pr ] @@ -496,6 +649,9 @@ Common flow: cwf validate workflows/diff-review.yaml cwf workflows list cwf run diff-review --target . --background + cwf dynamic list + cwf dynamic generate --goal "Audit this repo for auth risks" --target . + cwf dynamic run change-summary --target . cwf watch cwf latest cwf result @@ -763,6 +919,30 @@ function parseStatus(value?: string): PhaseStatus { return value as PhaseStatus; } +function parseDynamicOrigin(value?: string): DynamicWorkflowOrigin { + const origins: DynamicWorkflowOrigin[] = ["generated-current-session", "local-trust-record", "copied-local", "remote", "registry", "packaged", "unknown"]; + if (!value || !origins.includes(value as DynamicWorkflowOrigin)) { + throw new Error(`Invalid --origin value: ${value ?? 
""}`); + } + return value as DynamicWorkflowOrigin; +} + +function parseParentSandbox(value?: string): ParentPermissionCap["sandbox"] { + const sandboxes: ParentPermissionCap["sandbox"][] = ["read-only", "workspace-write", "danger-full-access", "unknown"]; + if (!value || !sandboxes.includes(value as ParentPermissionCap["sandbox"])) { + throw new Error(`Invalid --parent-sandbox value: ${value ?? ""}`); + } + return value as ParentPermissionCap["sandbox"]; +} + +function parseParentApproval(value?: string): ParentPermissionCap["approval_policy"] { + const policies: ParentPermissionCap["approval_policy"][] = ["never", "on-request", "on-failure", "untrusted", "unknown"]; + if (!value || !policies.includes(value as ParentPermissionCap["approval_policy"])) { + throw new Error(`Invalid --parent-approval value: ${value ?? ""}`); + } + return value as ParentPermissionCap["approval_policy"]; +} + function parseGithubFormat(value?: string): GitHubPrFormat { if (value !== "comment" && value !== "review") { throw new Error(`Invalid --format value: ${value ?? ""}. 
Expected comment or review.`); diff --git a/src/desktop-bridge.ts b/src/desktop-bridge.ts index f0af429..3280138 100644 --- a/src/desktop-bridge.ts +++ b/src/desktop-bridge.ts @@ -12,6 +12,7 @@ export const DESKTOP_REQUIRED_METHODS = [ "thread/start", "thread/name/set", "thread/list", + "thread/read", "turn/start", ] as const; @@ -54,6 +55,10 @@ export function createDefaultAppServerTransport(): AppServerTransport { return new UnixSocketWebSocketAppServerTransport(); } +export function createStdioAppServerTransport(codexPath = process.env.CWF_CODEX_PATH || "codex"): AppServerTransport { + return new StdioAppServerTransport(codexPath); +} + export async function checkDesktopCapability(codexPath = process.env.CWF_CODEX_PATH || "codex"): Promise { const methods = Object.fromEntries(DESKTOP_REQUIRED_METHODS.map((method) => [method, false])) as Record; let codexCliVersion: string | undefined; @@ -612,6 +617,148 @@ class UnixSocketWebSocketAppServerTransport implements AppServerTransport { } } +class StdioAppServerTransport implements AppServerTransport { + private nextId = 1; + private readonly child: ReturnType; + private readonly pending = new Map void; reject: (error: Error) => void }>(); + private buffer = ""; + private closed = false; + + constructor(codexPath: string) { + this.child = spawn(codexPath, ["app-server"], { stdio: ["pipe", "pipe", "pipe"] }); + const stdout = this.child.stdout; + const stderr = this.child.stderr; + if (!stdout || !stderr || !this.child.stdin) { + throw new Error("app-server stdio pipes are unavailable"); + } + stdout.setEncoding("utf8"); + stdout.on("data", (chunk: string) => this.onStdout(chunk)); + stderr.setEncoding("utf8"); + stderr.on("data", () => {}); + this.child.on("error", (error) => this.fail(error)); + this.child.on("close", (code) => { + this.closed = true; + if (this.pending.size > 0) { + this.fail(new Error(`app-server stdio exited before completing pending requests${code === null ? 
"" : `: exit ${code}`}`)); + } + }); + } + + request(method: string, params?: unknown): Promise { + const id = this.nextId++; + const response = new Promise((resolve, reject) => { + this.pending.set(id, { + resolve: (value) => { + resolve(value); + }, + reject: (error) => { + reject(error); + }, + }); + }); + try { + this.writeJson({ id, method, params }); + } catch (error) { + this.pending.delete(id); + return Promise.reject(error instanceof Error ? error : new Error(String(error))); + } + return response; + } + + notify(method: string, params?: unknown): void { + this.writeJson({ method, params }); + } + + async close(): Promise { + if (this.closed) { + return; + } + const closed = new Promise((resolve) => { + this.child.once("close", () => resolve()); + }); + this.child.stdin?.end(); + const terminateTimer = setTimeout(() => { + if (!this.closed) { + this.child.kill(); + } + }, 250); + const timeout = new Promise((resolve) => { + setTimeout(resolve, 1000); + }); + try { + await Promise.race([closed, timeout]); + if (!this.closed) { + this.child.kill("SIGKILL"); + await Promise.race([ + closed, + new Promise((resolve) => setTimeout(resolve, 250)), + ]); + } + } finally { + clearTimeout(terminateTimer); + } + } + + private writeJson(value: unknown): void { + const stdin = this.child.stdin; + if (this.closed || !stdin?.writable) { + throw new Error("app-server stdio is not writable"); + } + stdin.write(`${JSON.stringify(value)}\n`); + } + + private onStdout(chunk: string): void { + this.buffer += chunk; + if (Buffer.byteLength(this.buffer, "utf8") > APP_SERVER_MAX_FRAME_BYTES) { + this.buffer = ""; + this.fail(new Error(`app-server stdio output exceeds ${APP_SERVER_MAX_FRAME_BYTES} bytes without a complete response frame`)); + this.child.kill("SIGKILL"); + return; + } + for (;;) { + const newline = this.buffer.indexOf("\n"); + if (newline < 0) { + return; + } + const line = this.buffer.slice(0, newline).trim(); + this.buffer = this.buffer.slice(newline + 1); + if 
(!line) { + continue; + } + this.handleMessage(line); + } + } + + private handleMessage(raw: string): void { + let message: { id?: number; result?: unknown; error?: { message?: string } }; + try { + message = JSON.parse(raw) as { id?: number; result?: unknown; error?: { message?: string } }; + } catch { + return; + } + if (typeof message.id !== "number") { + return; + } + const waiter = this.pending.get(message.id); + if (!waiter) { + return; + } + this.pending.delete(message.id); + if (message.error) { + waiter.reject(new Error(message.error.message ?? JSON.stringify(message.error))); + } else { + waiter.resolve(message.result); + } + } + + private fail(error: Error): void { + for (const waiter of this.pending.values()) { + waiter.reject(error); + } + this.pending.clear(); + } +} + async function appendManifestArtifact(runDir: string, id: string, path: string, description: string): Promise { const manifestPath = join(runDir, "artifacts", "manifest.json"); try { diff --git a/src/dynamic-workflow-generator.ts b/src/dynamic-workflow-generator.ts new file mode 100644 index 0000000..498aa32 --- /dev/null +++ b/src/dynamic-workflow-generator.ts @@ -0,0 +1,120 @@ +import { mkdir, writeFile } from "node:fs/promises"; +import { dirname, join, resolve } from "node:path"; +import { CODEX_WORKFLOWS_ROOT } from "./run-index.js"; + +export type DynamicIntentPreview = { + goal: string; + agents: Array<{ + id: string; + role: string; + permissions: "read-only" | "safePatch" | "inherit-session"; + purpose: string; + }>; + write_intent: string; + stop_rules: string[]; +}; + +export type GenerateDynamicWorkflowOptions = { + goal: string; + output?: string; + suggestionsRoot?: string; + now?: Date; +}; + +export type GenerateDynamicWorkflowResult = { + path: string; + source: string; + preview: DynamicIntentPreview; +}; + +export async function generateDynamicWorkflowFromIntent(options: GenerateDynamicWorkflowOptions): Promise { + const goal = normalizeGoal(options.goal); + const 
preview = buildIntentPreview(goal); + const source = renderGeneratedWorkflowSource(preview); + const outputPath = options.output ? resolve(options.output) : defaultGeneratedWorkflowPath(goal, options.suggestionsRoot, options.now); + await mkdir(dirname(outputPath), { recursive: true }); + await writeFile(outputPath, source, { flag: "wx" }); + return { path: outputPath, source, preview }; +} + +export function buildIntentPreview(goal: string): DynamicIntentPreview { + return { + goal, + agents: [ + { + id: "intent-review", + role: "targeted repo reviewer", + permissions: "read-only", + purpose: "Review the current target diff against the user request and report findings without modifying files.", + }, + ], + write_intent: "read-only: generated Phase A workflows may inspect diff context and write run artifacts only; they do not modify the target repo.", + stop_rules: [ + "Pause at approve-dynamic before executing the generated workflow.js.", + "Fail before execution if AST policy rejects forbidden APIs or direct shell-like strings.", + "Fail the run if a read-only worker changes the target diff.", + ], + }; +} + +function renderGeneratedWorkflowSource(preview: DynamicIntentPreview): string { + const goalExpression = charCodeExpression(preview.goal); + return `export const metadata = ${JSON.stringify( + { + kind: "cwf-generated-dynamic-workflow", + version: 1, + generator: "intent-to-preview", + permissions: ["read-only"], + }, + null, + 2, + )}; + +export default async function workflow(cwf) { + const goal = ${goalExpression}; + const changedFiles = await cwf.git.changedFiles(); + const diff = await cwf.git.diff(); + await cwf.artifacts.write({ + name: "intent.md", + content: "# Dynamic Workflow Intent\\n\\n" + goal + "\\n\\n## Changed Files\\n\\n" + JSON.stringify(changedFiles, null, 2) + "\\n" + }); + const review = await cwf.agent.run({ + id: "intent-review", + role: "targeted repo reviewer", + permissions: "read-only", + prompt: "User request:\\n" + goal + 
"\\n\\nChanged files JSON:\\n" + JSON.stringify(changedFiles, null, 2) + "\\n\\nDiff:\\n" + diff + "\\n\\nReturn correctness, safety, and verification findings. Do not modify files." + }); + return cwf.report.summarize([review]); +} +`; +} + +function normalizeGoal(goal: string): string { + const normalized = goal.replace(/\s+/g, " ").trim(); + if (!normalized) { + throw new Error('Usage: cwf dynamic generate --goal "" --target [--output ]'); + } + if (normalized.length > 2000) { + throw new Error("dynamic workflow generation goal must be 2000 characters or fewer"); + } + return normalized; +} + +function defaultGeneratedWorkflowPath(goal: string, suggestionsRoot = join(CODEX_WORKFLOWS_ROOT, "dynamic"), now = new Date()): string { + const stamp = now.toISOString().replace(/[-:.TZ]/g, "").slice(0, 14); + return join(suggestionsRoot, `${stamp}-${slugify(goal)}.workflow.js`); +} + +function slugify(value: string): string { + const slug = value + .toLowerCase() + .replace(/[^a-z0-9]+/g, "-") + .replace(/^-+|-+$/g, "") + .slice(0, 48) + .replace(/-+$/g, ""); + return slug || "workflow"; +} + +function charCodeExpression(value: string): string { + return `String.fromCodePoint(${Array.from(value).map((char) => char.codePointAt(0) ?? 
0).join(", ")})`; +} diff --git a/src/dynamic-workflow-registry.ts b/src/dynamic-workflow-registry.ts new file mode 100644 index 0000000..5f02cae --- /dev/null +++ b/src/dynamic-workflow-registry.ts @@ -0,0 +1,268 @@ +import { createHash } from "node:crypto"; +import { constants } from "node:fs"; +import { access, copyFile, mkdir, readdir, readFile, writeFile } from "node:fs/promises"; +import { homedir } from "node:os"; +import { basename, extname, isAbsolute, join, resolve } from "node:path"; +import { validateDynamicWorkflowSource, type DynamicWorkflowOrigin } from "./dynamic-workflow.js"; + +export type DynamicWorkflowRegistryOptions = { + cwd?: string; + homeDir?: string; +}; + +export type DynamicWorkflowTrustMetadata = { + id: string; + source_sha256: string; + origin: DynamicWorkflowOrigin; + saved_at: string; + source_path: string; +}; + +export type DynamicWorkflowEntry = { + id: string; + title: string; + version: string; + path: string; + search_path: string; + source_sha256: string; + origin: DynamicWorkflowOrigin; + trust_state: "packaged" | "local-trust-record" | "untrusted-local"; + capabilities: { + writes: boolean; + permissions: string[]; + }; +}; + +export type ResolvedDynamicWorkflow = { + entry: DynamicWorkflowEntry; + path: string; + origin: DynamicWorkflowOrigin; +}; + +export type SaveDynamicWorkflowOptions = DynamicWorkflowRegistryOptions & { + sourcePath: string; + id: string; + now?: Date; +}; + +export function dynamicWorkflowSearchPaths(options: DynamicWorkflowRegistryOptions = {}): string[] { + const cwd = options.cwd ?? process.cwd(); + const home = options.homeDir ?? 
homedir(); + return [resolve(cwd, "workflows", "dynamic"), resolve(cwd, ".codex-flow", "dynamic-workflows"), resolve(home, ".codex-workflows", "dynamic")]; +} + +export async function listDynamicWorkflowEntries(options: DynamicWorkflowRegistryOptions = {}): Promise { + const entries: DynamicWorkflowEntry[] = []; + for (const searchPath of dynamicWorkflowSearchPaths(options)) { + let files: string[]; + try { + files = await readdir(searchPath); + } catch { + continue; + } + for (const file of files) { + const path = join(searchPath, file); + if (!isDynamicWorkflowFile(path)) { + continue; + } + const source = await readFile(path, "utf8"); + validateDynamicWorkflowSource(source); + entries.push(await dynamicEntryFromSource(path, searchPath, source, options)); + } + } + assertUniqueDynamicWorkflowIds(entries); + return entries.sort((left, right) => left.id.localeCompare(right.id) || left.path.localeCompare(right.path)); +} + +export async function resolveDynamicWorkflowReference(reference: string, options: DynamicWorkflowRegistryOptions = {}): Promise { + if (looksLikeRemoteReference(reference)) { + throw new Error("Remote dynamic workflows cannot run directly by URL. Inspect and save a local trusted copy first."); + } + if (looksLikeDynamicWorkflowPath(reference)) { + const path = resolve(options.cwd ?? process.cwd(), reference); + const source = await readFile(path, "utf8"); + validateDynamicWorkflowSource(source); + const entry = await dynamicEntryFromSource(path, resolve(path, ".."), source, options); + return { entry, path, origin: entry.origin }; + } + const entries = await listDynamicWorkflowEntries(options); + const entry = entries.find((item) => item.id === reference); + if (!entry) { + throw new Error(`Unknown dynamic workflow id: ${reference}. Try: cwf dynamic list`); + } + if (entry.trust_state === "untrusted-local") { + throw new Error(`Dynamic workflow ${reference} is untrusted-local. 
Run it by explicit path or save it with cwf dynamic save before using its id.`); + } + return { entry, path: entry.path, origin: entry.origin }; +} + +export async function saveDynamicWorkflow(options: SaveDynamicWorkflowOptions): Promise { + const sourcePath = resolve(options.sourcePath); + const source = await readFile(sourcePath, "utf8"); + validateDynamicWorkflowSource(source); + const id = normalizeDynamicWorkflowId(options.id); + const root = join(options.homeDir ?? homedir(), ".codex-workflows", "dynamic"); + await mkdir(root, { recursive: true }); + const destination = join(root, `${id}.workflow.js`); + await copyFile(sourcePath, destination, constants.COPYFILE_EXCL); + const sha = sha256(source); + const trust: DynamicWorkflowTrustMetadata = { + id, + source_sha256: sha, + origin: "local-trust-record", + saved_at: (options.now ?? new Date()).toISOString(), + source_path: sourcePath, + }; + await writeFile(join(root, `${id}.trust.json`), `${JSON.stringify(trust, null, 2)}\n`, { flag: "wx" }); + return dynamicEntryFromSource(destination, root, source, options); +} + +export function formatDynamicWorkflowList(entries: DynamicWorkflowEntry[]): string { + if (entries.length === 0) { + return "No dynamic workflows found."; + } + const lines = ["Dynamic workflow ID Version Trust Title Path"]; + for (const entry of entries) { + lines.push(`${pad(entry.id, 30)} ${pad(entry.version, 14)} ${pad(entry.trust_state, 21)} ${pad(entry.title, 29)} ${entry.path}`); + } + return lines.join("\n"); +} + +export function formatDynamicWorkflowShow(entry: DynamicWorkflowEntry): string { + return [ + `Dynamic workflow ID: ${entry.id}`, + `Title: ${entry.title}`, + `Version: ${entry.version}`, + `Path: ${entry.path}`, + `Origin: ${entry.origin}`, + `Trust: ${entry.trust_state}`, + `SHA-256: ${entry.source_sha256}`, + `Capabilities: writes=${entry.capabilities.writes}`, + `Permissions: ${entry.capabilities.permissions.length > 0 ? 
entry.capabilities.permissions.join(", ") : "read-only"}`, + ].join("\n"); +} + +async function dynamicEntryFromSource( + path: string, + searchPath: string, + source: string, + options: DynamicWorkflowRegistryOptions, +): Promise { + const sha = sha256(source); + const declared = parseDeclaredMetadata(source); + const trust = await readTrustMetadata(path); + if (trust && trust.source_sha256 !== sha) { + throw new Error(`Dynamic workflow trust metadata SHA mismatch for ${path}`); + } + const packaged = resolve(searchPath) === resolve(options.cwd ?? process.cwd(), "workflows", "dynamic"); + const id = trust?.id ?? normalizeDynamicWorkflowId(stringValue(declared.id) ?? basename(path).replace(/\.workflow\.js$/, "")); + const permissions = Array.isArray(declared.permissions) + ? declared.permissions.filter((value): value is string => typeof value === "string") + : requestedPermissions(source); + return { + id, + title: stringValue(declared.title) ?? titleFromId(id), + version: stringValue(declared.version) ?? "1.0.0", + path, + search_path: searchPath, + source_sha256: sha, + origin: trust?.origin ?? (packaged ? "packaged" : "copied-local"), + trust_state: trust ? "local-trust-record" : packaged ? 
"packaged" : "untrusted-local", + capabilities: { + writes: permissions.includes("safePatch") || permissions.includes("inherit-session"), + permissions, + }, + }; +} + +async function readTrustMetadata(path: string): Promise { + const trustPath = path.replace(/\.workflow\.js$/, ".trust.json"); + try { + await access(trustPath); + return JSON.parse(await readFile(trustPath, "utf8")) as DynamicWorkflowTrustMetadata; + } catch { + return undefined; + } +} + +function parseDeclaredMetadata(source: string): Record { + const match = /export\s+const\s+metadata\s*=\s*(\{[\s\S]*?\});/.exec(source); + if (!match) { + return {}; + } + try { + return JSON.parse(match[1]) as Record; + } catch { + return {}; + } +} + +function requestedPermissions(source: string): string[] { + const values = new Set(["read-only"]); + if (source.includes("safePatch")) { + values.add("safePatch"); + } + if (source.includes("inherit-session")) { + values.add("inherit-session"); + } + return [...values]; +} + +function looksLikeRemoteReference(reference: string): boolean { + return /^https?:\/\//i.test(reference); +} + +function looksLikeDynamicWorkflowPath(reference: string): boolean { + return ( + isAbsolute(reference) || + reference.startsWith(".") || + reference.includes("/") || + reference.includes("\\") || + extname(reference) === ".js" || + reference.endsWith(".workflow.js") + ); +} + +function isDynamicWorkflowFile(path: string): boolean { + return path.endsWith(".workflow.js"); +} + +function normalizeDynamicWorkflowId(value: string): string { + const normalized = value.trim(); + if (!/^[a-z0-9][a-z0-9._-]{0,79}$/i.test(normalized)) { + throw new Error("dynamic workflow id must start with a letter or number and contain only letters, numbers, dots, underscores, or dashes"); + } + return normalized; +} + +function assertUniqueDynamicWorkflowIds(entries: DynamicWorkflowEntry[]): void { + const byId = new Map(); + for (const entry of entries) { + byId.set(entry.id, [...(byId.get(entry.id) ?? 
[]), entry]); + } + for (const [id, matches] of byId.entries()) { + if (matches.length > 1) { + throw new Error(`Duplicate dynamic workflow id "${id}" found in:\n${matches.map((entry) => `- ${entry.path}`).join("\n")}`); + } + } +} + +function stringValue(value: unknown): string | undefined { + return typeof value === "string" && value.length > 0 ? value : undefined; +} + +function titleFromId(id: string): string { + return id + .split(/[-_.]+/g) + .filter(Boolean) + .map((part) => `${part[0]?.toUpperCase() ?? ""}${part.slice(1)}`) + .join(" "); +} + +function sha256(source: string): string { + return createHash("sha256").update(source).digest("hex"); +} + +function pad(value: string, width: number): string { + return value.length >= width ? value : `${value}${" ".repeat(width - value.length)}`; +} diff --git a/src/dynamic-workflow.ts b/src/dynamic-workflow.ts new file mode 100644 index 0000000..c91e996 --- /dev/null +++ b/src/dynamic-workflow.ts @@ -0,0 +1,1234 @@ +import { spawn } from "node:child_process"; +import { createHash } from "node:crypto"; +import { readFileSync } from "node:fs"; +import { mkdir, readFile, writeFile } from "node:fs/promises"; +import { basename, join, resolve } from "node:path"; +import { parse } from "acorn"; +import { collectDiffContext, currentDiffHash } from "./adapters/command-step.js"; +import { runWorkerWithAdapter, type WorkerAdapterOptions } from "./adapters/worker-adapter.js"; +import type { DynamicIntentPreview } from "./dynamic-workflow-generator.js"; +import { buildFailureSummary, DEFAULT_FAILURE_POLICY } from "./run-index.js"; +import { RunStore } from "./run-store.js"; +import { applySafePatch, revertAppliedPatch, runVerificationCommands, type VerificationResult } from "./safe-write.js"; +import type { ArtifactRef, DiffContext, WorkerResult, WorkflowSpec, WorkflowWorker, WritePolicy } from "./types.js"; + +export type DynamicWorkflowOrigin = + | "generated-current-session" + | "local-trust-record" + | "copied-local" + | 
"remote" + | "registry" + | "packaged" + | "unknown"; + +export type DynamicPermissionProfile = "read-only" | "safePatch" | "inherit-session"; + +export type ParentPermissionCap = { + sandbox: "read-only" | "workspace-write" | "danger-full-access" | "unknown"; + approval_policy: "never" | "on-request" | "on-failure" | "untrusted" | "unknown"; +}; + +export type DynamicWorkflowBudget = { + max_agents: number; + max_concurrency: number; + timeout_ms: number; + output_bytes: number; +}; + +export type DynamicWorkflowMetadata = { + source_path: string; + artifact_script_path: string; + source_sha256: string; + origin: DynamicWorkflowOrigin; + origin_session_id?: string; + declared_safe_patch_policy?: WritePolicy; + parent_permission_cap: ParentPermissionCap; + budget: DynamicWorkflowBudget; + preview?: DynamicIntentPreview; + capabilities: DynamicWorkflowCapabilities; + ast_policy: { + status: "passed"; + parser: "acorn"; + rejected_patterns: string[]; + }; +}; + +export type DynamicWorkflowCapabilities = { + uses: string[]; + requested_permissions: DynamicPermissionProfile[]; + inherit_session_allowed: boolean; + inherit_session_reason: string; + app_thread_inherit_session_status: "read-only-only" | "inherit-session-degraded-to-read-only"; +}; + +export type DynamicWorkerRunner = ( + worker: WorkflowWorker, + context: DiffContext, + options: WorkerAdapterOptions, +) => Promise; + +export type DynamicWorkflowOptions = { + scriptPath: string; + target: string; + runsRoot?: string; + origin?: DynamicWorkflowOrigin; + originSessionId?: string; + parentPermissionCap?: ParentPermissionCap; + budget?: Partial; + preview?: DynamicIntentPreview; + workerRunner?: DynamicWorkerRunner; + approve?: boolean; +}; + +const DEFAULT_DYNAMIC_BUDGET: DynamicWorkflowBudget = { + max_agents: 8, + max_concurrency: 4, + timeout_ms: 120000, + output_bytes: 128000, +}; + +const FORBIDDEN_IDENTIFIERS = new Set([ + "require", + "eval", + "Function", + "globalThis", + "process", + "fetch", + 
"Reflect", + "Proxy", + "setTimeout", + "setInterval", + "setImmediate", + "queueMicrotask", +]); + +const FORBIDDEN_MEMBER_NAMES = new Set(["constructor", "prototype", "__proto__", "env"]); +const ALLOWED_CALL_ROOTS = new Set(["cwf", "JSON", "Math", "Array", "Object", "Number", "String", "Boolean", "Promise"]); +const SHELL_STRING_PATTERNS = [/rm\s+-rf/, /\bcurl\s+/, /\bbash\s+/, /\bsh\s+/, /\bnode\s+/, /\bpython(?:3)?\s+/]; + +export async function startDynamicWorkflow(options: DynamicWorkflowOptions): Promise { + const target = resolve(options.target); + const sourcePath = resolve(options.scriptPath); + const source = await readFile(sourcePath, "utf8"); + validateDynamicWorkflowSource(source); + const store = await createDynamicStore(sourcePath, target, options.runsRoot); + await writeDynamicPreview(store, sourcePath, source, target, options); + if (options.approve) { + await store.approveGate("approve-dynamic"); + await resumeDynamicWorkflow({ store, target, workerRunner: options.workerRunner }); + } + return store; +} + +export async function resumeDynamicWorkflow(options: { + store: RunStore; + target?: string; + workerRunner?: DynamicWorkerRunner; +}): Promise { + const store = options.store; + const state = await store.readState(); + if (state.status === "rejected") { + throw new Error("Dynamic workflow was rejected and cannot be resumed."); + } + const gate = state.phases.find((phase) => phase.id === "approve-dynamic"); + if (gate?.status !== "approved") { + throw new Error("Dynamic workflow must be approved before execution."); + } + if (state.phases.find((phase) => phase.id === "dynamic-execute")?.status === "completed") { + return store; + } + + const target = resolve(options.target ?? 
state.target); + const metadata = await readDynamicMetadata(store); + await store.updatePhase("dynamic-execute", "running"); + const context = await store.readContext(); + const startedHash = await currentDiffHash(target); + const events: unknown[] = []; + const workerResults: WorkerResult[] = []; + try { + const finalResult = await executeDynamicChild({ + store, + target, + context, + metadata, + workerRunner: options.workerRunner, + events, + workerResults, + }); + const completedHash = await currentDiffHash(target); + await writeDynamicExecutionArtifacts(store, metadata, events, workerResults, finalResult, startedHash, completedHash); + await store.updatePhase("dynamic-execute", "completed"); + await store.writeResult(renderDynamicResult(metadata, workerResults, finalResult, completedHash !== startedHash)); + await store.writeArtifactManifest(buildDynamicArtifactManifest(store, workerResults)); + await store.appendEvent("run.completed", { run_id: store.runId }); + return store; + } catch (error) { + const message = error instanceof Error ? error.message : String(error); + await writeDynamicExecutionArtifacts(store, metadata, events, workerResults, { error: message }, startedHash, await currentDiffHash(target).catch(() => startedHash)); + await store.updatePhase("dynamic-execute", "failed", message); + const failed = await store.readState(); + failed.status = "failed"; + failed.error = message; + failed.failure_summary = buildFailureSummary(failed, message); + await store.writeState(failed); + await store.appendEvent("run.failed", { error: message }); + throw error; + } +} + +export function validateDynamicWorkflowSource(source: string): void { + const ast = parse(source, { + ecmaVersion: "latest", + sourceType: "module", + allowAwaitOutsideFunction: false, + }) as unknown as AcornNode; + const topLevel = Array.isArray(ast.body) ? 
ast.body : []; + let defaultExportCount = 0; + for (const statement of topLevel) { + if (statement.type === "ImportDeclaration") { + throw new Error("dynamic workflow cannot use imports or dynamic import"); + } + if (statement.type === "ExportDefaultDeclaration") { + defaultExportCount += 1; + const declaration = statement.declaration; + if (!declaration || (declaration.type !== "FunctionDeclaration" && declaration.type !== "ArrowFunctionExpression")) { + throw new Error("dynamic workflow default export must be an async function"); + } + if (!declaration.async) { + throw new Error("dynamic workflow default export must be async"); + } + continue; + } + if (statement.type === "ExportNamedDeclaration" && isAllowedMetadataExport(statement)) { + continue; + } + if (statement.type === "EmptyStatement") { + continue; + } + throw new Error("dynamic workflow only allows metadata exports and one async default workflow export at top level"); + } + if (defaultExportCount !== 1) { + throw new Error("dynamic workflow must export exactly one async default function"); + } + walkAst(ast, undefined); +} + +async function createDynamicStore(scriptPath: string, target: string, runsRoot?: string): Promise<RunStore> { + const spec = createDynamicWorkflowSpec(scriptPath); + return RunStore.create(spec, target, runsRoot); +} + +function createDynamicWorkflowSpec(scriptPath: string): WorkflowSpec { + return { + id: "dynamic-js", + version: "1.11.0", + title: `Dynamic JS Workflow: ${basename(scriptPath)}`, + tags: ["dynamic", "javascript", "v1.11"], + inputs: { + target: { type: "path", required: true }, + }, + capabilities: { writes: false }, + requires: { target: "git-repo" }, + defaults: { sandbox: "read-only", timeout_ms: DEFAULT_DYNAMIC_BUDGET.timeout_ms }, + phases: [ + { id: "collect", kind: "command" }, + { id: "dynamic-preview", kind: "command" }, + { id: "approve-dynamic", kind: "gate", prompt: "Approve dynamic JavaScript workflow execution.", requires_approval: true }, + { id: 
"dynamic-execute", kind: "command" }, + ], + artifacts: ["result.md", "artifacts/dynamic-preview.md", "artifacts/dynamic-events.jsonl"], + }; +} + +async function writeDynamicPreview( + store: RunStore, + sourcePath: string, + source: string, + target: string, + options: DynamicWorkflowOptions, +): Promise { + await store.updatePhase("collect", "running"); + const context = { + ...(await collectDiffContext(target)), + tracked_diff_hash: await currentDiffHash(target), + }; + await store.writeContext(context); + await store.updatePhase("collect", "completed"); + + await store.updatePhase("dynamic-preview", "running"); + const artifactScriptPath = join(store.runDir, "artifacts", "workflow.js"); + await mkdir(join(store.runDir, "artifacts"), { recursive: true }); + await writeFile(artifactScriptPath, source); + const sourceSha256 = sha256(source); + const parentPermissionCap = options.parentPermissionCap ?? inferParentPermissionCap(); + const budget = { ...DEFAULT_DYNAMIC_BUDGET, ...options.budget }; + const origin = options.origin ?? 
"copied-local"; + const declaredSafePatchPolicy = declaredSafePatchPolicyFromSource(source); + const capabilities = buildDynamicCapabilities(source, origin, parentPermissionCap); + if (capabilities.requested_permissions.includes("safePatch") && !declaredSafePatchPolicy) { + throw new Error("dynamic safePatch requires export const metadata.safe_patch_policy so the write policy is visible in preview"); + } + const metadata: DynamicWorkflowMetadata = { + source_path: sourcePath, + artifact_script_path: artifactScriptPath, + source_sha256: sourceSha256, + origin, + origin_session_id: options.originSessionId, + declared_safe_patch_policy: declaredSafePatchPolicy, + parent_permission_cap: parentPermissionCap, + budget, + preview: options.preview, + capabilities, + ast_policy: { + status: "passed", + parser: "acorn", + rejected_patterns: [ + "imports", + "dynamic import", + "require", + "eval", + "Function", + "globalThis", + "process", + "fetch", + "constructor/prototype escapes", + "direct shell strings", + "non-cwf call roots", + ], + }, + }; + await writeJsonFile(join(store.runDir, "artifacts", "dynamic-workflow.json"), metadata); + await writeJsonFile(join(store.runDir, "artifacts", "dynamic-capabilities.json"), capabilities); + await writeJsonFile(join(store.runDir, "artifacts", "dynamic-budget.json"), budget); + await writeFile(join(store.runDir, "artifacts", "workflow.sha256"), `${sourceSha256}\n`); + await writeFile(join(store.runDir, "artifacts", "dynamic-preview.md"), renderDynamicPreview(metadata, context)); + await store.appendEvent("dynamic.preview", { + source_sha256: sourceSha256, + origin, + requested_permissions: capabilities.requested_permissions, + inherit_session_allowed: capabilities.inherit_session_allowed, + }); + await store.updatePhase("dynamic-preview", "completed"); + await store.waitAtGate("approve-dynamic", "Approve dynamic JavaScript workflow execution after reviewing artifacts/dynamic-preview.md."); +} + +async function 
executeDynamicChild(options: { + store: RunStore; + target: string; + context: DiffContext; + metadata: DynamicWorkflowMetadata; + workerRunner?: DynamicWorkerRunner; + events: unknown[]; + workerResults: WorkerResult[]; +}): Promise<unknown> { + const childPath = join(options.store.runDir, "artifacts", "dynamic-child.mjs"); + await writeFile(childPath, DYNAMIC_CHILD_SOURCE); + const child = spawn(process.execPath, [ + "--permission", + `--allow-fs-read=${options.store.runDir}`, + childPath, + options.metadata.artifact_script_path, + ], { + cwd: options.store.runDir, + env: { + CWF_TARGET: options.target, + CWF_RUN_DIR: options.store.runDir, + CWF_MAX_CONCURRENCY: String(safePositiveNumber(options.metadata.budget.max_concurrency, DEFAULT_DYNAMIC_BUDGET.max_concurrency)), + }, + stdio: ["pipe", "pipe", "pipe"], + }); + + let stderr = ""; + let stdoutBuffer = ""; + let finalResult: unknown; + let childError: Error | undefined; + child.stderr.setEncoding("utf8"); + child.stderr.on("data", (chunk) => { + stderr += chunk; + }); + + child.stdout.setEncoding("utf8"); + child.stdout.on("data", (chunk) => { + stdoutBuffer += chunk; + const lines = stdoutBuffer.split(/\r?\n/); + stdoutBuffer = lines.pop() ?? ""; + for (const line of lines) { + if (!line.trim()) { + continue; + } + /* Parse inside the promise chain so a malformed line is routed to the catch handler instead of throwing from the stdout listener. */ + void Promise.resolve().then(() => handleChildMessage(JSON.parse(line) as ChildRequest)).catch((error) => { + childError = error instanceof Error ? 
error : new Error(String(error)); + child.kill("SIGTERM"); + }); + } + }); + + const exit = new Promise<void>((resolveExit, rejectExit) => { + const timeout = setTimeout(() => { + child.kill("SIGTERM"); + rejectExit(new Error("dynamic workflow timed out")); + }, options.metadata.budget.timeout_ms); + child.on("error", (error) => { + clearTimeout(timeout); + rejectExit(error); + }); + child.on("exit", (code) => { + clearTimeout(timeout); + if (childError) { + rejectExit(childError); + } else if (code === 0) { + resolveExit(); + } else { + rejectExit(new Error(`dynamic child exited with code ${code}${stderr.trim() ? `: ${stderr.trim()}` : ""}`)); + } + }); + }); + + async function handleChildMessage(message: ChildRequest): Promise<void> { + if (message.type === "event") { + options.events.push(message.payload); + await appendDynamicEvent(options.store, message.payload); + return; + } + if (message.type !== "request") { + return; + } + try { + const result = await handleRuntimeRequest(message.method, message.params, options); + if (message.method === "report.final") { + finalResult = result; + } + child.stdin.write(`${JSON.stringify({ type: "response", id: message.id, result })}\n`); + } catch (error) { + const errorMessage = error instanceof Error ? 
error.message : String(error); + child.stdin.write(`${JSON.stringify({ type: "response", id: message.id, error: errorMessage })}\n`); + } + } + + await exit; + return finalResult; +} + +async function handleRuntimeRequest( + method: string, + params: unknown, + options: { + store: RunStore; + target: string; + context: DiffContext; + metadata: DynamicWorkflowMetadata; + workerRunner?: DynamicWorkerRunner; + events: unknown[]; + workerResults: WorkerResult[]; + }, +): Promise<unknown> { + if (method === "git.changedFiles") { + return options.context.changed_files; + } + if (method === "git.diff") { + return options.context.diff; + } + if (method === "artifacts.write") { + const record = asRecord(params, "artifacts.write params"); + const name = safeArtifactName(expectString(record.name, "artifacts.write.name")); + const content = expectString(record.content, "artifacts.write.content"); + /* output_bytes is a byte budget, so measure the UTF-8 encoded size rather than the UTF-16 string length. */ + if (Buffer.byteLength(content, "utf8") > options.metadata.budget.output_bytes) { + throw new Error("artifact content exceeds dynamic workflow output budget"); + } + const path = join(options.store.runDir, "artifacts", `dynamic-${name}`); + await writeFile(path, content); + await options.store.appendEvent("artifact.generated", { path }); + return { path }; + } + if (method === "safePatch.apply") { + return applyDynamicSafePatch(asRecord(params, "safePatch params"), options); + } + if (method === "report.summarize") { + return summarizeDynamicReport(params); + } + if (method === "report.final") { + return params; + } + if (method === "agent.run") { + return runDynamicAgent(asRecord(params, "agent.run params"), options); + } + throw new Error(`unsupported dynamic runtime method: ${method}`); +} + +async function applyDynamicSafePatch( + params: Record<string, unknown>, + options: { + store: RunStore; + target: string; + metadata: DynamicWorkflowMetadata; + events: unknown[]; + }, +): Promise<{ + status: "passed"; + changed_files: string[]; + patch_path: string; + verification: VerificationResult[]; +}> { + const patch = 
expectString(params.patch, "safePatch.patch"); + const writePolicy = expectWritePolicy(params.write_policy); + const artifactsDir = join(options.store.runDir, "artifacts"); + await mkdir(artifactsDir, { recursive: true }); + const patchPath = join(artifactsDir, "dynamic-proposed.patch"); + const resultPath = join(artifactsDir, "dynamic-safe-patch.json"); + await writeFile(patchPath, patch); + await options.store.appendEvent("dynamic.safe_patch.preview", { + patch_path: patchPath, + allowed_paths: writePolicy.allowed_paths, + forbidden_paths: writePolicy.forbidden_paths, + }); + let changedFiles: string[] = []; + let verification: VerificationResult[] = []; + let rollback: { status: "not_required" | "passed" | "failed"; patch_path?: string; error?: string } = { status: "not_required" }; + try { + assertSafePatchPolicyMatchesDeclared(writePolicy, options.metadata.declared_safe_patch_policy); + changedFiles = await applySafePatch(options.target, patch, writePolicy, patchPath); + verification = await runVerificationCommands(options.target, writePolicy.verification_commands); + if (verification.some((item) => item.status === "failed")) { + try { + await revertAppliedPatch(options.target, patchPath); + rollback = { status: "passed", patch_path: patchPath }; + await options.store.appendEvent("dynamic.safe_patch.rollback", rollback); + } catch (rollbackError) { + const rollbackMessage = rollbackError instanceof Error ? 
rollbackError.message : String(rollbackError); + rollback = { status: "failed", patch_path: patchPath, error: rollbackMessage }; + await options.store.appendEvent("dynamic.safe_patch.rollback", rollback); + throw new Error(`safePatch verification failed and rollback failed: ${rollbackMessage}`); + } + throw new Error("safePatch verification failed; patch was rolled back"); + } + const result = { + status: "passed" as const, + changed_files: changedFiles, + patch_path: patchPath, + verification, + rollback, + }; + await writeJsonFile(resultPath, result); + await options.store.appendEvent("dynamic.safe_patch.applied", result); + return result; + } catch (error) { + const message = error instanceof Error ? error.message : String(error); + await writeJsonFile(resultPath, { + status: "failed", + error: message, + changed_files: changedFiles, + patch_path: patchPath, + verification, + rollback, + }); + await options.store.appendEvent("dynamic.safe_patch.failed", { error: message, patch_path: patchPath }); + throw error; + } +} + +async function runDynamicAgent( + params: Record<string, unknown>, + options: { + store: RunStore; + target: string; + context: DiffContext; + metadata: DynamicWorkflowMetadata; + workerRunner?: DynamicWorkerRunner; + workerResults: WorkerResult[]; + }, +): Promise<WorkerResult> { + if (options.workerResults.length >= options.metadata.budget.max_agents) { + throw new Error("dynamic workflow exceeded max_agents budget"); + } + const id = expectString(params.id, "agent.run.id"); + const role = expectString(params.role ?? id, "agent.run.role"); + const prompt = expectString(params.prompt, "agent.run.prompt"); + const permissions = expectPermissionProfile(params.permissions ?? params.sandbox ?? 
"read-only"); + if (permissions === "safePatch") { + throw new Error("dynamic safePatch workers are recognized but not executable until a v1.10 write_policy is attached to the dynamic run"); + } + if (permissions === "inherit-session") { + assertInheritSessionAllowed(options.metadata); + } + const sandboxMode = permissions === "inherit-session" ? options.metadata.parent_permission_cap.sandbox ?? "read-only" : "read-only"; + const approvalPolicy = permissions === "inherit-session" ? options.metadata.parent_permission_cap.approval_policy ?? "never" : "never"; + + const worker: WorkflowWorker = { + id, + perspective: role, + prompt, + writes: permissions === "inherit-session", + }; + const beforeHash = await currentDiffHash(options.target); + const runner = options.workerRunner ?? runWorkerWithAdapter; + await options.store.upsertWorker(id, "running"); + let result: WorkerResult; + try { + result = await runner(worker, options.context, { + target: options.target, + timeoutMs: options.metadata.budget.timeout_ms, + workflowId: "dynamic-js", + runId: options.store.runId, + runtime: { preferred_worker_adapter: "codex-sdk-headless" }, + sandboxMode: sandboxMode === "unknown" ? "read-only" : sandboxMode, + approvalPolicy: approvalPolicy === "unknown" ? "never" : approvalPolicy, + }); + } catch (error) { + const message = error instanceof Error ? 
error.message : String(error); + await options.store.upsertWorker(id, "failed", message); + throw error; + } + const afterHash = await currentDiffHash(options.target); + if (permissions === "read-only" && afterHash !== beforeHash) { + const failed = { + ...result, + status: "failed" as const, + confidence: "low" as const, + error: "read-only-worker-violation: target diff changed during read-only dynamic agent", + verification: [...result.verification, "read-only-worker-violation: target diff changed"], + }; + await options.store.writeWorkerResult(failed); + options.workerResults.push(failed); + throw new Error(failed.error); + } + const normalized: WorkerResult = { + ...result, + runtime: { + adapter: result.runtime?.adapter ?? "codex-sdk-headless", + fallback_used: result.runtime?.fallback_used ?? false, + agent_role: result.runtime?.agent_role ?? role, + transcript_read: result.runtime?.transcript_read ?? false, + ...result.runtime, + sandbox: sandboxMode === "unknown" ? "read-only" : sandboxMode, + approval_policy: approvalPolicy === "unknown" ? 
"never" : approvalPolicy, + }, + }; + await options.store.writeWorkerResult(normalized); + options.workerResults.push(normalized); + return normalized; +} + +function assertInheritSessionAllowed(metadata: DynamicWorkflowMetadata): void { + if (!metadata.capabilities.inherit_session_allowed) { + throw new Error(`inherit-session rejected: ${metadata.capabilities.inherit_session_reason}`); + } + if (metadata.parent_permission_cap.sandbox === "read-only" || metadata.parent_permission_cap.sandbox === "unknown") { + throw new Error("inherit-session rejected: parent permission cap is not write-capable"); + } + if (metadata.parent_permission_cap.approval_policy === "unknown") { + throw new Error("inherit-session rejected: parent approval policy is unknown"); + } + const currentHash = sha256ForFileSync(metadata.artifact_script_path); + if (currentHash !== metadata.source_sha256) { + throw new Error("inherit-session rejected: approved script SHA-256 no longer matches"); + } +} + +function buildDynamicCapabilities( + source: string, + origin: DynamicWorkflowOrigin, + parentPermissionCap: ParentPermissionCap, +): DynamicWorkflowCapabilities { + const requested = requestedPermissions(source); + const trustedGenerated = origin === "generated-current-session"; + const parentWriteCapable = parentPermissionCap.sandbox === "workspace-write" || parentPermissionCap.sandbox === "danger-full-access"; + const inheritSessionAllowed = trustedGenerated && parentWriteCapable && parentPermissionCap.approval_policy !== "unknown"; + return { + uses: detectRuntimeUses(source), + requested_permissions: requested, + inherit_session_allowed: inheritSessionAllowed, + inherit_session_reason: inheritSessionAllowed + ? 
"generated-current-session origin with write-capable parent permission cap" + : "inherit-session requires generated-current-session origin, matching SHA-256, known approval policy, and write-capable parent permission cap", + app_thread_inherit_session_status: requested.includes("inherit-session") ? "inherit-session-degraded-to-read-only" : "read-only-only", + }; +} + +function renderDynamicPreview(metadata: DynamicWorkflowMetadata, context: DiffContext): string { + const broadParent = metadata.parent_permission_cap.sandbox === "workspace-write" || metadata.parent_permission_cap.sandbox === "danger-full-access"; + const agents = metadata.preview?.agents ?? inferPreviewAgents(metadata); + const writeIntent = metadata.preview?.write_intent ?? inferWriteIntent(metadata); + const stopRules = metadata.preview?.stop_rules ?? [ + "Pause at approve-dynamic before dynamic-execute starts.", + "Fail before execution when AST policy rejects forbidden source.", + "Fail read-only workers that mutate the target diff.", + ]; + return [ + "# Dynamic Workflow Preview", + "", + "## Intent", + "", + metadata.preview?.goal ? metadata.preview.goal : "Local dynamic workflow script selected by the user.", + "", + `Source: \`${metadata.source_path}\``, + `Artifact script: \`${metadata.artifact_script_path}\``, + `SHA-256: \`${metadata.source_sha256}\``, + `Origin: \`${metadata.origin}\``, + `Parent sandbox cap: \`${metadata.parent_permission_cap.sandbox}\``, + `Parent approval cap: \`${metadata.parent_permission_cap.approval_policy}\``, + "", + "## Capabilities", + "", + ...(metadata.capabilities.uses.length > 0 ? 
metadata.capabilities.uses.map((item) => `- \`${item}\``) : ["- none detected"]), + "", + "## Planned Agents", + "", + ...agents.map((agent) => `- \`${agent.id}\` (${agent.role}, ${agent.permissions}): ${agent.purpose}`), + "", + "## Requested Agent Permissions", + "", + ...metadata.capabilities.requested_permissions.map((item) => `- \`${item}\``), + "", + "## Write Intent", + "", + writeIntent, + "", + ...renderSafePatchPolicy(metadata), + "## Budget", + "", + `- max_agents: ${metadata.budget.max_agents}`, + `- max_concurrency: ${metadata.budget.max_concurrency}`, + `- timeout_ms: ${metadata.budget.timeout_ms}`, + `- output_bytes: ${metadata.budget.output_bytes}`, + "", + "## Target Snapshot", + "", + `- Target: \`${context.target}\``, + `- Branch: \`${context.branch}\``, + `- Diff hash: \`${context.diff_hash}\``, + `- Changed files: ${context.changed_files.length > 0 ? context.changed_files.map((file) => `\`${file}\``).join(", ") : "none"}`, + "", + "## Safety", + "", + "- AST policy passed with Acorn before this preview was written.", + "- Execution will run in a Node Permission Model child process.", + "- The child process receives no target repo filesystem permission, no network permission, and no child-process permission.", + "- All git, agent, artifact, and report work must go through parent CWF JSON-RPC.", + `- inherit-session: ${metadata.capabilities.inherit_session_allowed ? "allowed for this generated-current-session script" : `rejected (${metadata.capabilities.inherit_session_reason})`}.`, + `- App-thread inherit-session status: \`${metadata.capabilities.app_thread_inherit_session_status}\`.`, + broadParent + ? "- Broad parent authority detected; this preview is non-skippable and should be compared with the declared task scope before approval." 
+ : "- Parent authority is not broad.", + "", + "## Stop Rules", + "", + ...stopRules.map((rule) => `- ${rule}`), + ].join("\n"); +} + +function inferPreviewAgents(metadata: DynamicWorkflowMetadata): DynamicIntentPreview["agents"] { + if (!metadata.capabilities.uses.includes("cwf.agent.run")) { + return [ + { + id: "none", + role: "no worker agents detected", + permissions: "read-only", + purpose: "This script does not appear to call cwf.agent.run.", + }, + ]; + } + return [ + { + id: "declared-at-runtime", + role: "dynamic worker", + permissions: metadata.capabilities.requested_permissions.includes("inherit-session") + ? "inherit-session" + : metadata.capabilities.requested_permissions.includes("safePatch") + ? "safePatch" + : "read-only", + purpose: "Worker prompts and ids are declared inside workflow.js and executed only after approval.", + }, + ]; +} + +function inferWriteIntent(metadata: DynamicWorkflowMetadata): string { + if (metadata.capabilities.requested_permissions.includes("inherit-session")) { + return "inherit-session requested: allowed only for generated-current-session scripts with matching SHA and parent-capped write permission."; + } + if (metadata.capabilities.requested_permissions.includes("safePatch")) { + return "safePatch requested: executable only when runtime write_policy exactly matches metadata.safe_patch_policy shown in this preview."; + } + return "read-only: workflow may write run artifacts through cwf.artifacts, but must not modify the target repo."; +} + +function renderSafePatchPolicy(metadata: DynamicWorkflowMetadata): string[] { + if (!metadata.capabilities.requested_permissions.includes("safePatch")) { + return []; + } + const policy = metadata.declared_safe_patch_policy; + if (!policy) { + return [ + "## SafePatch Policy", + "", + "- missing: dynamic safePatch execution will be rejected before apply.", + "", + ]; + } + return [ + "## SafePatch Policy", + "", + `- mode: \`${policy.mode}\``, + `- allowed_paths: 
${policy.allowed_paths.map((item) => `\`${item}\``).join(", ") || "(none)"}`, + `- forbidden_paths: ${policy.forbidden_paths.map((item) => `\`${item}\``).join(", ") || "(none)"}`, + `- verification_commands: ${policy.verification_commands.map((item) => `\`${item}\``).join(", ") || "(none)"}`, + "", + ]; +} + +async function writeDynamicExecutionArtifacts( + store: RunStore, + metadata: DynamicWorkflowMetadata, + events: unknown[], + workerResults: WorkerResult[], + finalResult: unknown, + beforeHash: string, + afterHash: string, +): Promise<void> { + await mkdir(join(store.runDir, "artifacts"), { recursive: true }); + await writeFile(join(store.runDir, "artifacts", "dynamic-events.jsonl"), events.map((event) => JSON.stringify(event)).join("\n") + (events.length > 0 ? "\n" : "")); + await writeJsonFile(join(store.runDir, "artifacts", "dynamic-final.json"), finalResult); + await writeFile( + join(store.runDir, "artifacts", "dynamic-summary.md"), + [ + "# Dynamic Workflow Summary", + "", + `SHA-256: \`${metadata.source_sha256}\``, + `Origin: \`${metadata.origin}\``, + `Workers: ${workerResults.length}`, + `Target diff changed: ${beforeHash !== afterHash ? "yes" : "no"}`, + "", + "## Workers", + "", + ...(workerResults.length > 0 ? workerResults.map((worker) => `- ${worker.worker_id}: ${worker.status}`) : ["- none"]), + ].join("\n"), + ); +} + +function renderDynamicResult(metadata: DynamicWorkflowMetadata, workerResults: WorkerResult[], finalResult: unknown, targetChanged: boolean): string { + return [ + "# Dynamic JS Workflow Result", + "", + `Origin: \`${metadata.origin}\``, + `SHA-256: \`${metadata.source_sha256}\``, + `Workers: ${workerResults.length}`, + `Target changed: ${targetChanged ? 
"yes" : "no"}`, + "", + "## Final", + "", + "```json", + JSON.stringify(finalResult, null, 2), + "```", + ].join("\n"); +} + +function buildDynamicArtifactManifest(store: RunStore, workerResults: WorkerResult[]) { + const artifacts: ArtifactRef[] = [ + { id: "workflow", type: "workflow", path: join(store.runDir, "workflow.json"), description: "Dynamic workflow wrapper spec snapshot." }, + { id: "state", type: "state", path: join(store.runDir, "state.json"), description: "Mutable run state and gate decisions." }, + { id: "events", type: "events", path: join(store.runDir, "events.jsonl"), description: "Append-only run and dynamic runtime event log." }, + { id: "context", type: "context", path: join(store.runDir, "context.json"), description: "Collected git diff context exposed through cwf.git." }, + { id: "dynamic-script", type: "generated", path: join(store.runDir, "artifacts", "workflow.js"), description: "Approved JavaScript workflow artifact copy." }, + { id: "dynamic-preview", type: "generated", path: join(store.runDir, "artifacts", "dynamic-preview.md"), description: "Non-skippable preview shown before execution approval." }, + { id: "dynamic-capabilities", type: "generated", path: join(store.runDir, "artifacts", "dynamic-capabilities.json"), description: "Detected runtime capability and permission profile metadata." }, + { id: "dynamic-budget", type: "generated", path: join(store.runDir, "artifacts", "dynamic-budget.json"), description: "Runtime budget fuses for the dynamic workflow." }, + { id: "dynamic-events", type: "generated", path: join(store.runDir, "artifacts", "dynamic-events.jsonl"), description: "Dynamic workflow runtime events emitted by the child process." }, + { id: "dynamic-final", type: "generated", path: join(store.runDir, "artifacts", "dynamic-final.json"), description: "Raw final value returned by the dynamic workflow." 
}, + { id: "dynamic-proposed-patch", type: "generated", path: join(store.runDir, "artifacts", "dynamic-proposed.patch"), description: "Dynamic safePatch patch proposal, present for safePatch runs." }, + { id: "dynamic-safe-patch", type: "generated", path: join(store.runDir, "artifacts", "dynamic-safe-patch.json"), description: "Dynamic safePatch policy, verification, and rollback result, present for safePatch runs." }, + { id: "result", type: "result", path: join(store.runDir, "result.md"), description: "Human-readable dynamic workflow result." }, + { id: "manifest", type: "manifest", path: join(store.runDir, "artifacts", "manifest.json"), description: "Artifact manifest for this dynamic run." }, + ...workerResults.map((worker) => ({ + id: `worker:${worker.worker_id}`, + type: "worker", + path: join(store.runDir, "workers", `${worker.worker_id}.json`), + description: `Dynamic worker result envelope for ${worker.worker_id}.`, + })), + ]; + return { + version: 1 as const, + run_id: store.runId, + workflow: "dynamic-js", + generated_at: new Date().toISOString(), + artifacts, + }; +} + +async function readDynamicMetadata(store: RunStore): Promise { + return JSON.parse(await readFile(join(store.runDir, "artifacts", "dynamic-workflow.json"), "utf8")) as DynamicWorkflowMetadata; +} + +async function appendDynamicEvent(store: RunStore, payload: unknown): Promise { + await store.appendEvent("dynamic.event", asRecord(payload, "dynamic event")); +} + +function requestedPermissions(source: string): DynamicPermissionProfile[] { + const profiles = new Set(["read-only"]); + if (source.includes("safePatch")) { + profiles.add("safePatch"); + } + if (source.includes("inherit-session")) { + profiles.add("inherit-session"); + } + return [...profiles]; +} + +function detectRuntimeUses(source: string): string[] { + return ["cwf.git", "cwf.agent.run", "cwf.safePatch", "cwf.map", "cwf.artifacts", "cwf.report"].filter((needle) => source.includes(needle)); +} + +function 
isAllowedMetadataExport(statement: AcornNode): boolean { + const declaration = statement.declaration; + if (!declaration || declaration.type !== "VariableDeclaration") { + return false; + } + return declaration.declarations?.every((item: AcornNode) => item.id?.name === "metadata") ?? false; +} + +function walkAst(node: AcornNode, parent: AcornNode | undefined): void { + if (node.type === "ImportDeclaration" || node.type === "ImportExpression") { + throw new Error("dynamic workflow cannot use imports or dynamic import"); + } + if (node.type === "Identifier" && FORBIDDEN_IDENTIFIERS.has(node.name ?? "")) { + throw new Error(`dynamic workflow cannot access forbidden identifier: ${node.name}`); + } + if (node.type === "MemberExpression") { + const propertyName = memberPropertyName(node); + const root = callRootName(node); + if (root && FORBIDDEN_IDENTIFIERS.has(root)) { + throw new Error(`dynamic workflow cannot access forbidden identifier: ${root}`); + } + if (propertyName && FORBIDDEN_MEMBER_NAMES.has(propertyName)) { + throw new Error(`dynamic workflow cannot access forbidden member: ${propertyName}`); + } + } + if (node.type === "CallExpression") { + const propertyName = node.callee?.type === "MemberExpression" ? memberPropertyName(node.callee) : undefined; + if (propertyName && FORBIDDEN_MEMBER_NAMES.has(propertyName)) { + throw new Error(`dynamic workflow cannot access forbidden member: ${propertyName}`); + } + assertAllowedCallExpression(node); + } + if (node.type === "NewExpression") { + throw new Error("dynamic workflow cannot construct arbitrary objects with new"); + } + if (node.type === "Literal" && typeof node.value === "string" && SHELL_STRING_PATTERNS.some((pattern) => pattern.test(node.value))) { + throw new Error("dynamic workflow cannot contain direct shell command strings"); + } + if (node.type === "TemplateElement") { + const cooked = typeof node.value?.cooked === "string" ? node.value.cooked : ""; + const raw = typeof node.value?.raw === "string" ? 
node.value.raw : ""; + if (SHELL_STRING_PATTERNS.some((pattern) => pattern.test(cooked) || pattern.test(raw))) { + throw new Error("dynamic workflow cannot contain direct shell command strings"); + } + } + for (const child of childNodes(node)) { + walkAst(child, parent ?? node); + } +} + +function assertAllowedCallExpression(node: AcornNode): void { + const forbiddenMember = findForbiddenMemberName(node.callee); + if (forbiddenMember) { + throw new Error(`dynamic workflow cannot access forbidden member: ${forbiddenMember}`); + } + const root = callRootName(node.callee); + if (!root || !ALLOWED_CALL_ROOTS.has(root)) { + throw new Error(`dynamic workflow call expressions must be rooted in cwf or allowed builtins; saw ${root ?? "unknown"}`); + } +} + +function findForbiddenMemberName(node: AcornNode | undefined): string | undefined { + if (!node) { + return undefined; + } + if (node.type === "MemberExpression") { + const propertyName = memberPropertyName(node); + if (propertyName && FORBIDDEN_MEMBER_NAMES.has(propertyName)) { + return propertyName; + } + } + for (const child of childNodes(node)) { + const found = findForbiddenMemberName(child); + if (found) { + return found; + } + } + return undefined; +} + +function callRootName(node: AcornNode | undefined): string | undefined { + if (!node) { + return undefined; + } + if (node.type === "Identifier") { + return node.name; + } + if (node.type === "MemberExpression" || node.type === "CallExpression") { + return callRootName(node.object ?? 
node.callee); + } + return undefined; +} + +function memberPropertyName(node: AcornNode): string | undefined { + const property = node.property; + if (!property) { + return undefined; + } + if (property.type === "Identifier") { + return property.name; + } + if (property.type === "Literal" && typeof property.value === "string") { + return property.value; + } + return undefined; +} + +function childNodes(node: AcornNode): AcornNode[] { + const children: AcornNode[] = []; + for (const [key, value] of Object.entries(node)) { + if (key === "parent") { + continue; + } + if (Array.isArray(value)) { + children.push(...value.filter(isAcornNode)); + } else if (isAcornNode(value)) { + children.push(value); + } + } + return children; +} + +function isAcornNode(value: unknown): value is AcornNode { + return Boolean(value && typeof value === "object" && typeof (value as AcornNode).type === "string"); +} + +function inferParentPermissionCap(): ParentPermissionCap { + return { + sandbox: expectParentSandbox(process.env.CWF_PARENT_SANDBOX ?? "unknown"), + approval_policy: expectParentApprovalPolicy(process.env.CWF_PARENT_APPROVAL_POLICY ?? 
"unknown"), + }; +} + +function expectParentSandbox(value: string): ParentPermissionCap["sandbox"] { + if (value === "read-only" || value === "workspace-write" || value === "danger-full-access" || value === "unknown") { + return value; + } + return "unknown"; +} + +function expectParentApprovalPolicy(value: string): ParentPermissionCap["approval_policy"] { + if (value === "never" || value === "on-request" || value === "on-failure" || value === "untrusted" || value === "unknown") { + return value; + } + return "unknown"; +} + +function expectPermissionProfile(value: unknown): DynamicPermissionProfile { + if (value === "read-only" || value === "safePatch" || value === "inherit-session") { + return value; + } + throw new Error(`agent.run.permissions must be read-only, safePatch, or inherit-session`); +} + +function expectString(value: unknown, path: string): string { + if (typeof value !== "string" || value.length === 0) { + throw new Error(`${path} must be a non-empty string`); + } + return value; +} + +function expectWritePolicy(value: unknown): WritePolicy { + const record = asRecord(value, "safePatch.write_policy"); + if (record.mode !== "patch") { + throw new Error("safePatch.write_policy.mode must be patch"); + } + const allowedPaths = expectStringArray(record.allowed_paths, "safePatch.write_policy.allowed_paths"); + const forbiddenPaths = expectStringArray(record.forbidden_paths, "safePatch.write_policy.forbidden_paths"); + const verificationCommands = expectStringArray(record.verification_commands ?? 
[], "safePatch.write_policy.verification_commands"); + return { + mode: "patch", + allowed_paths: allowedPaths, + forbidden_paths: forbiddenPaths, + verification_commands: verificationCommands, + }; +} + +function declaredSafePatchPolicyFromSource(source: string): WritePolicy | undefined { + const metadata = parseDeclaredMetadata(source); + if (!("safe_patch_policy" in metadata)) { + return undefined; + } + return expectWritePolicy(metadata.safe_patch_policy); +} + +function parseDeclaredMetadata(source: string): Record { + const match = /export\s+const\s+metadata\s*=\s*(\{[\s\S]*?\});/.exec(source); + if (!match) { + return {}; + } + try { + return JSON.parse(match[1]) as Record; + } catch { + return {}; + } +} + +function assertSafePatchPolicyMatchesDeclared(runtimePolicy: WritePolicy, declaredPolicy: WritePolicy | undefined): void { + if (!declaredPolicy) { + throw new Error("dynamic safePatch rejected: metadata.safe_patch_policy is missing"); + } + if (JSON.stringify(runtimePolicy) !== JSON.stringify(declaredPolicy)) { + throw new Error("dynamic safePatch rejected: runtime write_policy does not match metadata.safe_patch_policy"); + } +} + +function expectStringArray(value: unknown, path: string): string[] { + if (!Array.isArray(value) || value.some((item) => typeof item !== "string" || item.length === 0)) { + throw new Error(`${path} must be an array of non-empty strings`); + } + return value; +} + +function asRecord(value: unknown, path: string): Record { + if (!value || typeof value !== "object" || Array.isArray(value)) { + throw new Error(`${path} must be an object`); + } + return value as Record; +} + +function safeArtifactName(value: string): string { + if (!/^[a-zA-Z0-9._-]+$/.test(value)) { + throw new Error("artifact name must be a simple file name"); + } + return value; +} + +async function writeJsonFile(path: string, value: unknown): Promise { + await writeFile(path, `${JSON.stringify(value, null, 2)}\n`); +} + +function summarizeDynamicReport(params: 
unknown): string { + const value = asRecord(params, "report.summarize params"); + const results = Array.isArray(value.results) ? value.results : []; + return `Dynamic workflow completed ${results.length} result${results.length === 1 ? "" : "s"}.`; +} + +function sha256(source: string): string { + return createHash("sha256").update(source).digest("hex"); +} + +function sha256ForFileSync(path: string): string { + return createHash("sha256").update(readFileSync(path)).digest("hex"); +} + +function safePositiveNumber(value: unknown, fallback: number): number { + const parsed = Number(value); + return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback; +} + +type ChildRequest = + | { type: "event"; payload: Record } + | { type: "request"; id: number; method: string; params: unknown }; + +type AcornNode = { + type: string; + [key: string]: any; +}; + +const DYNAMIC_CHILD_SOURCE = String.raw` +import { createInterface } from "node:readline"; +import { pathToFileURL } from "node:url"; + +const target = process.env.CWF_TARGET; +if (!process.permission || typeof process.permission.has !== "function") { + throw new Error("dynamic-runtime-unavailable: Node Permission Model is not active"); +} +if (target && process.permission.has("fs.read", target)) { + throw new Error("dynamic-runtime-unavailable: child process unexpectedly has target repo read access"); +} + +let nextId = 1; +const pending = new Map(); +const parsedRuntimeMaxConcurrency = Number(process.env.CWF_MAX_CONCURRENCY); +const runtimeMaxConcurrency = Number.isFinite(parsedRuntimeMaxConcurrency) && parsedRuntimeMaxConcurrency > 0 ? 
parsedRuntimeMaxConcurrency : 1; +const rl = createInterface({ input: process.stdin }); +rl.on("line", (line) => { + const message = JSON.parse(line); + const entry = pending.get(message.id); + if (!entry) { + return; + } + pending.delete(message.id); + if (message.error) { + entry.reject(new Error(message.error)); + } else { + entry.resolve(message.result); + } +}); + +function request(method, params) { + const id = nextId++; + process.stdout.write(JSON.stringify({ type: "request", id, method, params }) + "\n"); + return new Promise((resolve, reject) => { + pending.set(id, { resolve, reject }); + }); +} + +function emit(payload) { + process.stdout.write(JSON.stringify({ type: "event", payload }) + "\n"); +} + +async function mapWithConcurrency(items, handler, options = {}) { + const requestedValue = Number(options.concurrency || 1); + const requested = Math.max(1, Number.isFinite(requestedValue) ? requestedValue : 1); + const optionMaxValue = Number(options.maxConcurrency); + const localMax = options.maxConcurrency === undefined || !Number.isFinite(optionMaxValue) + ? 
runtimeMaxConcurrency + : Math.min(optionMaxValue, runtimeMaxConcurrency); + const concurrency = Math.max(1, Math.min(requested, localMax, items.length || 1)); + if (requested > concurrency) { + emit({ type: "budget.concurrency_capped", requested, concurrency, max_concurrency: runtimeMaxConcurrency }); + } + const results = new Array(items.length); + let next = 0; + async function worker() { + while (next < items.length) { + const index = next++; + results[index] = await handler(items[index], index); + } + } + const workers = Array.from({ length: Math.min(concurrency, items.length) }, () => worker()); + await Promise.all(workers); + return results; +} + +const cwf = Object.freeze({ + git: Object.freeze({ + changedFiles: () => request("git.changedFiles", {}), + diff: () => request("git.diff", {}), + }), + agent: Object.freeze({ + run: (params) => request("agent.run", params), + }), + safePatch: Object.freeze({ + apply: (params) => request("safePatch.apply", params), + }), + artifacts: Object.freeze({ + write: (params) => request("artifacts.write", params), + }), + report: Object.freeze({ + summarize: (results) => request("report.summarize", { results }), + }), + map: (items, handler, options = {}) => mapWithConcurrency(items, handler, options), + event: (payload) => emit(payload), +}); + +const module = await import(pathToFileURL(process.argv[2]).href); +if (typeof module.default !== "function") { + throw new Error("dynamic workflow module must export a default function"); +} +const result = await module.default(cwf); +await request("report.final", result); +rl.close(); +process.exit(0); +`; diff --git a/src/run-store.ts b/src/run-store.ts index 8de355c..a6a0259 100644 --- a/src/run-store.ts +++ b/src/run-store.ts @@ -112,6 +112,20 @@ export class RunStore { await this.appendEvent("worker.updated", { worker: id, status, error }); } + async upsertWorker(id: string, status: WorkerStatus, error?: string): Promise { + await this.mutateState((state) => { + let worker = 
state.workers.find((item) => item.id === id); + if (!worker) { + worker = { id, status: "pending" }; + state.workers.push(worker); + } + applyStatusTimestamps(worker, status); + worker.status = status; + worker.error = error; + }); + await this.appendEvent("worker.updated", { worker: id, status, error }); + } + async waitAtGate(id: string, prompt: string): Promise { const state = await this.mutateState((draft) => { const phase = findPhase(draft, id); @@ -166,6 +180,7 @@ export class RunStore { async writeWorkerResult(result: WorkerResult): Promise { await this.writeJson(join("workers", `${result.worker_id}.json`), result); + await this.upsertWorker(result.worker_id, result.status, result.error); await this.appendEvent("worker.result", { worker: result.worker_id, status: result.status, diff --git a/src/types.ts b/src/types.ts index 48dceee..32467d9 100644 --- a/src/types.ts +++ b/src/types.ts @@ -274,6 +274,9 @@ export type WorkerRuntimeMetadata = { transcript_read: boolean; sandbox?: "read-only" | "workspace-write" | "danger-full-access"; approval_policy?: "never" | "on-request" | "on-failure" | "untrusted"; + model?: string; + model_provider?: string; + reasoning_effort?: string; worktree_path?: string; result_return_path?: "worker-envelope"; }; diff --git a/tests/cli-format.test.ts b/tests/cli-format.test.ts index ce3c5a4..a788530 100644 --- a/tests/cli-format.test.ts +++ b/tests/cli-format.test.ts @@ -11,12 +11,16 @@ describe("CLI output formatting", () => { expect(help).toContain("cwf workflows list"); expect(help).toContain("cwf workflows show "); expect(help).toContain("cwf run --target [--background]"); + expect(help).toContain("cwf dynamic list"); + expect(help).toContain("cwf dynamic save --id "); + expect(help).toContain("cwf dynamic run --target [--approve]"); expect(help).toContain("cwf desktop check"); expect(help).toContain("cwf desktop result [--thread ] [--new-thread] [--print]"); expect(help).toContain("cwf github-pr [--format comment|review] 
[--post --repo --pr ]"); expect(help).toContain('cwf suggest-workflow --goal "" [--target ] [--output ]'); expect(help).toContain("cwf suggest-workflow --from-run [--output ]"); expect(help).toContain("cwf run diff-review --target . --background"); + expect(help).toContain("cwf dynamic run change-summary --target ."); expect(help).toContain("cwf desktop result --print"); expect(help).toContain("cwf github-pr --format comment"); expect(help).toContain('cwf suggest-workflow --goal "Review docs changes" --target .'); diff --git a/tests/desktop-bridge.test.ts b/tests/desktop-bridge.test.ts index 5923dca..e24aac9 100644 --- a/tests/desktop-bridge.test.ts +++ b/tests/desktop-bridge.test.ts @@ -1,5 +1,5 @@ import { createHash } from "node:crypto"; -import { mkdtemp, readFile, rm } from "node:fs/promises"; +import { chmod, mkdtemp, readFile, rm, writeFile } from "node:fs/promises"; import { createServer, type Server, type Socket } from "node:net"; import { tmpdir } from "node:os"; import { join, resolve } from "node:path"; @@ -12,7 +12,9 @@ import { buildThreadReadRequest, buildThreadStartRequest, buildTurnStartRequest, + checkDesktopCapability, handleDesktopResult, + createStdioAppServerTransport, type AppServerTransport, } from "../src/desktop-bridge.js"; import { DEFAULT_FAILURE_POLICY } from "../src/run-index.js"; @@ -69,6 +71,23 @@ describe("desktop bridge", () => { }); }); + it("marks thread APIs unavailable when the schema lacks thread/read", async () => { + const root = await mkdtemp(join(tmpdir(), "cwf-capability-")); + cleanup.push(root); + const fakeCodex = join(root, "fake-codex"); + const fakeCli = join(root, "fake-codex.mjs"); + await writeFile(fakeCli, fakeCapabilityCodexScript(false)); + await writeFile(fakeCodex, `#!/bin/sh\nexec "${process.execPath}" "${fakeCli}" "$@"\n`); + await chmod(fakeCodex, 0o755); + + const capability = await checkDesktopCapability(fakeCodex); + + expect(capability.codex_cli_available).toBe(true); + 
expect(capability.schema_available).toBe(true); + expect(capability.required_methods["thread/read"]).toBe(false); + expect(capability.thread_apis_available).toBe(false); + }); + it("writes a local handoff prompt and records print metadata", async () => { const store = await createCompletedRun(); @@ -184,6 +203,76 @@ describe("desktop bridge", () => { } }); + it("posts through a spawned stdio app-server transport", async () => { + const root = await mkdtemp(join(tmpdir(), "cwf-stdio-")); + cleanup.push(root); + const fakeCodex = join(root, "fake-codex"); + const fakeServer = join(root, "fake-server.mjs"); + await writeFile(fakeServer, fakeStdioAppServerScript()); + await writeFile(fakeCodex, `#!/bin/sh\nexec "${process.execPath}" "${fakeServer}" "$@"\n`); + await chmod(fakeCodex, 0o755); + + const appServer = createStdioAppServerTransport(fakeCodex); + try { + await appServer.request("initialize", buildInitializeRequest().params); + await appServer.notify?.("initialized"); + const started = await appServer.request("thread/start", buildThreadStartRequest(createState()).params) as { thread?: { id?: string } }; + const turn = await appServer.request("turn/start", buildTurnStartRequest("thread_stdio", "hello", createState()).params) as { turn?: { id?: string } }; + + expect(started.thread?.id).toBe("thread_stdio"); + expect(turn.turn?.id).toBe("turn_stdio"); + } finally { + await appServer.close?.(); + } + }); + + it("escalates stdio app-server close when the child ignores SIGTERM", async () => { + const root = await mkdtemp(join(tmpdir(), "cwf-stdio-close-")); + cleanup.push(root); + const fakeCodex = join(root, "fake-codex"); + const fakeServer = join(root, "ignore-term-server.mjs"); + const pidFile = join(root, "pid.txt"); + await writeFile(fakeServer, fakeIgnoringStdioAppServerScript()); + await writeFile(fakeCodex, `#!/bin/sh\nexec "${process.execPath}" "${fakeServer}" "$@"\n`); + await chmod(fakeCodex, 0o755); + + const previousPidFile = 
process.env.CWF_FAKE_STDIO_PID_FILE; + process.env.CWF_FAKE_STDIO_PID_FILE = pidFile; + try { + const appServer = createStdioAppServerTransport(fakeCodex); + const pid = await waitForPidFile(pidFile); + expect(isProcessAlive(pid)).toBe(true); + + await appServer.close?.(); + await new Promise((resolve) => setTimeout(resolve, 100)); + + expect(isProcessAlive(pid)).toBe(false); + } finally { + if (previousPidFile === undefined) { + delete process.env.CWF_FAKE_STDIO_PID_FILE; + } else { + process.env.CWF_FAKE_STDIO_PID_FILE = previousPidFile; + } + } + }); + + it("fails stdio app-server requests when stdout exceeds the frame cap without a newline", async () => { + const root = await mkdtemp(join(tmpdir(), "cwf-stdio-buffer-")); + cleanup.push(root); + const fakeCodex = join(root, "fake-codex"); + const fakeServer = join(root, "no-newline-server.mjs"); + await writeFile(fakeServer, fakeOversizedStdoutAppServerScript()); + await writeFile(fakeCodex, `#!/bin/sh\nexec "${process.execPath}" "${fakeServer}" "$@"\n`); + await chmod(fakeCodex, 0o755); + + const appServer = createStdioAppServerTransport(fakeCodex); + try { + await expect(appServer.request("initialize", {})).rejects.toThrow("app-server stdio output exceeds"); + } finally { + await appServer.close?.(); + } + }); + it("keeps a posted result when thread/list misses the freshly created thread", async () => { const store = await createCompletedRun(); const appServer = new MockAppServer({ threadId: "thread_new", turnId: "turn_new", listedThreadId: "thread_other", readThreadId: "thread_other" }); @@ -351,6 +440,106 @@ function fakeResult(method: string): unknown { return {}; } +function fakeStdioAppServerScript(): string { + return `#!/usr/bin/env node +let buffer = ""; +process.stdin.setEncoding("utf8"); +process.stdin.on("data", (chunk) => { + buffer += chunk; + for (;;) { + const index = buffer.indexOf("\\n"); + if (index < 0) break; + const line = buffer.slice(0, index).trim(); + buffer = buffer.slice(index + 1); 
+ if (!line) continue; + const message = JSON.parse(line); + if (!message.id) continue; + let result = {}; + if (message.method === "thread/start") result = { thread: { id: "thread_stdio" } }; + if (message.method === "turn/start") result = { turn: { id: "turn_stdio" } }; + process.stdout.write(JSON.stringify({ id: message.id, result }) + "\\n"); + } +}); +`; +} + +function fakeCapabilityCodexScript(includeThreadRead: boolean): string { + const methods = [ + "initialize", + "thread/start", + "thread/name/set", + "thread/list", + ...(includeThreadRead ? ["thread/read"] : []), + "turn/start", + ]; + return `#!/usr/bin/env node +import { mkdirSync, writeFileSync } from "node:fs"; +import { join } from "node:path"; + +const args = process.argv.slice(2); +if (args[0] === "--version") { + console.log("codex-cli test"); + process.exit(0); +} +if (args[0] === "app-server" && args[1] === "generate-json-schema") { + const out = args[args.indexOf("--out") + 1]; + mkdirSync(out, { recursive: true }); + writeFileSync(join(out, "ClientRequest.json"), JSON.stringify({ oneOf: ${JSON.stringify(methods.map((method) => ({ properties: { method: { const: method } } })))} })); + process.exit(0); +} +if (args[0] === "app-server" && args[1] === "daemon" && args[2] === "version") { + process.exit(1); +} +process.exit(1); +`; +} + +function fakeIgnoringStdioAppServerScript(): string { + return `#!/usr/bin/env node +import { writeFileSync } from "node:fs"; + +if (process.env.CWF_FAKE_STDIO_PID_FILE) { + writeFileSync(process.env.CWF_FAKE_STDIO_PID_FILE, String(process.pid)); +} +process.on("SIGTERM", () => {}); +process.stdin.resume(); +setInterval(() => {}, 1000); +`; +} + +function fakeOversizedStdoutAppServerScript(): string { + return `#!/usr/bin/env node +process.stdin.resume(); +process.stdout.write("x".repeat(5 * 1024 * 1024 + 1)); +setInterval(() => {}, 1000); +`; +} + +async function waitForPidFile(path: string): Promise { + const deadline = Date.now() + 1000; + while (Date.now() < 
deadline) { + try { + const pid = Number((await readFile(path, "utf8")).trim()); + if (Number.isInteger(pid) && pid > 0) { + return pid; + } + } catch { + // Keep polling until the child writes its pid. + } + await new Promise((resolve) => setTimeout(resolve, 25)); + } + throw new Error(`Timed out waiting for pid file: ${path}`); +} + +function isProcessAlive(pid: number): boolean { + try { + process.kill(pid, 0); + return true; + } catch { + return false; + } +} + function writeFakeFrame(socket: Socket, opcode: number, payload: Buffer): void { let header: Buffer; if (payload.length < 126) { @@ -455,6 +644,7 @@ function availableCapability(): DesktopCapabilitySummary { "thread/start": true, "thread/name/set": true, "thread/list": true, + "thread/read": true, "turn/start": true, }, thread_apis_available: true, diff --git a/tests/dynamic-workflow-registry.test.ts b/tests/dynamic-workflow-registry.test.ts new file mode 100644 index 0000000..5ee5005 --- /dev/null +++ b/tests/dynamic-workflow-registry.test.ts @@ -0,0 +1,115 @@ +import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises"; +import { tmpdir } from "node:os"; +import { dirname, join } from "node:path"; +import { afterEach, describe, expect, it } from "vitest"; +import { + formatDynamicWorkflowList, + formatDynamicWorkflowShow, + listDynamicWorkflowEntries, + resolveDynamicWorkflowReference, + saveDynamicWorkflow, +} from "../src/dynamic-workflow-registry.js"; + +const cleanup: string[] = []; + +afterEach(async () => { + while (cleanup.length > 0) { + const path = cleanup.pop(); + if (path) { + await rm(path, { recursive: true, force: true }); + } + } +}); + +describe("dynamic workflow registry", () => { + it("discovers packaged dynamic templates with declared metadata", async () => { + const { cwd, homeDir } = await createRoot(); + const path = join(cwd, "workflows", "dynamic", "local-review.workflow.js"); + await writeDynamicWorkflow(path, dynamicWorkflowSource({ id: "local-review", title: 
"Local Review" })); + + const entries = await listDynamicWorkflowEntries({ cwd, homeDir }); + const list = formatDynamicWorkflowList(entries); + const show = formatDynamicWorkflowShow(entries[0]); + + expect(entries).toEqual([ + expect.objectContaining({ + id: "local-review", + title: "Local Review", + trust_state: "packaged", + origin: "packaged", + }), + ]); + expect(list).toContain("local-review"); + expect(show).toContain("Trust: packaged"); + }); + + it("saves dynamic workflows with a local trust record and resolves by id", async () => { + const { cwd, homeDir } = await createRoot(); + const sourcePath = join(cwd, "draft.workflow.js"); + await writeDynamicWorkflow(sourcePath, dynamicWorkflowSource({ id: "draft-review", title: "Draft Review" })); + + const saved = await saveDynamicWorkflow({ + sourcePath, + id: "trusted-review", + cwd, + homeDir, + now: new Date("2026-06-07T00:00:00.000Z"), + }); + const trust = JSON.parse(await readFile(saved.path.replace(/\.workflow\.js$/, ".trust.json"), "utf8")) as { source_sha256: string }; + const resolved = await resolveDynamicWorkflowReference("trusted-review", { cwd, homeDir }); + + expect(saved).toEqual(expect.objectContaining({ id: "trusted-review", trust_state: "local-trust-record" })); + expect(trust.source_sha256).toBe(saved.source_sha256); + expect(resolved.path).toBe(saved.path); + expect(resolved.origin).toBe("local-trust-record"); + }); + + it("does not run untrusted local dynamic workflows by id", async () => { + const { cwd, homeDir } = await createRoot(); + const path = join(homeDir, ".codex-workflows", "dynamic", "loose.workflow.js"); + await writeDynamicWorkflow(path, dynamicWorkflowSource({ id: "loose", title: "Loose" })); + + await expect(resolveDynamicWorkflowReference("loose", { cwd, homeDir })).rejects.toThrow("untrusted-local"); + const explicit = await resolveDynamicWorkflowReference(path, { cwd, homeDir }); + expect(explicit.path).toBe(path); + }); + + it("rejects trust metadata SHA mismatches", 
async () => { + const { cwd, homeDir } = await createRoot(); + const sourcePath = join(cwd, "draft.workflow.js"); + await writeDynamicWorkflow(sourcePath, dynamicWorkflowSource({ id: "draft-review", title: "Draft Review" })); + const saved = await saveDynamicWorkflow({ sourcePath, id: "trusted-review", cwd, homeDir }); + await writeFile(saved.path, dynamicWorkflowSource({ id: "trusted-review", title: "Tampered" })); + + await expect(listDynamicWorkflowEntries({ cwd, homeDir })).rejects.toThrow("SHA mismatch"); + }); + + it("does not run remote dynamic workflow URLs directly", async () => { + await expect(resolveDynamicWorkflowReference("https://example.com/workflow.js")).rejects.toThrow("cannot run directly by URL"); + }); +}); + +async function createRoot(): Promise<{ cwd: string; homeDir: string }> { + const root = await mkdtemp(join(tmpdir(), "cwf-dynamic-registry-")); + cleanup.push(root); + return { cwd: join(root, "project"), homeDir: join(root, "home") }; +} + +async function writeDynamicWorkflow(path: string, value: string): Promise { + await mkdir(dirname(path), { recursive: true }); + await writeFile(path, value); +} + +function dynamicWorkflowSource(options: { id: string; title: string }): string { + return `export const metadata = { + "id": "${options.id}", + "title": "${options.title}", + "version": "1.0.0", + "permissions": ["read-only"] +}; + +export default async function workflow(cwf) { + return cwf.report.summarize([]); +} +`; +} diff --git a/tests/dynamic-workflow.test.ts b/tests/dynamic-workflow.test.ts new file mode 100644 index 0000000..a4d70e5 --- /dev/null +++ b/tests/dynamic-workflow.test.ts @@ -0,0 +1,558 @@ +import { execFile } from "node:child_process"; +import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises"; +import { tmpdir } from "node:os"; +import { basename, join } from "node:path"; +import { setTimeout as sleep } from "node:timers/promises"; +import { promisify } from "node:util"; +import { afterEach, describe, 
expect, it } from "vitest"; +import { + resumeDynamicWorkflow, + startDynamicWorkflow, + validateDynamicWorkflowSource, + type DynamicWorkerRunner, +} from "../src/dynamic-workflow.js"; +import { generateDynamicWorkflowFromIntent } from "../src/dynamic-workflow-generator.js"; +import type { ArtifactManifest, WorkerResult } from "../src/types.js"; + +const execFileAsync = promisify(execFile); +const cleanup: string[] = []; + +afterEach(async () => { + while (cleanup.length > 0) { + const path = cleanup.pop(); + if (path) { + await rm(path, { recursive: true, force: true }); + } + } +}); + +describe("dynamic workflow AST policy", () => { + it.each([ + ["import fs from 'node:fs'; export default async function workflow(cwf) { return cwf.report.summarize([]); }", "imports"], + ["export default async function workflow(cwf) { return process.env.HOME; }", "process"], + ["export default async function workflow(cwf) { return globalThis.process; }", "globalThis"], + ["export default async function workflow(cwf) { return (() => {}).constructor('return process')(); }", "constructor"], + ["export default async function workflow(cwf) { return fetch('https://example.com'); }", "fetch"], + ["export default async function workflow(cwf) { setTimeout(() => cwf.report.summarize([]), 0); }", "setTimeout"], + ["export default async function workflow(cwf) { queueMicrotask(() => cwf.report.summarize([])); }", "queueMicrotask"], + ["export default async function workflow(cwf) { return 'rm -rf /tmp/example'; }", "shell"], + ["export default async function workflow(cwf) { return `curl https://example.com`; }", "shell"], + ])("rejects forbidden source: %s", (source, expected) => { + expect(() => validateDynamicWorkflowSource(source)).toThrow(expected); + }); +}); + +describe("dynamic workflow runtime", () => { + it("generates a workflow from intent, writes preview metadata, and pauses before execution", async () => { + const target = await createGitRepoWithDiff(); + const output = join(await 
tempDir("cwf-generated-dynamic-"), "workflow.js"); + const generated = await generateDynamicWorkflowFromIntent({ + goal: "Audit this repo for auth risks and report only verified findings.", + output, + }); + + validateDynamicWorkflowSource(generated.source); + const store = await startDynamicWorkflow({ + scriptPath: generated.path, + target, + runsRoot: await runsRoot(), + origin: "generated-current-session", + parentPermissionCap: { sandbox: "read-only", approval_policy: "never" }, + preview: generated.preview, + }); + const state = await store.readState(); + const preview = await readFile(join(store.runDir, "artifacts", "dynamic-preview.md"), "utf8"); + const scriptCopy = await readFile(join(store.runDir, "artifacts", "workflow.js"), "utf8"); + + expect(state.status).toBe("waiting"); + expect(state.phases.find((phase) => phase.id === "approve-dynamic")?.status).toBe("waiting"); + expect(state.phases.find((phase) => phase.id === "dynamic-execute")?.status).toBe("pending"); + expect(preview).toContain("Audit this repo for auth risks"); + expect(preview).toContain("## Planned Agents"); + expect(preview).toContain("intent-review"); + expect(preview).toContain("## Write Intent"); + expect(preview).toContain("read-only"); + expect(preview).toContain("## Stop Rules"); + expect(scriptCopy).toContain("String.fromCodePoint"); + }); + + it("keeps shell-like goal text out of direct string literals", async () => { + const generated = await generateDynamicWorkflowFromIntent({ + goal: "Audit bash scripts and curl usage without running commands.", + output: join(await tempDir("cwf-generated-shell-goal-"), "workflow.js"), + }); + + expect(() => validateDynamicWorkflowSource(generated.source)).not.toThrow(); + expect(generated.source).not.toContain("bash scripts"); + expect(generated.source).not.toContain("curl usage"); + }); + + it("rejects oversized generation goals before writing output", async () => { + await expect(generateDynamicWorkflowFromIntent({ + goal: "x".repeat(2001), 
+ output: join(await tempDir("cwf-generated-long-goal-"), "workflow.js"), + })).rejects.toThrow("2000 characters"); + }); + + it("fails closed instead of overwriting an existing generated workflow file", async () => { + const output = join(await tempDir("cwf-generated-collision-"), "workflow.js"); + await writeFile(output, "existing\n"); + + await expect(generateDynamicWorkflowFromIntent({ + goal: "Review current diff.", + output, + })).rejects.toThrow(/EEXIST|file already exists/i); + await expect(readFile(output, "utf8")).resolves.toBe("existing\n"); + }); + + it("uses a safe fallback slug for punctuation-only goals", async () => { + const generated = await generateDynamicWorkflowFromIntent({ + goal: "!!!", + suggestionsRoot: await tempDir("cwf-generated-slug-"), + now: new Date("2026-06-06T00:00:00.000Z"), + }); + + expect(basename(generated.path)).toBe("20260606000000-workflow.workflow.js"); + }); + + it("writes preview artifacts and pauses before starting agents", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export default async function workflow(cwf) { + const files = await cwf.git.changedFiles(); + return cwf.report.summarize(files); +} +`); + + const store = await startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() }); + const state = await store.readState(); + const preview = await readFile(join(store.runDir, "artifacts", "dynamic-preview.md"), "utf8"); + const scriptCopy = await readFile(join(store.runDir, "artifacts", "workflow.js"), "utf8"); + + expect(state.status).toBe("waiting"); + expect(state.phases.map((phase) => [phase.id, phase.status])).toEqual([ + ["collect", "completed"], + ["dynamic-preview", "completed"], + ["approve-dynamic", "waiting"], + ["dynamic-execute", "pending"], + ]); + expect(preview).toContain("Node Permission Model child process"); + expect(preview).toContain("SHA-256"); + expect(scriptCopy).toContain("cwf.git.changedFiles"); + }); + + it("executes 
an approved read-only workflow through cwf APIs", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export default async function workflow(cwf) { + const files = await cwf.git.changedFiles(); + const reviews = await cwf.map(files, async (file) => { + return cwf.agent.run({ + id: "review", + role: "reviewer", + prompt: "Review " + file, + permissions: "read-only" + }); + }, { concurrency: 2 }); + await cwf.artifacts.write({ name: "note.md", content: "dynamic artifact ok\\n" }); + return cwf.report.summarize(reviews); +} +`); + const store = await startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() }); + await store.approveGate("approve-dynamic"); + await resumeDynamicWorkflow({ store, workerRunner: fixtureWorker }); + + const state = await store.readState(); + const result = await store.readResult(); + const dynamicArtifact = await readFile(join(store.runDir, "artifacts", "dynamic-note.md"), "utf8"); + const worker = await readFile(join(store.runDir, "workers", "review.json"), "utf8"); + const manifest = JSON.parse(await readFile(join(store.runDir, "artifacts", "manifest.json"), "utf8")) as ArtifactManifest; + + expect(state.status).toBe("completed"); + expect(state.workers).toEqual(expect.arrayContaining([expect.objectContaining({ id: "review", status: "completed" })])); + expect(result).toContain("Dynamic JS Workflow Result"); + expect(dynamicArtifact).toBe("dynamic artifact ok\n"); + expect(worker).toContain('"sandbox": "read-only"'); + expect(manifest.artifacts.map((artifact) => artifact.id)).toEqual(expect.arrayContaining(["dynamic-script", "dynamic-preview", "dynamic-events", "worker:review"])); + }); + + it("rejects read-only dynamic agents that mutate the target", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export default async function workflow(cwf) { + return cwf.agent.run({ id: "mutator", role: "mutator", prompt: "mutate", 
permissions: "read-only" }); +} +`); + const store = await startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() }); + await store.approveGate("approve-dynamic"); + + await expect(resumeDynamicWorkflow({ store, workerRunner: mutatingWorker(target) })).rejects.toThrow("read-only-worker-violation"); + const state = await store.readState(); + expect(state.status).toBe("failed"); + }); + + it("rejects inherit-session when origin or parent cap is not trusted", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export default async function workflow(cwf) { + return cwf.agent.run({ id: "writer", role: "writer", prompt: "write", permissions: "inherit-session" }); +} +`); + const store = await startDynamicWorkflow({ + scriptPath: script, + target, + runsRoot: await runsRoot(), + origin: "copied-local", + parentPermissionCap: { sandbox: "workspace-write", approval_policy: "never" }, + }); + await store.approveGate("approve-dynamic"); + + await expect(resumeDynamicWorkflow({ store, workerRunner: fixtureWorker })).rejects.toThrow("inherit-session rejected"); + }); + + it("allows generated-current-session inherit-session under a write-capable parent cap", async () => { + const target = await createGitRepoWithDiff(); + let capturedSandbox: string | undefined; + let capturedApproval: string | undefined; + const capturingWorker: DynamicWorkerRunner = async (worker, _context, options) => { + capturedSandbox = options.sandboxMode; + capturedApproval = options.approvalPolicy; + return workerResult(worker.id, options.target); + }; + const script = await writeScript(` +export default async function workflow(cwf) { + return cwf.agent.run({ id: "writer", role: "writer", prompt: "write", permissions: "inherit-session" }); +} +`); + const store = await startDynamicWorkflow({ + scriptPath: script, + target, + runsRoot: await runsRoot(), + origin: "generated-current-session", + parentPermissionCap: { sandbox: 
"workspace-write", approval_policy: "never" }, + }); + await store.approveGate("approve-dynamic"); + await resumeDynamicWorkflow({ store, workerRunner: capturingWorker }); + + const worker = await readFile(join(store.runDir, "workers", "writer.json"), "utf8"); + const preview = await readFile(join(store.runDir, "artifacts", "dynamic-preview.md"), "utf8"); + + expect(capturedSandbox).toBe("workspace-write"); + expect(capturedApproval).toBe("never"); + expect(worker).toContain('"sandbox": "workspace-write"'); + expect(preview).toContain("Broad parent authority detected"); + }); + + it("applies dynamic safePatch through write policy and records artifacts", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export const metadata = { + "safe_patch_policy": { + "mode": "patch", + "allowed_paths": ["src/generated/**"], + "forbidden_paths": [".env", ".git", ".git/**"], + "verification_commands": ["test -f src/generated/value.js"] + } +}; + +export default async function workflow(cwf) { + return cwf.safePatch.apply({ + patch: "diff --git a/src/generated/value.js b/src/generated/value.js\\nnew file mode 100644\\nindex 0000000..42d3b06\\n--- /dev/null\\n+++ b/src/generated/value.js\\n@@ -0,0 +1 @@\\n+export const value = 42;\\n", + write_policy: { + mode: "patch", + allowed_paths: ["src/generated/**"], + forbidden_paths: [".env", ".git", ".git/**"], + verification_commands: ["test -f src/generated/value.js"] + } + }); +} +`); + const store = await startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() }); + await store.approveGate("approve-dynamic"); + await resumeDynamicWorkflow({ store }); + + const state = await store.readState(); + const generated = await readFile(join(target, "src", "generated", "value.js"), "utf8"); + const safePatch = await readFile(join(store.runDir, "artifacts", "dynamic-safe-patch.json"), "utf8"); + const result = await store.readResult(); + const manifest = JSON.parse(await 
readFile(join(store.runDir, "artifacts", "manifest.json"), "utf8")) as ArtifactManifest; + + expect(state.status).toBe("completed"); + expect(generated).toBe("export const value = 42;\n"); + expect(safePatch).toContain('"status": "passed"'); + expect(result).toContain("Target changed: yes"); + expect(manifest.artifacts.map((artifact) => artifact.id)).toEqual(expect.arrayContaining(["dynamic-proposed-patch", "dynamic-safe-patch"])); + }); + + it("rejects dynamic safePatch outside allowed paths and leaves target unchanged", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export const metadata = { + "safe_patch_policy": { + "mode": "patch", + "allowed_paths": ["src/generated/**"], + "forbidden_paths": [".env", ".git", ".git/**"], + "verification_commands": [] + } +}; + +export default async function workflow(cwf) { + return cwf.safePatch.apply({ + patch: "diff --git a/.env b/.env\\nnew file mode 100644\\nindex 0000000..f6420e8\\n--- /dev/null\\n+++ b/.env\\n@@ -0,0 +1 @@\\n+SECRET=value\\n", + write_policy: { + mode: "patch", + allowed_paths: ["src/generated/**"], + forbidden_paths: [".env", ".git", ".git/**"], + verification_commands: [] + } + }); +} +`); + const store = await startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() }); + await store.approveGate("approve-dynamic"); + + await expect(resumeDynamicWorkflow({ store })).rejects.toThrow("outside allowed_paths"); + const state = await store.readState(); + const safePatch = await readFile(join(store.runDir, "artifacts", "dynamic-safe-patch.json"), "utf8"); + + await expect(readFile(join(target, ".env"), "utf8")).rejects.toThrow(); + expect(state.status).toBe("failed"); + expect(safePatch).toContain('"status": "failed"'); + }); + + it("rejects dynamic safePatch when its write policy was not declared in metadata", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export default async 
function workflow(cwf) { + return cwf.safePatch.apply({ + patch: "diff --git a/src/generated/value.js b/src/generated/value.js\\nnew file mode 100644\\nindex 0000000..42d3b06\\n--- /dev/null\\n+++ b/src/generated/value.js\\n@@ -0,0 +1 @@\\n+export const value = 42;\\n", + write_policy: { + mode: "patch", + allowed_paths: ["src/generated/**"], + forbidden_paths: [".env", ".git", ".git/**"], + verification_commands: [] + } + }); +} +`); + + await expect(startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() })).rejects.toThrow("metadata.safe_patch_policy"); + }); + + it("rejects dynamic safePatch when runtime policy widens the previewed metadata policy", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export const metadata = { + "safe_patch_policy": { + "mode": "patch", + "allowed_paths": ["src/generated/**"], + "forbidden_paths": [".env", ".git", ".git/**"], + "verification_commands": [] + } +}; + +export default async function workflow(cwf) { + return cwf.safePatch.apply({ + patch: "diff --git a/src/generated/value.js b/src/generated/value.js\\nnew file mode 100644\\nindex 0000000..42d3b06\\n--- /dev/null\\n+++ b/src/generated/value.js\\n@@ -0,0 +1 @@\\n+export const value = 42;\\n", + write_policy: { + mode: "patch", + allowed_paths: ["**"], + forbidden_paths: [".env", ".git", ".git/**"], + verification_commands: [] + } + }); +} +`); + const store = await startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() }); + await store.approveGate("approve-dynamic"); + + await expect(resumeDynamicWorkflow({ store })).rejects.toThrow("does not match metadata.safe_patch_policy"); + await expect(readFile(join(target, "src", "generated", "value.js"), "utf8")).rejects.toThrow(); + }); + + it("rolls back dynamic safePatch when verification fails and records rollback evidence", async () => { + const target = await createGitRepoWithDiff(); + const script = await writeScript(` +export 
const metadata = { + "safe_patch_policy": { + "mode": "patch", + "allowed_paths": ["src/generated/**"], + "forbidden_paths": [".env", ".git", ".git/**"], + "verification_commands": ["test -f src/generated/missing.js"] + } +}; + +export default async function workflow(cwf) { + return cwf.safePatch.apply({ + patch: "diff --git a/src/generated/value.js b/src/generated/value.js\\nnew file mode 100644\\nindex 0000000..42d3b06\\n--- /dev/null\\n+++ b/src/generated/value.js\\n@@ -0,0 +1 @@\\n+export const value = 42;\\n", + write_policy: { + mode: "patch", + allowed_paths: ["src/generated/**"], + forbidden_paths: [".env", ".git", ".git/**"], + verification_commands: ["test -f src/generated/missing.js"] + } + }); +} +`); + const store = await startDynamicWorkflow({ scriptPath: script, target, runsRoot: await runsRoot() }); + await store.approveGate("approve-dynamic"); + + await expect(resumeDynamicWorkflow({ store })).rejects.toThrow("safePatch verification failed"); + const state = await store.readState(); + const safePatch = await readFile(join(store.runDir, "artifacts", "dynamic-safe-patch.json"), "utf8"); + + await expect(readFile(join(target, "src", "generated", "value.js"), "utf8")).rejects.toThrow(); + expect(state.status).toBe("failed"); + expect(safePatch).toContain('"status": "failed"'); + expect(safePatch).toContain('"rollback"'); + expect(safePatch).toContain('"status": "passed"'); + }); + + it("caps dynamic map concurrency to the workflow budget", async () => { + const target = await createGitRepoWithDiff(); + let active = 0; + let maxActive = 0; + const delayedWorker: DynamicWorkerRunner = async (worker, _context, options) => { + active += 1; + maxActive = Math.max(maxActive, active); + await sleep(25); + active -= 1; + return workerResult(worker.id, options.target); + }; + const script = await writeScript(` +export default async function workflow(cwf) { + const items = [0, 1, 2, 3, 4]; + const reviews = await cwf.map(items, async (item) => { + return 
cwf.agent.run({ + id: "parallel-" + item, + role: "reviewer", + prompt: "Review item " + item, + permissions: "read-only" + }); + }, { concurrency: 99 }); + return cwf.report.summarize(reviews); +} +`); + const store = await startDynamicWorkflow({ + scriptPath: script, + target, + runsRoot: await runsRoot(), + budget: { max_concurrency: 2 }, + }); + await store.approveGate("approve-dynamic"); + await resumeDynamicWorkflow({ store, workerRunner: delayedWorker }); + + const state = await store.readState(); + const events = await readFile(join(store.runDir, "artifacts", "dynamic-events.jsonl"), "utf8"); + + expect(maxActive).toBeLessThanOrEqual(2); + expect(state.workers.filter((worker) => worker.status === "completed")).toHaveLength(5); + expect(events).toContain("budget.concurrency_capped"); + }); + + it("uses the default dynamic concurrency budget when none is provided", async () => { + const target = await createGitRepoWithDiff(); + let active = 0; + let maxActive = 0; + const delayedWorker: DynamicWorkerRunner = async (worker, _context, options) => { + active += 1; + maxActive = Math.max(maxActive, active); + await sleep(25); + active -= 1; + return workerResult(worker.id, options.target); + }; + const script = await writeScript(` +export default async function workflow(cwf) { + const items = [0, 1, 2, 3, 4]; + const reviews = await cwf.map(items, async (item) => { + return cwf.agent.run({ + id: "default-budget-" + item, + role: "reviewer", + prompt: "Review item " + item, + permissions: "read-only" + }); + }, { concurrency: 99 }); + return cwf.report.summarize(reviews); +} +`); + const store = await startDynamicWorkflow({ + scriptPath: script, + target, + runsRoot: await runsRoot(), + }); + await store.approveGate("approve-dynamic"); + await resumeDynamicWorkflow({ store, workerRunner: delayedWorker }); + + expect(maxActive).toBeGreaterThan(1); + expect(maxActive).toBeLessThanOrEqual(4); + }); +}); + +async function createGitRepoWithDiff(): Promise<string> { + const 
target = await mkdtemp(join(tmpdir(), "cwf-dynamic-target-")); + cleanup.push(target); + await mkdir(join(target, "src"), { recursive: true }); + await writeFile(join(target, "package.json"), `${JSON.stringify({ name: "fixture", version: "0.0.0" }, null, 2)}\n`); + await writeFile(join(target, "src", "calc.js"), "export const answer = 42;\n"); + await git(target, ["init"]); + await git(target, ["config", "user.email", "codex-workflows@example.invalid"]); + await git(target, ["config", "user.name", "codex-workflows"]); + await git(target, ["add", "."]); + await git(target, ["commit", "-m", "baseline"]); + await writeFile(join(target, "src", "calc.js"), "export const answer = 0;\n"); + return target; +} + +async function writeScript(source: string): Promise<string> { + const dir = await mkdtemp(join(tmpdir(), "cwf-dynamic-script-")); + cleanup.push(dir); + const path = join(dir, "workflow.js"); + await writeFile(path, source.trimStart()); + return path; +} + +async function runsRoot(): Promise<string> { + const tmpRoot = join(process.cwd(), ".tmp"); + await mkdir(tmpRoot, { recursive: true }); + const root = await mkdtemp(join(tmpRoot, "cwf-dynamic-runs-")); + cleanup.push(root); + return root; +} + +async function tempDir(prefix: string): Promise<string> { + const dir = await mkdtemp(join(tmpdir(), prefix)); + cleanup.push(dir); + return dir; +} + +async function git(cwd: string, args: string[]): Promise<void> { + await execFileAsync("git", args, { cwd }); +} + +const fixtureWorker: DynamicWorkerRunner = async (worker, _context, options) => workerResult(worker.id, options.target); + +function mutatingWorker(target: string): DynamicWorkerRunner { + return async (worker, _context, options) => { + await writeFile(join(target, "src", "calc.js"), "export const answer = 100;\n"); + return workerResult(worker.id, options.target); + }; +} + +function workerResult(workerId: string, target: string): WorkerResult { + return { + worker_id: workerId, + status: "completed", + confidence: "high", + summary: 
`reviewed ${target}`, + findings: [], + verification: ["fixture worker"], + artifacts: [], + started_at: "2026-01-01T00:00:00.000Z", + completed_at: "2026-01-01T00:00:01.000Z", + duration_ms: 1000, + prompt: `mock ${workerId}`, + raw: "{}", + raw_fallback: false, + retry_count: 0, + }; +} diff --git a/tests/run-store.test.ts b/tests/run-store.test.ts index 2d65e6d..b71e9ef 100644 --- a/tests/run-store.test.ts +++ b/tests/run-store.test.ts @@ -3,7 +3,7 @@ import { tmpdir } from "node:os"; import { join } from "node:path"; import { afterEach, describe, expect, it } from "vitest"; import { RunStore } from "../src/run-store.js"; -import type { WorkflowSpec } from "../src/types.js"; +import type { WorkerResult, WorkflowSpec } from "../src/types.js"; const cleanup: string[] = []; @@ -98,4 +98,48 @@ describe("RunStore", () => { ["safety", "running"], ]); }); + + it("preserves concurrent dynamic worker result upserts", async () => { + const root = await mkdtemp(join(tmpdir(), "cwf-runs-")); + cleanup.push(root); + const dynamicSpec: WorkflowSpec = { + ...spec, + id: "dynamic-js", + phases: [{ id: "collect", kind: "command" }], + }; + const store = await RunStore.create(dynamicSpec, process.cwd(), root); + + await Promise.all( + Array.from({ length: 5 }, (_, index) => store.writeWorkerResult(workerResult(`dynamic-${index}`))), + ); + const state = await store.readState(); + + expect(state.workers.map((worker) => worker.id).sort()).toEqual([ + "dynamic-0", + "dynamic-1", + "dynamic-2", + "dynamic-3", + "dynamic-4", + ]); + expect(state.workers.every((worker) => worker.status === "completed")).toBe(true); + }); }); + +function workerResult(workerId: string): WorkerResult { + return { + worker_id: workerId, + status: "completed", + confidence: "high", + summary: "ok", + findings: [], + verification: [], + artifacts: [], + started_at: "2026-01-01T00:00:00.000Z", + completed_at: "2026-01-01T00:00:01.000Z", + duration_ms: 1000, + prompt: "test", + raw: "{}", + raw_fallback: 
false, + retry_count: 0, + }; +} diff --git a/tests/worker-adapter.test.ts b/tests/worker-adapter.test.ts index a02fe8e..14871b3 100644 --- a/tests/worker-adapter.test.ts +++ b/tests/worker-adapter.test.ts @@ -1,3 +1,8 @@ +import { createHash } from "node:crypto"; +import { chmod, mkdir, mkdtemp, readFile, rm, symlink, writeFile } from "node:fs/promises"; +import { createServer, type Server, type Socket } from "node:net"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; import { describe, expect, it, vi } from "vitest"; import { WorkerAdapterUnavailableError, @@ -100,6 +105,146 @@ describe("worker adapters", () => { expect(result.runtime?.transcript_read).toBe(false); }); + it("uses the spawned stdio app-server transport by default", async () => { + const dir = await mkdtemp(join(tmpdir(), "cwf-worker-stdio-")); + const fakeCodex = join(dir, "fake-codex"); + const fakeServer = join(dir, "fake-app-server.mjs"); + const methodLog = join(dir, "methods.log"); + await writeFile(fakeServer, fakeStdioWorkerAppServerScript()); + await writeFile(fakeCodex, `#!/bin/sh\nexec "${process.execPath}" "${fakeServer}" "$@"\n`); + await chmod(fakeCodex, 0o755); + + try { + await withEnv( + { + CWF_FAKE_STDIO_LOG: methodLog, + CWF_APP_THREAD_TRANSPORT: undefined, + }, + async () => { + const result = await runWorkerWithAdapter(worker, context, { + target: "/repo", + timeoutMs: 5000, + runtime: { + preferred_worker_adapter: "codex-app-thread", + }, + codexPath: fakeCodex, + capability: availableCapability(), + }); + + expect(result.status).toBe("completed"); + expect(result.summary).toBe("stdio app-thread ok"); + expect(result.runtime).toEqual( + expect.objectContaining({ + adapter: "codex-app-thread", + thread_id: "thread_stdio", + turn_id: "turn_stdio_worker", + transcript_read: true, + }), + ); + const methods = await readFile(methodLog, "utf8"); + expect(methods.match(/^thread\/start /gm)).toHaveLength(2); + expect(methods.match(/^turn\/start 
/gm)).toHaveLength(2); + expect(methods).toContain('"outputSchema"'); + expect(methods).toContain('"cwf-app-thread-ok"'); + expect(methods).toContain("thread/read"); + expect(methods.match(/"model":"gpt-5\.3-codex-spark"/g)).toHaveLength(4); + expect(methods.match(/"effort":"low"/g)).toHaveLength(2); + }, + ); + } finally { + await rm(dir, { recursive: true, force: true }); + } + }); + + it("uses the daemon app-server socket when explicitly requested", async () => { + const dir = await mkdtemp(join(tmpdir(), "cwf-worker-daemon-")); + const socketPath = join(dir, "app-server.sock"); + const fakeAppServer = await startFakeWorkerWebSocketAppServer(socketPath); + try { + await withEnv( + { + CWF_APP_THREAD_EXECUTION_PREFLIGHT: "0", + CWF_APP_THREAD_TRANSPORT: "daemon", + CWF_APP_SERVER_SOCKET: socketPath, + }, + async () => { + const result = await runWorkerWithAdapter(worker, context, { + target: "/repo", + timeoutMs: 1000, + runtime: { + preferred_worker_adapter: "codex-app-thread", + }, + capability: availableCapability(), + }); + + expect(result.status).toBe("completed"); + expect(result.summary).toBe("daemon app-thread ok"); + expect(result.runtime).toEqual( + expect.objectContaining({ + adapter: "codex-app-thread", + thread_id: "thread_daemon_worker", + turn_id: "turn_daemon_worker", + transcript_read: true, + }), + ); + expect(fakeAppServer.methods).toContain("thread/read"); + }, + ); + } finally { + await fakeAppServer.close(); + await rm(dir, { recursive: true, force: true }); + } + }); + + it("defaults app-thread workers to the Codex Spark quota lane", async () => { + await withAppThreadModelEnv({}, async () => { + const appServer = new FakeWorkerAppServer(); + + const result = await runWorkerWithAdapter(worker, context, { + target: "/repo", + timeoutMs: 1000, + runtime: { + preferred_worker_adapter: "codex-app-thread", + }, + appServer, + capability: availableCapability(), + }); + + expect(result.status).toBe("completed"); + 
expect(appServer.threadStartParams[0]).toEqual(expect.objectContaining({ model: "gpt-5.3-codex-spark" })); + expect(appServer.turnStartParams[0]).toEqual(expect.objectContaining({ model: "gpt-5.3-codex-spark", effort: "low" })); + expect(result.runtime).toEqual( + expect.objectContaining({ + model: "gpt-5.3-codex-spark", + reasoning_effort: "low", + }), + ); + }); + }); + + it("can opt back into the host default app-thread model", async () => { + await withAppThreadModelEnv({ CWF_APP_THREAD_MODEL: "host-default" }, async () => { + const appServer = new FakeWorkerAppServer(); + + const result = await runWorkerWithAdapter(worker, context, { + target: "/repo", + timeoutMs: 1000, + runtime: { + preferred_worker_adapter: "codex-app-thread", + }, + appServer, + capability: availableCapability(), + }); + + expect(result.status).toBe("completed"); + expect(appServer.threadStartParams[0]).not.toEqual(expect.objectContaining({ model: expect.any(String) })); + expect(appServer.turnStartParams[0]).not.toEqual(expect.objectContaining({ model: expect.any(String) })); + expect(appServer.turnStartParams[0]).not.toEqual(expect.objectContaining({ effort: expect.any(String) })); + expect(result.runtime?.model).toBeUndefined(); + expect(result.runtime?.reasoning_effort).toBeUndefined(); + }); + }); + it("keeps direct turn output even when the overall deadline is exhausted before read", async () => { const now = vi.spyOn(Date, "now"); const times = [0, 0, 1, 2, 3, 4, 5, 11, 12, 12]; @@ -143,6 +288,59 @@ describe("worker adapters", () => { expect(result.runtime).toEqual(expect.objectContaining({ adapter: "codex-app-thread", fallback_used: false })); expect(appServer.threadStarts).toBe(2); expect(appServer.workerThreadStarted).toBe(true); + expect(appServer.turnStartParams[0]).toEqual( + expect.objectContaining({ + input: [{ type: "text", text: 'Return exactly {"probe":"cwf-app-thread-ok"} and nothing else.', text_elements: [] }], + outputSchema: expect.objectContaining({ + type: 
"object", + additionalProperties: false, + required: ["probe"], + }), + }), + ); + }); + + it("passes explicit app-thread model settings to the execution probe and worker turn", async () => { + const previousModel = process.env.CWF_APP_THREAD_MODEL; + const previousProvider = process.env.CWF_APP_THREAD_MODEL_PROVIDER; + const previousEffort = process.env.CWF_APP_THREAD_REASONING_EFFORT; + process.env.CWF_APP_THREAD_MODEL = "gpt-5.1"; + process.env.CWF_APP_THREAD_MODEL_PROVIDER = "openai"; + process.env.CWF_APP_THREAD_REASONING_EFFORT = "low"; + try { + const appServer = new JsonProbeThenWorkerAppServer(); + + const result = await runWorkerWithAdapter(worker, context, { + target: "/repo", + timeoutMs: 1000, + workflowId: "diff-review", + runId: "run_model_override", + runtime: { + preferred_worker_adapter: "codex-app-thread", + }, + appServerFactory: () => appServer, + capability: availableCapability(), + }); + + expect(result.status).toBe("completed"); + expect(appServer.threadStartParams).toHaveLength(2); + expect(appServer.turnStartParams).toHaveLength(2); + expect(appServer.threadStartParams[0]).toEqual(expect.objectContaining({ model: "gpt-5.1", modelProvider: "openai" })); + expect(appServer.threadStartParams[1]).toEqual(expect.objectContaining({ model: "gpt-5.1", modelProvider: "openai" })); + expect(appServer.turnStartParams[0]).toEqual(expect.objectContaining({ model: "gpt-5.1", effort: "low" })); + expect(appServer.turnStartParams[1]).toEqual(expect.objectContaining({ model: "gpt-5.1", effort: "low" })); + expect(result.runtime).toEqual( + expect.objectContaining({ + model: "gpt-5.1", + model_provider: "openai", + reasoning_effort: "low", + }), + ); + } finally { + restoreEnv("CWF_APP_THREAD_MODEL", previousModel); + restoreEnv("CWF_APP_THREAD_MODEL_PROVIDER", previousProvider); + restoreEnv("CWF_APP_THREAD_REASONING_EFFORT", previousEffort); + } }); it("falls back from app-thread to the SDK adapter when configured", async () => { @@ -247,6 +445,194 @@ 
describe("worker adapters", () => { expect(result.runtime?.fallback_reason).toContain("turn_id=turn_empty"); }); + it("falls back with model-channel diagnostics when app-thread credits are unavailable", async () => { + const dir = await mkdtemp(join(tmpdir(), "cwf-worker-test-")); + const previousCodexHome = process.env.CODEX_HOME; + process.env.CODEX_HOME = dir; + const sessionPath = join(dir, "sessions", "2026", "06", "07", "rollout-zero-credits.jsonl"); + await mkdir(join(dir, "sessions", "2026", "06", "07"), { recursive: true }); + await writeFile( + sessionPath, + [ + JSON.stringify({ type: "turn_context", payload: { turn_id: "turn_other_before", model: "unrelated-before", effort: "high" } }), + JSON.stringify({ type: "event_msg", payload: { type: "token_count", turn_id: "turn_other_before", rate_limits: { credits: { has_credits: true, balance: "999" } } } }), + JSON.stringify({ type: "turn_context", payload: { model: "unscoped-model", effort: "xhigh" } }), + JSON.stringify({ type: "event_msg", payload: { type: "token_count", rate_limits: { credits: { has_credits: false, balance: "777" } } } }), + JSON.stringify({ type: "event_msg", payload: { type: "task_complete", last_agent_message: "unscoped" } }), + JSON.stringify({ type: "turn_context", payload: { turn_id: "turn_zero_credits", model: "gpt-5.4-mini", effort: "low" } }), + JSON.stringify({ type: "event_msg", payload: { type: "token_count", turn_id: "turn_zero_credits", rate_limits: { credits: { has_credits: false, balance: "0" } } } }), + JSON.stringify({ type: "event_msg", payload: { type: "task_complete", turn_id: "turn_zero_credits", last_agent_message: null } }), + JSON.stringify({ type: "turn_context", payload: { turn_id: "turn_other_after", model: "unrelated-after", effort: "xhigh" } }), + JSON.stringify({ type: "event_msg", payload: { type: "token_count", turn_id: "turn_other_after", rate_limits: { credits: { has_credits: true, balance: "1000" } } } }), + ].join("\n"), + ); + try { + const result = 
await runWorkerWithAdapter( + worker, + context, + { + target: "/repo", + timeoutMs: 10, + runtime: { + preferred_worker_adapter: "codex-app-thread", + fallback_worker_adapter: "codex-sdk-headless", + }, + appServer: new ZeroCreditsWorkerAppServer(sessionPath), + capability: availableCapability(), + }, + fallbackRegistry(), + ); + + expect(result.status).toBe("completed"); + expect(result.runtime?.adapter).toBe("codex-sdk-headless"); + expect(result.runtime?.fallback_reason).toContain("quota_unavailable=true"); + expect(result.runtime?.fallback_reason).not.toContain("balance=0"); + expect(result.runtime?.fallback_reason).not.toContain("unrelated-before"); + expect(result.runtime?.fallback_reason).not.toContain("unrelated-after"); + expect(result.runtime?.fallback_reason).not.toContain("unscoped-model"); + expect(result.runtime?.fallback_reason).not.toContain("balance=777"); + expect(result.runtime?.fallback_reason).toContain("model=gpt-5.4-mini"); + expect(result.runtime?.fallback_reason).toContain("session_log=rollout-zero-credits.jsonl"); + expect(result.runtime?.fallback_reason).not.toContain(dir); + } finally { + restoreEnv("CODEX_HOME", previousCodexHome); + await rm(dir, { recursive: true, force: true }); + } + }); + + it("does not read app-thread diagnostics from paths outside Codex sessions", async () => { + const dir = await mkdtemp(join(tmpdir(), "cwf-worker-test-")); + const outsidePath = join(dir, "rollout-outside.jsonl"); + await writeFile( + outsidePath, + [ + JSON.stringify({ type: "turn_context", payload: { turn_id: "turn_zero_credits", model: "gpt-5.4-mini", effort: "low" } }), + JSON.stringify({ type: "event_msg", payload: { type: "token_count", turn_id: "turn_zero_credits", rate_limits: { credits: { has_credits: false, balance: "0" } } } }), + JSON.stringify({ type: "event_msg", payload: { type: "task_complete", turn_id: "turn_zero_credits", last_agent_message: null } }), + ].join("\n"), + ); + try { + const result = await runWorkerWithAdapter( + 
worker, + context, + { + target: "/repo", + timeoutMs: 10, + runtime: { + preferred_worker_adapter: "codex-app-thread", + fallback_worker_adapter: "codex-sdk-headless", + }, + appServer: new ZeroCreditsWorkerAppServer(outsidePath), + capability: availableCapability(), + }, + fallbackRegistry(), + ); + + expect(result.status).toBe("completed"); + expect(result.runtime?.fallback_reason).toContain("session_log=rollout-outside.jsonl"); + expect(result.runtime?.fallback_reason).not.toContain("quota_unavailable=true"); + expect(result.runtime?.fallback_reason).not.toContain("model=gpt-5.4-mini"); + expect(result.runtime?.fallback_reason).not.toContain(dir); + } finally { + await rm(dir, { recursive: true, force: true }); + } + }); + + it("does not follow symlinked app-thread diagnostics outside Codex sessions", async () => { + const dir = await mkdtemp(join(tmpdir(), "cwf-worker-symlink-")); + const outsidePath = join(dir, "outside-secret.jsonl"); + const linkedPath = join(dir, "sessions", "2026", "06", "07", "linked.jsonl"); + await mkdir(join(dir, "sessions", "2026", "06", "07"), { recursive: true }); + await writeFile( + outsidePath, + [ + JSON.stringify({ type: "turn_context", payload: { turn_id: "turn_zero_credits", model: "outside-secret-model", effort: "low" } }), + JSON.stringify({ type: "event_msg", payload: { type: "token_count", turn_id: "turn_zero_credits", rate_limits: { credits: { has_credits: false, balance: "0" } } } }), + ].join("\n"), + ); + await symlink(outsidePath, linkedPath); + try { + await withEnv({ CODEX_HOME: dir }, async () => { + const result = await runWorkerWithAdapter( + worker, + context, + { + target: "/repo", + timeoutMs: 10, + runtime: { + preferred_worker_adapter: "codex-app-thread", + fallback_worker_adapter: "codex-sdk-headless", + }, + appServer: new ZeroCreditsWorkerAppServer(linkedPath), + capability: availableCapability(), + }, + fallbackRegistry(), + ); + + expect(result.status).toBe("completed"); + 
expect(result.runtime?.fallback_reason).toContain("session_log=linked.jsonl"); + expect(result.runtime?.fallback_reason).not.toContain("outside-secret-model"); + expect(result.runtime?.fallback_reason).not.toContain("quota_unavailable=true"); + expect(result.runtime?.fallback_reason).not.toContain(outsidePath); + }); + } finally { + await rm(dir, { recursive: true, force: true }); + } + }); + + it("limits app-thread diagnostics to a safe session-log tail", async () => { + const dir = await mkdtemp(join(tmpdir(), "cwf-worker-tail-")); + const sessionPath = join(dir, "sessions", "2026", "06", "07", "rollout-tail.jsonl"); + await mkdir(join(dir, "sessions", "2026", "06", "07"), { recursive: true }); + const unrelated = Array.from({ length: 40 }, (_, index) => JSON.stringify({ + type: "turn_context", + payload: { turn_id: `turn_unrelated_${index}`, model: `unrelated-model-${index}`, effort: "high" }, + })); + await writeFile( + sessionPath, + [ + ...unrelated, + JSON.stringify({ type: "turn_context", payload: { turn_id: "turn_zero_credits", model: "gpt-tail-target", effort: "low" } }), + JSON.stringify({ type: "event_msg", payload: { type: "token_count", turn_id: "turn_zero_credits", rate_limits: { credits: { has_credits: false, balance: "0" } } } }), + JSON.stringify({ type: "event_msg", payload: { type: "task_complete", turn_id: "turn_zero_credits", last_agent_message: null } }), + ].join("\n"), + ); + try { + await withEnv( + { + CODEX_HOME: dir, + CWF_APP_THREAD_DIAGNOSTICS_MAX_BYTES: "700", + }, + async () => { + const result = await runWorkerWithAdapter( + worker, + context, + { + target: "/repo", + timeoutMs: 10, + runtime: { + preferred_worker_adapter: "codex-app-thread", + fallback_worker_adapter: "codex-sdk-headless", + }, + appServer: new ZeroCreditsWorkerAppServer(sessionPath), + capability: availableCapability(), + }, + fallbackRegistry(), + ); + + expect(result.status).toBe("completed"); + 
expect(result.runtime?.fallback_reason).toContain("model=gpt-tail-target"); + expect(result.runtime?.fallback_reason).toContain("quota_unavailable=true"); + expect(result.runtime?.fallback_reason).not.toContain("balance=0"); + expect(result.runtime?.fallback_reason).not.toContain("unrelated-model-0"); + expect(result.runtime?.fallback_reason).not.toContain(dir); + }, + ); + } finally { + await rm(dir, { recursive: true, force: true }); + } + }); + it("falls back when the real app-thread worker request hangs", async () => { const appServer = new HangingWorkerStartAppServer(); const result = await runWorkerWithAdapter( @@ -291,7 +677,7 @@ describe("worker adapters", () => { ); expect(result.status).toBe("completed"); - expect(result.runtime?.fallback_reason).toContain("thread/start timed out after 25ms"); + expect(result.runtime?.fallback_reason).toMatch(/thread\/start timed out after \d+ms/); expect(result.runtime?.fallback_reason).not.toContain("NaN"); } finally { if (previousTimeout === undefined) { @@ -437,7 +823,8 @@ describe("worker adapters", () => { expect(result.runtime?.fallback_reason).toContain("app-thread-execution-unavailable"); expect(result.runtime?.fallback_reason).toContain("thread APIs are available, but the model execution channel did not return a readable assistant response"); expect(result.runtime?.fallback_reason).toContain("last thread/read error"); - expect(result.runtime?.fallback_reason).toContain("fake thread/read failed"); + expect(result.runtime?.fallback_reason).toContain("thread-read-failed"); + expect(result.runtime?.fallback_reason).not.toContain("fake thread/read failed"); expect(result.runtime?.fallback_reason).toContain("thread_id=thread_read_error"); expect(result.runtime?.fallback_reason).toContain("turn_id=turn_read_error"); }); @@ -938,10 +1325,13 @@ describe("worker adapters", () => { class FakeWorkerAppServer { readonly methods: string[] = []; threadName = ""; + threadStartParams: unknown[] = []; + turnStartParams: unknown[] 
= []; async request(method: string, params?: unknown): Promise { this.methods.push(method); if (method === "thread/start") { + this.threadStartParams.push(params); return { thread: { id: "thread_correctness" } }; } if (method === "thread/name/set") { @@ -949,6 +1339,7 @@ class FakeWorkerAppServer { return {}; } if (method === "turn/start") { + this.turnStartParams.push(params); return { turn: { id: "turn_correctness" } }; } if (method === "thread/read") { @@ -1027,6 +1418,32 @@ class EmptyWorkerAppServer { async notify(_method: string): Promise {} } +class ZeroCreditsWorkerAppServer { + constructor(private readonly sessionPath: string) {} + + async request(method: string): Promise { + if (method === "thread/start") { + return { thread: { id: "thread_zero_credits" } }; + } + if (method === "turn/start") { + return { turn: { id: "turn_zero_credits" } }; + } + if (method === "thread/read") { + return { + thread: { + id: "thread_zero_credits", + status: { type: "systemError" }, + path: this.sessionPath, + turns: [{ id: "turn_zero_credits" }], + }, + }; + } + return {}; + } + + async notify(_method: string): Promise {} +} + class HangingWorkerStartAppServer { closeCalled = false; @@ -1070,10 +1487,13 @@ class HangingCloseWorkerAppServer extends FakeWorkerAppServer { class JsonProbeThenWorkerAppServer { threadStarts = 0; workerThreadStarted = false; + threadStartParams: unknown[] = []; + turnStartParams: unknown[] = []; async request(method: string, params?: unknown): Promise { if (method === "thread/start") { this.threadStarts += 1; + this.threadStartParams.push(params); if (this.threadStarts > 1) { this.workerThreadStarted = true; return { thread: { id: "thread_worker_after_probe" } }; @@ -1081,6 +1501,7 @@ class JsonProbeThenWorkerAppServer { return { thread: { id: "thread_json_probe" } }; } if (method === "turn/start") { + this.turnStartParams.push(params); return { turn: { id: this.workerThreadStarted ? 
"turn_worker_after_probe" : "turn_json_probe" } }; } if (method === "thread/read") { @@ -1324,6 +1745,265 @@ function completed(workerId: string, runtime?: WorkerResult["runtime"]): WorkerR }; } +async function withEnv(values: Record, run: () => Promise): Promise { + const previous = Object.fromEntries(Object.keys(values).map((name) => [name, process.env[name]])) as Record; + try { + for (const [name, value] of Object.entries(values)) { + restoreEnv(name, value); + } + await run(); + } finally { + for (const [name, value] of Object.entries(previous)) { + restoreEnv(name, value); + } + } +} + +function restoreEnv(name: string, value: string | undefined): void { + if (value === undefined) { + delete process.env[name]; + return; + } + process.env[name] = value; +} + +async function withAppThreadModelEnv(values: Record, run: () => Promise): Promise { + const names = ["CWF_APP_THREAD_MODEL", "CWF_APP_THREAD_MODEL_PROVIDER", "CWF_APP_THREAD_REASONING_EFFORT"]; + await withEnv(Object.fromEntries(names.map((name) => [name, values[name]])), run); +} + +function fakeStdioWorkerAppServerScript(): string { + return `#!/usr/bin/env node +import { appendFileSync } from "node:fs"; + +const logPath = process.env.CWF_FAKE_STDIO_LOG; +let buffer = ""; +let mode = "worker"; + +process.stdin.setEncoding("utf8"); +process.stdin.on("data", (chunk) => { + buffer += chunk; + for (;;) { + const index = buffer.indexOf("\\n"); + if (index < 0) break; + const line = buffer.slice(0, index).trim(); + buffer = buffer.slice(index + 1); + if (!line) continue; + const message = JSON.parse(line); + if (!message.id) continue; + if (logPath) appendFileSync(logPath, message.method + " " + JSON.stringify(message.params ?? {}) + "\\n"); + let result = {}; + if (message.method === "thread/start") { + result = { thread: { id: "thread_stdio" } }; + } + if (message.method === "turn/start") { + mode = message.params?.outputSchema ? "probe" : "worker"; + result = { turn: { id: mode === "probe" ? 
"turn_stdio_probe" : "turn_stdio_worker" } }; + } + if (message.method === "thread/read") { + if (mode === "probe") { + result = { + thread: { + id: "thread_stdio", + turns: [ + { + id: "turn_stdio_probe", + finalResponse: "{\\"probe\\":\\"cwf-app-thread-ok\\"}" + } + ] + } + }; + } else { + result = { + thread: { + id: "thread_stdio", + turns: [ + { + id: "turn_stdio_worker", + finalResponse: JSON.stringify({ + worker_id: "correctness", + summary: "stdio app-thread ok", + findings: [], + verification: ["stdio transport selected by adapter default"], + artifacts: [], + confidence: "high" + }) + } + ] + } + }; + } + } + process.stdout.write(JSON.stringify({ id: message.id, result }) + "\\n"); + } +}); +`; +} + +async function startFakeWorkerWebSocketAppServer(socketPath: string): Promise<{ methods: string[]; close(): Promise }> { + const methods: string[] = []; + const sockets = new Set(); + const server = createServer((socket) => { + sockets.add(socket); + socket.on("close", () => sockets.delete(socket)); + handleFakeWorkerWebSocket(socket, methods); + }); + await listen(server, socketPath); + return { + methods, + async close() { + for (const socket of sockets) { + socket.destroy(); + } + await closeServer(server); + }, + }; +} + +function handleFakeWorkerWebSocket(socket: Socket, methods: string[]): void { + let buffer = Buffer.alloc(0); + let handshaken = false; + + socket.on("data", (chunk) => { + buffer = Buffer.concat([buffer, chunk]); + if (!handshaken) { + const headerEnd = buffer.indexOf("\r\n\r\n"); + if (headerEnd < 0) { + return; + } + const header = buffer.subarray(0, headerEnd).toString("utf8"); + buffer = buffer.subarray(headerEnd + 4); + const key = /^Sec-WebSocket-Key:\s*(.+)\s*$/im.exec(header)?.[1]?.trim(); + const accept = createHash("sha1") + .update(`${key ?? 
""}258EAFA5-E914-47DA-95CA-C5AB0DC85B11`) + .digest("base64"); + socket.write([ + "HTTP/1.1 101 Switching Protocols", + "Upgrade: websocket", + "Connection: Upgrade", + `Sec-WebSocket-Accept: ${accept}`, + "", + "", + ].join("\r\n")); + handshaken = true; + } + buffer = readFakeWorkerFrames(socket, buffer, methods); + }); +} + +function readFakeWorkerFrames(socket: Socket, input: Buffer, methods: string[]): Buffer { + let buffer = input; + while (buffer.length >= 2) { + const opcode = buffer[0] & 0x0f; + const masked = (buffer[1] & 0x80) !== 0; + let length = buffer[1] & 0x7f; + let offset = 2; + if (length === 126) { + if (buffer.length < offset + 2) { + return buffer; + } + length = buffer.readUInt16BE(offset); + offset += 2; + } else if (length === 127) { + if (buffer.length < offset + 8) { + return buffer; + } + length = Number(buffer.readBigUInt64BE(offset)); + offset += 8; + } + const mask = masked ? buffer.subarray(offset, offset + 4) : undefined; + offset += masked ? 4 : 0; + if (buffer.length < offset + length) { + return buffer; + } + const payload = Buffer.from(buffer.subarray(offset, offset + length)); + buffer = buffer.subarray(offset + length); + if (mask) { + for (let index = 0; index < payload.length; index += 1) { + payload[index] ^= mask[index % 4]; + } + } + if (opcode === 0x1) { + handleFakeWorkerMessage(socket, payload.toString("utf8"), methods); + } else if (opcode === 0x8) { + writeFakeWorkerFrame(socket, 0x8, Buffer.alloc(0)); + socket.end(); + return buffer; + } + } + return buffer; +} + +function handleFakeWorkerMessage(socket: Socket, raw: string, methods: string[]): void { + const message = JSON.parse(raw) as { id?: number; method?: string }; + if (typeof message.id !== "number" || !message.method) { + return; + } + methods.push(message.method); + let result: unknown = {}; + if (message.method === "thread/start") { + result = { thread: { id: "thread_daemon_worker" } }; + } + if (message.method === "turn/start") { + result = { turn: { id: 
"turn_daemon_worker" } }; + } + if (message.method === "thread/read") { + result = { + thread: { + id: "thread_daemon_worker", + turns: [ + { + id: "turn_daemon_worker", + finalResponse: JSON.stringify({ + worker_id: "correctness", + summary: "daemon app-thread ok", + findings: [], + verification: ["daemon transport selected by explicit env"], + artifacts: [], + confidence: "high", + }), + }, + ], + }, + }; + } + writeFakeWorkerFrame(socket, 0x1, Buffer.from(JSON.stringify({ id: message.id, result }), "utf8")); +} + +function writeFakeWorkerFrame(socket: Socket, opcode: number, payload: Buffer): void { + let header: Buffer; + if (payload.length < 126) { + header = Buffer.from([0x80 | opcode, payload.length]); + } else if (payload.length <= 0xffff) { + header = Buffer.alloc(4); + header[0] = 0x80 | opcode; + header[1] = 126; + header.writeUInt16BE(payload.length, 2); + } else { + header = Buffer.alloc(10); + header[0] = 0x80 | opcode; + header[1] = 127; + header.writeBigUInt64BE(BigInt(payload.length), 2); + } + socket.write(Buffer.concat([header, payload])); +} + +async function listen(server: Server, socketPath: string): Promise { + await new Promise((resolveListen, rejectListen) => { + server.once("error", rejectListen); + server.listen(socketPath, () => { + server.off("error", rejectListen); + resolveListen(); + }); + }); +} + +async function closeServer(server: Server): Promise { + await new Promise((resolveClose) => { + server.close(() => resolveClose()); + }); +} + function failed(workerId: string, runtime?: WorkerResult["runtime"]): WorkerResult { return { worker_id: workerId, @@ -1356,6 +2036,7 @@ function availableCapability(): DesktopCapabilitySummary { "thread/start": true, "thread/name/set": true, "thread/list": true, + "thread/read": true, "turn/start": true, }, thread_apis_available: true, diff --git a/workflows/dynamic/change-summary.workflow.js b/workflows/dynamic/change-summary.workflow.js new file mode 100644 index 0000000..2e29edb --- /dev/null 
+++ b/workflows/dynamic/change-summary.workflow.js @@ -0,0 +1,20 @@ +export const metadata = { + "id": "change-summary", + "title": "Change Summary", + "version": "1.0.0", + "permissions": ["read-only"] +}; + +export default async function workflow(cwf) { + const changedFiles = await cwf.git.changedFiles(); + const diff = await cwf.git.diff(); + await cwf.artifacts.write({ + name: "change-summary.md", + content: "# Change Summary\n\nChanged files JSON:\n\n```json\n" + JSON.stringify(changedFiles, null, 2) + "\n```\n\nDiff bytes: " + diff.length + "\n" + }); + return { + template: "change-summary", + changed_file_count: changedFiles.length, + diff_bytes: diff.length + }; +} diff --git a/workflows/dynamic/docs-change-check.workflow.js b/workflows/dynamic/docs-change-check.workflow.js new file mode 100644 index 0000000..81a2afc --- /dev/null +++ b/workflows/dynamic/docs-change-check.workflow.js @@ -0,0 +1,26 @@ +export const metadata = { + "id": "docs-change-check", + "title": "Docs Change Check", + "version": "1.0.0", + "permissions": ["read-only"] +}; + +export default async function workflow(cwf) { + const changedFiles = await cwf.git.changedFiles(); + const diff = await cwf.git.diff(); + let docsFiles = 0; + for (const file of changedFiles) { + if (String(file).startsWith("docs/") || file === "README.md" || file === "README.zh-CN.md") { + docsFiles += 1; + } + } + await cwf.artifacts.write({ + name: "docs-change-check.md", + content: "# Docs Change Check\n\nChanged files JSON:\n\n```json\n" + JSON.stringify(changedFiles, null, 2) + "\n```\n\nDocs-like files: " + docsFiles + "\n\nDiff bytes: " + diff.length + "\n" + }); + return { + template: "docs-change-check", + docs_file_count: docsFiles, + changed_file_count: changedFiles.length + }; +}
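The `docs-change-check` template above can be exercised without a real runtime by handing it a stub. What follows is a minimal sketch, not the project's test harness: `makeStubCwf`, `isDocsLike`, `countDocsLike`, and `docsChangeCheck` are hypothetical names introduced for illustration. It assumes `cwf.git.changedFiles()` resolves to an array of path strings and `cwf.git.diff()` to a string, which is what the template's use of `.length` and `startsWith` suggests but the diff does not confirm.

```javascript
// Hypothetical in-memory stand-in for the `cwf` object passed to a workflow.
// It mirrors only the three calls the template uses; it is not the real CWF API.
function makeStubCwf(changedFiles, diffText) {
  const written = [];
  return {
    written, // artifacts captured for inspection
    git: {
      changedFiles: async () => changedFiles,
      diff: async () => diffText,
    },
    artifacts: {
      write: async (artifact) => {
        written.push(artifact);
      },
    },
  };
}

// Same "docs-like" rule as the template: anything under docs/ or a top-level README.
function isDocsLike(file) {
  return String(file).startsWith("docs/") || file === "README.md" || file === "README.zh-CN.md";
}

// Synchronous helper so the classification rule can be checked on its own.
function countDocsLike(files) {
  return files.filter(isDocsLike).length;
}

// Condensed version of the template body, written against the stub.
async function docsChangeCheck(cwf) {
  const changedFiles = await cwf.git.changedFiles();
  const diff = await cwf.git.diff();
  const docsFiles = countDocsLike(changedFiles);
  await cwf.artifacts.write({
    name: "docs-change-check.md",
    content: "Docs-like files: " + docsFiles + "\n\nDiff bytes: " + diff.length + "\n",
  });
  return {
    template: "docs-change-check",
    docs_file_count: docsFiles,
    changed_file_count: changedFiles.length,
  };
}
```

Calling `docsChangeCheck(makeStubCwf(["docs/a.md", "src/b.ts"], "diff text"))` resolves to the same result shape the template returns, and the stub's `written` array holds the single artifact. Note that the rule is prefix-based, so a nested file such as `packages/x/README.md` is not counted as docs-like.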