Ralph the Riveter

A focused reimplementation of the Ralph autonomous coding loop in Rust. Slimmer than the original bash version, with pluggable agents (codex or claude), per-iteration jj commits, and in-place execution (no branch switching).

The binary is named riveter (the tool); the methodology it implements is Ralph (one focused story per fresh agent context, persistent state on disk, jj as the review surface).

1. Goals

Single binary riveter written in Rust.
Pluggable agents: codex, claude, or mock (built-in test agent) selected via flag/config.
Pluggable model + reasoning: --model, --thinking <low|med|high>.
Commits via jj (jujutsu), not git. One jj commit per loop iteration after the agent finishes.
In-place operation: never switch branches/bookmarks; work on @ directly.
Strict separation of concerns: a skill (run in an agent session) converts a spec into a PRD and writes a fresh run folder with a unique runId. The CLI knows nothing about specs — it only loads an existing run folder and executes the loop.
PRD as TOML (prd.toml) — human-editable, comment-friendly.
Integration-testable: a built-in mock agent with hardcoded deterministic behavior so end-to-end tests don't need an external binary or script.

2. Non-Goals

No archiving / .last-branch bookkeeping.
No multi-repo / multi-worktree orchestration.
No web UI or daemon mode.
No retry/backoff sophistication. Agent failures stop the loop.

3. Project Layout

ralph-the-riveter/
├── Cargo.toml                  # workspace
├── .gitignore
├── .jjignore
├── README.md
├── rust-toolchain.toml
├── crates/
│   └── riveter/                # main binary (cli name: `riveter`)
│       ├── Cargo.toml
│       ├── src/
│       │   ├── main.rs
│       │   ├── cli.rs          # clap definitions + Agent/Model/Thinking enums
│       │   ├── agent.rs        # trait Agent + Codex/Claude/Mock impls
│       │   ├── jj.rs           # thin wrapper over `jj` CLI
│       │   ├── loop_.rs        # iteration loop + termination check
│       │   ├── run_dir.rs      # run folder layout + runId resolution
│       │   ├── prompt.rs       # iteration prompt template (include_str! + replace)
│       │   └── prd.rs          # load/parse/write prd.toml
│       ├── templates/
│       │   └── prompt.md       # iteration prompt template (embedded by prompt.rs)
│       └── tests/
│           └── integration.rs  # uses --agent mock + tempdir + jj
└── skills/
    └── prd/SKILL.md            # spec -> PRD + run folder (non-interactive)

4. CLI

The CLI consumes pre-built run folders. It has no --spec flag and no PRD-generation logic — that's the skill's job.

riveter run      -r <RUN_ID> [OPTIONS]   # execute the loop
riveter validate -r <RUN_ID>             # check a run folder for correctness

For inspecting / listing runs, the user uses cat, ls, and jj log — no dedicated subcommands.

`riveter validate`

Checks that a run folder is well-formed and ready to execute. Designed to be invoked by skills (and CI) so authoring mistakes are caught before any agent iteration is wasted.

riveter validate -r <RUN_ID>

What it checks:

Filesystem layout — prd.toml and spec.md exist.
prd.toml schema — parseable; all required fields (description, createdAt, stories) present and typed.
Stories: at least one story; id is a positive integer, unique, sequential 1..N with no gaps; title ≤ 80 chars (bounds the commit subject length; with the [RIVETER(...)] prefix the full subject will usually exceed 72 chars); acceptanceCriteria non-empty; passes is a bool.

Output:

$ riveter validate -r task-priority-a1b2c3d4
✓ filesystem layout
✗ prd.toml
  - stories[1].id: expected 2, found 5 (ids must be sequential 1..N)
  - stories[2].acceptanceCriteria: empty
2 errors

Exit codes: 0 ok, 30 validation errors, 31 filesystem missing, 32 I/O error.
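
The story checks listed above can be sketched as a pure function over already-parsed stories. This is a minimal illustration, not the planned implementation: the `Story` struct and `validate_stories` name are hypothetical, TOML parsing is left out, and the error strings simply imitate the sample output.

```rust
/// One story from prd.toml, already parsed (parsing is out of scope here).
/// Hypothetical shape; field names mirror the prd.toml keys.
#[derive(Debug, Clone)]
pub struct Story {
    pub id: i64,
    pub title: String,
    pub acceptance_criteria: Vec<String>,
    pub passes: bool,
}

/// Return one human-readable error per violated rule, in story order.
pub fn validate_stories(stories: &[Story]) -> Vec<String> {
    let mut errors = Vec::new();
    if stories.is_empty() {
        errors.push("stories: at least one story is required".to_string());
    }
    for (index, story) in stories.iter().enumerate() {
        // Requiring ids to be 1..N in array order also guarantees
        // that they are positive and unique.
        let expected = index as i64 + 1;
        if story.id != expected {
            errors.push(format!(
                "stories[{index}].id: expected {expected}, found {} (ids must be sequential 1..N)",
                story.id
            ));
        }
        if story.title.chars().count() > 80 {
            errors.push(format!("stories[{index}].title: longer than 80 chars"));
        }
        if story.acceptance_criteria.is_empty() {
            errors.push(format!("stories[{index}].acceptanceCriteria: empty"));
        }
    }
    errors
}
```

An empty result would map to exit 0 and a non-empty one to exit 30; the type check on passes happens naturally at parse time once the field is a `bool`.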

Skill integration. The prd skill runs riveter validate -r <runId> after generating the folder, parses the human output (one error per - line), fixes each error (re-writes prd.toml), and re-runs until exit 0 — up to 3 attempts. If validation still fails, the skill surfaces the remaining errors to the user and refuses to print the "ready to run" message.

`riveter run`

riveter run -r <RUN_ID> [OPTIONS]

Required:
  -r, --run <RUN_ID>              run id (or full path to a run folder)

Options:
  -a, --agent <codex|claude|mock> [default: codex]
  -m, --model <STRING>            [default: gpt-5]
  -t, --thinking <low|med|high>   [default: high]
  -n, --max-iterations <N>        [default: 10]
  -v, --verbose
  -h, --help
  -V, --version

<RUN_ID> is resolved as: (1) absolute/relative path if it contains /, (2) <state-dir>/runs/<RUN_ID> otherwise. Errors out if the folder is missing or prd.toml doesn't exist.

All flags are also readable from env (RIVETER_AGENT, RIVETER_MODEL, RIVETER_THINKING, RIVETER_MAX_ITERATIONS). Env-only knobs: RIVETER_STATE_DIR (override the state dir, used by tests) and RIVETER_AGENT_BIN (override the agent executable, used by tests for codex/claude shims).
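
The RUN_ID resolution rule above is small enough to sketch directly. The function names below are hypothetical, and the state dir is passed in as a parameter so that the RIVETER_STATE_DIR override (or a test) can supply it.

```rust
use std::path::{Path, PathBuf};

/// Resolve a RUN_ID: anything containing '/' is treated as a path,
/// anything else is looked up under <state-dir>/runs/.
pub fn resolve_run_dir(run_id: &str, state_dir: &Path) -> PathBuf {
    if run_id.contains('/') {
        PathBuf::from(run_id)
    } else {
        state_dir.join("runs").join(run_id)
    }
}

/// Fail early if the folder or its prd.toml is missing.
pub fn check_run_dir(run_dir: &Path) -> Result<(), String> {
    if !run_dir.is_dir() {
        return Err(format!("run folder not found: {}", run_dir.display()));
    }
    if !run_dir.join("prd.toml").is_file() {
        return Err(format!("prd.toml not found in {}", run_dir.display()));
    }
    Ok(())
}
```

Keeping the path computation separate from the filesystem check makes the first half trivially unit-testable.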

Typical workflow

# 1. In an agent session, invoke the `prd` skill with your spec.
#    The skill generates the PRD, creates the run folder, validates it,
#    and tells you the runId.

# 2. (Optional) inspect / edit the generated PRD.
$EDITOR ~/.local/state/riveter/runs/task-priority-a1b2c3d4/prd.toml

# 3. Run the loop in-place against the current jj repo.
riveter run -r task-priority-a1b2c3d4 -a codex -m gpt-5 -t high

# 4. Resume later (same command - skips stories already passing).
riveter run -r task-priority-a1b2c3d4

5. Agent Abstraction

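use std::path::Path;

use anyhow::Result; // anyhow is in the dependency list (§9)

use crate::cli::Thinking; // Low | Med | High, defined in cli.rs

/// A coding agent that Riveter can drive for one iteration.
pub trait Agent {
    /// Short identifier such as "codex", "claude", or "mock".
    fn name(&self) -> &'static str;
    /// Run one iteration with the rendered prompt and return the captured output.
    fn run(&self, prompt: &str, opts: &RunOpts) -> Result<AgentOutput>;
}

/// Per-invocation settings passed to the agent.
pub struct RunOpts<'a> {
    pub model: Option<&'a str>,
    pub thinking: Thinking,   // Low | Med | High
    pub cwd: &'a Path,        // working copy the agent operates in
}

/// Captured result of one agent invocation.
pub struct AgentOutput {
    pub stdout: String,
    pub stderr: String,
    pub exit_code: i32,
}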

Codex impl invokes codex exec --model <m> --reasoning <thinking> … (exact flags pinned in code with comments). Claude impl invokes claude --dangerously-skip-permissions --print --model <m> …. Both stream output live to stderr (via tee-style passthrough) while capturing stdout for logging.
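
The tee-style passthrough can be sketched independently of any particular agent binary. The helper below is an assumption about how it might look (`tee_lines` is a hypothetical name): it copies a reader line by line to a console writer, prefixed with "│ ", and to a log writer, and returns what it read.

```rust
use std::io::{self, BufRead, Write};

/// Copy `reader` line by line to `console` (prefixed with "│ ") and to
/// `log` (unprefixed), returning everything that was read.
pub fn tee_lines<R: BufRead, C: Write, L: Write>(
    reader: R,
    console: &mut C,
    log: &mut L,
) -> io::Result<String> {
    let mut captured = String::new();
    for line in reader.lines() {
        let line = line?;
        writeln!(console, "│ {line}")?;
        writeln!(log, "{line}")?;
        captured.push_str(&line);
        captured.push('\n');
    }
    Ok(captured)
}
```

With a real child process, the reader would typically be a `std::io::BufReader` wrapped around the child's piped stdout, the console writer would be `std::io::stderr()`, and the log writer the open iterations/NNN/stdout.log file.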

Mock impl is hardcoded in the binary for integration tests. On each call it:

Reads prd.toml from the run folder.
Picks the first story with passes = false.
Touches riveter-mock-<storyId>.txt in cwd (so jj st has changes).
Rewrites prd.toml setting that story's passes = true.
Exits 0.

After enough invocations the mock walks the entire PRD and the loop terminates naturally (all passes = true). No env vars, no scripts.
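
The mock's "flip the first pending story" step could look roughly like the sketch below. It is deliberately line-based rather than a real TOML parse, which is an assumption made only to keep the example dependency-free: it expects `id` to appear before `passes` inside each `[[stories]]` table (as in the sample PRD) and does not handle trailing comments on the `passes` line. `flip_first_pending` is a hypothetical name.

```rust
/// Flip the first `passes = false` to `passes = true`.
/// Returns the affected story id and the rewritten text, or None when
/// every story is already passing.
pub fn flip_first_pending(prd: &str) -> Option<(u32, String)> {
    let mut current_id: Option<u32> = None;
    let mut flipped: Option<u32> = None;
    let mut out: Vec<String> = Vec::new();

    for line in prd.lines() {
        let trimmed = line.trim();
        if trimmed == "[[stories]]" {
            current_id = None; // a new story table starts here
        }
        if let Some((key, value)) = trimmed.split_once('=') {
            let (key, value) = (key.trim(), value.trim());
            if key == "id" {
                current_id = value.parse().ok();
            }
            if key == "passes" && value == "false" && flipped.is_none() {
                if let Some(id) = current_id {
                    flipped = Some(id);
                    // Keep the original spacing; only the value changes.
                    out.push(line.replacen("false", "true", 1));
                    continue;
                }
            }
        }
        out.push(line.to_string());
    }

    flipped.map(|id| (id, out.join("\n") + "\n"))
}
```

The returned id is what the mock would use to name riveter-mock-<storyId>.txt; writing the file and the updated prd.toml back to disk is omitted here.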

6. Loop Behavior

Per iteration:

1. Load PRD. Read <run-dir>/prd.toml.
2. Terminate? If every story already has passes = true, exit 0 (status=ok).
3. Pick story. Riveter selects the first story (array order) with passes = false and injects it into the prompt. The agent sees only one story per iteration, which keeps its context tiny. The prompt instructs the agent to set that story's passes = true in prd.toml when its acceptance criteria are met.
4. Run agent. Pipe the rendered prompt to the selected agent; tee live output to the console (line-prefixed │ ) and to iterations/NNN/stdout.log.
5. Commit. If jj st shows changes, Riveter runs jj describe -m <msg> then jj new. No bookmark moves, no branch switching. Message format:
```
[RIVETER(<runId>,#<storyId>,<model>)] chore: <story title>
```
   Example:
```
[RIVETER(task-priority-a1b2c3d4,#1,gpt-5)] chore: Add priority field to tasks table
```
   The subject is always chore: <story title>, with no parsing of agent output and no fallback. Reviewers can jj describe to rewrite the subject if they prefer a different conventional-commit type. Grep with jj log -r 'description(glob:"\\[RIVETER*")' (the same escaped form used in §7d, since an unescaped [ opens a glob character class).
6. Empty working copy = failure. If jj st is clean after the agent exits 0, treat it as clean_workcopy and stop. The agent should have produced changes; if it didn't, something is wrong and we don't want to silently re-prompt.
7. Loop back to step 1, up to --max-iterations.
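
The commit step is mechanical enough to sketch. The helper names below are hypothetical; the message format and the two jj commands are the ones stated in the steps above.

```rust
/// Build the commit subject in the documented format.
pub fn commit_message(run_id: &str, story_id: u32, model: &str, title: &str) -> String {
    format!("[RIVETER({run_id},#{story_id},{model})] chore: {title}")
}

/// The two jj invocations for one iteration, as argv vectors:
/// describe the working-copy commit, then start a new empty one.
pub fn jj_commit_commands(message: &str) -> Vec<Vec<String>> {
    vec![
        vec![
            "jj".to_string(),
            "describe".to_string(),
            "-m".to_string(),
            message.to_string(),
        ],
        vec!["jj".to_string(), "new".to_string()],
    ]
}
```

Returning argv vectors instead of spawning processes keeps the formatting testable without a jj installation; jj.rs would hand each vector to `std::process::Command`.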

After --max-iterations without all stories passing → exit 20.

Termination is driven solely by prd.toml state: when every story has passes = true, the loop exits ok. There is no special "COMPLETE" signal in agent output. Story selection happens in Riveter, not the agent.
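
The control flow described in this section can be condensed into a small sketch. Everything here is illustrative: `Story`, `IterationResult`, `Outcome`, and `run_loop` are hypothetical names, the PRD is held in memory rather than re-read from disk, and the agent plus jj steps are collapsed into a single closure.

```rust
#[derive(Debug, Clone)]
pub struct Story {
    pub id: u32,
    pub passes: bool,
}

/// What one iteration reports back to the loop (hypothetical shape).
pub struct IterationResult {
    pub agent_exit: i32,
    pub changed_files: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Ok,            // exit 0
    AgentFailed,   // exit 10
    CleanWorkcopy, // exit 12
    MaxIterations, // exit 20
}

/// Drive iterations until every story passes, a failure occurs,
/// or the iteration budget runs out.
pub fn run_loop<F>(stories: &mut [Story], max_iterations: usize, mut iterate: F) -> Outcome
where
    F: FnMut(&mut [Story], usize) -> IterationResult,
{
    for _ in 0..max_iterations {
        // Termination is driven solely by PRD state.
        let Some(index) = stories.iter().position(|s| !s.passes) else {
            return Outcome::Ok;
        };
        // Riveter, not the agent, picks the story (first pending in array order).
        let result = iterate(&mut *stories, index);
        if result.agent_exit != 0 {
            return Outcome::AgentFailed;
        }
        if result.changed_files == 0 {
            return Outcome::CleanWorkcopy;
        }
    }
    if stories.iter().all(|s| s.passes) {
        Outcome::Ok
    } else {
        Outcome::MaxIterations
    }
}
```

Note that an iteration which changes files but leaves passes = false is not a failure in this sketch: the same story is simply picked again on the next pass, until the budget is exhausted.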

No cleanup on resume. If a run is interrupted mid-iteration (Ctrl-C, crash), the partial commit and partially updated prd.toml are left on disk. Re-running riveter run -r <id> does not auto-clean: it re-reads prd.toml as-is and proceeds from the first passes = false story. If the user wants a clean retry, they should jj abandon <commit> and edit prd.toml themselves.

7. Skills

A single skill, plain markdown so any agent (codex/claude/amp) can load it. The skill is the spec→PRD pipeline. The CLI never sees a spec.

`skills/prd/SKILL.md`

Triggered by "create a prd", "spec out", "plan this feature", "convert this spec".

The skill is non-interactive: it does not ask the user clarifying questions. It converts whatever spec it is given into a structured PRD. The user can always edit prd.toml afterwards.

Steps:

Take the user's spec (free-form description or full PRD markdown).
Compute a runId = <kebab-of-feature-title>-<random-8-alphanumeric> (lowercase [a-z0-9-]+, ≤64 chars). Random suffix so re-running the skill on the same spec creates a new run rather than colliding.
Create the run folder at <state-dir>/runs/<runId>/. State dir convention: $XDG_STATE_HOME/riveter (Linux), ~/Library/Application Support/riveter (macOS), %LOCALAPPDATA%\riveter (Windows).
Write:
- spec.md — the user's original input, verbatim
- prd.toml — the structured PRD Riveter consumes (schema in §7b)
Validate. Run riveter validate -r <runId>. If errors are reported, patch the offending files (typically prd.toml) and re-validate. Retry up to 3 times. If validation still fails, show the errors to the user and stop — do not print the "ready to run" message.
Print to the user (only after validation passes):

Created run task-priority-a1b2c3d4 at ~/.local/state/riveter/runs/task-priority-a1b2c3d4. Review prd.toml, then run: riveter run -r task-priority-a1b2c3d4

The skill is markdown executed by an LLM, so it's free to compute the runId, call mkdir, and write TOML directly. It does not need a Rust helper. The CLI's only contract with skills is: read prd.toml from a run folder.
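
Although the skill computes the runId itself, the naming rule is easy to pin down in code, for example as a reference for tests. The sketch below is an assumption about how the rule could be read: `kebab` and `make_run_id` are hypothetical helpers, the random suffix is passed in rather than generated, and the title part is truncated so the whole id stays within 64 characters.

```rust
/// Lowercase the title, keep ASCII letters and digits, and collapse
/// every other run of characters into a single '-'.
pub fn kebab(title: &str) -> String {
    let mut slug = String::new();
    for ch in title.chars() {
        if ch.is_ascii_alphanumeric() {
            slug.push(ch.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}

/// Join the slug and an 8-character suffix, keeping the total at or
/// below 64 characters by shortening the slug if needed.
pub fn make_run_id(title: &str, suffix: &str) -> String {
    const MAX_LEN: usize = 64;
    let budget = MAX_LEN.saturating_sub(suffix.len() + 1);
    let mut slug = kebab(title);
    slug.truncate(budget); // safe: the slug is pure ASCII
    let slug = slug.trim_end_matches('-');
    format!("{slug}-{suffix}")
}
```

For the running example, `make_run_id("Task Priority", "a1b2c3d4")` yields `task-priority-a1b2c3d4`.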

The skill handles both free-form specs and pre-written PRDs uniformly. If the input already contains ## User Stories, the skill preserves them; otherwise it generates them from the description.

7b. `prd.toml` Format

description = "Add high/medium/low priority to tasks (default medium); show colored badges; filter by priority."
createdAt   = "2026-05-24T11:42:07Z"

[[stories]]
id          = 1
title       = "Add priority field to tasks table"
passes      = false
acceptanceCriteria = [
  "Add priority column: 'high' | 'medium' | 'low' (default 'medium')",
  "Generate and run migration successfully",
  "Typecheck passes",
]

[[stories]]
id          = 2
title       = "Display priority badge on task cards"
passes      = false
acceptanceCriteria = [
  "Each task card shows colored badge (red=high, yellow=medium, gray=low)",
  "Typecheck passes",
]

description is the top-level context the agent sees on every iteration, alongside the picked story; it tells the agent what the project is and what is being built. createdAt is recorded for traceability. The folder name is the runId, so no runId field is needed inside the file.

Story execution order is the array order. id is a stable integer (1-indexed, sequential, no gaps).
The only per-story state Riveter mutates is passes. Reviewers can flip passes = false to re-run a story.
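TOML chosen over JSON because comments are allowed, multiline arrays read well for criteria, and humans actually edit this file between iterations. Comments survive a rewrite only if the writer preserves formatting (for example the toml_edit crate); re-serializing through serde and toml drops them.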

7c. Run Layout & Output Files

The run folder is the run identity. It lives outside the project under test so agent transcripts never pollute the working repo.

$XDG_STATE_HOME/riveter/runs/<runId>/
  ├── spec.md               # snapshot of the original spec input (verbatim)
  ├── prd.toml              # structured PRD — the single source of truth Riveter reads
  └── iterations/
      ├── 001/
      │   ├── prompt.txt    # exact bytes piped to the agent
      │   ├── stdout.log
      │   ├── stderr.log
      │   └── exit.txt      # agent exit code
      ├── 002/
      └── ...

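Default state location: dirs::state_dir(), which resolves to $XDG_STATE_HOME (or ~/.local/state) on Linux. It returns None on macOS and Windows, so those platforms need a fallback such as dirs::data_local_dir() to reach the locations listed in §7. Override with RIVETER_STATE_DIR.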
riveter run prints the run dir path on stderr first so users can tail -f iterations/.../stdout.log immediately.
Iteration dirs are append-only across re-runs (numbering continues 003, 004, …); they never overwrite a prior iteration.
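
The append-only numbering can be sketched as "highest existing NNN plus one". The function below is a hypothetical helper that takes the existing directory names as input (in practice they would come from `std::fs::read_dir` on iterations/) and ignores anything that is not a number.

```rust
/// Name of the next iteration directory, zero-padded to three digits.
/// Non-numeric entries are skipped; an empty folder starts at "001".
pub fn next_iteration_dir<'a, I>(existing: I) -> String
where
    I: IntoIterator<Item = &'a str>,
{
    let highest = existing
        .into_iter()
        .filter_map(|name| name.parse::<u32>().ok())
        .max()
        .unwrap_or(0);
    format!("{:03}", highest + 1)
}
```

Because the next number is derived from the maximum rather than the count, a deleted or missing directory in the middle never causes an earlier iteration to be overwritten.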

To list runs: ls $XDG_STATE_HOME/riveter/runs/. To see PRD progress: cat $XDG_STATE_HOME/riveter/runs/<runId>/prd.toml (grep for passes =). To find the most recent run: ls -t $XDG_STATE_HOME/riveter/runs/ | head -1.

7d. Console Output (what the user sees)

Default (-v adds debug lines, -vv adds every shell command):

riveter 0.1.0
run:     ~/.local/state/riveter/runs/task-priority-a1b2c3d4
agent:   codex   model: gpt-5   thinking: high
cwd:     /home/me/proj   (jj repo @ kwptlqqv)
prd:     4 stories, 1 passing, 3 remaining

============================================================
  iteration 3/10  ·  #2 "Display priority badge on task cards"
============================================================
[agent] streaming to iterations/003/stdout.log ...
│ Reading prd.toml...
│ Implementing priority badge component...
│ (live agent output prefixed with "│ ")
[agent] exit 0 in 47.2s
[jj] 3 files changed -> commit zptlqqvr
     [RIVETER(task-priority-a1b2c3d4,#2,gpt-5)] chore: Display priority badge on task cards
[prd] story #2 marked passes=true (2/4)

============================================================
  iteration 4/10  ·  #3 "Add priority selector to task edit"
============================================================
...

[done] all stories passing after 4 iterations (3m12s)
       run dir: ~/.local/state/riveter/runs/task-priority-a1b2c3d4
       review:  jj log -r 'description(glob:"\\[RIVETER*")'

Live agent output is line-prefixed (│ ). With --quiet, only iteration headers, errors, and the final summary are printed; full transcripts always land in the run dir.

7e. Failure Handling

| Trigger | Exit |
| --- | --- |
| All stories `passes = true` | `0` |
| Agent exits non-zero | `10` |
| Agent exits 0 but `jj st` is clean | `12` |
| Any `jj` subcommand fails | `13` |
| `prd.toml` missing/invalid after iteration | `14` |
| Hit `--max-iterations` with stories still pending | `20` |
| SIGINT/SIGTERM | `130` |
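
One way to keep these codes in a single place is an enum with an explicit mapping. The variant names below are hypothetical (only clean_workcopy appears elsewhere in this proposal); the numeric values are the ones in the table.

```rust
/// Final status of `riveter run`, one variant per row of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunStatus {
    Ok,
    AgentFailed,
    CleanWorkcopy,
    JjFailed,
    PrdInvalid,
    MaxIterations,
    Interrupted,
}

impl RunStatus {
    /// Process exit code for this status.
    pub fn exit_code(self) -> i32 {
        match self {
            RunStatus::Ok => 0,
            RunStatus::AgentFailed => 10,
            RunStatus::CleanWorkcopy => 12,
            RunStatus::JjFailed => 13,
            RunStatus::PrdInvalid => 14,
            RunStatus::MaxIterations => 20,
            RunStatus::Interrupted => 130,
        }
    }
}
```

main.rs could then end with `std::process::exit(status.exit_code())`, and the exhaustive `match` forces a decision whenever a new failure mode is added.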

Rules:

No silent rollback of jj commits. Bad commits are left in place; the user can jj abandon <id> to discard them.
No cleanup on re-run. Re-running riveter run -r <runId> resumes from the first passes = false story but does not clean up partial work from a prior interrupted run; it just re-reads prd.toml and proceeds. Iteration numbering continues from the highest existing NNN.
Signal handling. SIGINT during an agent call: wait up to 10s for graceful shutdown, then SIGKILL the process group; exit 130.

8. Built-in `mock` Agent for Integration Tests

The mock agent is selected with --agent mock. Its behavior is hardcoded in the binary (no env vars, no scripts, no separate crate). On each invocation it:

Reads prd.toml from the run folder.
Picks the first story with passes = false.
Touches riveter-mock-<storyId>.txt in the working copy (so jj st sees changes).
Rewrites prd.toml to set that story's passes = true.
Exits 0.

After N iterations (where N = number of stories), every story is passing and the loop terminates with exit 0. This is enough to test the loop, the jj integration, the iteration numbering, and the per-iteration log files.

Sample integration test

#[test]
fn loop_walks_prd_to_completion() {
    let tmp = tempdir().unwrap();
    init_jj(tmp.path());
    let run_id = seed_run(tmp.path(), &demo_prd_with_3_stories());

    let exit = run_riveter(&[
        "run", "-r", &run_id, "--agent", "mock", "--max-iterations", "10",
    ], tmp.path());

    assert_eq!(exit, 0);
    assert_eq!(jj_log_count(tmp.path()), 3);   // one commit per story
    assert!(all_stories_passing(tmp.path(), &run_id));
}

9. Fresh Project Setup

cargo new --bin crates/riveter, then add a workspace root Cargo.toml.
rust-toolchain.toml pinning a recent stable (e.g. 1.85.0).

.gitignore:

/target
/scratch/
*.tmp
**/*.log
.DS_Store

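.jjignore mirrors the above. jj itself reads .gitignore and, at the time of writing, does not honor a separate .jjignore, so the .gitignore entries are the ones that take effect.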
Dependencies (intentionally minimal):
- clap (derive) — CLI
- serde + toml — prd.toml (read/write)
- dirs — cross-platform state dir
- rand — runId random suffix
- anyhow — errors
- tracing + tracing-subscriber — logs
- tempfile, assert_cmd, predicates — dev-only, for integration tests

The iteration prompt template is embedded via include_str!("../templates/prompt.md") and rendered with str::replace — no templating engine needed.
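
A minimal sketch of that rendering step follows. The `{{...}}` placeholder names and the `render_prompt` signature are assumptions made for illustration; the proposal does not fix the template syntax. In the real crate the template argument would be the `include_str!` constant.

```rust
/// Fill the template with the PRD description and the picked story.
/// Placeholders are replaced one after another with `str::replace`.
pub fn render_prompt(
    template: &str,
    description: &str,
    story_id: u32,
    title: &str,
    criteria: &[&str],
) -> String {
    // Render acceptance criteria as a simple bullet list.
    let criteria_list = criteria
        .iter()
        .map(|c| format!("- {c}"))
        .collect::<Vec<_>>()
        .join("\n");

    template
        .replace("{{description}}", description)
        .replace("{{story_id}}", &story_id.to_string())
        .replace("{{story_title}}", title)
        .replace("{{acceptance_criteria}}", &criteria_list)
}
```

One caveat of chained `replace` calls: if a substituted value itself contains a later placeholder, that text is substituted too. For PRD content written by the skill this is unlikely to matter, but it is worth knowing before reaching for a templating engine.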

9a. README Contents

The top-level README.md must include a "Why Ralph the Riveter" section explaining the rationale, because the design only makes sense once you understand the trade. Outline:

Why Ralph the Riveter

Fresh context per story. Each iteration spawns a new agent process with no memory of prior iterations. The agent's working set is exactly one user story, the PRD description, and the codebase, never the full transcript of past attempts. This keeps the context window small and focused, which is the single biggest lever on agent quality and cost.
Persistent state lives on disk, not in-context. prd.toml tracks which stories are done — the agent reads it (plus its assigned story) at the start of every run and flips passes = true when the story's acceptance criteria are met.
jj is the review surface. Because every iteration is one jj commit on @, the user reviews Riveter's work the same way they review their own: jj log, jj diff -r <id>, jj abandon <id> to reject, jj squash/jj split to reshape. No branch dance, no PR ceremony, no merge conflicts with your in-flight work.
Pluggable agents. Codex and Claude are interchangeable — pick the model that's good at the kind of work the spec needs. Same loop, same artifacts, same review flow.

Also include:

Quick start — two phases: (1) run the prd skill in an agent session against your spec → it prints a runId; (2) riveter run -r <runId> from the project's jj repo.
Reviewing a run (jj log -r 'description(glob:"\\[RIVETER*")')
Rejecting bad iterations (jj abandon <commit>, optionally edit prd.toml to flip passes back to false)
Tuning for cost vs. quality (the --thinking/--model matrix, and a note that defaults are deliberately on the expensive end)
--agent mock usage (deterministic agent for hacking on Riveter itself)

10. Milestones

M1 — Scaffold. Workspace, .gitignore, .jjignore, clap CLI parsing, no-op loop.
M2 — riveter validate. Schema + filesystem checks; unit tests against good/bad fixtures.
M3 — Agent trait + Claude impl. Real claude invocation, output tee'd to per-iteration log files.
M4 — jj integration. Commit per iteration, in-place.
M5 — Codex impl + model/thinking flags.
M6 — Built-in mock agent + first integration test.
M7 — prd skill authored — non-interactive spec→PRD conversion + validate auto-fix loop.
M8 — Docs (README) and a small example PRD end-to-end run.

11. Open Questions

Codex flag mapping. Confirm exact codex exec flags for --thinking and --model for the target version.
Repo / workspace folder name. Working tree is currently rivet-ralph/ on disk. Rename to ralph-the-riveter/ to match the project name, or leave it?
