A focused reimplementation of the Ralph autonomous coding loop in Rust. Slimmer than the original bash version, with pluggable agents (codex or claude), per-iteration jj commits, and in-place execution (no branch switching).
The binary is named riveter (the tool); the methodology it implements is Ralph (one focused story per fresh agent context, persistent state on disk, jj as the review surface).
- Single binary `riveter`, written in Rust.
- Pluggable agents: `codex`, `claude`, or `mock` (built-in test agent), selected via flag/config.
- Pluggable model + reasoning: `--model`, `--thinking <low|med|high>`.
- Commits via `jj` (jujutsu), not git. One `jj commit` per loop iteration after the agent finishes.
- In-place operation: never switch branches/bookmarks; work on `@` directly.
- Strict separation of concerns: a skill (run in an agent session) converts a spec into a PRD and writes a fresh run folder with a unique `runId`. The CLI knows nothing about specs; it only loads an existing run folder and executes the loop.
- PRD as TOML (`prd.toml`): human-editable, comment-friendly.
- Integration-testable: a built-in `mock` agent with hardcoded deterministic behavior, so end-to-end tests don't need an external binary or script.
- No archiving / `.last-branch` bookkeeping.
- No multi-repo / multi-worktree orchestration.
- No web UI or daemon mode.
- No retry/backoff sophistication. Agent failures stop the loop.
```
ralph-the-riveter/
├── Cargo.toml                  # workspace
├── .gitignore
├── .jjignore
├── README.md
├── rust-toolchain.toml
├── crates/
│   └── riveter/                # main binary (cli name: `riveter`)
│       ├── Cargo.toml
│       ├── src/
│       │   ├── main.rs
│       │   ├── cli.rs          # clap definitions + Agent/Model/Thinking enums
│       │   ├── agent.rs        # trait Agent + Codex/Claude/Mock impls
│       │   ├── jj.rs           # thin wrapper over `jj` CLI
│       │   ├── loop_.rs        # iteration loop + termination check
│       │   ├── run_dir.rs      # run folder layout + runId resolution
│       │   ├── prompt.rs       # iteration prompt template (include_str! + replace)
│       │   └── prd.rs          # load/parse/write prd.toml
│       └── tests/
│           └── integration.rs  # uses --agent mock + tempdir + jj
├── skills/
│   └── prd/SKILL.md            # spec -> PRD + run folder (non-interactive)
```
The CLI consumes pre-built run folders. It has no `--spec` flag and no PRD-generation logic; that's the skill's job.

```
riveter run -r <RUN_ID> [OPTIONS]   # execute the loop
riveter validate -r <RUN_ID>        # check a run folder for correctness
```

For inspecting or listing runs, use `cat`, `ls`, and `jj log`; there are no dedicated subcommands.
Checks that a run folder is well-formed and ready to execute. Designed to be invoked by skills (and CI) so authoring mistakes are caught before any agent iteration is wasted.
riveter validate -r <RUN_ID>
What it checks:
- Filesystem layout: `prd.toml` and `spec.md` exist.
- `prd.toml` schema: parseable; all required fields (`description`, `createdAt`, `stories`) present and correctly typed.
- Stories: at least one story; `id` is a positive integer, unique, sequential 1..N with no gaps; `title` ≤ 80 chars (so the commit subject stays reasonably short once the `[RIVETER(...)]` prefix is added); `acceptanceCriteria` non-empty; `passes` is a bool.
Output:
```
$ riveter validate -r task-priority-a1b2c3d4
✓ filesystem layout
✗ prd.toml
- stories[1].id: expected 2, found 5 (ids must be sequential 1..N)
- stories[2].acceptanceCriteria: empty
2 errors
```
Exit codes: 0 ok, 30 validation errors, 31 filesystem missing, 32 I/O error.
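The story checks listed above could be sketched roughly as follows. This is an illustrative, std-only sketch: the `Story` struct and `validate_stories` function are hypothetical names, and the real implementation would work on whatever type `prd.rs` deserializes. Because ids must be sequential 1..N, comparing each id with its array position also covers uniqueness, gaps, and non-positive values.

```rust
/// One story as it might look after parsing `prd.toml` (hypothetical type).
#[derive(Debug, Clone)]
pub struct Story {
    pub id: i64,
    pub title: String,
    pub acceptance_criteria: Vec<String>,
    pub passes: bool,
}

/// Returns one human-readable message per violated rule; empty means valid.
pub fn validate_stories(stories: &[Story]) -> Vec<String> {
    let mut errors = Vec::new();
    if stories.is_empty() {
        errors.push("stories: at least one story is required".to_string());
    }
    for (index, story) in stories.iter().enumerate() {
        // Ids must be exactly 1..N in array order. This single comparison
        // also rejects duplicates, gaps, zero, and negative ids.
        let expected = index as i64 + 1;
        if story.id != expected {
            errors.push(format!(
                "stories[{index}].id: expected {expected}, found {} (ids must be sequential 1..N)",
                story.id
            ));
        }
        // Count characters, not bytes, so non-ASCII titles are measured fairly.
        if story.title.chars().count() > 80 {
            errors.push(format!("stories[{index}].title: longer than 80 chars"));
        }
        if story.acceptance_criteria.is_empty() {
            errors.push(format!("stories[{index}].acceptanceCriteria: empty"));
        }
    }
    errors
}
```

A caller would presumably print each message on its own `- ` line and exit with 30 when the returned list is non-empty.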
Skill integration. The prd skill runs riveter validate -r <runId> after generating the folder, parses the human output (one error per - line), fixes each error (re-writes prd.toml), and re-runs until exit 0 — up to 3 attempts. If validation still fails, the skill surfaces the remaining errors to the user and refuses to print the "ready to run" message.
```
riveter run -r <RUN_ID> [OPTIONS]

Required:
  -r, --run <RUN_ID>               run id (or full path to a run folder)

Options:
  -a, --agent <codex|claude|mock>  [default: codex]
  -m, --model <STRING>             [default: gpt-5]
  -t, --thinking <low|med|high>    [default: high]
  -n, --max-iterations <N>         [default: 10]
  -v, --verbose
  -h, --help
  -V, --version
```
<RUN_ID> is resolved as: (1) absolute/relative path if it contains /, (2) <state-dir>/runs/<RUN_ID> otherwise. Errors out if the folder is missing or prd.toml doesn't exist.
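A minimal sketch of that resolution rule, assuming the state dir has already been determined by the caller. The function names `resolve_run_dir` and `check_run_dir` are hypothetical and not taken from the actual `run_dir.rs`.

```rust
use std::path::{Path, PathBuf};

/// Rule (1): anything containing `/` is treated as a path.
/// Rule (2): otherwise the id is looked up under `<state-dir>/runs/`.
pub fn resolve_run_dir(run_id: &str, state_dir: &Path) -> PathBuf {
    if run_id.contains('/') {
        PathBuf::from(run_id)
    } else {
        state_dir.join("runs").join(run_id)
    }
}

/// Errors out if the folder is missing or has no `prd.toml`.
pub fn check_run_dir(run_dir: &Path) -> Result<(), String> {
    if !run_dir.is_dir() {
        return Err(format!("run folder not found: {}", run_dir.display()));
    }
    if !run_dir.join("prd.toml").is_file() {
        return Err(format!("prd.toml not found in {}", run_dir.display()));
    }
    Ok(())
}
```

Keeping the pure path computation separate from the filesystem check makes the first half trivially unit-testable without a temp dir.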
All flags are also readable from env (RIVETER_AGENT, RIVETER_MODEL, RIVETER_THINKING, RIVETER_MAX_ITERATIONS). Env-only knobs: RIVETER_STATE_DIR (override the state dir, used by tests) and RIVETER_AGENT_BIN (override the agent executable, used by tests for codex/claude shims).
```
# 1. In an agent session, invoke the `prd` skill with your spec.
#    The skill generates the PRD, creates the run folder, validates it,
#    and tells you the runId.

# 2. (Optional) inspect / edit the generated PRD.
$EDITOR ~/.local/state/riveter/runs/task-priority-a1b2c3d4/prd.toml

# 3. Run the loop in-place against the current jj repo.
riveter run -r task-priority-a1b2c3d4 -a codex -m gpt-5 -t high

# 4. Resume later (same command; skips stories already passing).
riveter run -r task-priority-a1b2c3d4
```

```rust
use std::path::Path;

use anyhow::Result; // `anyhow` is already in the dependency list

// `Thinking` is the enum defined in cli.rs.
pub trait Agent {
    fn name(&self) -> &'static str;
    fn run(&self, prompt: &str, opts: &RunOpts) -> Result<AgentOutput>;
}

pub struct RunOpts<'a> {
    pub model: Option<&'a str>,
    pub thinking: Thinking, // Low | Med | High
    pub cwd: &'a Path,
}

pub struct AgentOutput {
    pub stdout: String,
    pub stderr: String,
    pub exit_code: i32,
}
```

The Codex impl invokes `codex exec --model <m> --reasoning <thinking> …` (exact flags pinned in code with comments).
Claude impl invokes claude --dangerously-skip-permissions --print --model <m> ….
Both stream output live to stderr (via tee-style passthrough) while capturing stdout for logging.
The Mock impl is hardcoded in the binary for integration tests. On each call it:

1. Reads `prd.toml` from the run folder.
2. Picks the first story with `passes = false`.
3. Touches `riveter-mock-<storyId>.txt` in `cwd` (so `jj st` has changes).
4. Rewrites `prd.toml`, setting that story's `passes = true`.
5. Exits 0.

After enough invocations the mock walks the entire PRD and the loop terminates naturally (all `passes = true`). No env vars, no scripts.
Per iteration:

1. Load PRD. Read `<run-dir>/prd.toml`.
2. Terminate? If every story already has `passes = true`, exit 0 (`status=ok`).
3. Pick story. Riveter selects the first story (array order) with `passes = false` and injects it into the prompt. The agent sees only one story per iteration, which keeps its context tiny. The prompt instructs the agent to set that story's `passes = true` in `prd.toml` when its acceptance criteria are met.
4. Run agent. Pipe the rendered prompt to the selected agent; tee live output to the console (line-prefixed `│ `) and to `iterations/NNN/stdout.log`.
5. Commit. If `jj st` shows changes, Riveter runs `jj describe -m <msg>` then `jj new`. No bookmark moves, no branch switching. Message format: `[RIVETER(<runId>,#<storyId>,<model>)] chore: <story title>`. Example: `[RIVETER(task-priority-a1b2c3d4,#1,gpt-5)] chore: Add priority field to tasks table`. The type is always `chore`, with no parsing of agent output and no fallback. Reviewers can `jj describe` to rewrite the subject if they prefer a different conventional-commit type. Grep with `jj log -r 'description(glob:"\\[RIVETER*")'`.
6. Empty working copy = failure. If `jj st` is clean after the agent exits 0, treat it as `clean_workcopy` and stop. The agent should have produced changes; if it didn't, something is wrong and we don't want to silently re-prompt.
7. Loop back to step 1, up to `--max-iterations`.
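The commit step can be sketched as below. The message format and the `jj describe -m <msg>` / `jj new` sequence come from the text above; the function names `commit_message` and `jj_commit` are hypothetical, and the real `jj.rs` wrapper may differ in how it reports errors.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds the fixed-format subject: always `chore`, never parsed from agent output.
pub fn commit_message(run_id: &str, story_id: u32, model: &str, title: &str) -> String {
    format!("[RIVETER({run_id},#{story_id},{model})] chore: {title}")
}

/// Describes the working-copy commit, then starts a new empty one on top.
/// Any failing `jj` call is surfaced as an error (exit 13 in the CLI).
pub fn jj_commit(cwd: &Path, message: &str) -> io::Result<()> {
    for args in [vec!["describe", "-m", message], vec!["new"]] {
        let status = Command::new("jj").args(&args).current_dir(cwd).status()?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("jj {} failed: {status}", args[0]),
            ));
        }
    }
    Ok(())
}
```

Passing the message as a separate argument (rather than through a shell string) avoids quoting problems when a story title contains quotes or brackets.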
After --max-iterations without all stories passing → exit 20.
Termination is driven solely by prd.toml state: when every story has passes = true, the loop exits ok. There is no special "COMPLETE" signal in agent output. Story selection happens in Riveter, not the agent.
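To make the control flow concrete, here is a simplified, in-memory sketch of the loop. It is not the actual `loop_.rs`: the `Story` and `LoopResult` types are hypothetical, and the `run_agent` closure stands in for "render prompt, run agent, reload `prd.toml`, check `jj st`", returning the agent exit code and whether the working copy changed.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Story {
    pub id: u32,
    pub title: String,
    pub passes: bool,
}

#[derive(Debug, PartialEq)]
pub enum LoopResult {
    AllPassing { iterations: u32 }, // exit 0
    AgentFailed,                    // exit 10
    CleanWorkcopy,                  // exit 12
    MaxIterations,                  // exit 20
}

pub fn run_loop<F>(stories: &mut [Story], max_iterations: u32, mut run_agent: F) -> LoopResult
where
    F: FnMut(&mut Story) -> (i32, bool),
{
    let mut iterations = 0;
    loop {
        // Steps 1-3: termination and story selection depend only on PRD state.
        let index = match stories.iter().position(|s| !s.passes) {
            Some(index) => index,
            None => return LoopResult::AllPassing { iterations },
        };
        if iterations == max_iterations {
            return LoopResult::MaxIterations;
        }
        iterations += 1;
        // Step 4: exactly one story is handed to the agent.
        let (exit_code, workcopy_changed) = run_agent(&mut stories[index]);
        if exit_code != 0 {
            return LoopResult::AgentFailed;
        }
        // Step 6: exit 0 with no changes is treated as a failure.
        if !workcopy_changed {
            return LoopResult::CleanWorkcopy;
        }
        // Step 5 (the jj commit) would happen here before looping back.
    }
}
```

Note that an agent which produces changes but never flips `passes` is simply re-run on the same story until `--max-iterations` is hit.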
No dedicated resume logic. If a run is interrupted mid-iteration (Ctrl-C, crash), the partial commit and partially updated `prd.toml` are left on disk. Re-running `riveter run -r <id>` does not auto-clean: it re-reads `prd.toml` as-is and proceeds from the first `passes = false` story. For a clean retry, the user should `jj abandon <commit>` and edit `prd.toml` by hand.
A single skill, plain markdown so any agent (codex/claude/amp) can load it. The skill is the spec→PRD pipeline. The CLI never sees a spec.
Triggered by "create a prd", "spec out", "plan this feature", "convert this spec".
The skill is non-interactive: it does not ask the user clarifying questions. It converts whatever spec it is given into a structured PRD. The user can always edit prd.toml afterwards.
Steps:
1. Take the user's spec (free-form description or full PRD markdown).
2. Compute a `runId` = `<kebab-of-feature-title>-<random-8-alphanumeric>` (lowercase `[a-z0-9-]+`, ≤64 chars). The random suffix means re-running the skill on the same spec creates a new run rather than colliding.
3. Create the run folder at `<state-dir>/runs/<runId>/`. State dir convention: `$XDG_STATE_HOME/riveter` (Linux), `~/Library/Application Support/riveter` (macOS), `%LOCALAPPDATA%\riveter` (Windows).
4. Write:
   - `spec.md`: the user's original input, verbatim
   - `prd.toml`: the structured PRD Riveter consumes (schema in §7b)
5. Validate. Run `riveter validate -r <runId>`. If errors are reported, patch the offending files (typically `prd.toml`) and re-validate. Retry up to 3 times. If validation still fails, show the errors to the user and stop; do not print the "ready to run" message.
6. Print to the user (only after validation passes):

   Created run `task-priority-a1b2c3d4` at `~/.local/state/riveter/runs/task-priority-a1b2c3d4`. Review `prd.toml`, then run: `riveter run -r task-priority-a1b2c3d4`
The skill is markdown executed by an LLM, so it's free to compute the runId, call mkdir, and write TOML directly. It does not need a Rust helper. The CLI's only contract with skills is: read prd.toml from a run folder.
The skill handles both free-form specs and pre-written PRDs uniformly. If the input already contains ## User Stories, the skill preserves them; otherwise it generates them from the description.
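Although the skill computes the `runId` itself, the rule is easy to pin down in code. The following is a sketch under stated assumptions: `kebab` and `make_run_id` are hypothetical names, and the random 8-character suffix is passed in by the caller (for example generated with `rand`) so the function stays deterministic.

```rust
/// Lowercases ASCII letters and digits; every other run of characters
/// becomes a single hyphen, with none at the start or end.
pub fn kebab(title: &str) -> String {
    let mut out = String::new();
    for ch in title.chars() {
        if ch.is_ascii_alphanumeric() {
            out.push(ch.to_ascii_lowercase());
        } else if !out.is_empty() && !out.ends_with('-') {
            out.push('-');
        }
    }
    out.trim_end_matches('-').to_string()
}

/// Joins slug and suffix, trimming the slug so the whole id is at most 64 chars.
pub fn make_run_id(title: &str, suffix: &str) -> String {
    let budget = 64usize.saturating_sub(suffix.len() + 1); // +1 for the hyphen
    let mut slug = kebab(title);
    slug.truncate(budget); // safe: the slug is pure ASCII
    let slug = slug.trim_end_matches('-');
    if slug.is_empty() {
        suffix.to_string()
    } else {
        format!("{slug}-{suffix}")
    }
}
```

For example, `make_run_id("Task Priority", "a1b2c3d4")` yields `task-priority-a1b2c3d4`, matching the id used throughout this document.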
```toml
description = "Add high/medium/low priority to tasks (default medium); show colored badges; filter by priority."
createdAt = "2026-05-24T11:42:07Z"

[[stories]]
id = 1
title = "Add priority field to tasks table"
passes = false
acceptanceCriteria = [
  "Add priority column: 'high' | 'medium' | 'low' (default 'medium')",
  "Generate and run migration successfully",
  "Typecheck passes",
]

[[stories]]
id = 2
title = "Display priority badge on task cards"
passes = false
acceptanceCriteria = [
  "Each task card shows colored badge (red=high, yellow=medium, gray=low)",
  "Typecheck passes",
]
```

`description` is the top-level context the agent sees on every iteration (alongside the picked story); it tells the agent what the project is and what it's building. `createdAt` is recorded for traceability. The folder name is the `runId`, so no `runId` field is needed inside the file.
- Story execution order is the array order. `id` is a stable integer (1-indexed, sequential, no gaps).
- The only per-story state Riveter mutates is `passes`. Reviewers can flip `passes = false` to re-run a story.
- TOML was chosen over JSON because comments survive round-trips, multiline arrays read well for criteria, and humans actually edit this file between iterations.
The run folder is the run identity. It lives outside the project under test so agent transcripts never pollute the working repo.
```
$XDG_STATE_HOME/riveter/runs/<runId>/
├── spec.md                # snapshot of the original spec input (verbatim)
├── prd.toml               # structured PRD, the single source of truth Riveter reads
└── iterations/
    ├── 001/
    │   ├── prompt.txt     # exact bytes piped to the agent
    │   ├── stdout.log
    │   ├── stderr.log
    │   └── exit.txt       # agent exit code
    ├── 002/
    └── ...
```
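- Default state location: `dirs::state_dir()`. Override with `RIVETER_STATE_DIR`. Note that `dirs::state_dir()` returns `None` on macOS and Windows, so the macOS and Windows locations in the state dir convention need a fallback such as `dirs::data_local_dir()`.
- `riveter run` prints the run dir path on stderr first so users can `tail -f iterations/.../stdout.log` immediately.
- Iteration dirs are append-only across re-runs (numbering continues `003`, `004`, …); they never overwrite a prior iteration.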
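The append-only numbering could be implemented along these lines. This is a sketch, not the actual `run_dir.rs`; `next_iteration_dir` is a hypothetical name. It scans `iterations/` for the highest numeric directory name and creates the next one, zero-padded to three digits.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Creates and returns `<run-dir>/iterations/NNN` for the next free NNN.
pub fn next_iteration_dir(run_dir: &Path) -> io::Result<PathBuf> {
    let iterations = run_dir.join("iterations");
    fs::create_dir_all(&iterations)?;

    // Find the highest existing numeric entry; non-numeric names are ignored.
    let mut highest = 0u32;
    for entry in fs::read_dir(&iterations)? {
        let name = entry?.file_name();
        if let Some(n) = name.to_str().and_then(|s| s.parse::<u32>().ok()) {
            highest = highest.max(n);
        }
    }

    let next = iterations.join(format!("{:03}", highest + 1));
    // `create_dir` (not `create_dir_all`) fails if the directory already
    // exists, so a prior iteration is never silently reused.
    fs::create_dir(&next)?;
    Ok(next)
}
```

Taking the maximum rather than counting entries keeps numbering monotonic even if a user deletes an intermediate iteration folder.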
To list runs: `ls $XDG_STATE_HOME/riveter/runs/`.
To see PRD progress: `cat $XDG_STATE_HOME/riveter/runs/<runId>/prd.toml` (grep for `passes =`).
To find the most recent run: `ls -t $XDG_STATE_HOME/riveter/runs/ | head -1`.
Default (-v adds debug lines, -vv adds every shell command):
```
riveter 0.1.0
run: ~/.local/state/riveter/runs/task-priority-a1b2c3d4
agent: codex model: gpt-5 thinking: high
cwd: /home/me/proj (jj repo @ kwptlqqv)
prd: 4 stories, 1 passing, 3 remaining
============================================================
iteration 3/10 · #2 "Display priority badge on task cards"
============================================================
[agent] streaming to iterations/003/stdout.log ...
│ Reading prd.toml...
│ Implementing priority badge component...
│ (live agent output prefixed with "│ ")
[agent] exit 0 in 47.2s
[jj] 3 files changed -> commit zptlqqvr
[RIVETER(task-priority-a1b2c3d4,#2,gpt-5)] chore: Display priority badge on task cards
[prd] story #2 marked passes=true (2/4)
============================================================
iteration 4/10 · #3 "Add priority selector to task edit"
============================================================
...
[done] all stories passing after 4 iterations (3m12s)
run dir: ~/.local/state/riveter/runs/task-priority-a1b2c3d4
review: jj log -r 'description(glob:"\\[RIVETER*")'
```
Live agent output is line-prefixed (│ ). With --quiet, only iteration headers, errors, and the final summary are printed; full transcripts always land in the run dir.
| Trigger | Exit |
|---|---|
| All stories `passes = true` | 0 |
| Agent exits non-zero | 10 |
| Agent exits 0 but `jj st` is clean | 12 |
| Any `jj` subcommand fails | 13 |
| `prd.toml` missing/invalid after iteration | 14 |
| Hit `--max-iterations` with stories still pending | 20 |
| SIGINT/SIGTERM | 130 |
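One way to keep these codes in a single place is an enum with an explicit mapping. The `Outcome` type and its variant names below are hypothetical; only the numeric values come from the table.

```rust
/// Terminal states of `riveter run` (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    AllStoriesPassing,
    AgentFailed,
    CleanWorkcopy,
    JjFailed,
    PrdInvalid,
    MaxIterations,
    Interrupted,
}

impl Outcome {
    /// Process exit code for each outcome, as listed in the table.
    pub fn exit_code(self) -> i32 {
        match self {
            Outcome::AllStoriesPassing => 0,
            Outcome::AgentFailed => 10,
            Outcome::CleanWorkcopy => 12,
            Outcome::JjFailed => 13,
            Outcome::PrdInvalid => 14,
            Outcome::MaxIterations => 20,
            Outcome::Interrupted => 130,
        }
    }
}
```

Because the `match` is exhaustive, adding a new outcome later forces a decision about its exit code at compile time.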
Rules:
- No silent rollback of `jj` commits. Bad commits are left in place; the user can `jj abandon <id>` to discard them.
- No resume cleanup. Re-running `riveter run -r <runId>` does not clean up partial work from a prior interrupted run; it just re-reads `prd.toml` and proceeds. Iteration numbering continues from the highest existing `NNN`.
- Signal handling. On SIGINT during an agent call, wait up to 10s for graceful shutdown, then SIGKILL the process group; exit 130.
The mock agent is selected with `--agent mock`. Its behavior is hardcoded in the binary (no env vars, no scripts, no separate crate). On each invocation it:

1. Reads `prd.toml` from the run folder.
2. Picks the first story with `passes = false`.
3. Touches `riveter-mock-<storyId>.txt` in the working copy (so `jj st` sees changes).
4. Rewrites `prd.toml` to set that story's `passes = true`.
5. Exits 0.
After N iterations (where N = number of stories), every story is passing and the loop terminates with exit 0. This is enough to test the loop, the jj integration, the iteration numbering, and the per-iteration log files.
```rust
use tempfile::tempdir;

// `init_jj`, `seed_run`, `demo_prd_with_3_stories`, `run_riveter`,
// `jj_log_count`, and `all_stories_passing` are test helpers assumed to
// live alongside this test.
#[test]
fn loop_walks_prd_to_completion() {
    let tmp = tempdir().unwrap();
    init_jj(tmp.path());
    let run_id = seed_run(tmp.path(), &demo_prd_with_3_stories());
    let exit = run_riveter(&[
        "run", "-r", &run_id, "--agent", "mock", "--max-iterations", "10",
    ], tmp.path());
    assert_eq!(exit, 0);
    assert_eq!(jj_log_count(tmp.path()), 3); // one commit per story
    assert!(all_stories_passing(tmp.path(), &run_id));
}
```

- `cargo new --bin crates/riveter`, then add a workspace root `Cargo.toml`.
- `rust-toolchain.toml` pinning a recent stable (e.g. `1.85.0`).
- `.gitignore`: `/target`, `/scratch/`, `*.tmp`, `**/*.log`, `.DS_Store`.
- `.jjignore` mirrors the above for `jj`.
- Dependencies (intentionally minimal):
  - `clap` (derive): CLI
  - `serde` + `toml`: `prd.toml` (read/write)
  - `dirs`: cross-platform state dir
  - `rand`: runId random suffix
  - `anyhow`: errors
  - `tracing` + `tracing-subscriber`: logs
  - `tempfile`, `assert_cmd`, `predicates`: dev-only, for integration tests
The iteration prompt template is embedded via include_str!("../templates/prompt.md") and rendered with str::replace — no templating engine needed.
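A minimal sketch of that rendering step follows. The placeholder names (`{{description}}`, `{{story_id}}`, `{{story_title}}`, `{{acceptance_criteria}}`) and the `render_prompt` signature are assumptions for illustration; the actual template may use different markers. In the real binary the `template` argument would come from `include_str!`.

```rust
/// Fills a prompt template using plain `str::replace`, one placeholder at a time.
pub fn render_prompt(
    template: &str,
    description: &str,
    story_id: u32,
    story_title: &str,
    acceptance_criteria: &[&str],
) -> String {
    // Render the criteria as a markdown bullet list.
    let bullets = acceptance_criteria
        .iter()
        .map(|criterion| format!("- {criterion}"))
        .collect::<Vec<_>>()
        .join("\n");

    template
        .replace("{{description}}", description)
        .replace("{{story_id}}", &story_id.to_string())
        .replace("{{story_title}}", story_title)
        .replace("{{acceptance_criteria}}", &bullets)
}
```

One caveat of chained `replace` calls: a placeholder-like string inside an earlier substituted value (for example a description containing `{{story_id}}`) would itself be replaced by a later call. For PRD text this is unlikely to matter, but it is the trade-off for skipping a templating engine.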
The top-level README.md must include a "Why Ralph the Riveter" section explaining the rationale, because the design only makes sense once you understand the trade. Outline:
- Fresh context per story. Each iteration spawns a new agent process with no memory of prior iterations. The agent's working set is exactly one user story + the codebase + the progress log, never the full transcript of past attempts. This keeps the context window small and focused, which is the single biggest lever on agent quality and cost.
- Persistent state lives on disk, not in-context. `prd.toml` tracks which stories are done; the agent reads it (plus its assigned story) at the start of every run and flips `passes = true` when the story's acceptance criteria are met.
- jj is the review surface. Because every iteration is one `jj` commit on `@`, the user reviews Riveter's work the same way they review their own: `jj log`, `jj diff -r <id>`, `jj abandon <id>` to reject, `jj squash` / `jj split` to reshape. No branch dance, no PR ceremony, no merge conflicts with your in-flight work.
- Pluggable agents. Codex and Claude are interchangeable; pick the model that's good at the kind of work the spec needs. Same loop, same artifacts, same review flow.
Also include:
- Quick start in two phases: (1) run the `prd` skill in an agent session against your spec, which prints a `runId`; (2) `riveter run -r <runId>` from the project's jj repo.
- Reviewing a run (`jj log -r 'description(glob:"\\[RIVETER*")'`)
- Rejecting bad iterations (`jj abandon <commit>`, optionally edit `prd.toml` to flip `passes` back to `false`)
- Tuning for cost vs. quality (the `--thinking` / `--model` matrix, and a note that defaults are deliberately on the expensive end)
- `--agent mock` usage (deterministic agent for hacking on Riveter itself)
- M1: Scaffold. Workspace, `.gitignore`, `.jjignore`, `clap` CLI parsing, no-op loop.
- M2: `riveter validate`. Schema + filesystem checks; unit tests against good/bad fixtures.
- M3: Agent trait + Claude impl. Real `claude` invocation, output tee'd to per-iteration log files.
- M4: `jj` integration. Commit per iteration, in-place.
- M5: Codex impl + model/thinking flags.
- M6: Built-in `mock` agent + first integration test.
- M7: `prd` skill authored. Non-interactive spec→PRD conversion + `validate` auto-fix loop.
- M8: Docs (README) and a small example PRD end-to-end run.
- Codex flag mapping. Confirm exact `codex exec` flags for `--thinking` and `--model` for the target version.
- Repo / workspace folder name. Working tree is currently `rivet-ralph/` on disk. Rename to `ralph-the-riveter/` to match the project name, or leave it?