12 changes: 12 additions & 0 deletions .devflow/mistakes.example.json
@@ -29,6 +29,18 @@
"symptom": "Agent used configuration or API patterns from an older major version of a framework or tool.",
"correction": "Before editing framework configuration, inspect installed package versions and current primary docs; do not assume older major-version setup still applies.",
"appliesTo": ["tooling-version-drift", "framework-config"]
},
{
"id": "powershell-select-object-range-syntax",
"symptom": "Agent passed a PowerShell range expression as a string to Select-Object -Index.",
"correction": "Wrap PowerShell ranges in parentheses, for example Select-Object -Index (108..156).",
"appliesTo": ["windows-powershell", "shell-file-io"]
},
{
"id": "playwright-module-unavailable",
"symptom": "Agent tried to run Playwright before the package or workspace runtime was available.",
"correction": "Inspect the repo package manager and installed dependencies before loading Playwright; install dependencies or use the bundled runtime path when the project expects it.",
"appliesTo": ["playwright", "browser-automation"]
}
]
}
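
To illustrate how records in this shape might be consumed, the sketch below filters mistake records by their `appliesTo` tags. `selectCorrections` and `exampleMistakes` are hypothetical names and are not part of Devflow; only the field names (`id`, `correction`, `appliesTo`) come from the example file above.

```javascript
// Hypothetical helper (not Devflow's API): return the corrections whose
// appliesTo tags overlap with the tags describing the current session.
function selectCorrections(mistakes, activeTags) {
  const active = new Set(activeTags);
  return mistakes
    .filter((mistake) => (mistake.appliesTo ?? []).some((tag) => active.has(tag)))
    .map((mistake) => ({ id: mistake.id, correction: mistake.correction }));
}

// Two records copied from the example file; all other fields are omitted.
const exampleMistakes = [
  {
    id: "powershell-select-object-range-syntax",
    correction: "Wrap PowerShell ranges in parentheses, for example Select-Object -Index (108..156).",
    appliesTo: ["windows-powershell", "shell-file-io"],
  },
  {
    id: "playwright-module-unavailable",
    correction: "Inspect the repo package manager and installed dependencies before loading Playwright.",
    appliesTo: ["playwright", "browser-automation"],
  },
];

console.log(selectCorrections(exampleMistakes, ["windows-powershell"]));
```

Matching on tag overlap rather than on an exact platform string would let one record apply to several contexts, which seems to be the intent of the array-valued `appliesTo` field.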
45 changes: 45 additions & 0 deletions docs/contributing/commands.md
@@ -1004,6 +1004,51 @@ Outputs:
Use this when work happened in a browser, terminal, IDE, meeting, or agent host
that does not have a stable adapter yet.

## `devflow mistakes`

Records repeated agent mistakes in repo-local memory and detects them in
command output. This is the mistake repair loop that feeds `devflow doctor`,
start skills, and future plugin hooks without making Devflow a persistent
autonomous agent.

Examples:

```powershell
devflow mistakes add --repo C:\Projects\devflow-demo --id powershell-select-object-range-syntax --category shell-file-io-friction --symptom "Agent passed a PowerShell range expression as a string to Select-Object -Index." --correction "Wrap PowerShell ranges in parentheses, for example Select-Object -Index (108..156)." --applies-to windows-powershell --json
devflow mistakes list --repo C:\Projects\devflow-demo --json
devflow mistakes detect --repo C:\Projects\devflow-demo --platform windows-powershell --command 'Get-Content -LiteralPath docs\product-plan.md | Select-Object -Index 108..156' --stderr 'Cannot bind parameter ''Index''. Cannot convert value "108..156" to type "System.Int32".' --record --json
```

Subcommands:

- `add`: writes a maintainer-approved correction into `.devflow/mistakes.json`
- `list`: renders the current project mistake memory
- `detect`: scans command output for known failure signatures and returns
candidate corrections

Inputs:

- repository path
- mistake id, category, symptom, correction, and applies-to tags for `add`
- platform, command text, stdout, stderr, and optional exit code for `detect`
- `--record` on `detect` to upsert detected candidates into
`.devflow/mistakes.json`

Outputs:

- `mistakes_add`, `mistakes_list`, or `mistakes_detect` JSON
- normalized mistake records with occurrence counts and bounded evidence text
- local `.devflow/mistakes.json` updates only when `add` or
`detect --record` is used

Current detection signatures cover:

- PowerShell `Select-Object -Index 108..156` range syntax mistakes
- Playwright package/runtime unavailable errors

Detection records a candidate; it does not automatically edit `AGENTS.md` or
skill files. Promotion to durable instruction files should remain
confirmation-gated so project docs do not accumulate noisy one-off errors.
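
As a rough illustration of what signature-based detection could look like, the sketch below matches command output against a small table of regular expressions. This is not the implementation in this PR: `SIGNATURES`, `detectCandidates`, and both regular expressions are assumptions, written only to match the two error messages quoted in this diff.

```javascript
// Hypothetical signature table. The patterns are illustrative assumptions,
// derived from the error text used in the examples and tests of this PR.
const SIGNATURES = [
  {
    id: "powershell-select-object-range-syntax",
    // Binding error emitted when a range reaches -Index as a string.
    pattern: /Cannot convert value "\d+\.\.\d+" to type "System\.Int32"/,
    correction: "Wrap PowerShell ranges in parentheses, for example Select-Object -Index (108..156).",
  },
  {
    id: "playwright-module-unavailable",
    pattern: /Cannot find module 'playwright'/,
    correction: "Inspect the repo package manager and installed dependencies before loading Playwright.",
  },
];

// Return one candidate per signature that matches stdout or stderr.
// Nothing is written to disk here; recording would be a separate step.
function detectCandidates({ stdout = "", stderr = "" }) {
  const output = `${stdout}\n${stderr}`;
  return SIGNATURES
    .filter((signature) => signature.pattern.test(output))
    .map(({ id, correction }) => ({ id, correction }));
}

console.log(
  detectCandidates({
    stderr: "Cannot bind parameter 'Index'. Cannot convert value \"108..156\" to type \"System.Int32\".",
  }),
);
```

Keeping detection free of side effects, as in this sketch, mirrors the documented split between `detect` and `detect --record`.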

## `devflow doctor`

Inspects the local execution contract that agent hosts should respect before
13 changes: 10 additions & 3 deletions docs/product-plan.md
@@ -113,9 +113,10 @@ beginner profile should translate those concepts into plain language.
the context of my project.
9. As a beginner, I can turn vague intent into a better prompt without learning
all implementation vocabulary upfront.
10. As a maintainer, I can capture repeated agent mistakes such as shell
mismatch, Windows path handling, encoding issues, unsafe commands, and
missing setup steps, then feed those lessons into future sessions.
10. As a maintainer, I can capture and detect repeated agent mistakes such as
shell mismatch, Windows path handling, encoding issues, unsafe commands,
missing setup steps, and unavailable tools, then feed those lessons into
future sessions.

## Product Shape

@@ -152,6 +153,12 @@ Repeated-mistake memory should be layered:
- private user memory can record maintainer-specific habits, paths, and
historical failures

The repair loop should be explicit: command output or user correction becomes a
mistake candidate, confirmed candidates are stored in `.devflow/mistakes.json`,
`devflow doctor` injects the correction at session start, and repeated
candidates can be promoted to `AGENTS.md` or skill files only through a
confirmation-gated patch.
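
The loop described above can be sketched in a few lines. Everything here is hypothetical: `upsertMistake`, `promotionCandidates`, the 200-character evidence bound, and the threshold of three occurrences are illustrative assumptions, not values taken from Devflow.

```javascript
// Assumed limits, chosen only for illustration.
const MAX_EVIDENCE_CHARS = 200;
const PROMOTION_THRESHOLD = 3;

// Insert a candidate or bump the occurrence count of an existing record.
// Evidence text is truncated so the memory file stays bounded.
function upsertMistake(memory, candidate) {
  const evidenceText = (candidate.evidence ?? "").slice(0, MAX_EVIDENCE_CHARS);
  const existing = memory.find((record) => record.id === candidate.id);
  if (existing) {
    existing.occurrences += 1;
    if (evidenceText) existing.evidence.push(evidenceText);
    return existing;
  }
  const record = {
    id: candidate.id,
    correction: candidate.correction,
    occurrences: 1,
    evidence: evidenceText ? [evidenceText] : [],
  };
  memory.push(record);
  return record;
}

// A record qualifies for promotion only if it has repeated often enough
// AND a maintainer has explicitly confirmed it.
function promotionCandidates(memory, confirmedIds) {
  const confirmed = new Set(confirmedIds);
  return memory.filter(
    (record) => record.occurrences >= PROMOTION_THRESHOLD && confirmed.has(record.id),
  );
}

const memory = [];
for (let i = 0; i < 3; i += 1) {
  upsertMistake(memory, {
    id: "playwright-module-unavailable",
    correction: "Inspect installed dependencies before loading Playwright.",
    evidence: "Error: Cannot find module 'playwright'",
  });
}
console.log(promotionCandidates(memory, ["playwright-module-unavailable"]).length);
```

Requiring both a repeat count and an explicit confirmation is one way to keep one-off errors out of `AGENTS.md`; the real gate may be stricter or shaped differently.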

## Supported Agents

The product should treat coding tools as adapters, not as the product center.
3 changes: 2 additions & 1 deletion docs/roadmap.md
@@ -23,14 +23,15 @@ Build:
- `devflow status`
- `devflow finish`
- `devflow doctor`
- `devflow mistakes add/list/detect`
- `devflow prompt next`
- `devflow prompt latest`
- repo-local Codex/Claude plugin hooks for start, prompt intent, and finish
guard context
- local `.devflow/` state files
- git dirty-file capture
- gate evidence capture
- platform execution contract and repeated-mistake memory capture
- platform execution contract and repeated-mistake memory capture/detection
- repo-local Codex plugin wrappers for the start/status/doctor and finish
evidence loops
- Markdown next-session handoff output and latest prompt projection
3 changes: 3 additions & 0 deletions packages/cli/README.md
@@ -83,6 +83,9 @@ packages/cli/
- `devflow sessions attach`
- `devflow sessions list`
- `devflow sessions note`
- `devflow mistakes add`
- `devflow mistakes list`
- `devflow mistakes detect`

`devflow init` currently renders a scaffold plan by default and writes the
minimum project contract only when `--confirm` is provided. The first scaffold
73 changes: 72 additions & 1 deletion packages/cli/src/index.js
@@ -19,6 +19,8 @@ import {
} from "../../adapters/src/index.js";
import {
createFinishSummary,
createMistakeDetection,
createMistakeListSummary,
readHarnessInspect,
readHarnessHealth,
readHarnessPlan,
@@ -43,6 +45,7 @@ import {
readLatestHandoff,
readMistakeMemory,
recordFinishEvent,
recordMistakeMemory,
recordManualSessionNoteEvent,
recordReviewEvent,
recordSessionAttachedEvent,
@@ -97,6 +100,12 @@ try {
await renderFinish(args.slice(1));
} else if (command === "doctor") {
await renderDoctor(args.slice(1));
} else if (command === "mistakes" && args[1] === "add") {
await renderMistakeAdd(args.slice(2));
} else if (command === "mistakes" && args[1] === "list") {
await renderMistakeList(args.slice(2));
} else if (command === "mistakes" && args[1] === "detect") {
await renderMistakeDetect(args.slice(2));
} else if (command === "gates" && args[1] === "run") {
await renderGatesRun(args.slice(2));
} else if (command === "review" && args[1] === "record") {
@@ -409,6 +418,60 @@ async function renderDoctor(argsForCommand) {
render(summary, options.json);
}

async function renderMistakeAdd(argsForCommand) {
  const options = parseOptions(argsForCommand);
  const repoPath = options.repo ?? cwd();
  const summary = await recordMistakeMemory(repoPath, {
    id: options.id,
    category: options.category,
    scope: options.scope,
    symptom: options.symptom,
    correction: options.correction,
    appliesTo: collectRepeated(options["applies-to"] ?? options.appliesTo),
    confidence: options.confidence,
    evidence: collectRepeated(options.evidence).map((text) => ({
      kind: "user-correction",
      text,
    })),
  });

  render(summary, options.json);
}
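
The body of `collectRepeated` is not part of this diff. Judging only from how it is called above (it accepts a possibly undefined option value and its result is mapped over), a helper of that kind might look like the following sketch. `collectRepeatedSketch` is a hypothetical stand-in, and its trimming and empty-string filtering are assumptions rather than observed behavior.

```javascript
// Hypothetical stand-in for collectRepeated (the real implementation is not
// shown in this PR): normalize an option that may be absent, a single value,
// or an array of repeated values into an array of non-empty strings.
function collectRepeatedSketch(value) {
  if (value === undefined || value === null) return [];
  const values = Array.isArray(value) ? value : [value];
  return values
    .map((item) => String(item).trim())
    .filter((item) => item.length > 0);
}

console.log(collectRepeatedSketch("windows-powershell"));
console.log(collectRepeatedSketch(["playwright", " browser-automation "]));
console.log(collectRepeatedSketch(undefined));
```

Always returning an array means callers such as the `evidence` mapping above never need to special-case a missing flag.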

async function renderMistakeList(argsForCommand) {
  const options = parseOptions(argsForCommand);
  const repoPath = options.repo ?? cwd();
  const memory = await readMistakeMemory(repoPath);
  const summary = createMistakeListSummary({
    mistakes: memory.mistakes,
    warnings: memory.warnings,
  });

  render(summary, options.json);
}

async function renderMistakeDetect(argsForCommand) {
  const options = parseOptions(argsForCommand);
  const repoPath = options.repo ?? cwd();
  const detection = createMistakeDetection({
    platform: options.platform ?? defaultPlatformName(),
    command: options.command,
    stderr: options.stderr,
    stdout: options.stdout,
    exitCode: options["exit-code"],

Review comment (medium):

Command line arguments parsed from the CLI are always strings. Passing options["exit-code"] directly to createMistakeDetection results in a string type for exitCode (e.g., "1"), whereas other parts of the system (such as gate evidence) expect exitCode to be a number or null. Parsing it as an integer ensures type consistency across the JSON contracts.

    exitCode: options["exit-code"] !== undefined ? parseInt(options["exit-code"], 10) : undefined,
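
One caveat with the suggestion above: `parseInt` returns `NaN` for non-numeric input such as `--exit-code abc`, and `NaN` serializes to `null` in JSON without any warning. A slightly stricter variant is sketched below. `parseExitCode` is a hypothetical helper, and returning `null` for a missing value (rather than `undefined`, as in the suggestion) is an assumption based on the comment's "number or null" wording.

```javascript
// Hypothetical helper: convert a raw CLI option into an integer exit code,
// or null when the flag is absent or is not a plain integer.
function parseExitCode(raw) {
  if (raw === undefined || raw === null) return null;
  const text = String(raw).trim();
  // Accept only an optional minus sign followed by digits.
  return /^-?\d+$/.test(text) ? Number.parseInt(text, 10) : null;
}

console.log(parseExitCode("1"));
console.log(parseExitCode("abc"));
console.log(parseExitCode(undefined));
```

Validating with a regular expression before parsing also rejects inputs like `"1abc"`, which `parseInt` alone would accept as `1`.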

  });
  const recorded = [];

  if (options.record) {
    for (const candidate of detection.candidates) {
      const result = await recordMistakeMemory(repoPath, candidate);
      recorded.push(result.mistake);
    }
  }

  render({ ...detection, recorded }, options.json);
}

async function renderGatesRun(argsForCommand) {
const { options, positional } = parseOptionsAndPositionals(argsForCommand);
const repoPath = options.repo ?? cwd();
@@ -967,6 +1030,11 @@ function renderHelp(group) {
mcp: [
"devflow mcp stdio",
],
mistakes: [
"devflow mistakes add --id <id> --symptom <text> --correction <text> [--json]",
"devflow mistakes list [--json]",
"devflow mistakes detect --stderr <text> [--command <text>] [--record] [--json]",
],
};

if (group && groups[group]) {
@@ -998,6 +1066,7 @@ function renderHelp(group) {
" init Plan or write a .devflow project scaffold",
" health Check the project scaffold",
" doctor Inspect local shell/tooling rules",
" mistakes <command> Record and detect repeated agent mistake memory",
" status Show repo, work, session, gate, and handoff state",
" harness <command> Inspect/install/verify Codex and Claude harness files",
" mcp stdio Run the Devflow MCP stdio server",
@@ -1019,6 +1088,7 @@ function renderHelp(group) {
"Group help:",
" devflow harness --help",
" devflow mcp --help",
" devflow mistakes --help",
" devflow work --help",
" devflow prompt --help",
"",
@@ -1190,7 +1260,8 @@ function parseOptionsAndPositionals(rawArgs) {
key === "once" ||
key === "dry-run" ||
key === "check" ||
key === "repo-visible"
key === "repo-visible" ||
key === "record"
) {
options[key] = true;
continue;
121 changes: 121 additions & 0 deletions packages/cli/test/cli-mvp.test.mjs
@@ -1997,6 +1997,127 @@ test("CLI doctor renders platform and mistake memory JSON", async () => {
assert.match(parsed.recommendations[0].message, /Get-Content -LiteralPath/);
});

test("CLI mistakes add and list persist repo-local correction memory", async () => {
  const repoPath = await createTempGitRepo();

  const added = await execFileAsync("node", [
    "packages/cli/src/index.js",
    "mistakes",
    "add",
    "--repo",
    repoPath,
    "--id",
    "powershell-select-object-range-syntax",
    "--category",
    "shell-file-io-friction",
    "--symptom",
    "Agent passed a PowerShell range expression as a string to Select-Object -Index.",
    "--correction",
    "Wrap PowerShell ranges in parentheses, for example Select-Object -Index (108..156).",
    "--applies-to",
    "windows-powershell",
    "--json",
  ]);
  const addedJson = JSON.parse(added.stdout);

  assert.equal(addedJson.command, "mistakes_add");
  assert.equal(addedJson.mistake.id, "powershell-select-object-range-syntax");
  assert.equal(addedJson.mistake.occurrences, 1);

  const listed = await execFileAsync("node", [
    "packages/cli/src/index.js",
    "mistakes",
    "list",
    "--repo",
    repoPath,
    "--json",
  ]);
  const listJson = JSON.parse(listed.stdout);

  assert.equal(listJson.command, "mistakes_list");
  assert.equal(listJson.count, 1);
  assert.equal(listJson.mistakes[0].category, "shell-file-io-friction");
  assert.match(listJson.mistakes[0].correction, /Select-Object -Index \(108\.\.156\)/);

  const doctor = await execFileAsync("node", [
    "packages/cli/src/index.js",
    "doctor",
    "--repo",
    repoPath,
    "--platform",
    "windows-powershell",
    "--json",
  ]);
  const doctorJson = JSON.parse(doctor.stdout);

  assert.equal(doctorJson.memory.repeatedMistakes[0].id, "powershell-select-object-range-syntax");
  assert.match(
    doctorJson.recommendations.find((item) => item.source === "powershell-select-object-range-syntax")
      .message,
    /PowerShell ranges/,
  );
});

test("CLI mistakes detect records PowerShell and Playwright mistake candidates", async () => {
  const repoPath = await createTempGitRepo();

  const powershell = await execFileAsync("node", [
    "packages/cli/src/index.js",
    "mistakes",
    "detect",
    "--repo",
    repoPath,
    "--platform",
    "windows-powershell",
    "--command",
    "Get-Content -LiteralPath docs\\product-plan.md | Select-Object -Index 108..156",
    "--stderr",
    "Cannot bind parameter 'Index'. Cannot convert value \"108..156\" to type \"System.Int32\".",
    "--record",
    "--json",
  ]);
  const powershellJson = JSON.parse(powershell.stdout);

  assert.equal(powershellJson.command, "mistakes_detect");
  assert.equal(powershellJson.candidates[0].id, "powershell-select-object-range-syntax");
  assert.equal(powershellJson.recorded[0].id, "powershell-select-object-range-syntax");

  const playwright = await execFileAsync("node", [
    "packages/cli/src/index.js",
    "mistakes",
    "detect",
    "--repo",
    repoPath,
    "--platform",
    "windows-powershell",
    "--command",
    "node smoke.mjs",
    "--stderr",
    "Error: Cannot find module 'playwright'",
    "--record",
    "--json",
  ]);
  const playwrightJson = JSON.parse(playwright.stdout);

  assert.equal(playwrightJson.candidates[0].id, "playwright-module-unavailable");
  assert.equal(playwrightJson.recorded[0].id, "playwright-module-unavailable");

  const listed = await execFileAsync("node", [
    "packages/cli/src/index.js",
    "mistakes",
    "list",
    "--repo",
    repoPath,
    "--json",
  ]);
  const listJson = JSON.parse(listed.stdout);

  assert.deepEqual(
    listJson.mistakes.map((mistake) => mistake.id).sort(),
    ["playwright-module-unavailable", "powershell-select-object-range-syntax"],
  );
});

test("CLI sessions codex renders explicit read-only Codex discovery JSON", async () => {
const repoPath = await createTempGitRepo();
const codexHome = await mkdtemp(join(tmpdir(), "devflow-cli-codex-home-"));