Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 65 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
# Splus — agent instructions

Splus makes the coding agent in your editor a disciplined, precision-first code
reviewer: a deterministic Rust engine (the grounding) + a thin TS layer (MCP
server, memory) + the review protocol shipped as skills. 100% local.

## Layout

```
crates/splus-engine/ # deterministic engine (Rust) — the source of truth, zero inference
packages/
shared/ # canonical Finding model (TS ↔ Rust serde lockstep) + runEngine/inspect
suppression/ # per-repo memory — suppress (dismiss) + reinforce (accept)
triage/ # headless review pipeline — BENCH-ONLY, the MCP path never calls it
mcp/ # the local stdio MCP server — the one and only usage path
skills/ # the review protocol (review, prefs) — installed per agent by install.sh
bench/ # run.mjs (regression gate) + martian/ (competitive benchmark)
install.sh # curl|sh installer: binaries → ~/.splus, wires MCP + skills into agents
```

## Build & test

```sh
cargo build --release # engine → target/release/splus-engine
cargo test --locked # engine tests
pnpm install
pnpm -r build && pnpm -r typecheck
pnpm -r test # unit tests (shared + suppression + triage)
pnpm build:release # bundle MCP server → dist-release/mcp.cjs
node bench/run.mjs # engine regression gate — MUST stay green
```

## Verification — non-negotiable

**Always run `greenrun` after code changes** — it executes the repo's GitHub
Actions locally and must PASS before work is called done or pushed:

```sh
greenrun --plain # ~/.greenrun/bin/greenrun if not on PATH
```

Exit 0 = passed; treat anything else as a failure to fix, and never describe a
partial run as green.

## Conventions

- The Rust model (`crates/splus-engine/src/model.rs`) and the TS model
(`packages/shared/src/index.ts`) stay in lockstep — change them together.
- Versions bump in lockstep across ALL `package.json` files + `Cargo.toml`
(then `cargo build` to refresh `Cargo.lock`), with a `CHANGELOG.md` entry.
- The MCP server is agent-led by design: it grounds and directs, the session
agent reasons. Never wire `@splus/triage` into it — that pipeline exists only
so the benchmark can measure the protocol headlessly.
- The protocol lives in `skills/` — a directive change in
`packages/mcp/src/index.ts` (`discoveryDirective`) must be mirrored in
`skills/review/` and vice versa.
- `SPLUS.md` (repo root) is the review contract; Splus reviews itself with it.
- The engine is zero-inference and deterministic; anything nondeterministic in a
collector or analysis pass is a bug.

## Release

Tag `vX.Y.Z` and push — `.github/workflows/release.yml` cross-compiles the
engine (macOS/Linux, arm64/x64), bundles `mcp.cjs`, packages `skills/`, and
publishes tarballs + `SHA256SUMS` that `install.sh` consumes.
13 changes: 13 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,19 @@ this project uses [semantic versioning](https://semver.org) (pre-1.0: minor vers

## [Unreleased]

### Added
- `AGENTS.md` (canonical instructions for coding agents working on this repo —
layout, build/test, the greenrun verification gate, conventions, release) and
`CLAUDE.md` (imports it).

### Changed
- `SPLUS.md` contract: every change set must pass `greenrun` (full CI, locally)
before it ships — reviews flag work that skipped it.
- Docs refreshed for skills-first delivery: ARCHITECTURE diagram drops the stale
`mcp → triage` edge (bench-only) and adds `skills/`; README/CONTRIBUTING cover
skill installation, `pnpm -r test`, and the greenrun gate; TOOLS.md documents
the CHANGED SYMBOLS block and the contract-trace stage.

## [0.9.2] — 2026-06-05

### Changed
Expand Down
1 change: 1 addition & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
@AGENTS.md
16 changes: 10 additions & 6 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,9 @@ stand up. Start with **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** for the fu
- `crates/splus-engine/` — the deterministic engine (Rust). The source of truth for findings.
- `packages/shared/` — the canonical `Finding`/`Report` model (TS, mirrors the Rust serde model) + `runEngine`, which shells out to the engine binary.
- `packages/suppression/` — per-repo memory: suppress (`dismiss`) + reinforce (`accept`).
- `packages/triage/` — the optional LLM layer: the multi-pass review (detect→…→verify), downstream of the engine.
- `packages/triage/` — the headless review pipeline, **bench-only**: it exists so the Martian benchmark can measure the protocol without a human agent. The MCP path never calls it.
- `packages/mcp/` — the local stdio MCP server your agent connects to. Tools: see [docs/TOOLS.md](docs/TOOLS.md).
- `skills/` — the review protocol as agent skills (`review`, `prefs`); `install.sh` installs them into Claude Code / Codex / OpenCode. A directive change in `packages/mcp` must be mirrored here, and vice versa.
- `bench/` — `run.mjs` (the regression gate) + `martian/` (the competitive benchmark).

The splus.sh marketing site lives in its own repo, [kiwi-init/splus-lp](https://github.com/kiwi-init/splus-lp).
Expand All @@ -23,8 +24,9 @@ cargo test # engine tests
pnpm install
pnpm -r build # build the TS packages
pnpm -r typecheck
node --test packages/suppression/dist/*.test.js packages/triage/dist/*.test.js # unit tests
pnpm -r test # unit tests (shared + suppression + triage)
node bench/run.mjs # the regression gate — MUST stay green
greenrun --plain # the full CI, locally — must PASS before pushing
```

Run the freshly built engine directly:
Expand Down Expand Up @@ -57,13 +59,15 @@ big orchestration changes so quality is measured, not asserted.
Tag a version and push:

```sh
git tag v0.4.1 && git push --tags
git tag v0.9.2 && git push --tags
```

`.github/workflows/release.yml` cross-compiles the engine for macOS/Linux (arm64 + x64),
bundles the MCP server into a single `.cjs` file (`scripts/build-release.mjs`), and
publishes a GitHub Release with per-platform tarballs + `SHA256SUMS`. `install.sh` pulls these
from the stable `releases/latest/download/splus-<os>-<arch>.tar.gz` URL.
bundles the MCP server into a single `.cjs` file (`scripts/build-release.mjs`), packages
`skills/`, and publishes a GitHub Release with per-platform tarballs + `SHA256SUMS`.
`install.sh` pulls these from the stable `releases/latest/download/splus-<os>-<arch>.tar.gz`
URL. Versions bump in lockstep across all `package.json` files + `Cargo.toml`, with a
`CHANGELOG.md` entry.

## Principles

Expand Down
11 changes: 9 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -110,9 +110,13 @@ Drop a `SPLUS.md` at the repo root (layered over your personal `~/.splus/SPLUS.m

### Skills

The `skills/` bundle drives the agent-led flow: `review` (fans out **fresh, unbiased sub-agents** per
The `skills/` bundle IS the review protocol: `review` (fans out **fresh, unbiased sub-agents** per
unit — finder ≠ verifier — and degrades to a sequential pass where sub-agents aren't available) and
`prefs` (author `SPLUS.md`).
`prefs` (author `SPLUS.md`). The installer puts them directly into every agent it finds — Claude Code
(`~/.claude/skills/splus-review`, `splus-prefs`), Codex (`/splus-review`, `/splus-prefs` prompts),
OpenCode (`/splus-review`, `/splus-prefs` commands) — with the canonical copy at `~/.splus/skills`,
refreshed on every `splus update`. The protocol is loaded explicitly, never inferred from tool
descriptions.

**Full reference: [`docs/TOOLS.md`](docs/TOOLS.md)** — every tool, parameter, and return shape.

Expand Down Expand Up @@ -190,6 +194,7 @@ pulls from. See [`CONTRIBUTING.md`](CONTRIBUTING.md).
- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** — how the engine + review protocol work (with diagrams).
- **[docs/TOOLS.md](docs/TOOLS.md)** — the MCP tools your agent calls (every param + return).
- **[CONTRIBUTING.md](CONTRIBUTING.md)** — build, test, and the release process.
- **[AGENTS.md](AGENTS.md)** — working on this repo with a coding agent (build, verify, conventions).
- **[bench/martian/](bench/martian/)** — score Splus on the independent Martian Code Review Bench.

## Repo layout
Expand All @@ -201,8 +206,10 @@ packages/
suppression/ # per-repo memory — suppress (dismiss) + reinforce (accept)
triage/ # benchmark harness — runs the protocol headlessly to measure it (not a usage path)
mcp/ # the local MCP server your agent talks to — the one and only way to use Splus
skills/ # the review protocol as skills — installed into your agents by install.sh
bench/ # regression gate (run.mjs) + the Martian benchmark adapter (martian/)
docs/ # ARCHITECTURE.md · TOOLS.md
AGENTS.md # instructions for coding agents working ON this repo (CLAUDE.md imports it)
install.sh # the one-line installer
```

Expand Down
4 changes: 3 additions & 1 deletion SPLUS.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# splus.md — how the Splus repo wants to be reviewed
# SPLUS.md — how the Splus repo wants to be reviewed

Splus reviews itself. This contract is read first on every review.

Expand All @@ -13,6 +13,8 @@ Splus reviews itself. This contract is read first on every review.
- Honest confidence is mandatory: a blast radius or finding must never be presented as
more certain than its resolution method warrants.
- Tests build fixtures in-memory or under a tempdir; they are not production paths.
- Every change set runs `greenrun` (the full CI, locally) before it ships. Flag any work
presented as done without a passing greenrun.

## skip
- skip: dist-release/**
Expand Down
10 changes: 7 additions & 3 deletions docs/ARCHITECTURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,16 +8,19 @@ Nothing leaves your machine.
```mermaid
flowchart TB
agent["🤖 Your coding agent<br/>Claude Code · Codex · OpenCode"]
skills["skills/<br/><i>the review protocol (installed per agent)</i>"]
mcp["packages/mcp<br/><i>local MCP server (stdio)</i>"]
engine["crates/splus-engine<br/><b>deterministic floor</b> (Rust)"]
triage["packages/triage<br/><i>multi-pass LLM review</i>"]
bench["bench/martian<br/><i>benchmark</i>"]
triage["packages/triage<br/><i>headless pipeline (bench-only)</i>"]
supp["packages/suppression<br/><i>per-repo memory</i>"]
shared["packages/shared<br/><i>Finding model + runEngine</i>"]

skills -.->|drives| agent
agent <-->|MCP tools| mcp
mcp -->|grounds with| engine
mcp -->|optional · key or claude -p| triage
mcp -->|dismiss / accept| supp
bench -->|measures| triage
engine --> shared
triage --> shared
shared -.->|spawns the binary| engine
Expand All @@ -39,7 +42,8 @@ model than the one you already run — it makes that model disciplined.
| `packages/shared` | TS | The canonical `Finding` / `Report` model (mirrors the Rust serde model) + `runEngine`, which shells out to the binary and validates its JSON. |
| `packages/suppression` | TS | Per-repo learned memory: suppress what you `dismiss`, reinforce what you `accept`. The compounding moat. |
| `packages/triage` | TS | The **benchmark harness** — runs the review protocol headlessly (key or `claude -p`) so the Martian bench can measure it without a human agent. Strictly downstream of the engine. **Not a usage path** — products only ship the MCP flow. |
| `packages/mcp` | TS | The local stdio MCP server your agent connects to — **the one and only way to use Splus**. Wires the engine + suppression together and exposes the tools. |
| `packages/mcp` | TS | The local stdio MCP server your agent connects to — **the one and only way to use Splus**. Wires the engine + suppression together and exposes the tools. It never calls `packages/triage`. |
| `skills/` | md | The review protocol as first-class skills. `install.sh` installs them into every detected agent (Claude Code skills, Codex prompts, OpenCode commands; canonical copy at `~/.splus/skills`) so the protocol doesn't depend on MCP tool descriptions being read. |

## The deterministic floor (the engine)

Expand Down
11 changes: 8 additions & 3 deletions docs/TOOLS.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,9 +50,14 @@ fix, and cross-file **blast radius**. Learned suppressions are applied first.
There is **one flow** and you are the driver: the response begins with the repo's
[`SPLUS.md`](#preferences) contract (preferences injected, binding `mute:`/`skip:`
rules already enforced) and ends with a **discovery directive** that drives *you*
(the agent) through the full protocol (triage → investigate → verify) over the
changed files. No API key — Splus grounds you with precise anchors and a toolbelt
(`inspect`, `floor`, `recall`); you do the reasoning. Run the protocol; don't relay.
(the agent) through the full protocol (triage → trace contracts → investigate →
verify) over the changed files. The directive includes a deterministic
**CHANGED SYMBOLS** block — the exported symbols whose bodies the diff touches
(engine tree-sitter exports ∩ diff hunks) — so the contract-trace stage starts
aimed: enumerate each one's return/throw shape on every path, open every caller,
report every assumption that no longer holds. No API key — Splus grounds you with
precise anchors and a toolbelt (`inspect`, `floor`, `recall`); you do the
reasoning. Run the protocol; don't relay.

| Param | Type | Default | Description |
|---|---|---|---|
Expand Down