RFC: erp-agent — bscode sibling for ERP / business-API agents

## Summary

Proposes a fourth repository in the WasmAgent ecosystem — `erp-agent`
— positioned as a **sibling** of `bscode` rather than a replacement.
The two reference apps share the same runtime (`wasmagent-js`) and
data factory (`trace-pipeline`); they differ only in their tools,
verifiers, and target deployment surface.

The thesis: the wasmagent flywheel is **task-agnostic by design**.
`bscode` is the first reference app (coding); ERP / business-API
agents are a different vertical with materially different economics
and a much weaker public data baseline. Adding a sibling proves the
task-agnostic claim and opens a market with stronger willingness
to pay than coding tools.

This is a **scoping RFC**: it asks for agreement on the structure
and the boundary between "what lives in the new repo" vs. "what gets
upstreamed into wasmagent-js / aep / trace-pipeline". It is not yet
a build plan.

## Why this is structurally possible

The current ecosystem already separates task-agnostic primitives
from task-specific glue, even though we've only ever instantiated
one task (coding via bscode). The split:

| Layer | Repo | Task-agnostic? |
|---|---|---|
| WASM runtime, model adapters, ranking, AEP emitter | `wasmagent-js` | yes |
| MCP firewall, gateway, taint, consent, lease | `wasmagent-js` | yes |
| Compliance verifier framework + repair planner | `wasmagent-js` | mostly — verifier interface is task-agnostic; concrete verifiers are not |
| AEP record schema (v0.2, v0.3 in [#7](https://github.com/WasmAgent/wasmagent-js/issues/7)) | `wasmagent-js` | yes |
| `validate-aep`, `trust-score`, `audit-report` | `trace-pipeline` | yes |
| `TrainingDataExporter` (SFT / DPO / PPO / router records) | `trace-pipeline` | yes |
| Cloudflare worker shell, auth, session, KV, rate-limit, rollout-adapter | `bscode` | mostly — could be lifted to a template |
| Concrete tools (`fs_write`, `bash`, `read_file`, ...) | `bscode` | **task-specific** |
| Concrete verifiers (`BuildPassesVerifier`, `VisualAssertVerifier`) | `wasmagent-js` (lib) + `bscode` (adapter) | **task-specific** |

Concretely, [`packages/core/src/agents/verifiers/types.ts:38-44`](https://github.com/WasmAgent/wasmagent-js/blob/main/packages/core/src/agents/verifiers/types.ts) already declares the
`verify_method` field as an **open string union** — built-ins are
listed for autocomplete, but applications can register custom kinds
via `VerificationPipeline.register()`. The docstring explicitly says:

> This keeps the protocol product-agnostic — bscode's "build_passes"
> or a CI's "lighthouse_score_min" verifier registers without
> touching WasmAgent core.

ERP-specific verifiers (`order_state_machine_valid`, `ledger_balanced`,
`permission_boundary_respected`, ...) fit the same registration
pattern. Nothing in the runtime needs to know they exist.
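
As a rough illustration of that registration pattern, here is a minimal sketch. `VerifierRegistry`, `VerifyResult`, `OrderTransition`, the `"visual_assert"` built-in name, and the edge table are hypothetical stand-ins written for this example. They are not the real `VerificationPipeline` API, whose actual signature should be taken from `types.ts`.

```typescript
// Hypothetical stand-ins: these types only imitate the registration pattern
// and are NOT the real @wasmagent/core API.
type BuiltinVerifyMethod = "build_passes" | "visual_assert";
// Open string union: built-ins autocomplete, any other string is accepted.
type VerifyMethod = BuiltinVerifyMethod | (string & {});

interface VerifyResult {
  passed: boolean;
  reason?: string;
}

class VerifierRegistry<Input> {
  private readonly verifiers = new Map<string, (input: Input) => VerifyResult>();

  register(method: VerifyMethod, verifier: (input: Input) => VerifyResult): void {
    if (this.verifiers.has(method)) {
      throw new Error(`verifier already registered: ${method}`);
    }
    this.verifiers.set(method, verifier);
  }

  run(method: VerifyMethod, input: Input): VerifyResult {
    const verifier = this.verifiers.get(method);
    if (verifier === undefined) {
      return { passed: false, reason: `unknown verify_method: ${method}` };
    }
    return verifier(input);
  }
}

// Illustrative ERP input: one attempted order transition.
interface OrderTransition {
  from: string;
  to: string;
  creditStatus: string;
}

// Assumed edge table; a real one would come from the target ERP's workflow.
const LEGAL_EDGES: Record<string, string[]> = {
  draft: ["quote"],
  quote: ["sales_order", "cancelled"],
  sales_order: ["invoiced", "cancelled"],
};

function orderStateMachineValid(t: OrderTransition): VerifyResult {
  const allowed = LEGAL_EDGES[t.from] ?? [];
  if (!allowed.includes(t.to)) {
    return { passed: false, reason: `illegal edge ${t.from} -> ${t.to}` };
  }
  if (t.from === "quote" && t.to === "sales_order" && t.creditStatus !== "ok") {
    return { passed: false, reason: "customer credit_status is not ok" };
  }
  return { passed: true };
}

// The application registers its own kind; the registry never learns what it means.
const registry = new VerifierRegistry<OrderTransition>();
registry.register("order_state_machine_valid", orderStateMachineValid);
```

The registry only sees a string key and a function, which is the property the docstring above relies on.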

## Proposed structure

```
erp-agent/                                ← new sibling repo
├── apps/
│   ├── worker/                           ← runtime shell (mirrors bscode)
│   │   ├── src/
│   │   │   ├── tools/                    ← ERP-specific tools
│   │   │   │   ├── odoo-xmlrpc.ts
│   │   │   │   ├── netsuite-suiteql.ts
│   │   │   │   ├── sap-odata.ts
│   │   │   │   └── domain-glossary.ts
│   │   │   ├── verifiers/                ← ERP-specific verifiers
│   │   │   │   ├── order-state-verifier.ts
│   │   │   │   ├── ledger-balance-verifier.ts
│   │   │   │   ├── permission-boundary-verifier.ts
│   │   │   │   └── dual-write-consistency-verifier.ts
│   │   │   ├── rollout-adapter.ts        ← copy bscode pattern
│   │   │   ├── trajectoryExport.ts       ← AEP emission, mirrors bscode
│   │   │   ├── auth.ts                   ← strict auth like bscode
│   │   │   ├── mcp.ts                    ← gateway mounted at /mcp
│   │   │   └── ...
│   │   └── package.json                  ← depends on @wasmagent/* via npm
│   └── web/                              ← operator approval UI (optional)
├── .githooks/                            ← same pre-push hook
├── docs/
│   └── BRANCH_PROTECTION.md              ← pointer to wasmagent-js canon
└── README.md
```

The shared-with-bscode surface is roughly:

- `apps/worker/src/app.ts` (Hono app skeleton)
- `apps/worker/src/middleware/auth.ts`, `rateLimit.ts`
- `apps/worker/src/config/productionGuard.ts`
- `apps/worker/src/build-results.ts` → renamed to `verifier-results.ts` (same shape, different semantics)
- `apps/worker/src/rollout-adapter.ts` (same pattern, adapted to ERP verifiers)
- `apps/worker/src/trajectoryExport.ts`
- `apps/worker/scripts/test-aep-roundtrip.ts`

The diff-from-bscode surface is:

- `apps/worker/src/tools/` (all ERP API SDK wrappers + governance metadata)
- `apps/worker/src/verifiers/` (domain logic — the moat)
- `apps/worker/src/policies/` (lease shapes specific to financial/order side-effects)

## Verifier taxonomy

Coding verifiers operate on **deterministic build artifacts**
(`exitCode === 0`). ERP verifiers operate on **business-state
invariants** which are mostly post-condition checks against the
target system's API. Proposed taxonomy:

| Verifier family | What it checks | Example |
|---|---|---|
| **State machine** | Did the entity transition through a legal state edge? | `quote → sales_order` requires customer.credit_status == "ok" |
| **Ledger / balance** | Do debits == credits across the affected accounts? | After AP voucher post: AP↑, Cash↓ or Expense↑, sum balanced |
| **Dual-write consistency** | If the agent wrote to two systems (e.g., CRM + ERP), do they agree? | Salesforce Opportunity.amount == NetSuite Estimate.totalAmount |
| **Permission boundary** | Did the call respect the principal's role / segment / region? | Buyer in EMEA cannot approve PO over €5k without VP sign-off |
| **Idempotency** | Did a retry produce the same observable effect? | Two POSTs with the same idempotency-key → one row, not two |
| **Audit-trail completeness** | Did the underlying ERP create the expected audit records? | Approval action → audit_log row with principal + reason |
| **Schema-drift detection** | Did the response still match the contract we trained on? | NetSuite added a field, our prompt template now drifts |

The first four are the high-value moat; the last three are
defensive. All seven plug into `VerificationPipeline.register()`
exactly like `BuildPassesVerifier` does today.
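
To make two of these families concrete, the sketch below shows a ledger-balance check and a permission-boundary check as plain functions. The input shapes (`JournalLine`, `PoApproval`), the field names, and the €5k threshold constant are assumptions taken from the example column, not an existing schema. A real verifier would read these values back from the ERP after the call.

```typescript
// Illustrative shapes only; a real verifier would read these rows back from the ERP.
interface CheckResult {
  passed: boolean;
  reason?: string;
}

interface JournalLine {
  account: string;
  debitMinor: number;  // integer minor units (cents), so equality is exact
  creditMinor: number;
}

function ledgerBalanced(lines: JournalLine[]): CheckResult {
  if (lines.length === 0) {
    return { passed: false, reason: "no journal lines" };
  }
  let debits = 0;
  let credits = 0;
  for (const line of lines) {
    if (!Number.isInteger(line.debitMinor) || !Number.isInteger(line.creditMinor)) {
      return { passed: false, reason: `non-integer amount on ${line.account}` };
    }
    debits += line.debitMinor;
    credits += line.creditMinor;
  }
  return debits === credits
    ? { passed: true }
    : { passed: false, reason: `debits ${debits} != credits ${credits}` };
}

interface PoApproval {
  role: "buyer" | "vp";
  region: string;
  amountMinor: number; // EUR cents
  vpSignOff: boolean;
}

// Assumed policy from the table: EMEA buyers need VP sign-off above EUR 5,000.
const BUYER_LIMIT_MINOR = 500000;

function permissionBoundaryRespected(a: PoApproval): CheckResult {
  const overLimit = a.amountMinor > BUYER_LIMIT_MINOR;
  if (a.role === "buyer" && a.region === "EMEA" && overLimit && !a.vpSignOff) {
    return { passed: false, reason: "PO over limit without VP sign-off" };
  }
  return { passed: true };
}
```

Keeping amounts in integer minor units avoids floating-point drift in the debit/credit comparison, which matters when the check is a strict equality.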

## Compatibility with AEP v0.2 / v0.3

ERP tool calls map cleanly onto the current AEP record shape and
benefit from the v0.3 additions in [#7](https://github.com/WasmAgent/wasmagent-js/issues/7) more than coding does:

| AEP v0.3 field | Why it matters more for ERP than for coding |
|---|---|
| `side_effect_class` | Coding: read vs write vs network. ERP: read vs **financial-mutate** vs network-egress-to-third-party. Distinction is regulatory. |
| `state_digest_kind: "db-rowset"` + coverage descriptor | A pre/post digest over an explicit table + row predicate is exactly the shape an ERP post-condition verifier needs. The coverage descriptor (`database`, `table`, `rows_predicate`) was designed with this in mind, even though the request that prompted it came from @armorer-labs. |
| `argument_drift` | High-stakes: a model that "approves PO #1234" then drifts to "approves PO #5678" is a real bug class in production. v0.3's one-record-per-event rule makes the audit trail explicit. |
| `approval_mode: "bounded-lease"` | The natural shape for "this agent can post journal entries in cost-centre X for the next 60 minutes, up to 10 entries, total ≤ $50k". |
| `deny_reason_class: "missing-delegation"` | Maps directly to SoD (segregation-of-duties) violations in financial controls. |

No new AEP schema needs to be invented for ERP. The v0.3 RFC fields
work as-is. **This is the strongest argument that the architecture
is task-agnostic in practice, not just in slides.**
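
The bounded-lease row can be read as a small pure check. The sketch below encodes the "cost-centre X, 60 minutes, up to 10 entries, total ≤ $50k" example. Every name in it (`BoundedLease`, `checkLease`, the reason strings) is hypothetical and chosen for illustration. None of it is an AEP field name or a `deny_reason_class` value.

```typescript
// Hypothetical lease shape; field names are illustrative, not AEP schema.
interface BoundedLease {
  costCentre: string;
  expiresAtMs: number;
  maxEntries: number;
  maxTotalMinor: number; // integer minor units (cents)
}

interface LeaseUsage {
  entries: number;
  totalMinor: number;
}

interface JournalRequest {
  costCentre: string;
  amountMinor: number;
  atMs: number;
}

type LeaseDecision =
  | { allowed: true; usage: LeaseUsage }
  | { allowed: false; reason: string };

// Pure check: returns the updated usage on success instead of mutating state.
function checkLease(lease: BoundedLease, usage: LeaseUsage, req: JournalRequest): LeaseDecision {
  if (!Number.isInteger(req.amountMinor) || req.amountMinor <= 0) {
    return { allowed: false, reason: "invalid-amount" };
  }
  if (req.costCentre !== lease.costCentre) {
    return { allowed: false, reason: "cost-centre-out-of-scope" };
  }
  if (req.atMs >= lease.expiresAtMs) {
    return { allowed: false, reason: "lease-expired" };
  }
  if (usage.entries + 1 > lease.maxEntries) {
    return { allowed: false, reason: "entry-count-exceeded" };
  }
  if (usage.totalMinor + req.amountMinor > lease.maxTotalMinor) {
    return { allowed: false, reason: "total-exceeded" };
  }
  return {
    allowed: true,
    usage: { entries: usage.entries + 1, totalMinor: usage.totalMinor + req.amountMinor },
  };
}

// The example from the table: 60 minutes, 10 entries, $50k total.
const exampleLease: BoundedLease = {
  costCentre: "CC-100",
  expiresAtMs: 60 * 60 * 1000,
  maxEntries: 10,
  maxTotalMinor: 5000000,
};
```

Returning the next usage value rather than mutating it keeps the check replayable against a recorded trajectory, assuming the caller persists usage between calls.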

## Training-data strategy

Coding has abundant public training and evaluation signal (SWE-bench,
MBPP, HumanEval, plus IFEval for general instruction following). ERP
has near-zero: every customer's business rules, fields, and
permissions are different, and no one publishes training data over
real ledger data.

This is both the **opportunity** and the **constraint**.

### Opportunity

Training records produced by an ERP agent operating under real
business constraints are scarce by definition. They are the moat
that bscode-derived coding data cannot be.

### Constraint

You **cannot fork-execute** ERP API calls the way you can fork
sandbox builds. There is no "try 100 branches, see which one
posts the right journal entry" — every call has externally-visible
side effects (or audit-log entries even if "rolled back"). Three
implications:

1. **Generation happens in production runs, not in synthetic
   sweeps.** A human operator + AI assistant produces one
   trajectory per real task. Trust-score gating and AEP signature
   verification become more important, not less.

2. **Verifier-based reward, not fork ranking.** `RolloutForkRunner`
   doesn't fit. Instead, `RolloutSingleRunner` + verifier ensemble
   produces a labelled record. The resulting data flow resembles
   RLHF-from-real-use more than DPO-from-ranked-rollouts.

3. **Training stays close to customer data boundary.** Two
   acceptable shapes:
   - **Local training**: customer runs `trace-pipeline` inside
     their VPC, model weights never leave.
   - **Federated contribution**: customer opts in to share
     redacted training records (using AEP's existing
     `redaction_profile` field at [`packages/aep/src/types.ts:55`](https://github.com/WasmAgent/wasmagent-js/blob/main/packages/aep/src/types.ts#L55)) in exchange for
     improved model weights.
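
The second implication (one real run, labelled by a verifier ensemble) could look roughly like the sketch below. `labelSingleRun`, `NamedVerifier`, and `LabelledRecord` are hypothetical names for illustration. This is not the actual `RolloutSingleRunner`, and the pass-rate threshold is an assumed policy.

```typescript
// Hypothetical sketch: label ONE observed outcome with several verifiers.
// Nothing here forks or re-executes the ERP call.
interface VerifierVerdict {
  method: string;
  passed: boolean;
}

interface LabelledRecord {
  trajectoryId: string;
  source: string;
  verdicts: VerifierVerdict[];
  passRate: number;
  label: "accept" | "reject";
}

interface NamedVerifier<Outcome> {
  method: string;
  verify: (outcome: Outcome) => boolean;
}

function labelSingleRun<Outcome>(
  trajectoryId: string,
  source: string,
  outcome: Outcome,
  verifiers: NamedVerifier<Outcome>[],
  minPassRate = 1, // assumed default: every verifier must pass
): LabelledRecord {
  const verdicts = verifiers.map((v) => ({
    method: v.method,
    passed: v.verify(outcome),
  }));
  const passedCount = verdicts.filter((v) => v.passed).length;
  // With no verifiers there is no evidence, so the record is rejected.
  const passRate = verdicts.length === 0 ? 0 : passedCount / verdicts.length;
  return {
    trajectoryId,
    source,
    verdicts,
    passRate,
    label: passRate >= minPassRate ? "accept" : "reject",
  };
}

// Illustrative outcome observed after a single real call.
interface QuoteOutcome {
  state: string;
  debitMinor: number;
  creditMinor: number;
}

const quoteVerifiers: NamedVerifier<QuoteOutcome>[] = [
  { method: "order_state_machine_valid", verify: (o) => o.state === "sales_order" },
  { method: "ledger_balanced", verify: (o) => o.debitMinor === o.creditMinor },
];
```

Each verdict is kept alongside the aggregate label, so a downstream consumer could re-threshold or weight verifiers without re-running anything against the customer's system.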

### Provenance

`trace-pipeline/evomerge/schemas/training.py` already carries a
`Provenance.source: str` field on every `SftTrainingRecord` /
`DpoTrainingRecord`. bscode emits `source = "bscode-trajectory"`;
erp-agent would emit `source = "erp-agent-trajectory"` (or finer,
e.g. `"erp-agent-odoo"`). Downstream training can either filter by
source (separate domain models) or pool them (general capability
SFT across both verticals).

Pooling decision matrix:

| Training stage | Pool bscode + erp-agent? |
|---|---|
| SFT — general capability (tool use, instruction following) | yes |
| SFT — domain reasoning | no (separate models) |
| DPO — "follow tool schema correctly" | yes |
| DPO — "right answer" | no |
| Router training (which task → which capability) | yes |
| Verifier ensemble for trust score | yes (each verifier reports independently) |
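
The training-stage rows of the matrix can be expressed as a lookup plus a filter on the provenance source. The sketch below is an illustration in TypeScript only. The real schema lives in `training.py`, and the stage names, `TrainingRecord` shape, and prefix-matching rule here are assumptions. The verifier-ensemble row is omitted because it is not a record-selection step.

```typescript
// Hypothetical stage names mirroring the training-stage rows of the matrix.
type Stage =
  | "sft_general"
  | "sft_domain"
  | "dpo_tool_schema"
  | "dpo_right_answer"
  | "router";

interface TrainingRecord {
  id: string;
  source: string; // e.g. "bscode-trajectory", "erp-agent-odoo"
}

// true = pool bscode and erp-agent records; false = keep domains separate.
const POOLED: Record<Stage, boolean> = {
  sft_general: true,
  sft_domain: false,
  dpo_tool_schema: true,
  dpo_right_answer: false,
  router: true,
};

// Assumed convention: a source belongs to a domain if it equals the domain
// name or starts with "<domain>-" (so "erp-agent-odoo" belongs to "erp-agent").
function belongsToDomain(source: string, domain: string): boolean {
  return source === domain || source.startsWith(`${domain}-`);
}

function selectRecords(
  records: TrainingRecord[],
  stage: Stage,
  domain: string,
): TrainingRecord[] {
  if (POOLED[stage]) {
    return records.slice(); // pooled stages see every record
  }
  return records.filter((r) => belongsToDomain(r.source, domain));
}
```

Because the field is a free-form string, the prefix rule is only a convention. A finer source such as `"erp-agent-odoo"` still lands in the `erp-agent` domain under it.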

## Choice of first ERP target

PoC should target one ERP, not a portfolio. Three candidates:

| Target | Pro | Con |
|---|---|---|
| **Odoo** (open source, XML-RPC + REST) | Source-available; testable locally; large SME market; SDKs in many languages | Less brand presence with enterprise procurement |
| **NetSuite SuiteQL** | Strong mid-market; reasonable API; query-rich | Account access expensive; auth (TBA) tedious |
| **SAP S/4 OData** | Largest TAM; API is well-typed | Sandbox access locked behind partner agreements; long sales cycle |

**Proposed first target: Odoo.** Reasons:
- Lowest friction to set up a real test environment (Docker compose).
- Open source means the schema and the SDKs are public — we can write
  reference verifiers without an NDA.
- SME segment ≈ best fit for an MIT-licensed reference app: customers
  willing to self-host AI tooling tend to also self-host their ERP.
- Once Odoo proves the pattern works, the SAP/NetSuite adaptation is
  mostly a different SDK call inside the same `tools/` shape.

Open question: is there appetite for a parallel `erp-agent-netsuite`
branch / fork from day one, or pick-one-and-finish-it?

## What needs to change in existing repos

Mostly **nothing**. The reference design is "drop a new sibling
repo, depend on the same npm packages, define your own tools and
verifiers." The few concrete changes are optional or documentation-only:

1. **`wasmagent-js`** — optionally extract `apps/worker` from bscode
   into a reusable template package (`@wasmagent/worker-template`).
   This is **not blocking** the erp-agent PoC; it's a quality-of-life
   refactor that would let future siblings (a third, a fourth, ...)
   skip the 90% boilerplate copy.

2. **`wasmagent-js/packages/aep`** — no schema change beyond what
   v0.3 ([#7](https://github.com/WasmAgent/wasmagent-js/issues/7)) already proposes. ERP-specific fields stay in tool
   payload, not in the AEP envelope.

3. **`trace-pipeline`** — no schema change. Add an entry to the
   documented list of recognised `Provenance.source` values
   (purely documentation; the field is already `str`).

4. **`docs/ecosystem.md`** — update the diagram to show two
   reference apps under the same runtime + data factory. The
   "How the loop closes" pseudocode becomes generic ("agent runs
   tasks → …") with bscode and erp-agent as parallel instances.

5. **`docs/BRANCH_PROTECTION.md`** — extend the scope sentence to
   include the new repo. Already a shared canonical doc per the
   recent reorg, so this is a one-line edit.

## Phased rollout

Three phases. Phase 1 commits to nothing concrete; phase 2 commits
to engineering work; phase 3 commits to customers.

### Phase 1 — Architecture lock (1 week, this RFC's scope)

- Agree on the structure proposed here (or its revisions in
  comments).
- Pick first ERP target.
- Decide whether to extract `@wasmagent/worker-template` now or
  later. (Recommendation: later — copy bscode first; extract once
  the pattern is proven across two repos.)

No code changes.

### Phase 2 — PoC (≤ 4 weeks)

- Stand up `erp-agent` repo, mirror bscode's worker structure.
- Implement 3–5 Odoo tools (read partner, read invoice, create
  quote, update customer, read journal entry).
- Implement 2–3 verifiers (`order_state_machine_valid`,
  `customer_field_consistency`, `permission_boundary_respected`).
- Run one end-to-end loop: operator says "create a quote for
  customer ABC" → AI calls Odoo → AEP records the action →
  trace-pipeline computes trust score → output the first ERP
  training record (`Provenance.source = "erp-agent-odoo"`).

Success criterion: produce **one** verified ERP training record.
That's enough to prove the pipeline; everything after is volume.
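
For a sense of scale, a read-only "read partner" tool can be little more than a request builder. The sketch below uses Odoo's JSON-RPC endpoint rather than XML-RPC (the transport named in the proposed `odoo-xmlrpc.ts`) because the payload is plain JSON and easy to inspect. The `execute_kw` argument order follows Odoo's external API. `buildReadPartnerRequest`, `OdooCredentials`, and the selected fields are hypothetical names and choices for this example.

```typescript
// Sketch of a read-only Odoo tool as a pure request builder.
// Swapped-in transport: JSON-RPC instead of XML-RPC, for readability.
interface OdooCredentials {
  db: string;
  uid: number;      // numeric user id obtained from a prior login call
  password: string; // password or API key
}

interface OdooJsonRpcRequest {
  jsonrpc: "2.0";
  method: "call";
  id: number;
  params: {
    service: "object";
    method: "execute_kw";
    args: unknown[];
  };
}

function buildReadPartnerRequest(
  creds: OdooCredentials,
  partnerId: number,
  requestId: number,
): OdooJsonRpcRequest {
  return {
    jsonrpc: "2.0",
    method: "call",
    id: requestId,
    params: {
      service: "object",
      method: "execute_kw",
      // execute_kw(db, uid, password, model, method, positional args, keyword args)
      args: [
        creds.db,
        creds.uid,
        creds.password,
        "res.partner",
        "search_read",
        [[["id", "=", partnerId]]], // one positional arg: the search domain
        { fields: ["name", "email", "country_id"], limit: 1 },
      ],
    },
  };
}
```

The resulting object would be POSTed as JSON to the instance's `/jsonrpc` path. Keeping the builder pure means the exact arguments sent to the ERP can be recorded and compared without touching the network.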

### Phase 3 — 1–2 paying customers (3–6 months)

- Recruit two design-partner customers (one Odoo, one larger ERP
  for the upgrade path).
- Local-training shape: customer runs trace-pipeline inside their
  VPC; we ship updates as model-merge recipes, not weights.
- Federated-contribution shape: customer opts in to share
  redacted SFT/DPO records; in exchange they get improved adapter
  weights.

Success criterion: a model trained on a verified ERP-domain DPO
record set outperforms a coding-only-trained baseline on the
customer's own held-out tasks.

## Risks

1. **Verifier-development cost is real.** ERP verifiers need
   domain experts who've actually implemented SAP / NetSuite /
   Odoo flows. Hiring or partnering for this is a different
   problem from hiring engineers. **Mitigation:** start with
   permission-boundary and idempotency verifiers (cheap, general);
   accept that state-machine and ledger verifiers come with a
   learning curve.

2. **Customers won't share ledger data even if anonymized.**
   This is the open-source-AI version of every healthcare data
   problem. **Mitigation:** lead with local-training, treat
   federated contribution as opt-in upside, not the default.

3. **A bad PoC could damage the bscode story.** If erp-agent
   ships visibly buggy verifiers, anyone evaluating bscode will
   wonder if the runtime is at fault. **Mitigation:** clear
   labelling (erp-agent is experimental); separate maturity
   tiers in the org README; don't cross-link until erp-agent is
   beta.

4. **Engineering bandwidth.** With three repos and one
   maintainer + one new contributor, adding a fourth is
   ambitious. **Mitigation:** Phase 2 is ≤ 4 weeks of one
   contributor's time, ~80% copy-paste from bscode. The
   expensive parts are tools + verifiers, which parallelize
   cleanly if another contributor joins.

## Non-goals

- **Not** a generic "ERP integration platform" (Mulesoft / Boomi /
  Tray.io territory). The point is **agent training data**, not
  integration plumbing.
- **Not** a hosted SaaS product. The reference app is MIT-licensed
  and self-hosted, like bscode; any commercial offering sits at
  the trace-pipeline data-loop layer.
- **Not** a Salesforce / Workday / SAP partner integration. Those
  are sales channels, not engineering deliverables.
- **Not** changing the AEP v0.3 design to accommodate ERP. The
  v0.3 fields already fit; if they don't, we report back to
  [#7](https://github.com/WasmAgent/wasmagent-js/issues/7) as a finding from a second runtime — which is exactly
  the empirical bar [#7](https://github.com/WasmAgent/wasmagent-js/issues/7) sets for promoting `decision_envelope`
  to normative in v0.4.

## Open questions

1. **`@wasmagent/worker-template` — extract now or later?** Extract
   now means erp-agent and any future sibling share a real package;
   later means we copy bscode and refactor after the pattern is
   proven twice. (Recommendation in this RFC: later.)
2. **First ERP target — Odoo (recommended) or NetSuite?**
3. **Verifier ownership — `@wasmagent/erp-verifiers` as a npm
   package** that lives in wasmagent-js (next to
   `@wasmagent/core/agents/verifiers/`), **or** kept inside
   `erp-agent` until a third consumer needs it?
4. **Repo visibility** — public from day one (matches bscode)
   or private until PoC is presentable?
5. **Domain expert recruiting** — does this RFC's acceptance
   imply a hiring commitment? (Not necessarily; could be
   contractor / advisor for the verifier portion.)
6. **AEP v0.3 ERP feedback loop** — if erp-agent finds the v0.3
   schema insufficient for some ERP scenario, that's a strong
   signal for the v0.4 design. Should we instrument the PoC to
   report back v0.3-coverage findings to [#7](https://github.com/WasmAgent/wasmagent-js/issues/7)?

## Related

- AEP v0.3 RFC: [#7](https://github.com/WasmAgent/wasmagent-js/issues/7) — pre-condition for several
  ERP-specific record shapes
- Verifier interface: [`packages/core/src/agents/verifiers/types.ts`](https://github.com/WasmAgent/wasmagent-js/blob/main/packages/core/src/agents/verifiers/types.ts)
- bscode rollout adapter (the pattern erp-agent should mirror):
  [`apps/worker/src/rollout-adapter.ts`](https://github.com/WasmAgent/bscode/blob/main/apps/worker/src/rollout-adapter.ts)
- Trace-pipeline provenance: [`evomerge/schemas/training.py`](https://github.com/WasmAgent/trace-pipeline/blob/main/evomerge/schemas/training.py)
- Ecosystem diagram (will need updating): [`docs/ecosystem.md`](https://github.com/WasmAgent/wasmagent-js/blob/main/docs/ecosystem.md)

---

Comments welcome from anyone who has built agents against a
production ERP, particularly on the verifier taxonomy (which
classes I'm missing, which ones are too general to be useful) and
on the federated-vs-local training-data shape.

