
RFC: erp-agent — bscode sibling for ERP / business-API agents #8

Description

@telleroutlook

Summary

Proposes a fourth repository in the WasmAgent ecosystem — erp-agent
— positioned as a sibling of bscode rather than a replacement.
The two reference apps share the same runtime (wasmagent-js) and
data factory (trace-pipeline); they differ only in their tools,
verifiers, and target deployment surface.

The thesis: the wasmagent flywheel is task-agnostic by design.
bscode is the first reference app (coding); ERP / business-API
agents are a different vertical with materially different economics
and a much weaker public data baseline. Adding a sibling proves the
task-agnostic claim and opens a market with stronger willingness
to pay than coding tools.

This is a scoping RFC: it asks for agreement on the structure
and the boundary between "what lives in the new repo" vs. "what gets
upstreamed into wasmagent-js / aep / trace-pipeline". It is not yet
a build plan.

Why this is structurally possible

The current ecosystem already separates task-agnostic primitives
from task-specific glue, even though we've only ever instantiated
one task (coding via bscode). The split:

| Layer | Repo | Task-agnostic? |
|---|---|---|
| WASM runtime, model adapters, ranking, AEP emitter | wasmagent-js | yes |
| MCP firewall, gateway, taint, consent, lease | wasmagent-js | yes |
| Compliance verifier framework + repair planner | wasmagent-js | mostly: the verifier interface is task-agnostic; concrete verifiers are not |
| AEP record schema (v0.2, v0.3 in #7) | wasmagent-js | yes |
| validate-aep, trust-score, audit-report | trace-pipeline | yes |
| TrainingDataExporter (SFT / DPO / PPO / router records) | trace-pipeline | yes |
| Cloudflare worker shell, auth, session, KV, rate-limit, rollout-adapter | bscode | mostly; could be lifted to a template |
| Concrete tools (fs_write, bash, read_file, ...) | bscode | task-specific |
| Concrete verifiers (BuildPassesVerifier, VisualAssertVerifier) | wasmagent-js (lib) + bscode (adapter) | task-specific |

Concretely, packages/core/src/agents/verifiers/types.ts:38-44 already declares the
verify_method field as an open string union — built-ins are
listed for autocomplete, but applications can register custom kinds
via VerificationPipeline.register(). The docstring explicitly says:

This keeps the protocol product-agnostic — bscode's "build_passes"
or a CI's "lighthouse_score_min" verifier registers without
touching WasmAgent core.

ERP-specific verifiers (order_state_machine_valid, ledger_balanced,
permission_boundary_respected, ...) fit the same registration
pattern. Nothing in the runtime needs to know they exist.
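To make the registration pattern concrete, here is a minimal sketch under stated assumptions. `VerificationPipeline`, `Verifier`, and `VerifyResult` below are simplified, hypothetical stand-ins written for illustration, not the actual wasmagent-js API; only the open-string-union idea and the `register()` entry point come from types.ts as described above. The two listed names are taken from the quoted docstring and may not match the real built-in list. `(string & {})` is the common TypeScript idiom for a union that keeps autocomplete for the listed literals while still accepting any other string.

```typescript
// Hypothetical stand-ins for the wasmagent-js verifier types; the real
// definitions live in packages/core/src/agents/verifiers/types.ts.

// Open string union: listed names autocomplete, any other string type-checks.
type VerifyMethod = "build_passes" | "lighthouse_score_min" | (string & {});

interface VerifyResult {
  method: VerifyMethod;
  passed: boolean;
  detail?: string;
}

interface Verifier<Ctx> {
  method: VerifyMethod;
  verify(ctx: Ctx): Promise<VerifyResult>;
}

class VerificationPipeline<Ctx> {
  private readonly verifiers = new Map<VerifyMethod, Verifier<Ctx>>();

  // Keyed by method name only; the pipeline never enumerates ERP kinds.
  register(v: Verifier<Ctx>): this {
    if (this.verifiers.has(v.method)) {
      throw new Error(`verifier already registered: ${v.method}`);
    }
    this.verifiers.set(v.method, v);
    return this;
  }

  // Runs every registered verifier; each reports its own result.
  run(ctx: Ctx): Promise<VerifyResult[]> {
    return Promise.all([...this.verifiers.values()].map((v) => v.verify(ctx)));
  }
}

// --- erp-agent side: a custom kind, registered without touching core ---

interface OrderTransition {
  fromState: string;
  toState: string;
  creditStatus: string;
}

const orderStateMachineValid: Verifier<OrderTransition> = {
  method: "order_state_machine_valid",
  async verify(t) {
    // Example rule: quote -> sales_order requires credit status "ok".
    const guarded = t.fromState === "quote" && t.toState === "sales_order";
    const passed = !guarded || t.creditStatus === "ok";
    return {
      method: "order_state_machine_valid",
      passed,
      detail: passed ? undefined : `credit_status is ${t.creditStatus}`,
    };
  },
};

const pipeline = new VerificationPipeline<OrderTransition>().register(orderStateMachineValid);
```

The design point shows in the last line: the ERP verifier is defined and registered entirely on the erp-agent side, and the pipeline only ever sees a method string.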

Proposed structure

erp-agent/                                ← new sibling repo
├── apps/
│   ├── worker/                           ← runtime shell (mirrors bscode)
│   │   ├── src/
│   │   │   ├── tools/                    ← ERP-specific tools
│   │   │   │   ├── odoo-xmlrpc.ts
│   │   │   │   ├── netsuite-suiteql.ts
│   │   │   │   ├── sap-odata.ts
│   │   │   │   └── domain-glossary.ts
│   │   │   ├── verifiers/                ← ERP-specific verifiers
│   │   │   │   ├── order-state-verifier.ts
│   │   │   │   ├── ledger-balance-verifier.ts
│   │   │   │   ├── permission-boundary-verifier.ts
│   │   │   │   └── dual-write-consistency-verifier.ts
│   │   │   ├── rollout-adapter.ts        ← copy bscode pattern
│   │   │   ├── trajectoryExport.ts       ← AEP emission, mirrors bscode
│   │   │   ├── auth.ts                   ← strict auth like bscode
│   │   │   ├── mcp.ts                    ← gateway mounted at /mcp
│   │   │   └── ...
│   │   └── package.json                  ← depends on @wasmagent/* via npm
│   └── web/                              ← operator approval UI (optional)
├── .githooks/                            ← same pre-push hook
├── docs/
│   └── BRANCH_PROTECTION.md              ← pointer to wasmagent-js canon
└── README.md

The shared-with-bscode surface is roughly:

  • apps/worker/src/app.ts (Hono app skeleton)
  • apps/worker/src/middleware/auth.ts, rateLimit.ts
  • apps/worker/src/config/productionGuard.ts
  • apps/worker/src/build-results.ts → renamed to verifier-results.ts (same shape, different semantics)
  • apps/worker/src/rollout-adapter.ts (same pattern, adapted to ERP verifiers)
  • apps/worker/src/trajectoryExport.ts
  • apps/worker/scripts/test-aep-roundtrip.ts

The diff-from-bscode surface is:

  • apps/worker/src/tools/ (all ERP API SDK wrappers + governance metadata)
  • apps/worker/src/verifiers/ (domain logic — the moat)
  • apps/worker/src/policies/ (lease shapes specific to financial/order side-effects)

Verifier taxonomy

Coding verifiers operate on deterministic build artifacts
(exitCode === 0). ERP verifiers operate on business-state
invariants, which are mostly post-condition checks against the
target system's API. Proposed taxonomy:

| Verifier family | What it checks | Example |
|---|---|---|
| State machine | Did the entity transition through a legal state edge? | quote → sales_order requires customer.credit_status == "ok" |
| Ledger / balance | Do debits == credits across the affected accounts? | AP voucher post: Expense↑ (debit), AP↑ (credit); on payment: AP↓ (debit), Cash↓ (credit); each entry balances |
| Dual-write consistency | If the agent wrote to two systems (e.g., CRM + ERP), do they agree? | Salesforce Opportunity.amount == NetSuite Estimate.totalAmount |
| Permission boundary | Did the call respect the principal's role / segment / region? | A buyer in EMEA cannot approve a PO over €5k without VP sign-off |
| Idempotency | Did a retry produce the same observable effect? | Two POSTs with the same idempotency key → one row, not two |
| Audit-trail completeness | Did the underlying ERP create the expected audit records? | Approval action → audit_log row with principal + reason |
| Schema-drift detection | Does the response still match the contract we trained on? | NetSuite added a field and our prompt template now drifts |

The first four are the high-value moat; the last three are
defensive. All seven plug into VerificationPipeline.register()
exactly like BuildPassesVerifier does today.
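Two of the moat verifiers can be sketched as pure post-condition checks over data the worker has already read back from the ERP. This is an illustration under assumptions, not a reference implementation: the `JournalLine` shape, the `LEGAL_EDGES` table, and the state names other than quote and sales_order are hypothetical, and amounts are assumed to be integer minor units so that `debits === credits` is an exact comparison rather than a floating-point one.

```typescript
// Hypothetical journal-line shape, not any specific ERP's schema.
// Amounts are integer minor units (cents) so the balance check is exact.
interface JournalLine {
  account: string;
  debit: number;
  credit: number;
}

interface CheckResult {
  passed: boolean;
  reason?: string;
}

// Ledger / balance: total debits must equal total credits.
function ledgerBalanced(lines: JournalLine[]): CheckResult {
  let debits = 0;
  let credits = 0;
  for (const line of lines) {
    const amounts = [line.debit, line.credit];
    if (!amounts.every((a) => Number.isInteger(a) && a >= 0)) {
      return { passed: false, reason: `invalid amount on ${line.account}` };
    }
    debits += line.debit;
    credits += line.credit;
  }
  return debits === credits
    ? { passed: true }
    : { passed: false, reason: `debits ${debits} != credits ${credits}` };
}

// State machine: only listed edges are legal; each edge carries a guard.
interface OrderFacts {
  creditStatus: string;
}
type Guard = (facts: OrderFacts) => boolean;

// Illustrative edges only; a real table would come from the target ERP's workflow.
const LEGAL_EDGES = new Map<string, Map<string, Guard>>([
  ["quote", new Map<string, Guard>([
    ["sales_order", (f) => f.creditStatus === "ok"],
    ["cancelled", () => true],
  ])],
  ["sales_order", new Map<string, Guard>([
    ["invoiced", () => true],
    ["cancelled", () => true],
  ])],
]);

function stateTransitionLegal(from: string, to: string, facts: OrderFacts): CheckResult {
  const guard = LEGAL_EDGES.get(from)?.get(to);
  if (guard === undefined) {
    return { passed: false, reason: `no edge ${from} -> ${to}` };
  }
  return guard(facts)
    ? { passed: true }
    : { passed: false, reason: `guard rejected ${from} -> ${to}` };
}
```

For example, a vendor bill with an Expense line debited 12000 and an AP line credited 12000 passes `ledgerBalanced`, while two lines that are both credits fail it; `stateTransitionLegal("quote", "sales_order", { creditStatus: "hold" })` fails on the guard.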

Compatibility with AEP v0.2 / v0.3

ERP tool calls map cleanly onto the current AEP record shape and
benefit from the v0.3 additions in #7 more than coding does:

| AEP v0.3 field | Why it matters more for ERP than for coding |
|---|---|
| side_effect_class | Coding: read vs write vs network. ERP: read vs financial-mutate vs network-egress-to-third-party. The distinction is regulatory. |
| state_digest_kind: "db-rowset" + coverage descriptor | A pre/post digest over an explicit table + row predicate is exactly the shape an ERP post-condition verifier needs. The coverage descriptor (database, table, rows_predicate) was designed with this in mind, even though the original request came from @armorer-labs. |
| argument_drift | High stakes: a model that "approves PO #1234" then drifts to "approves PO #5678" is a real bug class in production. v0.3's one-record-per-event rule makes the audit trail explicit. |
| approval_mode: "bounded-lease" | The natural shape for "this agent can post journal entries in cost-centre X for the next 60 minutes, up to 10 entries, total ≤ $50k". |
| deny_reason_class: "missing-delegation" | Maps directly to SoD (segregation-of-duties) violations in financial controls. |

No new AEP schema needs to be invented for ERP. The v0.3 RFC fields
work as-is. This is the strongest argument that the architecture
is task-agnostic in practice, not just in slides.
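The bounded-lease and argument-drift rows are the ones most likely to turn into worker-side policy code (the `policies/` directory in the proposed structure). A minimal sketch, assuming a hypothetical `BoundedLease` shape that encodes the example grant from the table; none of these names are AEP v0.3 fields, and the reason strings are local labels rather than `deny_reason_class` values. Drift is detected by comparing a key-sorted serialisation of the approved and executed arguments.

```typescript
// Hypothetical lease shape for approval_mode "bounded-lease"; the field
// names are illustrative and are NOT the AEP v0.3 wire format.
interface BoundedLease {
  action: string; // e.g. "post_journal_entry"
  costCentre: string;
  expiresAtMs: number;
  maxCalls: number;
  maxTotalCents: number;
}

interface LeaseUsage {
  calls: number;
  totalCents: number;
}

interface LeasedCall {
  action: string;
  costCentre: string;
  amountCents: number;
}

// Reason strings are local labels, not deny_reason_class values.
type LeaseDecision =
  | { allow: true; usage: LeaseUsage }
  | { allow: false; reason: string };

// Pure check: on allow it returns the next usage, which the caller should
// persist only after the ERP call actually succeeds.
function checkLease(lease: BoundedLease, used: LeaseUsage, call: LeasedCall, nowMs: number): LeaseDecision {
  if (call.action !== lease.action) return { allow: false, reason: "action-not-leased" };
  if (call.costCentre !== lease.costCentre) return { allow: false, reason: "out-of-scope" };
  if (nowMs >= lease.expiresAtMs) return { allow: false, reason: "expired" };
  if (!Number.isInteger(call.amountCents) || call.amountCents < 0) {
    return { allow: false, reason: "invalid-amount" };
  }
  if (used.calls + 1 > lease.maxCalls) return { allow: false, reason: "call-budget-exhausted" };
  if (used.totalCents + call.amountCents > lease.maxTotalCents) {
    return { allow: false, reason: "amount-budget-exhausted" };
  }
  return {
    allow: true,
    usage: { calls: used.calls + 1, totalCents: used.totalCents + call.amountCents },
  };
}

// Argument drift: compare approved vs executed arguments with object keys
// sorted, so key order alone never counts as drift.
function canonical(v: unknown): string {
  if (Array.isArray(v)) return `[${v.map(canonical).join(",")}]`;
  if (v !== null && typeof v === "object") {
    const o = v as Record<string, unknown>;
    const parts = Object.keys(o).sort().map((k) => `${JSON.stringify(k)}:${canonical(o[k])}`);
    return `{${parts.join(",")}}`;
  }
  return String(JSON.stringify(v));
}

function argumentDrift(approved: unknown, executed: unknown): boolean {
  return canonical(approved) !== canonical(executed);
}

// Example grant: cost-centre X, 60 minutes, up to 10 entries, total <= $50k.
const exampleLease: BoundedLease = {
  action: "post_journal_entry",
  costCentre: "X",
  expiresAtMs: Date.now() + 60 * 60 * 1000,
  maxCalls: 10,
  maxTotalCents: 5_000_000, // $50,000.00
};
```

Keeping `checkLease` pure (it returns the next usage instead of mutating state) is a deliberate choice in this sketch: the worker can persist the counter only after the ERP call succeeds, so a failed call does not consume lease budget.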

Training-data strategy

Coding has abundant public training signal (SWE-bench, MBPP,
HumanEval, IFEval). ERP has almost none: every customer's business
rules, fields, and permissions are different, and no one publishes
training data derived from real ledgers.

This is both the opportunity and the constraint.

Opportunity

Training records produced by an ERP agent operating under real
business constraints are scarce by definition. They are the moat
that bscode-derived coding data cannot be.

Constraint

You cannot fork-execute ERP API calls the way you can fork
sandbox builds. There is no "try 100 branches, see which one
posts the right journal entry" — every call has externally-visible
side effects (or audit-log entries even if "rolled back"). Three
implications:

  1. Generation happens in production runs, not in synthetic
    sweeps. A human operator + AI assistant produces one
    trajectory per real task. Trust-score gating and AEP signature
    verification become more important, not less.

  2. Verifier-based reward, not fork ranking. RolloutForkRunner
    doesn't fit. Instead, RolloutSingleRunner + a verifier ensemble
    produces a labelled record. This is closer to RLHF-from-real-use
    than to DPO-from-ranked-rollouts.

  3. Training stays close to the customer's data boundary. Two
    acceptable shapes:

    • Local training: customer runs trace-pipeline inside
      their VPC, model weights never leave.
    • Federated contribution: customer opts in to share
      redacted training records (using AEP's existing
      redaction_profile field at packages/aep/src/types.ts:55) in exchange for
      improved model weights.
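Implication 2 can be sketched as a labelling function over one production trajectory. Assumptions: `VerifierReport`, `labelTrajectory`, and the three labels are hypothetical, and the rule used here (any failed verifier rejects; a required verifier that did not run sends the record to review; otherwise accept) is an illustrative stand-in, not trace-pipeline's trust-score formula.

```typescript
// Hypothetical report shape; each verifier in the ensemble reports independently.
interface VerifierReport {
  method: string;
  passed: boolean;
  skipped?: boolean; // e.g. the ERP API timed out, so the check could not run
}

type Label = "accepted" | "rejected" | "needs_review";

interface LabelledRecord {
  trajectoryId: string;
  label: Label;
  reports: VerifierReport[];
}

// One production trajectory in, one labelled record out: no forks, no ranking.
function labelTrajectory(
  trajectoryId: string,
  reports: VerifierReport[],
  required: string[],
): LabelledRecord {
  const ran = reports.filter((r) => !r.skipped);
  const anyFailed = ran.some((r) => !r.passed);
  const missing = required.filter((m) => !ran.some((r) => r.method === m));

  let label: Label;
  if (anyFailed) {
    label = "rejected"; // a verified failure
  } else if (missing.length > 0) {
    label = "needs_review"; // required evidence never ran
  } else {
    label = "accepted";
  }
  return { trajectoryId, label, reports };
}
```

In this sketch rejected records are kept, together with their reports, so they can still serve as negative examples downstream.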

Provenance

trace-pipeline/evomerge/schemas/training.py already carries a
Provenance.source: str field on every SftTrainingRecord /
DpoTrainingRecord. bscode emits source = "bscode-trajectory";
erp-agent would emit source = "erp-agent-trajectory" (or finer,
e.g. "erp-agent-odoo"). Downstream training can either filter by
source (separate domain models) or pool them (general capability
SFT across both verticals).

Pooling decision matrix:

| Training stage | Pool bscode + erp-agent? |
|---|---|
| SFT: general capability (tool use, instruction following) | yes |
| SFT: domain reasoning | no (separate models) |
| DPO: "follow tool schema correctly" | yes |
| DPO: "right answer" | no |
| Router training (which task → which capability) | yes |
| Verifier ensemble for trust score | yes (each verifier reports independently) |
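Because `source` may be coarse (`erp-agent-trajectory`) or fine (`erp-agent-odoo`), filtering is easiest on a source family rather than on exact strings. A minimal sketch in TypeScript over a simplified record (the real model is Python, in training.py); the stage keys, `sourceFamily`, and `selectForStage` are hypothetical names, and the pooling flags transcribe the matrix above.

```typescript
// Simplified mirror of the Python Provenance model in
// trace-pipeline/evomerge/schemas/training.py; only `source` is modelled.
interface TrainingRecord {
  provenance: { source: string };
}

type Family = "bscode" | "erp-agent" | "other";

// "erp-agent-trajectory" and "erp-agent-odoo" both belong to "erp-agent".
// Matching "<family>" or "<family>-..." avoids accidental prefix hits.
function sourceFamily(source: string): Family {
  for (const f of ["bscode", "erp-agent"] as const) {
    if (source === f || source.startsWith(`${f}-`)) return f;
  }
  return "other";
}

// Hypothetical stage keys; true = pool both verticals.
const POOL_BOTH = new Map<string, boolean>([
  ["sft-general", true],
  ["sft-domain", false],
  ["dpo-tool-schema", true],
  ["dpo-right-answer", false],
  ["router", true],
  ["verifier-ensemble", true],
]);

function selectForStage(
  records: TrainingRecord[],
  stage: string,
  family: Exclude<Family, "other">,
): TrainingRecord[] {
  const pool = POOL_BOTH.get(stage);
  if (pool === undefined) throw new Error(`unknown stage: ${stage}`);
  return records.filter((r) => {
    const f = sourceFamily(r.provenance.source);
    // Unrecognised sources are never pooled in.
    return pool ? f !== "other" : f === family;
  });
}
```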

Choice of first ERP target

PoC should target one ERP, not a portfolio. Three candidates:

| Target | Pro | Con |
|---|---|---|
| Odoo (open source, XML-RPC + REST) | Source-available; testable locally; large SME market; SDKs in many languages | Less brand presence with enterprise procurement |
| NetSuite SuiteQL | Strong mid-market; reasonable API; query-rich | Account access is expensive; Token-Based Authentication (TBA) is tedious |
| SAP S/4 OData | Largest TAM; well-typed API | Sandbox access locked behind partner agreements; long sales cycle |

Proposed first target: Odoo. Reasons:

  • Lowest friction to set up a real test environment (Docker Compose).
  • Open source means the schema and the SDKs are public — we can write
    reference verifiers without an NDA.
  • SME segment ≈ best fit for an MIT-licensed reference app: customers
    willing to self-host AI tooling tend to also self-host their ERP.
  • Once Odoo proves the pattern works, the SAP/NetSuite adaptation is
    mostly a different SDK call inside the same tools/ shape.
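Odoo's documented external API authenticates through `authenticate` on `/xmlrpc/2/common` and then calls model methods through `execute_kw(db, uid, password, model, method, args, kwargs)` on `/xmlrpc/2/object`. As a sketch of what a read-partner tool in odoo-xmlrpc.ts might build, the code below encodes such a call by hand. Assumptions: `ErpTool`, `OdooSession`, and `readPartnerTool` are hypothetical names; the encoder covers only the XML-RPC types this call needs; response parsing and the `authenticate` step are omitted. A production tool would more likely use an XML-RPC library than string building.

```typescript
type XmlRpcValue = string | number | boolean | XmlRpcValue[] | { [k: string]: XmlRpcValue };

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Encode a value per the XML-RPC spec (subset: string, int, double, boolean, array, struct).
function encodeValue(v: XmlRpcValue): string {
  if (typeof v === "string") return `<value><string>${escapeXml(v)}</string></value>`;
  if (typeof v === "boolean") return `<value><boolean>${v ? 1 : 0}</boolean></value>`;
  if (typeof v === "number") {
    return Number.isInteger(v) ? `<value><int>${v}</int></value>` : `<value><double>${v}</double></value>`;
  }
  if (Array.isArray(v)) return `<value><array><data>${v.map(encodeValue).join("")}</data></array></value>`;
  const members = Object.entries(v)
    .map(([k, x]) => `<member><name>${escapeXml(k)}</name>${encodeValue(x)}</member>`)
    .join("");
  return `<value><struct>${members}</struct></value>`;
}

function methodCall(method: string, params: XmlRpcValue[]): string {
  const body = params.map((p) => `<param>${encodeValue(p)}</param>`).join("");
  return `<?xml version="1.0"?><methodCall><methodName>${method}</methodName><params>${body}</params></methodCall>`;
}

// Hypothetical tool shape shared by Odoo / NetSuite / SAP adapters.
interface ErpTool<Args> {
  name: string;
  sideEffect: "read" | "financial-mutate";
  buildRequest(args: Args): { path: string; body: string };
}

// uid comes from a prior authenticate call; password may be an API key.
interface OdooSession {
  db: string;
  uid: number;
  password: string;
}

const readPartnerTool = (s: OdooSession): ErpTool<{ name: string; limit: number }> => ({
  name: "odoo.read_partner",
  sideEffect: "read",
  buildRequest: ({ name, limit }) => ({
    path: "/xmlrpc/2/object",
    body: methodCall("execute_kw", [
      s.db, s.uid, s.password,
      "res.partner", "search_read",
      [[["name", "ilike", name]]], // one positional arg: the search domain
      { fields: ["name", "email", "phone"], limit },
    ]),
  }),
});
```

Sending the request is a plain POST of `body` to the Odoo base URL plus `path` with a `text/xml` content type. The `ErpTool` shape is the part meant to carry over: a NetSuite or SAP adapter would implement the same interface with a different `buildRequest`.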

Open question: is there appetite for a parallel erp-agent-netsuite
branch / fork from day one, or should we pick one and finish it?

What needs to change in existing repos

Mostly nothing. The reference design is "drop a new sibling
repo, depend on the same npm packages, define your own tools and
verifiers." Concrete required changes:

  1. wasmagent-js — optionally extract apps/worker from bscode
    into a reusable template package (@wasmagent/worker-template).
    It does not block the erp-agent PoC; it's a quality-of-life
    refactor that would let future siblings (#3, #4, ...) skip the
    90% boilerplate copy.

  2. wasmagent-js/packages/aep — no schema change beyond what
    v0.3 (#7) already proposes. ERP-specific fields stay in tool
    payload, not in the AEP envelope.

  3. trace-pipeline — no schema change. Add an entry to the
    documented list of recognised Provenance.source values
    (purely documentation; the field is already str).

  4. docs/ecosystem.md — update the diagram to show two
    reference apps under the same runtime + data factory. The
    "How the loop closes" pseudocode becomes generic ("agent runs
    tasks → …") with bscode and erp-agent as parallel instances.

  5. docs/BRANCH_PROTECTION.md — extend the scope sentence to
    include the new repo. Already a shared canonical doc per the
    recent reorg, so this is a one-line edit.

Phased rollout

Three phases. Phase 1 commits to nothing concrete; phase 2 commits
to engineering work; phase 3 commits to customers.

Phase 1 — Architecture lock (1 week, this RFC's scope)

  • Agree on the structure proposed here (or its revisions in
    comments).
  • Pick first ERP target.
  • Decide whether to extract @wasmagent/worker-template now or
    later. (Recommendation: later — copy bscode first; extract once
    the pattern is proven across two repos.)

No code changes.

Phase 2 — PoC (≤ 4 weeks)

  • Stand up erp-agent repo, mirror bscode's worker structure.
  • Implement 3–5 Odoo tools (read partner, read invoice, create
    quote, update customer, read journal entry).
  • Implement 2–3 verifiers (order_state_machine_valid,
    customer_field_consistency, permission_boundary_respected).
  • Run one end-to-end loop: operator says "create a quote for
    customer ABC" → AI calls Odoo → AEP records the action →
    trace-pipeline computes the trust score → TrainingDataExporter
    emits the first ERP training record (Provenance.source = "erp-agent-odoo").

Success criterion: produce one verified ERP training record.
That's enough to prove the pipeline; everything after is volume.

Phase 3 — 1–2 paying customers (3–6 months)

  • Recruit two design-partner customers (one Odoo, one larger ERP
    for the upgrade path).
  • Local-training shape: customer runs trace-pipeline inside their
    VPC; we ship updates as model-merge recipes, not weights.
  • Federated-contribution shape: customer opts in to share
    redacted SFT/DPO records; in exchange they get improved adapter
    weights.

Success criterion: a verified ERP-domain DPO record set that, used
for training, yields a model that outperforms a coding-only-trained
baseline on the customer's own held-out tasks.

Risks

  1. Verifier-development cost is real. ERP verifiers need
    domain experts who've actually implemented SAP / NetSuite /
    Odoo flows. Hiring or partnering for this is a different
    problem than hiring engineers. Mitigation: start with
    permission-boundary and idempotency verifiers (cheap, general);
    tackle state-machine and ledger verifiers later, once the team
    has climbed the domain learning curve.

  2. Customers won't share ledger data even if anonymized.
    This is the open-source-AI version of every healthcare data
    problem. Mitigation: lead with local-training, treat
    federated contribution as opt-in upside, not the default.

  3. A bad PoC could damage the bscode story. If erp-agent
    ships visibly buggy verifiers, anyone evaluating bscode will
    wonder if the runtime is at fault. Mitigation: clear
    labelling (erp-agent is experimental); separate maturity
    tiers in the org README; don't cross-link until erp-agent is
    beta.

  4. Engineering bandwidth. With three repos and one
    maintainer + one new contributor, adding a fourth is
    ambitious. Mitigation: Phase 2 is ≤ 4 weeks of one
    contributor's time, ~80% copy-paste from bscode. The
    expensive parts are tools + verifiers, which split cleanly into
    independent units that an added contributor can take on in parallel.
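Risk 1's mitigation leans on permission-boundary and idempotency verifiers because both are cheap: they need only the principal, the call arguments, or a simple row count, with no domain model of the ledger. A sketch of both, using the examples from the verifier taxonomy; `Principal`, `PoApproval`, the EUR threshold constant, and the row shape are hypothetical, and reporting the failure as `missing-delegation` is an assumption about how it would map onto AEP.

```typescript
// Hypothetical principal and approval shapes; amounts in EUR cents.
interface Principal {
  id: string;
  role: "buyer" | "manager" | "vp";
  region: "EMEA" | "AMER" | "APAC";
}

interface PoApproval {
  poId: string;
  amountMinor: number;
  approver: Principal;
  vpSignOffBy?: Principal;
}

const EMEA_BUYER_LIMIT_MINOR = 500_000; // €5,000.00

// Permission boundary: an EMEA buyer approving above the limit needs VP sign-off.
// Decided from the call and principal alone; no ERP read-back required.
function permissionBoundaryRespected(a: PoApproval): { passed: boolean; reason?: string } {
  const { approver } = a;
  const overLimit =
    approver.region === "EMEA" && approver.role === "buyer" && a.amountMinor > EMEA_BUYER_LIMIT_MINOR;
  if (overLimit && a.vpSignOffBy?.role !== "vp") {
    return { passed: false, reason: "missing-delegation" };
  }
  return { passed: true };
}

// Idempotency: after retries, each idempotency key must map to exactly one row.
// Returns the keys that produced duplicate effects.
function duplicateEffects(rows: { idempotencyKey: string }[]): string[] {
  const counts = new Map<string, number>();
  for (const r of rows) {
    counts.set(r.idempotencyKey, (counts.get(r.idempotencyKey) ?? 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([k]) => k);
}
```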

Non-goals

  • Not a generic "ERP integration platform" (Mulesoft / Boomi /
    Tray.io territory). The point is agent training data, not
    integration plumbing.
  • Not a hosted SaaS product. The reference app is MIT
    self-host like bscode; commercial offering (if any) sits at
    the trace-pipeline data-loop layer.
  • Not a Salesforce / Workday / SAP partner integration. Those
    are sales channels, not engineering deliverables.
  • Not changing the AEP v0.3 design to accommodate ERP. The
    v0.3 fields already fit; if they don't, we report back to
    #7 as a finding from a second runtime — which is exactly
    the empirical bar #7 sets for promoting decision_envelope
    to normative in v0.4.

Open questions

  1. @wasmagent/worker-template — extract now or later? Extract
    now means erp-agent and any future sibling share a real package;
    later means we copy bscode and refactor after the pattern is
    proven twice. (Recommendation in this RFC: later.)
  2. First ERP target — Odoo (recommended) or NetSuite?
  3. Verifier ownership — @wasmagent/erp-verifiers as an npm
    package that lives in wasmagent-js (next to
    @wasmagent/core/agents/verifiers/), or kept inside
    erp-agent until a third consumer needs it?
  4. Repo visibility — public from day one (matches bscode)
    or private until PoC is presentable?
  5. Domain expert recruiting — does this RFC's acceptance
    imply a hiring commitment? (Not necessarily; could be
    contractor / advisor for the verifier portion.)
  6. AEP v0.3 ERP feedback loop — if erp-agent finds the v0.3
    schema insufficient for some ERP scenario, that's a strong
    signal for the v0.4 design. Should we instrument the PoC to
    report back v0.3-coverage findings to #7?

Related


Comments welcome from anyone who has built agents against a
production ERP, particularly on the verifier taxonomy (which
classes I'm missing, which ones are too general to be useful) and
on the federated-vs-local training-data shape.
