docs(m2): testnet runbook + engine.m2.toml + just run-m2 (validated boot)#31

Open
brunota20 wants to merge 1 commit into
feat/supervisor-integration-tests-cow-1068 from
feat/m2-runbook-and-smoke-config

@brunota20

Collaborator

What does this PR do?

Adds an M2 testnet runbook + the engine config to actually run M2 against Sepolia. Closes the gap "M2 is unit + integration tested but has never been exercised against a real chain".

Why

The 5 supervisor integration tests (COW-1068) confirm wit-bindgen + WitBindgenHost + dispatch work on synthetic events. They do NOT confirm:

  • WS `eth_subscribe` streams survive on a real RPC
  • Real Sepolia block / log payloads decode through the modules
  • Manifest capability sets resolve correctly at boot
  • The orderbook actually accepts what the modules submit

This PR wires the operator path so any of us can run M2 end-to-end against Sepolia in ~30s.
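
For context, the subscriptions in the first bullet are ordinary JSON-RPC calls sent over the WebSocket. The sketch below builds the request bodies for the standard `eth_subscribe` kinds `newHeads` and `logs`. The helper name `subscribe_request` and the zero address are placeholders for illustration; the filters the engine actually sends are not shown in this PR.

```python
import json


def subscribe_request(request_id, kind, log_filter=None):
    """Build a JSON-RPC 2.0 eth_subscribe request body as a JSON string."""
    # "newHeads" takes no extra parameter; "logs" takes a filter object.
    params = [kind] if log_filter is None else [kind, log_filter]
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": "eth_subscribe", "params": params}
    )


# One block stream plus one log stream.
# The address is a placeholder, not a real contract.
new_heads = subscribe_request(1, "newHeads")
logs = subscribe_request(2, "logs", {"address": "0x" + "00" * 20})
```

A node answers each request with a subscription id and then pushes `eth_subscription` notifications carrying that id, which is the stream whose survival on a real RPC the synthetic tests cannot confirm.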

Changes

| File | Purpose |
| --- | --- |
| `engine.m2.toml` | engine config: Sepolia WS RPC + two `[[modules]]` entries (twap-monitor, ethflow-watcher), `state_dir = "./data/m2"` so it does not collide with M1 example runbook |
| `docs/operations/m2-testnet-runbook.md` | 200-line operator runbook: prerequisites, smoke run, round-trip run (Safe + Compose for TWAP / cow.fi for EthFlow), state inspection, scope boundaries, troubleshooting, references |
| `justfile` | new `build-m2` + `run-m2` recipes |

Validated

Booted the engine against Sepolia public WS locally - the observed log output is captured verbatim in runbook section 1 so future operators know what healthy looks like:

```
INFO opening chain RPC provider chain_id=11155111 url="wss://..."
INFO loading module manifest manifest=modules/twap-monitor/module.toml
[manifest] required capabilities: logging, local-store, chain, cow-api
INFO compiling component component=target/wasm32-wasip2/release/twap_monitor.wasm
INFO twap-monitor init module="twap-monitor"
INFO init succeeded module=twap-monitor
INFO loading module manifest manifest=modules/ethflow-watcher/module.toml
[manifest] required capabilities: logging, local-store, chain, cow-api
INFO compiling component component=target/wasm32-wasip2/release/ethflow_watcher.wasm
INFO ethflow-watcher init module="ethflow-watcher"
INFO supervisor up count=2
INFO supervisor ready modules=2 chains=1
INFO block subscription open chain_id=11155111
INFO log subscription open module=twap-monitor chain_id=11155111
INFO log subscription open module=ethflow-watcher chain_id=11155111
```

Clean SIGTERM shutdown.
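
One way to turn the log above into a quick smoke check is to scan captured engine output for the lines that signal a healthy boot. This is a minimal sketch, not part of the PR: `HEALTHY_MARKERS` and `missing_markers` are hypothetical names, and the marker strings are copied from the log above on the assumption that its format stays stable.

```python
# Markers copied from the healthy boot log above; adjust if the log format changes.
HEALTHY_MARKERS = [
    "supervisor ready modules=2 chains=1",
    "block subscription open chain_id=11155111",
    "log subscription open module=twap-monitor chain_id=11155111",
    "log subscription open module=ethflow-watcher chain_id=11155111",
]


def missing_markers(log_text, markers=HEALTHY_MARKERS):
    """Return the expected markers that appear on no line of log_text."""
    lines = log_text.splitlines()
    return [m for m in markers if not any(m in line for line in lines)]
```

An empty result means every marker was seen; a non-empty result names the subscription or readiness line that never appeared.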

Breaking changes

None. New files + 2 new justfile recipes.

Testing

  • `just run-m2` boots against Sepolia public WS, both modules init, all 3 subscriptions open.
  • Clean SIGTERM shutdown.
  • `./data/m2` gitignored (already in `/data/` blanket).
  • No em-dashes in new files (one pre-existing em-dash in `justfile` line 13 is mfw78's M1 prose, intentionally untouched).
  • Operator runs the round-trip section 2 with a real Safe + EthFlow swap. Requires test ETH + wallet setup; cannot automate.

What this PR does NOT do

Explicit in runbook section 4:

  • Throughput / 7-day soak -> COW-1031.
  • Cross-module isolation under load -> COW-1064 (4-6h e2e).
  • Adversarial resource exhaustion -> COW-1036.
  • Security review -> COW-1065.

AI assistance disclosure

AI Assistance: this change + description was produced by a Claude Code agent (Claude Opus 4.7 1M context). A human (Bruno) reviewed and is accountable for the result. The Sepolia boot validation was run by the agent in the local repo.

Stacks on #30 (COW-1068 supervisor integration tests) -> #29 (COW-1067 SDK doctests) -> #28 (COW-1069 rustdoc gate) -> #27 (COW-1066 CI matrix) -> #26 (COW-1063 QA cleanup).

… boot)

Wires up the M2 milestone for actual testnet exercise on Sepolia.
Closes the gap "M2 is fully tested in unit + integration but has
never been run against a real chain".

## New files

- `engine.m2.toml` - workspace-root engine config that boots
  `twap-monitor` + `ethflow-watcher` against Sepolia public WS.
  Separate `state_dir = "./data/m2"` so it never collides with
  the M1 example runbook.
- `docs/operations/m2-testnet-runbook.md` - 200-line runbook with
  7 sections (numbered 0-6):
    0. Prerequisites (rustup target, just, Sepolia RPC, faucet)
    1. Smoke run (passive, observe traffic on Sepolia)
    2. Round-trip run (author a TWAP via Safe + Compose + an
       EthFlow swap via cow.fi, watch end-to-end submission)
    3. Inspecting state after a run
    4. What this run does NOT prove (and which issues cover that)
    5. Troubleshooting matrix
    6. References (engine_config schema, ADRs, PR range)
- `justfile` recipes:
    build-m2: cargo build both M2 wasm modules
    run-m2:   build-m2 + build-engine + cargo run engine

## Validated locally

Booted `cargo run -p nexum-engine -- --engine-config engine.m2.toml`
against Sepolia public WS. Observed in ~1s wall clock:

  - WS provider opened against ethereum-sepolia-rpc.publicnode.com
  - Both manifests parsed; both capability sets resolved
    (logging + local-store + chain + cow-api)
  - Both wasm components compiled
  - Both `init` succeeded
  - `supervisor up count=2`, `supervisor ready modules=2 chains=1`
  - All 3 subscriptions opened cleanly:
      block subscription chain_id=11155111
      log subscription module=twap-monitor chain_id=11155111
      log subscription module=ethflow-watcher chain_id=11155111
  - Clean SIGTERM shutdown

The actual observed log output is captured verbatim in the runbook
section 1 so future operators know what "healthy" looks like.

## Scope

- The smoke half (section 1) is passive: it validates boot +
  subscription health without producing traffic. Useful before
  every round-trip.
- The round-trip half (section 2) requires a Sepolia Safe + test
  ETH + interaction with the Compose Safe app / cow.fi UI. Cannot
  be automated from CI (chain-side actions need a wallet). Operator
  works through the steps.
- What this does NOT prove is explicit in section 4: throughput
  / soak (COW-1031), cross-module isolation under load (COW-1064),
  adversarial resource exhaustion (COW-1036), security review
  (COW-1065).

## Not addressed

- Env-var substitution in engine.toml (e.g. `${SEPOLIA_RPC}`) is
  not wired in the engine today; runbook documents the workaround
  (edit URL inline). Filing as a follow-up is out of scope here -
  if needed, add as an M4 nice-to-have.
- `ls-dump` CLI binary referenced in section 3 does not exist yet;
  section explicitly says "no ls-dump bin in 0.2; proper inspector
  is M4 scope" and falls back to re-booting the engine on the same
  state_dir to inspect rows via the dispatch logs.
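
Until the engine supports substitution natively, an operator who prefers not to edit the URL by hand could render the config from a template before booting. The following is a sketch only: `render_config` is a hypothetical helper, and the `url` key in the usage example is an assumed name, not taken from the engine_config schema.

```python
import os
from string import Template


def render_config(template_text, env=None):
    """Replace ${NAME} placeholders in template_text with values from env.

    Raises KeyError if a placeholder has no value, so a missing
    variable fails loudly instead of producing a broken config.
    """
    env = os.environ if env is None else env
    return Template(template_text).substitute(env)
```

For example, `render_config('url = "${SEPOLIA_RPC}"', {"SEPOLIA_RPC": "wss://example.invalid"})` yields `url = "wss://example.invalid"`. Any literal `$` in the template would need to be written as `$$`.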

Linear: stacks on COW-1068. No new issue created - this is
documentation work supporting the existing M2 milestone, not a new
deliverable.