Stake-weighted Schelling consensus on AI output quality, used as a payment-settlement primitive. A research-grade protocol on Arc that asks: what if AI service payments resolved on a market-derived quality score, not just delivery?
Read this first: this is a research-grade protocol with no production adopters yet. The mechanism design is the contribution. Use it as a reference for thinking about quality-conditional settlement on programmable money; don't deploy it with real value at stake without an independent audit and a bootstrapped validator network. The honest-limits section is below. Self-run Slither static analysis reports no high- or medium-severity findings.
- 6 protocol versions shipped (v0 → v0.6), each frozen as a deployable artifact
- 142 forge tests + 7 SDK tests passing
- v0.6 live on Arc Testnet, deployed for ~0.088 USDC of gas
- TypeScript SDK supports both v0 (initial release) and v0.6 (latest)
- MIT licensed, no admin keys, no upgrade proxy, ~2,600 LOC Solidity
- Built above the Cadence (Arc402) base payment layer
- Composes upward via a CrucibleMetricOracle adapter that exposes a resolved market's scoreBps as an IMetricOracle for Helm (deployed at 0x8d7efaacbf2e944e459801f891577b40fa6124c4 on Arc Testnet, maintained in the Helm repo: github.com/Ccheh/helm)
- What this is: research-grade reference implementation
- What this isn't: production payment rail (yet). No third-party adopters, pre-audit, validator network not bootstrapped
| Component | Address on Arc Testnet |
|---|---|
| CrucibleMarketV6 | 0x6535a3cbb4235746b732ab5d55c6b0988f381a20 |
| TestcaseResolverV5 | 0x51cc924fe83dc5221150f5752454a37121bE3957 |
Deployment txs:
- Market: 0x37c23d5b...
- Resolver: 0xaca5f288...
| Component | Address |
|---|---|
| CrucibleMarket | 0x61996d505d6510a339f39c9923519b2f5350f61c |
| TestcaseResolver | 0xa12874e9f77be35efb9e3aeb19eb547b9f224195 |
| MockResolver | 0x76696e3c541eb32c81cfc1cbfb3e5e5ef1c4d35f |
| Version | Headline addition | Tests |
|---|---|---|
| v0 | per-call market + pluggable resolver + optimistic settle | 30 |
| v0.2 | slashing + reward fee pool + pendingVotes guard | +24 |
| v0.3 | stake-weighted median + dispute bond | +22 |
| v0.4 | MasterChef subscription pool + 40% voting cap + ERC-8004 events | +23 |
| v0.5 | commit-reveal voting + per-market disputeBondBps + config MIN_STAKE | +32 |
| v0.6 | force-resolve fallback for stuck disputed markets | +11 |
| total | | 142 |
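
The v0.4 row names a MasterChef-style subscription pool. The contract internals are not reproduced in this README, so the following is only a minimal sketch of the generic accumulator-per-share pattern that name usually refers to. `SubscriptionPool`, `notifyFee`, `deposit`, `claim`, and `ACC_PRECISION` are hypothetical names chosen for illustration, not the contract's API.

```typescript
// Fixed-point scale for the per-share accumulator (assumption; the real
// contract's precision is not stated in this README).
const ACC_PRECISION = 10n ** 18n;

interface Staker {
  stake: bigint;
  rewardDebt: bigint; // rewards already accounted for at the staker's last update
}

class SubscriptionPool {
  private accPerShare = 0n; // cumulative reward per unit of stake, scaled
  private totalStake = 0n;
  private stakers = new Map<string, Staker>();

  // Add stake for a validator. Returns rewards that were pending before the deposit.
  deposit(who: string, amount: bigint): bigint {
    const s = this.stakers.get(who) ?? { stake: 0n, rewardDebt: 0n };
    const pending = (s.stake * this.accPerShare) / ACC_PRECISION - s.rewardDebt;
    s.stake += amount;
    this.totalStake += amount;
    s.rewardDebt = (s.stake * this.accPerShare) / ACC_PRECISION;
    this.stakers.set(who, s);
    return pending;
  }

  // A settlement routes a fee into the pool: O(1), no loop over stakers.
  notifyFee(amount: bigint): void {
    if (this.totalStake === 0n) return; // sketch: fee is ignored if nobody is staked
    this.accPerShare += (amount * ACC_PRECISION) / this.totalStake;
  }

  // Pay out whatever has accrued to this staker since their last update.
  claim(who: string): bigint {
    const s = this.stakers.get(who);
    if (!s) return 0n;
    const accrued = (s.stake * this.accPerShare) / ACC_PRECISION;
    const pending = accrued - s.rewardDebt;
    s.rewardDebt = accrued;
    return pending;
  }
}
```

The appeal of this pattern is that distributing a fee costs constant gas regardless of how many validators are staked; each staker's share is computed lazily when they claim.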
Today's payment rails (Stripe, Lightning, x402, Circle Nanopayments, Cadence) treat AI calls as deterministic transactions — pay X, receive Y, done. But AI is probabilistic: outputs are stochastic, quality is subjective. Existing rails settle on delivery confirmation, not outcome quality.
Crucible explores what a quality-conditional settlement primitive looks like: payment held in escrow, output evaluated by a stake-weighted validator consensus, funds released proportional to a 0–10000 quality score.
| Protocol | Quality awareness | Resolution mechanism |
|---|---|---|
| Stripe | None (fraud signals only) | Centralized chargeback |
| Lightning | None | None |
| Coinbase x402 | None | 200 OK = paid |
| Circle Nanopayments | None | Same as x402 |
| Cadence (Arc402) | None | None |
| Crucible | Market-resolved score per call | Pluggable resolvers + stake-weighted validator consensus + optimistic dispute |
The mechanism is closer to UMA's optimistic oracle / Augur's stake-weighted Schelling consensus than to Polymarket-style order-book markets:
- ✅ Like prediction markets: subjective claims are resolved by economic stake-weighted voting; honest validators are rewarded, dissenters slashed proportional to distance from consensus.
- ❌ Unlike prediction markets: there is no order book, no continuous price discovery, no liquidity provider. The "market" is a one-shot voting round with a 30-min commit + 30-min reveal window.
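
The one-shot round with a 30-minute commit window followed by a 30-minute reveal window can be sketched as follows. This is an off-chain illustration under stated assumptions: the commitment layout (`validator:score:salt`) and the use of SHA-256 are stand-ins picked for a self-contained example (an EVM contract would typically use keccak256), and `phaseAt`, `commitmentOf`, and `verifyReveal` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

const COMMIT_WINDOW_S = 30 * 60; // 30-minute commit window
const REVEAL_WINDOW_S = 30 * 60; // 30-minute reveal window

type Phase = "commit" | "reveal" | "closed";

// Which phase a voting round is in. Assumes now >= openedAt (both in seconds).
function phaseAt(openedAt: number, now: number): Phase {
  const elapsed = now - openedAt;
  if (elapsed < COMMIT_WINDOW_S) return "commit";
  if (elapsed < COMMIT_WINDOW_S + REVEAL_WINDOW_S) return "reveal";
  return "closed";
}

// Commitment binds the validator, the score, and a secret salt.
// ASSUMPTION: field layout and SHA-256 are illustrative only.
function commitmentOf(validator: string, scoreBps: number, salt: string): string {
  return createHash("sha256")
    .update(`${validator}:${scoreBps}:${salt}`)
    .digest("hex");
}

// A reveal is valid only if it reproduces the earlier commitment
// and the score is within the 0..10000 basis-point range.
function verifyReveal(
  commitment: string,
  validator: string,
  scoreBps: number,
  salt: string,
): boolean {
  if (!Number.isInteger(scoreBps) || scoreBps < 0 || scoreBps > 10000) return false;
  return commitmentOf(validator, scoreBps, salt) === commitment;
}
```

Including the validator's address in the preimage prevents one validator from copying another's commitment and revealing it as their own; the salt prevents brute-forcing the small score space during the commit window.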
So "prediction-market-settled" is a marketing handle. The technically precise label is stake-weighted Schelling consensus on output quality, with proportional-distance slashing. The README uses the looser phrase because it lands faster; the contracts implement the precise mechanism.
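
To make "stake-weighted Schelling consensus with proportional-distance slashing" concrete, here is a minimal sketch. The stake-weighted median matches the v0.3 headline; the slashing formula (linear in distance from consensus, bounded by `maxSlashBps`) is an assumption for illustration, since the exact curve is not given in this README. `Vote`, `stakeWeightedMedian`, and `slashFor` are hypothetical names.

```typescript
interface Vote {
  validator: string;
  stake: bigint;    // in smallest token units
  scoreBps: number; // 0..10000
}

const BPS = 10000n;

// Stake-weighted median: the lowest score at which cumulative stake
// reaches at least half of the total revealed stake.
function stakeWeightedMedian(votes: Vote[]): number {
  if (votes.length === 0) throw new Error("no votes");
  const sorted = [...votes].sort((a, b) => a.scoreBps - b.scoreBps);
  const total = sorted.reduce((sum, v) => sum + v.stake, 0n);
  if (total === 0n) throw new Error("zero total stake");
  let cumulative = 0n;
  for (const v of sorted) {
    cumulative += v.stake;
    if (cumulative * 2n >= total) return v.scoreBps;
  }
  throw new Error("unreachable: cumulative stake always reaches total");
}

// ASSUMPTION: slash grows linearly with distance from consensus, so a vote
// 10000 bps away loses maxSlashBps of its stake and a vote on consensus loses nothing.
function slashFor(vote: Vote, consensusBps: number, maxSlashBps: bigint): bigint {
  const distance = BigInt(Math.abs(vote.scoreBps - consensusBps));
  return (vote.stake * maxSlashBps * distance) / (BPS * BPS);
}
```

A median is used instead of a mean because a minority of stake cannot move it by voting an extreme value; an outlier only changes which vote sits at the midpoint, not by how far.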
import { ServiceClient, AgentClient, CRUCIBLE_ARC_TESTNET, codeGenCommitment } from "@crucible/sdk";
import { parseEther, keccak256, toBytes } from "viem";
// Service side: deposit bond, sign auth per call
const service = new ServiceClient({ privateKey: SERVICE_PK, marketAddress: CRUCIBLE_ARC_TESTNET.market });
await service.depositBond(parseEther("1"));
await service.setResolverAllowed(CRUCIBLE_ARC_TESTNET.mockResolver, true);
const code = await yourLLM(prompt);
const signedAuth = await service.signOpenAuth({
  agent: agentAddress,
  resolver: CRUCIBLE_ARC_TESTNET.mockResolver,
  amount: parseEther("0.01"),
  bondLockAmount: parseEther("0.05"),
  commitmentHash: codeGenCommitment({ input: prompt, testcases, expectedOutputHash: keccak256(toBytes(code)) }),
  disputeWindow: 60,
});
// Agent side: open market with signed auth + payment
const agent = new AgentClient({ privateKey: AGENT_PK, marketAddress: CRUCIBLE_ARC_TESTNET.market });
const { marketId } = await agent.openMarket(signedAuth);
// ... agent runs testcases ... if pass: wait + collect; if fail: dispute

Working end-to-end demo (real Arc Testnet tx) ships in sdk-ts/examples/full-lifecycle.ts:
git clone https://github.com/Ccheh/crucible.git
cd crucible/sdk-ts && npm install
# Set PRIVATE_KEY (agent) + SERVICE_PRIVATE_KEY in ../.env (your Arc Testnet keys)
npm run demo

Total wall-clock: ~80 seconds (60s dispute window + 4 on-chain txs).
Full protocol lifecycle exercised on Arc Testnet. Every step is a real transaction:
Phase 1 — CrucibleMarket optimistic settlement
| Step | tx |
|---|---|
| service deposits 0.5 USDC bond | 0xbed641ed... |
| service whitelists resolver | 0xa0de7856... |
| agent opens market via EIP-712 auth | 0x616c8d57... |
| collect after 60s → resolved at score 10000 | market 0xaf28e414... status=3 ✅ |
Phase 2 — TestcaseResolver validator network
| Step | tx |
|---|---|
| main wallet stakes 0.2 USDC as validator | 0x0117371c... |
| validator votes scoreBps=7500 | 0x65c71cc1... |
Phase 3 — Full lifecycle via @crucible/sdk
| Step | tx |
|---|---|
| agent opens market via SDK | 0xd3fc1968... |
| agent collects after window via SDK | 0x396551e8... |
Not unit-test-only. Real EVM execution, verifiable on https://testnet.arcscan.app.
| Role | What they do | Skin in the game | Reward |
|---|---|---|---|
| Agent | Pays for AI service, optionally disputes (with bond from v0.3+) | USDC in escrow + dispute bond | Refund proportional to (10000-score) |
| Service | Provides AI output, commits to quality claim | Bond posted to bondPool, locked per market | Payment proportional to score |
| Validator | Stakes USDC, commits + reveals a vote per disputed market (v0.5+) | Validator stake (>= MIN_STAKE) | Subscription yield (v0.4+) + dispute reward share |
| Resolver | Pluggable on-chain verification logic | None (pure code) | None directly; receives fees + subscriptions for distribution |
The protocol is permissionless by design — no KYC, no central operator, no admin keys. The active set is empty today: no third-party services use Crucible for real traffic, and the validator network is the smart contract waiting for stakers. This is infrastructure waiting for adopters.
| Resolver | What it verifies | Trust model | v0 status |
|---|---|---|---|
| TestcaseResolver | Code-generation outputs vs. testcases | Validators run sandbox | shipped |
| MockResolver | Testing only (takes score from calldata) | None | shipped |
| OracleResolver | Real-world predictions vs. ground truth | Chainlink / Pyth / UMA | v0.2 |
| ValidatorVoteResolver | Subjective quality (translation, creative) | Schelling point of staked validators | v0.2 |
| TEEResolver | Inference-integrity proofs | Trusted hardware attestation | v0.3 |
| ZkMlResolver | Pure cryptographic proof of inference | ZK ML | future |
- Agent pays a code-gen service 0.05 USDC to write a Python function.
- Service returns code + posts commitment hash on-chain.
- Agent runs testcases locally; if they pass, lets the dispute window expire — service gets paid in full.
- If tests fail, agent disputes within the window.
- Validators (anyone with 0.1 USDC staked in TestcaseResolver) run the testcases themselves, vote stake-weighted on the pass rate.
- Contract auto-resolves: payment to service = escrow × score / 10000; remainder + proportional service-bond slash → agent.
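
The resolution arithmetic in the last step can be sketched as below. The service payment and agent refund follow the formula above directly. The bond-slash line is an assumption: the README says the slash is "proportional" without giving the formula, so this sketch takes it as the locked bond scaled by (10000 − score). `settle` and `Settlement` are hypothetical names, and all amounts are in the token's smallest units.

```typescript
const BPS = 10000n;

interface Settlement {
  toService: bigint;        // escrow × score / 10000
  refundToAgent: bigint;    // remainder of the escrow
  bondSlashToAgent: bigint; // ASSUMPTION: bondLock × (10000 − score) / 10000
}

function settle(escrow: bigint, bondLock: bigint, scoreBps: bigint): Settlement {
  if (scoreBps < 0n || scoreBps > BPS) {
    throw new RangeError("scoreBps must be within 0..10000");
  }
  // Integer division rounds the service payment down; the agent refund is
  // computed as the remainder so the two always sum to the escrow exactly.
  const toService = (escrow * scoreBps) / BPS;
  const refundToAgent = escrow - toService;
  const bondSlashToAgent = (bondLock * (BPS - scoreBps)) / BPS;
  return { toService, refundToAgent, bondSlashToAgent };
}

// Example: 50_000 units in escrow, 250_000 units of bond locked, score 7500.
const example = settle(50_000n, 250_000n, 7500n);
// example.toService === 37_500n, example.refundToAgent === 12_500n,
// example.bondSlashToAgent === 62_500n
```

Computing the refund by subtraction, not by a second multiplication, avoids losing a unit to rounding on either side.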
Crucible is a layer above Cadence (Arc402), not a replacement.
┌────────────────────────────────────────────┐
│ Application: paid AI service │
├────────────────────────────────────────────┤
│ ★ Crucible — quality-outcome settlement │ ← this repo
│ • per-call prediction markets │
│ • pluggable resolvers (testcase / oracle │
│ / validator-vote / TEE / ZK-ML) │
│ • permissionless validator economics │
├────────────────────────────────────────────┤
│ Cadence (Arc402) — payment escrow │
│ • PaymentEscrowV2 (existing, live) │
│ • EIP-712 signed claims │
│ • batched settlement │
├────────────────────────────────────────────┤
│ Arc — chain (USDC as native gas) │
└────────────────────────────────────────────┘
A Crucible-protected service can:
- Use Cadence's PaymentEscrow as its USDC escrow contract
- Add Crucible quality-outcome layer on top
- Settle through Cadence's batch path or directly via Crucible
The mechanism design and engineering are real. The market validation is not. Specifically:
- No production adopters. Every on-chain transaction was generated by our own scripts. No third-party AI service uses Crucible. The "validator network" today is the smart contracts, not an active set of staked validators — we deployed the infrastructure but it has not bootstrapped a real network.
- Pre-audit. 142 forge tests pass, but no independent security audit. Treat as testnet-only research code.
- The killer demo (real LLM end-to-end) is in progress; see sdk-ts/examples/. The deterministic mock LLM is shipped; the real-API integration is the next milestone, not a current claim.
- ERC-8004 reputation events are emitted but not yet read by any indexer. The schema is designed for forward compatibility when ERC-8004 indexers emerge; today they are just structured log events.
- Arc-specificity is loose. Crucible could run on any EVM chain. We chose Arc because (a) USDC native gas keeps sub-cent settlement clean, and (b) Arc is Circle's agentic-economy bet. There is no technical mechanism that requires Arc specifically.
- Schelling consensus has a known >50%-stake-attack ceiling. The 40% voting weight cap mitigates the 40–70% range; >70% stake by a single coordinated party cannot be mitigated by any one-shot mechanism. This is a property of the design, not a bug.
- Validator economics require dispute volume to bootstrap. Subscription pool (v0.4) gives validators baseline yield from all settlements, but the absolute amounts at testnet scale are negligible. Real economics need mainnet traffic.
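
As an illustration of the 40% voting weight cap mentioned in the list above, the sketch below clamps each validator's effective weight to 40% of total stake. How the contract actually defines the cap base (raw total stake, revealed stake, or something else) is not stated in this README, so capping against raw total stake is an assumption, and `effectiveWeights` is a hypothetical name.

```typescript
const BPS = 10000n;
const CAP_BPS = 4000n; // 40% voting weight cap

// Clamp every validator's voting weight to CAP_BPS of the total stake.
// ASSUMPTION: the cap is measured against raw total stake.
function effectiveWeights(stakes: Map<string, bigint>): Map<string, bigint> {
  let total = 0n;
  for (const s of stakes.values()) total += s;
  const cap = (total * CAP_BPS) / BPS;
  const out = new Map<string, bigint>();
  for (const [who, s] of stakes) {
    out.set(who, s < cap ? s : cap);
  }
  return out;
}

// A holder with 55% of stake is clamped to 40 units of weight,
// leaving the other validators with 45 units between them.
const capped = effectiveWeights(
  new Map([["whale", 55n], ["v1", 25n], ["v2", 20n]]),
);
```

Under this assumption, a capped holder's share of the effective weight still grows as the remaining stake shrinks, which is consistent with the cap being a mitigation for a bounded range and not a guarantee.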
If you're considering integrating, treat this as research infrastructure on a probabilistic-AI-payment thesis Circle is also pursuing, not as production-ready rails.
✅ Contracts on Arc Testnet (six versions, v0 → v0.6): CrucibleMarket*, TestcaseResolver*, MockResolver, IResolver interface, IResolverFeeReceiver interface, IResolverSubscriptionReceiver interface
✅ TypeScript SDK: @crucible/sdk with v0 clients + new v06 module (ServiceClientV6, AgentClientV6, ValidatorClientV6)
✅ Spec v0: 15 sections + v0.2–v0.6 addenda in docs/spec-v0.md
✅ End-to-end optimistic-path demo on Arc Testnet (real txs) — see sdk-ts/examples/v06-optimistic.ts
✅ 142 forge tests + 7 SDK tests passing
⏳ Open items (we are deliberately stopping protocol work to focus here):
- Real LLM integration in a demo (no more API stubs)
- Real-chain dispute-path lifecycle evidence (commit + reveal + resolveDisputed txs in README)
- Independent audit (M2 of original roadmap)
- Mainnet deploy with raised MIN_STAKE
❌ NOT in any current version: ZK-ML resolver, TEE resolver, mainnet, audit, third-party integrators.
| Folder | Purpose |
|---|---|
| contracts/ | Solidity contracts (Foundry): CrucibleMarket.sol, resolvers/, tests, deploy script |
| sdk-ts/ | TypeScript SDK + end-to-end lifecycle demo |
| docs/ | Protocol spec + security considerations |
| examples/ | Live smoke test scripts |
Zen Chen — MSc Data Science (Sheffield). Building on Arc.
Crucible's resolution mechanism is closest in spirit to UMA's optimistic-oracle stake-weighted Schelling consensus, applied to AI service quality at per-call granularity. The "prediction market" framing in the lead is a marketing handle; the precise label is documented in The thesis section.