
test(bench): Phase 2 real-flow CU bench — v1 vs new on Hadrian#171

Open: anil-rome wants to merge 5 commits into master from feat-flow-bench-v2

@anil-rome

Summary

Phase 2 of the universal-delegation CU benchmark. Phase 1 (#170) measured primitive-level CU savings for the A1-A6 selectors; this PR measures flow-level CU on real production contracts on Hadrian. It deploys the NEW (post-#165/#166/#167/#168/#169) versions side-by-side with the currently deployed v1 contracts and benchmarks user-facing ops. "v2" and "NEW" refer to the same deployments throughout.

Headline real-flow numbers (Hadrian, mean of 3 samples)

Op                                        v1 CU     v2 CU     Δ CU (v2 − v1)   Δ %
---                                       -----     -----     --------------   ---
SPL_ERC20.transfer                        261,432   262,724   +1,292           +0.5% (no regression ✓)
SPL_ERC20.approve                         414,104   211,997   −202,107         −48.8%
SPL_ERC20.transferFrom (self-allowance)   258,015   263,313   +5,298           +2.1% (no regression ✓)
SPL_ERC20.bridgeOutToSolana               582,407   351,968   −230,439         −39.6%
RomeBridgeWithdraw.burnUSDC               952,642   799,637   −153,005         −16.1%
ERC20SPLFactory.create_token_mint         460,082   198,706   −261,376         −56.8%
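
The delta and percentage columns can be re-derived from the two means. The following is a minimal sketch, assuming the per-op means are already at hand; `FlowResult` and `computeDelta` are illustrative names and are not taken from the bench scripts.

```typescript
// Mean CU per op for the two deployments, as reported in the table.
interface FlowResult {
  op: string;
  v1Cu: number;
  v2Cu: number;
}

interface FlowDelta extends FlowResult {
  deltaCu: number;  // v2 - v1; negative means v2 is cheaper
  deltaPct: number; // relative to v1, rounded to one decimal place
}

function computeDelta(r: FlowResult): FlowDelta {
  const deltaCu = r.v2Cu - r.v1Cu;
  const deltaPct = Math.round((deltaCu / r.v1Cu) * 1000) / 10;
  return { ...r, deltaCu, deltaPct };
}

const flows: FlowResult[] = [
  { op: "SPL_ERC20.transfer", v1Cu: 261432, v2Cu: 262724 },
  { op: "SPL_ERC20.approve", v1Cu: 414104, v2Cu: 211997 },
  { op: "SPL_ERC20.bridgeOutToSolana", v1Cu: 582407, v2Cu: 351968 },
  { op: "ERC20SPLFactory.create_token_mint", v1Cu: 460082, v2Cu: 198706 },
];

for (const d of flows.map(computeDelta)) {
  console.log(`${d.op}: ${d.deltaCu} CU (${d.deltaPct}%)`);
}
```

A negative delta is a saving; the small positive deltas on transfer and transferFrom are what the table labels "no regression".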

Expected-vs-measured alignment:

Architecture — Option B (side-by-side)

Layer                     v1 (currently deployed on Hadrian)   v2 (deployed this PR)
---                       ----------------------------------   ---------------------
ERC20SPLFactory           0x21cc267f…                          0xfF96ab9E… (deploys own ERC20Users)
SPL_ERC20 for the bench   0x94AC3E5e… (wUSDC)                  0xD39EC36a… (wBench wraps a fresh mint; deployer minted supply via HelperProgram.mint_spl)
RomeBridgeWithdraw        0xf48902e4…                          0xAa457897… (constructor reuses v1's wUSDC + wETH wrappers so bridges share state)

Why NEW.SPL_ERC20 wraps a fresh mint instead of v1's wUSDC: separate wrapper contracts own separate Solana PDAs which own separate ATAs — there's no way to share v1's wUSDC SPL balance with a new wrapper. The wrapper-level operations (transfer/approve/transferFrom/bridgeOutToSolana) have identical code paths regardless of underlying mint, so the CU comparison is fair.

Unmeasured this run

  • RomeBridgeWithdraw.approveBurnETH / burnETH: the deployer has no wETH balance on Hadrian. Bridging in ETH from Sepolia would take ~10-15 min, so this is deferred to a follow-up. Phase 1 primitive measurements (A2 −154K + A3 ×2 = −187K) give an expected −241K CU per Wormhole outbound bridge.
  • ERC20SPLFactory.init_token_mint — partial (1/3 v1 samples captured; v2 errored). Likely deployer PDA lamport reserve exhausted after multiple create_token_mint calls. Future runs should top up more aggressively.

Test plan

  • 3 NEW contracts deployed cleanly on Hadrian (5/5 setup steps in the deploy script)
  • 6 of 9 flows captured with 3-sample data on both v1 and v2
  • Per-sample EVM tx hashes + Solana sigs preserved in artifact JSON
  • No regression on already-migrated ops (transfer, transferFrom)
  • Re-run for burnETH/approveBurnETH after bridging in wETH
  • Re-run for init_token_mint with larger PDA lamport top-up

Replay

export HARDHAT_VAR_HADRIAN_PRIVATE_KEY=...
npx hardhat run scripts/bench/deploy-real-flow-v2.ts --network hadrian
npx hardhat run scripts/bench/measure-real-flows.ts --network hadrian

Cross-references

Anil Kumar added 5 commits May 17, 2026 00:53
Deploys NEW (post-#165/#166/#167/#168/#169) versions of ERC20SPLFactory,
SPL_ERC20 wBench wrapper, and RomeBridgeWithdraw side-by-side with v1
(currently deployed) contracts on Hadrian. Measures real production-contract
CU on 6 user-facing flows via the same 3-sample real-receipt methodology
as the primitive baseline.

Methodology: production-contract tx -> rome_solanaTxForEvmTx -> Solana
getTransaction -> meta.computeUnitsConsumed. No synthetic probes — these
are the real CU users pay.
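
The lookup chain in the methodology can be sketched as follows. This is a hedged illustration, not the code in measure-real-flows.ts: the `JsonRpc` transport type and the function names are hypothetical, and the response shape of `rome_solanaTxForEvmTx` (assumed here to be an array of Solana signatures whose CU readings are summed) is an assumption. The `getTransaction` call and the `meta.computeUnitsConsumed` field are standard Solana JSON-RPC.

```typescript
// Minimal JSON-RPC caller; inject a real transport (fetch, ethers provider, ...) at the call site.
type JsonRpc = (method: string, params: unknown[]) => Promise<unknown>;

// Subset of the Solana getTransaction response that carries the CU reading.
interface SolanaTxResponse {
  meta: { computeUnitsConsumed?: number } | null;
}

function readComputeUnits(tx: SolanaTxResponse | null): number {
  const cu = tx?.meta?.computeUnitsConsumed;
  if (typeof cu !== "number") {
    throw new Error("computeUnitsConsumed missing from transaction meta");
  }
  return cu;
}

async function cuForEvmTx(
  evmTxHash: string,
  romeRpc: JsonRpc,
  solanaRpc: JsonRpc,
): Promise<number> {
  // ASSUMPTION: the Rome RPC answers with an array of Solana signatures.
  const sigs = await romeRpc("rome_solanaTxForEvmTx", [evmTxHash]);
  if (!Array.isArray(sigs)) {
    throw new Error("unexpected rome_solanaTxForEvmTx response shape");
  }
  let total = 0;
  for (const sig of sigs) {
    const tx = (await solanaRpc("getTransaction", [
      sig,
      { encoding: "json", maxSupportedTransactionVersion: 0 },
    ])) as SolanaTxResponse | null;
    // ASSUMPTION: if one EVM tx maps to several Solana txs, their CU are summed.
    total += readComputeUnits(tx);
  }
  return total;
}
```

Keeping the transport injectable lets the same function run against a live endpoint or a recorded receipt fixture.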

Side-by-side architecture:
  - NEW ERC20SPLFactory creates own ERC20Users + a wBench mint via the
    factory's own create_token_mint + init_token_mint path (exercises A4 + A5)
  - NEW SPL_ERC20 wBench wraps the fresh mint we control; deployer mints
    supply via HelperProgram.mint_spl
  - NEW RomeBridgeWithdraw constructor reuses v1's wUSDC + wETH wrappers
    so both bridges operate on the same wrapper state for fair compare

Captured (mean of 3 samples, Hadrian 2026-05-16/17):

  Op                                    v1 CU      v2 CU      Save        %
  ---                                   -----      -----      ----        -
  SPL_ERC20.transfer                  261,432    262,724      +1,292   +0.5%
  SPL_ERC20.approve                   414,104    211,997    -202,107  -48.8%
  SPL_ERC20.transferFrom              258,015    263,313      +5,298   +2.1%
  SPL_ERC20.bridgeOutToSolana         582,407    351,968    -230,439  -39.6%
  RomeBridgeWithdraw.burnUSDC         952,642    799,637    -153,005  -16.1%
  ERC20SPLFactory.create_token_mint   460,082    198,706    -261,376  -56.8%
  ERC20SPLFactory.init_token_mint     229,548    (errored)       n/a     n/a

Per-flow expectations matched: transfer/transferFrom show ~0 delta as
they were already migrated in #143/#163 (no regression confirmed).
bridgeOutToSolana shows the A1 win (recipient ATA-create migration).
burnUSDC shows the A3 + canonical-ATA win.
create_token_mint shows the A4 (System CreateAccount via
HelperProgram.create_mint_account) win.

Unmeasured this run:
  - RomeBridgeWithdraw.approveBurnETH + burnETH (deployer has no wETH
    balance on Hadrian; bridging ETH in from Sepolia takes ~10-15 min,
    deferred)
  - init_token_mint partial — v1 captured 1 of 3 samples; v2 errored
    with code -32000 (likely insufficient PDA lamports after multiple
    create_token_mint calls drained the reserve; future runs should
    top up more aggressively)

Artifacts:
  - deployments/hadrian.real-flow-bench.json — v1 + v2 addresses + wBench
    mint identity
  - deployments/hadrian.real-flow-bench.results.json — per-sample EVM tx
    hashes + Solana sigs + CU readings

Scripts:
  - scripts/bench/deploy-real-flow-v2.ts — single-shot deploy harness
  - scripts/bench/measure-real-flows.ts — 9-op x 2-version x 3-sample bench runner

Replay:
  export HARDHAT_VAR_HADRIAN_PRIVATE_KEY=...
  npx hardhat run scripts/bench/deploy-real-flow-v2.ts --network hadrian
  npx hardhat run scripts/bench/measure-real-flows.ts --network hadrian

Adds Romeswap (Uniswap V2 fork) DEX flow benchmark scaffold for Hadrian.
Bench covers 2 pair types:
  - wrapped x wrapped:  wUSDC (v1) x wBench (v2)
  - wrapped x plain ERC20: wUSDC x MOCK (deployed via existing ERC20Factory)

Per-pair ops (Rome-required multi-tx breakdown):
  - createPair (1 sample; idempotent)
  - addLiquidity (3-tx: tokenA->pair, tokenB->pair, pair.mint)
  - swap (2-tx: tokenIn->pair, pair.swap)
  - removeLiquidity (2-tx: lp->pair, pair.burn)

CAPTURED THIS RUN:
  Pair A (wUSDC x wBench) createPair: 1,085,739 CU
  Pair B (wUSDC x MOCK)   createPair: 1,088,488 CU

  Pair addresses:
    A: 0x45350dF36fA7334C2E267598Af8fC136e4982A9E
    B: 0x9FB2471A400CA670F5459829b622A2f4d4824642
  MOCK token: 0x5cB734B113E31005487D7E4bcA39BCC3e17B8e9A

  Both are ~1.08M CU per pair: the cost of CREATE2-deploying a Romeswap
  pair contract on Hadrian today, roughly 22% below the 1.4M Solana tx ceiling.
  No v1-vs-v2 split — UniswapV2Factory's createPair logic is wrapper-agnostic.

UNMEASURED THIS RUN (state-setup blockers; needs follow-up):
  - addLiquidity / swap / removeLiquidity for both pair types

  Root cause: each pair receives wrapper SPL tokens which auto-creates the
  pair's external_auth PDA + per-mint ATAs on Solana, with rent paid from the
  CALLER (deployer)'s PDA reserve. After multiple prior bench runs depleted
  the reserve and the top-up tx (swap_gas_to_lamports 20M) failed silently
  on this run, the wrapper.transfer-to-pair txs reverted with
  SimulateTransactionError: mollusk error: Failure(Custom(1)) = InsufficientFunds.

  Fix path for next run: top up PDA lamports in smaller chunks (5M x 4)
  with explicit balance verification between steps; reduce surface to
  ONE pair type per session to avoid cascading state needs.
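
The chunked top-up with balance verification could look roughly like this. It is a sketch under assumptions: `getLamports` and `topUp` are hypothetical callbacks (the latter standing in for whatever wraps the swap_gas_to_lamports call), and "the balance must have increased" is an assumed success criterion.

```typescript
// Split a total top-up into chunks; the final chunk carries whatever remains.
function planChunks(total: number, chunk: number): number[] {
  if (total <= 0 || chunk <= 0) throw new Error("total and chunk must be positive");
  const chunks: number[] = [];
  for (let left = total; left > 0; left -= chunk) {
    chunks.push(Math.min(left, chunk));
  }
  return chunks;
}

interface TopUpDeps {
  getLamports: () => Promise<number>;         // hypothetical: reads the deployer PDA balance
  topUp: (lamports: number) => Promise<void>; // hypothetical: wraps swap_gas_to_lamports
}

async function topUpVerified(
  total: number,
  chunk: number,
  deps: TopUpDeps,
): Promise<number> {
  let balance = await deps.getLamports();
  for (const amount of planChunks(total, chunk)) {
    await deps.topUp(amount);
    const after = await deps.getLamports();
    // Fail loudly rather than silently: the balance has to move after each chunk.
    if (after <= balance) {
      throw new Error(`top-up of ${amount} not reflected: ${balance} -> ${after}`);
    }
    balance = after;
  }
  return balance;
}

console.log(planChunks(20000000, 5000000)); // four chunks of 5,000,000
```

Stopping at the first chunk that does not land keeps a failed top-up from surfacing later as an InsufficientFunds revert in an unrelated transfer.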

Also includes measure-bridge-eth-flows.ts (focused script awaiting wETH
bridge-in; not run this session per operator's pivot to DEX).

Replay:
  export HARDHAT_VAR_HADRIAN_PRIVATE_KEY=...
  npx hardhat run scripts/bench/measure-dex-flows.ts --network hadrian

Hypothesis: the Rome convention of splitting addLiquidity/removeLiquidity/swap
across multiple EVM txs is unnecessary. The router overhead (slippage logic,
optimal-amounts derivation, path math) is what busts the 1.4M CU ceiling,
NOT the underlying pair operations themselves.

Test: deploy a minimal RomeswapDirect.sol that takes pre-computed amounts
and does the token transfers + pair op inline as a single function.
Measure on Hadrian via real on-chain receipts (mean of 3 samples,
post-pair-warm-up).

RESULT — all three flows fit comfortably:

  addLiq 1-tx     mean 972,935 CU   ✓ 31% margin to 1.4M ceiling
  swap 1-tx       mean 886,119 CU   ✓ 37% margin
  removeLiq 1-tx  mean 1,187,699 CU ✓ 15% margin

Per-sample CU (cold sample 1 includes pair-state initialization):
  addLiq:     [1,073,102 | 929,628 | 916,075]   steady-state ~920K
  swap:       [  897,304 | 881,564 | 879,488]   steady-state ~880K
  removeLiq:  [1,200,709 | 1,180,440 | 1,181,948]
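
The means and margins above follow directly from the per-sample readings. A small sketch of that arithmetic, assuming "steady-state" means the average of the samples after the cold first one; `summarize` and `marginPct` are illustrative names.

```typescript
const CU_CEILING = 1400000; // per-transaction ceiling used throughout this bench

function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("no samples");
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Share of the ceiling left unused, as a whole percentage.
function marginPct(cu: number): number {
  return Math.round((1 - cu / CU_CEILING) * 100);
}

function summarize(samples: number[]) {
  return {
    mean: Math.round(mean(samples)),
    // ASSUMPTION: steady state = all samples after the cold first one (needs >= 2 samples).
    steadyState: Math.round(mean(samples.slice(1))),
    marginPct: marginPct(mean(samples)),
  };
}

console.log("addLiq", summarize([1073102, 929628, 916075]));
console.log("swap", summarize([897304, 881564, 879488]));
console.log("removeLiq", summarize([1200709, 1180440, 1181948]));
```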

Implication: a thin Solidity router (RomeswapDirect-style) collapses the
3-tx addLiquidity / 2-tx swap / 2-tx removeLiquidity flows to ONE EVM tx
each. No precompile needed. The user signs once per op instead of 2-3 times.

The pattern that works:
  - Pre-compute all amounts off-chain (rome-ui or any frontend)
  - Submit one tx through a minimal router
  - User pre-approves the router on the tokens (one-time setup per token)
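
As an illustration of the off-chain pre-computation step, the optimal-amounts derivation that the stock Uniswap V2 router performs on-chain can be moved into the frontend. This sketch mirrors the well-known UniswapV2Library.quote and router _addLiquidity logic; it assumes the Romeswap fork leaves that math unchanged, and the reserve values in the usage line are illustrative only.

```typescript
// Proportional quote: how much of token B matches amountA at the current reserve ratio.
function quote(amountA: bigint, reserveA: bigint, reserveB: bigint): bigint {
  if (amountA <= 0n) throw new Error("insufficient amount");
  if (reserveA <= 0n || reserveB <= 0n) throw new Error("insufficient liquidity");
  return (amountA * reserveB) / reserveA; // bigint division truncates, like Solidity
}

// Returns the [amountA, amountB] to deposit without skewing the pool ratio.
function optimalLiquidityAmounts(
  desiredA: bigint,
  desiredB: bigint,
  reserveA: bigint,
  reserveB: bigint,
): [bigint, bigint] {
  // An empty pool accepts any ratio, so the desired amounts are used as-is.
  if (reserveA === 0n && reserveB === 0n) return [desiredA, desiredB];
  const bOptimal = quote(desiredA, reserveA, reserveB);
  if (bOptimal <= desiredB) return [desiredA, bOptimal];
  const aOptimal = quote(desiredB, reserveB, reserveA);
  return [aOptimal, desiredB];
}

// Illustrative reserves; real values would come from pair.getReserves().
console.log(optimalLiquidityAmounts(1000n, 1000n, 100266n, 99684n));
```

The resulting pair of amounts is what a minimal router would receive as pre-computed arguments, so the on-chain path only moves tokens and calls the pair.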

What the existing UniswapV2Router does that pushes it over the ceiling:
  - optimalAmounts() math (~150K CU of EVM ops)
  - slippage check (additional storage reads)
  - multi-hop path walking
  - per-token allowance double-check via transferFrom retries

Trim those → fits. RomeswapMinimalRouter spec recommended as a follow-up.

Note: createPair + addLiq cannot be combined atomically (createPair alone
is ~1.08M; +addLiq ~970K = ~2M > 1.4M). They stay as two separate user txs.

Deployment receipts on Hadrian:
  - RomeswapDirect: 0x95E85BEaeF5D3043f415D61216bAECb1b131BE44
  - Pair A (wUSDC x wBench): 0x45350dF36fA7334C2E267598Af8fC136e4982A9E
    (reserves: 102K / 102K post-bench)

Per-sample EVM tx hashes + Solana sigs in
deployments/hadrian.romeswap-direct.results.json

Replay:
  export HARDHAT_VAR_HADRIAN_PRIVATE_KEY=...
  npx hardhat run scripts/bench/measure-romeswap-direct.ts --network hadrian

Hypothesis: 2-hop swap (wBench → wUSDC → MOCK via Pair A + Pair B)
fits in 1.4M CU atomically.

Result: HYPOTHESIS FALSIFIED. All 3 samples rejected at rome-sdk
preflight with code -32000 (TooManyComputeUnitsInAtomicTx signature).

  Setup verified:
    Pair A seeded: 100266 / 99684 (wUSDC / wBench)
    Pair B seeded: 100000 / 100000 (MOCK / wUSDC)
    Quote: 100 wBench → 100 wUSDC → 99 MOCK
    All approvals + lamport top-ups confirmed before bench
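
The quoted amounts can be reproduced off-chain with the standard Uniswap V2 constant-product formula. This sketch assumes the Romeswap fork keeps the 0.3% fee (the 997/1000 factor) and that the seeded reserves listed above are raw integer units; neither is stated in the commit.

```typescript
// Standard Uniswap V2 getAmountOut with a 0.3% fee; bigint division truncates like Solidity.
function getAmountOut(amountIn: bigint, reserveIn: bigint, reserveOut: bigint): bigint {
  if (amountIn <= 0n) throw new Error("insufficient input amount");
  if (reserveIn <= 0n || reserveOut <= 0n) throw new Error("insufficient liquidity");
  const amountInWithFee = amountIn * 997n;
  return (amountInWithFee * reserveOut) / (reserveIn * 1000n + amountInWithFee);
}

// Hop 1 on Pair A: wBench in (reserve 99684), wUSDC out (reserve 100266).
const hop1 = getAmountOut(100n, 99684n, 100266n);
// Hop 2 on Pair B: wUSDC in, MOCK out (both reserves 100000).
const hop2 = getAmountOut(hop1, 100000n, 100000n);

console.log(`100 wBench -> ${hop1} wUSDC -> ${hop2} MOCK`);
// prints "100 wBench -> 100 wUSDC -> 99 MOCK"
```

Under those assumptions the output matches the quote recorded in the setup.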

  Bench: 3 samples, all rejected at preflight (off-chain simulation
  exceeds 1.4M; no Solana tx submitted).

Analytical estimate had 2-hop at ~1.34M (4% margin). Real cost lands
just above ceiling — likely due to auto-ATA verification per pair.swap
destination and EVM-side composition overhead.

Implications for target architecture:
  - 1-hop swap: 1-tx atomic (886K, comfortable) ✓
  - 2+ hop swap: split into N sequential 1-tx swaps by rome-ui
  - Each split tx retains slippage + deadline; only inter-hop
    atomicity is lost (intermediate token sits in user wallet ~5s
    between hops)
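
A frontend-side sketch of the split, assuming each hop is submitted as its own single-hop swap with its own slippage floor. `splitPath` and `minAmountOut` are hypothetical helper names, and expressing slippage in basis points is an assumption rather than something rome-ui is known to do.

```typescript
// Break a multi-hop path into consecutive [tokenIn, tokenOut] pairs, one per transaction.
function splitPath(path: string[]): Array<[string, string]> {
  if (path.length < 2) throw new Error("path needs at least two tokens");
  const hops: Array<[string, string]> = [];
  for (let i = 0; i + 1 < path.length; i++) {
    hops.push([path[i], path[i + 1]]);
  }
  return hops;
}

// Slippage floor for one hop; slippageBps is in basis points (50 = 0.5%).
function minAmountOut(quoted: bigint, slippageBps: bigint): bigint {
  if (slippageBps < 0n || slippageBps > 10000n) throw new Error("slippage out of range");
  return (quoted * (10000n - slippageBps)) / 10000n;
}

for (const [tokenIn, tokenOut] of splitPath(["wBench", "wUSDC", "MOCK"])) {
  console.log(`swap ${tokenIn} -> ${tokenOut}`);
}
console.log(minAmountOut(100n, 50n)); // floor for a quoted output of 100 at 0.5% slippage
```

Re-quoting each hop against fresh reserves just before it is submitted keeps the per-hop floor meaningful, since the intermediate balance sits in the user's wallet between transactions.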

Path to 2-hop atomic (if ever needed): strip fee-on-transfer support
from pair.swap (saves ~100K per hop → 2-hop ~1.24M, which would fit). The
trade-off is that this locks out Token-2022 fee tokens + MetaHook + tax-token
forks. Not recommended: future composability outweighs one extra atomic hop.

Adds:
  - contracts/cpi/test/RomeswapDirect.sol: swap2Hop function
  - scripts/bench/measure-2hop-swap.ts: dedicated 2-hop bench
  - deployments/hadrian.2hop-bench.results.json: per-sample data

…-#364

Tests whether canonical Uniswap V2 Router operations (unmodified — plain
vanilla Uniswap V2 deployed at 0xB342f70D...) fit Solana's 1.4M CU ceiling
on Hadrian after universal-delegation savings (#364).

RESULT: only single-hop swap fits. All other ops bust 1.4M.

  Op                                                    CU         Verdict
  --                                                    --         -------
  swapExactTokensForTokens single-hop (wUSDC→wBench)    1,101,044  ✓ fits (21% margin)
  swapExactTokensForTokens 2-hop                        n/a        ✗ preflight reject
  addLiquidity                                          n/a        ✗ preflight reject
  removeLiquidity (LP pre-approved)                     n/a        ✗ preflight reject
  removeLiquidityWithPermit (valid signature)           n/a        ✗ preflight reject

Comparison vs lean direct executor (RomeswapDirect from prior bench):
                          lean       canonical   overhead
  swap single-hop         886K       1,101K      +215K (24%)
  addLiquidity            973K       busts       >150K (pushes over)
  removeLiquidity         1,187K     busts       >213K (pushes over)

The canonical router's ~200-300K of safety overhead (safeTransferFrom
wrappers, slippage logic, _addLiquidity optimal-amounts derivation, multi-hop
path-walking) is what separates "fits" from "busts." Post-#364 savings
helped the lean path; not enough to cover the canonical overhead.

removeLiquidityWithPermit ALSO busts: permit eliminates a separate approve
tx, but the underlying remove operation itself doesn't fit. So the canonical
router's permit family is unusable on Rome.

Implication for unified DEX UX using only canonical infrastructure:
  - 1-tx swap (single-hop): YES via canonical, full Uniswap V2 safety features
  - addLiquidity: stays 3-tx (Rome direct-pair convention)
  - removeLiquidity: stays 2-tx (Rome direct-pair convention)
  - 2+ hop swap: stays N-tx (split per hop)

Permit setup verified: EIP-712 domain name is 'Romeswap V2' (not 'Uniswap V2');
the rome-uniswap-v2 fork branded the LP token. With correct domain, permit
signature recovers correctly — busts CU during the underlying operation,
not at signature verification.
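
For reference, the typed data for an LP-token permit with the corrected domain name could be assembled as below. This is a sketch assuming the fork keeps the stock UniswapV2ERC20 permit layout (domain version "1" and the five-field Permit struct); only the 'Romeswap V2' name comes from this bench. `buildPermitTypedData` is an illustrative name, and the chain ID in the usage lines is a placeholder, not Hadrian's actual ID.

```typescript
interface PermitMessage {
  owner: string;
  spender: string;
  value: bigint;
  nonce: bigint;
  deadline: bigint;
}

function buildPermitTypedData(pair: string, chainId: number, message: PermitMessage) {
  return {
    domain: {
      name: "Romeswap V2", // the fork's LP token name, not "Uniswap V2"
      version: "1",        // ASSUMPTION: unchanged from upstream UniswapV2ERC20
      chainId,
      verifyingContract: pair,
    },
    types: {
      // ASSUMPTION: same Permit struct as upstream UniswapV2ERC20.
      Permit: [
        { name: "owner", type: "address" },
        { name: "spender", type: "address" },
        { name: "value", type: "uint256" },
        { name: "nonce", type: "uint256" },
        { name: "deadline", type: "uint256" },
      ],
    },
    message,
  };
}

const typedData = buildPermitTypedData(
  "0x45350dF36fA7334C2E267598Af8fC136e4982A9E", // Pair A from this bench
  1, // placeholder chain ID
  {
    owner: "0x0000000000000000000000000000000000000001",   // placeholder
    spender: "0x0000000000000000000000000000000000000002", // placeholder
    value: 1000n,
    nonce: 0n,
    deadline: 1800000000n,
  },
);
console.log(typedData.domain.name);
```

With ethers v6, the three parts would typically be passed to `signer.signTypedData(domain, types, message)`; a wrong domain name yields a signature that recovers to a different address, which is why the name matters.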

Per-sample EVM tx hashes + Solana sigs in
deployments/hadrian.canonical-router.results.json.

Replay:
  npx hardhat run scripts/bench/measure-canonical-router.ts --network hadrian