Summary
Under load, the Commonware resolver actor panics with "resolver should not finish". After Docker auto-restarts the node, the transaction pool retains stale transactions from before the crash. The block builder then enters a permanent failure loop because the on-chain nonces were reset while the pooled transactions still carry higher nonces (for example, a transaction nonce of 24 against a state nonce of 0).
Additionally, after any node restart, the resolver permanently blocks all peers within milliseconds because EVM block verification requires sequential parent snapshots that are lost on restart. This makes catch-up impossible.
Load Test Evidence (2026-05-22)
During a 1,000-tx load test:
ERROR commonware_runtime::utils::handle: task panicked err="resolver should not finish"
After restart, permanent block builder failure:
WARN build_block: execution failed height=288 txs=195 error=TxExecution("Transaction(NonceTooHigh { tx: 24, state: 0 })")
This repeated 1,373 times in a single minute.
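One possible mitigation for the NonceTooHigh loop is to reconcile the pool against the current state nonces on startup, so the builder never sees transactions it cannot execute. The sketch below is illustrative only: `Pool`, `Sender`, and `prune_against_state` are hypothetical names, not the node's actual types, and whether dropping transactions is acceptable (rather than fixing the state reset itself) is a separate design question.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical sender identifier; a real node would use an address type.
type Sender = u64;

/// Hypothetical pool: per sender, pending payloads keyed by nonce (ascending).
#[derive(Default)]
struct Pool {
    pending: HashMap<Sender, BTreeMap<u64, Vec<u8>>>,
}

impl Pool {
    fn insert(&mut self, sender: Sender, nonce: u64, payload: Vec<u8>) {
        self.pending.entry(sender).or_default().insert(nonce, payload);
    }

    /// Keep, per sender, only the run of transactions whose nonces are
    /// contiguous starting at that sender's current state nonce.
    /// Returns the number of transactions dropped.
    fn prune_against_state(&mut self, state_nonce: impl Fn(Sender) -> u64) -> usize {
        let mut dropped = 0;
        for (sender, txs) in self.pending.iter_mut() {
            let mut expected = state_nonce(*sender);
            let before = txs.len();
            // BTreeMap::retain visits keys in ascending order, so the first
            // gap stops `expected` from advancing and drops everything after it.
            txs.retain(|nonce, _| {
                if *nonce == expected {
                    expected += 1;
                    true
                } else {
                    false
                }
            });
            dropped += before - txs.len();
        }
        // Remove senders that no longer have any pending transactions.
        self.pending.retain(|_, txs| !txs.is_empty());
        dropped
    }

    fn len(&self) -> usize {
        self.pending.values().map(|txs| txs.len()).sum()
    }
}
```

With a state nonce of 0, a pooled transaction with nonce 24 (as in the log above) would be dropped instead of being retried on every build attempt.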
Impact
- Any node restart is potentially fatal to the network
- Two restarts in a 4-validator cluster = permanent quorum loss
- Rolling upgrades are impossible
- No mechanism to unblock peers or recover without full cluster restart
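The quorum claim above can be checked with a little arithmetic. The sketch assumes the common BFT sizing of n >= 3f + 1 with a quorum of n - f votes; the threshold this network actually uses is not stated here, so treat the functions below as an illustration rather than the consensus implementation.

```rust
/// Largest number of faulty validators tolerated, assuming n >= 3f + 1.
fn max_faults(n: u32) -> u32 {
    n.saturating_sub(1) / 3
}

/// Votes needed for a quorum, assuming the common n - f threshold.
fn quorum(n: u32) -> u32 {
    n - max_faults(n)
}

/// True if the cluster can still form a quorum with `down` validators offline.
fn has_quorum(n: u32, down: u32) -> bool {
    n.saturating_sub(down) >= quorum(n)
}
```

Under these assumptions a 4-validator cluster needs 3 votes, so it survives one unavailable validator but not two.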
Root Cause
- Resolver panic: The resolver actor terminates unexpectedly under high load
- Peer blocking: After restart, verify_block() returns false when parent snapshots are missing (cold cache), and the resolver permanently blocks the peer
- Stale pool: Restarted node retains pool txs with nonces ahead of the reset state
See tmp/issues/04-resolver-permanent-peer-blocking-after-restart.md for the full writeup.
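One way to avoid blocking peers on a cold cache is to distinguish "cannot verify yet" from "invalid", instead of collapsing both into false. The following is a minimal sketch of that idea, not the resolver's actual API: `Verdict`, `PeerAction`, `Block`, `verify_block_tristate`, and `peer_action` are all hypothetical names, and parent snapshots are modelled as a plain set of block IDs.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of verifying a block received from a peer.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Valid,
    Invalid,
    /// The parent snapshot is not available locally, so no judgement is possible.
    MissingParent,
}

/// Hypothetical action the resolver takes for the peer that served the block.
#[derive(Debug, PartialEq, Eq)]
enum PeerAction {
    Accept,
    Block,
    RetryLater,
}

/// Hypothetical, simplified block: `parent` identifies the parent snapshot,
/// and `well_formed` stands in for the real execution check.
struct Block {
    parent: u64,
    well_formed: bool,
}

/// Stand-in for verify_block() that reports a missing parent explicitly.
fn verify_block_tristate(block: &Block, snapshots: &HashSet<u64>) -> Verdict {
    if !snapshots.contains(&block.parent) {
        return Verdict::MissingParent;
    }
    if block.well_formed {
        Verdict::Valid
    } else {
        Verdict::Invalid
    }
}

/// Only a definite verification failure is treated as peer misbehaviour.
fn peer_action(verdict: &Verdict) -> PeerAction {
    match verdict {
        Verdict::Valid => PeerAction::Accept,
        Verdict::Invalid => PeerAction::Block,
        Verdict::MissingParent => PeerAction::RetryLater,
    }
}
```

With this split, a freshly restarted node with no snapshots would defer blocks it cannot check yet rather than penalising the peers that served them.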