supernode: try native engine reset for invalidation rewinds#20964

Draft
karlfloersch wants to merge 1 commit into karl/op-reth-reorg-repro from karl/op-reth-reorg-native-reset

@karlfloersch
Contributor

Summary

Experimental recovery branch on top of karl/op-reth-reorg-repro. Instead of manually rewinding the EL and recreating the virtual node, invalidation rewinds now route through the op-node engine reset path.

Changes:

  • add an op-node ForceEngineReset entrypoint that runs EngineController.ForceReset on the driver event loop
  • expose that through the supernode virtual node abstraction
  • update ChainContainer.RewindEngine to compute reset targets and issue the native reset
  • if the existing interop freeze protocol has stopped the target VN, resume that chain and wait briefly for the live driver before issuing the reset
  • keep the existing SafeDB reset behavior by draining the engine reset confirmation event

Local Validation

Passing:

```
go test ./op-supernode/supernode/chain_container/...
go test ./op-node/rollup/driver ./op-node/node
go test ./op-supernode/supernode/activity/interop -run 'TestApplyPendingTransition|TestReset|TestRewind|TestInvalidateBlock'
```

Focused repro run:

```
LOG_LEVEL=info mise exec -- go test -v ./op-acceptance-tests/tests/interop/reorgs \
  -run '^TestReorgInvalidExecMsgOpRethProofsHistoryTinyWindow$' \
  -count=1 -timeout=8m
```

Result: the repro test fails with `unexpected recovery`, which is the expected signal for this branch. Relevant local log points from `/tmp/op-reth-reorg-native-reset-rerun.log`:

  • native reset fired: `Engine is manually force-reset` for chain 901 at unsafe 14 / safe 6 / finalized 0
  • supernode SafeDB reset followed: `Resetting safe head db` to safe block 6
  • op-reth still reproduced the proofs-history crash: `ExEx proofs-history crashed: Attempted to unwind to block 15 beyond earliest stored block 20`
  • test restarted op-reth after the RPC failure
  • after restart, the repro assertion failed because EL unsafe advanced to block 22 instead of staying stuck

Notes

This is intentionally separate from #20962. It is a cleaner-looking direction because the reset goes through op-node’s existing engine reset event/confirmation path rather than manually clearing SafeDB from the chain container.
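
To make the distinction in the Notes concrete, here is a rough sketch of a SafeDB that is reset by draining a confirmation event, rather than being cleared directly by the caller. All names here (`ResetConfirmed`, `SafeDB`, `ResetTo`, `DrainConfirmations`) are hypothetical and chosen for illustration; the real event and SafeDB types in op-node and op-supernode are not reproduced here and may behave differently.

```go
package main

import "fmt"

// ResetConfirmed is a hypothetical event emitted once the engine has applied a reset.
type ResetConfirmed struct {
	Safe uint64 // safe head the engine was reset to
}

// SafeDB is a hypothetical, simplified safe-head store: just recorded safe block numbers.
type SafeDB struct {
	entries []uint64
}

// ResetTo drops every recorded entry above the given safe block.
func (db *SafeDB) ResetTo(safe uint64) {
	kept := db.entries[:0]
	for _, n := range db.entries {
		if n <= safe {
			kept = append(kept, n)
		}
	}
	db.entries = kept
}

// DrainConfirmations applies every confirmation that is already queued, without
// blocking, and reports how many were applied. The database is only changed in
// response to a confirmation, never ahead of one.
func DrainConfirmations(events <-chan ResetConfirmed, db *SafeDB) int {
	applied := 0
	for {
		select {
		case ev := <-events:
			db.ResetTo(ev.Safe)
			applied++
		default:
			return applied
		}
	}
}

func main() {
	events := make(chan ResetConfirmed, 1)
	db := &SafeDB{entries: []uint64{4, 5, 6, 7, 8}}

	// Safe block 6 matches the "Resetting safe head db" log point quoted above.
	events <- ResetConfirmed{Safe: 6}
	n := DrainConfirmations(events, db)
	fmt.Println("confirmations applied:", n, "entries:", db.entries)
}
```

In this shape the chain container never decides on its own what the SafeDB should contain; it reacts to what the engine reports, which is the property the Notes describe as cleaner.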
