Add op-reth proofs-history reorg crash repro #20956
Draft
karlfloersch wants to merge 6 commits into
Prod log comparison against this repro
I pulled the sdg-v1 Loki logs for
Production signal
Relevant prod context:
What the logs show:
I did not find a literal
Local repro signal
The local repro creates the same class of failure with a deliberately tiny proofs-history window:
Match / gap
The repro matches the core prod shape:
Known differences:
Summary
Adds a minimal draft interop acceptance repro for op-reth `proofs-history` during a CL-driven invalid executing-message reorg.
The repro path is `TestReorgInvalidExecMsgOpRethProofsHistoryTinyWindow`. It is intentionally a negative/failing test right now: a failure with `repro observed` means the repro worked.
Test Flow
- `--proofs-history.window=1`.
- `LogIndex`.
- `op-test-sequencer`.
- `repro observed`.
- `supernode_syncStatus` for sustained post-restart non-recovery.
- `unexpected recovery`.
- `unexpected supernode recovery`.
- `repro observed`.
Observed Repro
Latest focused restart run showed:
- `24d174..7c1ebb:15` and child `ebbd92..01431b:16`
- `--proofs-history.window=1`
- `ExEx proofs-history crashed: Attempted to unwind to block 14 beyond earliest stored block 20`
- `ExEx proofs-history crashed: Parent hash mismatch at block 22`
- `chain 901 not ready for timestamp ... failed to determine L2BlockRef of height 15`
- `repro observed`
Repro Confidence
This now proves the restart shape we care about in devstack: `proofs-history` can kill op-reth during a CL-driven rewind, op-reth can fail again after restart against the same data directory, and the supernode does not recover automatically during the observation window.
Important caveat: this is still a forced-retention repro. It does not prove the full production state under normal proof retention, and the restarted node currently crashes again rather than remaining live with forkchoice pinned while serving partial RPC.
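To make the forced-retention shape concrete, here is a minimal Go sketch that parses the unwind crash line quoted under Observed Repro and reports how far the rewind target fell below the earliest stored block. The names `ParseUnwindCrash`, `UnwindCrash`, and `BlocksShort` are hypothetical helpers for illustration, not part of this PR, and the regular expression assumes the log line keeps exactly the wording shown above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// unwindRe matches the unwind crash message as quoted in the PR description.
// Assumption: the message wording is stable; adjust the pattern if it changes.
var unwindRe = regexp.MustCompile(`Attempted to unwind to block (\d+) beyond earliest stored block (\d+)`)

// UnwindCrash holds the two block numbers reported by the crash line.
// Hypothetical type, for illustration only.
type UnwindCrash struct {
	Target   uint64 // block the rewind tried to unwind to
	Earliest uint64 // earliest block still stored by proofs-history
}

// ParseUnwindCrash extracts the block numbers from a log line.
// The boolean is false when the line is not an unwind crash.
func ParseUnwindCrash(line string) (UnwindCrash, bool) {
	m := unwindRe.FindStringSubmatch(line)
	if m == nil {
		return UnwindCrash{}, false
	}
	target, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return UnwindCrash{}, false
	}
	earliest, err := strconv.ParseUint(m[2], 10, 64)
	if err != nil {
		return UnwindCrash{}, false
	}
	return UnwindCrash{Target: target, Earliest: earliest}, true
}

// BlocksShort returns how many blocks the target lies below the earliest
// stored block, or 0 if the target is still inside the stored range.
func (c UnwindCrash) BlocksShort() uint64 {
	if c.Target >= c.Earliest {
		return 0
	}
	return c.Earliest - c.Target
}

func main() {
	line := "ExEx proofs-history crashed: Attempted to unwind to block 14 beyond earliest stored block 20"
	if c, ok := ParseUnwindCrash(line); ok {
		fmt.Printf("target=%d earliest=%d short=%d\n", c.Target, c.Earliest, c.BlocksShort())
	}
}
```

For the observed line, the target (14) is 6 blocks below the earliest stored block (20), which is the condition the tiny window is meant to force.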
Review Notes / Follow-ups
Validation
- `mise exec -- go test ./op-acceptance-tests/tests/interop/reorgs -run '^$' -count=0`
- `mise exec -- go test ./op-devstack/dsl ./op-devstack/presets ./op-devstack/sysgo -run '^$' -count=0`
- `9658dedaab`: `LOG_LEVEL=info mise exec -- go test -v ./op-acceptance-tests/tests/interop/reorgs -run '^TestReorgInvalidExecMsgOpRethProofsHistoryTinyWindow$' -count=1 -timeout=8m`
- `/tmp/op-reth-reorg-tiny-proof-window-restart-bounded.log`.
- `cd op-acceptance-tests && mise exec -- just build-deps` built contracts forge artifacts, op-program/cannon prestates, and release Rust binaries for `kona-node`, `kona-host`, and `op-reth`; it then stopped at the later `op-rbuilder` step because this worktree has no root `op-rbuilder/` directory.