Skip to content

execution/state, stagedsync: parallel-exec SD-revival and metamorphic-CREATE2 fixes#21590

Open
mh0lt wants to merge 2 commits into
mainfrom
mh/parallel-exec-sd-revival-metamorphic-create2
Open

execution/state, stagedsync: parallel-exec SD-revival and metamorphic-CREATE2 fixes#21590
mh0lt wants to merge 2 commits into
mainfrom
mh/parallel-exec-sd-revival-metamorphic-create2

Conversation

@mh0lt
Copy link
Copy Markdown
Contributor

@mh0lt mh0lt commented Jun 2, 2026

Closes #20711.

Summary

Correctness fixes for parallel execution around the SELFDESTRUCT-revival and metamorphic-CREATE2 patterns. Surfaced as a chiado from-0 wrong-trie-root and a sepolia tx-4 missed CallNewAccountGas (25,000); both classes resolved by this PR.

Six interrelated sub-fixes:

  • execution/stagedsync/exec3_parallel.gocalcFees: nil pre-state must not short-circuit EIP-161 emptyPre when the worker has already bumped Nonce or set CodeHash, otherwise the emitted SelfDestructPath=true wipes those writes via normalizeWriteSet's sdSet filter.
  • execution/stagedsync/exec3_parallel.gonormalizeWriteSet: detect SD-then-revival by scanning the versionMap history for any prior SelfDestructPath=true write (new AnyDoneBoolWriteEquals helper) rather than relying on the latest-value read, which a revival flips back to false. Narrower than an IncarnationPath probe: pure CREATE (no prior SD=true) doesn't wipe pre-existing storage, so its same-value SSTOREs still no-op against pre-block.
  • execution/state/intra_block_state.gogetVersionedAccount: honour prior-tx SelfDestructPath cell even when CachedReaderV3 (which bypasses the versionMap) returns the pre-SD record. Includes the same-tx metamorphic SD+CREATE2 revival check (AddressPath at TxIdx >= destructTxIndex is the signal; strict > misses it).
  • execution/state/intra_block_state.goCreateAccount: default the synthetic IncarnationPath/BalancePath reads to (StorageRead, UnknownVersion) rather than inheriting the outer (source, version) which may carry refreshVersionedAccount's BalancePath promotion and trip the validator. Only emit IncarnationPath on contractCreation=true (a CreateAccount(false) carries newObj.data.Incarnation=0 and would clobber a prior tx's SD-side cell).
  • execution/state/intra_block_state.goaccountRead: drop the (MapRead, V) promotion when vm.Read(AddressPath) returns None, to avoid a non-converging validator livelock.
  • execution/state/rw_v3.goLightCollector: only emit IncarnationPath on incarnation up-revs. A value-transfer to a same-block SD'd address leaves newObj.Incarnation=0 while original carries the prior block's value; emitting the down-rev would clobber the SD-side cell and break a same-block CREATE2's prevInc lookup.
  • execution/state/versionedio.goversionedStateReader.ReadAccountData: mirror IBS's same-tx metamorphic-recreation check (AddressPath at TxIdx >= destructTxIndex before falling back to strict > subfield checks).
  • execution/state/versionmap.go: new VersionMap.AnyDoneBoolWriteEquals helper scans cells for a Done write at TxIdx <= limit whose data equals target. Used by the narrowed Fix 2 above.

Test plan

  • Unit tests — four regression tests in execution/state/versionedio_test.go: TestAccountRead_BalancePathPromotion_DoesNotInvalidate, TestCreateAccount_SyntheticIncarnationStamp_DoesNotInvalidate, TestGetVersionedAccount_PriorTxSelfDestruct_ReturnsNil, TestGetVersionedAccount_SameTxMetamorphicRecreate_ReturnsAccount. All pass.
  • Chiado from-0 to tip — 21.44M blocks, EXEC3_PARALLEL=true, --prune.mode=archive. Passed the original parallel_exec: chiado exec from 0 trie-root missmatch #20711 regression block (7,221,794) cleanly and reached tip.
  • Sepolia from-0 past 4,913,058 — confirmed at 16:08 UTC; chain currently past 4.93M with no failures. (The 0x140da4236… metamorphic-deploy pattern at 4,913,056-57 was the original sepolia regression motivating this work.)
  • Hoodi from-0 — still soaking past 1.5M, no failures.
  • Mainnet from-0 — still soaking, no failures.
  • make lint clean.

Notes

  • The narrowed Fix 2 was iterated to address a chiado 7,221,794 regression introduced by an earlier IncarnationPath-based form of the same probe — that broader probe wrongly preserved same-value SSTOREs that canonical correctly dropped. The history-scan-on-SelfDestructPath=true form is strictly narrower.
  • An unrelated parallel-exec failure pattern surfaced on gnosis at blocks 14,594,499 / 14,837,280 / 14,845,967 / 14,892,336 during the soak — see docs/plans/20260602-gnosis-parallel-exec-race-14594499.md (separate follow-up; appears to be an OCC race / scheduling artifact, not the SD-revival class addressed here).

…-CREATE2 fixes

Bundle of correctness fixes for parallel exec around the SELFDESTRUCT
revival and metamorphic-CREATE2 patterns. Validated by from-0 sync of
chiado to tip (21.4M, including the regression noted below) and sepolia
past block 4,913,058.

execution/stagedsync/exec3_parallel.go
- calcFees: nil pre-state must not short-circuit EIP-161 emptyPre when
  the worker has already bumped Nonce or set CodeHash; otherwise the
  emitted SelfDestructPath wipes those writes via normalizeWriteSet's
  sdSet filter.
- normalizeWriteSet: detect SD-then-revival by scanning the versionMap
  history for any prior SelfDestructPath=true write (new
  AnyDoneBoolWriteEquals helper) rather than relying on the latest-value
  read which a revival flips back to false. Narrower than an
  IncarnationPath probe: pure CREATE (no prior SD=true) doesn't wipe
  pre-existing storage, so its same-value SSTOREs still no-op against
  pre-block. Earlier IncarnationPath-based form regressed chiado
  7,221,794 by preserving same-value SSTOREs that canonical correctly
  dropped.

execution/state/intra_block_state.go
- getVersionedAccount: honour prior-tx SelfDestructPath cell even when
  CachedReaderV3 (which bypasses the versionMap) returns the pre-SD
  record. Includes the same-tx metamorphic SD+CREATE2 revival check
  (AddressPath at TxIdx >= destructTxIndex is the signal; strict >
  would miss it).
- CreateAccount: default the synthetic IncarnationPath and BalancePath
  reads to (StorageRead, UnknownVersion) rather than inheriting the
  outer (source, version), which may carry refreshVersionedAccount's
  BalancePath promotion and trip the validator's recursive AddressPath
  check.
- CreateAccount: only emit IncarnationPath on contractCreation=true.
  CreateAccount(false) leaves newObj.data.Incarnation at 0 and emitting
  it would clobber a prior tx's SD-side IncarnationPath cell.
- accountRead: drop the (MapRead, V) promotion when vm.Read(AddressPath)
  returns None, to avoid a non-converging validator livelock.

execution/state/rw_v3.go
- LightCollector: only emit IncarnationPath on incarnation up-revs. A
  value-transfer to a same-block SD'd address leaves newObj.Incarnation
  at 0 while original carries the prior block's value; emitting the
  down-rev would clobber the SD-side IncarnationPath cell and break a
  same-block CREATE2's prevInc lookup.

execution/state/versionedio.go
- versionedStateReader.ReadAccountData: mirror IBS's same-tx
  metamorphic-recreation check (AddressPath at TxIdx >= destructTxIndex
  before falling back to the strict > subfield checks).

execution/state/versionmap.go
- New VersionMap.AnyDoneBoolWriteEquals helper that scans the cells for
  a Done write at TxIdx <= limit whose data equals target. Used to
  detect a prior in-block SelfDestructPath=true write that a later
  revival flipped back to false (a case Read alone, which is
  latest-only, misses).

execution/state/versionedio_test.go
- Four regression tests pinning the IBS / versionedStateReader behavior
  above: BalancePath-promoted AddressPath read, synthetic
  IncarnationPath stamp, prior-tx SD with stale stateReader record, and
  same-tx metamorphic SD+CREATE2.
@mh0lt mh0lt requested a review from yperbasis as a code owner June 2, 2026 18:01
mh0lt added a commit that referenced this pull request Jun 2, 2026
Flip the EXEC3_PARALLEL env var default from false to true, making
parallel execution the default for plain `./build/bin/erigon` runs.
Setting EXEC3_PARALLEL=false (or ERIGON_EXEC3_PARALLEL=false) still
forces the serial path.

Depends on #21590 (the parallel-exec SD-revival and metamorphic-CREATE2
fixes) — that PR must merge first so the new default lands with the
correctness fixes already in place. The 5-chain soak validation that
backs the flip is documented in #21590's test plan.
mh0lt added a commit that referenced this pull request Jun 2, 2026
Flip the EXEC3_PARALLEL env var default from false to true, making
parallel execution the default for plain `./build/bin/erigon` runs.
Setting EXEC3_PARALLEL=false (or ERIGON_EXEC3_PARALLEL=false) still
forces the serial path.

Depends on #21590 (the parallel-exec SD-revival and metamorphic-CREATE2
fixes) — that PR must merge first so the new default lands with the
correctness fixes already in place. The 5-chain soak validation that
backs the flip is documented in #21590's test plan.
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is red

@yperbasis yperbasis requested a review from taratorio June 3, 2026 07:21
…IncarnationPath

The gate was added to avoid clobbering a prior tx's SD-side
IncarnationPath cell when a value-transfer revival writes
newObj.data.Incarnation=0 — meant to protect the metamorphic
SD+CREATE2 case. Two problems:

- It breaks TestDeleteRecreateAccount: pre-Cancun SD followed by a
  value-transfer revival expects IncarnationPath to reset on the
  revival; skipping the write leaves the SD's stale incarnation in the
  versionMap and the trie root diverges from canonical.
- It is redundant. LightCollector's strict-greater incarnation gate
  (rw_v3.go: `accountCopy.Incarnation > original.Incarnation`) already
  suppresses the down-rev emission on the apply side, so the metamorphic
  case is still covered without this earlier IBS-side skip.

Revert restores the original unconditional emit. Sepolia regression
tests + adjacent SD/CREATE/CVE tests in execution/tests all pass with
EXEC3_PARALLEL=true after the revert.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Findings to address (follow-ups unless noted):

  1. TestGetVersionedAccount_PriorTxSelfDestruct_ReturnsNil passes on main — the NewVersionedStateReader wrapper's pre-existing SD check returns nil before the new gate is ever load-bearing, so the fix isn't pinned by any test. Hand the IBS the raw accountStateReader (the CachedReaderV3 analogue from the gate's comment); I verified that variant is red on main / green here.

  2. Question on the >= revival check: the only production writer I can find that places AddressPath@N next to SelfDestructPath=true@N is calcFees' tip-credit sibling emission (worker TxOut filters AddressPath out whenever final SD=true). For a coinbase contract that self-destructs without recreate, serial burns the tip (EIP-7708 case 2, per runPostApplyMessageOnMinIBS), while the >= makes tx N+1's calcFees see the coinbase as revived with tip_N and emit tip_N + tip_{N+1}, carrying the pre-SD nonce/codeHash along. If there's no guard I missed, a targeted re-exec of the mainnet 25151825 window would settle it. Related: which flush other than calcFees can produce the vm state hand-built in TestGetVersionedAccount_SameTxMetamorphicRecreate_ReturnsAccount (SD=true + AddressPath at one TxIdx with the account ending alive)?

  3. The LightCollector.UpdateAccountData change looks inert: CollectorWrites are captured and fee-adjusted but never flushed to the versionMap or applied to domains (applyResult.writes comes from normalizeWriteSet(blockIO.WriteSet); resolveStorageWrites/filterWritesByVersionMap have only test callers). The head commit's rationale — "LightCollector's strict-greater gate already suppresses the down-rev on the apply side" — then doesn't hold: the down-rev IncarnationPath=0 cell from CreateAccount(addr, false) reaches the vm via TxOut at HEAD, same as main. The revert stands on TestDeleteRecreateAccount alone; please fix the narrative, and the PR body still lists the reverted contractCreation gate as a live fix.

  4. The from-0 soaks predate the head commit, which changed IncarnationPath emission in exactly the SD-revival scenarios targeted here; only unit/EEST suites ran after the revert. Cheap ask: re-exec the chiado 7,221,794 and sepolia 4,913,058 windows at HEAD.

  5. Residual sharp edges worth tracking issues: (a) versionedRead's per-path revival is still strict-> (versionedio.go:874) — probing getVersionedAccount with a CodeHashPath cell at the destruct TxIdx returns a zeroed CodeHash despite the cell holding the new hash, and the new >= gate now routes such reads into that merge; (b) the accountRead downgrade leaves a narrow validation blind spot — Done-path reads with copyV==nil are never recorded, so the promoted sub-field dependency's only carrier was the AddressPath stamp; after the downgrade, a prior tx re-executing with a different balance while dropping its AddressPath write validates as None+StorageRead with no value check. Recording the promoted read at its own sub-path would close it.

  6. Nits: the 4-probe revival block is now duplicated (intra_block_state.go:1070, versionedio.go:289) and must stay in lockstep — extract a VersionMap helper; AnyDoneBoolWriteEquals could be named for what it detects (e.g. HadSelfDestruct); several new comments run 3-6 sentences with scenario walk-throughs — per the repo comment guidelines the forensic detail belongs in the commit message.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment on lines +1911 to 1916
// Don't inherit outer (source, version): it may carry
// refreshVersionedAccount's BalancePath promotion (MapRead, V_bal),
// which stamps IncarnationPath with a cell vm never wrote and trips
// the validator's recursive AddressPath check.
incSource, incVersion := StorageRead, UnknownVersion
if sdb.versionMap != nil {
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parallel_exec: chiado exec from 0 trie-root missmatch

3 participants