Skip to content

Remove RequestState trait from lookup sync#9391

Merged
mergify[bot] merged 12 commits into
sigp:unstablefrom
dapplion:remove-request-state-trait
Jun 3, 2026
Merged

Remove RequestState trait from lookup sync#9391
mergify[bot] merged 12 commits into
sigp:unstablefrom
dapplion:remove-request-state-trait

Conversation

@dapplion
Copy link
Copy Markdown
Collaborator

@dapplion dapplion commented Jun 1, 2026

#9155 is quite complex and sensitive so I believe it's best to review the refactor first without any of the Gloas noise.

This PR combines PR #9327's processing-result classification redesign (BlockProcessingResult becomes Imported/ParentUnknown/Error with producer-side penalty classification in network_beacon_processor) with PR #9155's RequestState trait removal (concrete BlockRequest/DataRequest state machine in single_block_lookup.rs, concrete download/ processing handlers in block_lookups/mod.rs). The common.rs RequestState trait is deleted. ParentUnknown remains reactive (surfaced from the processing result), as in unstable. All gloas constructs are stripped.

AI disclosure

AI ported the trait request change of #9155 to sigp/unstable. Then fixed by hand ~50% of the code to reduce the diff by half. All lines have been reviewed carefully by me

… classification

Combine PR sigp#9327's processing-result classification redesign (BlockProcessingResult
becomes Imported/ParentUnknown/Error with producer-side penalty classification in
network_beacon_processor) with PR sigp#9155's RequestState trait removal (concrete
BlockRequest/DataRequest state machine in single_block_lookup.rs, concrete download/
processing handlers in block_lookups/mod.rs). The common.rs RequestState trait is
deleted. ParentUnknown remains reactive (surfaced from the processing result), as in
unstable. All gloas constructs are stripped.
@dapplion dapplion requested a review from jxs as a code owner June 1, 2026 20:11
@dapplion
Copy link
Copy Markdown
Collaborator Author

dapplion commented Jun 1, 2026

Test 1 — Kurtosis devnet

Setup: 4× Lighthouse + geth, 3s slots, all running the branch binary af82c75, via ethereum-package 5.0.1.

Method: Stopped cl-4's beacon node at head 15, let the network advance ~7 slots to 22, restarted it.

Result: Recovered via lookup sync — sync_lookups_created_total 7, sync_lookups_completed_total 7 (0 failed), and all
range-sync batch counters = 0 (process metrics reset on restart, so these are entirely post-restart). Node went Stalled
→ Synced, rejoined head. ✅

Test 2 — Live Ethereum mainnet

Branch binary af82c75 checkpoint-synced to real mainnet head with a mock EL (returns VALID → fully verified, ~720 MB
disk). Mainnet is post-Fulu, so custody/data-column lookups are exercised too. Three scenarios:

  • Stop 1–2 slots / restart: killed the beacon node at head, ~18s downtime, restarted → recovered via lookup sync
    (created 4→6, all 6 completed, range-sync batches 0). dist→0.
  • EL off: killed the mock engine for ~6 slots → el_offline: true, execution engine upcheck (Connect) errors, head
    safely halted at the last verified block (no unsafe optimistic import). Restored the EL → el_offline: false, head
    caught up, dist→0.
  • Internet off (node-scoped): iptables DROP on this node's P2P ports only → peers collapsed 47→3, head froze
    (partitioned). Removed the rules → recovered mostly via lookup sync (+3 created, all completed) with a single
    range-head batch at the ~10-slot tail; dist→0; firewall cleanly restored.

@dapplion dapplion changed the title Remove RequestState trait Remove RequestState trait from lookup sync Jun 1, 2026
…move-request-state-trait

# Conflicts:
#	beacon_node/network/src/network_beacon_processor/sync_methods.rs
#	beacon_node/network/src/sync/block_lookups/mod.rs
#	beacon_node/network/src/sync/manager.rs
#	beacon_node/network/src/sync/tests/lookups.rs
@dapplion dapplion added the work-in-progress PR is a work-in-progress label Jun 2, 2026
dapplion added 2 commits June 2, 2026 20:35
Brings sigp#9391 up to the latest lookup-sync trait-removal work: concrete
BlockRequest/DataRequest over the reused SingleLookupRequestState<T> state
machine, DataRequestState { WaitingForBlock, Request, NoData } column gating,
seen_timestamp forwarding, and diff minimization vs unstable.
Resolves conflicts from sigp#9379 (Delete unnecessary SyncMessage variants):
- adopt SyncMessage::UnknownParentSidecarHeader (drop UnknownParentDataColumn/
  UnknownParentPartialDataColumn), keeping our unit BlockComponent::Sidecar
  (parent_root is routed separately to handle_unknown_parent)
- keep concrete trait-removal BlockRequest/DataRequestState handlers
Also: move DownloadResult back to its unstable position to minimize diff.
Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
DataRequest no longer carries its own PeerSet clone; the column request reads
self.peers (the lookup's pool) directly. Also fix the stale 'starts as None'
comment on data_request and restore upstream's add_child_components comment.
Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
Comment on lines +370 to 383
BlockProcessingResult::Imported(_fully_imported, _info) => {
state.on_processing_success()?;
}
BlockProcessingResult::ParentUnknown { .. } => {
return Err(LookupRequestError::BadState(
"data processing returned ParentUnknown".to_owned(),
));
}
BlockProcessingResult::Error { penalty, .. } => {
let peers = state.on_processing_failure()?;
if let Some((action, whom, msg)) = penalty {
whom.apply(action, &peers, msg, cx);
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a future PR it would be nice to separate BlockProcessingResult from the data column download path, i.e. introduce a separate DataProcessingResult enum. I think you mentioned that this is a deeper change, at a quick glance it does seems like it'd explode the diff. But I think it would be really nice to tackle that in a future PR

Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
Comment thread beacon_node/network/src/sync/block_lookups/single_block_lookup.rs Outdated
Copy link
Copy Markdown
Member

@eserilev eserilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a quick summary of the changes here for anyone else interested in taking a look:

This PR removes a layer of abstraction across requests for blocks and columns. Blocks and columns follow different request paradigms and abstracting over the two is a bit nonsensical. Handling these different request types explicitly helps make the code easier to understand esp when it comes to reasoning about certain edge cases. It should also help make it easier to extend lookup sync for gloas. Also in the future it may make it easier to implement tree sync. Lookup sync is a pretty confusing state machine, so anything that makes it less confusing is very welcome in my opinion.

The bulk of the changes are in single_block_lookup.rs. The PR introduces two new types

BlockRequest
and
DataRequest

The existing SingleBlockLookup is updated to store BlockRequest and DataRequest fields instead of BlockRequestState and ComponentRequestState

data_request is the data columns request. It is set to WaitingForBlock initially. Once the block is received we can set it to the correct variant, eitherRequest or NoData

Most of SingleBlockLookup implementation is functionally equivalent, just updated to handle the updated fields

One important difference is continue_requests, which is what actually makes this whole state machine clearer/more readable.

Its separated into two loops, the first loop is the block request with its states handled accordingly. and the second loop is the data requests with its states handled accordingly. Note that that we will try to make as much progress as possible in the block loop and then move to the data column loop and try to make progress as well. Once were done with both we'll mark the lookup as completed. Explicit, straightforward and easy to extend for gloas payload envelope lookup. Nice

Another difference is State::Processed contains the actual data type (i.e. block or column). Beforehand we had this dual oracle setup the data could exist in our local oracle or the global oracle

The local oracle: the lookup's own block_request_state.state via
peek_downloaded_data()

The global oracle: fork_choice/da cache

This dual oracle could cause unintended side effects or in the worst case a stalled lookup. Lookup sync was designed to prevent these stalled lookups. Thinking about all the different edge cases that could arise was difficult and made extending/changing lookup sync more difficult than necessary.

With the move to State::Processed holding the actual downloaded value, we now have a single source of truth. We just need to do

  if let Some(block) = self.block_request.state.peek_downloaded_data() {
      let block_epoch = block.slot().epoch(...);
      if block.num_expected_blobs() == 0 { /* NoData */ }
      else if cx.chain.should_fetch_custody_columns(block_epoch) { /* Request */ }
      ...
  }

and dont need to worry about a bunch of different edge cases. As someone who is only now diving deeper into lookup sync, this is a very welcome change.

dapplion added 3 commits June 3, 2026 15:22
- drop misleading 'Block request — always present' comment; reword data_request
- add_child_components Sidecar: clearer comment (no stale 'blob request state')
- drop stale '// None = waiting for block'
- fix continue_requests doc state name (AwaitingProcess, not Downloaded)
- on_data_processing_result doc: custody columns
Copy link
Copy Markdown
Member

@eserilev eserilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@eserilev eserilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eserilev eserilev added the ready-for-review The code is ready for review label Jun 3, 2026
@eserilev eserilev removed the work-in-progress PR is a work-in-progress label Jun 3, 2026
Not needed: a zero-hash (genesis parent) search just creates a lookup that
fails its download attempts and is dropped. Removes logic not present in
unstable; tests pass without it.
@dapplion dapplion added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Jun 3, 2026
@mergify mergify Bot added the queued label Jun 3, 2026
@mergify
Copy link
Copy Markdown

mergify Bot commented Jun 3, 2026

Merge Queue Status

This pull request spent 31 minutes 54 seconds in the queue, including 28 minutes 25 seconds running CI.

Required conditions to merge

mergify Bot added a commit that referenced this pull request Jun 3, 2026
@mergify mergify Bot merged commit eab5163 into sigp:unstable Jun 3, 2026
61 of 62 checks passed
@mergify mergify Bot removed the queued label Jun 3, 2026
dapplion added a commit to dapplion/lighthouse that referenced this pull request Jun 3, 2026
- Rename PayloadRequest::NoPayload -> PreGloas; post-Gloas blocks always
  instantiate a payload Request (a payload may exist; FULL/EMPTY is decided by
  whether a FULL child donated peers). Restore envelope-known-to-fork-choice
  skip inside payload_lookup_request.
- Port remaining gloas test guards onto sigp#9391 harness.

Status: gloas lookups 47/55 pass; fulu + phase0 55/55 (no regression). 8
custody_lookup_* gloas tests still fail on peer-attribution semantics — see
report.
dapplion added a commit to dapplion/lighthouse that referenced this pull request Jun 4, 2026
Rebase the gloas lookup-sync work onto sigp#9391's RequestState trait-removal
design: payload-envelope request reuses the generic SingleLookupRequestState,
concrete BlockRequest/DataRequest/PayloadRequest, parent-imported gate against
awaiting_parent: Option<Hash256>. (Some gloas custody-failure tests still fail —
known peer-attribution issue, pushed for visibility.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge This PR is ready to merge. syncing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants