Remove RequestState trait from lookup sync#9391
Conversation
… classification Combine PR sigp#9327's processing-result classification redesign (BlockProcessingResult becomes Imported/ParentUnknown/Error with producer-side penalty classification in network_beacon_processor) with PR sigp#9155's RequestState trait removal (concrete BlockRequest/DataRequest state machine in single_block_lookup.rs, concrete download/ processing handlers in block_lookups/mod.rs). The common.rs RequestState trait is deleted. ParentUnknown remains reactive (surfaced from the processing result), as in unstable. All gloas constructs are stripped.
Test 1 — Kurtosis devnetSetup: 4× Lighthouse + geth, 3s slots, all running the branch binary af82c75, via ethereum-package 5.0.1. Method: Stopped cl-4's beacon node at head 15, let the network advance ~7 slots to 22, restarted it. Result: Recovered via lookup sync — sync_lookups_created_total 7, sync_lookups_completed_total 7 (0 failed), and all Test 2 — Live Ethereum mainnetBranch binary af82c75 checkpoint-synced to real mainnet head with a mock EL (returns VALID → fully verified, ~720 MB
|
…move-request-state-trait # Conflicts: # beacon_node/network/src/network_beacon_processor/sync_methods.rs # beacon_node/network/src/sync/block_lookups/mod.rs # beacon_node/network/src/sync/manager.rs # beacon_node/network/src/sync/tests/lookups.rs
Brings sigp#9391 up to the latest lookup-sync trait-removal work: concrete BlockRequest/DataRequest over the reused SingleLookupRequestState<T> state machine, DataRequestState { WaitingForBlock, Request, NoData } column gating, seen_timestamp forwarding, and diff minimization vs unstable.
Resolves conflicts from sigp#9379 (Delete unnecessary SyncMessage variants): - adopt SyncMessage::UnknownParentSidecarHeader (drop UnknownParentDataColumn/ UnknownParentPartialDataColumn), keeping our unit BlockComponent::Sidecar (parent_root is routed separately to handle_unknown_parent) - keep concrete trait-removal BlockRequest/DataRequestState handlers Also: move DownloadResult back to its unstable position to minimize diff.
DataRequest no longer carries its own PeerSet clone; the column request reads self.peers (the lookup's pool) directly. Also fix the stale 'starts as None' comment on data_request and restore upstream's add_child_components comment.
| BlockProcessingResult::Imported(_fully_imported, _info) => { | ||
| state.on_processing_success()?; | ||
| } | ||
| BlockProcessingResult::ParentUnknown { .. } => { | ||
| return Err(LookupRequestError::BadState( | ||
| "data processing returned ParentUnknown".to_owned(), | ||
| )); | ||
| } | ||
| BlockProcessingResult::Error { penalty, .. } => { | ||
| let peers = state.on_processing_failure()?; | ||
| if let Some((action, whom, msg)) = penalty { | ||
| whom.apply(action, &peers, msg, cx); | ||
| } | ||
| } |
There was a problem hiding this comment.
In a future PR it would be nice to separate BlockProcessingResult from the data column download path, i.e. introduce a separate DataProcessingResult enum. I think you mentioned that this is a deeper change, at a quick glance it does seems like it'd explode the diff. But I think it would be really nice to tackle that in a future PR
There was a problem hiding this comment.
I wrote a quick summary of the changes here for anyone else interested in taking a look:
This PR removes a layer of abstraction across requests for blocks and columns. Blocks and columns follow different request paradigms and abstracting over the two is a bit nonsensical. Handling these different request types explicitly helps make the code easier to understand esp when it comes to reasoning about certain edge cases. It should also help make it easier to extend lookup sync for gloas. Also in the future it may make it easier to implement tree sync. Lookup sync is a pretty confusing state machine, so anything that makes it less confusing is very welcome in my opinion.
The bulk of the changes are in single_block_lookup.rs. The PR introduces two new types
BlockRequest
and
DataRequest
The existing SingleBlockLookup is updated to store BlockRequest and DataRequest fields instead of BlockRequestState and ComponentRequestState
data_request is the data columns request. It is set to WaitingForBlock initially. Once the block is received we can set it to the correct variant, eitherRequest or NoData
Most of SingleBlockLookup implementation is functionally equivalent, just updated to handle the updated fields
One important difference is continue_requests, which is what actually makes this whole state machine clearer/more readable.
Its separated into two loops, the first loop is the block request with its states handled accordingly. and the second loop is the data requests with its states handled accordingly. Note that that we will try to make as much progress as possible in the block loop and then move to the data column loop and try to make progress as well. Once were done with both we'll mark the lookup as completed. Explicit, straightforward and easy to extend for gloas payload envelope lookup. Nice
Another difference is State::Processed contains the actual data type (i.e. block or column). Beforehand we had this dual oracle setup the data could exist in our local oracle or the global oracle
The local oracle: the lookup's own block_request_state.state via
peek_downloaded_data()
The global oracle: fork_choice/da cache
This dual oracle could cause unintended side effects or in the worst case a stalled lookup. Lookup sync was designed to prevent these stalled lookups. Thinking about all the different edge cases that could arise was difficult and made extending/changing lookup sync more difficult than necessary.
With the move to State::Processed holding the actual downloaded value, we now have a single source of truth. We just need to do
if let Some(block) = self.block_request.state.peek_downloaded_data() {
let block_epoch = block.slot().epoch(...);
if block.num_expected_blobs() == 0 { /* NoData */ }
else if cx.chain.should_fetch_custody_columns(block_epoch) { /* Request */ }
...
}
and dont need to worry about a bunch of different edge cases. As someone who is only now diving deeper into lookup sync, this is a very welcome change.
- drop misleading 'Block request — always present' comment; reword data_request - add_child_components Sidecar: clearer comment (no stale 'blob request state') - drop stale '// None = waiting for block' - fix continue_requests doc state name (AwaitingProcess, not Downloaded) - on_data_processing_result doc: custody columns
Not needed: a zero-hash (genesis parent) search just creates a lookup that fails its download attempts and is dropped. Removes logic not present in unstable; tests pass without it.
Merge Queue Status
This pull request spent 31 minutes 54 seconds in the queue, including 28 minutes 25 seconds running CI. Required conditions to merge
|
- Rename PayloadRequest::NoPayload -> PreGloas; post-Gloas blocks always instantiate a payload Request (a payload may exist; FULL/EMPTY is decided by whether a FULL child donated peers). Restore envelope-known-to-fork-choice skip inside payload_lookup_request. - Port remaining gloas test guards onto sigp#9391 harness. Status: gloas lookups 47/55 pass; fulu + phase0 55/55 (no regression). 8 custody_lookup_* gloas tests still fail on peer-attribution semantics — see report.
Rebase the gloas lookup-sync work onto sigp#9391's RequestState trait-removal design: payload-envelope request reuses the generic SingleLookupRequestState, concrete BlockRequest/DataRequest/PayloadRequest, parent-imported gate against awaiting_parent: Option<Hash256>. (Some gloas custody-failure tests still fail — known peer-attribution issue, pushed for visibility.)
RequestStatetrait refactor.#9155 is quite complex and sensitive so I believe it's best to review the refactor first without any of the Gloas noise.
This PR combines PR #9327's processing-result classification redesign (BlockProcessingResult becomes Imported/ParentUnknown/Error with producer-side penalty classification in network_beacon_processor) with PR #9155's RequestState trait removal (concrete BlockRequest/DataRequest state machine in single_block_lookup.rs, concrete download/ processing handlers in block_lookups/mod.rs). The common.rs RequestState trait is deleted. ParentUnknown remains reactive (surfaced from the processing result), as in unstable. All gloas constructs are stripped.
AI disclosure
AI ported the trait request change of #9155 to sigp/unstable. Then fixed by hand ~50% of the code to reduce the diff by half. All lines have been reviewed carefully by me