Gloas lookup sync#68
Conversation
f551bd3 to
bae92bd
Compare
a6a37aa to
f28fed4
Compare
f28fed4 to
8f4a5f0
Compare
8f4a5f0 to
d7af091
Compare
pawanjay176
left a comment
There was a problem hiding this comment.
I like the simplicity of the new approach and the removal of the Request state trait.
First pass, will continue tomorrow.
Also, curious if you want to wire up the payload processing in this PR or leave it to a future PR. I'm fine with both.
| peers, | ||
| peer_type, | ||
| cx.next_id(), | ||
| awaiting_parent.map(AwaitingParent::PreGloas), |
There was a problem hiding this comment.
This is a bit confusing.
Could AwaitingParent be a struct instead of an enum?
struct AwaitingParent {
parent_hash: Hash256,
parent_root: Option<ExecutionBlockHash>
}I find the pattern of creating an AwaitingParent::PreGloas variant by default and adjusting it later very confusing. a gloas parent_hash not containing a parent_root is conceptually confusing as well.
With the struct approach, adding a parent root later on as we figure out its gloas seems clearer to me.
| "Block download response" | ||
| ); | ||
|
|
||
| let result = lookup.on_block_download_response(id.req_id, response.map_err(|_| ()), cx); |
There was a problem hiding this comment.
I don't get why we are swallowing the errors here and in the other on_*download_response functions.
| peer: PeerId, | ||
| }, | ||
| /// Block processing complete | ||
| Done { |
There was a problem hiding this comment.
Complete instead of Done maybe to be consistent with naming in other parts of the codebase?
| if let BlockRequest::Downloading { state, .. } = self { | ||
| state.insert_verified_response(result) | ||
| } else { | ||
| false |
There was a problem hiding this comment.
Should we log something here? Might be hard to debug in a later refactor if we accidentally modify the state and the invalid state error gets swallowed
| peer_group: PeerGroup, | ||
| }, | ||
| /// Data sent for processing, awaiting result | ||
| Processing { |
There was a problem hiding this comment.
Note to self: re review why we need DataDownloadKind here?
| match &mut self.data_request { | ||
| DataRequest::WaitingForBlock => { | ||
| if let Some(block) = self.block_request.peek_block() { | ||
| let block = block.clone(); |
| Duration::ZERO, | ||
| ) | ||
| .map_err(LookupRequestError::SendFailedProcessor)?; | ||
| self.block_request = BlockRequest::Processing { block, peer }; |
There was a problem hiding this comment.
can't we break after this too? processing needs async trigger
| } | ||
| let kind = data.kind(); | ||
| let peer_group = peer_group.clone(); | ||
| self.data_request = DataRequest::Processing { kind, peer_group }; |
| match &mut self.payload_request { | ||
| PayloadRequest::WaitingForBlock => { | ||
| if let Some(block) = self.block_request.peek_block() { | ||
| let block = block.clone(); |
| state.on_download_start(req_id)?; | ||
| } | ||
| Ok(LookupRequestResult::NoRequestNeeded(_reason)) => { | ||
| // Payload RPC not wired yet — skip download, mark as done |
There was a problem hiding this comment.
We have it now right? Can add it in maybe? I'm also fine with deferring that to another PR though
5533da0 to
c23b1af
Compare
|
Rebased onto Addressed @pawanjay176's comments from 2026-04-18 (17/18):
Deferred (per your "fine with deferring that to another PR" note):
Test status (FORK_NAME=gloas): 59/59 lookup tests pass. Will open the upstream PR against |
eb40ff0 to
3ff8491
Compare
Rewrites the single block lookup state machine for Gloas, where block, data (blobs/columns), and execution payload envelope are independent components that can arrive and import out of order. - Three additive-only sub-state-machines for block / data / payload streams. Peer sets start empty for data/payload and grow as children arrive — the parent lookup's completion requirement can widen over time without mutating any state machine. - `AwaitingParent` becomes a struct carrying the child's `parent_block_hash` so the parent can be classified empty/full from the child's bid reference. - Wires `PayloadEnvelopesByRoot` RPC end-to-end through `SyncNetworkContext`: request sending, response routing (`SingleLookupReqId::SinglePayloadEnvelope`), and integration into `PayloadRequest`. Envelope *processing* is still a TODO; only the download path is wired. - Test rig: serves envelopes from a `network_envelopes_by_root` cache populated from the external harness; bumps test validator count to 8 so `proposer_lookahead` can populate at the Fulu → Gloas upgrade. - Enables gloas in `TEST_NETWORK_FORKS`. - Fixes: genesis parent check, infinite retry loop on repeated download failure, no-op in `on_completed_request`, and peer sets not being cleared on disconnect.
3ff8491 to
ebe9fe2
Compare
We are not submitting ptc votes that we produce to our lcoal ptc op pool. So when we are the block producer we don't include our own ptc votes! Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Get the Gloas HTTP API tests passing, partly through fixes and partly through disabling tests that don't fit the Gloas paradigm. Co-Authored-By: Michael Sproul <michael@sigmaprime.io> Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
We yolo'd to alpha 7. We're just changing the proposer preference to include dependent root, instead of checkpoint root. This way we can actually construct it within the VC without needing a view of fork choice. Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Update the ci workflow to use warpbuild snapshot image and test suit uses `Swatinew/rust-cache` to utilize warpbuild cache Co-Authored-By: lemon <snyxmk@gmail.com>
- Avoid sending 0x00 block hashes for the safe and finalized block hashes post-Gloas. - Add code to check this inside the mock EL, which will be reached in all Gloas beacon chain tests Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Fixes a flaky CI failure in `data_column_reconstruction_at_deadline` where 2 `column_reconstruction` events are emitted instead of the expected 1. - Change `queued_column_reconstructions` from `HashMap<Hash256, DelayKey>` to `HashMap<Hash256, Option<DelayKey>>`, where `None` indicates reconstruction was already dispatched. - On dispatch (`ReadyColumnReconstruction`), set the entry to `None` instead of removing it. This prevents a subsequent gossip column from inserting a fresh reconstruction request into the now-vacant slot. - Prune stale `None` entries on each dispatch to keep the map bounded. Co-Authored-By: Josh King <josh@sigmaprime.io>
…gp#9257) Two audit failures for `hickory-proto` which is used upstream in `libp2p-dns` and `libp2p-mdns` - https://rustsec.org/advisories/RUSTSEC-2026-0118.html - https://rustsec.org/advisories/RUSTSEC-2026-0119.html Tracking Issue: sigp#9258 Since RUSTSEC-2026-0118 does not even have any non-patched versions available and RUSTSEC-2026-0119 requires a major version bump I think we would need to wait on a release from libp2p in both cases. So for now, add an ignore for each so we can at least unblock CI Co-Authored-By: Mac L <mjladson@pm.me>
Allow for the vc to submit its proposer preferences to the network Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
) Addresses issue sigp#9220 The `payload_attestation_data` endpoint returns 400 when no block has been received for the requested slot. This causes the VC to log at CRIT level for what is expected behaviour per spec: validators should simply not submit a payload attestation when no block has been seen. - Return 404 (Not Found) instead of 400 from `payload_attestation_data` when no block exists for the slot. This is consistent with other beacon api endpoints. - Downgrade the VC log from `crit` to `debug` when a 503 is received, since this is an expected no-op per spec. - Add `BlockNotFound` rejection type to `warp_utils`. - Add a test asserting the 404 response for an empty slot. Co-Authored-By: Josh King <josh@sigmaprime.io> Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
N/A libssl download seems to be failing on [CI](https://github.com/sigp/lighthouse/actions/runs/25346412432/job/74316275231?pr=9126). This was originally added to unblock CI in sigp#6777, but we may not need this anymore. Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
We have a legacy `TestRandom` trait which generates random types for testing and fuzzing. This function overlaps with `arbitrary` which is used very commonly in the ecosystem. Remove `TestRandom` and generate random type instances using `Arbitrary`. Co-Authored-By: Mac L <mjladson@pm.me> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
|
Closing for sigp#9155 |
Summary
Fixes for Gloas lookup sync + PayloadEnvelopesByRoot RPC wiring.
Rebased on current
sigp/unstable.Changes
NoRequestNeededplaceholder with actual request sending, response routing, and state machine updatesTest plan