Skip to content

Gloas lookup sync#68

Closed
dapplion wants to merge 16 commits into
unstablefrom
gloas-lookup-sync-fixes
Closed

Gloas lookup sync#68
dapplion wants to merge 16 commits into
unstablefrom
gloas-lookup-sync-fixes

Conversation

@dapplion
Copy link
Copy Markdown
Owner

@dapplion dapplion commented Mar 9, 2026

Summary

Fixes for Gloas lookup sync + PayloadEnvelopesByRoot RPC wiring.

Rebased on current sigp/unstable.

Changes

  • Rewrite single block lookup state machine for Gloas (AwaitingParent, PayloadRequest states)
  • Fix genesis parent check + infinite retry loop
  • Address full parent missing payload (envelope_is_known_to_fork_choice)
  • Fix on_completed_request no-op + clear all peer sets on disconnect
  • Wire PayloadEnvelopesByRoot RPC: replaces the NoRequestNeeded placeholder with actual request sending, response routing, and state machine updates

Test plan

  • Compiles on current unstable
  • Test on local Kurtosis ePBS devnet
  • Verify envelope downloads during single block lookups at head

Comment thread .ai/plans/gloas-lookup-levels.md Outdated
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from f551bd3 to bae92bd Compare March 10, 2026 02:23
@dapplion dapplion requested a review from michaelsproul as a code owner March 10, 2026 02:23
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch 6 times, most recently from a6a37aa to f28fed4 Compare March 16, 2026 17:09
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from f28fed4 to 8f4a5f0 Compare April 2, 2026 04:34
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from 8f4a5f0 to d7af091 Compare April 7, 2026 04:28
Copy link
Copy Markdown

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of the new approach and the removal of the Request state trait.
First pass, will continue tomorrow.
Also, curious if you want to wire up the payload processing in this PR or leave it to a future PR. I'm fine with both.

peers,
peer_type,
cx.next_id(),
awaiting_parent.map(AwaitingParent::PreGloas),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing.

Could AwaitingParent be a struct instead of an enum?

struct AwaitingParent {
    parent_hash: Hash256,
    parent_root: Option<ExecutionBlockHash>
}

I find the pattern of creating an AwaitingParent::PreGloas variant by default and adjusting it later very confusing. a gloas parent_hash not containing a parent_root is conceptually confusing as well.

With the struct approach, adding a parent root later on as we figure out its gloas seems clearer to me.

"Block download response"
);

let result = lookup.on_block_download_response(id.req_id, response.map_err(|_| ()), cx);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we are swallowing the errors here and in the other on_*download_response functions.

peer: PeerId,
},
/// Block processing complete
Done {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete instead of Done maybe to be consistent with naming in other parts of the codebase?

if let BlockRequest::Downloading { state, .. } = self {
state.insert_verified_response(result)
} else {
false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log something here? Might be hard to debug in a later refactor if we accidentally modify the state and the invalid state error gets swallowed

peer_group: PeerGroup,
},
/// Data sent for processing, awaiting result
Processing {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: re review why we need DataDownloadKind here?

match &mut self.data_request {
DataRequest::WaitingForBlock => {
if let Some(block) = self.block_request.peek_block() {
let block = block.clone();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary clone here?

Duration::ZERO,
)
.map_err(LookupRequestError::SendFailedProcessor)?;
self.block_request = BlockRequest::Processing { block, peer };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we break after this too? processing needs async trigger

}
let kind = data.kind();
let peer_group = peer_group.clone();
self.data_request = DataRequest::Processing { kind, peer_group };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can break after this too?

match &mut self.payload_request {
PayloadRequest::WaitingForBlock => {
if let Some(block) = self.block_request.peek_block() {
let block = block.clone();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary clone again?

state.on_download_start(req_id)?;
}
Ok(LookupRequestResult::NoRequestNeeded(_reason)) => {
// Payload RPC not wired yet — skip download, mark as done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have it now right? Can add it in maybe? I'm also fine with deferring that to another PR though

@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from 5533da0 to c23b1af Compare April 21, 2026 22:38
@dapplion
Copy link
Copy Markdown
Owner Author

Rebased onto sigp/unstable and squashed to a single commit. Pushed c23b1af56.

Addressed @pawanjay176's comments from 2026-04-18 (17/18):

  • AwaitingParent enum → struct with Option<ExecutionBlockHash> parent_hash + pre_gloas()/post_gloas() constructors (mod.rs:489)
  • DoneComplete across BlockRequest / DataRequest / PayloadRequest; is_done()is_complete() (sbl.rs:118)
  • on_*_download_response handlers now debug! log the error before collapsing to Err(()) (mod.rs:550)
  • insert_verified_response logs before dropping on bad state (sbl.rs:184)
  • Docstring added to DataDownloadKind explaining why it caches the request kind (sbl.rs:203)
  • Docstring on the Arc<RwLock<..>> peer sets explaining sharing with long-running ActiveCustodyRequest (sbl.rs:412)
  • Docstring on is_full_payload for intended usage; conservative false + log when block not downloaded (sbl.rs:470, :475)
  • Docstring on continue_requests loop explaining per-stream semantics (sbl.rs:567)
  • Pass block_root directly; drop the local br (sbl.rs:572)
  • if false dead branch → if state.is_completed() (sbl.rs:574)
  • "Single peer" comment added to block download peer handling (sbl.rs:580)
  • let Some(_) = ... else {}.is_some() + bool flag (sbl.rs:600)
  • break after transitioning to Processing (sbl.rs:673, :733) — "processing needs async trigger"
  • .clone() of Arc<Block> eliminated by extracting (slot, expected_blobs) before the &mut self helper call (sbl.rs:684, :744); helpers renamed create_data_request/create_payload_request

Deferred (per your "fine with deferring that to another PR" note):

  • Wiring payload-envelope processing (sbl.rs:759). Adds a send_payload_for_processing in SyncNetworkContext plus a beacon_processor Work variant; keeping this PR focused on the download path.

Test status (FORK_NAME=gloas): 59/59 lookup tests pass.

Will open the upstream PR against sigp:unstable next.

@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch 3 times, most recently from eb40ff0 to 3ff8491 Compare April 22, 2026 06:56
Rewrites the single block lookup state machine for Gloas, where block, data
(blobs/columns), and execution payload envelope are independent components
that can arrive and import out of order.

- Three additive-only sub-state-machines for block / data / payload streams.
  Peer sets start empty for data/payload and grow as children arrive — the
  parent lookup's completion requirement can widen over time without
  mutating any state machine.
- `AwaitingParent` becomes a struct carrying the child's `parent_block_hash`
  so the parent can be classified empty/full from the child's bid reference.
- Wires `PayloadEnvelopesByRoot` RPC end-to-end through `SyncNetworkContext`:
  request sending, response routing (`SingleLookupReqId::SinglePayloadEnvelope`),
  and integration into `PayloadRequest`. Envelope *processing* is still a TODO;
  only the download path is wired.
- Test rig: serves envelopes from a `network_envelopes_by_root` cache
  populated from the external harness; bumps test validator count to 8 so
  `proposer_lookahead` can populate at the Fulu → Gloas upgrade.
- Enables gloas in `TEST_NETWORK_FORKS`.
- Fixes: genesis parent check, infinite retry loop on repeated download
  failure, no-op in `on_completed_request`, and peer sets not being cleared
  on disconnect.
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from 3ff8491 to ebe9fe2 Compare April 22, 2026 07:32
chong-he and others added 8 commits April 30, 2026 04:57
- sigp#9107


  Remove all instances of `test_logger` in the code


Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io>
We are not submitting ptc votes that we produce to our lcoal ptc op pool. So when we are the block producer we don't include our own ptc votes!


  


Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Get the Gloas HTTP API tests passing, partly through fixes and partly through disabling tests that don't fit the Gloas paradigm.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>

Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
We yolo'd to alpha 7. We're just changing the proposer preference to include dependent root, instead of checkpoint root. This way we can actually construct it within the VC without needing a view of fork choice.


  


Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Update the ci workflow to use warpbuild snapshot image and test suit uses `Swatinew/rust-cache` to utilize warpbuild cache


  


Co-Authored-By: lemon <snyxmk@gmail.com>
- Avoid sending 0x00 block hashes for the safe and finalized block hashes post-Gloas.
- Add code to check this inside the mock EL, which will be reached in all Gloas beacon chain tests


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Fixes a flaky CI failure in `data_column_reconstruction_at_deadline` where 2 `column_reconstruction` events are emitted instead of the expected 1.


  - Change `queued_column_reconstructions` from `HashMap<Hash256, DelayKey>` to `HashMap<Hash256, Option<DelayKey>>`, where `None` indicates reconstruction was already dispatched.
- On dispatch (`ReadyColumnReconstruction`), set the entry to `None` instead of removing it. This prevents a subsequent gossip column from inserting a fresh reconstruction request into the now-vacant slot.
- Prune stale `None` entries on each dispatch to keep the map bounded.


Co-Authored-By: Josh King <josh@sigmaprime.io>
…gp#9257)

Two audit failures for `hickory-proto` which is used upstream in `libp2p-dns` and `libp2p-mdns`

- https://rustsec.org/advisories/RUSTSEC-2026-0118.html
- https://rustsec.org/advisories/RUSTSEC-2026-0119.html

Tracking Issue: sigp#9258


  Since RUSTSEC-2026-0118 does not even have any non-patched versions available and RUSTSEC-2026-0119 requires a major version bump I think we would need to wait on a release from libp2p in both cases.
So for now, add an ignore for each so we can at least unblock CI


Co-Authored-By: Mac L <mjladson@pm.me>
eserilev and others added 7 commits May 4, 2026 11:33
Allow for the vc to submit its proposer preferences to the network


  


Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
)

Addresses issue sigp#9220

The `payload_attestation_data` endpoint returns 400 when no block has been received for the requested slot. This causes the VC to log at CRIT level for what is expected behaviour per spec: validators should simply not submit a payload attestation when no block has been seen.


  - Return 404 (Not Found) instead of 400 from `payload_attestation_data` when no block exists for the slot. This is consistent with other beacon api endpoints.
- Downgrade the VC log from `crit` to `debug` when a 503 is received, since this is an expected no-op per spec.
- Add `BlockNotFound` rejection type to `warp_utils`.
- Add a test asserting the 404 response for an empty slot.


Co-Authored-By: Josh King <josh@sigmaprime.io>

Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
N/A


  libssl download seems to be failing on [CI](https://github.com/sigp/lighthouse/actions/runs/25346412432/job/74316275231?pr=9126).
This was originally added to unblock CI in sigp#6777, but we may not need this anymore.


Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
We  have a legacy `TestRandom` trait which generates random types for testing and fuzzing.
This function overlaps with `arbitrary` which is used very commonly in the ecosystem.


  Remove `TestRandom` and generate random type instances using `Arbitrary`.


Co-Authored-By: Mac L <mjladson@pm.me>

Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
@dapplion
Copy link
Copy Markdown
Owner Author

Closing for sigp#9155

@dapplion dapplion closed this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants