refactor(kona/service): move cancellation out of actors and other testability changes#20948
refactor(kona/service): move cancellation out of actors and other testability changes#20948op-will wants to merge 10 commits into
Conversation
ca6da49 to
f538932
Compare
…t with step; split engine actor Selectively port the cancellation-token ownership changes from #19141 without adopting that PR's Builder / InboundData / init() trait pattern. NodeActor trait: - Replace `start(self, ctx) -> Result<(), Error>` with `step(&mut self) -> Result<(), Error>`. - Drop the StartData associated type and the CancellableContext trait. - The orchestrator macro (`spawn_and_wait!`) now owns the umbrella CancellationToken, the loop, and the select! against `cancelled()`; no actor sees a cancellation primitive. Actor migrations: - Derivation, DelegateDerivation, L1Watcher, Sequencer: drop the cancellation_token field + CancellableContext impl; move loop-local state onto self (tickers, payload-to-seal, last seal duration, etc). - Network: take a live NetworkHandler at construction; build/start the libp2p swarm in RollupNode::start so the constructor stays sync. The NetworkInboundData bundle is deleted in favor of injecting senders individually. - Rpc: become non-generic. Module assembly and the initial server launch move upstream; the actor holds config + modules + handle and relaunches up to RpcBuilder::restart_count on stop. A Drop impl calls ServerHandle::stop() so graceful cancellation shuts the jsonrpsee server cleanly. Engine actor restructure: - The fan-out EngineActor (which only routed EngineActorRequest variants into one of two sub-tasks) is deleted. - EngineProcessor is promoted to EngineActor; EngineRpcProcessor is promoted to EngineRpcActor. Both run as first-class peers under spawn_and_wait!. - EngineProcessingRequest takes over the freed name EngineActorRequest; the old fan-out EngineActorRequest enum and its RpcRequest variant are deleted (the rpc client sends EngineRpcRequest directly). - EngineRequestReceiver and EngineRpcRequestReceiver placeholder traits (testing-only scaffolding per their own doc comments) are deleted. - No more JoinHandle polling, is_finished() checks, or PhantomData generics in the engine actors. RollupNode::start: - Single CancellationToken owned by the macro; no actor receives it. - All cross-actor channels (mpsc + watch) created at the top of start() in one visually-grouped block. - Actor construction broken out into five build_* helpers (build_engine_actors, build_derivation_actor, build_l1_watcher, build_sequencer, build_rpc_actor) plus three private type aliases. - create_engine_actor helper deleted (inlined into build_engine_actors). External callers: - bin/node and examples/gossip: build the swarm handler upstream and spawn a step loop on a tokio task. Fixes a pre-existing bug in bin/node's net subcommand where the prior `network.start(()).await?` blocked forever, making subsequent interval-poll code unreachable. - Network/sequencer integration tests: TestNetworkBuilder::build is now async; TestNetwork holds the four inbound senders individually. - SequencerActor::new tests pass `block_time: 2` to avoid tokio::time::interval(0) panicking. Verification: cargo build -p kona-node-service: 0 errors cargo check --all-targets -p kona-node-service -p kona-node -p example-gossip: 0 errors cargo test --lib -p kona-node-service: 109 passed cargo test --test integration test_p2p_network_conn: passed cargo test --test integration test_sequencer_network_conn: passed
f538932 to
9ef3414
Compare
Introduces RpcServerLauncher + RpcServerHandle traits so the actor's relaunch and shutdown logic can be unit-tested with a controllable mock instead of a real jsonrpsee server. The production path is unchanged: a new JsonrpseeServerLauncher wraps RpcBuilder and produces a real ServerHandle. Adds six unit tests covering the restart budget, failed relaunches, and the Drop-stops-handle path. Also documents that RollupNode shutdown is unordered, and that L1WatcherActor's builder intentionally returns impl NodeActor because its block-stream type is unnameable.
The accessor's name shadowed the field of the same name, making self.engine_config() and &self.engine_config visually ambiguous at the only call site. Since the accessor only cloned the field, callers can clone inline.
Constructing the actor with an unstarted handler causes step() to hang or fail on the first gossip poll. The constructor stays sync to keep NodeActor minimal, so the live-handler invariant lives in the caller — document that explicitly.
| /// Returning `Ok(())` indicates the actor is ready to be stepped again. | ||
| /// Returning `Err(_)` indicates the actor has encountered a fatal | ||
| /// condition and should not be stepped further. | ||
| async fn step(&mut self) -> Result<(), Self::Error>; |
There was a problem hiding this comment.
Review Note: I kept step() from the reference PR, as it's good enough for now.
If/when we separate actors from tasks, we can change this trait to more closely fit actors.
The doc comment referenced crate::service::node::RollupNode, but service is a pub(crate) module so the path is not part of the documented graph. RollupNode is re-exported from the crate root; link there instead.
DelegateDerivationActor used to take its two external dependencies — the sync-status fetch client and the L1 chain provider — by concrete type, so the actor's validation logic (sync-status fetch, L1 consistency check, conditional forwarding) could not be exercised without standing up real HTTP and RPC clients. That is why the actor has no unit tests today. Introduces a one-method DerivationDelegateProvider trait, generalizes the L1 provider to any kona_derive::ChainProvider, and threads the two generics through the actor and its enum wrapper in RollupNode. The production path is unchanged: DerivationDelegateClient implements the new trait and AlloyChainProvider already implements ChainProvider. No new tests in this commit — the DI seam alone unlocks future test work without committing to a specific test matrix here.
EngineRpcActor previously held an Arc<EngineClient_>, giving it access to the full Engine API surface — including mutation methods like forkchoiceUpdated, newPayload, and getPayload that an RPC query actor must never call. Constrain the actor to a new EngineRpcClient trait exposing only the two methods EngineQueries::handle actually needs: l2_block_by_label and get_storage_hash (a narrowed projection of get_proof that returns just the storage hash field used to compute the L2-to-L1 message-passer storage root pre-Isthmus). A blanket impl of EngineRpcClient for every T: EngineClient keeps production wiring (OpEngineClient) unchanged; only the actor's static type bound has tightened. Tests can now implement the two-method trait directly instead of the entire EngineClient/OpEngineApi surface. Also rename the field engine_client -> engine_rpc_client to match the new narrowed responsibility.
| let mut delegated_derivation_ticker = | ||
| time::interval(Self::DERIVATION_DELEGATE_POLL_INTERVAL); |
There was a problem hiding this comment.
Could argue this should be a new(...) argument, but unit tests can just create DelegateDerivationActor via struct literal if they need to change it.
| } | ||
| } | ||
| async fn step(&mut self) -> Result<(), Self::Error> { | ||
| let request = self.inbound_request_rx.recv().await.ok_or_else(|| { |
There was a problem hiding this comment.
Note: removing info!(target: "derivation", "Starting derivation"); because
- It doesn't fit into this
step(...)redesign - We are not necessarily starting derivation, as we need to wait for a request (i.e.
ELSyncCompleted)
| { | ||
| type Error = EngineError; | ||
|
|
||
| async fn step(&mut self) -> Result<(), Self::Error> { |
There was a problem hiding this comment.
Design note: start(...) used to create two subtasks:
- a "processing handler" that is the actual engine (in engine_request_processor.rs)
- an RPC handler (in rpc_request_processor.rs)
Instead of having start(...) / step(...) proxy requests to the subtasks, I just spun out EngineRpcActor as its own actor since it doesn't share any meaningful state with EngineActor. This has the added benefit of removing request queue contention between RPC requests and actual engine requests.
This looks like a big delta, but really it's just inlining the contents of engine_request_processor.rs as the new EngineActor and removing the proxy logic.
| } | ||
| L1WatcherQueries::L1State(sender) => { | ||
| let current_l1 = *latest_head.borrow(); | ||
| let current_l1 = *self.latest_head.borrow(); |
There was a problem hiding this comment.
This used to subscribe to its own latest_head, which is unnecessary because we can just borrow the value from the publisher
…tency All receiver fields now share the _rx suffix: - signer -> unsafe_block_signer_rx - p2p_rpc -> p2p_rpc_rx - admin_rpc -> admin_query_rx publish_rx and unsafe_block_rx already followed the convention. Also rename the local select! binding signer -> unsafe_block_signer so it no longer collides with self.handler.signer, which represents a different concept (the local block-signing key, not an address). Call sites are positional so no caller updates required.
…ractly Rewrite the constructor doc comment so it describes the precondition (the libp2p swarm must already be built and started) rather than naming the specific NetworkBuilder method chain. The trade-off rationale — sync constructor over an init() trait method — is retained.
test_launch_no_modules, test_launch_with_modules, and test_real_launcher_smoke all bound real localhost sockets via jsonrpsee. Unit tests should not spin up actual servers; the mock-driven RpcActor tests still cover the restart/stop logic, and the production JsonrpseeServerLauncher is exercised end-to-end by integration tests at the RollupNode level. With the only remaining users of the free `launch` function now inside launcher.rs itself, drop its pub(crate) visibility too.
Summary
Selectively port the cancellation-token ownership changes from
ethereum-optimism/optimism#19141
into the kona node service, without adopting that PR's Builder /
InboundData / trait-level
init()pattern. Pulls in the within-actortestability and restructuring improvements; keeps direct dependency
injection into constructors instead of routing everything through
builder structs.
NodeActor trait
start(self, ctx) -> Result<(), Error>withstep(&mut self) -> Result<(), Error>.StartDataassociated type and theCancellableContexttrait entirely.spawn_and_wait!) now owns the umbrellaCancellationToken, the loop, and theselect!againstcancelled(); no actor sees a cancellation primitive.Actor migrations
cancellation_tokenfield andCancellableContextimpl; move loop-local state ontoself(tickers, payload-to-seal, last seal duration, etc).DerivationDelegateProvidertrait, and the L1 provider as anykona_derive::ChainProvider, instead of the concreteDerivationDelegateClientandAlloyChainProvider. Unlocks unit-testing of sync-status validation; production wiring unchanged.NetworkHandlerat construction; build/start the libp2p swarm inRollupNode::startso the actor constructor stays sync. TheNetworkInboundDatabundle is deleted in favor of injecting senders individually.RpcBuilder::restart_counton stop. ADropimpl callsRpcServerHandle::stop()so graceful cancellation shuts the jsonrpsee server cleanly.RpcActoris generic over a newRpcServerLaunchertrait (with a productionJsonrpseeServerLauncherimpl) so the restart/stop logic can be unit-tested with a controllable mock handle.Engine actor restructure
EngineActor(which only routedEngineActorRequestvariants into one of two sub-tasks) is deleted.EngineProcessor→EngineActor;EngineRpcProcessor→EngineRpcActor. Both run as first-class peers underspawn_and_wait!.EngineProcessingRequesttakes over the freed nameEngineActorRequest; the old fan-outEngineActorRequestenum and itsRpcRequestvariant are deleted (the rpc client sendsEngineRpcRequestdirectly).EngineRequestReceiverandEngineRpcRequestReceiverplaceholder traits (testing-only scaffolding per their own doc comments) are deleted.JoinHandlepolling,is_finished()checks, orPhantomDatagenerics in the engine actors.EngineRpcActoris constrained to a new read-onlyEngineRpcClienttrait (inkona-engine, blanket-impl'd for everyT: EngineClient) instead of the fullEngineClient. The trait exposes onlyl2_block_by_labeland a narrowedget_storage_hashprojection ofget_proof. The actor cannot reach Engine API mutation methods at the type level. Field renamedengine_client→engine_rpc_clientto match.RollupNode::start
CancellationTokenowned by the macro; no actor receives it.start()in one visually-grouped block.build_*helpers:build_engine_actors,build_derivation_actor,build_l1_watcher,build_sequencer,build_rpc_actor— plus four private type aliases.create_engine_actorhelper deleted (inlined intobuild_engine_actors).start()docs gain a short "Shutdown" section: shutdown is unordered, channel-closed errors at teardown are expected.External callers
bin/nodeandexamples/gossip: build the swarm handler upstream and spawn a step loop on a tokio task. Fixes a pre-existing bug inbin/node's net subcommand where the priornetwork.start(()).await?blocked forever, making subsequent interval-poll code unreachable.TestNetworkBuilder::buildis now async;TestNetworkholds the four inbound senders individually.SequencerActor::newtests passblock_time: 2to avoidtokio::time::interval(0)panicking.Follow-ups (to file as separate issues)
RpcBuilder::restart_countis currently interpreted as a lifetime cap across the actor's lifetime; a long-running node whose RPC port flaps occasionally over months will eventually exhaust it and shut down. The more useful shape is a rate (e.g. N restarts per unit time), so transient flapping doesn't permanently disable the RPC. Track separately; out of scope here.Testing
cargo build -p kona-node-service— 0 errorscargo check --all-targets -p kona-node-service -p kona-node -p example-gossip -p kona-engine— 0 errorscargo test --lib -p kona-node-service— 112 passedcargo test --test integration test_p2p_network_conn— passedcargo test --test integration test_sequencer_network_conn— passedcargo +nightly fmt -p kona-node-service -p kona-engine— cleancargo clippy -p kona-node-service -p kona-engine --all-targets— no new warningsRUSTDOCFLAGS="--cfg docsrs -D warnings" cargo +nightly doc -p kona-node-service -p kona-engine --all-features --no-deps --document-private-items— clean