PlainNettyTransport.close: use quietPeriod=0 in shutdownGracefully#474
Open
codexcoder21 wants to merge 1 commit into
Open
Conversation
c0cffe2 to
8d636b3
Compare
…efully PlainNettyTransport.close() and QuicTransport.close() chain the EventLoopGroup.shutdownGracefully() futures through CompletableFuture.allOf, so the caller's close() future doesn't resolve until shutdownGracefully does. The no-arg call falls through to Netty's defaults (quietPeriod=2s, timeout=15s). The 2-second quiet period keeps the loop alive in case more work is submitted -- but by the time close() runs, every channel is closed and the transport is marked closed, so no further work can be submitted to either group. The 2-second wait is pure latency on every close. Pass quietPeriod=0 explicitly so the loop terminates as soon as the run-loop sees the shutdown flag. The 5-second timeout argument still bounds how long shutdownGracefully will wait for the loop to actually exit. Adds PlainNettyTransportShutdownTimingTest covering both single-close and 20-cycle scenarios. Both tests fail on the unfixed code (single close ~2 s, 20-cycle loop >40 s) and pass after the fix (single close ~5 ms, 20-cycle loop ~330 ms). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8d636b3 to
117377c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PlainNettyTransport.close()currently callsworkerGroup.shutdownGracefully()andbossGroup.shutdownGracefully()with no arguments. That falls through to Netty's defaults ofquietPeriod = 2 SECONDS,timeout = 15 SECONDS. After PR #412 chained the returned futures viaCompletableFuture.allOf(...).thenApply { }, the caller'sclose()future no longer resolves until thoseshutdownGracefullyfutures resolve — which means everyclose()blocks for the full 2-second quiet period, even when there is no traffic in flight to drain.This patch passes an explicit
quietPeriod = 0to both calls. The event loop terminates as soon as its run-loop notices the shutdown flag, the returned future resolves in tens of milliseconds, and the caller-can-wait correctness contract introduced in #412 is preserved.Why the current behaviour is wrong for typical callers
The quiet period is the time
shutdownGracefullykeeps the event loop alive AFTER its last task finished, in case more work is submitted. Onceclose()has been called on this transport:.close()invoked on it.closed = trueis set, solisten()anddial()throwLibp2pException.workerGrouporbossGroup.Waiting 2 seconds for "more work that isn't coming" is pure latency. The
timeoutargument (which I keep at 5 seconds) is the meaningful upper bound — it caps how longshutdownGracefullywill wait for the loop to actually exit once shutdown has been requested.Cumulative impact downstream
The 2-second per-close latency compounds badly for callers that drive many short-lived host lifecycles back-to-back, e.g. test fixtures that create-and-tear-down a host per iteration. A consumer
@Timeout(30)test of 20 host create/close cycles times out after 12-13 iterations (each ~2 s of quiet-period wait plus iteration overhead), even though every individualclose()is doing exactly what it was asked to do.That is what surfaced as 12 timed-out tests in CodexCoder21Organization/UrlResolver run
7f19e875, once a downstream classpath change started loading post-#412 classes instead of the pre-#412 classes for the first time. The full investigation is in the report on the run, but the short version is: every test that creates manyUrlProtocol2instances in a loop hits the cumulative quiet-period budget exhaustion this PR fixes.Regression test
PlainNettyTransportShutdownTimingTestexercises the contract directly. Both tests pass with this PR's fix and FAIL ondevelopHEAD:developsingle close should resolve well below the default 2 second quiet period20 sequential listen-then-close cycles complete in under 5 secondsThe 1-second budget on the single-close test is deliberately conservative against any future change that might restore a smaller-but-nonzero quiet period — even a hypothetical 1 s default would fail this test decisively.
Verification
(Pre-fix: 2 tests, 2 failed — matching the timing data above.)
Risk
Minimal. The change tightens shutdown latency without changing what state shutdown leaves behind. Any caller that genuinely relies on a 2-second quiet period to absorb in-flight tasks should already be calling
close()only after explicitly draining its traffic — and if it does have an unflushed task at close time, the 5-secondtimeoutargument still gives the loop time to drain it before the future completes.🤖 Generated with Claude Code