perf - Reduce TSO waker churn and quantify impact with Criterion #529
mingley wants to merge 3 commits into tikv:master from mingley/tso-waker-criterion
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Welcome @mingley!
📝 Walkthrough

Adds a Criterion micro-benchmark and documentation for TSO waker policies, and modifies timestamp oracle logic to observe batch sizes and to conditionally wake or register wakers based on pending-queue fullness transitions.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pd/timestamp.rs (1)
78-82: ⚠️ Potential issue | 🟡 Minor

Background task errors are silently discarded due to missing `JoinHandle` handling.

The `Result<()>` return type and explicit `Ok(())` at line 116 align cleanly with the `?` usage inside the function. However, the `JoinHandle` returned by `tokio::spawn(run_tso(...))` at line 62 is never stored or awaited. Since errors can occur at both the `pd_client.tso()` call (line 99) and within `allocate_timestamps()` (line 105), failures in the background task will go unnoticed, and the connection closure will only be discovered when callers receive a channel-closed error instead of the root cause.

Consider storing the `JoinHandle` and handling its potential error, or spawn a task that logs/propagates failures.
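For reference, a minimal sketch of the "log failures" option, assuming a `run_tso` future returning `Result` and the `log` crate's `warn!` macro (which the PR description says the fix uses); the signature and error type here are placeholders, not the crate's actual code:

```rust
use log::warn;

// Placeholder for the real run_tso future; in the crate, errors can surface
// from pd_client.tso() or allocate_timestamps().
async fn run_tso() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    Ok(())
}

fn spawn_tso_task() -> tokio::task::JoinHandle<()> {
    // Observe the task's Result instead of dropping the JoinHandle:
    // a failure is logged rather than silently discarded.
    tokio::spawn(async move {
        if let Err(err) = run_tso().await {
            warn!("TSO background task exited with error: {err}");
        }
    })
}
```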
🧹 Nitpick comments (3)
benches/tso_waker_policy.rs (2)
20-36: The "old" and "new" response benchmarks have asymmetric work, which is expected but worth noting.In
response_policy_old,wake()is called unconditionally on every iteration, while inresponse_policy_new, it's called only on the full→non-full transition (~once per 1024 iterations). The reported speedup primarily measures the cost of not callingwake(), rather than the overhead of the conditional check itself. This is fine for validating the optimization's effect, but the doc and PR description should be clear that the speedup reflects the amortized skip rate under this specific simulation pattern.Also applies to: 38-57
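To illustrate the asymmetry, a hedged sketch of what such a benchmark pair could look like; the names, the `FULL_EVERY` value, and the atomic-counter simulation are assumptions, not the benchmark's actual code:

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

use criterion::{criterion_group, criterion_main, Criterion};
use futures::task::noop_waker;

// Assumed reset cadence: roughly one full -> non-full transition per
// FULL_EVERY iterations, mirroring the skip rate discussed above.
const FULL_EVERY: usize = 1024;

fn bench_wake_policies(c: &mut Criterion) {
    let waker = noop_waker();
    let counter = AtomicUsize::new(0);

    c.bench_function("response/old", |b| {
        b.iter(|| {
            // Old policy: wake the sender on every response.
            waker.wake_by_ref();
            black_box(());
        })
    });

    c.bench_function("response/new", |b| {
        b.iter(|| {
            // New policy: wake only when the simulated queue transitions
            // from full to non-full (~once per FULL_EVERY iterations).
            let i = counter.fetch_add(1, Ordering::Relaxed);
            let was_full = i % FULL_EVERY == 0;
            if was_full {
                waker.wake_by_ref();
            }
            black_box(was_full);
        })
    });
}

criterion_group!(benches, bench_wake_policies);
criterion_main!(benches);
```

Under this pattern the "new" case pays only for an atomic increment and a comparison on most iterations, which is exactly the amortized effect the review note says should be called out.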
9-11: Consider documenting what `FULL_EVERY` and `FULL_WINDOW` represent.

These simulation parameters control how often the queue becomes full in the benchmark, directly affecting the measured speedup ratio. A brief comment explaining their role would help future readers understand and tune the benchmark.
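For example, doc comments along these lines would address the nitpick (the values shown are placeholders, not the benchmark's real settings):

```rust
/// Every FULL_EVERY-th iteration the simulated pending queue is reset to
/// "full", so this sets how often a wake/register decision is exercised.
const FULL_EVERY: usize = 1024;

/// How many consecutive iterations the queue stays "full" inside each
/// FULL_EVERY window before it drains back to non-full.
const FULL_WINDOW: usize = 8;
```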
doc/tso_waker_criterion.md (1)
1-58: Consider noting that results should be re-run when the benchmark or production code changes.

Hardcoded benchmark results in committed documentation risk becoming stale as the code evolves. Consider adding a note that these numbers are a point-in-time snapshot and should be re-collected after significant changes to `timestamp.rs` or the benchmark itself.
Signed-off-by: Michael Ingley <mingley@linkedin.com>
Force-pushed from c2fdbf5 to a66b5e6 (Compare)
Signed-off-by: Michael Ingley <mingley@linkedin.com>
Addressed the CodeRabbit feedback in commit
@coderabbitai review
✅ Actions performed

Review triggered.
Signed-off-by: Michael Ingley <mingley@linkedin.com>
Addressed local review finding #1 in
Summary
This PR reduces unnecessary wake/waker churn in the TSO path and adds reproducible Criterion benchmarks with absolute latency numbers.
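To make the policy concrete, here is a minimal sketch of the wake/register decisions described in the file-by-file list below; the `should_wake_sender` name comes from that list, while the `Pending` struct, `MAX_PENDING`, and the queue layout are assumptions for illustration, not this crate's actual code:

```rust
use std::collections::VecDeque;
use std::task::Context;

use futures::task::AtomicWaker;

// Assumed capacity of the pending-request queue; the real crate's limit
// and data layout will differ.
const MAX_PENDING: usize = 64;

struct Pending {
    queue: VecDeque<u64>, // placeholder for in-flight timestamp requests
    sender_waker: AtomicWaker,
}

impl Pending {
    fn new() -> Self {
        Pending { queue: VecDeque::new(), sender_waker: AtomicWaker::new() }
    }

    /// Wake the sender only on the full -> non-full transition, instead of
    /// after every completed response.
    fn should_wake_sender(len_before: usize, len_after: usize) -> bool {
        len_before >= MAX_PENDING && len_after < MAX_PENDING
    }

    fn on_response(&mut self) {
        let before = self.queue.len();
        self.queue.pop_front();
        if Self::should_wake_sender(before, self.queue.len()) {
            self.sender_waker.wake();
        }
    }

    /// Register the sender's waker only when it is actually blocked by a
    /// full queue; when there is room, rely on the channel's own receiver
    /// waker (e.g. the one installed by poll_recv) instead of re-registering.
    fn try_reserve_slot(&mut self, cx: &mut Context<'_>) -> bool {
        if self.queue.len() >= MAX_PENDING {
            self.sender_waker.register(cx.waker());
            false // caller should return Poll::Pending
        } else {
            true // room available; proceed without registering
        }
    }
}
```

The intent, which the benchmarks below then quantify, is that unconditional `wake()`/`register()` calls on the hot path are replaced by a cheap length comparison in the common non-full case.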
Detailed, file-by-file changes:
Cargo.tomlcriterion = "0.5"to[dev-dependencies].[[bench]]name = "tso_waker_policy"harness = falsebenches/tso_waker_policy.rs(new)FULL_EVERY: reset cadence to “full queue”FULL_WINDOW: consecutive full iterations inside each windowresponse/old: unconditionalwake()response/new: wake only on full -> non-full transitionregister/old: unconditionalregister()in no-request branchregister/new: register only when queue is fullsrc/pd/timestamp.rsTimestampOracle::new:run_tso(...)in spawned async block that logs errors viawarn!.run_tsostream loop:while let Some(Ok(resp))towhile let Some(resp)+let resp = resp?;.should_wake_senderand only wakes when pending queue transitions from full to non-full.self_waker.register(cx.waker())in mutex-lock contention branch.self_wakeronly when blocked by full pending queue.poll_recv’s receiver waker when queue is not full.observe_tso_batch(requests.len())when creating a TSO batch request.doc/tso_waker_criterion.md(new)Criterion absolute latency snapshot (100 samples, 2s warmup, 6s measurement):
- `response/old`: [3.2519 ns 3.2712 ns 3.2926 ns]
- `response/new`: [763.41 ps 766.39 ps 769.43 ps]
- `register/old`: [2.3768 ns 2.3819 ns 2.3874 ns]
- `register/new`: [286.76 ps 287.51 ps 288.27 ps]

Median speedups (old median / new median):
- response: 4.27x
- register: 8.28x

Testing Done
Executed locally on branch `mingley/tso-waker-criterion`:
- `cargo fmt --all -- --check`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test`: 53 passed, 0 failed (unit tests) and 49 doc tests passed.
- `cargo bench --bench tso_waker_policy -- --noplot`; results recorded in `doc/tso_waker_criterion.md`.
Chores
Documentation
Performance