
perf - Reduce TSO waker churn and quantify impact with Criterion#529

Open
mingley wants to merge 3 commits into tikv:master from mingley:mingley/tso-waker-criterion

Conversation


@mingley mingley commented Feb 6, 2026

Summary

This PR reduces unnecessary wake/waker churn in the TSO path and adds reproducible Criterion benchmarks with absolute latency numbers.

Detailed, file-by-file changes:

  1. Cargo.toml
  • Added criterion = "0.5" to [dev-dependencies].
  • Added bench target:
    • [[bench]]
    • name = "tso_waker_policy"
    • harness = false
  • Why: enable reproducible micro-benchmarking for the TSO wake/register policy changes.
  2. benches/tso_waker_policy.rs (new)
  • Added a dedicated Criterion benchmark for old vs new policy behavior.
  • Added benchmark constants and comments documenting simulation semantics:
    • FULL_EVERY: how often (in iterations) the simulated queue resets to "full"
    • FULL_WINDOW: how many consecutive iterations stay full inside each window
  • Benchmarks included:
    • response/old: unconditional wake()
    • response/new: wake only on full -> non-full transition
    • register/old: unconditional register() in no-request branch
    • register/new: register only when queue is full
  • Why: directly quantify policy overhead changes in isolation.
  3. src/pd/timestamp.rs
  • TimestampOracle::new:
    • wrapped run_tso(...) in a spawned async block that logs errors via warn!.
    • Why: preserve root-cause visibility if the background task exits with error.
  • run_tso stream loop:
    • changed from while let Some(Ok(resp)) to while let Some(resp) + let resp = resp?;.
    • Why: propagate gRPC stream errors instead of silently treating them as normal termination.
  • Response-side wake policy:
    • now computes should_wake_sender and only wakes when pending queue transitions from full to non-full.
    • Why: avoid redundant wake calls when capacity state did not change.
  • Request-stream lock contention path:
    • removed redundant self_waker.register(cx.waker()) in mutex-lock contention branch.
    • Why: avoid registration churn in a path that should simply repoll after lock availability.
  • No-request branch policy:
    • now registers self_waker only when blocked by full pending queue.
    • Why: rely on poll_recv’s receiver waker when queue is not full.
  • Batch metric emission:
    • added observe_tso_batch(requests.len()) when creating a TSO batch request.
    • Why: keep existing batch-size metric aligned with actual request grouping point.
  4. doc/tso_waker_criterion.md (new)
  • Added benchmark guide and reproducible run command.
  • Added explicit “point-in-time snapshot” note and rerun guidance when code/benchmark changes.
  • Added interpretation note clarifying that response speedup is amortized benefit from skipping redundant wakes under this simulation.
  • Added absolute latency outputs and median speedup math.
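The transition-only wake policy and conditional registration described above for src/pd/timestamp.rs can be sketched as a small, std-only model. Note that `MAX_PENDING`, the `PendingQueue` shape, and the `wakes` counter are illustrative stand-ins, not the actual types or fields in src/pd/timestamp.rs:

```rust
// Illustrative model of the wake policy change: the old policy wakes the
// sender on every completed response; the new policy wakes only when the
// pending queue transitions from full to non-full.
const MAX_PENDING: usize = 4; // stand-in capacity, not the real limit

struct PendingQueue {
    len: usize,
    wakes: usize, // counts how often the sender waker would be invoked
}

impl PendingQueue {
    fn new() -> Self {
        PendingQueue { len: 0, wakes: 0 }
    }

    fn push(&mut self) {
        self.len += 1;
    }

    // Old policy: wake() unconditionally on every completed response.
    fn complete_old(&mut self) {
        self.len -= 1;
        self.wakes += 1;
    }

    // New policy: wake() only on the full -> non-full transition,
    // since capacity state is unchanged otherwise.
    fn complete_new(&mut self) {
        let was_full = self.len == MAX_PENDING;
        self.len -= 1;
        let is_full = self.len == MAX_PENDING;
        if was_full && !is_full {
            self.wakes += 1;
        }
    }
}
```

Under this model, draining two responses from a full queue wakes the sender twice under the old policy but only once under the new one.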

Criterion absolute latency snapshot (100 samples, 2s warmup, 6s measurement):

  • response/old: [3.2519 ns 3.2712 ns 3.2926 ns]
  • response/new: [763.41 ps 766.39 ps 769.43 ps]
  • register/old: [2.3768 ns 2.3819 ns 2.3874 ns]
  • register/new: [286.76 ps 287.51 ps 288.27 ps]

Median speedups:

  • response path: 4.27x
  • register path: 8.28x
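Why the response-path gain is amortized rather than per-call can be sketched with a small simulation of the benchmark's cadence. The `FULL_EVERY = 1024` and `FULL_WINDOW = 4` values here are assumptions for illustration; the constants in benches/tso_waker_policy.rs may differ:

```rust
// Hypothetical reproduction of the benchmark's simulated queue cadence:
// every FULL_EVERY-th iteration starts a window in which the queue stays
// "full" for FULL_WINDOW consecutive iterations.
const FULL_EVERY: usize = 1024;
const FULL_WINDOW: usize = 4;

fn is_full(iteration: usize) -> bool {
    iteration % FULL_EVERY < FULL_WINDOW
}

// Count how many wakes each policy performs over `n` iterations.
fn count_wakes(n: usize) -> (usize, usize) {
    let (mut old, mut new) = (0, 0);
    for i in 0..n {
        old += 1; // old policy: wake unconditionally every iteration
        // new policy: wake only on the full -> non-full transition
        if is_full(i) && !is_full(i + 1) {
            new += 1;
        }
    }
    (old, new)
}
```

Over 1024 iterations the old policy performs 1024 wakes while the new one performs a single wake at the end of the full window, which is why the reported speedup reflects the skip rate of this simulation rather than the cost of the conditional itself.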

Testing Done

Executed locally on branch mingley/tso-waker-criterion:

  1. cargo fmt --all -- --check
  • Result: pass.
  2. cargo clippy --all-targets --all-features -- -D warnings
  • Result: pass.
  3. cargo test
  • Result: pass.
  • Summary: 53 passed, 0 failed (unit tests) and 49 doc tests passed.
  4. cargo bench --bench tso_waker_policy -- --noplot
  • Result: pass.
  • Output captured in doc/tso_waker_criterion.md.

Summary by CodeRabbit

  • Chores

    • Added benchmarking infrastructure to measure and compare wake/registration policies.
  • Documentation

    • Added a benchmarking guide describing goals, method, and interpreted results.
  • Performance

    • Reduced unnecessary wake/registration activity in timestamp request handling, improving responsiveness and lowering nanosecond-/sub-nanosecond-scale per-operation overhead.


ti-chi-bot bot commented Feb 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign you06 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added contribution This PR is from a community contributor. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. dco-signoff: no Indicates the PR's author has not signed dco. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Feb 6, 2026

ti-chi-bot bot commented Feb 6, 2026

Welcome @mingley!

It looks like this is your first PR to tikv/client-rust 🎉.

I'm the bot that helps you request reviewers, add labels, and more. See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/client-rust. 😃


coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Walkthrough

Adds a Criterion micro-benchmark and documentation for TSO waker policies, and modifies timestamp oracle logic to observe batch sizes and to conditionally wake or register wakers based on pending-queue fullness transitions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build config: `Cargo.toml` | Add `criterion = "0.5"` to `[dev-dependencies]` and add a `[[bench]]` entry `tso_waker_policy` with `harness = false`. |
| Benchmarks: `benches/tso_waker_policy.rs` | New Criterion micro-benchmark comparing old vs new TSO waker wake/register policies; implements a noop waker and synthetic pending-queue scenarios; configures group warmup/measurement and labeled benchmark cases. |
| Documentation: `doc/tso_waker_criterion.md` | Add benchmark doc describing goal, method (cargo bench with Criterion), target file, config, usage, and reported micro-benchmark results and interpretation. |
| TSO runtime logic: `src/pd/timestamp.rs` | Spawn TSO loop as async task; add `observe_tso_batch(...)` on non-empty batches; compute `should_wake_sender` and wake sender only when queue transitions from full to non-full; register self-waker only when queue remains full after allocation; return `Ok(())` explicitly and adjust related polling logic. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant TSO_Task as "TimestampOracle Task"
    participant PendingQ as "Pending Queue"
    participant Sender as "Sender Waker / Receiver"

    Client->>TSO_Task: send TSO request
    TSO_Task->>PendingQ: allocate ID / append request
    PendingQ-->>TSO_Task: report previous_full_state
    TSO_Task->>PendingQ: compute current_full_state after allocation
    alt previously full and now not full
        TSO_Task->>Sender: should_wake_sender == true (wake sender)
        Sender-->>Client: resume sender processing
    else still full / no capacity
        TSO_Task->>TSO_Task: register self-waker (block)
    end
    TSO_Task->>TSO_Task: observe_tso_batch(size) when batch formed
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I rabbit-hop through benches bright,
I time each wake in morning light,
Old wakes, new wakes — I sniff and jot,
Tiny ticks counted, every dot 🐇⏱️
Queue full? I thump; queue frees — I plot.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'Reduce TSO waker churn and quantify impact with Criterion' accurately summarizes the main changes: optimization of TSO waker behavior and addition of a Criterion benchmark to measure performance impact.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/pd/timestamp.rs (1)

78-82: ⚠️ Potential issue | 🟡 Minor

Background task errors are silently discarded due to missing JoinHandle handling.

The Result<()> return type and explicit Ok(()) at line 116 align cleanly with the ? usage inside the function. However, the JoinHandle returned by tokio::spawn(run_tso(...)) at line 62 is never stored or awaited. Since errors can occur at both the pd_client.tso() call (line 99) and within allocate_timestamps() (line 105), failures in the background task will go unnoticed and the connection closure will only be discovered when callers receive a channel-closed error instead of the root cause.

Consider storing the JoinHandle and handling its potential error, or spawn a task that logs/propagates failures.

🧹 Nitpick comments (3)
benches/tso_waker_policy.rs (2)

20-36: The "old" and "new" response benchmarks have asymmetric work, which is expected but worth noting.

In response_policy_old, wake() is called unconditionally on every iteration, while in response_policy_new, it's called only on the full→non-full transition (~once per 1024 iterations). The reported speedup primarily measures the cost of not calling wake(), rather than the overhead of the conditional check itself. This is fine for validating the optimization's effect, but the doc and PR description should be clear that the speedup reflects the amortized skip rate under this specific simulation pattern.

Also applies to: 38-57


9-11: Consider documenting what FULL_EVERY and FULL_WINDOW represent.

These simulation parameters control how often the queue becomes full in the benchmark, directly affecting the measured speedup ratio. A brief comment explaining their role would help future readers understand and tune the benchmark.

doc/tso_waker_criterion.md (1)

1-58: Consider noting that results should be re-run when the benchmark or production code changes.

Hardcoded benchmark results in committed documentation risk becoming stale as the code evolves. Consider adding a note that these numbers are a point-in-time snapshot and should be re-collected after significant changes to timestamp.rs or the benchmark itself.

Signed-off-by: Michael Ingley <mingley@linkedin.com>
@mingley mingley force-pushed the mingley/tso-waker-criterion branch from c2fdbf5 to a66b5e6 on February 6, 2026 at 05:47
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. and removed dco-signoff: no Indicates the PR's author has not signed dco. labels Feb 6, 2026
Signed-off-by: Michael Ingley <mingley@linkedin.com>

mingley commented Feb 6, 2026

Addressed the CodeRabbit feedback in commit cd6aa8f:

  1. Background TSO task errors are now surfaced via logs
  • Updated src/pd/timestamp.rs so the spawned task wraps run_tso(...) and logs any Err(...) with context (warn!) instead of silently discarding it.
  2. Benchmark assumptions are now explicit
  • Added comments in benches/tso_waker_policy.rs for FULL_EVERY / FULL_WINDOW.
  • Added inline notes clarifying old vs new wake behavior (unconditional vs transition-only wake).
  3. Benchmark doc now includes interpretation/staleness guidance
  • Updated doc/tso_waker_criterion.md to state these numbers are point-in-time and should be re-run after changes to src/pd/timestamp.rs or the benchmark.
  • Added a note that the response benchmark speedup reflects amortized skipped wake calls under this simulation pattern.
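The error-logging wrapper around run_tso(...) described in item 1 can be sketched std-only. Here `run_tso_stub`, `spawn_wrapper`, and the `eprintln!` stand-in for `warn!` are all illustrative names; the real change passes an async block to tokio::spawn and logs via the `log` crate:

```rust
// Std-only sketch of "wrap the background task and log its error instead
// of discarding it". The closure plays the role of the spawned async block.
fn run_tso_stub() -> Result<(), String> {
    // Simulates run_tso(...) exiting with a gRPC stream error.
    Err("gRPC stream closed".to_string())
}

// Returns true if an error was surfaced, mimicking the warn! side effect.
fn spawn_wrapper<F>(task: F) -> bool
where
    F: FnOnce() -> Result<(), String>,
{
    if let Err(e) = task() {
        // Real code: warn!("TSO background task exited with error: {e}")
        eprintln!("TSO background task exited with error: {e}");
        return true;
    }
    false
}
```

The point of the wrapper is that the root cause is logged at the moment the task dies, rather than being inferred later from channel-closed errors at call sites.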


mingley commented Feb 6, 2026

@coderabbitai review


coderabbitai bot commented Feb 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Signed-off-by: Michael Ingley <mingley@linkedin.com>

mingley commented Feb 6, 2026

Addressed local review finding #1 in bba9a1c:

  • src/pd/timestamp.rs: changed stream loop from while let Some(Ok(resp)) to while let Some(resp) + let resp = resp?; so gRPC stream errors are propagated out of run_tso and logged by the spawned-task wrapper instead of being silently treated as normal termination.
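The loop-shape change above can be illustrated with a std-only iterator of `Result`s. The `drain_old`/`drain_new` helpers and `String` error type are hypothetical; the real code polls a gRPC response stream:

```rust
// Old shape: `while let Some(Ok(resp))` silently stops at the first Err,
// treating a stream error as normal termination.
fn drain_old(items: Vec<Result<u64, String>>) -> Result<Vec<u64>, String> {
    let mut out = Vec::new();
    let mut iter = items.into_iter();
    while let Some(Ok(resp)) = iter.next() {
        out.push(resp);
    }
    Ok(out)
}

// New shape: bind the Result first, then `?` so the error propagates to
// the caller (and, in the PR, to the spawned-task logging wrapper).
fn drain_new(items: Vec<Result<u64, String>>) -> Result<Vec<u64>, String> {
    let mut out = Vec::new();
    let mut iter = items.into_iter();
    while let Some(resp) = iter.next() {
        let resp = resp?;
        out.push(resp);
    }
    Ok(out)
}
```

Given the same input containing an error, the old shape returns `Ok` with a truncated result while the new shape surfaces the error.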

@mingley mingley changed the title Reduce TSO waker churn and quantify impact with Criterion perf - Reduce TSO waker churn and quantify impact with Criterion Feb 6, 2026