resource_control: add PredictedReadBytes hint for RC paging pre-charge by YuhaoZhang00 · Pull Request #10599 · tikv/pd

YuhaoZhang00 · 2026-04-14T08:24:52Z

Summary

Add an optional predictedReadBytesProvider interface on RequestInfo.
When a caller (e.g. TiDB maintaining a per-logical-scan EMA across paging
cop RPCs) supplies a non-zero PredictedReadBytes, BeforeKVRequest and
AfterKVRequest use that value as the byte basis for the paging pre-charge
instead of PagingSizeBytes. PagingSizeBytes remains the fallback and
worst-case cap.

Existing RequestInfo implementations compile unchanged — the hint is an
optional interface, not a method on RequestInfo.
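
The optional-interface approach described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the method signature `PredictedReadBytes() uint64`, the shape of the `estimatedReadBytes` helper, and both request types are assumptions, and "worst-case cap" is read here as clamping the hint to PagingSizeBytes.

```go
package main

// RequestInfo is a trimmed-down stand-in for the controller's request
// interface; only the method needed for this sketch is included.
type RequestInfo interface {
	PagingSizeBytes() uint64
}

// predictedReadBytesProvider is the optional hint interface. The method
// signature is an assumption made for illustration.
type predictedReadBytesProvider interface {
	PredictedReadBytes() uint64
}

// estimatedReadBytes picks the byte basis for the paging pre-charge:
// a non-zero hint wins, clamped to PagingSizeBytes when that cap is set;
// otherwise PagingSizeBytes is the fallback.
func estimatedReadBytes(req RequestInfo) uint64 {
	limit := req.PagingSizeBytes()
	if p, ok := req.(predictedReadBytesProvider); ok {
		if hint := p.PredictedReadBytes(); hint > 0 {
			if limit > 0 && hint > limit {
				return limit
			}
			return hint
		}
	}
	return limit
}

// legacyRequest implements only RequestInfo, like an existing caller
// that knows nothing about the hint.
type legacyRequest struct{ paging uint64 }

func (r legacyRequest) PagingSizeBytes() uint64 { return r.paging }

// hintedRequest additionally supplies a prediction.
type hintedRequest struct {
	legacyRequest
	predicted uint64
}

func (r hintedRequest) PredictedReadBytes() uint64 { return r.predicted }
```

Because the hint is discovered through a type assertion, `legacyRequest` needs no changes and simply takes the fallback branch.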

Why

The fixed 4 MiB paging pre-charge introduced in #10548 typically overshoots
actual scanned bytes and stalls concurrent workers at Phase 1 under tight
resource tiers. A learned per-scan estimate from the TiDB side eliminates
the over-estimation without changing kvproto or TiKV behavior.

Status

Draft. Part of a stacked change:

tikv/client-go: adds tikvrpc.Request.PredictedReadBytes + RequestInfo getter
pingcap/tidb: maintains per-logical-scan TAEMA on copTask, feeds prediction on every paging RPC

Not yet ready for review; e2e validation on the simulation cluster is pending.

Test plan

Unit tests in client/resource_group/controller/ cover hint-present and hint-absent paths, plus an excess-pre-charge refund case
e2e: FullScan + JOIN paging sweep on simulation cluster (pending)

Add PagingSizeBytes() to the RequestInfo interface. When a read request carries a byte budget (from RC paging), BeforeKVRequest pre-charges the estimated read RU in Phase 1, and AfterKVRequest subtracts it in Phase 2 to maintain the correct total cost. This makes concurrent workers throttle at Phase 1 instead of all hitting Phase 2 simultaneously.

Signed-off-by: JmPotato <github@ipotato.me>

Test that BeforeKVRequest pre-charges pagingSizeBytes * ReadBytesCost, AfterKVRequest subtracts it, and the net total equals baseCost + actualCost.

Signed-off-by: JmPotato <github@ipotato.me>
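
The invariant this test checks can be written out as a small arithmetic sketch. The struct, function names, and coefficient values below are illustrative assumptions, not the controller's real types or defaults.

```go
package main

// ruConfig holds the cost coefficients used by this sketch. The field
// names echo the commit message; callers supply illustrative values.
type ruConfig struct {
	ReadBaseCost  float64 // RU charged per read request
	ReadBytesCost float64 // RU charged per byte read
}

// beforeKVRequest returns the RU charged up front: the base cost plus a
// pre-charge for the whole paging byte budget.
func beforeKVRequest(cfg ruConfig, pagingSizeBytes uint64) float64 {
	return cfg.ReadBaseCost + cfg.ReadBytesCost*float64(pagingSizeBytes)
}

// afterKVRequest returns the settlement delta: the actual byte cost minus
// the pre-charge. It is negative when fewer bytes were read than budgeted.
func afterKVRequest(cfg ruConfig, pagingSizeBytes, actualReadBytes uint64) float64 {
	actual := cfg.ReadBytesCost * float64(actualReadBytes)
	preCharge := cfg.ReadBytesCost * float64(pagingSizeBytes)
	return actual - preCharge
}
```

With a base cost of 0.25 RU, a byte cost of 1/65536 RU, a 4 MiB budget, and 1 MiB actually read, the pre-charge step yields 64.25 RU, the settlement delta is -48 RU, and the sum is 16.25 RU, which is the base cost plus the actual byte cost.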

…ment

When paging byte budget (pagingSizeBytes) is enabled, BeforeKVRequest pre-charges ReadBytesCost * pagingSizeBytes RRU into the token limiter. AfterKVRequest then computes the actual cost and subtracts the pre-charge. If the actual cost is less than the pre-charge (the common case, since pagingSizeBytes is an upper bound), the settlement delta is negative.

Previously, the negative delta was correctly recorded in the consumption counter but silently dropped by the token limiter due to a `v > 0` guard, causing permanent token leakage proportional to (pagingSizeBytes - actualReadBytes) per request.

Fix: add Limiter.RefundTokens (inverse of RemoveTokens) and call it from both onResponseImpl and onResponseWaitImpl when the settlement delta is negative. This ensures the limiter's available token balance accurately reflects actual consumption after each request completes.

RefundTokens design notes:

- No burst cap applied (consistent with Reconfigure; getTokens handles lazy capping on the next limiter operation).
- No maybeNotify call (refunding moves balance away from the low-token threshold, never toward it).

Signed-off-by: JmPotato <github@ipotato.me>
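
The leak and its fix can be illustrated with a deliberately minimal token balance. The `limiter` type and both `settle` helpers below are hypothetical stand-ins, not PD's Limiter; they omit refill rate, burst, locking, and notifications.

```go
package main

// limiter is a tiny token balance used only for this sketch.
type limiter struct{ tokens float64 }

// RemoveTokens deducts v tokens; the balance may go negative (debt).
func (l *limiter) RemoveTokens(v float64) { l.tokens -= v }

// RefundTokens is the inverse of RemoveTokens.
func (l *limiter) RefundTokens(v float64) { l.tokens += v }

// settleBuggy mirrors the old behavior: only positive deltas reach the
// limiter, so a negative settlement delta is silently dropped.
func settleBuggy(l *limiter, delta float64) {
	if delta > 0 {
		l.RemoveTokens(delta)
	}
}

// settleFixed also hands over-charged tokens back to the limiter.
func settleFixed(l *limiter, delta float64) {
	switch {
	case delta > 0:
		l.RemoveTokens(delta)
	case delta < 0:
		l.RefundTokens(-delta)
	}
}
```

Starting from 100 tokens with a 64-token pre-charge and an actual cost of 16, the buggy path leaves 36 tokens (48 leaked), while the fixed path leaves 84, which is 100 minus the actual cost.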

Introduce an optional predictedReadBytesProvider interface on RequestInfo. When a caller (e.g. TiDB maintaining a per-logical-scan EMA across paging RPCs) supplies a non-zero PredictedReadBytes, BeforeKVRequest/AfterKVRequest use that value as the byte basis for the paging pre-charge instead of PagingSizeBytes. PagingSizeBytes remains the fallback and worst-case cap.

This lets TiDB replace the current fixed 4 MiB pre-charge (which matches the paging byte budget but typically overshoots actual scanned bytes and stalls concurrent workers at Phase 1) with a learned estimate, without changing kvproto or TiKV behavior.

The hint is added as an optional interface (not a method on RequestInfo) so existing RequestInfo implementations compile unchanged; they continue to fall back to PagingSizeBytes.

Ref: per-logical-scan EMA pre-deduction design

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>

ti-chi-bot · 2026-04-14T08:24:56Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-04-14T08:24:59Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign husharp for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-14T08:25:00Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


ti-chi-bot · 2026-04-14T08:25:03Z

Hi @YuhaoZhang00. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


codecov · 2026-04-14T08:36:28Z

Codecov Report

❌ Patch coverage is 91.59664% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.94%. Comparing base (3eb99ae) to head (d652178).
⚠️ Report is 19 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10599      +/-   ##
==========================================
+ Coverage   78.88%   78.94%   +0.06%     
==========================================
  Files         530      532       +2     
  Lines       71548    72069     +521     
==========================================
+ Hits        56439    56895     +456     
- Misses      11092    11139      +47     
- Partials     4017     4035      +18

Flag	Coverage Δ
unittests	`78.94% <91.59%> (+0.06%)`	⬆️


Adds four Prometheus metrics under namespace resource_manager_client to make the PredictedReadBytes-based paging pre-charge fix directly observable end-to-end:

- paging_precharge_source_total{source=predicted|fallback}
- paging_precharge_bytes_total{source}
- paging_actual_bytes_total{source}
- paging_prediction_residual_bytes (histogram, predicted path only)

They are wired into onRequestWaitImpl / onResponseImpl so no change to the ResourceCalculator interface is needed. Labels are cached per group to keep the hot path allocation-free.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>

YuhaoZhang00 · 2026-04-14T09:22:22Z

Added 4 observability metrics on the PD client side for upcoming e2e validation of the paging pre-charge path:

resource_manager_client_request_paging_precharge_source_total{source=predicted|fallback} — does the PredictedReadBytes hint actually drive BeforeKVRequest
resource_manager_client_request_paging_precharge_bytes_total{source} — pre-charged byte volume per path
resource_manager_client_request_paging_actual_bytes_total{source} — observed ReadBytes per path (ratio with above ≈ over-charge factor; historical ≈4×, target ≈1×)
resource_manager_client_request_paging_prediction_residual_bytes histogram — signed actual − predicted distribution for EMA accuracy

No public interface change; instrumentation is at the onRequestWait / onResponse call sites in group_controller.go. Draft status unchanged; keeping this open pending Step E validation results.
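
For reading these metrics, the two derived quantities mentioned above (the over-charge factor and the signed residual) amount to the arithmetic below. The `pagingCounters` struct and both helpers are hypothetical and work on plain numbers; they do not use the Prometheus client.

```go
package main

// pagingCounters is a plain snapshot of the two byte counters described
// above for one source label.
type pagingCounters struct {
	PrechargeBytes float64 // value of paging_precharge_bytes_total
	ActualBytes    float64 // value of paging_actual_bytes_total
}

// overChargeFactor returns precharge/actual. The second result is false
// when no actual bytes have been observed yet, so the ratio is undefined.
func overChargeFactor(c pagingCounters) (float64, bool) {
	if c.ActualBytes <= 0 {
		return 0, false
	}
	return c.PrechargeBytes / c.ActualBytes, true
}

// predictionResidual returns the signed actual − predicted value that
// the residual histogram is described as recording.
func predictionResidual(actual, predicted uint64) int64 {
	return int64(actual) - int64(predicted)
}
```

A 4 MiB pre-charge against 1 MiB actually read gives a factor of 4, matching the historical figure quoted above; a negative residual means the prediction overshot.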

The paging actual-bytes counter was only updated from onResponseImpl, but most responses under RC throttling flow through onResponseWaitImpl, so the counter stayed at 0 in practice and the over-charge ratio metric could not be computed. Mirror the observePagingActual call into the wait path.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>

Pre-charge now requires a learned PredictedReadBytes hint. Without a hint (EMA cold start or unhinted caller) the request is not pre-charged and is billed at Phase 2 by actual read bytes only. PagingSizeBytes is no longer consulted for pre-charge, so the protocol-level paging cap and the RU-billing pre-charge are fully decoupled.

Motivation: Round 2 data showed the fallback path over-charged 4-268x because PagingSizeBytes is a worst-case cap, not an expectation. Under concurrency this produced large artificial Phase 1 throttling, pushing QPS/latency into tier-dependent non-linear regressions. The cold window loses Phase 1 pre-throttling with this change; Phase 2 token-bucket billing still enforces the quota.

Rewrote the model tests to reflect the new semantics:

- TestPredictedReadBytesPreCharge asserts hint-driven pre-charge is unaffected by PagingSizeBytes.
- TestNoPreChargeWithoutPredictedReadBytes asserts PagingSizeBytes alone no longer triggers pre-charge.
- Existing refund/settlement tests now drive pre-charge via PredictedReadBytes instead of PagingSizeBytes.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
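
The revised semantics (hint only, no PagingSizeBytes fallback or cap) might look like the sketch below. As before, the interface signature, helper names, and request types are assumptions for illustration rather than the PR's actual code.

```go
package main

// RequestInfo is a reduced stand-in for the controller's request interface.
type RequestInfo interface {
	PagingSizeBytes() uint64
}

// predictedReadBytesProvider is the optional hint; its exact signature
// is assumed.
type predictedReadBytesProvider interface {
	PredictedReadBytes() uint64
}

// preChargeBytes returns the byte basis for the BeforeKVRequest
// pre-charge. Only the hint is consulted; PagingSizeBytes is ignored.
func preChargeBytes(req RequestInfo) uint64 {
	if p, ok := req.(predictedReadBytesProvider); ok {
		return p.PredictedReadBytes() // zero (cold start) means no pre-charge
	}
	return 0 // unhinted caller: billed at settlement only
}

// settleBytes returns the signed byte delta billed at AfterKVRequest.
func settleBytes(actual, preCharged uint64) int64 {
	return int64(actual) - int64(preCharged)
}

// unhintedRequest carries a paging budget but no prediction.
type unhintedRequest struct{ paging uint64 }

func (r unhintedRequest) PagingSizeBytes() uint64 { return r.paging }

// hintedRequest carries both a paging budget and a prediction.
type hintedRequest struct{ paging, predicted uint64 }

func (r hintedRequest) PagingSizeBytes() uint64    { return r.paging }
func (r hintedRequest) PredictedReadBytes() uint64 { return r.predicted }
```

In this form a request with a 4 MiB paging budget and no hint is pre-charged nothing and is billed its full actual bytes at settlement, while a hinted request is pre-charged exactly its prediction even when that exceeds the paging budget.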

…trics

Follow-up to removing the PagingSizeBytes fallback: the source label on paging pre-charge metrics had only two values (predicted / fallback) and fallback is no longer reachable. Drop the label dimension entirely and rename PagingPrechargeSourceCounter to PagingPrechargeCounter; the remaining three counters and one histogram now measure pre-charged requests only (those with a PredictedReadBytes hint > 0).

Inlined estimatePrechargeSource at its three call sites by calling estimatedReadBytes directly, and simplified observePagingPrecharge / observePagingActual to their single remaining case.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>

Record RPCs that implement the predicted-bytes hint interface but report zero (EMA cold-start or feature-disabled) and therefore skip Phase 1 pre-charge. Two new counters tagged by resource group:

- paging_nonprecharge_total: count of bypassed RPCs
- paging_nonprecharge_actual_bytes_total: actual bytes read by them

Observed from the Phase 2 settlement path alongside the existing pre-charge actual-bytes metric, gated on non-write requests.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>

…g precharge

Replace "Phase 1" / "Phase 2" in comments and test names with the existing API terms (BeforeKVRequest pre-charge / AfterKVRequest settle). The Phase 1/2 framing was introduced only in this branch's own commits and is not part of any preexisting controller convention; staying with the BeforeKVRequest/AfterKVRequest vocabulary keeps the prose readable to anyone already familiar with ResourceCalculator.

No behavior change; only comments, test variable names, and metric Help strings.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>

ti-chi-bot · 2026-04-21T07:20:14Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456, multiple issues should use full syntax for each issue and be separated by a comma, like: Issue Number: close #123, ref #456.

📖 For more info, you can check the "Linking issues" section in CONTRIBUTING.md.

JmPotato and others added 4 commits April 2, 2026 10:18

resource_control: add unit test for paging bytes pre-charge

b98ff48


ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 14, 2026

ti-chi-bot Bot added dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 14, 2026

ti-chi-bot Bot added contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Apr 14, 2026

ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 14, 2026

ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 14, 2026

YuhaoZhang00 mentioned this pull request Apr 14, 2026

store/copr: refine RC paging pre-charge with per-scan EMA pingcap/tidb#67759

Draft

12 tasks

YuhaoZhang00 added 3 commits April 14, 2026 18:09

ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 20, 2026

ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 20, 2026

YuhaoZhang00 wants to merge 10 commits into tikv:master from YuhaoZhang00:demo/ema-precharge
