server/resource_group: add allocation observability #10605

Draft
okJiang wants to merge 1 commit into tikv:master from okJiang:codex/rc-server-observability-latest

okJiang (Member) commented Apr 20, 2026

What problem does this PR solve?

Issue Number: ref #10488

Resource control observability on the server side does not clearly show why tokens are granted slowly, whether service_limit is involved, or whether server-side metrics stay correct across cleanup and edge cases.

What is changed and how does it work?

This PR adds server-side allocation metrics and fixes several correctness gaps so the controller and server can be diagnosed together.

server/resource_group: add allocation observability

It includes:

  • server metrics for granted tokens, slot counts, slot events, token loan, trickle duration, RU config, and throttling causes
  • explicit cause attribution for service_limit versus group_fill_rate_or_burst
  • keyspace-name fallback and cleanup fixes so metrics stay continuous
  • tests for immediate grant, trickle grant, slot-event accounting, and metric cleanup
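
The cause attribution and metric cleanup listed above can be illustrated with a minimal, standard-library-only sketch. It is not the PR's code: `grantInfo`, `attributeCause`, `causeCounter`, and `deleteGroup` are hypothetical names, and the decision rule (blame whichever limit is tighter) is an assumption made for illustration. Only the two cause strings come from the description.

```go
package main

import "fmt"

// Cause label values as named in the PR description.
const (
	causeServiceLimit         = "service_limit"
	causeGroupFillRateOrBurst = "group_fill_rate_or_burst"
)

// grantInfo is a hypothetical summary of a single token request.
type grantInfo struct {
	requested        float64 // tokens the client asked for
	groupAvailable   float64 // tokens the group's own bucket can grant now
	serviceAvailable float64 // tokens left under the shared limit; negative means no limit
}

// attributeCause returns the throttling cause, or "" when the request
// can be granted in full. Assumed rule: the tighter limit takes the blame.
func attributeCause(g grantInfo) string {
	limited := g.serviceAvailable >= 0
	granted := g.groupAvailable
	if limited && g.serviceAvailable < granted {
		granted = g.serviceAvailable
	}
	if granted >= g.requested {
		return ""
	}
	if limited && g.serviceAvailable < g.groupAvailable {
		return causeServiceLimit
	}
	return causeGroupFillRateOrBurst
}

// seriesKey stands in for a metric's label set (group name plus cause).
type seriesKey struct{ group, cause string }

// causeCounter is a toy replacement for a labeled counter metric.
type causeCounter map[seriesKey]int

// observe increments the counter only when the request is throttled.
func (c causeCounter) observe(group string, g grantInfo) {
	if cause := attributeCause(g); cause != "" {
		c[seriesKey{group, cause}]++
	}
}

// deleteGroup drops every series of a removed group so that no stale
// label values linger after cleanup.
func (c causeCounter) deleteGroup(group string) {
	for k := range c {
		if k.group == group {
			delete(c, k)
		}
	}
}

func main() {
	c := causeCounter{}
	c.observe("rg1", grantInfo{requested: 100, groupAvailable: 500, serviceAvailable: 50})
	c.observe("rg2", grantInfo{requested: 100, groupAvailable: 20, serviceAvailable: -1})
	fmt.Println(len(c), "series before cleanup")
	c.deleteGroup("rg1")
	fmt.Println(len(c), "series after cleanup")
}
```

Keeping the attribution in a pure function makes both manual-test scenarios (shared service_limit and low fill rate) easy to cover with table-driven unit tests, independent of any metrics library.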

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Manual test:

  • Verified in a local tiup playground setup with 3 PD, 2 TiDB, and 3 TiKV.
  • Ran a shared-service_limit scenario with two resource groups and confirmed Grafana shows service_limit in throttling causes.
  • Ran a low-fill-rate scenario and confirmed Grafana shows group_fill_rate_or_burst without false service_limit signals.

Code changes

Side effects

  • Increased code complexity

Related changes

Release note

Add server-side resource control metrics for allocation state, token trickle, and throttling causes.

ti-chi-bot Bot (Contributor) commented Apr 20, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot Bot added the following labels on Apr 20, 2026:

  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • dco-signoff: yes: Indicates the PR's author has signed the dco.

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d34757f-3022-4ca3-9a27-efb522af420b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

ti-chi-bot Bot (Contributor) commented Apr 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andremouche for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot Bot added the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) on Apr 20, 2026

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 93.23308% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.98%. Comparing base (b21a183) to head (b65de01).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10605      +/-   ##
==========================================
+ Coverage   78.96%   78.98%   +0.01%     
==========================================
  Files         532      532              
  Lines       71883    72083     +200     
==========================================
+ Hits        56766    56937     +171     
- Misses      11093    11116      +23     
- Partials     4024     4030       +6     
Flag Coverage Δ
unittests 78.98% <93.23%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
