
[improvement](executor) use real elapsed time to compute workload group metrics refresh interval#63537

Open
bosswnx wants to merge 2 commits into apache:master from bosswnx:master


@bosswnx bosswnx commented May 22, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

The original implementation of WorkloadGroupMetrics::refresh_metrics() uses config::workload_group_metrics_interval_ms / 1000 as a fixed
divisor to compute per-second CPU and scan IO rates. This is inaccurate when:

  1. The refresh thread is delayed due to system load or scheduling jitter
  2. The configured interval is changed at runtime

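In both cases, the counter delta accumulated over the actual interval is divided by the configured interval, so the reported per-second
CPU/IO rates are off by the ratio between the two.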
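
To make the divergence concrete, here is a minimal sketch rather than the Doris code. Both helper names are hypothetical, and the 5000 ms configured interval and 10 s actual gap used in the note below are assumed values chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: per-second rate computed with a fixed divisor taken
// from the configured interval, as the problem summary describes.
// Assumes the configured interval is at least one second.
int64_t rate_with_fixed_interval(int64_t delta, int64_t configured_interval_ms) {
    assert(configured_interval_ms >= 1000);
    return delta / (configured_interval_ms / 1000);
}

// Hypothetical helper: per-second rate computed from the time that really
// passed between two refreshes.
int64_t rate_with_elapsed(int64_t delta, int64_t elapsed_ms) {
    assert(elapsed_ms > 0);
    return delta * 1000 / elapsed_ms;
}
```

With 20000000000 ns of CPU time consumed between two refreshes, a configured interval of 5000 ms, and a refresh thread that actually woke up after 10000 ms, `rate_with_fixed_interval` yields 4000000000 ns/s while `rate_with_elapsed` yields 2000000000 ns/s. The fixed divisor overstates the rate by a factor of two.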

This PR replaces the fixed config-based interval with the actual monotonic time delta between two consecutive refreshes, so the rates stay
accurate regardless of thread scheduling delays or runtime config changes. It also adds a division-by-zero guard for sub-second refresh
intervals and corresponding unit tests.
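
The approach can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `RateTracker` is a hypothetical stand-in for the bookkeeping inside `WorkloadGroupMetrics`, it uses `std::chrono::steady_clock` as the monotonic clock, and it measures the delta in milliseconds so that sub-second intervals still yield a rate. The actual patch may guard the sub-second case differently. The sketch requires C++17 for `std::optional`.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the per-counter rate bookkeeping.
class RateTracker {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    // Returns the per-second rate since the last accepted refresh, or
    // std::nullopt when no rate can be computed (first refresh, or no
    // measurable time has elapsed).
    std::optional<int64_t> refresh(int64_t counter_now,
                                   Clock::time_point now = Clock::now()) {
        if (!_has_last) {
            // First refresh: only record the baseline.
            _last_counter = counter_now;
            _last_time = now;
            _has_last = true;
            return std::nullopt;
        }
        const int64_t elapsed_ms =
                std::chrono::duration_cast<std::chrono::milliseconds>(now - _last_time)
                        .count();
        if (elapsed_ms <= 0) {
            // Guard against division by zero; keep the old baseline so the
            // delta is carried over to the next refresh.
            return std::nullopt;
        }
        const int64_t rate = (counter_now - _last_counter) * 1000 / elapsed_ms;
        _last_counter = counter_now;
        _last_time = now;
        return rate;
    }

private:
    bool _has_last = false;
    int64_t _last_counter = 0;
    Clock::time_point _last_time {};
};
```

Passing the timestamp as a parameter with a default of `Clock::now()` is a design choice made here so that tests can inject fixed time points instead of sleeping.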

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

bosswnx added 2 commits May 22, 2026 16:39
…up metrics refresh interval

  Replace the fixed config-based interval with the actual monotonic time delta
  between two refreshes when calculating per-second CPU and scan IO rates in
  WorkloadGroupMetrics, so the rates stay accurate even when the refresh thread
  is delayed or the configured interval is changed at runtime.

  Also add a guard against division by zero when two refreshes happen within
  less than one second, and add unit tests covering:
  - Real elapsed time rate computation
  - Sub-second interval safety (no division by zero)
  - Proportional rate vs interval relationship
  - Memory metrics correctness
  - First-refresh boundary behavior
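
A few of the properties listed above can be expressed as plain assertions. The sketch below is not the unit test added by this PR: `per_second_rate` is a hypothetical pure helper standing in for the rate computation, and the memory-metrics and first-refresh cases are left out because they depend on details of the real class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper standing in for the rate computation under test.
// Returns 0 when no time has elapsed, so it never divides by zero.
int64_t per_second_rate(int64_t delta, int64_t elapsed_ms) {
    if (elapsed_ms <= 0) {
        return 0;
    }
    return delta * 1000 / elapsed_ms;
}

// Assert-based checks mirroring three of the listed test cases.
void check_rate_properties() {
    // Real elapsed time: 6000 units over 3 s is 2000 units per second.
    assert(per_second_rate(6000, 3000) == 2000);
    // Sub-second interval: 500 units over 250 ms extrapolates to 2000 per second.
    assert(per_second_rate(500, 250) == 2000);
    // Zero elapsed time must not divide by zero.
    assert(per_second_rate(500, 0) == 0);
    // Proportionality: doubling the interval for the same delta halves the rate.
    assert(per_second_rate(6000, 6000) * 2 == per_second_rate(6000, 3000));
}
```
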
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bosswnx
Author

bosswnx commented May 22, 2026

/review


@bosswnx
Author

bosswnx commented May 22, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31508 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d693096930c953e3230c9e5097a436cb8109e56e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17636	4032	4195	4032
q2	
q3	10796	1406	802	802
q4	4679	473	339	339
q5	7558	2271	2089	2089
q6	229	174	142	142
q7	994	771	643	643
q8	9458	1814	1628	1628
q9	5169	4940	4919	4919
q10	6405	2081	1782	1782
q11	432	280	259	259
q12	627	425	307	307
q13	18115	3391	2768	2768
q14	266	265	235	235
q15	
q16	822	785	710	710
q17	973	981	906	906
q18	6840	5657	5589	5589
q19	1314	1286	1028	1028
q20	596	502	293	293
q21	6233	2889	2704	2704
q22	468	578	333	333
Total cold run time: 99610 ms
Total hot run time: 31508 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4738	4537	4660	4537
q2	
q3	4864	5297	4655	4655
q4	2129	2208	1432	1432
q5	5000	4702	4610	4610
q6	237	178	136	136
q7	1822	1786	1545	1545
q8	2467	2125	2260	2125
q9	7743	7282	7236	7236
q10	4499	4373	3997	3997
q11	540	400	381	381
q12	712	715	510	510
q13	2987	3353	2812	2812
q14	270	280	256	256
q15	
q16	676	718	630	630
q17	1267	1253	1247	1247
q18	7405	7004	6997	6997
q19	1142	1093	1091	1091
q20	2214	2194	1948	1948
q21	5344	4769	4571	4571
q22	530	479	424	424
Total cold run time: 56586 ms
Total hot run time: 51140 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169734 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d693096930c953e3230c9e5097a436cb8109e56e, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4322	681	530	530
query6	318	224	195	195
query7	4226	570	308	308
query8	318	237	236	236
query9	8813	4052	4000	4000
query10	457	336	302	302
query11	5794	2386	2203	2203
query12	182	130	124	124
query13	1260	644	420	420
query14	6052	5420	5129	5129
query14_1	4418	4380	4431	4380
query15	217	210	187	187
query16	989	445	411	411
query17	952	726	582	582
query18	2463	486	352	352
query19	213	206	161	161
query20	134	135	126	126
query21	217	141	122	122
query22	13635	13614	13431	13431
query23	17174	16444	16000	16000
query23_1	16064	16108	16136	16108
query24	7422	1822	1326	1326
query24_1	1320	1288	1334	1288
query25	598	507	453	453
query26	1341	356	182	182
query27	2657	572	341	341
query28	4525	1989	1948	1948
query29	1006	654	529	529
query30	309	251	203	203
query31	1114	1081	947	947
query32	94	78	77	77
query33	555	370	309	309
query34	1191	1160	647	647
query35	776	794	689	689
query36	1287	1323	1168	1168
query37	160	121	97	97
query38	3275	3186	3136	3136
query39	935	910	915	910
query39_1	864	867	882	867
query40	235	151	132	132
query41	76	71	70	70
query42	120	114	111	111
query43	343	347	308	308
query44	
query45	216	205	192	192
query46	1105	1188	761	761
query47	2236	2252	2108	2108
query48	410	429	319	319
query49	663	514	394	394
query50	978	353	254	254
query51	4319	4298	4203	4203
query52	110	108	98	98
query53	268	294	215	215
query54	339	294	272	272
query55	97	95	89	89
query56	322	323	322	322
query57	1396	1381	1270	1270
query58	306	289	287	287
query59	1633	1673	1447	1447
query60	339	366	310	310
query61	156	158	156	156
query62	669	623	551	551
query63	245	202	208	202
query64	2390	801	653	653
query65	
query66	1714	477	361	361
query67	29991	30062	29933	29933
query68	
query69	471	341	320	320
query70	1113	1011	1038	1011
query71	313	279	274	274
query72	3001	2715	2488	2488
query73	851	782	420	420
query74	5100	4964	4770	4770
query75	2672	2614	2273	2273
query76	2273	1147	798	798
query77	420	425	343	343
query78	12282	12114	11593	11593
query79	1450	1079	737	737
query80	948	541	469	469
query81	505	283	247	247
query82	1329	158	126	126
query83	364	282	260	260
query84	317	149	114	114
query85	927	540	454	454
query86	438	350	336	336
query87	3481	3388	3194	3194
query88	3599	2692	2658	2658
query89	448	383	333	333
query90	1760	189	191	189
query91	180	171	140	140
query92	84	81	76	76
query93	1474	1368	878	878
query94	611	358	330	330
query95	677	397	345	345
query96	1090	825	344	344
query97	2699	2700	2577	2577
query98	238	248	249	248
query99	1106	1105	975	975
Total cold run time: 253385 ms
Total hot run time: 169734 ms
