Add per-thread CPU stats reporting in JSON output #346
fcostaoliveira wants to merge 5 commits into master
Report per-second, per-thread CPU usage in JSON output using native OS APIs (Mach thread_info on macOS, pthread_getcpuclockid on Linux). Fix unsigned underflow when sampling finished threads. Add psutil-based external validation test and enforce no thread reports 100% CPU.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
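The native-API sampling and the underflow fix described in this commit can be sketched roughly as follows. This is a sketch, not the PR's get_thread_cpu_usec(). The names thread_cpu_usec and cpu_delta_usec are hypothetical. The code assumes that returning 0 on a failed query is acceptable, and it clamps deltas so that a reading lower than the previous one cannot wrap around to a huge unsigned value.

```cpp
#include <pthread.h>
#include <time.h>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

// Hypothetical helper: CPU time (user + system) consumed so far by `thread`,
// in microseconds, or 0 if the OS query fails.
static unsigned long long thread_cpu_usec(pthread_t thread) {
#ifdef __APPLE__
    // macOS: map the pthread to its Mach port and ask for basic thread info.
    mach_port_t port = pthread_mach_thread_np(thread);
    thread_basic_info_data_t info;
    mach_msg_type_number_t count = THREAD_BASIC_INFO_COUNT;
    if (thread_info(port, THREAD_BASIC_INFO, (thread_info_t)&info, &count) != KERN_SUCCESS)
        return 0;
    return (unsigned long long)info.user_time.seconds * 1000000ULL + info.user_time.microseconds +
           (unsigned long long)info.system_time.seconds * 1000000ULL + info.system_time.microseconds;
#else
    // Linux/POSIX: read the thread's CPU-time clock.
    clockid_t cid;
    if (pthread_getcpuclockid(thread, &cid) != 0)
        return 0;
    struct timespec ts;
    if (clock_gettime(cid, &ts) != 0)
        return 0;
    return (unsigned long long)ts.tv_sec * 1000000ULL + (unsigned long long)ts.tv_nsec / 1000ULL;
#endif
}

// Unsigned subtraction wraps around when cur < prev (for example when a query
// for a finished thread returns 0), so clamp the delta to zero instead.
static unsigned long long cpu_delta_usec(unsigned long long prev, unsigned long long cur) {
    return cur > prev ? cur - prev : 0;
}
```

Without the clamp, computing 0 - prev on unsigned long long yields a value near 2^64. That value would show up as an absurd CPU percentage.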
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR.
Security scan by Jit
Use gettimeofday to measure real elapsed time between samples instead of assuming exactly 1 second. The sleep(1) plus loop processing meant the actual interval always exceeded 1s, systematically inflating CPU percentages and potentially exceeding 100%.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
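Here is a concrete case of the inflation. Suppose a worker is busy for the whole of a 1.05 s interval, so it consumes about 1.05 s of CPU. Dividing by an assumed 1 s reports 105%, while dividing by the measured 1.05 s reports 100%. The sketch below shows that calculation with two gettimeofday() readings. The helper names elapsed_usec and cpu_percent are hypothetical.

```cpp
#include <sys/time.h>

// Hypothetical helper: length of a sampling interval in microseconds,
// computed from two gettimeofday() readings.
static double elapsed_usec(const struct timeval &prev, const struct timeval &cur) {
    return (double)(cur.tv_sec - prev.tv_sec) * 1000000.0
         + (double)(cur.tv_usec - prev.tv_usec);
}

// Share of one core used over the interval: CPU time consumed divided by the
// measured wall time (both in microseconds), times 100.
// The caller must ensure wall_usec > 0.
static double cpu_percent(double cpu_usec, double wall_usec) {
    return 100.0 * cpu_usec / wall_usec;
}
```

The tv_usec difference can be negative, as it is when going from 10.9 s to 12.1 s. Combining both fields as doubles still yields the correct 1.2 s.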
env.assertTrue('Thread 0' in second_data)
thread_cpu = second_data['Thread 0']
env.assertTrue(thread_cpu >= 0)
env.assertTrue(thread_cpu < 100)
CPU percentage rounding can break strict less-than-100 assertions
Medium Severity
The C++ code writes CPU percentages to JSON using "%.2f" formatting, which rounds values like 99.995% up to 100.00. The tests assert thread_cpu < 100, which fails when parsing 100.00 from JSON. The high-load test (500 clients, 1 thread, pipeline 100) is specifically designed to drive CPU near 100%, making this rounding-induced failure realistic on fast multi-core systems. Additionally, the wall time is measured with gettimeofday (non-monotonic), so an NTP clock adjustment could produce cpu_pct > 100% directly.
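The rounding the review describes can be reproduced directly, as in the sketch below. The sketch also shows one hypothetical mitigation: truncate to two decimals before formatting, so a value below 100 is never printed as "100.00". Relaxing the test to thread_cpu <= 100 is another option. Neither choice is confirmed by the PR. Both function names are illustrative.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// printf-style "%.2f" rounds to nearest, so 99.996 is printed as "100.00".
static std::string format_pct_rounded(double pct) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.2f", pct);
    return buf;
}

// Hypothetical alternative: truncate toward zero at two decimals before
// formatting. Floating-point error can occasionally under-report by 0.01.
static std::string format_pct_truncated(double pct) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.2f", std::floor(pct * 100.0) / 100.0);
    return buf;
}
```

Truncation fixes only the rounding path. A genuinely inflated measurement above 100% (for example, from the clock issue the review mentions) still needs to be handled where the interval is measured.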
Compare total process CPU (all threads) from both sources instead of trying to isolate worker threads. psutil sees internal threads (libevent, I/O) that memtier doesn't report in JSON, which inflated the external "worker" sum and caused the cross-validation to fail. Also widen tolerance to 25pp to account for untracked internal threads.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
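The sketch below shows why the two sources can differ. POSIX's process-wide CPU clock counts every thread in the process, just as psutil's process CPU figures do, including helper threads that the JSON report does not list. A sum over only the reported threads therefore undercounts. The names process_cpu_usec and totals_agree are hypothetical. The tolerance check only mirrors the 25pp comparison described above; it is not the test's actual code, which is in Python.

```cpp
#include <cmath>
#include <time.h>

// CPU time consumed by all threads in the process, in microseconds
// (CLOCK_PROCESS_CPUTIME_ID is POSIX); returns 0 if the clock is unavailable.
static unsigned long long process_cpu_usec() {
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return 0;
    return (unsigned long long)ts.tv_sec * 1000000ULL + (unsigned long long)ts.tv_nsec / 1000ULL;
}

// Hypothetical cross-check: two totals, in percentage points of one core,
// agree if they differ by at most tol_pp.
static bool totals_agree(double reported_pct, double external_pct, double tol_pp) {
    return std::fabs(reported_pct - external_pct) <= tol_pp;
}
```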
addTLSArgs(benchmark_specs, env)
config = get_default_memtier_config(threads=2, clients=5, requests=1000)
master_nodes_list = env.getMasterNodesList()
overall_expected_request_count = get_expected_request_count(config)
gettimeofday(&cpu_cur_tv, NULL);
double wall_usec = (double)(cpu_cur_tv.tv_sec - cpu_prev_tv.tv_sec) * 1000000.0
    + (double)(cpu_cur_tv.tv_usec - cpu_prev_tv.tv_usec);
if (wall_usec < 1.0) wall_usec = 1.0; // guard against division by zero
Wall clock adjustment causes incorrect CPU percentages
Low Severity
If the system clock is stepped backwards (by NTP or a manual change) during benchmark execution, wall_usec becomes negative. The guard if (wall_usec < 1.0) wall_usec = 1.0 clamps it to 1 microsecond instead of handling the error condition properly. CPU deltas divided by 1.0 then produce wildly inflated percentages (potentially millions of percent) in the JSON output.
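One way to address this, sketched below, swaps gettimeofday for std::chrono::steady_clock. That clock is guaranteed never to go backwards; in C code, clock_gettime(CLOCK_MONOTONIC) plays the same role. When the interval is not positive, the sketch skips the sample instead of clamping. The name interval_usec is hypothetical, and the assumption is that the caller simply emits no CPU data for a skipped second.

```cpp
#include <chrono>
#include <optional>

using sample_clock = std::chrono::steady_clock;  // monotonic, unaffected by clock steps

// Hypothetical helper: interval length in microseconds, or nullopt when no
// usable interval exists (the caller skips this sample rather than clamping).
static std::optional<double> interval_usec(sample_clock::time_point prev,
                                           sample_clock::time_point cur) {
    double usec = std::chrono::duration<double, std::micro>(cur - prev).count();
    if (usec <= 0.0)
        return std::nullopt;
    return usec;
}
```

With a steady clock, a zero interval can still occur if two readings land on the same tick. A negative interval, however, can arise only if the arguments are passed in the wrong order.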
unsigned long long main_prev_cpu = get_thread_cpu_usec(pthread_self());
std::vector<unsigned long long> thread_prev_cpu(threads.size());
for (size_t t = 0; t < threads.size(); t++) {
    thread_prev_cpu[t] = get_thread_cpu_usec(threads[t]->m_thread);
Uninitialized pthread handle used for CPU measurement
High Severity
The code reads threads[t]->m_thread immediately after calling start() without verifying that thread creation succeeded. If pthread_create fails, m_thread remains uninitialized and passing it to get_thread_cpu_usec causes undefined behavior (potential crash when calling pthread_mach_thread_np on macOS or pthread_getcpuclockid on Linux with garbage pthread_t values).
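One way to rule this out is to store the handle only when pthread_create succeeds, and to forget it once the thread is joined. The sampler then has nothing to pass for a thread that never started. The tracked_thread wrapper below is a hypothetical illustration of that idea. It does not reflect memtier's actual thread class.

```cpp
#include <optional>
#include <pthread.h>

// Hypothetical wrapper: the handle exists only while a successfully created,
// not-yet-joined thread is running, so samplers never read a garbage pthread_t.
struct tracked_thread {
    std::optional<pthread_t> handle;

    bool start(void *(*fn)(void *), void *arg) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, fn, arg) != 0) {
            handle.reset();  // creation failed: nothing valid to sample
            return false;
        }
        handle = tid;
        return true;
    }

    // Joins and forgets the handle; a joined pthread_t must not be passed to
    // pthread_getcpuclockid or pthread_mach_thread_np afterwards.
    bool join() {
        if (!handle)
            return false;
        int rc = pthread_join(*handle, nullptr);
        handle.reset();
        return rc == 0;
    }
};
```

In the sampling loop, a thread whose handle is empty would simply be skipped, for example with `if (!t.handle) continue;`.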
Only call debugPrintMemtierOnError when the benchmark actually fails, preventing FileNotFoundError from missing Redis log files on success.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
table.add_column(column);
}

void run_stats::set_cpu_stats(std::vector<per_second_cpu_stats> cpu_stats)
CPU stats missing from aggregated results
Low Severity
When run_count > 1, average.aggregate_average(all_stats) computes aggregated output, but aggregate_average() never propagates/merges m_cpu_stats, so “AGGREGATED AVERAGE RESULTS” JSON omits CPU Stats even though individual runs may include them.
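The missing merge step could look roughly like the sketch below, which averages CPU stats across runs second by second. The cpu_second record is hypothetical; the real per_second_cpu_stats layout in run_stats_types.h may differ. The sketch also assumes that a second or worker reached by only some runs should be averaged over just those runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-second record (not the real per_second_cpu_stats).
struct cpu_second {
    double main_thread_pct;
    std::vector<double> worker_pct;  // index = worker thread number
};

// Averages per-second CPU stats across runs, index by index.
static std::vector<cpu_second> average_cpu_stats(const std::vector<std::vector<cpu_second>> &runs) {
    std::size_t max_secs = 0;
    for (const auto &run : runs)
        if (run.size() > max_secs) max_secs = run.size();

    std::vector<cpu_second> out(max_secs);
    for (std::size_t s = 0; s < max_secs; ++s) {
        double main_sum = 0.0;
        std::size_t main_n = 0;
        std::vector<double> worker_sum;
        std::vector<std::size_t> worker_n;
        for (const auto &run : runs) {
            if (s >= run.size()) continue;  // this run ended earlier
            const cpu_second &c = run[s];
            main_sum += c.main_thread_pct;
            ++main_n;
            if (c.worker_pct.size() > worker_sum.size()) {
                worker_sum.resize(c.worker_pct.size(), 0.0);
                worker_n.resize(c.worker_pct.size(), 0);
            }
            for (std::size_t t = 0; t < c.worker_pct.size(); ++t) {
                worker_sum[t] += c.worker_pct[t];
                ++worker_n[t];
            }
        }
        // main_n >= 1 here, because at least one run reached second s.
        out[s].main_thread_pct = main_sum / main_n;
        out[s].worker_pct.resize(worker_sum.size());
        for (std::size_t t = 0; t < worker_sum.size(); ++t)
            out[s].worker_pct[t] = worker_sum[t] / worker_n[t];
    }
    return out;
}
```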
# Conflicts:
#   memtier_benchmark.cpp
}
}
cpu_prev_tv = cpu_cur_tv;
cpu_history.push_back(cpu_snap);
CPU stats break across thread restarts
Medium Severity
Per-thread CPU tracking keeps thread_prev_cpu[t] keyed only by thread index, but worker threads can be join()ed and restart()ed with a different pthread_t. After a restart, deltas are computed against the previous thread’s CPU clock (or 0 if calls fail), producing incorrect per-thread CPU% and potentially spurious/missed >95% warnings.
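A restart-aware baseline is one possible fix, sketched below. The baseline records which incarnation of the worker its previous reading came from. On a mismatch it starts a fresh baseline instead of computing a delta across two different threads' clocks. The generation counter is a hypothetical field that the caller would bump on every restart(). A counter is used rather than comparing pthread_t values, because thread IDs may be reused once a thread has been joined.

```cpp
#include <optional>

// Hypothetical per-slot baseline tied to one thread incarnation.
struct cpu_baseline {
    unsigned generation = 0;          // caller increments this on every restart()
    unsigned long long prev_usec = 0; // CPU time at the previous sample
    bool valid = false;
};

// Returns the CPU delta since the previous sample of the same incarnation, or
// nullopt when no meaningful delta exists (first sample, restarted thread, or
// a reading that went backwards); in those cases a new baseline is stored.
static std::optional<unsigned long long> sample_delta(cpu_baseline &b, unsigned generation,
                                                      unsigned long long cur_usec) {
    if (!b.valid || b.generation != generation || cur_usec < b.prev_usec) {
        b.generation = generation;
        b.prev_usec = cur_usec;
        b.valid = true;
        return std::nullopt;
    }
    unsigned long long delta = cur_usec - b.prev_usec;
    b.prev_usec = cur_usec;
    return delta;
}
```

Skipping the first interval after a restart loses at most one sample. An alternative is to treat a new thread's baseline as 0, since its CPU clock starts near zero.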


Summary
thread_info on macOS, pthread_getcpuclockid on Linux)

Details
A new CPU Stats section is added to the JSON output under ALL STATS, reporting per-second CPU percentages for the main thread and each worker thread. This helps users identify when memtier itself is the bottleneck rather than the server under test.

Files changed
- memtier_benchmark.cpp: get_thread_cpu_usec() helper + per-second CPU sampling in the benchmark loop
- run_stats.cpp / run_stats.h / run_stats_types.h: per_second_cpu_stats struct and JSON serialization
- tests/test_cpu_stats.py: Three integration tests (structure, high-load, external validation)
- tests/test_requirements.txt: Added psutil dependency for external validation test
- AGENTS.md / DEVELOPMENT.md: Documentation updates

Test plan
- test_cpu_stats_in_json: Verifies CPU Stats appear in JSON with correct per-thread structure
- test_cpu_stats_high_load: 500 clients on 1 thread; verifies non-trivial CPU and >95% warning
- test_cpu_stats_external_validation: Cross-validates against psutil measurements (±25pp tolerance)
- pthread_getcpuclockid path compiles

🤖 Generated with Claude Code
Note
Medium Risk
Adds per-second CPU sampling in the main benchmark loop and emits new JSON/stderr output; main risk is measurement overhead and potential platform-specific differences (macOS Mach vs Linux pthread clocks) affecting performance and tests.
Overview
Adds per-second, per-thread CPU utilization reporting to the benchmark run and exposes it in JSON output under ALL STATS → CPU Stats (including the main thread and each worker thread).

Implements OS-specific CPU time sampling (Mach on macOS, pthread_getcpuclockid on Linux), stores samples in run_stats, and prints a stderr warning when any worker exceeds 95% CPU.

Introduces new integration coverage in tests/test_cpu_stats.py (JSON structure checks, high-load warning/usage assertions, and optional psutil cross-validation) and documents the new testing/dependency requirements (adds psutil to tests/test_requirements.txt).

Written by Cursor Bugbot for commit bff2e9c.