
Add per-thread CPU stats reporting in JSON output #346

Open

fcostaoliveira wants to merge 5 commits into master from per-thread-cpu-stats

fcostaoliveira (Collaborator) commented Feb 24, 2026

Summary

  • Report per-second, per-thread CPU usage in JSON output using native OS APIs (Mach thread_info on macOS, pthread_getcpuclockid on Linux)
  • Emit a stderr warning when any worker thread exceeds 95% CPU, indicating results may be unreliable
  • Add integration tests: JSON structure validation, high-load stress test, and psutil-based external cross-validation
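
A minimal sketch of what a get_thread_cpu_usec() helper built on these two APIs could look like. It assumes the helper returns cumulative user + system CPU time in microseconds and returns 0 when the OS query fails; the PR's actual helper may differ in signature and error handling.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>
#if defined(__APPLE__)
#include <mach/mach.h>
#endif

// Cumulative CPU time (user + system) of `thread` in microseconds,
// or 0 if the query fails (e.g. the thread has already exited).
unsigned long long get_thread_cpu_usec(pthread_t thread) {
#if defined(__APPLE__)
    // macOS: map the pthread to its Mach port and ask the kernel.
    mach_port_t port = pthread_mach_thread_np(thread);
    thread_basic_info_data_t info;
    mach_msg_type_number_t count = THREAD_BASIC_INFO_COUNT;
    if (thread_info(port, THREAD_BASIC_INFO, (thread_info_t)&info, &count) != KERN_SUCCESS)
        return 0;
    return (unsigned long long)info.user_time.seconds * 1000000ULL + info.user_time.microseconds
         + (unsigned long long)info.system_time.seconds * 1000000ULL + info.system_time.microseconds;
#else
    // Linux: every thread has its own CPU-time clock readable via clock_gettime.
    clockid_t cid;
    if (pthread_getcpuclockid(thread, &cid) != 0)
        return 0;
    struct timespec ts;
    if (clock_gettime(cid, &ts) != 0)
        return 0;
    return (unsigned long long)ts.tv_sec * 1000000ULL + (unsigned long long)ts.tv_nsec / 1000ULL;
#endif
}
```

Sampling the same thread twice and subtracting gives the CPU time it used in between.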

Details

A new CPU Stats section is added to the JSON output under ALL STATS, reporting per-second CPU percentages for the main thread and each worker thread. This helps users identify when memtier itself is the bottleneck rather than the server under test.

Files changed

  • memtier_benchmark.cpp: get_thread_cpu_usec() helper + per-second CPU sampling in the benchmark loop
  • run_stats.cpp / run_stats.h / run_stats_types.h: per_second_cpu_stats struct and JSON serialization
  • tests/test_cpu_stats.py: Three integration tests (structure, high-load, external validation)
  • tests/test_requirements.txt: Added psutil dependency for external validation test
  • AGENTS.md / DEVELOPMENT.md: Documentation updates
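
To make the JSON shape concrete, here is a hypothetical per_second_cpu_stats layout and serializer. The field names, the per-second keys and the "Main Thread" label are assumptions for illustration; only the "Thread 0" key style is taken from the test snippet in this PR.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical layout: one snapshot per elapsed second.
struct per_second_cpu_stats {
    unsigned int second;             // seconds since the run started
    double main_thread_pct;          // CPU% of the main (sampling) thread
    std::vector<double> thread_pct;  // CPU% of each worker, by index
};

// Serializes the history as a JSON object keyed by second, e.g.
// {"1": {"Main Thread": 0.50, "Thread 0": 42.10}}
std::string cpu_stats_to_json(const std::vector<per_second_cpu_stats>& history) {
    std::string out = "{";
    char buf[64];
    for (size_t i = 0; i < history.size(); i++) {
        const per_second_cpu_stats& s = history[i];
        snprintf(buf, sizeof(buf), "%s\"%u\": {", i ? ", " : "", s.second);
        out += buf;
        snprintf(buf, sizeof(buf), "\"Main Thread\": %.2f", s.main_thread_pct);
        out += buf;
        for (size_t t = 0; t < s.thread_pct.size(); t++) {
            snprintf(buf, sizeof(buf), ", \"Thread %zu\": %.2f", t, s.thread_pct[t]);
            out += buf;
        }
        out += "}";
    }
    out += "}";
    return out;
}
```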

Test plan

  • test_cpu_stats_in_json: Verifies CPU Stats appear in JSON with correct per-thread structure
  • test_cpu_stats_high_load: 500 clients on 1 thread; verifies non-trivial CPU and >95% warning
  • test_cpu_stats_external_validation: Cross-validates total process CPU against psutil measurements (tolerance ±15pp, widened to ±25pp in a later commit)
  • Build on Linux to verify pthread_getcpuclockid path compiles

🤖 Generated with Claude Code


Note

Medium Risk
Adds per-second CPU sampling in the main benchmark loop and emits new JSON/stderr output; main risk is measurement overhead and potential platform-specific differences (macOS Mach vs Linux pthread clocks) affecting performance and tests.

Overview
Adds per-second, per-thread CPU utilization reporting to the benchmark run and exposes it in JSON output under ALL STATS → CPU Stats (including main thread and each worker thread).

Implements OS-specific CPU time sampling (Mach on macOS, pthread_getcpuclockid on Linux), stores samples in run_stats, and prints a stderr warning when any worker exceeds 95% CPU.
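
A sketch of the >95% check, assuming it runs once per sampled second over the worker percentages. The function name and the warning wording are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Threshold from the PR description: above this, memtier itself may be the bottleneck.
static const double CPU_WARN_THRESHOLD_PCT = 95.0;

// Prints one warning to `out` (stderr in practice) per worker above the
// threshold and returns how many workers were flagged.
int warn_on_saturated_workers(const std::vector<double>& worker_pct, unsigned int second, FILE* out) {
    int flagged = 0;
    for (size_t t = 0; t < worker_pct.size(); t++) {
        if (worker_pct[t] > CPU_WARN_THRESHOLD_PCT) {
            fprintf(out, "WARNING: thread %zu used %.1f%% CPU at second %u; "
                         "results may be limited by memtier, not the server.\n",
                    t, worker_pct[t], second);
            flagged++;
        }
    }
    return flagged;
}
```

A value of exactly 95% is not flagged, matching "exceeds 95%".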

Introduces new integration coverage in tests/test_cpu_stats.py (JSON structure checks, high-load warning/usage assertions, and optional psutil cross-validation) and documents the new testing/dependency requirements (adds psutil to tests/test_requirements.txt).

Written by Cursor Bugbot for commit bff2e9c.

Report per-second, per-thread CPU usage in JSON output using native OS
APIs (Mach thread_info on macOS, pthread_getcpuclockid on Linux). Fix
unsigned underflow when sampling finished threads. Add psutil-based
external validation test and enforce no thread reports 100% CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
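
The underflow named in this commit comes from CPU counters being unsigned: if a finished thread's query returns 0, `cur - prev` wraps to a value near 2^64. A minimal guard, using a hypothetical helper name:

```cpp
#include <cassert>

// Unsigned subtraction wraps on underflow, so treat any decrease in the
// counter (e.g. a finished thread now reporting 0) as "no CPU consumed".
unsigned long long cpu_delta_usec(unsigned long long prev, unsigned long long cur) {
    return cur >= prev ? cur - prev : 0;
}
```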

jit-ci bot commented Feb 24, 2026

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR


Security scan by Jit

Comment thread memtier_benchmark.cpp (outdated)
Use gettimeofday to measure real elapsed time between samples instead
of assuming exactly 1 second. The sleep(1) plus loop processing meant
the actual interval always exceeded 1s, systematically inflating CPU
percentages and potentially exceeding 100%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
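
The arithmetic behind this fix, sketched with hypothetical helper names: CPU% is CPU time consumed divided by wall time actually elapsed. If a fully busy thread runs for a 1.05 s interval, dividing by an assumed 1,000,000 µs reports 105%, while dividing by the measured interval reports 100%.

```cpp
#include <cassert>
#include <sys/time.h>

// Wall time between two gettimeofday() samples, in microseconds.
double elapsed_usec(const struct timeval& prev, const struct timeval& cur) {
    return (double)(cur.tv_sec - prev.tv_sec) * 1000000.0
         + (double)(cur.tv_usec - prev.tv_usec);
}

// CPU% over an interval: CPU time consumed / wall time elapsed.
double cpu_pct(unsigned long long cpu_delta_usec, double wall_usec) {
    assert(wall_usec > 0.0);  // caller must supply a positive interval
    return 100.0 * (double)cpu_delta_usec / wall_usec;
}
```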
Comment thread tests/test_cpu_stats.py
env.assertTrue('Thread 0' in second_data)
thread_cpu = second_data['Thread 0']
env.assertTrue(thread_cpu >= 0)
env.assertTrue(thread_cpu < 100)

CPU percentage rounding can break strict less-than-100 assertions

Medium Severity

The C++ code writes CPU percentages to JSON using "%.2f" formatting, which rounds values just below 100 (for example 99.996%) up to 100.00. The tests assert thread_cpu < 100, which fails when 100.00 is parsed from the JSON. The high-load test (500 clients, 1 thread, pipeline 100) is specifically designed to drive CPU near 100%, making this rounding-induced failure realistic on fast multi-core systems. Additionally, wall time is measured with gettimeofday, which is not monotonic, so an NTP clock adjustment could produce cpu_pct > 100% directly.
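
One possible fix, sketched with a hypothetical format_cpu_pct() helper: clamp to [0, 100] and truncate to two decimals before printing, so a thread at 99.996% serializes as "99.99" rather than "100.00". A genuinely saturated thread still prints "100.00", so the test would also need `<= 100` if that case is legitimate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// "%.2f" rounds to nearest, so 99.996 would print as "100.00".
// Clamping to [0, 100] and truncating to two decimals first keeps a
// not-quite-saturated thread below 100 in the output.
std::string format_cpu_pct(double pct) {
    if (pct < 0.0) pct = 0.0;
    if (pct > 100.0) pct = 100.0;
    double truncated = std::floor(pct * 100.0) / 100.0;
    char buf[16];
    snprintf(buf, sizeof(buf), "%.2f", truncated);
    return std::string(buf);
}
```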

Compare total process CPU (all threads) from both sources instead of
trying to isolate worker threads. psutil sees internal threads (libevent,
I/O) that memtier doesn't report in JSON, which inflated the external
"worker" sum and caused the cross-validation to fail. Also widen
tolerance to 25pp to account for untracked internal threads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
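
The same total-process comparison can be sketched in C++. CLOCK_PROCESS_CPUTIME_ID counts CPU time for every thread in the process, including internal threads memtier doesn't list in JSON, which is the same scope as psutil's per-process figure. The helper names are hypothetical; the 25 pp tolerance comes from the commit message.

```cpp
#include <cassert>
#include <cmath>
#include <time.h>

// Total CPU time of the whole process (all threads), in microseconds.
unsigned long long process_cpu_usec() {
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return 0;
    return (unsigned long long)ts.tv_sec * 1000000ULL + (unsigned long long)ts.tv_nsec / 1000ULL;
}

// Two CPU% readings agree if they differ by at most `tolerance_pp`
// percentage points.
bool within_pp(double a_pct, double b_pct, double tolerance_pp) {
    return std::fabs(a_pct - b_pct) <= tolerance_pp;
}
```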
Comment thread tests/test_cpu_stats.py
addTLSArgs(benchmark_specs, env)
config = get_default_memtier_config(threads=2, clients=5, requests=1000)
master_nodes_list = env.getMasterNodesList()
overall_expected_request_count = get_expected_request_count(config)

Unused variable in CPU stats test

Low Severity

The variable overall_expected_request_count is computed using get_expected_request_count(config) but never used anywhere in the test_cpu_stats_in_json function. This is dead code that should be removed to keep the test clean and maintainable.

Comment thread memtier_benchmark.cpp
gettimeofday(&cpu_cur_tv, NULL);
double wall_usec = (double)(cpu_cur_tv.tv_sec - cpu_prev_tv.tv_sec) * 1000000.0
+ (double)(cpu_cur_tv.tv_usec - cpu_prev_tv.tv_usec);
if (wall_usec < 1.0) wall_usec = 1.0; // guard against division by zero

Wall clock adjustment causes incorrect CPU percentages

Low Severity

If the system clock is stepped backwards (via NTP or a manual change) during a benchmark run, wall_usec becomes negative. The guard if (wall_usec < 1.0) wall_usec = 1.0 clamps it to 1 microsecond instead of handling the error, so CPU deltas divided by 1.0 produce wildly inflated percentages (potentially millions of percent) in the JSON output.
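
A sketch of a sturdier approach. It swaps in CLOCK_MONOTONIC (not what the PR uses) so NTP or manual clock changes cannot shrink or invert the interval, and it skips implausibly short intervals instead of clamping them to 1 µs. The helper names and the 100 ms minimum are assumptions.

```cpp
#include <cassert>
#include <time.h>

// CLOCK_MONOTONIC never jumps when the wall clock is changed,
// unlike gettimeofday().
double monotonic_usec() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000000.0 + (double)ts.tv_nsec / 1000.0;
}

// Returns false (skip this sample) for a bogus interval instead of
// clamping it, which would turn a small CPU delta into a huge percentage.
bool interval_usable(double wall_usec, double min_usec = 100000.0) {
    return wall_usec >= min_usec;
}
```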

Comment thread memtier_benchmark.cpp
unsigned long long main_prev_cpu = get_thread_cpu_usec(pthread_self());
std::vector<unsigned long long> thread_prev_cpu(threads.size());
for (size_t t = 0; t < threads.size(); t++) {
thread_prev_cpu[t] = get_thread_cpu_usec(threads[t]->m_thread);

Uninitialized pthread handle used for CPU measurement

High Severity

The code reads threads[t]->m_thread immediately after calling start() without verifying that thread creation succeeded. If pthread_create fails, m_thread remains uninitialized and passing it to get_thread_cpu_usec causes undefined behavior (potential crash when calling pthread_mach_thread_np on macOS or pthread_getcpuclockid on Linux with garbage pthread_t values).
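
A minimal sketch of the guard. It assumes a hypothetical worker_handle that records whether pthread_create() succeeded and skips sampling for workers that never started. Only the Linux pthread_getcpuclockid() path is shown.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Hypothetical worker record: m_thread is meaningful only if m_started is true.
struct worker_handle {
    pthread_t m_thread;
    bool m_started = false;

    // Starts the thread and records whether creation succeeded.
    bool start(void* (*fn)(void*), void* arg) {
        m_started = (pthread_create(&m_thread, NULL, fn, arg) == 0);
        return m_started;
    }
};

// Reads the worker's CPU clock only when the handle is valid.
bool sample_worker_cpu_usec(const worker_handle& w, unsigned long long* out_usec) {
    if (!w.m_started)
        return false;  // never pass an uninitialized pthread_t to the OS
    clockid_t cid;
    struct timespec ts;
    if (pthread_getcpuclockid(w.m_thread, &cid) != 0 || clock_gettime(cid, &ts) != 0)
        return false;
    *out_usec = (unsigned long long)ts.tv_sec * 1000000ULL + (unsigned long long)ts.tv_nsec / 1000ULL;
    return true;
}
```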

Only call debugPrintMemtierOnError when the benchmark actually fails,
preventing FileNotFoundError from missing Redis log files on success.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread run_stats.cpp
table.add_column(column);
}

void run_stats::set_cpu_stats(std::vector<per_second_cpu_stats> cpu_stats)

CPU stats missing from aggregated results

Low Severity

When run_count > 1, average.aggregate_average(all_stats) computes aggregated output, but aggregate_average() never propagates/merges m_cpu_stats, so “AGGREGATED AVERAGE RESULTS” JSON omits CPU Stats even though individual runs may include them.

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread memtier_benchmark.cpp
}
}
cpu_prev_tv = cpu_cur_tv;
cpu_history.push_back(cpu_snap);

CPU stats break across thread restarts

Medium Severity

Per-thread CPU tracking keeps thread_prev_cpu[t] keyed only by thread index, but worker threads can be join()ed and restart()ed with a different pthread_t. After a restart, deltas are computed against the previous thread’s CPU clock (or 0 if calls fail), producing incorrect per-thread CPU% and potentially spurious/missed >95% warnings.
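
A sketch of keying the baseline by thread identity as well as index, with hypothetical names. When the slot's pthread_t changes, the new thread's own clock (which starts near zero) is taken as the delta and the baseline is reset. Because pthread_t values can be recycled after join(), a per-slot restart counter would be a sturdier key than pthread_equal() alone.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical per-slot baseline that remembers which pthread_t it measured.
struct cpu_baseline {
    pthread_t thread;
    unsigned long long prev_usec = 0;
    bool valid = false;
};

// Returns CPU time (usec) used in this slot since the last sample and
// updates the baseline. `cur_usec` is the current reading of `current`'s clock.
unsigned long long delta_for_slot(cpu_baseline& b, pthread_t current, unsigned long long cur_usec) {
    unsigned long long delta = 0;  // the first sample only sets the baseline
    if (b.valid) {
        if (!pthread_equal(b.thread, current))
            delta = cur_usec;                // restarted: new clock began near 0
        else if (cur_usec >= b.prev_usec)
            delta = cur_usec - b.prev_usec;  // same thread: ordinary delta
    }
    b.thread = current;
    b.prev_usec = cur_usec;
    b.valid = true;
    return delta;
}
```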
