[SVLS-8351] Add CPU Enhanced Metrics#77

Draft
kathiehuang wants to merge 7 commits into main from kathie.huang/add-cpu-enhanced-metrics

@kathiehuang kathiehuang commented Feb 13, 2026

What does this PR do?

  • Adds CPU limit and rate enhanced metrics
    • Creates a metrics collector that reads and submits CPU metrics every 3 seconds
    • Sets up:
      • CgroupStats struct for reading statistics from cgroup v1 files
        • This normalizes the stats to nanoseconds
      • CpuStats struct to store the computed CPU total and limit metrics
        • Converts u64 values to f64
        • Calculates CPU limit percentage
      • CpuMetricsCollector struct that will collect this information at an interval and submit as metrics
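
The normalization and limit calculation described above could look roughly like the following. This is a minimal sketch, not the PR's code: `micros_to_nanos` and `cpu_limit_pct` are hypothetical names, and it assumes the limit is the CFS quota divided by the CFS period, with a fallback to the host CPU count when no quota is available.

```rust
/// Hypothetical helper: cgroup v1 reports the CFS quota and period in
/// microseconds, so multiply by 1000 to normalize to nanoseconds.
fn micros_to_nanos(us: u64) -> u64 {
    us.saturating_mul(1_000)
}

/// Hypothetical helper: returns (limit percentage, defaulted flag).
/// 100% corresponds to one full CPU core.
/// Assumption: with no usable quota/period pair, fall back to host CPUs.
fn cpu_limit_pct(
    quota_ns: Option<u64>,
    period_ns: Option<u64>,
    host_cpus: usize,
) -> (f64, bool) {
    match (quota_ns, period_ns) {
        // u64 -> f64 conversion happens here, before the division.
        (Some(q), Some(p)) if p > 0 => (q as f64 / p as f64 * 100.0, false),
        _ => (host_cpus as f64 * 100.0, true),
    }
}
```

Under these assumptions, a quota of 50000 µs with a period of 100000 µs gives a 50% limit, while a missing quota on a 2-CPU host gives 200% with the defaulted flag set, matching the debug output further down.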

Motivation

Additional Notes

Describe how to test/QA your changes

Built with serverless-compat-self-monitoring.

To do: Test different collection and aggregation intervals to find the optimal balance between collection frequency and added overhead

Added a bunch of debug logs to see what was going on with the calculations:

DEBUG datadog_trace_agent::metrics_collector: Contents of /sys/fs/cgroup/cpuset/cpuset.cpus: 0-1
DEBUG datadog_trace_agent::metrics_collector: Range: ["0", "1"]
DEBUG datadog_trace_agent::metrics_collector: Total CPU count: 2
DEBUG datadog_trace_agent::metrics_collector: CFS scheduler quota is -1, setting to None
DEBUG datadog_trace_agent::metrics_collector: Could not read scheduler quota from /sys/fs/cgroup/cpu/cpu.cfs_quota_us
DEBUG datadog_trace_agent::metrics_collector: No CPU limit found, defaulting to host CPU count: 2 CPUs
DEBUG datadog_trace_agent::metrics_collector: Collected cpu stats!
DEBUG datadog_trace_agent::metrics_collector: CPU usage: 9871234519
DEBUG datadog_trace_agent::metrics_collector: CPU limit: 200%, defaulted: true
DEBUG datadog_trace_agent::metrics_collector: Submitting CPU metrics!
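
For reference, the CPU count in the first three log lines comes from parsing a cpuset list such as `0-1`. A minimal sketch of that parsing, assuming the usual comma-separated list of single CPUs and inclusive ranges; `count_cpus` is a hypothetical name and not necessarily how the PR implements it:

```rust
/// Hypothetical helper: counts CPUs in a cpuset list such as "0-1" or "0-3,8".
/// Returns None for empty or malformed input.
fn count_cpus(cpuset: &str) -> Option<usize> {
    let mut total = 0usize;
    for part in cpuset.trim().split(',').filter(|p| !p.is_empty()) {
        // Each entry is either a single CPU ("8") or an inclusive range ("0-3").
        let mut bounds = part.splitn(2, '-');
        let start: usize = bounds.next()?.trim().parse().ok()?;
        let end: usize = match bounds.next() {
            Some(e) => e.trim().parse().ok()?,
            None => start,
        };
        if end < start {
            return None;
        }
        total += end - start + 1;
    }
    if total == 0 {
        None
    } else {
        Some(total)
    }
}
```

With the file contents shown above (`0-1`), this returns `Some(2)`.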

The calculations look good, but the scheduler quota can't be read from cpu.cfs_quota_us (the logs show it as -1, i.e. no quota set), so the limit always falls back to the host CPU count from the num_cpus crate
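
In cgroup v1, a `cpu.cfs_quota_us` value of -1 means no quota is configured, which is different from the file being unreadable. A minimal sketch of how the two cases might be told apart so the logs are less ambiguous; `Quota` and `parse_cfs_quota` are hypothetical names, not part of this PR:

```rust
/// Hypothetical type: the three outcomes of reading cpu.cfs_quota_us.
#[derive(Debug, PartialEq)]
enum Quota {
    /// File contained -1: no CFS quota is set for this cgroup.
    Unlimited,
    /// File contained a positive quota, in microseconds.
    Limited(u64),
    /// Contents were missing, non-numeric, or otherwise unexpected.
    Unreadable,
}

/// Hypothetical helper: classifies the raw contents of cpu.cfs_quota_us.
fn parse_cfs_quota(contents: &str) -> Quota {
    match contents.trim().parse::<i64>() {
        Ok(-1) => Quota::Unlimited,
        Ok(v) if v > 0 => Quota::Limited(v as u64),
        _ => Quota::Unreadable,
    }
}
```

Both `Unlimited` and `Unreadable` would still fall back to the host CPU count, but they could be logged with different messages.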

References: datadog-agent cgroup collection and calculation logic
