Feature Request: Extend pmUnits to support electrical and thermal dimensions
Background
PCP's pmUnits structure currently supports three dimensions: Space (bytes), Time (seconds), and Count (events). While this covers many use cases, it forces PMDAs that collect electrical and thermal metrics to declare them with no dimension at all, which loses semantic meaning and prevents proper unit conversion.
Current workaround: All PMDAs with power/thermal metrics use PMDA_PMUNITS(0,0,0,0,0,0) (dimensionless) and rely on metric names or pmrep labels to convey units to users.
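To make the workaround concrete, here is a minimal sketch built on a local stand-in for pmUnits. The field names follow pmapi.h, but `sketch_units`, `sketch_make_units()` and `sketch_units_equal()` are hypothetical names used only so the example compiles without the PCP headers. It illustrates that once two metrics are declared dimensionless, their unit metadata is identical, so a client cannot tell degrees Celsius from milliwatts.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for pmUnits (assumption: mirrors the pmapi.h layout,
 * three signed dimensions, three scales, 8 bits of padding). */
typedef struct {
    signed int   dimSpace   : 4;
    signed int   dimTime    : 4;
    signed int   dimCount   : 4;
    unsigned int scaleSpace : 4;
    unsigned int scaleTime  : 4;
    signed int   scaleCount : 4;
    unsigned int pad        : 8;
} sketch_units;

/* Hypothetical helper playing the role of PMDA_PMUNITS(a,b,c,d,e,f). */
static sketch_units
sketch_make_units(int ds, int dt, int dc, unsigned ss, unsigned st, int sc)
{
    sketch_units u;

    /* Dimensions are 4-bit signed fields, so they must fit in -8..7. */
    assert(ds >= -8 && ds <= 7 && dt >= -8 && dt <= 7 && dc >= -8 && dc <= 7);
    memset(&u, 0, sizeof(u));
    u.dimSpace = ds;
    u.dimTime = dt;
    u.dimCount = dc;
    u.scaleSpace = ss;
    u.scaleTime = st;
    u.scaleCount = sc;
    return u;
}

/* Returns 1 when two unit descriptors carry identical metadata. */
static int
sketch_units_equal(sketch_units a, sketch_units b)
{
    return a.dimSpace == b.dimSpace && a.dimTime == b.dimTime &&
           a.dimCount == b.dimCount && a.scaleSpace == b.scaleSpace &&
           a.scaleTime == b.scaleTime && a.scaleCount == b.scaleCount;
}
```

With this stand-in, a temperature metric and a power metric both built from `sketch_make_units(0, 0, 0, 0, 0, 0)` compare equal, whereas a kilobyte metric built from `sketch_make_units(1, 0, 0, 1, 0, 0)` does not.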
Problem
Multiple PMDAs collect metrics that would benefit from proper unit definitions:
| PMDA | Current Dimensionless Metrics | Example |
|---|---|---|
| nvidia | temperature, fanspeed, power, energy | nvidia.temperature (°C), nvidia.power (mW) |
| darwin | thermal.cpu.die, power.battery.voltage, power.battery.amperage, power.battery.capacity, thermal.fan.speed | All defined as dimensionless |
| amdgpu | GPU temperature, power consumption | Hardware monitoring |
| smart | disk temperature sensors | smart.temperature_celsius |
| lmsensors | Linux hwmon sensors (temp, voltage, current, fans) | System health monitoring |
| roomtemp | Environmental temperature monitoring | Datacenter monitoring |
Impact:
- pmrep/pmchart cannot perform unit conversions (e.g., °C ↔ °F, mW → W)
- Unit metadata lost at the PMDA level
- Tools display raw numbers without semantic context
- Inconsistent user experience compared to well-defined metrics (bytes, seconds)
Proposed Solution
Extend pmUnits to support additional physical dimensions commonly encountered in system monitoring:
Proposed New Dimensions
| Dimension | Base Unit | Scale Examples | Use Cases |
|---|---|---|---|
| Temperature | Celsius (°C) | Celsius, Fahrenheit, Kelvin | CPU/GPU/disk temps, thermal sensors |
| Voltage | Volt (V) | V, mV, μV | Power supply, battery voltage, sensor rails |
| Current | Ampere (A) | A, mA, μA | Power draw, battery current |
| Power | Watt (W) | W, mW, μW, kW | Power consumption, TDP monitoring |
| Energy | Joule (J) | J, Wh, kWh | Energy counters, battery capacity |
| Frequency | Hertz (Hz) | Hz, kHz, MHz, GHz | CPU/GPU clocks, timers |
| Angular Velocity | RPM | RPM | Fan speeds |
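For the dimensions whose scales are plain SI prefixes (voltage, current, power, frequency), one possible encoding is a signed power-of-ten exponent, much as scaleCount works for counts today. The sketch below assumes that encoding; the `SKETCH_POWER_*` constants and `sketch_rescale()` are hypothetical names, not existing PCP API.

```c
#include <assert.h>

/* Hypothetical scale constants: power-of-ten exponents relative to 1 W.
 * All of them fit in a 4-bit signed field (-8..7). */
#define SKETCH_POWER_UWATT (-6)
#define SKETCH_POWER_MWATT (-3)
#define SKETCH_POWER_WATT    0
#define SKETCH_POWER_KWATT   3

/* Rescale a value between two decimal scales of the same dimension. */
static double
sketch_rescale(double value, int from_exp, int to_exp)
{
    int shift = from_exp - to_exp;

    assert(from_exp >= -8 && from_exp <= 7 && to_exp >= -8 && to_exp <= 7);
    for (; shift > 0; shift--)
        value *= 10.0;      /* moving to a smaller unit */
    for (; shift < 0; shift++)
        value /= 10.0;      /* moving to a larger unit */
    return value;
}
```

For example, `sketch_rescale(1500.0, SKETCH_POWER_MWATT, SKETCH_POWER_WATT)` turns 1500 mW into 1.5 W. Wh and kWh would not fit this scheme directly, since they differ from the joule by a factor of 3600 rather than a power of ten.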
Implementation Sketch
Extend the pmUnits bitfield structure (in pmapi.h) to add new dimension fields, similar to how dimSpace, dimTime, and dimCount work today:
```c
// New dimension constants (example)
#define PM_TEMP_CELSIUS     0  /* degrees Celsius */
#define PM_TEMP_FAHRENHEIT  1  /* degrees Fahrenheit */
#define PM_TEMP_KELVIN      2  /* Kelvin */
#define PM_VOLTAGE_VOLT     0  /* volts */
#define PM_VOLTAGE_MVOLT    1  /* millivolts */
// ... etc.
```
PMDAs could then define metrics properly:
```c
// Before (dimensionless)
PMDA_PMUNITS(0, 0, 0, 0, 0, 0)
// After (semantic temperature)
PMDA_PMUNITS(0, 0, 0, 0, 0, 0, 1, PM_TEMP_CELSIUS)
```
Discussion Points
- Scope: Which dimensions are most valuable? Should we start with a subset (e.g., temperature + power) or implement comprehensively?
- Backward compatibility: How to handle existing dimensionless metrics? Deprecation path? Automatic migration?
- Conversion logic: Should `libpcp` gain conversion functions (e.g., `pmConvertTemp()`, `pmConvertPower()`)? Or leave to client tools?
- Standard adoption: Should PCP align with SI units, IEC standards, or support multiple unit systems?
- Effort vs. benefit: Is the improved semantic correctness and UX worth the engineering effort to extend the core unit system?
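On the conversion-logic point, temperature differs from the existing scales: Celsius, Fahrenheit, and Kelvin are related by an offset as well as a factor, so a single multiplier of the kind used for bytes or seconds is not enough. A minimal sketch of what such a conversion might look like, assuming hypothetical `SKETCH_TEMP_*` constants (numbered like the proposed `PM_TEMP_*` values) and a hypothetical `sketch_convert_temp()` helper; nothing here exists in libpcp today:

```c
#include <assert.h>

enum {
    SKETCH_TEMP_CELSIUS    = 0,
    SKETCH_TEMP_FAHRENHEIT = 1,
    SKETCH_TEMP_KELVIN     = 2
};

/* Convert a temperature by going through Celsius.
 * Returns 0 on success, -1 if either unit is unknown. */
static int
sketch_convert_temp(double in, int from, int to, double *out)
{
    double celsius;

    assert(out);
    switch (from) {
    case SKETCH_TEMP_CELSIUS:    celsius = in; break;
    case SKETCH_TEMP_FAHRENHEIT: celsius = (in - 32.0) * 5.0 / 9.0; break;
    case SKETCH_TEMP_KELVIN:     celsius = in - 273.15; break;
    default:                     return -1;
    }
    switch (to) {
    case SKETCH_TEMP_CELSIUS:    *out = celsius; break;
    case SKETCH_TEMP_FAHRENHEIT: *out = celsius * 9.0 / 5.0 + 32.0; break;
    case SKETCH_TEMP_KELVIN:     *out = celsius + 273.15; break;
    default:                     return -1;
    }
    return 0;
}
```

Whether this logic belongs in the library or in client tools is the open question; the sketch only shows that the arithmetic needs both the source and the target unit, not just a ratio between them.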
Benefits
- Semantic correctness: Metrics carry proper unit metadata
- Better UX: Tools can display appropriate units and perform conversions
- Developer experience: PMDA authors can define metrics naturally
- Future-proofing: New hardware monitoring PMDAs get proper unit support from day one
Questions for Maintainers
- Does this align with PCP's design philosophy?
- Are there architectural concerns with extending `pmUnits`?
- Would this be accepted if implemented with tests and documentation?
- Any alternative approaches we should consider instead?
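One constraint relevant to the architectural question: as far as the current header shows, pmUnits is a 32-bit bitfield in which the three dimension/scale pairs occupy 24 bits, leaving 8 bits of padding. Seven new dimensions with their own 4-bit exponent and 4-bit scale would need 56 bits. The sketch below shows one alternative that still fits in 32 bits, under the assumption that a metric carries at most one of the new dimensions: the padding is split into a 4-bit "kind" and a 4-bit scale. The struct, the `SKETCH_KIND_*` constants, and both helpers are hypothetical and only illustrate the size trade-off.

```c
#include <assert.h>

/* Hypothetical extended layout: the first 24 bits are unchanged,
 * the former 8-bit pad is split into a kind and a scale. */
typedef struct {
    signed int   dimSpace   : 4;
    signed int   dimTime    : 4;
    signed int   dimCount   : 4;
    unsigned int scaleSpace : 4;
    unsigned int scaleTime  : 4;
    signed int   scaleCount : 4;
    unsigned int extKind    : 4;  /* SKETCH_KIND_NONE or one new dimension */
    signed int   extScale   : 4;  /* interpretation depends on extKind */
} sketch_units_ext;

enum {
    SKETCH_KIND_NONE = 0,
    SKETCH_KIND_TEMPERATURE,
    SKETCH_KIND_VOLTAGE,
    SKETCH_KIND_CURRENT,
    SKETCH_KIND_POWER,
    SKETCH_KIND_ENERGY,
    SKETCH_KIND_FREQUENCY,
    SKETCH_KIND_ANGULAR_VELOCITY
};

/* Sanity check that the layout still occupies exactly 32 bits. */
static void
sketch_check_layout(void)
{
    assert(sizeof(sketch_units_ext) == 4);
}

/* Two units are convertible only if every dimension, including the
 * new kind, matches; scales may differ. */
static int
sketch_convertible(sketch_units_ext a, sketch_units_ext b)
{
    return a.dimSpace == b.dimSpace && a.dimTime == b.dimTime &&
           a.dimCount == b.dimCount && a.extKind == b.extKind;
}
```

The cost of this layout is that compound units such as watts per second cannot be expressed, because the kind field has no exponent of its own.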
Context: This came up while developing pmrep views for the darwin PMDA's thermal/power metrics, where specifying units like dC or mAh caused PM_ERR_CONV errors. Investigation revealed this is a systemic limitation affecting multiple PMDAs.