This is an open source, work-in-progress repository collecting a set of software tools for power monitoring and control in HPC systems. Some features might change in the future, and some might be removed when the technology they rely on becomes deprecated.
Welcome to the Software Power Limiting Tools repository, an open source collection of tools designed to support energy-aware high-performance computing. So far the repository supports Intel-based CPUs and NVIDIA GPUs.
- Static Energy Profiler (StEP): ./build/StEP
- Dynamic Energy-Performance Optimizer (DEPO): ./build/DEPO
StEP is designed for static exploration of the energy characteristics of a given device. The device types supported at the moment are:
- Intel CPU
- Intel XPU (Ponte Vecchio)
- NVIDIA GPU
The tool automatically examines how power limits affect the energy consumption and performance
of an application executed on the examined device.
It automates repeated execution of the application under different power caps (limits)
and produces a report of the best power cap with respect to 3 predefined target optimization metrics.
Since Intel CPUs have no clearly defined minimum power limit, the tool explores the range between
the idle power consumption level (measured prior to application execution) and the default power cap
(usually the same as the maximum available power limit). The power limits are examined with the Linear Search method,
starting from the highest value, with a decrement defined percentage-wise by the user in the params.conf file.
The details of the tool for CPU might be found in the paper:
Krzywaniak, A., Czarnul, P., & Proficz, J. (2019).
Extended investigation of performance-energy trade-offs under power capping in HPC environments.
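The linear sweep described above can be sketched as follows. This is only an illustration, not the StEP implementation: it assumes the decrement is a percentage of the default cap and that the idle level itself is excluded from the sweep. The function names and the two metric functions are hypothetical stand-ins, since the tool's three predefined metrics are not named here.

```python
def cap_schedule(default_cap_w, idle_power_w, step_percent):
    """Power caps to try, highest first, from the default cap down to idle.

    Assumption: the step is a percentage of the default cap, and the idle
    power level itself is not used as a cap.
    """
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    step_w = default_cap_w * step_percent / 100.0
    caps = []
    i = 0
    # Compute each cap from the index to avoid accumulating rounding error.
    while default_cap_w - i * step_w > idle_power_w:
        caps.append(default_cap_w - i * step_w)
        i += 1
    return caps


def best_cap(results, metric):
    """results maps cap [W] -> (energy [J], time [s]); lower metric is better."""
    return min(results, key=lambda cap: metric(*results[cap]))


# Illustrative metrics only (hypothetical, not the tool's predefined ones).
def energy(energy_j, time_s):
    return energy_j


def energy_delay_product(energy_j, time_s):
    return energy_j * time_s
```

With a default cap of 100 W, a measured idle power of 40 W and a 10% step, `cap_schedule(100, 40, 10)` yields six caps from 100 W down to 50 W. Each cap would then be applied for one full run of the application before `best_cap` compares the measured results.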
For GPUs, the power limits are explored between the minimum and maximum values (which are defined by NVIDIA and can be obtained via the NVML API). The power decrement is fixed at 5 W.
The details of the tool for GPU might be found in the paper:
Krzywaniak, A., Czarnul, P., & Proficz, J. (2022).
GPU Power Capping for Energy-Performance Trade-Offs in Training of Deep Convolutional Neural Networks for Image Recognition.
DEPO is designed for dynamic exploration and selection of the power cap according to a selected target optimization metric. The tool is launched together with the application whose energy and performance are to be optimized, and it adjusts the power cap value with respect to the selected target metric. It can perform the search with either the Linear Search algorithm or the Golden Section Search algorithm. The details of the tool for CPU may be found in the paper:
Krzywaniak, A., Czarnul, P., & Proficz, J. (2022).
DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing. SOFTWARE-PRACTICE & EXPERIENCE, 52, 2598-2634.
The details of the tool for GPU may be found in the paper:
Adam Krzywaniak, Paweł Czarnul, Jerzy Proficz (2023)
Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool, Future Generation Computer Systems, Volume 145, 2023, Pages 396-414, ISSN 0167-739X.
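For readers unfamiliar with Golden Section Search, the sketch below shows the textbook algorithm applied to a power cap range. It is not taken from the DEPO sources: `golden_section_min`, its parameters and the tolerance are hypothetical, and `f` stands for whatever metric value is measured while a given cap is enforced. The method assumes the metric is unimodal over the range, which real, noisy measurements only approximate.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618


def golden_section_min(f, lo, hi, tol=1.0):
    """Return a point in [lo, hi] that approximately minimises a unimodal f.

    Each iteration shrinks the interval by the factor INV_PHI and needs
    only one new evaluation of f, because one interior point is reused.
    """
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2
```

Compared with a linear sweep, this narrows the range geometrically, so fewer caps have to be tried while the application is running, at the cost of relying on the unimodality assumption.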
It is recommended to create a build directory.
mkdir build
cd build
cmake ..
make
mkdir build
cd build
cmake -DWITH_XPU=ON ..
cmake ../ -DWITH_XPU=ON -DLIBZE_LOADER_PATH=<custom directory with libze_loader.so>
To test your configuration you can run the unit tests (with superuser privileges):
sudo ctest -V
or with all messages logged to the console (where the log level is one of: info/warn/debug/trace/err/critical):
sudo SPDLOG_LEVEL=trace ctest -V
sudo apt update && sudo apt install build-essential cmake gnuplot
sudo apt install libboost-all-dev graphviz
sudo apt install libyaml-cpp-dev
Note: the power limiting feature requires root privileges on Ubuntu, hence the commands below are executed with sudo.
sudo ./build/apps/StEP/StEP ./minibenchmarks/openmp/fft 16384 25
The above command runs an example FFT application with StEP for CPU and produces
a cpu_experiment_* folder containing .csv logs and a .png visualisation of the
power log, together with a visualisation of the StEP tool results.
By default, current capping is used for XPU. To test power capping instead, set the USE_AMPERES environment variable to 0 before running the application. For example:
sudo USE_AMPERES=0 ./build/apps/StEP/StEP <cmdline of your Intel XPU workload>
sudo ./build/apps/DEPO/DEPO --ls --en ./minibenchmarks/openmp/fft 1024 300
The above command runs an example FFT application with DEPO and produces
a cpu_experiment_* folder containing .csv logs and a .png visualisation of the
power log. See details in the next section.
The execution parameters can be modified in the config.yaml file or with command line parameters, which can be listed
with DEPO --help.
The parameters in the config.yaml file are documented in comments.
When using the GPU backend, DEPO accepts a single device id or a comma-separated list, for example --gpu 0 or --gpu 0,1. The following behaviors apply in addition to the single-GPU case described above.
- Single GPU (--gpu with one id): One NVIDIA device is controlled; power limits and kernel-activity tracing follow the same model as in the original DEPO GPU workflow.
- Multi-GPU without --async: All listed GPUs receive the same enforced power limit at each tuning step (one scalar cap applied to every selected device via NVML). The CUPTI injection library still drives the shared kernels_count signal used for online performance tracing in the usual way.
- Multi-GPU with --async: Only meaningful when at least two GPU ids are given (if you pass --async with a single GPU, DEPO warns and behaves as without --async). Here DEPO enables independent NVML power limits per listed GPU. The chosen search algorithm (Linear Search or Golden Section Search) is run once per GPU in order: while one GPU is being swept, the others keep the current baseline caps, so the final limits may differ between devices. The reported scalar cap in summaries corresponds to the average of the per-GPU micro-watt limits; the execution phase applies the full per-GPU cap vector. This --async flag is not the same mechanism as the experimental external trigger file described in the “Experimental asynchronous Tuning” subsection below.
- Build/runtime: GPU injection requires building the profiling injection library (e.g. under profiling_injection) and making its path available to DEPO (see CUDA_INJECTION64_PATH / /tmp/depo_gpu_path as used in your environment). Power capping still requires appropriate privileges (e.g. sudo on typical Linux setups), consistent with other DEPO GPU usage notes in this document.
- Console monitoring (multi-GPU): When more than one subdevice is active, the live table uses time in ms, then for each GPU instantaneous power and enforced cap columns in pairs: P_gpu<id>[W], Cap_gpu<id>[W] (NVML per-GPU limit), instead of a single combined P_cap column.
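The per-GPU tuning order used with --async can be pictured with the short sketch below. It is a simplified model rather than DEPO code: `tune_per_gpu` and the `search` callback are hypothetical names, and the sketch assumes that a GPU which has already been tuned keeps its selected cap while later GPUs are swept.

```python
def tune_per_gpu(gpu_ids, baseline_cap_uw, search):
    """Run one search per GPU, in order, and return (caps, reported_cap).

    caps maps GPU id -> selected limit in micro-watts.
    search(gpu_id, current_caps) is a hypothetical callback that sweeps
    one GPU and returns the limit it selects for that device.
    """
    # Every GPU starts from the same baseline cap.
    caps = {gpu: baseline_cap_uw for gpu in gpu_ids}
    for gpu in gpu_ids:
        # Only this GPU is swept; the others hold the cap they currently have.
        caps[gpu] = search(gpu, dict(caps))
    # Summaries report one scalar: the average of the per-GPU limits.
    reported_cap_uw = sum(caps.values()) / len(caps)
    return caps, reported_cap_uw
```

For two GPUs whose searches settle on 200 W and 250 W, the cap vector keeps both values for the execution phase, while the summary would show their 225 W average.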
In DEPO there are several optimization modes available:
- Just power sampling, which launches the application, monitors power and energy consumption, and reports them when finished; available when the --no-tuning parameter is passed.
- Single immediate tuning, which launches the Tuning Phase as soon as activity of the optimized device is detected; available with config.yaml parameters set to: repeatTuningPeriodInSec: 0 and doWaitPhase: 0.
- Single tuning with wait, which launches the Tuning Phase after the SMA-based power filter detects stable average power consumption; available with config.yaml parameters set to: repeatTuningPeriodInSec: 0 and doWaitPhase: 1.
- Periodic immediate tuning, which launches the Tuning Phase as soon as activity of the optimized device is detected and repeats it after a period defined in seconds, e.g. repeatTuningPeriodInSec: 30 (30 s of execution with the selected power cap before the next Tuning Phase), with doWaitPhase: 0.
- Periodic tuning with wait, which adds a Wait Phase before the first Tuning Phase; available with config.yaml parameters set to: repeatTuningPeriodInSec: 30 (for a 30 s period before the next Tuning Phase) and doWaitPhase: 1.
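The mapping from settings to modes listed above can be summarised in a few lines. The function below is only a reading aid: `depo_mode` and its argument names are hypothetical, mirroring the --no-tuning flag and the repeatTuningPeriodInSec and doWaitPhase keys, and it is not how DEPO itself parses its configuration.

```python
def depo_mode(no_tuning, repeat_tuning_period_in_sec, do_wait_phase):
    """Name the optimization mode implied by the given settings."""
    if no_tuning:
        # --no-tuning disables tuning regardless of the config.yaml values.
        return "just power sampling"
    periodic = repeat_tuning_period_in_sec > 0
    wait = bool(do_wait_phase)
    if periodic:
        return "periodic tuning with wait" if wait else "periodic immediate tuning"
    return "single tuning with wait" if wait else "single immediate tuning"
```

For instance, repeatTuningPeriodInSec: 30 together with doWaitPhase: 1 corresponds to periodic tuning with wait.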
For any mode one may run DEPO with Linear Search algorithm as well:

A tuning phase can also be triggered on demand with an external signal.
For now it works with an application executed in any of the available DEPO modes except "just sampling". It will be fixed soon.
The feature triggers the next Tuning Phase when a specific file is modified.
To asynchronously trigger the next Tuning Phase, execute touch /tmp/trigger_file while DEPO is running with the selected application.
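How DEPO watches the trigger file is not described here. One plausible approach, shown purely as an assumption, is to poll the file's modification time, which touch updates. The `TriggerFile` class below is a hypothetical illustration of that idea.

```python
import os


class TriggerFile:
    """Detects modifications of a trigger file by polling its mtime."""

    def __init__(self, path="/tmp/trigger_file"):
        self.path = path
        self._last = self._mtime()

    def _mtime(self):
        # A missing file is treated as "never triggered".
        try:
            return os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None

    def fired(self):
        """Return True once for each creation or modification since the last call."""
        current = self._mtime()
        changed = current is not None and current != self._last
        self._last = current
        return changed
```

A monitoring loop could call `fired()` once per sampling interval and start a new Tuning Phase whenever it returns True.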

If one wishes to add support for other CPU vendors, other GPU vendors, or other compute devices, one has to make sure that the target device provides:
- an API for monitoring the power or energy consumption
- an API for monitoring the performance of the compute device (e.g., executed instructions per second counters)
- an API for controlling the power limits
New hardware support may be added by preparing a NewDevice class inherited from the Device class, which implements the interface the Eco class requires to use a device.
The next step is to add the NewDevice option in the DEPO application source file, i.e., src/apps/DynamicECO.cpp, or to write your own DEPO program using just the NewDevice class.
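The three requirements above translate naturally into an abstract interface. The Python sketch below is only a conceptual model of that idea: the real Device class lives in the repository's C++ sources, and every method name here is hypothetical rather than a copy of its actual interface. The NewDevice shown is a toy simulation, not a driver for real hardware.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Conceptual stand-in for the three capabilities a device must expose."""

    @abstractmethod
    def read_energy_j(self):
        """Energy consumed so far, in joules."""

    @abstractmethod
    def read_perf_counter(self):
        """Monotonic performance counter, e.g. executed instructions."""

    @abstractmethod
    def set_power_limit_w(self, watts):
        """Enforce a power cap, in watts."""


class NewDevice(Device):
    """Toy simulated device: throughput scales linearly with the cap."""

    def __init__(self):
        self._limit_w = 100.0
        self._energy_j = 0.0
        self._instructions = 0

    def advance(self, seconds):
        # Simulation helper only; real hardware advances on its own.
        self._energy_j += self._limit_w * seconds
        self._instructions += int(1e9 * seconds * self._limit_w / 100.0)

    def read_energy_j(self):
        return self._energy_j

    def read_perf_counter(self):
        return self._instructions

    def set_power_limit_w(self, watts):
        self._limit_w = float(watts)
```

Keeping the optimizer dependent only on such an interface is what allows a new vendor backend to be added without touching the search algorithms.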
If you find this code useful, please cite any of our papers which contributed to this codebase:
- Static Energy Profiler (StEP) for CPU:
@INPROCEEDINGS{9188149,
  author={Krzywaniak, Adam and Czarnul, Pawel and Proficz, Jerzy},
  booktitle={2019 International Conference on High Performance Computing & Simulation (HPCS)},
  title={Extended investigation of performance-energy trade-offs under power capping in HPC environments},
  year={2019},
  pages={440-447},
  doi={10.1109/HPCS48598.2019.9188149}
}
- Static Energy Profiler (StEP) for GPU:
@inproceedings{10.1007/978-3-031-08751-6_48,
  author = {Krzywaniak, Adam and Czarnul, Pawel and Proficz, Jerzy},
  booktitle = {Computational Science -- ICCS 2022},
  isbn = {978-3-031-08751-6},
  pages = {667--681},
  publisher = {Springer International Publishing},
  title = {GPU Power Capping for Energy-Performance Trade-Offs in Training of Deep Convolutional Neural Networks for Image Recognition},
  year = {2022}
}
- Dynamic Energy-Performance Optimizer (DEPO) for CPU:
@article{https://doi.org/10.1002/spe.3139,
  author = {Krzywaniak, Adam and Czarnul, Paweł and Proficz, Jerzy},
  title = {{DEPO}: A dynamic energy-performance optimizer tool for automatic power capping for energy efficient high-performance computing},
  journal = {Software: Practice and Experience},
  volume = {52},
  number = {12},
  pages = {2598-2634},
  keywords = {automatic power capping, green computing, HPC, performance-energy trade-off, software tools},
  doi = {https://doi.org/10.1002/spe.3139},
  year = {2022}
}
- Dynamic Energy-Performance Optimizer (DEPO) for GPU:
@article{KRZYWANIAK2023396,
  author = {Adam Krzywaniak and Pawe{\l} Czarnul and Jerzy Proficz},
  doi = {https://doi.org/10.1016/j.future.2023.03.041},
  issn = {0167-739X},
  journal = {Future Generation Computer Systems},
  keywords = {Energy-aware computing, High-performance computing, Green computing, Machine learning, GPU energy optimization},
  pages = {396-414},
  title = {Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool},
  volume = {145},
  year = {2023},
}
https://developer.nvidia.com/cuda-downloads
DEPO GPU requires CUDA compute capability 7.0 or higher, which means that architectures older than Volta (e.g., Pascal) are not supported.



