From 1f616a86f3ce2f725a0514b1caa48b9d45009673 Mon Sep 17 00:00:00 2001
From: Ludovic Rousseau We'll talk here about the case where scaphandre is able to effectively measure the power consumption of the host (see compatibility section for more on sensors and their prerequesites) and specifically about the PowercapRAPL sensor. Let's clarify what's happening when you collect metrics with scaphandre and this sensor.
-RAPL stands for Running Average Power Limit. It's a technnology embedded in most Intel and AMD x86 CPUs produced afeter 2012. Thanks to this technology it is possible to get the total energy consumption of the CPU, of the consumption per CPU socket, plus in some cases, the consumption of the DRAM controller. In most cases it represents the vast majority of the energy consumption of the machine (except when running GPU intensive workloads, for example). Further improvements shall be made in scaphandre to fully measure the consumption when GPU are involved (or a lot of hard drives on the same host...).Some details about RAPL
Between scaphandre and those data is the powercap kernel module that writes the energy consumption in files. Scaphandre, reads those files, stores the data in buffer and then allows for more processing through the exporters.
The PowercapRAPL sensor does actually some more than just collecting those energy consumption metrics (and casting it in power consumption metrics). Every time the exporter asks for a measurement (either periodically like in the Stdout exporter, or every time a request comes like for the Prometheus exporter) the sensor reads the values of the energy counters from powercap, stores those values and does the same for the CPU usage statistics of the CPU (the one you can see in /proc/stats
) and for each running process on the machine at that time (see /proc/PID/stats
). With those data it is possible to compute the ratio of CPU time actively spent for a given PID on the CPU time actively spent doing something. With this ratio we can then get the subset of power consumption that is related to that PID on a given timeframe (between two measurement requests).