The Monitoring module allows injecting user-defined metrics and monitoring the process itself. It supports multiple backends, protocols and data formats.
- Installation
- Getting started
- Features and additional information
- Code snippets
- System monitoring and server-side backends installation and configuration
- Install CERN certificates
```shell
yum -y install CERN-CA-certs
```
- Add the `alisw` repo (as root)
```shell
cat > /etc/yum.repos.d/alisw-el7.repo <<EOF
[alisw-el7]
name=ALICE Software - EL7
baseurl=https://ali-ci.cern.ch/repo/RPMS/el7.x86_64/
enabled=1
gpgcheck=0
EOF
```
- Install Monitoring RPM package (as root)
```shell
yum -y install alisw-Monitoring+v1.5.4-1.x86_64
```
- Configure Modules
```shell
export MODULEPATH=/opt/alisw/el7/modulefiles:$MODULEPATH
```
- Load the environment
```shell
eval `modulecmd bash load Monitoring/v1.5.4-1`
```
The installation directory is: /opt/alisw/el7/Monitoring/v1.5.4-1
If you don't have aliBuild installed, follow the aliBuild installation instructions first.
- Compile `Monitoring` and its dependencies via aliBuild
```shell
aliBuild init Monitoring@master
aliBuild build Monitoring --defaults o2-daq
```
- Load the environment for Monitoring (in the `alice` directory)
```shell
alienv load Monitoring/latest
```
In case of an issue with aliBuild, refer to the official instructions.
Manual installation of the O2 Monitoring module requires:
- C++ compiler with C++14 support, e.g. the `gcc-c++` package from `devtoolset-6` on CentOS 7, or `clang++` on Mac OS
- Boost >= 1.56
- libcurl
- ApMon (optional)
```shell
git clone https://github.com/AliceO2Group/Monitoring.git
cd Monitoring; mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<installdir>
make -j
make install
```
The recommended way of getting a monitoring instance (as a `unique_ptr`) is the `Get` method of `MonitoringFactory`, passing backend URI(s) as a parameter (comma separated if more than one).
The library is accessible from the `o2::monitoring` namespace.
```cpp
using namespace o2::monitoring;
MonitoringFactory::Get("backend[-protocol]://host:port[?query]");
```
See the table below to find out how to create the URI for each backend; a minimal usage sketch follows the table:
| Backend name | Transport | URI backend[-protocol] | URI query |
|---|---|---|---|
| InfluxDB | HTTP | `influxdb-http` | `/write?db=<db>` |
| InfluxDB | UDP | `influxdb-udp` | - |
| ApMon | UDP | `apmon` | - |
| InfoLogger | - | `infologger` | - |
| Flume | UDP | `flume` | - |
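For illustration, a minimal sketch that puts the above together, creating an instance for the InfluxDB UDP backend and sending a metric; the include path and the `localhost:8087` endpoint are assumptions and may differ in your setup:
```cpp
#include <Monitoring/MonitoringFactory.h> // assumed header location

using namespace o2::monitoring;

int main() {
  // URI follows backend[-protocol]://host:port[?query];
  // the host and port below are placeholders.
  auto monitoring = MonitoringFactory::Get("influxdb-udp://localhost:8087");
  monitoring->send({10, "myMetricInt"});
}
```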
Metrics are sent using `send(Metric&& metric)`, where the `Metric` constructor receives the following parameters:
- `T value`
- `std::string& name`
- `[time_point<system_clock> timestamp]`
For example:
```cpp
monitoring->send({10, "myMetricInt"});
```
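The value is templated; as a hedged sketch of the other value types listed in the parameter table further below (metric names here are arbitrary):
```cpp
// int, double, string and uint64_t values (names are illustrative)
monitoring->send({10, "myMetricInt"});
monitoring->send({10.5, "myMetricDouble"});
monitoring->send({std::string{"running"}, "myMetricString"});
monitoring->send({uint64_t{123456789}, "myMetricUint64"});
```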
Two additional methods can be chained to `send(Metric&& metric)` in order to insert custom tags or set a custom timestamp:
- `addTags(std::vector<Tag>&& tags)`
- `setTimestamp(std::chrono::time_point<std::chrono::system_clock>& timestamp)`
For example:
```cpp
monitoring->send(Metric{10, "myMetric"}.addTags({{"tag1", "value1"}, {"tag2", "value2"}}));
monitoring->send(Metric{10, "myCrazyMetric"}.setTimestamp(timestamp));
```
It's also possible to send multiple, grouped values in a single metric (the Flume and InfluxDB backends are supported; others fall back to sending the values as separate metrics):
```cpp
void sendGroupped(std::string name, std::vector<Metric>&& metrics)
```
For example:
```cpp
monitoring->sendGroupped("measurementName", {{20, "myMetricIntMultiple"}, {20.30, "myMetricFloatMultiple"}});
```
In order to avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment. This feature is operated with the following two methods:
```cpp
monitoring->enableBuffering(const unsigned int maxSize)
...
monitoring->flushBuffer();
```
`enableBuffering` takes the maximum buffer size as its parameter. When the buffer gets full, all values are flushed automatically.
For example:
```cpp
monitoring->enableBuffering(5);
for (int i = 1; i < 10; i++) {
  monitoring->send({10, "myMetricInt"});
}
monitoring->send({20, "myMetricInt2"});
monitoring->flushBuffer();
```
Metrics consist of 4 parameters: name, value, timestamp and tags.
| Parameter name | Type | Required | Default |
|---|---|---|---|
| name | string | yes | - |
| value | int / double / string / uint64_t | yes | - |
| timestamp | chrono::time_point<std::chrono::system_clock> | no | current timestamp |
| tags | vector | no | -** |
**Default tag set is process specific and included in each metric:
- hostname
- PID
- process name
The module can calculate derived metrics. To do so, use `addDerivedMetric(std::string name, DerivedMetricMode mode)` with one of two available modes (see the sketch below):
- `DerivedMetricMode::RATE` - rate between two following metrics;
- `DerivedMetricMode::AVERAGE` - average value of all metrics stored in the cache.
Derived metrics are generated each time a new value is passed to the module. Their names are suffixed with the derived mode name.
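A short sketch of the rate mode (the metric name is arbitrary); the derived value is emitted alongside each subsequent value of the registered metric:
```cpp
// Register a RATE calculation for "myMetricInt"; every following send of
// this metric also produces a derived value with a suffixed name.
monitoring->addDerivedMetric("myMetricInt", DerivedMetricMode::RATE);
monitoring->send({10, "myMetricInt"});
monitoring->send({20, "myMetricInt"});
```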
Currently process monitoring is supported only on Linux. To enable it, use the call below (a usage sketch follows the metric list):
```cpp
enableProcessMonitoring([interval in seconds]);
```
The following metrics are generated every interval:
- etime - elapsed time since the process was started, in the form [[DD-]hh:]mm:ss
- pcpu - cpu utilization of the process in "##.#" format. Currently, it is the CPU time used divided by the time the process has been running (cputime/realtime ratio), expressed as a percentage. It will not add up to 100% unless you are lucky
- pmem - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage
- bytesReceived - the total number of bytes of data received by the process (per interface)
- bytesTransmitted - the total number of bytes of data transmitted by the process (per interface).
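A minimal usage sketch, assuming the call is made on the monitoring instance obtained from the factory and using an arbitrary 5-second interval:
```cpp
// Generate the process metrics listed above every 5 seconds (example value)
monitoring->enableProcessMonitoring(5);
```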
Code snippets are available in examples directory.
- Sending metric - examples/1-Basic.cxx
- Sending metric with custom tags - examples/2-TaggedMetrics.cxx
- Sending metric with user defined timestamp - examples/3-UserDefinedTimestamp.cxx
- Calculating derived metrics - examples/4-RateDerivedMetric.cxx
- Sending multiple values in a single metric - examples/8-Multiple.cxx
This guide explains manual installation. For `ansible` deployment see the AliceO2Group/system-configuration GitLab repo.