The Monitoring module allows injecting user-defined metrics and monitoring the process itself. It supports multiple backends, protocols and data formats.
- Installation
- Getting started
- Features and additional information
- Code snippets
- System monitoring and server-side backends installation and configuration
- Install CERN certificates
```shell
yum -y install CERN-CA-certs
```
- Add the `alisw` repo (as root)
```shell
cat > /etc/yum.repos.d/alisw-el7.repo <<EOF
[alisw-el7]
name=ALICE Software - EL7
baseurl=https://ali-ci.cern.ch/repo/RPMS/el7.x86_64/
enabled=1
gpgcheck=0
EOF
```
- Install Monitoring RPM package (as root)
```shell
yum -y install alisw-Monitoring+v1.5.4-1.x86_64
```
- Configure Modules
```shell
export MODULEPATH=/opt/alisw/el7/modulefiles:$MODULEPATH
```
- Load the environment
```shell
eval `modulecmd bash load Monitoring/v1.5.4-1`
```
The installation directory is: /opt/alisw/el7/Monitoring/v1.5.4-1
If you don't have aliBuild installed, follow the aliBuild installation instructions first.
- Compile `Monitoring` and its dependencies via aliBuild
```shell
aliBuild init Monitoring@master
aliBuild build Monitoring --defaults o2-daq
```
- Load the environment for Monitoring (in the `alice` directory)
```shell
alienv load Monitoring/latest
```
In case of an issue with aliBuild, refer to the official instructions.
Manual installation of the O2 Monitoring module requires:
- C++ compiler with C++14 support, e.g. the `gcc-c++` package from `devtoolset-6` on CentOS 7, or `clang++` on Mac OS
- Boost >= 1.56
- libcurl
- ApMon (optional)
```shell
git clone https://github.com/AliceO2Group/Monitoring.git
cd Monitoring; mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<installdir>
make -j
make install
```
The recommended way of getting a monitoring instance (as a `unique_ptr`) is the `Get` method of `MonitoringFactory`, passing backend URI(s) as a parameter (comma separated if more than one).
The library is accessible from the `o2::monitoring` namespace.
```cpp
using namespace o2::monitoring;
MonitoringFactory::Get("backend[-protocol]://host:port[?query]");
```
See the table below to find out how to create the URI for each backend; a minimal usage sketch follows the table:
| Backend name | Transport | URI backend[-protocol] | URI query |
|---|---|---|---|
| InfluxDB | HTTP | `influxdb-http` | `/write?db=<db>` |
| InfluxDB | UDP | `influxdb-udp` | - |
| ApMon | UDP | `apmon` | - |
| InfoLogger | - | `infologger` | - |
| Flume | UDP | `flume` | - |
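For illustration, a minimal sketch that puts the above together, creating an instance for the InfluxDB UDP backend and sending a metric; the include path and the `localhost:8087` endpoint are assumptions and may differ in your setup:
```cpp
#include <Monitoring/MonitoringFactory.h> // assumed header location

using namespace o2::monitoring;

int main() {
  // URI follows backend[-protocol]://host:port[?query];
  // the host and port below are placeholders.
  auto monitoring = MonitoringFactory::Get("influxdb-udp://localhost:8087");
  monitoring->send({10, "myMetricInt"});
}
```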
Metrics are sent using `send(Metric&& metric)`, where the `Metric` constructor receives the following parameters:
- `T value`
- `std::string& name`
- `[time_point<system_clock> timestamp]`
For example:
```cpp
monitoring->send({10, "myMetricInt"});
```
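The value is templated; as a hedged sketch of the other value types listed in the parameter table further below (metric names here are arbitrary):
```cpp
// int, double, string and uint64_t values (names are illustrative)
monitoring->send({10, "myMetricInt"});
monitoring->send({10.5, "myMetricDouble"});
monitoring->send({std::string{"running"}, "myMetricString"});
monitoring->send({uint64_t{123456789}, "myMetricUint64"});
```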
Two additional methods can be chained to `send(Metric&& metric)` in order to insert custom tags or set a custom timestamp:
- `addTags(std::vector<Tag>&& tags)`
- `setTimestamp(std::chrono::time_point<std::chrono::system_clock>& timestamp)`
For example:
```cpp
monitoring->send(Metric{10, "myMetric"}.addTags({{"tag1", "value1"}, {"tag2", "value2"}}));
monitoring->send(Metric{10, "myCrazyMetric"}.setTimestamp(timestamp));
```
It's also possible to send multiple, grouped values in a single metric (the Flume and InfluxDB backends are supported; others fall back to sending the values as separate metrics):
```cpp
void sendGroupped(std::string name, std::vector<Metric>&& metrics)
```
For example:
```cpp
monitoring->sendGroupped("measurementName", {{20, "myMetricIntMultiple"}, {20.30, "myMetricFloatMultiple"}});
```
In order to avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment. This feature is operated with the following two methods:
```cpp
monitoring->enableBuffering(const unsigned int maxSize)
...
monitoring->flushBuffer();
```
`enableBuffering` takes the maximum buffer size as its parameter. When the buffer gets full, all values are flushed automatically.
For example:
```cpp
monitoring->enableBuffering(5);
for (int i = 1; i < 10; i++) {
  monitoring->send({10, "myMetricInt"});
}
monitoring->send({20, "myMetricInt2"});
monitoring->flushBuffer();
```
Metrics consist of 4 parameters: name, value, timestamp and tags.
| Parameter name | Type | Required | Default |
|---|---|---|---|
| name | string | yes | - |
| value | int / double / string / uint64_t | yes | - |
| timestamp | chrono::time_point<std::chrono::system_clock> | no | current timestamp |
| tags | vector | no | -** |
**Default tag set is process specific and included in each metric:
- hostname
- PID
- process name
The module can calculate derived metrics. To do so, use `addDerivedMetric(std::string name, DerivedMetricMode mode)` with one of two available modes (see the sketch below):
- `DerivedMetricMode::RATE` - rate between two following metrics;
- `DerivedMetricMode::AVERAGE` - average value of all metrics stored in the cache.
Derived metrics are generated each time a new value is passed to the module. Their names are suffixed with the derived mode name.
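A short sketch of the rate mode (the metric name is arbitrary); the derived value is emitted alongside each subsequent value of the registered metric:
```cpp
// Register a RATE calculation for "myMetricInt"; every following send of
// this metric also produces a derived value with a suffixed name.
monitoring->addDerivedMetric("myMetricInt", DerivedMetricMode::RATE);
monitoring->send({10, "myMetricInt"});
monitoring->send({20, "myMetricInt"});
```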
Currently process monitoring is supported only on Linux. To enable it, use the call below (a usage sketch follows the metric list):
```cpp
enableProcessMonitoring([interval in seconds]);
```
The following metrics are generated every interval:
- etime - elapsed time since the process was started, in the form [[DD-]hh:]mm:ss
- pcpu - cpu utilization of the process in "##.#" format. Currently, it is the CPU time used divided by the time the process has been running (cputime/realtime ratio), expressed as a percentage. It will not add up to 100% unless you are lucky
- pmem - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage
- bytesReceived - the total number of bytes of data received by the process (per interface)
- bytesTransmitted - the total number of bytes of data transmitted by the process (per interface).
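A minimal usage sketch, assuming the call is made on the monitoring instance obtained from the factory and using an arbitrary 5-second interval:
```cpp
// Generate the process metrics listed above every 5 seconds (example value)
monitoring->enableProcessMonitoring(5);
```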
Code snippets are available in examples directory.
- Sending metric - examples/1-Basic.cxx
- Sending metric with custom tags - examples/2-TaggedMetrics.cxx
- Sending metric with user defined timestamp - examples/3-UserDefinedTimestamp.cxx
- Calculating derived metrics - examples/4-RateDerivedMetric.cxx
- Sending multiple values in a single metric - examples/8-Multiple.cxx
This guide explains manual installation. For `ansible` deployment see the AliceO2Group/system-configuration GitLab repo.