observability-telegraf

This repository contains observability-telegraf, a containerized version of the Telegraf agent.

The design goal is to provide a preconfigured container that runs the Telegraf agent with a selected set of plugins.

Minimum requirements

Pre-configuration

Pre-configuration is needed for a container to read metrics from specific plugins:

By default, the IP Tables plugin runs the command sudo iptables -nvL INPUT -x. The iptables tool has become a legacy tool and has been replaced by iptables-nft. To use iptables-nft, uncomment the line #binary = "iptables-nft" in the configuration.
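To decide whether the binary setting needs to change, it can help to check which backend the host's iptables binary uses. The sketch below assumes that iptables 1.8 and later print the backend in parentheses in their version string (for example "iptables v1.8.7 (nf_tables)"). The helper name iptables_backend is hypothetical and not part of this repository.

```shell
# Hypothetical helper: extract the backend name from an `iptables --version` string.
# Assumes the format "iptables v1.8.7 (nf_tables)"; older releases print no backend.
iptables_backend() {
    case "$1" in
        *"(nf_tables)"*) echo "nf_tables" ;;
        *"(legacy)"*)    echo "legacy" ;;
        *)               echo "unknown" ;;
    esac
}

# Example (run on the host):
#   iptables_backend "$(iptables --version)"
```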

The Intel PowerStat plugin is based on Linux kernel modules that expose specific metrics over sysfs or devfs interfaces. The plugin expects the following dependencies:

  • intel-rapl module, which exposes Intel Runtime Power Limiting metrics over sysfs (/sys/devices/virtual/powercap/intel-rapl),
  • msr kernel module, which provides access to processor model specific registers over devfs (/dev/cpu/%d/msr),
  • cpufreq kernel module, which exposes per-CPU frequency over sysfs (/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq),
  • intel-uncore-frequency module, which exposes Intel uncore frequency metrics over sysfs (/sys/devices/system/cpu/intel_uncore_frequency).

The minimum kernel version that satisfies most of the requirements is 3.13. The uncore_frequency metrics additionally require the intel-uncore-frequency module, which is available since kernel 5.6.

Make sure that the kernel modules are loaded and running (cpufreq is integrated into the kernel). Modules might have to be enabled manually with modprobe. Depending on the kernel version, run the following commands:
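The kernel version requirement can be checked with a short script before any modules are loaded. This is a minimal sketch that assumes sort -V (version sort, as provided by GNU coreutils) is available on the host. The helper names version_ge and kernel_base_version are hypothetical.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the distribution suffix from a kernel release,
# e.g. "5.15.0-91-generic" becomes "5.15.0".
kernel_base_version() {
    echo "${1%%-*}"
}

# Example (run on the host):
#   kernel="$(kernel_base_version "$(uname -r)")"
#   version_ge "$kernel" 3.13 && echo "kernel is recent enough for most metrics"
#   version_ge "$kernel" 5.6  && echo "intel-uncore-frequency should be available"
```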

# kernel 5.x.x:
sudo modprobe rapl
sudo modprobe msr
sudo modprobe intel_rapl_common
sudo modprobe intel_rapl_msr

# also for kernel >= 5.6.0
sudo modprobe intel-uncore-frequency

# kernel 4.x.x:
sudo modprobe msr
sudo modprobe intel_rapl

The Redfish plugin needs hardware servers for which DMTF's Redfish is enabled.

To quickly check that the Redfish plugin works properly, you can set up a mockup. The mockup must be performed on the HOST!

  1. Get the source code: git clone https://opendev.org/x/python-redfish.git
  2. Go into the dmtf/mockup_0.99.0a folder.
  3. Run ./buildImage.sh and ./run-redfish-simulator.sh
  4. Check that a container is running and listening on port 8000 with the command: docker ps
  5. Now run observability-telegraf with the Redfish plugin.

  • The DPDK plugin needs an external application built with the Data Plane Development Kit.
  • ./telegraf-intel-docker.sh uses /var/run/dpdk/rte as the default location of the DPDK socket. If the DPDK socket is located somewhere else, the user must specify its location at run time with the --dpdk_socket_path flag. Providing a path to a directory that contains the host's own Docker socket file is not recommended.

Make sure the container has read and write access to the socket. This can be done, for example, with chmod a+rw /var/run/dpdk/rte/dpdk_telemetry.v2
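Before starting the container, the socket can be inspected on the host. The sketch below checks that a path exists, is a UNIX socket, and is readable and writable by the user running the check. That is only an approximation of what the container user will see. The helper name check_socket_access and its return codes are hypothetical.

```shell
# Hypothetical helper: check that a path is a UNIX socket that the current
# user can read and write.
# Returns 1 if the path is missing, 2 if it is not a socket, 3 if access is denied.
check_socket_access() {
    sock="$1"
    if [ ! -e "$sock" ]; then
        echo "missing: $sock"
        return 1
    fi
    if [ ! -S "$sock" ]; then
        echo "not a socket: $sock"
        return 2
    fi
    if [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
        echo "no read/write access: $sock"
        return 3
    fi
    echo "ok: $sock"
}

# Example (run on the host):
#   check_socket_access /var/run/dpdk/rte/dpdk_telemetry.v2
```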

The Intel PMU plugin requires JSON files with event definitions to work properly. Their location can be passed to ./telegraf-intel-docker.sh with the --pmu_events parameter. Providing a path to a directory that contains the host's own Docker socket file is not recommended.

More information about event definitions and where to get them can be found in the plugin's README.
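A quick sanity check of the events directory can catch an empty or mistyped path before the container is started. This sketch assumes the event definition files use a .json extension. The helper name has_event_definitions is hypothetical.

```shell
# Hypothetical helper: succeed if the directory contains at least one *.json file.
# Assumes event definition files carry a .json extension.
has_event_definitions() {
    dir="$1"
    [ -d "$dir" ] || return 1
    for f in "$dir"/*.json; do
        # When nothing matches, the pattern stays literal and the -f test fails.
        [ -f "$f" ] && return 0
    done
    return 1
}

# Example (run on the host, with your own directory in place of the placeholder):
#   has_event_definitions "<events definition path>" || echo "no JSON files found"
```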

The script telegraf-intel-docker.sh has a default location for the libvirt socket at /var/run/libvirt/libvirt-sock. If the libvirt socket is located elsewhere, users must provide the --libvirt_socket_path flag at runtime to specify the custom location. It is not recommended to use a directory that contains the host's own libvirt socket file.

Make sure the container has write access to the socket. This can be done, for example, with chmod a+w /var/run/libvirt/libvirt-sock

Similarly, the script assumes that the default location of libvirt TLS certificates is at /etc/pki/CA. However, users can override this location by providing the --libvirt_tls_cert parameter at runtime with the desired directory path.

Additionally, the script assumes that the .ssh directory is located in the user's home directory at $HOME/.ssh. If this is not the case, users can specify an alternate location at runtime by using the --ssh_dir parameter.

More information about the plugin's configuration can be found in the plugin's README.

The script telegraf-intel-docker.sh assumes that the default location of P4Runtime TLS certificates is at /etc/pki/CA. However, users can override this location by providing the --p4runtime_tls_cert parameter at runtime with the desired directory path.

If rasdaemon is installed on the host OS, make sure that its version is exactly v0.6.7, the same version the container uses. Then mount the rasdaemon library directory into the container so that both versions are kept in sync: ./telegraf-intel-docker.sh --use-host-rasdaemon. An alternative is to remove rasdaemon from the host OS.
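The version comparison can be scripted. The sketch below extracts the first X.Y.Z number from a version string and compares it with 0.6.7. It assumes that rasdaemon --version prints a string containing the version number, which should be verified on the host. The helper name rasdaemon_version_matches is hypothetical.

```shell
# Version required on the host, as stated above.
REQUIRED_RASDAEMON_VERSION="0.6.7"

# Hypothetical helper: extract the first X.Y.Z number from a version string
# and compare it with the required version.
rasdaemon_version_matches() {
    found="$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    [ "$found" = "$REQUIRED_RASDAEMON_VERSION" ]
}

# Example (run on the host; assumes `rasdaemon --version` prints the version):
#   rasdaemon_version_matches "$(rasdaemon --version)" && echo "versions match"
```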

  • The Intel DLB plugin needs an external application built with the Data Plane Development Kit and an installed Intel® Dynamic Load Balancer Driver.

  • ./telegraf-intel-docker.sh uses /var/run/dpdk/rte as the default location of the DPDK socket. If the DPDK socket is located somewhere else, the user must specify its location at run time with the --dpdk_socket_path flag. Providing a path to a directory that contains the host's own Docker socket file is not recommended.

Make sure the container has read and write access to the socket. This can be done, for example, with chmod a+rw /var/run/dpdk/rte/dpdk_telemetry.v2

The Intel Baseband Accelerator Input Plugin requires a properly configured and running pf-bb-config. When running in daemon mode (VFIO mode), the pf_bb_config application runs as a service and exposes a socket for CLI interaction. The user must specify the path to the socket with the --intel_baseband_socket_path option (e.g. --intel_baseband_socket_path /tmp/pf_bb_config.0000:b1:00.0.sock). The response from the socket is stored in a .log file (e.g. /var/log/pf_bb_cfg_0000:b1:00.0.log). If pf-bb-config creates files ending in both .log and _response.log, select the _response.log file. The user must specify the path to the file with the --intel_baseband_log_path option (for the example above, --intel_baseband_log_path /var/log/pf_bb_cfg_0000:b1:00.0.log or, if a _response.log file exists, --intel_baseband_log_path /var/log/pf_bb_cfg_0000:b1:00.0_response.log).

For Telegraf to operate correctly, the user must specify both options (--intel_baseband_socket_path and --intel_baseband_log_path). Remember to set the same values in the telegraf.conf file.
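The rule for choosing between the two log files can be expressed as a small helper. This is a sketch of the selection logic described above, not part of telegraf-intel-docker.sh. The function name select_baseband_log is hypothetical.

```shell
# Hypothetical helper: given the path to the pf-bb-config .log file, print the
# matching _response.log path if that file exists, otherwise the original path.
select_baseband_log() {
    log="$1"
    response="${log%.log}_response.log"
    if [ -f "$response" ]; then
        echo "$response"
    else
        echo "$log"
    fi
}

# Example (run on the host):
#   select_baseband_log /var/log/pf_bb_cfg_0000:b1:00.0.log
```

The printed path is the value to pass to --intel_baseband_log_path and to set in telegraf.conf.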

Installation

From source

  1. Install Docker 20.10.6 or newer (see the Docker installation guide).
  2. Clone the Telegraf Intel Docker repository. Cloning this repo into /tmp or any privileged directory is not recommended.
  3. Go into the cloned repository: cd telegraf_intel_docker.
  4. Run ./telegraf-intel-docker.sh build-run <image-name> <container-name> from the source file directory to build the image and run the Docker container in the background. Provide valid image and container names in place of <image-name> and <container-name>.

How to use it

  • See available options with:

    ./telegraf-intel-docker.sh

  • Build and run Telegraf Intel Docker container:

    ./telegraf-intel-docker.sh build-run <image-name> <container-name>

  • Build and run with DPDK socket path:

    ./telegraf-intel-docker.sh build-run <image-name> <container-name> --dpdk_socket_path <socket-path>

  • Build and run with mounted rasdaemon folder:

    ./telegraf-intel-docker.sh build-run <image-name> <container-name> --use-host-rasdaemon

  • Build and run with path to directory with PMU events definitions:

    ./telegraf-intel-docker.sh build-run <image-name> <container-name> --pmu_events <events definition path>

  • Build Telegraf Intel Docker image:

    ./telegraf-intel-docker.sh build <image-name>

  • Run with DPDK socket path:

    ./telegraf-intel-docker.sh run <image-name> <container-name> --dpdk_socket_path <socket-path>

  • Run with non-default libvirt socket path, customized location of .ssh directory and tls certs:

    ./telegraf-intel-docker.sh run <image-name> <container-name> --libvirt_socket_path <socket_path> --ssh_dir <ssh_dir> --libvirt_tls_cert <certs_dir>

  • Run with mounted rasdaemon folder:

    ./telegraf-intel-docker.sh run <image-name> <container-name> --use-host-rasdaemon

  • Run with path to directory with PMU events definitions:

    ./telegraf-intel-docker.sh run <image-name> <container-name> --pmu_events <events definition path>

  • Restart Telegraf Intel Docker container (e.g. to reload the Telegraf configuration file):

    ./telegraf-intel-docker.sh restart <image-name> <container-name>

  • Stop and remove the Telegraf Intel Docker container and the images linked to it:

    ./telegraf-intel-docker.sh remove <image-name> <container-name>

  • Remove Telegraf Intel Docker images:

    ./telegraf-intel-docker.sh remove-build <image-name>

  • Enter Telegraf Intel Docker container via bash:

    ./telegraf-intel-docker.sh enter <container-name>

  • See Telegraf logs with:

    ./telegraf-intel-docker.sh logs <container-name>

Changing Telegraf configuration file

What is the Telegraf configuration file?

  • Telegraf's configuration file is written in TOML and is composed of three sections: global tags, agent settings, and plugins.
  • Plugins can be loaded, unloaded or configured in the configuration file.

To change the Telegraf configuration file:

  • From the source file directory, edit the Telegraf configuration file with a text editor (e.g. nano):

    nano telegraf/telegraf.conf

  • Use the script to reload the Telegraf configuration file and load new plugins:

    ./telegraf-intel-docker.sh restart <image-name> <container-name>

  • Check the Telegraf logs to verify that everything works as expected:

    ./telegraf-intel-docker.sh logs <container-name>
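The restart and log steps can be combined into one shell function for repeated configuration changes. This is a convenience sketch only. The function name reload_telegraf and the TELEGRAF_SCRIPT variable are hypothetical, and the script is assumed to be in the current directory unless the variable is overridden.

```shell
# Path to the helper script; override TELEGRAF_SCRIPT if it lives elsewhere.
TELEGRAF_SCRIPT="${TELEGRAF_SCRIPT:-./telegraf-intel-docker.sh}"

# Hypothetical wrapper: restart the container so the edited telegraf.conf is
# loaded, then show the logs. Stops early if the restart fails.
reload_telegraf() {
    image="$1"
    container="$2"
    "$TELEGRAF_SCRIPT" restart "$image" "$container" || return 1
    "$TELEGRAF_SCRIPT" logs "$container"
}

# Example (run from the source file directory):
#   reload_telegraf <image-name> <container-name>
```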

Usage example

  • Creating and running Telegraf Docker image:

    ./telegraf-intel-docker.sh build-run <image-name> <container-name>

    This command builds the Telegraf Docker image and runs a container from it, using the given names.

  • To see logs from Telegraf in the container:

    ./telegraf-intel-docker.sh logs <container-name>

    To exit viewing logs press: CTRL + C.

  • To load new Telegraf configuration file:

    ./telegraf-intel-docker.sh restart <image-name> <container-name>

    This will restart the container and run it with the new configuration.

  • To build and run the container with DPDK socket path:

    ./telegraf-intel-docker.sh build-run <image-name> <container-name> --dpdk_socket_path /var/run/dpdk/rte

  • To build and run the container with necessary files for Intel Baseband Accelerator Input Plugin:

    ./telegraf-intel-docker.sh build-run <image-name> <container-name> --intel_baseband_socket_path /tmp/pf_bb_config.0000:b1:00.0.sock --intel_baseband_log_path /var/log/pf_bb_cfg_0000:b1:00.0.log


Available plugins

Input plugins

List of supported Telegraf input plugins.

Enabled by default

The following plugins should work on most host configurations.

  1. CGroup
  2. CPU
  3. Disk
  4. Disk IO
  5. DNS Query
  6. ETH Tool
  7. Hugepages
  8. IP Tables
  9. Kernel VMStat
  10. Mem
  11. Net
  12. Ping
  13. Smart
  14. System
  15. Temp

Disabled by default

Some plugins need special attention to the host's configuration. Observability Telegraf supports them, and they can be enabled by uncommenting the associated config fields in the telegraf/telegraf.conf file. Make sure the configuration requirements are fulfilled for the plugins listed below.

  1. Intel Baseband
  2. Intel DLB
  3. Intel PowerStat
  4. Intel RDT
  5. Intel PMU
  6. DPDK
  7. IPMI Sensor
  8. Libvirt
  9. P4Runtime
  10. RAS
  11. Redfish

Output plugins

List of supported Telegraf output plugins enabled by default.

  1. File
  2. Prometheus client

Changelog

1.3.0

  • Update telegraf version: 1.24.3 -> 1.27.4
  • Add P4Runtime plugin (disabled by default)
  • Add Intel DLB plugin (disabled by default)
  • Add Intel Baseband plugin (disabled by default)
  • Add new features: cpu_base_frequency for Powerstat plugin
  • Update the final alpine image: 3.16 -> 3.18

1.2.0

  • Update telegraf version: 1.21.3 -> 1.24.3
  • Update version of pqos (intel_cmt_cat): 4.2.0 -> 4.4.1
  • Add Hugepages plugin (enabled by default)
  • Add new features: uncore_freq and max_turbo_freq for Powerstat plugin
  • Update the final alpine image: 3.15 -> 3.16