DRAFT set_telegraf for FS computers

1. Introduction

The set_telegraf utility changes the state of telegraf data collection between full and partial. In addition, when it is set to full, the gromet and metclient services are enabled; when it is set to partial, those services are disabled.

Currently, this functionality is only needed for systems with RDBE back ends that have operational and spare FS computers. Only one full data collection should occur among the two computers. The other computer should only use a partial data collection, which is basically computer health monitoring.

Normally FS operation occurs on the operational computer, which is typically running the fs1 system. If the disks for the fs1 system suffer a catastrophic failure, it will be necessary to move operations to the fs2 system, probably running on the spare computer. The set_telegraf utility provides a way to switch the configuration as needed.

The set_telegraf script assumes systemd (not initd) is being used. This script can be used by root (or optionally an AUID account) to change which system, fs1 or fs2, runs telegraf (and gromet and metclient) for operations. It can also be used just to change the configuration on a single system from full to partial, which disables use of gromet and metclient.

2. set_telegraf installation

The actions for installing set_telegraf on fs1 and fs2 are almost identical. The differences are noted as IMPORTANT in the fs1 set_telegraf installation sub-section below and summarized in the fs2 set_telegraf installation sub-section below.

The instructions assume that telegraf (and gromet and metclient) have been installed. On the system being used for operations (usually fs1), telegraf should be in the full configuration (described below) and the gromet and metclient services should be enabled. On the other system (usually fs2), telegraf should be in the partial configuration (see below) and the gromet and metclient services should be disabled.

The full configuration for telegraf points the symbolic link /etc/telegraf/telegraf.conf to /etc/telegraf/telegraf.conf.full. For the partial configuration, the link points to /etc/telegraf/telegraf.conf.partial.
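As a minimal sketch of how the active configuration could be inspected, the helper below reads the symbolic link and reports which variant it selects. The function name telegraf_mode is hypothetical and is not part of set_telegraf; it matches only on the file name suffix of the link target, so it makes no assumption about the directory the target lives in.

```shell
#!/bin/sh
# Sketch only: report which telegraf configuration a symbolic link selects.
# telegraf_mode is a hypothetical helper, not part of set_telegraf.
telegraf_mode() {
    link=${1:-/etc/telegraf/telegraf.conf}
    # readlink prints the link target and fails if the path is not a symlink
    target=$(readlink "$link") || { echo unknown; return 1; }
    case $target in
        *telegraf.conf.full)    echo full ;;
        *telegraf.conf.partial) echo partial ;;
        *)                      echo unknown; return 1 ;;
    esac
}
```

Calling telegraf_mode with no argument checks /etc/telegraf/telegraf.conf; a different link path can be passed as the first argument.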

Caution
It is strongly recommended that gromet and all its clients, including telegraf (as set in /etc/telegraf/telegraf.conf), use the met_server (or some other unique) alias as the host name for gromet. This allows the interface to be changed between the local interface (127.0.0.1) and the external interface (the IP address of the machine) just by editing /etc/hosts to change which interface the alias is assigned to, then restarting gromet and all its clients. Further, fs1 and fs2 should use the same alias and the same kind of interface, either the local interface or the external interface for that system. This allows the (backup) FS on fs2 to access the correct interface if it is used for operations, without changing /usr2/control/equip.ctl.
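To see which interface the alias is currently assigned to, a hosts file can be scanned for the alias. The sketch below is illustrative only: alias_address is a hypothetical helper, and it assumes the usual hosts file layout of an address followed by one or more names.

```shell
#!/bin/sh
# Sketch only: print the address an alias is assigned to in a hosts file.
# alias_address is a hypothetical helper; met_server is the alias named above.
alias_address() {
    alias_name=$1
    hosts_file=${2:-/etc/hosts}
    # Strip comments, then print the address of the first line listing the alias
    awk -v name="$alias_name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$hosts_file"
}
```

For example, alias_address met_server prints 127.0.0.1 when the alias is on the local interface and the machine's IP address when it is on the external interface. It prints nothing if the alias is not defined.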

2.1. fs1 set_telegraf installation

These instructions are performed on fs1 by root. To install the set_telegraf script:

Important
For installing on fs2, these instructions are performed on fs2.
  1. Clone the set_telegraf repository:

    cd /usr2
    git clone https://github.com/nvi-inc/set_telegraf.git
    chown -R prog:rtx set_telegraf
  2. Place a copy of set_telegraf in /usr/local/sbin and set permissions:

    cd /usr/local/sbin
    cp /usr2/set_telegraf/set_telegraf .
    chmod u+rwx,go+r,go-wx set_telegraf
  3. Enable use with sudo from AUID accounts (optional)

    If you installed the CIS hardening, you can enable use of set_telegraf from AUID accounts that are part of the operators group.

    1. Place a copy of the script that runs set_telegraf with sudo in /usr/local/bin and set permissions:

      cd /usr/local/bin
      cp /usr2/set_telegraf/set_telegraf.sudo set_telegraf
      chmod u+rwx,go+rx,go-w set_telegraf
    2. Run visudo, then add at the end:

      %operators	ALL=(ALL) /usr/local/sbin/set_telegraf
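After the copies are in place, the resulting permissions can be checked. The helper below is a sketch, not part of set_telegraf: check_mode is a hypothetical name, and it relies on the GNU coreutils form of stat (stat -c), which is assumed to be available on the FS computers.

```shell
#!/bin/sh
# Sketch only: compare a file's octal mode with an expected value.
# check_mode is a hypothetical helper; stat -c is the GNU coreutils syntax.
check_mode() {
    file=$1
    expected=$2
    actual=$(stat -c '%a' "$file") || return 1
    if [ "$actual" = "$expected" ]; then
        echo "ok: $file is $actual"
    else
        echo "MISMATCH: $file is $actual, expected $expected"
        return 1
    fi
}
```

The chmod in step 2 leaves the script readable by everyone but executable only by its owner (mode 744), and the chmod in the optional step 3 leaves the wrapper executable by everyone (mode 755). So check_mode /usr/local/sbin/set_telegraf 744 and check_mode /usr/local/bin/set_telegraf 755 should both report ok.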

2.1.1. fs2 set_telegraf installation

The directions for fs2 are identical to the ones for fs1, except:

  • All work is performed on fs2

Follow the directions in the fs1 set_telegraf installation sub-section above with that change, which is noted as IMPORTANT there, then proceed to Testing set_telegraf below.

2.2. Testing set_telegraf

The instructions below alternately disable and enable telegraf collection of antenna and met. data.

Caution
Be careful to enter the command on the machine indicated.
  1. On fs1 as root, execute:

    set_telegraf partial
  2. Verify that the grafana display is not showing updating antenna/met. data.

  3. On fs2 as root, execute:

    set_telegraf full
  4. Verify that the grafana display is showing updating antenna/met. data.

  5. On fs2 as root, execute:

    set_telegraf partial
  6. Verify that the grafana display is not showing updating antenna/met. data.

  7. On fs1 as root, execute:

    set_telegraf full
  8. Verify that the grafana display is showing updating antenna/met. data.

If in each case grafana was showing or not showing the data as indicated, then the system is checked out and has been returned to the normal configuration: the full telegraf configuration is on fs1, and the partial telegraf configuration on fs2 should still be collecting diagnostic information for that system.
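Besides watching grafana, the enablement state of the two services can be checked on each machine. The following is a sketch under assumptions: check_services is a hypothetical helper, and the systemd unit names are assumed to be gromet and metclient, matching the service names used in this document.

```shell
#!/bin/sh
# Sketch only: check that gromet and metclient match the state expected
# for a configuration. check_services is a hypothetical helper and the
# unit names are assumed to match the service names in this document.
check_services() {
    mode=$1
    case $mode in
        full)    want=enabled ;;
        partial) want=disabled ;;
        *)       echo "usage: check_services full|partial" >&2; return 2 ;;
    esac
    rc=0
    for svc in gromet metclient; do
        # systemctl is-enabled prints the state; it exits non-zero if not enabled
        state=$(systemctl is-enabled "$svc" 2>/dev/null) || true
        if [ "$state" = "$want" ]; then
            echo "$svc: $state (ok)"
        else
            echo "$svc: ${state:-unknown} (expected $want)"
            rc=1
        fi
    done
    return $rc
}
```

After the last step above, check_services full on fs1 and check_services partial on fs2 would both be expected to succeed.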

2.3. Use of set_telegraf

The set_telegraf utility switches the configuration of telegraf on the fs1 and fs2 systems. The telegraf configuration on the fs1 system (normally in the operational chassis) is usually the full configuration, collecting data from the antenna, FS, datalogger, and met. server, as well as the performance data for that system. The configuration on the fs2 system (normally in the spare chassis) is the partial configuration. It only collects the performance data for that system. If for some reason the fs1 disks can’t be used (in either the operational or spare computer chassis) and the fs2 disks are pressed into service for operations, set_telegraf provides a means to change the telegraf configuration on the fs1 disks to the partial one and the configuration on the fs2 disks to the full one.

Note
The node names of the systems are associated with the disks, not the computer chassis. Thus if the fs1 disks are moved from the usual operational computer chassis to the spare computer chassis, then fs1 is running in the spare computer chassis. If the fs1 disks are moved to the spare chassis, they can still be used for operations, including using the full configuration of telegraf.
Important
It is important that only one system use the full configuration of telegraf at any given time. As a result, you should always change the current full configuration to partial before enabling the full configuration on the other system. If it is not possible to disable the current full configuration (for example, the disks won’t boot) before enabling it on the other disks, the system with the previous full configuration should be kept off the network until it has been switched to partial. This can be done either by keeping it turned off or by disconnecting it from the network.
Note
If your system is CIS hardened and use of sudo has been enabled for set_telegraf, all the commands below can be executed from AUID accounts that are part of the operators group. The AUID user will typically be prompted to enter their AUID account password.
  1. When moving operations to the fs2 system:

    To switch the full configuration from fs1 to fs2:

    1. Change the telegraf on the fs1 disks to partial, as root (or using an AUID account):

      set_telegraf partial
    2. Change the telegraf on the fs2 disks to full, as root (or using an AUID account):

      set_telegraf full
    3. If gromet was serving data to the network instead of 127.0.0.1, i.e., the alias for gromet (usually met_server) is assigned to the external interface, you will need to adjust all other systems that were getting met. data from fs1 to point to fs2 instead.

  2. When operations can be restored to the fs1 system, switch the systems back:

    To switch the full configuration from fs2 to fs1:

    1. Change the telegraf on the fs2 disks to partial, as root (or using an AUID account):

      set_telegraf partial
    2. Change the telegraf on the fs1 disks to full, as root (or using an AUID account):

      set_telegraf full
    3. If gromet was serving data to the network instead of 127.0.0.1, you will need to adjust all other systems that were getting met. data from fs2 to point to fs1 instead.
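The ordering rule in both procedures (set the current full system to partial first, and only then set the other system to full) can be sketched as a small wrapper. Everything here is hypothetical: switch_full and RUN are illustrative names, and the default of running the commands over ssh assumes remote logins with sufficient privilege are possible, which may not hold on a hardened system. The documented procedure is to enter the commands on each machine directly.

```shell
#!/bin/sh
# Sketch only: enforce "partial first, then full" when moving the full
# configuration. switch_full and RUN are hypothetical; RUN defaults to ssh,
# which assumes remote logins to both systems are permitted.
RUN=${RUN:-ssh}
switch_full() {
    from=$1
    to=$2
    # Demote the current full system first; stop if that fails
    if ! $RUN "$from" set_telegraf partial; then
        echo "could not set $from to partial; keep it off the network" >&2
        return 1
    fi
    $RUN "$to" set_telegraf full
}
```

Under those assumptions, switch_full root@fs1 root@fs2 would move the full configuration to fs2, and switch_full root@fs2 root@fs1 would move it back. The wrapper never enables full on the second system if the first could not be switched to partial.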