The set_telegraf utility changes the state of telegraf data collection between full and partial. In addition, when it is set to full, the gromet and metclient services are enabled; when it is set to partial, those services are disabled.
Currently, this functionality is only needed for systems with RDBE back ends that have operational and spare FS computers. Only one full data collection should occur among the two computers. The other computer should only use a partial data collection, which is basically computer health monitoring.
Normally FS operation occurs on the operational computer, which is typically running the fs1 system. If the disks for the fs1 system suffer a catastrophic failure, it will be necessary to move operations to the fs2 system, probably running on the spare computer. The set_telegraf utility provides a way to switch the configuration as needed.
The set_telegraf script assumes systemd (not initd) is being used. This script can be used by root (or optionally an AUID account) to change which system, fs1 or fs2, runs telegraf (and gromet and metclient) for operations. It can also be used just to switch a single system from full to partial to disable use of gromet and metclient.
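Because the script assumes systemd, it can be worth confirming that before installing. The following is a minimal sketch, relying on the convention that a system booted with systemd has the directory /run/systemd/system. The function name uses_systemd and its optional directory argument are hypothetical and are not part of set_telegraf.

```shell
# Return success if the system was booted with systemd.
# The optional argument overrides the directory that is tested
# (hypothetical, for illustration only).
uses_systemd() {
    [ -d "${1:-/run/systemd/system}" ]
}

if uses_systemd; then
    echo "systemd detected"
else
    echo "systemd not detected; set_telegraf is not expected to work here"
fi
```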
The actions for installing set_telegraf on fs1 and fs2 are almost identical. The two differences are noted as IMPORTANT in the step fs1 set_telegraf installation below and summarized in fs2 set_telegraf installation step below.
The instructions assume that telegraf (and gromet and metclient) have been installed. On the system being used for operations (usually fs1), telegraf should be in the full configuration (described below) and the gromet and metclient services should be enabled. On the other system (usually fs2), telegraf should be in the partial configuration (see below) and the gromet and metclient services should be disabled.
The full configuration for telegraf points the symbolic link /etc/telegraf/telegraf.conf to /etc/telegraf/telegraf.conf.full. For the partial configuration, the link points to /etc/telegraf.conf.partial.
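To see which configuration is currently selected, the target of the symbolic link can be inspected. The sketch below is one way to do that; the function name telegraf_mode is hypothetical, and it only looks at the suffix of the link target (.full or .partial), so it makes no claim about how set_telegraf itself manages the link.

```shell
# Report which telegraf configuration a symbolic link selects.
# $1: path of the link (defaults to /etc/telegraf/telegraf.conf).
telegraf_mode() {
    link=${1:-/etc/telegraf/telegraf.conf}
    [ -L "$link" ] || { echo "not-a-link"; return 1; }
    # Classify by the suffix of the link target.
    case "$(readlink "$link")" in
        *.full)    echo full ;;
        *.partial) echo partial ;;
        *)         echo unknown ;;
    esac
}

# Usage:
#   telegraf_mode
```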
Caution
|
It is strongly recommended that gromet and all its clients, including telegraf (as set in /etc/telegraf/telegraf.conf), use the met_server (or some other unique) alias as the host name for gromet. This allows the interface to be changed between the local interface (127.0.0.1) and the external interface (the IP address of the machine) just by editing /etc/hosts to change which interface the alias is assigned to, and then restarting gromet and all its clients. Further, fs1 and fs2 should use the same alias and the same kind of interface, either the local interface or the external interface for that system. This allows the (backup) FS on fs2 to access the correct interface if it is used for operations, without changing /usr2/control/equip.ctl.
|
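As an illustration of the alias approach, the sketch below looks up which address a hosts file assigns to an alias and reports whether that is the local or the external interface. The function names alias_address and alias_interface are hypothetical. The sketch treats any 127.x address as local and anything else as external, which is an assumption made for brevity.

```shell
# Print the address that a hosts file assigns to an alias.
# $1: alias to look up, $2: hosts file (defaults to /etc/hosts).
alias_address() {
    awk -v name="$1" '
        { sub(/#.*/, "") }   # drop comments before examining the fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Classify the alias as assigned to the local or the external interface.
alias_interface() {
    case "$(alias_address "$@")" in
        "")    echo unassigned ;;
        127.*) echo local ;;
        *)     echo external ;;
    esac
}

# Usage:
#   alias_interface met_server
```

Running the same check on fs1 and fs2 is a quick way to confirm that both systems use the same kind of interface for the alias.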
These instructions are performed on fs1 by root. To install the set_telegraf script:
Important
|
For installing on fs2, these instructions are performed on fs2. |
-
Clone the set_telegraf repository:
cd /usr2
git clone https://github.com/nvi-inc/set_telegraf.git
chown -R prog.rtx set_telegraf
-
Place a copy of set_telegraf in /usr/local/sbin and set permissions:
cd /usr/local/sbin
cp /usr2/set_telegraf/set_telegraf .
chmod u+rwx,go+r,go-wx set_telegraf
-
Enable use with sudo from AUID accounts (optional)
If you installed the CIS hardening, you can enable use of set_telegraf from AUID accounts that are part of the operators group.
-
Place a copy of the script that runs set_telegraf with sudo in /usr/local/bin and set permissions:
cd /usr/local/bin
cp /usr2/set_telegraf/set_telegraf.sudo set_telegraf
chmod u+rwx,go+rx,go-w set_telegraf
-
Run visudo, then add at the end:
%operators ALL=(ALL) /usr/local/sbin/set_telegraf
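After installation, the permissions set by the chmod commands above can be checked. The sketch below assumes GNU stat (as found on Linux) and assumes no special bits are set on the files, in which case the symbolic modes above correspond to 744 for the copy in /usr/local/sbin and 755 for the optional wrapper in /usr/local/bin. The function names file_mode and check_mode are hypothetical.

```shell
# Print the octal permission bits of a file (GNU stat).
file_mode() {
    stat -c '%a' "$1"
}

# Compare a file's mode against the expected value and report the result.
# $1: file, $2: expected octal mode.
check_mode() {
    actual=$(file_mode "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 has mode $2"
    else
        echo "mismatch: $1 has mode $actual, expected $2"
        return 1
    fi
}

# Usage, after installation:
#   check_mode /usr/local/sbin/set_telegraf 744
#   check_mode /usr/local/bin/set_telegraf 755   # only if the sudo wrapper was installed
```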
-
The directions for fs2 are identical to the ones for fs1, except:
-
All work is performed on fs2
Please follow the directions in the fs1 set_telegraf installation sub-section above with those changes, which are noted as IMPORTANT there, then proceed to the step Testing set_telegraf below.
The instructions below alternately disable and enable telegraf collection of antenna and met. data.
Caution
|
Be careful to enter the command on the machine indicated. |
-
On fs1 as root, execute:
set_telegraf partial
-
Verify that the grafana display is not showing updating antenna/met. data.
-
On fs2 as root, execute:
set_telegraf full
-
Verify that the grafana display is showing updating antenna/met. data.
-
On fs2 as root, execute:
set_telegraf partial
-
Verify that the grafana display is not showing updating antenna/met. data.
-
On fs1 as root, execute:
set_telegraf full
-
Verify that the grafana display is showing updating antenna/met. data.
If in each case grafana showed or did not show the data as indicated, then the system is checked out and has been returned to the normal configuration: the full telegraf configuration on fs1 and the partial configuration on fs2. The partial configuration on fs2 should still be collecting diagnostic information for that system.
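In addition to watching grafana, the state of the gromet and metclient services can be checked on the machine itself after each set_telegraf command. The sketch below is an assumption-laden illustration: it assumes the systemd units are named gromet and metclient, and it uses systemctl is-enabled, which prints the enablement state of a unit. The function names expected_state and check_services are hypothetical.

```shell
# Map a set_telegraf mode to the service state described in this document.
expected_state() {
    case "$1" in
        full)    echo enabled ;;
        partial) echo disabled ;;
        *)       return 1 ;;
    esac
}

# Check that gromet and metclient are in the state expected for a mode.
# $1: full or partial.
check_services() {
    want=$(expected_state "$1") || { echo "unknown mode: $1" >&2; return 2; }
    for svc in gromet metclient; do
        # is-enabled exits non-zero for disabled units, so ignore its status.
        have=$(systemctl is-enabled "$svc" 2>/dev/null) || true
        if [ "$have" != "$want" ]; then
            echo "$svc: expected $want, found ${have:-unknown}"
            return 1
        fi
    done
    echo "gromet and metclient match the $1 configuration"
}

# Usage, on the machine where set_telegraf was just run:
#   check_services partial
```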
The set_telegraf utility provides a command that can be used to switch the configuration of telegraf on the fs1 and fs2 systems. The telegraf configuration on the fs1 system (normally in the operational chassis) is usually the full configuration, collecting data from the antenna, FS, datalogger, and met. server, as well as the performance data for that system. The configuration on the fs2 system (normally in the spare chassis) is the partial configuration. It only collects the performance data for that system. If for some reason the fs1 disks can’t be used (in either the operational or spare computer chassis) and the fs2 disks are pressed into service for operations, set_telegraf provides a means to change the telegraf configuration on the fs1 disks to the partial one and the configuration on the fs2 disks to the full one.
Note
|
The node names of the systems are associated with the disks, not the computer chassis. Thus if the fs1 disks are moved from the usual operational computer chassis to the spare computer chassis, then fs1 is running in the spare computer chassis. If the fs1 disks are moved to the spare chassis, they can still be used for operations, including using the full configuration of telegraf. |
Important
|
It is important that only one system use the full configuration of telegraf at any given time. As a result, you should always change the current full configuration to partial before enabling the full configuration on the other system. If it is not possible to disable the current full configuration (for example, the disks won’t boot) before enabling it on the other disks, the system with the previous full configuration should be kept off the network until it has been switched to partial. This can be done either by keeping it turned off or by disconnecting it from the network. |
Note
|
If your system is CIS hardened and use of sudo has been enabled for set_telegraf, all the commands below can be executed from AUID accounts that are part of the operators group. The AUID user will typically be prompted to enter their AUID account password. |
-
When moving operations to the fs2 system:
To switch the full configuration from fs1 to fs2:
-
Change the telegraf on the fs1 disks to partial, as root (or using an AUID account):
set_telegraf partial
-
Change the telegraf on the fs2 disks to full, as root (or using an AUID account):
set_telegraf full
-
If gromet was serving data to the network instead of 127.0.0.1, i.e., the alias for gromet (usually met_server) is assigned to the external interface, you will need to adjust all other systems that were getting met. data from fs1 to point to fs2 instead.
-
When operations can be restored to the fs1 system, switch the systems back:
To switch the full configuration from fs2 to fs1:
-
Change the telegraf on the fs2 disks to partial, as root (or using an AUID account):
set_telegraf partial
-
Change the telegraf on the fs1 disks to full, as root (or using an AUID account):
set_telegraf full
-
If gromet was serving data to the network instead of 127.0.0.1, you will need to adjust all other systems that were getting met. data from fs2 to point to fs1 instead.
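The ordering in the procedures above (partial first, then full) can be captured in a small wrapper run from a third machine. This is only a sketch under several assumptions that are not part of the original instructions: root can reach both systems with ssh, the host names passed in resolve to the intended systems, and set_telegraf is installed in /usr/local/sbin as described earlier. The function name switch_full and the REMOTE variable are hypothetical.

```shell
# Move the full telegraf configuration from one system to another.
# $1: host currently running "full", $2: host that should take over.
# REMOTE is the command used to reach a host (ssh by default).
switch_full() {
    from=$1
    to=$2
    # Step 1: demote the current full system. Stop if this fails, so that
    # two systems never run the full configuration at the same time.
    ${REMOTE:-ssh} "root@$from" /usr/local/sbin/set_telegraf partial || {
        echo "could not set $from to partial; keep it off the network" >&2
        return 1
    }
    # Step 2: only now promote the other system.
    ${REMOTE:-ssh} "root@$to" /usr/local/sbin/set_telegraf full
}

# Usage:
#   switch_full fs1 fs2    # move operations to fs2
#   switch_full fs2 fs1    # switch back
```

If the first step fails because the old system cannot be reached at all (for example, its disks won't boot), the wrapper deliberately stops. In that case the system should be kept off the network and set_telegraf full run by hand on the other system.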